Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import span to speed up #68

Open
wants to merge 1 commit into
base: cppflow2
Choose a base branch
from
Open

Conversation

ivanallen
Copy link

Data copy is bad for performance. I import span(c++20) for speeding up data access.

@ljn917
Copy link
Contributor

ljn917 commented Nov 4, 2020

https://en.cppreference.com/w/cpp/language/extending_std
It is undefined behavior to add declarations or definitions to namespace std or to any namespace nested within std, with a few exceptions noted below

I don't think defining std::span satisfies any of the exceptions. (Please point it out if I am wrong here.) It may cause name conflict with some compilers.

With that being said, I do agree to avoid data copy to improve performance. One way is to define span in another namespace, e.g. cppflow::span and uses std::span (typedef) when possible. The other way would be defining access functions on cppflow::tensor, e.g. operator[] and iterators.

@ivanallen
Copy link
Author

Yes, you are right. std::span defined in c++20. Maybe we can use the standard span if using compiler supported c++20.

@ljn917
Copy link
Contributor

ljn917 commented Nov 4, 2020 via email

@ivanallen
Copy link
Author

https://en.cppreference.com/w/cpp/utility/feature_test https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html Maybe check __cpp_lib_span, and use std::span if it is available, otherwise use the self defined version.

On Wed, Nov 4, 2020, 00:33 Allen @.***> wrote: Yes, you are right. std::span defined in c++20. Maybe we can use the standard span if using compiler supported c++20. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#68 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHSCRT3UCRFAXPJQ7D3SODRSRANCNFSM4TJQ7XVA .

Okay, I have updated my code to support c++20 feature :)


#set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-omit-frame-pointer -fsanitize=thread")
#set(CMAKE_LINKER_FLAGS "${CMAKE_LINKER_FLAGS} -fno-omit-frame-pointer -fsanitize=thread")

add_executable(example main.cpp)
target_include_directories(example PRIVATE ../../include $ENV{HOME}/libtensorflow2/include)
target_link_libraries(example "${TENSORFLOW_LIB}" Threads::Threads)
target_link_libraries (example "${TENSORFLOW_LIB}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep Threads::Threads. This does not compile on LInux because pthread is required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert all changes to this file? It makes this commit cleaner and focuses on the span header. You can create separate testing files to show it works with and without C++20.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target_link_libraries (example "${TENSORFLOW_LIB}") is required because it reports link error on my macos.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your cmake version? Threads::Threads was introduced in cmake 3.1, and cmake 3.10 is requested here. This specific test case is related to threading, so Threads::Threads has to be linked.

I strongly recommend creating a separate test case for the span class and do not touch unrelated test case.

} // end namespace std

#ifdef __has_cpp_attribute
# if __has_cpp_attribute(__cpp_lib_span)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work. According to https://en.cppreference.com/w/cpp/feature_test, "Library features" section:
The following macros are defined if the header <version> or any of the corresponding headers in the table below is included.
which means __cpp_lib_span is only defined if <span> or <version> is included. We need to test __has_include(<span>) and include it before testing __cpp_lib_span. There is example code in the bottom of the previous link.


#set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-omit-frame-pointer -fsanitize=thread")
#set(CMAKE_LINKER_FLAGS "${CMAKE_LINKER_FLAGS} -fno-omit-frame-pointer -fsanitize=thread")

add_executable(example main.cpp)
target_include_directories(example PRIVATE ../../include $ENV{HOME}/libtensorflow2/include)
target_link_libraries(example "${TENSORFLOW_LIB}" Threads::Threads)
target_link_libraries (example "${TENSORFLOW_LIB}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert all changes to this file? It makes this commit cleaner and focuses on the span header. You can create separate testing files to show it works with and without C++20.

@ivanallen ivanallen force-pushed the cppflow2 branch 3 times, most recently from ee71c98 to 1bafe0f Compare November 5, 2020 07:02
@ivanallen
Copy link
Author

Add test for span in c++17 and c++20 and test passed.

@@ -0,0 +1,25 @@
#include <thread>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you can remove <thread>

@ivanallen
Copy link
Author

You're really serious. LOL. I got an `std::runtime_error' undefined when i remove . Fixed it by the way.

@@ -205,7 +206,7 @@ namespace cppflow {

// Convert to correct type
const auto T_data = static_cast<T*>(raw_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested it yet, but res_tensor is a temporary variable. Returning a data pointer from a temporary variable is alarming for me. Can we use this->tf_tensor to hold the resolved concrete tensor? Does it deallocate the original tf_tensor and corrupt the data if this->tfe_handle is created from it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right! It indeed confuses people. Actually, res_tensor and this->tf_tensor hold the same buffer, but sometimes res_tensor uses a new buffer.

I think we should use the raw data of this->tf_tensor but not res_tensor.

@ljn917
Copy link
Contributor

ljn917 commented Nov 5, 2020

You're really serious. LOL. I got an `std::runtime_error' undefined when i remove . Fixed it by the way.

LOL. Thanks for the fix BTW. I am really looking forward to using this lib in my code, so I just want to make it pretty and clean. Thanks for bearing with me. We still need to look at the behavior of res_tensor though...

@ljn917
Copy link
Contributor

ljn917 commented Nov 5, 2020 via email

@ivanallen
Copy link
Author

I guess I need to go through the source code to see how TF manages memory internally. My first speculation is that TF_Tensor is RefCounted, so we should be safe to delete it after obtaining an eager handle. (pure speculation)

On Thu, Nov 5, 2020, 2:52 AM Allen @.> wrote: @.* commented on this pull request. ------------------------------ In include/cppflow/tensor.h <#68 (comment)>: > @@ -205,7 +206,7 @@ namespace cppflow { // Convert to correct type const auto T_data = static_cast<T*>(raw_data); Yeah, you are right! It indeed confuses people. Actually, res_tensor and this->tf_tensor hold the same buffer, but sometimes res_tensor uses a new buffer. I think we should use the raw data of this->tf_tensor but not res_tensor. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#68 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHXJAUCQZ6CGTWNBJ4DSOJKSHANCNFSM4TJQ7XVA .

I looked tensorflow source:

TF_Tensor* TF_TensorFromTensor(const tensorflow::Tensor& src,
                               TF_Status* status) {
  if (src.dtype() == DT_RESOURCE) {
    // ...
    // This branch is unsafe.
    const string str = src.scalar<ResourceHandle>()().SerializeAsString();
    TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size());
    std::memcpy(TF_TensorData(t), str.c_str(), str.size());
    return t;
  }
  if (src.dtype() != DT_STRING) {
    // This branch is safe.
    TensorBuffer* buf = TensorCApi::Buffer(src);
    buf->Ref();
    return new TF_Tensor{static_cast<TF_DataType>(src.dtype()), src.shape(),
                         buf};
  }

So the result is up to dtype.

@ivanallen
Copy link
Author

if (src.dtype() == DT_RESOURCE) {
    // ...
    // This branch is unsafe.
    const string str = src.scalar<ResourceHandle>()().SerializeAsString();
    TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size());
    std::memcpy(TF_TensorData(t), str.c_str(), str.size());
    return t;
  }

Maybe it leads to a memory leak.

@ljn917
Copy link
Contributor

ljn917 commented Nov 5, 2020

if (src.dtype() == DT_RESOURCE) {
    // ...
    // This branch is unsafe.
    const string str = src.scalar<ResourceHandle>()().SerializeAsString();
    TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size());
    std::memcpy(TF_TensorData(t), str.c_str(), str.size());
    return t;
  }

Maybe it leads to a memory leak.

Yes, I agree res_tensor leaks in our original code. Thanks for spotting it!

@ljn917
Copy link
Contributor

ljn917 commented Nov 5, 2020

@ivanallen @serizba

I think it is safe to store the resolved TF_Tensor* res_tensor into this->tf_tensor in tensor::get_data(). So we can avoid the memory leak, and also make sure res_tensor won't be deleted while we are accessing the returned data through span.

What happens under the hood is the following.
(1) When we call TFE_NewTensorHandle, a new copy of the source tensor is created and moved (std::move) into a new TensorHandle. This makes sure the source tensor can be deleted after a TensorHandle is obtained.
Edit: The underlying data of a tensorflow::Tensor is managed by a reference counted buffer. Copying the tensor here simply copies the buffer pointer, so we may see the same address with TFE_TensorHandleResolve.

(2) When we call TFE_TensorHandleResolve, the underlying TensorHandle::Resolve() is invoked. Depending on the context, a new concrete tensor may be copied from a device or a local copy/mirror is returned. We can just store this tensor into this->tf_tensor. In the original code, we definitely need to delete res_tensor at the end of tensor::get_data() to avoid memory leak.

@ivanallen For TF_TensorFromTensor, I think the dtype() dependent behavior is due to their different data representations, please see here. I don't know DT_RESOURCE, but DT_STRING may contain pointers to heap allocated buffer. Other data type should be just plain memory buffer.

@ivanallen
Copy link
Author

I use AddressSanitizer detected some memory leak and fixed it. The method get_data really has a memory leak. But we can directly use this->tf_tensor obtaining raw data. It should be safe and no memory leak.

@ljn917
Copy link
Contributor

ljn917 commented Nov 6, 2020

@ivanallen I noticed the leak messages with gcc's sanitizer before but did not get the time to trace it down. I think the bug fix should be in a separate PR for easier testing.

We cannot get data from this->tf_tensor, and calling TFE_TensorHandleResolve is a must. After TFE_TensorHandle is obtained, there is no guarantee that this->tf_tensor and this->tfe_handle share the same memory, e.g. the data can be placed on GPU and a new fetch is needed after some computations. All the eager ops are using this->tfe_handle. Please refer to my previous comment for more details. In addition, some of the constructor of tensor class (tensor::tensor(TFE_TensorHandle* handle)) does not even set a valid this->tf_tensor

What I meant was to change
auto res_tensor = TFE_TensorHandleResolve(this->tfe_handle.get(), context::get_status());
to
this->tf_tensor = TFE_TensorHandleResolve(this->tfe_handle.get(), context::get_status());
There should be no data corruption/leaking issue and all lifetime is managed correctly. However, we still need more testing on this.

@ivanallen
Copy link
Author

You know, the get_data method's type is const-qualified, so modify this->tf_tensor seems unreasonable unless you redesign the method sematic to non-const or letthis->tf_tensor mutable.

@ivanallen ivanallen mentioned this pull request Nov 7, 2020
@ljn917
Copy link
Contributor

ljn917 commented Nov 7, 2020 via email

@ljn917
Copy link
Contributor

ljn917 commented Nov 26, 2020

@ivanallen Why is this closed? I think it is still worth discussing at least.

@serizba I would also like to know your opinion on this before I started to test it.

@ivanallen
Copy link
Author

@ivanallen Why is this closed? I think it is still worth discussing at least.

@serizba I would also like to know your opinion on this before I started to test it.

sorry, I just update cppflow2 in my github. I will reopen this PR.

@ivanallen ivanallen reopened this Nov 26, 2020
@serizba
Copy link
Owner

serizba commented Nov 27, 2020

@ivanallen @ljn917

Sorry for not attending this PR before.

Regarding std::span sounds like a good idea, but what about not c++20 compilers?
For the res_tensor thing, I also think using a new res_tensor was a bad idea from my side, using this->tf_tensor seems more reasonable.

The PR is interesting, so no need for closing it :)

@ljn917
Copy link
Contributor

ljn917 commented Nov 28, 2020

@serizba

Regarding std::span sounds like a good idea, but what about not c++20 compilers?

We are adding a standard compatible span.h for pre-C++20 compilers.

For the res_tensor thing, I also think using a new res_tensor was a bad idea from my side, using this->tf_tensor seems more reasonable.

We were discussing converting this->tf_tensor to a private member which would serve as a local cache of the resolved tensor. Perhaps, we also need to wrap TFE_TensorHandleResolve in a get method.

@ljn917
Copy link
Contributor

ljn917 commented Dec 1, 2020

@ivanallen Finally, we are ready to work on this PR :)

First, could you clean up this PR a bit? Include the following:

  1. Rebase onto the updated cppflow2 branch. This should address the change in include/cppflow/context.h and include/cppflow/tensor.h
  2. Remove changes to the following files as they are not relevant:
    examples/eager_op_multithread/CMakeLists.txt
    include/cppflow/model.h

After that, I may push some tests to your branch.

@ivanallen
Copy link
Author

@ivanallen Finally, we are ready to work on this PR :)

First, could you clean up this PR a bit? Include the following:

  1. Rebase onto the updated cppflow2 branch. This should address the change in include/cppflow/context.h and include/cppflow/tensor.h
  2. Remove changes to the following files as they are not relevant:
    examples/eager_op_multithread/CMakeLists.txt
    include/cppflow/model.h

After that, I may push some tests to your branch.

Great! I have rebased my code.

@ljn917
Copy link
Contributor

ljn917 commented Dec 4, 2020

@ivanallen Thanks a lot, but I think we have more changes to make...

  1. The point (2) above has not been addressed, i.e. remove changes to the following files as they are not relevant:
    examples/eager_op_multithread/CMakeLists.txt
    include/cppflow/model.h
    Those changes can still be seen at https://github.com/serizba/cppflow/pull/68/files

  2. Make a directory named third_party in the include directory, and place the orignial Tristan Brindle's span implementation into this directory, i.e. the unmodified version. In this way, we can separate the third party libs and update them easier.

  3. Create a header named compat_span.h in the include directory, and put our switch logic here. Macros, e.g. TCB_SPAN_NAMESPACE_NAME should also be defined here.

  4. Define TCB_SPAN_NAMESPACE_NAME as cppflow::tcb to avoid namespace pollution, e.g. TCB_SPAN_NAMESPACE_NAME::byte.

  5. As we discussed before, __cpp_lib_span is only defined in <version> or <span> header. We need to use __has_include feature test macro.

  6. Use cppflow::span<const T> as the return type of tensor::get_data() function.

I believe you can also mark all the previous comments as resolved because most of them will be outdated.

@serizba
Copy link
Owner

serizba commented Sep 23, 2022

The objective of this PR is to avoid copying the data when retrieving the data with get_data. But, isn't #202 a simpler way of achieving this, and thus avoiding also the inconveniences of std::vector and std::span. Besides, if you need a std::span later on, you can also use the obtained raw_data for this.

What do you think?

@serizba serizba added the awaiting-answer Awaiting reply from the author of the issue label Sep 23, 2022
@ljn917
Copy link
Contributor

ljn917 commented Sep 28, 2022

In my opinion, they are just two different styles of doing it, ie. the C style and the C++ style. The efficiency/convenience of getting the raw data has improved a lot since this PR, so I don't think its priority is as high as before. In addition, the key issue of this PR is importing a third party std::span. To be honest, I am not a big fan of it. I would rather wait until C++20 becomes more accessible.

@serizba
Copy link
Owner

serizba commented Sep 29, 2022

Yes, that's also my impression. Perhaps for the moment I will include this with a precompiler directive when the C++20 is enabled. And I won't include the span implementation.

@serizba serizba removed the awaiting-answer Awaiting reply from the author of the issue label Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants