
Add pinned memory #9216

Merged

chengduoZH merged 5 commits into PaddlePaddle:develop from chengduoZH:feature/add_pinned_memory on Mar 26, 2018

Conversation

@chengduoZH
Contributor

WIP

The current CUDA Runtime documentation states:
Asynchronous (Memcpy):
- For transfers from device memory to pageable host memory, the function will return only once the copy has completed.

@chengduoZH chengduoZH force-pushed the feature/add_pinned_memory branch 3 times, most recently from c4d0515 to e0156c1 Compare March 20, 2018 05:19
@chengduoZH chengduoZH force-pushed the feature/add_pinned_memory branch 4 times, most recently from 999ea6c to ef027b3 Compare March 20, 2018 11:08
Contributor

Curious why we need to add use_pinned_ to the tensor. If we just want to use pinned memory for copying, wouldn't putting it in the CopyTensor implementation be enough?

Contributor Author

just put this in the CopyTensor implementation

Do you mean, for CPU->GPU, copying the data from the CPU to pinned memory first and then from pinned memory to the GPU?

Contributor

No, of course not. I was thinking about the GPU->CPU copy. For the CPU->GPU case, it seems this is the only way for now.

Contributor Author

Why do we use pinned memory only as a staging area instead of using it directly? CPU computations can access data in pinned memory directly.

Contributor

In that case, can we make use_pinned_ a global GFLAG, so that all allocations on the host would use pinned memory? Then we could measure the overall performance boost.
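Such a global switch could be sketched with gflags roughly as follows. This is an illustrative sketch only: the flag name use_pinned_memory and the AllocHost helper are hypothetical, not PaddlePaddle's actual code, and it needs the CUDA runtime and gflags to build (error handling abbreviated):

```cpp
#include <cstdlib>
#include <gflags/gflags.h>
#include <cuda_runtime.h>

// Hypothetical global flag: route every host allocation through
// page-locked (pinned) memory.
DEFINE_bool(use_pinned_memory, false,
            "If true, allocate host memory as page-locked (pinned).");

void* AllocHost(size_t size) {
  if (FLAGS_use_pinned_memory) {
    void* ptr = nullptr;
    // Page-locked allocation; eligible for truly asynchronous copies.
    if (cudaMallocHost(&ptr, size) != cudaSuccess) return nullptr;
    return ptr;
  }
  return std::malloc(size);  // ordinary pageable host memory
}
```

A flag like this would make it easy to A/B-test the whole workload with and without pinned allocations, without threading use_pinned_ through every call site.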

Contributor Author

Because memory copies between the GPU and CPU are asynchronous, in the GPU->CPU case we must ensure the copy has completed before reading data from pinned memory, so a synchronization step is required. Using pinned memory everywhere is therefore a little complex.
I plan to put only the input data into pinned memory and then test the overall performance.
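The synchronization requirement can be illustrated with the CUDA runtime API (a sketch, not the PR's implementation; error handling omitted, and it requires a CUDA device):

```cpp
#include <cuda_runtime.h>

// GPU -> CPU copy into pinned memory: cudaMemcpyAsync returns
// immediately, so the host must synchronize before it is safe to
// read the destination buffer.
void CopyDeviceToPinnedHost(float* pinned_dst, const float* dev_src,
                            size_t count, cudaStream_t stream) {
  cudaMemcpyAsync(pinned_dst, dev_src, count * sizeof(float),
                  cudaMemcpyDeviceToHost, stream);
  // Without this sync, reading pinned_dst races with the in-flight copy.
  cudaStreamSynchronize(stream);
  // pinned_dst is now safe to read on the CPU.
}
```

With pageable host memory the sync would be implicit (the copy call blocks), which is exactly why the pinned path needs this extra care.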

Contributor

Is this still WIP? Can I review and merge it?

@chengduoZH chengduoZH force-pushed the feature/add_pinned_memory branch from ef027b3 to eaa90d3 Compare March 20, 2018 12:04
void* Alloc<platform::CUDAPlace>(platform::CUDAPlace place, size_t size,
                                 bool use_pinned) {
  void* ptr;
  if (use_pinned) {
Contributor

Alloc on CUDAPlace with use_pinned=false returns a pointer on the device, but calling it with use_pinned=true returns a pointer on the host; this is a little confusing.

Contributor Author

Yes, maybe we should add a new place type (CUDAPinnedPlace).

#else
    holder_.reset(new PlaceholderImpl<platform::CUDAPlace>(
-       boost::get<platform::CUDAPlace>(place), size, type));
+       boost::get<platform::CUDAPlace>(place), size, type, use_pinned));
Contributor

The holder_ pointer will be on the host here, which may cause an error.

Contributor Author

Contributor

Thank you for this information!

@chengduoZH chengduoZH changed the title from [WIP] Add pinned memory to Add pinned memory Mar 26, 2018
@chengduoZH chengduoZH force-pushed the feature/add_pinned_memory branch from ba1178e to 9e99446 Compare March 26, 2018 11:13
Contributor

@typhoonzero left a comment

LGTM++. We can add the new place type in the next PR.

@chengduoZH chengduoZH merged commit 2e4a398 into PaddlePaddle:develop Mar 26, 2018
@chengduoZH chengduoZH mentioned this pull request Mar 28, 2018
