Add the parsing part for the profiling tool#7043
Conversation
|
The profiling report looks like: And this pull request is ready for review. @qingqing01 @reyoung |
paddle/platform/profiler.cc
Outdated
| void PopEvent(const std::string& name, DeviceContext* dev_ctx) { | ||
| GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id, | ||
| dev_ctx); | ||
| } |
There was a problem hiding this comment.
Need to refine RecordEvent::RecordEvent() and RecordEvent::~RecordEvent() to reuse these two functions.
paddle/platform/profiler.cc
Outdated
| std::cout << "Place: GPU" << std::endl; | ||
| #else | ||
| std::cout << "Place: CPU" << std::endl; | ||
| #endif |
There was a problem hiding this comment.
If compiling with GPU, the users also can run tasks on CPU only. So here should not use the PADDLE_WITH_CUDA macro. You can use g_state to determine.
There was a problem hiding this comment.
Done. Use another global variable g_profiler_place because g_state has been set to kDisabled when parsing events.
paddle/platform/profiler.cc
Outdated
| double event_time = rit->CudaElapsedMs(events[i][j]); | ||
| #else | ||
| double event_time = rit->CpuElapsedMs(events[i][j]); | ||
| #endif |
There was a problem hiding this comment.
The same problem with above.
paddle/platform/profiler.cc
Outdated
| pushed_events.erase((++rit).base()); | ||
| } else { | ||
| std::cout << "Warning: can not find the start marker of event " | ||
| << events[i][j].name(); |
| << std::setw(data_width) << event_item.max_time | ||
| << std::setw(data_width) << event_item.ave_time << std::endl; | ||
| } | ||
| } |
There was a problem hiding this comment.
Maybe the ParseEvents and print can be two functions.
kuke
left a comment
There was a problem hiding this comment.
Followed all the comments. Thanks!
paddle/platform/profiler.cc
Outdated
| void PopEvent(const std::string& name, DeviceContext* dev_ctx) { | ||
| GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id, | ||
| dev_ctx); | ||
| } |
paddle/platform/profiler.cc
Outdated
| std::cout << "Place: GPU" << std::endl; | ||
| #else | ||
| std::cout << "Place: CPU" << std::endl; | ||
| #endif |
There was a problem hiding this comment.
Done. Use another global variable g_profiler_place because g_state has been set to kDisabled when parsing events.
paddle/platform/profiler.cc
Outdated
| double event_time = rit->CudaElapsedMs(events[i][j]); | ||
| #else | ||
| double event_time = rit->CpuElapsedMs(events[i][j]); | ||
| #endif |
paddle/platform/profiler.cc
Outdated
| pushed_events.erase((++rit).base()); | ||
| } else { | ||
| std::cout << "Warning: can not find the start marker of event " | ||
| << events[i][j].name(); |
| << std::setw(data_width) << event_item.max_time | ||
| << std::setw(data_width) << event_item.ave_time << std::endl; | ||
| } | ||
| } |
paddle/platform/profiler.cc
Outdated
| #include "paddle/platform/profiler.h" | ||
| #include <iomanip> | ||
| #include <map> | ||
| #include "gflags/gflags.h" |
There was a problem hiding this comment.
The gflags is not used. This header can be removed.
paddle/platform/profiler.cc
Outdated
| return a.max_time > b.max_time; | ||
| default: | ||
| return a.ave_time > b.ave_time; | ||
| } |
There was a problem hiding this comment.
Maybe this lambda expression can be selected before the for loop in line 187, and the sort_domain can be saved at the same time. Then pass the sort_domain as a string argument to PrintProfilingReport function.
paddle/platform/profiler.cc
Outdated
| #include "paddle/platform/profiler.h" | ||
| #include <iomanip> | ||
| #include <map> | ||
| #include "gflags/gflags.h" |
paddle/platform/profiler.cc
Outdated
| return a.max_time > b.max_time; | ||
| default: | ||
| return a.ave_time > b.ave_time; | ||
| } |
| return ms; | ||
| #else | ||
| PADDLE_THROW("CUDA is not enabled"); | ||
| #endif |
There was a problem hiding this comment.
I can not comment on line 110 but only write the comments here.
line 110 should be before line 109.
| // The profiler state, the initial value is ProfilerState::kDisabled | ||
| static ProfilerState g_state = ProfilerState::kDisabled; | ||
| // To record which timer the profiler used, CUDA or CPU. | ||
| static std::string g_profiler_place = ""; |
There was a problem hiding this comment.
g_profiler_place and g_state should be thread_local, just like g_thread_id.
| name_(std::move(name)), | ||
| thread_id_(thread_id), | ||
| has_cuda_(false) { | ||
| : kind_(kind), name_(name), thread_id_(thread_id), has_cuda_(false) { |
There was a problem hiding this comment.
Why do not you use std::move?
| double Event::CpuElapsedUs(const Event& e) const { | ||
| return (e.cpu_ns_ - cpu_ns_) / (1000.0); | ||
| double Event::CpuElapsedMs(const Event& e) const { | ||
| return (e.cpu_ns_ - cpu_ns_) / (1000000.0); |
There was a problem hiding this comment.
Ii will be better to replace / (1000000.0) with *0.000001.
See Part I in #6701
Please review this part after the Part I merged.