Executor interface design and implementation#4537
tonyyang-svail merged 64 commits into PaddlePaddle:develop
Conversation
paddle/framework/executor.cc
Outdated
I think one of the most important things for the executor is that Run should be thread-safe (i.e., it must be OK to do concurrent Runs). This is a must for inference.
That's a good point. We must allow concurrent Runs in inference with only one copy of the parameters in memory.
I am also thinking about who should do parameter loading/saving, and where. Our topology can be serialized to ProgramDesc; what will the parameters be serialized to?
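A minimal sketch of the single-parameter-copy idea discussed above, using toy stand-in types (this `Scope` is illustrative, not PaddlePaddle's actual class): parameters live in one shared root scope that is never mutated during Run, and each concurrent Run works in its own child scope, so concurrent reads of the weights are safe.

```cpp
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: parameters live in one shared root scope; every
// concurrent Run() gets its own child scope for intermediate values, so
// the weights exist in memory exactly once.
struct Scope {
  const Scope* parent = nullptr;
  std::unordered_map<std::string, float> vars;  // toy "tensors"

  // Read-only lookup walks up to the parent; concurrent reads are safe
  // because the shared root is never mutated during Run().
  const float* Find(const std::string& name) const {
    auto it = vars.find(name);
    if (it != vars.end()) return &it->second;
    return parent ? parent->Find(name) : nullptr;
  }
};

// One inference "Run": writes only into its private local scope.
float RunOnce(const Scope& params, float input) {
  Scope local;
  local.parent = &params;
  local.vars["x"] = input;
  const float* w = local.Find("w");  // found in the shared parent
  const float* x = local.Find("x");  // found locally
  return *w * *x;
}
```

Each thread can call `RunOnce` against the same `params` scope without synchronization, because only child scopes are written.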
paddle/framework/executor.cc
Outdated
Why do we need LinearListView since we already have GraphView?
LinearListView organizes the topology as a linear list, and operators are executed sequentially. GraphView organizes the topology as a graph, so further optimizations can be applied based on the graph structure.
I think we can have LinearListView for now; GraphView can be implemented later.
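A toy sketch of the LinearListView idea described above (names and types are illustrative, not the framework's): the program is flattened into an ordered list of ops, and Run simply executes them one by one. A GraphView could later reorder or parallelize based on dependencies.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of LinearListView: an ordered list of ops,
// executed strictly sequentially by Run().
struct LinearListView {
  std::vector<std::function<void()>> ops;

  void Run() const {
    for (const auto& op : ops) op();  // sequential execution, no reordering
  }
};
```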
paddle/framework/executor.cc
Outdated
class LinearListView;
class GraphView;

// Immutable view of a ProgramDesc organized for efficient execution.

This has been implemented by framework/op_desc.h.
paddle/framework/executor.h
Outdated
virtual void Run() = 0;
};

Executor* NewLocalExecutor(const platform::Place&, const ProgramDesc&, bool);
Rename and redesign NewLocalExecutor into NewExecutor(const std::vector<Place>& places). There is no need for the bool optimize parameter, and ProgramDesc becomes a parameter to Executor::Run.
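A minimal sketch combining the interface suggestions in this thread, with toy stand-in types (`Place`, `ProgramDesc`, and `Scope` here are illustrative, not the real framework classes): the executor is bound to places at construction, there is no `bool optimize` flag, and a program plus scope are passed to each Run call, so one executor can be reused across programs.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the framework types.
struct Place { std::string device; };
struct ProgramDesc { std::vector<std::string> op_names; };
using Scope = std::unordered_map<std::string, int>;

class Executor {
 public:
  // Devices are fixed at construction; programs arrive at Run() time.
  explicit Executor(std::vector<Place> places) : places_(std::move(places)) {}

  void Run(const ProgramDesc& program, Scope* scope) const {
    for (const auto& name : program.op_names)
      ++(*scope)[name];  // toy "execution": count how often each op ran
  }

  size_t NumPlaces() const { return places_.size(); }

 private:
  std::vector<Place> places_;
};

// The proposed factory: takes only the places.
inline Executor NewExecutor(const std::vector<Place>& places) {
  return Executor(places);
}
```

Because the program is a Run-time argument, the same executor instance can amortize any expensive setup over many programs.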
paddle/framework/executor.h
Outdated
class Executor {
public:
  virtual ~Executor() {}

@helinwang has a suggestion: given that the construction of an executor could be expensive, including the creation of thread pools, it would be reasonable to reuse one executor to run multiple ProgramDescs.
Therefore we need the constructor Executor(const std::vector<Place>& places); and the Run method virtual void Run(const ProgramDesc& program, Scope* scope);.

scope->NewVar(var.name());
}

Scope& local_scope = scope->NewScope();
It seems that we should drop local_scope after invoking Run?
Looks like there is no easy way to do this. I will add a TODO on this.
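A toy sketch of what the TODO above could look like, under the assumption of a scope tree that owns its children (this `Scope` and `DropKids` are illustrative names, not the framework's API): intermediates go into a child scope that is dropped at the end of Run, leaving only the parent (parameter) scope alive.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scope with explicit child ownership.
struct Scope {
  std::unordered_map<std::string, int> vars;
  std::vector<std::unique_ptr<Scope>> kids;

  Scope& NewScope() {
    kids.push_back(std::make_unique<Scope>());
    return *kids.back();
  }
  void DropKids() { kids.clear(); }  // frees every local scope
};

void Run(Scope* scope) {
  Scope& local = scope->NewScope();
  local.vars["tmp"] = 42;  // intermediate result lives only in the child
  scope->DropKids();       // drop local_scope before returning
}
```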
}
}
}
auto op = paddle::framework::OpRegistry::CreateOp(block.ops(i));

I think it will be extremely slow if we create operators every time, because the protobuf message will be parsed and copied.
Let's keep the most straightforward implementation (i.e., avoid any possible premature optimization). Once we get it running, we can go ahead and do profiling.
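If profiling ever does show op creation dominating, one possible follow-up is a cache keyed by op index, sketched below with toy types (`Op`, `CreateOp`, and `OpCache` are hypothetical names, and the counter stands in for the expensive protobuf parse/copy). The straightforward create-every-time version should come first, as the comment above argues.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct Op { std::string type; };

int create_calls = 0;  // visible stand-in for the expensive parse/copy
std::unique_ptr<Op> CreateOp(const std::string& type) {
  ++create_calls;
  return std::make_unique<Op>(Op{type});
}

// Creates each op at most once per block index, then reuses it.
class OpCache {
 public:
  Op* Get(int index, const std::string& type) {
    auto& slot = cache_[index];
    if (!slot) slot = CreateOp(type);
    return slot.get();
  }

 private:
  std::unordered_map<int, std::unique_ptr<Op>> cache_;
};
```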
paddle/framework/executor.cc
Outdated
}
}

std::vector<bool> Executor::Preprocess(const ProgramDesc& pdesc) {

Preprocess is a bad name; maybe just Prune is OK, since there are many preprocessing stages, not only pruning.
paddle/operators/feed_op.cc
Outdated
public:
FeedOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddAttr<int>("data_type", "output data type")

Please make sure that all Input, Output and Attribute names adhere to the naming convention: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/name_convention.md
paddle/operators/feed_op.cc
Outdated
PADDLE_ENFORCE_GT(tensors.size(), static_cast<size_t>(col));
auto in_dim = tensors[col].dims();
ctx->SetOutputDim("Out", in_dim);

@QiJune should we use the dims attribute to infer the output shape?
Could you also add an enforce that (*tensors)[col].numel == product(dims)? Chances are a user specifies the wrong col.
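A small sketch of the suggested check, with hypothetical helper names (`Product`, `EnforceNumelMatches`; a real implementation would use PADDLE_ENFORCE_EQ): before accepting a feed tensor, verify that its number of elements matches the product of the declared dims, so a wrong col fails loudly instead of silently reshaping.

```cpp
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Product of all dimension extents.
inline long Product(const std::vector<long>& dims) {
  return std::accumulate(dims.begin(), dims.end(), 1L,
                         std::multiplies<long>());
}

// Fails loudly when the fed tensor's element count does not match the
// dims attribute -- e.g. when a user passed the wrong col.
inline void EnforceNumelMatches(long numel, const std::vector<long>& dims) {
  if (numel != Product(dims))
    throw std::runtime_error(
        "feed tensor numel does not match dims attribute");
}
```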
paddle/operators/fetch_op.cc
Outdated
auto input_dim = ctx->GetInputDim("Input");
PADDLE_ENFORCE_GT(tensors->size(), col);
(*tensors)[col].Resize(input_dim);

@QiJune Same for the fetch op: InferShape according to the dims attribute.
Then enforce (*tensors)[col].numel == product(dims).
@tonyyang-svail I had a discussion with @jacquesqiao: InferShape will be done at compile time first, so the InferShape of FeedOp and FetchOp cannot use run-time concepts like GlobalScope.
FeedOp has a dims attribute to set its output tensor dims. FetchOp does not need a dims attribute; the dims of the tensors in fetch_result can be set from FetchOp's input tensor.
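The compile-time rule described above can be sketched as follows, with toy stand-in types (`OpDesc` and the two helper functions are illustrative): FeedOp's output shape comes entirely from its dims attribute, while FetchOp's output shape is copied from its input, so neither needs any run-time scope.

```cpp
#include <vector>

// Hypothetical compile-time op description.
struct OpDesc {
  std::vector<long> dims_attr;  // set on FeedOp
  std::vector<long> input_dim;  // known for FetchOp at compile time
};

// FeedOp: output shape is purely the dims attribute.
std::vector<long> InferFeedShape(const OpDesc& op) {
  return op.dims_attr;
}

// FetchOp: output shape mirrors the input; no dims attribute needed.
std::vector<long> InferFetchShape(const OpDesc& op) {
  return op.input_dim;
}
```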
paddle/framework/executor.h
Outdated
* @return
* vector<bool> Same size as ops. Indicates whether an op should be run.
*/
std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id);

Prune is a high-level optimization that should be done before the executor runs; the executor should take a program with no redundant ops.
paddle/framework/executor.cc
Outdated
std::vector<bool> should_run = Prune(pdesc, block_id);
PADDLE_ENFORCE_EQ(should_run.size(), block.ops_size());
for (size_t i = 0; i < should_run.size(); ++i) {
  // if (should_run[i]) {

No commented-out code, please.
paddle/framework/executor.cc
Outdated
PADDLE_ENFORCE_EQ(should_run.size(), block.ops_size());
for (size_t i = 0; i < should_run.size(); ++i) {
  // if (should_run[i]) {
  if (true) {

It's better to introduce a named constant assigned to true; otherwise this is a magic value.
paddle/framework/executor_test.cc
Outdated
USE_OP(fill_constant);
USE_OP(sgd);

using std::string;
};

/* @Brief
 * Pruning the graph

What is the purpose of Prune? I'd thought that it needs a target parameter, which could be either a variable or an operator, and that it returns a new ProgramDesc including only the dependent operators. Why doesn't the following code take a target parameter?
explicit Executor(const std::vector<platform::Place>& places);
~Executor();

/* @Brief

I think C++ code is the documentation, and we don't really need to use Doxygen, so we can write much shorter comments. For this specific case, Executor::Run, I don't think it even needs a comment.
I am approving this PR so it doesn't stay open too long, but please consider my comments in #4537 (comment). In my mind:
- ProgramDesc shouldn't carry targets, because a program includes all the instructions supposed to be executed.
- The Prune function's signature should be int/bool Prune(const ProgramDesc* input, const std::vector<std::string>& targets, ProgramDesc* output);
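A toy sketch of the proposed signature, using illustrative stand-in types (`Op` and this `ProgramDesc` are not the real framework classes): walk the program backwards, keep every op that produces a variable some target depends on, and write the kept ops, in their original order, into the output program.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the framework types.
struct Op {
  std::string name;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};
struct ProgramDesc { std::vector<Op> ops; };

bool Prune(const ProgramDesc* input, const std::vector<std::string>& targets,
           ProgramDesc* output) {
  std::set<std::string> needed(targets.begin(), targets.end());
  std::vector<const Op*> kept;
  // Walk backwards: an op is kept iff it produces a needed variable,
  // and keeping it makes its own inputs needed in turn.
  for (auto it = input->ops.rbegin(); it != input->ops.rend(); ++it) {
    bool produces_needed = std::any_of(
        it->outputs.begin(), it->outputs.end(),
        [&](const std::string& v) { return needed.count(v) > 0; });
    if (produces_needed) {
      kept.push_back(&*it);
      needed.insert(it->inputs.begin(), it->inputs.end());
    }
  }
  // Restore original execution order.
  output->ops.clear();
  for (auto it = kept.rbegin(); it != kept.rend(); ++it)
    output->ops.push_back(**it);
  return true;
}
```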
Fixes #4523 #4557
Timeline (@tonyyang-svail):
Oct 3rd, Tuesday
- Executor.Run returns vector<Tensor>; it takes a ProgramDesc and a Scope.
Oct 4th, Wednesday
Oct 5th, Thursday
- FeedOp (by @QiJune). Add FeedOp to ProgramDesc so that it will be run first.
- FetchOp (by @QiJune).
- Prune: it takes const ProgramDesc& input (use FeedOp to find feed, and use FetchOp to find target) and returns a vector<bool> indicating whether each op should be run or not.
Oct 6th, Friday
- Prune
Oct 7th, Saturday
- executor_test.cc: Run(), Preprocessing()
Oct 8th, Sunday
- Prune
- Add backward ProtoBuf. (Several bugs found on backward: "There should be a fill_one_like_op as a starting point of backward pass" #4627. Fixed.)
- AppendBackward. Switch to ProgramDescBind.
Oct 9th, Monday
- Integrate InitOp into ProgramDesc. (More discussion needed on where to put the init op.)