Executor interface design and implementation#4537
tonyyang-svail merged 64 commits into PaddlePaddle:develop
Conversation
paddle/framework/executor.cc
Outdated
I think one of the most important things for the executor is that Run should be thread-safe (i.e., it must be OK to do concurrent Runs). This is a must for inference.
That's a good point. We must allow concurrent Runs in inference with only one copy of the parameters in memory.
I am also thinking about who should do parameter loading/saving, and where. Our topology can be serialized to ProgramDesc; what will the parameters be serialized to?
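A minimal sketch of the single-parameter-copy idea discussed above, using toy stand-in types (this `Scope` is illustrative, not PaddlePaddle's actual class): parameters live in one shared root scope that is never mutated during Run, and each concurrent Run works in its own child scope, so concurrent reads of the weights are safe.

```cpp
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: parameters live in one shared root scope; every
// concurrent Run() gets its own child scope for intermediate values, so
// the weights exist in memory exactly once.
struct Scope {
  const Scope* parent = nullptr;
  std::unordered_map<std::string, float> vars;  // toy "tensors"

  // Read-only lookup walks up to the parent; concurrent reads are safe
  // because the shared root is never mutated during Run().
  const float* Find(const std::string& name) const {
    auto it = vars.find(name);
    if (it != vars.end()) return &it->second;
    return parent ? parent->Find(name) : nullptr;
  }
};

// One inference "Run": writes only into its private local scope.
float RunOnce(const Scope& params, float input) {
  Scope local;
  local.parent = &params;
  local.vars["x"] = input;
  const float* w = local.Find("w");  // found in the shared parent
  const float* x = local.Find("x");  // found locally
  return *w * *x;
}
```

Each thread can call `RunOnce` against the same `params` scope without synchronization, because only child scopes are written.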
paddle/framework/executor.cc
Outdated
Why do we need LinearListView since we already have GraphView?
LinearListView organizes the topology as a linear list, and operators are executed sequentially. GraphView organizes the topology as a graph, so further optimizations can be applied based on the graph structure.
I think we can have LinearListView for now; GraphView can be implemented later.
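A toy sketch of the LinearListView idea described above (names and types are illustrative, not the framework's): the program is flattened into an ordered list of ops, and Run simply executes them one by one. A GraphView could later reorder or parallelize based on dependencies.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of LinearListView: an ordered list of ops,
// executed strictly sequentially by Run().
struct LinearListView {
  std::vector<std::function<void()>> ops;

  void Run() const {
    for (const auto& op : ops) op();  // sequential execution, no reordering
  }
};
```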
paddle/framework/executor.cc
Outdated
class LinearListView;
class GraphView;

// Immutable view of a ProgramDesc organized for efficient execution.

This has been implemented by framework/op_desc.h.
paddle/framework/executor.h
Outdated
virtual void Run() = 0;
};

Executor* NewLocalExecutor(const platform::Place&, const ProgramDesc&, bool);
Rename and redesign NewLocalExecutor into NewExecutor(const std::vector<Place>& places). There is no need for the bool optimize parameter, and ProgramDesc becomes a parameter to Executor::Run.
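A minimal sketch combining the interface suggestions in this thread, with toy stand-in types (`Place`, `ProgramDesc`, and `Scope` here are illustrative, not the real framework classes): the executor is bound to places at construction, there is no `bool optimize` flag, and a program plus scope are passed to each Run call, so one executor can be reused across programs.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the framework types.
struct Place { std::string device; };
struct ProgramDesc { std::vector<std::string> op_names; };
using Scope = std::unordered_map<std::string, int>;

class Executor {
 public:
  // Devices are fixed at construction; programs arrive at Run() time.
  explicit Executor(std::vector<Place> places) : places_(std::move(places)) {}

  void Run(const ProgramDesc& program, Scope* scope) const {
    for (const auto& name : program.op_names)
      ++(*scope)[name];  // toy "execution": count how often each op ran
  }

  size_t NumPlaces() const { return places_.size(); }

 private:
  std::vector<Place> places_;
};

// The proposed factory: takes only the places.
inline Executor NewExecutor(const std::vector<Place>& places) {
  return Executor(places);
}
```

Because the program is a Run-time argument, the same executor instance can amortize any expensive setup over many programs.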
paddle/framework/executor.h
Outdated
class Executor {
public:
  virtual ~Executor() {}

@helinwang has a suggestion: given that the construction of an executor could be expensive, including the creation of thread pools, it would be reasonable to reuse one executor to run multiple ProgramDescs.
Therefore we need the constructor Executor(const std::vector<Place>& places); and the Run method virtual void Run(const ProgramDesc& program, Scope* scope);.

scope->NewVar(var.name());
}

Scope& local_scope = scope->NewScope();
It seems that we should drop local_scope after invoking Run?
Looks like there is no easy way to do this. I will add a TODO on this.
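A toy sketch of what the TODO above could look like, under the assumption of a scope tree that owns its children (this `Scope` and `DropKids` are illustrative names, not the framework's API): intermediates go into a child scope that is dropped at the end of Run, leaving only the parent (parameter) scope alive.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scope with explicit child ownership.
struct Scope {
  std::unordered_map<std::string, int> vars;
  std::vector<std::unique_ptr<Scope>> kids;

  Scope& NewScope() {
    kids.push_back(std::make_unique<Scope>());
    return *kids.back();
  }
  void DropKids() { kids.clear(); }  // frees every local scope
};

void Run(Scope* scope) {
  Scope& local = scope->NewScope();
  local.vars["tmp"] = 42;  // intermediate result lives only in the child
  scope->DropKids();       // drop local_scope before returning
}
```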
}
}
}
auto op = paddle::framework::OpRegistry::CreateOp(block.ops(i));

I think it will be extremely slow if we create operators every time, because the protobuf message will be parsed and copied.
Let's keep the most straightforward implementation (i.e., avoid any possible premature optimization). Once we get it running, we can go ahead and do profiling.
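If profiling ever does show op creation dominating, one possible follow-up is a cache keyed by op index, sketched below with toy types (`Op`, `CreateOp`, and `OpCache` are hypothetical names, and the counter stands in for the expensive protobuf parse/copy). The straightforward create-every-time version should come first, as the comment above argues.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct Op { std::string type; };

int create_calls = 0;  // visible stand-in for the expensive parse/copy
std::unique_ptr<Op> CreateOp(const std::string& type) {
  ++create_calls;
  return std::make_unique<Op>(Op{type});
}

// Creates each op at most once per block index, then reuses it.
class OpCache {
 public:
  Op* Get(int index, const std::string& type) {
    auto& slot = cache_[index];
    if (!slot) slot = CreateOp(type);
    return slot.get();
  }

 private:
  std::unordered_map<int, std::unique_ptr<Op>> cache_;
};
```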
paddle/framework/executor.cc
Outdated
}
}

std::vector<bool> Executor::Preprocess(const ProgramDesc& pdesc) {

Preprocess is a bad name; maybe just Prune is OK, since there are many preprocessing stages, not only pruning.
paddle/operators/feed_op.cc
Outdated
public:
FeedOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddAttr<int>("data_type", "output data type")

Please make sure that all Input, Output and Attribute names adhere to the naming convention: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/name_convention.md
paddle/operators/feed_op.cc
Outdated
PADDLE_ENFORCE_GT(tensors.size(), static_cast<size_t>(col));
auto in_dim = tensors[col].dims();
ctx->SetOutputDim("Out", in_dim);

@QiJune should we use the dims attribute to infer the output shape?
Could you also add an enforce that (*tensors)[col].numel == product(dims)? Chances are a user specifies the wrong col.
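A small sketch of the suggested check, with hypothetical helper names (`Product`, `EnforceNumelMatches`; a real implementation would use PADDLE_ENFORCE_EQ): before accepting a feed tensor, verify that its number of elements matches the product of the declared dims, so a wrong col fails loudly instead of silently reshaping.

```cpp
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Product of all dimension extents.
inline long Product(const std::vector<long>& dims) {
  return std::accumulate(dims.begin(), dims.end(), 1L,
                         std::multiplies<long>());
}

// Fails loudly when the fed tensor's element count does not match the
// dims attribute -- e.g. when a user passed the wrong col.
inline void EnforceNumelMatches(long numel, const std::vector<long>& dims) {
  if (numel != Product(dims))
    throw std::runtime_error(
        "feed tensor numel does not match dims attribute");
}
```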
paddle/operators/fetch_op.cc
Outdated
auto input_dim = ctx->GetInputDim("Input");
PADDLE_ENFORCE_GT(tensors->size(), col);
(*tensors)[col].Resize(input_dim);

@QiJune Same for the fetch op: InferShape according to the dims attribute.
Then enforce (*tensors)[col].numel == product(dims).
@tonyyang-svail I had a discussion with @jacquesqiao: InferShape will be done at compile time first, so the InferShape of FeedOp and FetchOp cannot use run-time concepts like GlobalScope.
FeedOp has a dims attribute to set its output tensor dims. FetchOp does not need a dims attribute; the dims of the tensors in fetch_result can be set from FetchOp's input tensor.
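The compile-time rule described above can be sketched as follows, with toy stand-in types (`OpDesc` and the two helper functions are illustrative): FeedOp's output shape comes entirely from its dims attribute, while FetchOp's output shape is copied from its input, so neither needs any run-time scope.

```cpp
#include <vector>

// Hypothetical compile-time op description.
struct OpDesc {
  std::vector<long> dims_attr;  // set on FeedOp
  std::vector<long> input_dim;  // known for FetchOp at compile time
};

// FeedOp: output shape is purely the dims attribute.
std::vector<long> InferFeedShape(const OpDesc& op) {
  return op.dims_attr;
}

// FetchOp: output shape mirrors the input; no dims attribute needed.
std::vector<long> InferFetchShape(const OpDesc& op) {
  return op.input_dim;
}
```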
paddle/framework/executor.h
Outdated
* @return
* vector<bool> Same size as ops. Indicates whether an op should be run.
*/
std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id);

Prune is a high-level optimization that should be done before the executor runs; the executor should take a program with no redundant ops.
paddle/framework/executor.cc
Outdated
std::vector<bool> should_run = Prune(pdesc, block_id);
PADDLE_ENFORCE_EQ(should_run.size(), block.ops_size());
for (size_t i = 0; i < should_run.size(); ++i) {
  // if (should_run[i]) {

No commented-out code, please.
paddle/framework/executor.cc
Outdated
PADDLE_ENFORCE_EQ(should_run.size(), block.ops_size());
for (size_t i = 0; i < should_run.size(); ++i) {
  // if (should_run[i]) {
  if (true) {

It's better to introduce a named constant assigned to true; otherwise this is a magic value.
paddle/framework/executor_test.cc
Outdated
USE_OP(fill_constant);
USE_OP(sgd);

using std::string;
};

/* @Brief
 * Pruning the graph

What is the purpose of Prune? I'd thought that it needs a target parameter, which could be either a variable or an operator, and that it returns a new ProgramDesc including only the dependent operators. Why doesn't the following code take a target parameter?
explicit Executor(const std::vector<platform::Place>& places);
~Executor();

/* @Brief

I think C++ code is the documentation, and we don't really need to use Doxygen, so we can write much shorter comments. For this specific case, Executor::Run, I don't think it even needs a comment.
I am approving this PR so it doesn't stay open too long, but please consider my comments in #4537 (comment). In my mind:
- ProgramDesc shouldn't carry targets, because a program includes all the instructions supposed to be executed.
- The Prune function's signature should be int/bool Prune(const ProgramDesc* input, const std::vector<std::string>& targets, ProgramDesc* output);
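A toy sketch of the proposed signature, using illustrative stand-in types (`Op` and this `ProgramDesc` are not the real framework classes): walk the program backwards, keep every op that produces a variable some target depends on, and write the kept ops, in their original order, into the output program.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the framework types.
struct Op {
  std::string name;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};
struct ProgramDesc { std::vector<Op> ops; };

bool Prune(const ProgramDesc* input, const std::vector<std::string>& targets,
           ProgramDesc* output) {
  std::set<std::string> needed(targets.begin(), targets.end());
  std::vector<const Op*> kept;
  // Walk backwards: an op is kept iff it produces a needed variable,
  // and keeping it makes its own inputs needed in turn.
  for (auto it = input->ops.rbegin(); it != input->ops.rend(); ++it) {
    bool produces_needed = std::any_of(
        it->outputs.begin(), it->outputs.end(),
        [&](const std::string& v) { return needed.count(v) > 0; });
    if (produces_needed) {
      kept.push_back(&*it);
      needed.insert(it->inputs.begin(), it->inputs.end());
    }
  }
  // Restore original execution order.
  output->ops.clear();
  for (auto it = kept.rbegin(); it != kept.rend(); ++it)
    output->ops.push_back(**it);
  return true;
}
```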
Fixes #4523 #4557
Timeline (@tonyyang-svail):
Oct 3rd, Tuesday
- Executor.Run returns vector<Tensor>; it takes a ProgramDesc and a Scope.
Oct 4th, Wednesday
Oct 5th, Thursday
- FeedOp (by @QiJune). Add FeedOp to ProgramDesc so that it will be run first.
- FetchOp (by @QiJune).
- Prune: it takes const ProgramDesc& input (use FeedOp to find feed, and use FetchOp to find target) and returns a vector<bool> indicating whether each op should be run or not.
Oct 6th, Friday
- Prune
Oct 7th, Saturday
- executor_test.cc: Run(), Preprocessing()
Oct 8th, Sunday
- Prune
- Add backward ProtoBuf. (Several bugs found on backward: "There should be a fill_one_like_op as a starting point of backward pass" #4627. Fixed.)
- AppendBackward. Switch to ProgramDescBind.
Oct 9th, Monday
- Integrate InitOp into ProgramDesc. (More discussion needed on where to put the init op.)