Design for remote parameter updater using cclient#2330
Design for remote parameter updater using cclient#2330jacquesqiao wants to merge 16 commits intoPaddlePaddle:developfrom
Conversation
|
|
||
| Remote Parameter Updater manage parameters through remote parameter server with the client that communicate with pserver([The Client Library of Parameter Server Design Doc](pserver_client.md)) | ||
|
|
||
| In PaddlePaddle Python V2 API, trainer is implemented in python, and the trainer will hold a member of parameter updater and call it's functions directly. In this design, we should also expose the api of RemoteParameterUpdater to python with swig. |
There was a problem hiding this comment.
hold a member -> hold the instance
|
|
||
| Remote Parameter Updater manage parameters through remote parameter server with the client that communicate with pserver([The Client Library of Parameter Server Design Doc](pserver_client.md)) | ||
|
|
||
| In PaddlePaddle Python V2 API, trainer is implemented in python, and the trainer will hold a member of parameter updater and call it's functions directly. In this design, we should also expose the api of RemoteParameterUpdater to python with swig. |
There was a problem hiding this comment.
we should -> we will
感觉“we will"的语气更加确定,更合适design doc。"we should"在英语里我理解是我们应当做(但不一定立刻会做)。
|
CI failed, probably need to use pre commit hook. |
… parameterupdater
… parameterupdater
dzhwinter
left a comment
There was a problem hiding this comment.
👍 such a fast implement!
go/pserver/cclient/CMakeLists.txt
Outdated
| include(flags) | ||
|
|
||
| go_library(client STATIC) | ||
| go_library(paddle_cclient STATIC) |
There was a problem hiding this comment.
maybe named with pserver_cclient will be better? since it is only used by the new trainer module and communicate with parameter server. paddle_cclient would lead to an inference client at first glance.
There was a problem hiding this comment.
Spend a lot of time working on CMake to make it work properly.@@
This is a good suggestion, I will rename it~
|
|
||
| // create parameter server client. | ||
| parameterClient_ = | ||
| paddle_new_pserver_client((char *)pserverSpec_.c_str(), FLAGS_trainer_id); |
There was a problem hiding this comment.
there is a little different with current RemoteUpdater here. we don't have a trainer id in paddle cloud.
| LOG(INFO) << "finishBatch in, cost: " << cost; | ||
|
|
||
| // send gradient to parameter server. | ||
| paddle_send_grads(parameterClient_, *newGradients_, parameterSize()); |
There was a problem hiding this comment.
variable cost is useless here. cannot find anywhere update the newGradients_
| # in github. | ||
| add_custom_target(copyPaddle | ||
| COMMAND ln -sf ${PADDLE_DIR} ${PADDLE_IN_GOPATH}) | ||
| COMMAND rm -rf ${PADDLE_IN_GOPATH}/Paddle |
There was a problem hiding this comment.
Strange, -f in ln command force the link, so we probably don't need to remove the old link.
There was a problem hiding this comment.
这个地方我换了一种link方式,在最后面指定了 目录名为Paddle,因为目前的做法,如果paddle所在目录改名字了,link的名字也会变,这样 go get就找不到目录了。
然后修改之后,在mac上两次编译会报Operation not permitted,因为他试图在Paddle这个链接下再建一个链接,循环了。
| add_custom_target(${NAME} ALL DEPENDS ${OUTPUT_DIR}/.timestamp ${ARGN}) | ||
| add_dependencies(${NAME} goGet) | ||
| add_dependencies(${NAME} copyPaddle) | ||
| # add_dependencies(${NAME} goGet) |
There was a problem hiding this comment.
It need to depend on go get to get all dependencies, are there anything wrong with this line?
|
|
||
| // create parameter server client. | ||
| parameterClient_ = | ||
| paddle_new_pserver_client((char *)pserverSpec_.c_str(), FLAGS_trainer_id); |
There was a problem hiding this comment.
paddle_new_pserver_client in cclient does not take any argument. I am curious how does this compile?
There was a problem hiding this comment.
https://github.com/PaddlePaddle/Paddle/blob/develop/go/pserver/cclient/cclient.go#L103
func paddle_new_pserver_client(addrs *C.char, selected int) C.client {There was a problem hiding this comment.
That's new, sorry I got confused.
| paddle_get_params( | ||
| parameterClient_, names_, newParameters_, (int)parameters_.size()); | ||
| } | ||
| paddle_finish_init_params(parameterClient_); |
There was a problem hiding this comment.
only the trainer that inits params should call paddle_finish_init_params. Please put this line to line 56.
Please see the graph in the linked section for reference: https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/pserver_client.md#trainer-selection-process
| num_passes, use_sparse_updater) | ||
| else: | ||
| parameter_updater = self.__create_new_remote_updater__( | ||
| pserver_spec) |
There was a problem hiding this comment.
The cclient don't need pserver spec (the only required spec is parameter config), please see: https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/pserver_client.md#trainer-selection-process
Maybe you have an outdated version of cclient?
There was a problem hiding this comment.
I see, the current code may be old here?
https://github.com/PaddlePaddle/Paddle/blob/develop/go/pserver/cclient/cclient.go#L103
func paddle_new_pserver_client(addrs *C.char, selected int) C.client {There was a problem hiding this comment.
That's new, sorry I got confused.
|
closed and open a new pr #2413 |
refs: #2274
测试方法:
1,编译。
2,编译完成之后运行 go run ./go/cmd/pserver/pserver.go -port 3000 会在本地启动一个parameter server,监听3000端��。
3,python paddle/go/pserver/cclient/test/test_train.py