Add unit test for testing distributed training with mnist #11189
Yancey0623 merged 8 commits into PaddlePaddle:develop from
It seems the timeout issue still happens.
@panyx0718 Yes, but it passed the test on my development host. I will keep tracing this problem.
SEED = 1
DTYPE = "float32"
paddle.dataset.mnist.fetch()
I'm not sure about this: do we need to fake every dataset used in the unit tests, such as uci_housing and word2vec?
I think TeamCity mounts the cached dataset into the Docker container, and using a real dataset is good for testing that training converges.
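For comparison, the "fake dataset" option raised above could look roughly like the sketch below. It is only an illustration: `fake_mnist_reader` is a hypothetical helper, not something in this PR or in Paddle, and it assumes the test only needs a callable that yields `(image, label)` pairs with 784 pixel values scaled to [-1, 1] and a label in 0..9.

```python
import random


def fake_mnist_reader(num_samples=64, seed=1):
    """Hypothetical stand-in for a real MNIST reader (an assumption, not Paddle API)."""

    def reader():
        # A fresh generator with the same seed on every call, so each pass
        # over the data yields exactly the same samples.
        rng = random.Random(seed)
        for _ in range(num_samples):
            # Assumed layout: 28 * 28 = 784 flattened pixels in [-1, 1].
            image = [rng.uniform(-1.0, 1.0) for _ in range(784)]
            label = rng.randrange(10)  # digit class 0..9
            yield image, label

    return reader
```

Random pixels like these keep the test offline and deterministic, but a model cannot learn anything meaningful from them, which is the argument for keeping the real, cached dataset when the goal is to check convergence.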
I thought @velconia had fixed that SIGKILL problem. Are there still other problems?
acc_val = np.array(acc_set).mean()
avg_loss_val = np.array(avg_loss_set).mean()
if float(acc_val) > 0.2:  # Smaller value to increase CI speed
We could require acc > 0.8 here to test that the model actually converges.
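A minimal sketch of what that stricter check might look like, assuming `acc_set` is a plain list of per-batch accuracies. `check_convergence` and its `threshold` parameter are hypothetical names used for illustration, not code from this PR.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("no values to average")
    return sum(values) / len(values)


def check_convergence(acc_set, threshold=0.8):
    """Return (passed, acc_val), where passed means mean accuracy > threshold.

    Hypothetical helper: the 0.8 default follows the review suggestion,
    while the diff under review uses 0.2 to keep CI fast.
    """
    acc_val = mean(acc_set)
    return acc_val > threshold, acc_val
```

Making the threshold a parameter would let the same check run with 0.2 in a quick CI pass and with 0.8 in a longer convergence test.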
Fixed #11188