Add unit test for testing distributed training with mnist #11189

Merged
Yancey0623 merged 8 commits into PaddlePaddle:develop from Yancey0623:test_dist_mnist_acc
Jun 27, 2018

Conversation

@Yancey0623
Contributor

Fixed #11188

@Yancey0623 Yancey0623 requested a review from typhoonzero June 5, 2018 07:43
@panyx0718
Contributor

It seems the timeout issue still happens

@Yancey0623
Contributor Author

Yancey0623 commented Jun 6, 2018

@panyx0718 Yes, but it passed the test on my development host. I will keep tracing this problem.

panyx0718 previously approved these changes Jun 7, 2018

SEED = 1
DTYPE = "float32"
paddle.dataset.mnist.fetch()
Contributor

Use fake data?

Contributor Author

I'm not sure. Would we then need to fake every dataset used in the unit tests, such as uci_housing and word2vec?

Contributor

I think TeamCity mounts the cached dataset into the Docker container, and using a real dataset is good for testing that training converges.
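
For comparison, the fake-data alternative raised in this thread could look roughly like the sketch below. It assumes the reader convention of a zero-argument callable that yields `(image, label)` pairs with a flattened 784-value image; `fake_mnist_reader` and its parameters are hypothetical names, not part of Paddle.

```python
import random


def fake_mnist_reader(num_samples=64, seed=1):
    """Hypothetical stand-in for an MNIST reader; not a Paddle API.

    Returns a zero-argument callable that yields (image, label) pairs,
    where image is a list of 784 floats in [-1, 1] and label is an int
    in [0, 9]. The fixed seed makes every pass yield identical samples.
    """

    def reader():
        rng = random.Random(seed)  # re-seeded on each pass for determinism
        for _ in range(num_samples):
            image = [rng.uniform(-1.0, 1.0) for _ in range(784)]
            label = rng.randrange(10)
            yield image, label

    return reader
```

Because the labels here are independent of the pixels, accuracy on unseen samples would be expected to stay near chance (about 0.1 for ten classes), so a reader like this can exercise the training pipeline but cannot show that training converges.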

@typhoonzero
Contributor

I thought @velconia had fixed that SIGKILL problem. Are there still other problems?


acc_val = np.array(acc_set).mean()
avg_loss_val = np.array(avg_loss_set).mean()
if float(acc_val) > 0.2:  # Smaller value to increase CI speed
Contributor

We could require acc > 0.8 to test that the model converges.

Contributor Author

Done.
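
A minimal sketch of the check discussed in this thread, assuming per-batch accuracies are collected into a list the way `acc_set` is in the snippet above. `mean_accuracy_passes` and its `threshold` parameter are hypothetical names for illustration, and the standard library's `statistics.fmean` stands in for `np.array(...).mean()` to keep the example dependency-free.

```python
from statistics import fmean


def mean_accuracy_passes(acc_set, threshold=0.8):
    """Return True when the mean per-batch accuracy exceeds threshold.

    Hypothetical helper for illustration; not code from this PR.
    """
    if not acc_set:
        raise ValueError("acc_set must contain at least one accuracy value")
    return fmean([float(a) for a in acc_set]) > threshold
```

A run averaging 0.5 would pass with the earlier 0.2 threshold but fail with 0.8, which is what turns the higher threshold into a convergence check rather than a smoke test.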

Contributor

@typhoonzero typhoonzero left a comment

LGTM

@Yancey0623 Yancey0623 merged commit 958823f into PaddlePaddle:develop Jun 27, 2018
@Yancey0623 Yancey0623 deleted the test_dist_mnist_acc branch June 27, 2018 14:02
3 participants