
Fluid benchmark support recordio reader#11121

Merged
typhoonzero merged 10 commits intoPaddlePaddle:developfrom
typhoonzero:fluid_benchmark_support_recordioreader
Jun 7, 2018

Conversation

@typhoonzero (Contributor)

This can also fix the issue when running with `--gpus > 1`.

```python
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
if args.use_reader_op:
    filelist = [
        os.path.join(args.data_path, f) for f in os.listdir(args.data_path)
```
Contributor

We can use `glob` to specify the files.
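A minimal sketch of that suggestion using the standard-library `glob` module; the `mnist-*` shard naming and the temporary directory are illustrative assumptions, not part of the PR:

```python
import glob
import os
import tempfile

data_path = tempfile.mkdtemp()
# Create a few dummy RecordIO-style shard files plus an unrelated file.
for name in ["mnist-00000", "mnist-00001", "notes.txt"]:
    open(os.path.join(data_path, name), "w").close()

# os.listdir picks up everything in the directory; glob matches only
# the files that fit the shard pattern.
filelist = sorted(glob.glob(os.path.join(data_path, "mnist-*")))
print([os.path.basename(f) for f in filelist])  # ['mnist-00000', 'mnist-00001']
```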

and batch_size you choose:

```bash
python -c 'from recordio_converter import *; prepare_mnist("data", 32)'
```
Contributor

It's better to set `batch_size=1`; we can set the batch size in the trainer reader.
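A sketch of the idea: store one sample per record, and let the trainer-side reader assemble batches. The `batch` function below mimics the behavior of a reader batching decorator (Paddle ships one as `paddle.batch`); the sample reader is a stand-in, not the real RecordIO reader:

```python
def sample_reader():
    # Stand-in for a converter output with batch_size=1: one sample per record.
    for i in range(10):
        yield (i, i % 2)  # (image, label) placeholder

def batch(reader, batch_size):
    # Wrap a sample-level reader into a batch-level reader.
    def batched():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) == batch_size:
                yield buf
                buf = []
        if buf:  # the last batch may be smaller
            yield buf
    return batched

train_reader = batch(sample_reader, batch_size=4)
print([len(b) for b in train_reader()])  # [4, 4, 2]
```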

Contributor Author

Done.

@luotao1 luotao1 mentioned this pull request Jun 4, 2018
@typhoonzero typhoonzero requested a review from chengduoZH June 5, 2018 08:08
```diff
 for pass_id in range(args.pass_num):
     train_losses = []
-    for batch_id, data in enumerate(train_reader()):
+    reader_generator = train_reader()
```
Contributor

`reader_generator = train_reader()` ==>

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```
```python
num_samples += len(data)
batch_id += 1
# FIXME(wuyi): last batch size maybe different
num_samples += len(args.batch_size)
```
Contributor

For `use_reader_op`, if the current pass is not the last one, the last batch of this pass is also equal to `args.batch_size`.

```diff
 iters = 0
 start_time = time.time()
-for batch_id, data in enumerate(train_reader()):
+reader_generator = train_reader()
```
Contributor

`reader_generator = train_reader()` ==>

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```
```python
        thread_num=args.gpus)
data_file = fluid.layers.double_buffer(
    fluid.layers.batch(
        data_file, batch_size=args.batch_size))
```
Contributor

For `use_reader_op`, the `batch_size` of `fluid.layers.batch` is per card. That is, if the total batch size is 256 when training VGG and the machine has 4 cards, the `batch_size` for `fluid.layers.batch` should be 64.
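The per-card arithmetic the comment describes can be sketched as follows; the function name is illustrative:

```python
def per_card_batch_size(global_batch_size, num_gpus):
    # The batch size handed to the per-device reader is the global batch
    # size split evenly across cards.
    assert global_batch_size % num_gpus == 0, "global batch must divide evenly"
    return global_batch_size // num_gpus

print(per_card_batch_size(256, 4))  # 64
```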

```python
        thread_num=args.gpus)
data_file = fluid.layers.double_buffer(
    fluid.layers.batch(
        data_file, batch_size=args.batch_size))
```
Contributor

Same as above.

```diff
 num_samples = 0
-if iters == args.iterations:
+# NOTE: if use reader ops, the input data is not splited to multiple cards
+if args.use_reader_op and iters >= args.iterations / args.gpus:
```
Contributor

I don't think `iters >= args.iterations / args.gpus` is appropriate.
The model's accuracy depends heavily on the learned parameters, and those in turn depend on the number of parameter updates. So maybe we should not do that.

Contributor Author

Well, `args.iterations` is intended to let the benchmark finish quickly, so model accuracy is not a concern. To run full model training, we can set `args.iterations` to -1 so that it runs until all training data has been fed.
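The early-exit logic described above can be sketched like this; the function and the stand-in reader are illustrative, not the benchmark script's actual code:

```python
def run_pass(reader, iterations):
    # Train until `iterations` batches are consumed, or until the reader
    # is exhausted when iterations == -1 (full training run).
    iters = 0
    for batch in reader():
        if iterations != -1 and iters == iterations:
            break
        iters += 1  # train on `batch` here
    return iters

reader = lambda: iter(range(100))  # stand-in for train_reader()
print(run_pass(reader, 10))   # 10  (benchmark mode: stop early)
print(run_pass(reader, -1))   # 100 (full run: consume all data)
```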

```diff
 # is also equal to args.batch_size.
-num_samples += len(args.batch_size)
+if args.use_reader_op:
+    num_samples += args.batch_size
```
@chengduoZH (Contributor) Jun 6, 2018

`args.batch_size` is the batch size on each GPU now, so it should be `num_samples += args.batch_size * args.gpus`.
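A sketch of the corrected accounting: with `use_reader_op`, each GPU consumes its own batch per iteration, so every iteration processes `batch_size * gpus` samples. The concrete values below are illustrative:

```python
batch_size, gpus, iterations = 64, 4, 10

num_samples = 0
for _ in range(iterations):
    # Each card reads its own batch, so count all cards per iteration.
    num_samples += batch_size * gpus

print(num_samples)  # 2560
```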

Contributor Author

Thanks very much! Done. A currently known issue: if `--use_reader_op` is set, we must also set `--no_test`; will fix this in the next PR.

@typhoonzero typhoonzero merged commit 635099c into PaddlePaddle:develop Jun 7, 2018
@typhoonzero typhoonzero deleted the fluid_benchmark_support_recordioreader branch June 7, 2018 12:27
