Fluid benchmark support recordio reader #11121
typhoonzero merged 10 commits into PaddlePaddle:develop from fluid_benchmark_support_recordioreader
Conversation
```python
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
if args.use_reader_op:
    filelist = [
        os.path.join(args.data_path, f) for f in os.listdir(args.data_path)
    ]
```
We can use glob to specify the files.
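A minimal sketch of the suggestion, using `glob` instead of `os.listdir`. The helper name and the `*.recordio` pattern are assumptions for illustration, not what the PR necessarily uses:

```python
import glob
import os


def list_recordio_files(data_path, pattern="*.recordio"):
    # Collect only the matching shard files, sorted for a stable order;
    # os.listdir would also return unrelated files in the directory.
    return sorted(glob.glob(os.path.join(data_path, pattern)))
```

This avoids picking up stray files (logs, converters) that happen to live in `args.data_path`.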
benchmark/fluid/README.md (outdated)
and batch_size you choose:

```bash
python -c 'from recordio_converter import *; prepare_mnist("data", 32)'
```
It's better to set batch_size=1; we can set the batch size in the trainer-side reader.
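The point of converting with batch_size=1 is that single-example records can be regrouped into any batch size later by the trainer (e.g. via `fluid.layers.batch`). A pure-Python sketch of that regrouping idea, with a hypothetical `rebatch` helper:

```python
def rebatch(records, batch_size):
    # Group single-example records into trainer-side batches of batch_size;
    # the last batch may be smaller when the data doesn't divide evenly.
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]
```

Baking batch_size=32 into the recordio file, by contrast, fixes the batch size at conversion time.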
benchmark/fluid/fluid_benchmark.py (outdated)
```python
for pass_id in range(args.pass_num):
    train_losses = []
    for batch_id, data in enumerate(train_reader()):
reader_generator = train_reader()
```
Suggest changing `reader_generator = train_reader()` to:

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```
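The rationale behind the suggestion: when the in-graph recordio reader op feeds the data, the Python-side reader generator is never consumed, so building it is wasted work. A hedged sketch of that control flow, with hypothetical names standing in for the benchmark script's objects:

```python
def make_generator(use_reader_op, train_reader):
    # Only build a Python-side batch generator when the in-graph reader
    # op is not used; with the reader op, data never flows through Python.
    if not use_reader_op:
        return train_reader()
    return None
```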
benchmark/fluid/fluid_benchmark.py (outdated)
```python
num_samples += len(data)
batch_id += 1
# FIXME(wuyi): last batch size maybe different
num_samples += len(args.batch_size)
```
For use_reader_op, if the current pass is not the last one, the last batch of this pass is also equal to args.batch_size; only the final batch of the final pass may differ.
benchmark/fluid/fluid_benchmark.py (outdated)
```python
iters = 0
start_time = time.time()
for batch_id, data in enumerate(train_reader()):
reader_generator = train_reader()
```
Suggest changing `reader_generator = train_reader()` to:

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```
```python
thread_num=args.gpus)
data_file = fluid.layers.double_buffer(
    fluid.layers.batch(
        data_file, batch_size=args.batch_size))
```
For use_reader_op, the batch_size passed to fluid.layers.batch is the per-card batch size. That is to say, if the total batch size is 256 when training VGG and the machine has 4 cards, the batch_size for fluid.layers.batch should be 64.
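The per-card arithmetic described above can be sketched as a small helper (the function name is hypothetical; it assumes the total batch size divides evenly across cards):

```python
def per_card_batch_size(total_batch_size, num_cards):
    # The in-graph reader batches per card, so the total batch is split
    # evenly; 256 total on 4 cards means 64 per card.
    assert total_batch_size % num_cards == 0, "batch must divide across cards"
    return total_batch_size // num_cards
```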
benchmark/fluid/fluid_benchmark.py (outdated)
```python
num_samples = 0
if iters == args.iterations:
# NOTE: if use reader ops, the input data is not splited to multiple cards
if args.use_reader_op and iters >= args.iterations / args.gpus:
```
I don't think `iters >= args.iterations / args.gpus` is appropriate.
The model's accuracy is highly related to the parameters it has learned, and those parameters depend on how many parameter updates have run, so maybe we should not do that.
Well, args.iterations is intended to let the benchmark finish quickly; there is no concern about model accuracy. To run full model training, we can set args.iterations to -1 so that it runs until all the training data has been fed.
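The stopping rule described in this reply can be sketched as a small predicate (the function name is hypothetical; it assumes -1 is the "run everything" sentinel as stated above):

```python
def should_stop(iters, max_iterations):
    # max_iterations == -1 means benchmark-style early exit is disabled
    # and training runs until the reader is exhausted.
    return max_iterations != -1 and iters >= max_iterations
```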
benchmark/fluid/fluid_benchmark.py (outdated)
```python
# is also equal to args.batch_size.
num_samples += len(args.batch_size)
if args.use_reader_op:
    num_samples += args.batch_size
```
args.batch_size is now the batch size on each GPU, so it should be `num_samples += args.batch_size * args.gpus`.
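A hedged sketch of the corrected sample accounting, assuming args.batch_size is per-GPU as stated above (the helper name is hypothetical):

```python
def samples_this_iter(use_reader_op, batch_size, num_gpus, data=None):
    # With the in-graph reader op, each of the num_gpus cards consumes
    # batch_size samples per iteration; otherwise count the Python batch.
    if use_reader_op:
        return batch_size * num_gpus
    return len(data)
```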
Thanks very much! Done. Currently known issue: if --use_reader_op is set, we must also set --no_test. Will fix this in the next PR.
This can also fix the issue when running with `--gpus > 1`.