Add recommendation system implementation with new API#10535
Add recommendation system implementation with new API#10535sidgoyal78 wants to merge 8 commits intoPaddlePaddle:developfrom
Conversation
| mov_combined_features = get_mov_combined_features() | ||
|
|
||
| inference = layers.cos_sim(X=usr_combined_features, Y=mov_combined_features) | ||
| scale_infer = layers.scale(x=inference, scale=5.0) |
There was a problem hiding this comment.
In the new example, there is a new pattern like this. Should we follow that pattern here?
def inference_program:
...
return prediction
def train_program:
prediction = inference_program
...
return avg_cost
There was a problem hiding this comment.
Yeah, good point. Modified the code.
| return mov_combined_features | ||
|
|
||
|
|
||
| def train_network(): |
There was a problem hiding this comment.
Let's change it to train_program
| return avg_cost, scale_infer | ||
|
|
||
|
|
||
| def inference_network(): |
There was a problem hiding this comment.
Let's change this to inference_program
| paddle.dataset.movielens.test(), batch_size=BATCH_SIZE) | ||
|
|
||
| def event_handler(event): | ||
| if isinstance(event, fluid.EndIteration): |
There was a problem hiding this comment.
This is now rename to fluid.EndEpochEvent
|
|
||
| def event_handler(event): | ||
| if isinstance(event, fluid.EndIteration): | ||
| if (event.batch_id % 10) == 0: |
There was a problem hiding this comment.
this is now re-name to event.epoch
| train_reader, | ||
| EPOCH_NUM, | ||
| event_handler=event_handler, | ||
| data_feed_handler=partial(func_feed, feeding_map)) |
There was a problem hiding this comment.
Trainer takes in reader and feed_order. I am not sure if it will support data_feed_handler later.
There was a problem hiding this comment.
I think you might have missed what i have written in the description of the PR. I basically wanted to point out that the way data is fed (https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_recommender_system.py#L231) is a bit different here as compared to other book chapters, hence I think there is a need to invoke the concept of a data_feed_handler. What do you think?
There was a problem hiding this comment.
Ah, sorry about missing the PR description. I think we need to discuss with ReYung and JiaYi. So the train function can support it.
| if (event.batch_id % 10) == 0: | ||
| avg_cost = trainer.test(reader=test_reader) | ||
|
|
||
| print('BatchID {0:04}, Loss {1:2.2}'.format(event.batch_id + 1, |
sidgoyal78
left a comment
There was a problem hiding this comment.
@jacquesqiao : Can you please take a look? I modified the train() and test() methods to allow for a data_feed_handler attribute.
| event_handler(begin_event) | ||
| if begin_event.fetch_metrics: | ||
| metrics = exe.run(feed=data, | ||
| metrics = exe.run(feed=data_feed_handler(data) |
There was a problem hiding this comment.
@jacquesqiao : I think probably this way of handling data with data_feed_handler is incorrect right? We would need data from the reader, rather than the feeder.decorate_reader right?
There was a problem hiding this comment.
I think the answer to the above question is yes.
So do you have any recommendation on how we could handle this data_feed_handler with the decorate_reader? I guess without the decorate_reader we can't to parallel fetching, but now with its presence, it is tricky to incorporate the data_feed_handler.
| return [avg_cost, scale_infer] | ||
|
|
||
|
|
||
| def func_feed(feeding, place, data): |
There was a problem hiding this comment.
Maybe we can write a new reader above the default train reader? But not use a function handle, we can process the data in the new reader
There was a problem hiding this comment.
I support this approach. Let the Trainer handle the train process to keep it compact and simple. I also feel that way it will be easier for the user to understand the flow of the fluid programming.
There was a problem hiding this comment.
I am working on this and it should be done by the end of Wed (May 23)
https://github.com/daming-lu/Paddle/tree/recommend_sys
Sid is on label_semantics, and Nicky on machine_translation, which might also need a new reader.
|
Closing this PR (see #10894). |
Fix #10423
In this chapter, the way input data is fed is a bit different than other chapters, hence I had to create a
data_feed_handlerattribute intrainer.train()EDIT: The code is modified based on the recent update to trainer API (#10674). As above, we add a
data_feed_handlerattribute totrain().