Skip to content

Add recommendation system implementation with new API#10535

Closed
sidgoyal78 wants to merge 8 commits intoPaddlePaddle:developfrom
sidgoyal78:new_api_recsys
Closed

Add recommendation system implementation with new API#10535
sidgoyal78 wants to merge 8 commits intoPaddlePaddle:developfrom
sidgoyal78:new_api_recsys

Conversation

@sidgoyal78
Copy link
Contributor

@sidgoyal78 sidgoyal78 commented May 9, 2018

Fix #10423

In this chapter, the way input data is fed is a bit different than other chapters, hence I had to create a data_feed_handler attribute in trainer.train()

EDIT: The code is modified based on the recent update to trainer API (#10674). As above, we add a data_feed_handler attribute to train().

mov_combined_features = get_mov_combined_features()

inference = layers.cos_sim(X=usr_combined_features, Y=mov_combined_features)
scale_infer = layers.scale(x=inference, scale=5.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new example, there is a new pattern like this. Should we follow that pattern here?

def inference_program:
   ...
   return prediction

def train_program:
    prediction = inference_program
    ...
    return avg_cost
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point. Modified the code.

return mov_combined_features


def train_network():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change it to train_program

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return avg_cost, scale_infer


def inference_network():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this to inference_program

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)

def event_handler(event):
if isinstance(event, fluid.EndIteration):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now rename to fluid.EndEpochEvent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


def event_handler(event):
if isinstance(event, fluid.EndIteration):
if (event.batch_id % 10) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is now re-name to event.epoch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

train_reader,
EPOCH_NUM,
event_handler=event_handler,
data_feed_handler=partial(func_feed, feeding_map))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trainer takes in reader and feed_order. I am not sure if it will support data_feed_handler later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might have missed what i have written in the description of the PR. I basically wanted to point out that the way data is fed (https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_recommender_system.py#L231) is a bit different here as compared to other book chapters, hence I think there is a need to invoke the concept of a data_feed_handler. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry about missing the PR description. I think we need to discuss with ReYung and JiaYi. So the train function can support it.

if (event.batch_id % 10) == 0:
avg_cost = trainer.test(reader=test_reader)

print('BatchID {0:04}, Loss {1:2.2}'.format(event.batch_id + 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to event.epoch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@sidgoyal78 sidgoyal78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacquesqiao : Can you please take a look? I modified the train() and test() methods to allow for a data_feed_handler attribute.

@sidgoyal78 sidgoyal78 requested a review from nickyfantasy May 17, 2018 01:19
event_handler(begin_event)
if begin_event.fetch_metrics:
metrics = exe.run(feed=data,
metrics = exe.run(feed=data_feed_handler(data)
Copy link
Contributor Author

@sidgoyal78 sidgoyal78 May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacquesqiao : I think probably this way of handling data with data_feed_handler is incorrect right? We would need data from the reader, rather than the feeder.decorate_reader right?

Copy link
Contributor Author

@sidgoyal78 sidgoyal78 May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the answer to the above question is yes.

So do you have any recommendation on how we could handle this data_feed_handler with the decorate_reader? I guess without the decorate_reader we can't to parallel fetching, but now with its presence, it is tricky to incorporate the data_feed_handler.

return [avg_cost, scale_infer]


def func_feed(feeding, place, data):
Copy link
Member

@jacquesqiao jacquesqiao May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can write a new reader above the default train reader? But not use a function handle, we can process the data in the new reader

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this approach. Let the Trainer handle the train process to keep it compact and simple. I also feel that way it will be easier for the user to understand the flow of the fluid programming.

Copy link
Contributor

@daming-lu daming-lu May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on this and it should be done by the end of Wed (May 23)

https://github.com/daming-lu/Paddle/tree/recommend_sys

Sid is on label_semantics, and Nicky on machine_translation, which might also need a new reader.

@sidgoyal78
Copy link
Contributor Author

Closing this PR (see #10894).

@sidgoyal78 sidgoyal78 closed this May 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants