Add recommendation system implementation with new API #10535

Closed

sidgoyal78 wants to merge 8 commits into PaddlePaddle:develop from sidgoyal78:new_api_recsys

Contributor

sidgoyal78 commented May 9, 2018 •

edited


In this chapter, input data is fed differently than in the other chapters, so I had to add a data_feed_handler parameter to trainer.train().

EDIT: The code has been updated to match the recent change to the Trainer API (#10674). As before, it adds a data_feed_handler parameter to train().
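The idea behind a data_feed_handler can be sketched in plain Python. This is a minimal sketch, not the PR's code: `train`, `default_feed_handler`, `run_step`, `toy_reader`, and the list-of-tuples batch format are all hypothetical stand-ins for Fluid's trainer, its feed_order handling, and `exe.run(feed=...)`. The point is only that every raw batch passes through a per-chapter hook before it is fed.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical types: a batch is a list of sample tuples, and a feed is a
# dict from input name to that column's values.
Batch = List[tuple]
Feed = Dict[str, Any]


def default_feed_handler(feed_order: List[str]) -> Callable[[Batch], Feed]:
    """Feed each column under the name at the same position in feed_order."""
    def handler(batch: Batch) -> Feed:
        columns = zip(*batch)
        return {name: list(col) for name, col in zip(feed_order, columns)}
    return handler


def train(reader: Callable[[], Iterable[Batch]],
          num_epochs: int,
          run_step: Callable[[Feed], float],
          data_feed_handler: Callable[[Batch], Feed]) -> List[float]:
    """Hypothetical train(): every raw batch passes through data_feed_handler
    before it is fed, so a chapter with an unusual input format only needs
    to supply its own handler. run_step stands in for exe.run(feed=...)."""
    losses = []
    for _ in range(num_epochs):
        for batch in reader():
            losses.append(run_step(data_feed_handler(batch)))
    return losses


def toy_reader():
    # Two batches of (user_id, score) samples.
    yield [(1, 2.0), (2, 4.0)]
    yield [(3, 6.0)]


handler = default_feed_handler(["user_id", "score"])
losses = train(toy_reader, 2, lambda feed: sum(feed["score"]), handler)
print(losses)  # [6.0, 6.0, 6.0, 6.0]
```

Chapters that feed data the usual way would pass the default handler; only a chapter like this one would supply a custom one.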


          Add recommendation system implementation with new API

bc6edc5

sidgoyal78 requested review from JiayiFeng, helinwang, jacquesqiao, jetfuel, reyoung, varunarora and wangkuiyi

May 9, 2018 19:10

jetfuel reviewed


python/paddle/fluid/tests/book/recommender_system/no_test_recommender_system.py

+                  mov_combined_features = get_mov_combined_features()
+                  inference = layers.cos_sim(X=usr_combined_features, Y=mov_combined_features)
+                  scale_infer = layers.scale(x=inference, scale=5.0)

Contributor

jetfuel May 9, 2018

The new example follows the pattern below. Should we use the same pattern here?

def inference_program():
    # build the network and return its prediction
    ...
    return prediction

def train_program():
    prediction = inference_program()
    # add the loss on top of the prediction
    ...
    return avg_cost
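A toy version of the suggested split, written in plain Python rather than Fluid layers so it runs on its own. It assumes the recommender's prediction is the cosine similarity of user and movie features scaled by 5 (as in the diff above) and assumes a squared-error loss averaged over the batch, which is what the name avg_cost suggests; `cos_sim`, `inference_program`, and `train_program` here are illustrative stand-ins, not the PR's functions.

```python
import math
from typing import List, Sequence, Tuple


def cos_sim(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


def inference_program(usr_features: Sequence[float],
                      mov_features: Sequence[float]) -> float:
    # Mirrors cos_sim followed by scale(x=..., scale=5.0) in the diff above.
    return 5.0 * cos_sim(usr_features, mov_features)


def train_program(batch: List[Tuple[Sequence[float], Sequence[float], float]]) -> float:
    # Reuse inference_program, then add the (assumed) squared-error loss.
    square_errors = []
    for usr, mov, score in batch:
        prediction = inference_program(usr, mov)
        square_errors.append((prediction - score) ** 2)
    # Average over the batch to get avg_cost.
    return sum(square_errors) / len(square_errors)


batch = [([1.0, 0.0], [1.0, 0.0], 5.0),   # identical vectors -> prediction 5.0
         ([1.0, 0.0], [0.0, 1.0], 1.0)]   # orthogonal vectors -> prediction 0.0
print(train_program(batch))  # (0.0 + 1.0) / 2 = 0.5
```

Because train_program calls inference_program instead of duplicating it, the network definition lives in one place and can be reused unchanged for inference.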

Contributor Author

sidgoyal78 May 10, 2018

Yeah, good point. Modified the code.

python/paddle/fluid/tests/book/recommender_system/no_test_recommender_system.py Outdated

		return mov_combined_features


		def train_network():

Contributor

jetfuel May 9, 2018

Let's change it to train_program

Contributor Author

sidgoyal78 May 10, 2018

done

python/paddle/fluid/tests/book/recommender_system/no_test_recommender_system.py Outdated

		return avg_cost, scale_infer


		def inference_network():

Contributor

jetfuel May 9, 2018

Let's change this to inference_program

Contributor Author

sidgoyal78 May 10, 2018

Done

python/paddle/fluid/tests/book/recommender_system/no_test_recommender_system.py Outdated

+                      paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)
+                  def event_handler(event):
+                      if isinstance(event, fluid.EndIteration):

Contributor

jetfuel May 9, 2018

This has now been renamed to fluid.EndEpochEvent.

Contributor Author

sidgoyal78 May 10, 2018

Done

python/paddle/fluid/tests/book/recommender_system/no_test_recommender_system.py Outdated

+                  def event_handler(event):
+                      if isinstance(event, fluid.EndIteration):
+                          if (event.batch_id % 10) == 0:

Contributor

jetfuel May 9, 2018

This has now been renamed to event.epoch.

Contributor Author

sidgoyal78 May 10, 2018

Done

python/paddle/fluid/tests/book/recommender_system/no_test_recommender_system.py Outdated

+                      train_reader,
+                      EPOCH_NUM,
+                      event_handler=event_handler,
+                      data_feed_handler=partial(func_feed, feeding_map))

Contributor

jetfuel May 9, 2018

Trainer takes a reader and a feed_order. I am not sure whether it will support data_feed_handler later.

Contributor Author

sidgoyal78 May 9, 2018

I think you might have missed what I wrote in the PR description. My point was that the way data is fed here (https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_recommender_system.py#L231) differs from the other book chapters, so I think we need something like a data_feed_handler. What do you think?

Contributor

jetfuel May 9, 2018

Ah, sorry for missing the PR description. I think we need to discuss this with ReYung and JiaYi so that the train function can support it.
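Why a plain feed_order may not be enough here can be illustrated with a pure-Python sketch of a hand-built feed in the spirit of func_feed. It assumes (this is not stated in the thread) that some fields, taken here to be category_id and movie_title, are variable-length id sequences that must be flattened into one list plus cumulative offsets, the way LoD tensors describe sequence boundaries. `func_feed_sketch`, `SEQUENCE_FIELDS`, and the (values, offsets) output are hypothetical; the real func_feed also takes a place and builds tensors.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: these two fields are variable-length id sequences.
SEQUENCE_FIELDS = {"category_id", "movie_title"}


def lod_offsets(lengths: Sequence[int]) -> List[int]:
    """Turn per-sample lengths into cumulative offsets, e.g. [2, 3] -> [0, 2, 5]."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def func_feed_sketch(feeding: Dict[str, int],
                     data: List[tuple]) -> Dict[str, Tuple[list, List[int]]]:
    """Return {name: (flat_values, offsets)}; offsets stay empty for scalar fields."""
    feed = {}
    for key, idx in feeding.items():
        column = [sample[idx] for sample in data]
        if key in SEQUENCE_FIELDS:
            # Concatenate all sequences and remember where each one starts.
            flat = [value for seq in column for value in seq]
            feed[key] = (flat, lod_offsets([len(seq) for seq in column]))
        else:
            feed[key] = (column, [])
    return feed


feeding = {"user_id": 0, "category_id": 1, "score": 2}
data = [(7, [3, 5], 4.0), (9, [1], 2.0)]
feed = func_feed_sketch(feeding, data)
print(feed["category_id"])  # ([3, 5, 1], [0, 2, 3])
```

A feed_order alone only says which tuple position goes to which input; it has no place to express the per-field flattening step, which is what motivated a handler hook.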

python/paddle/fluid/tests/book/recommender_system/no_test_recommender_system.py Outdated

+                          if (event.batch_id % 10) == 0:
+                              avg_cost = trainer.test(reader=test_reader)
+                              print('BatchID {0:04}, Loss {1:2.2}'.format(event.batch_id + 1,

Contributor

jetfuel May 9, 2018

Change this to event.epoch.

Contributor Author

sidgoyal78 May 10, 2018

Done
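With the renames above, the handler dispatches on fluid.EndEpochEvent and reports event.epoch instead of event.batch_id. The sketch below uses hypothetical stand-in classes (`EndEpochEvent`, `OtherEvent`) instead of importing paddle.fluid, so only the control flow is meant to carry over; `test_fn` stands in for `trainer.test(reader=test_reader)`, and the format string is the one from the diff.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-ins; the real event classes live in paddle.fluid.
@dataclass
class EndEpochEvent:
    epoch: int


@dataclass
class OtherEvent:
    epoch: int


def make_event_handler(test_fn: Callable[[], float], log: List[str]):
    def event_handler(event) -> None:
        # React only to end-of-epoch events, and key the report on event.epoch.
        if isinstance(event, EndEpochEvent) and event.epoch % 10 == 0:
            avg_cost = test_fn()  # stands in for trainer.test(reader=test_reader)
            log.append('Epoch {0:04}, Loss {1:2.2}'.format(event.epoch + 1, avg_cost))
    return event_handler


log: List[str] = []
handler = make_event_handler(lambda: 0.25, log)
for epoch in range(21):
    handler(OtherEvent(epoch))     # ignored by the handler
    handler(EndEpochEvent(epoch))
print(log)  # ['Epoch 0001, Loss 0.25', 'Epoch 0011, Loss 0.25', 'Epoch 0021, Loss 0.25']
```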

sidgoyal78 added 4 commits

May 10, 2018 19:01


          Address review comments

fce6034


          Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into new_api_recsys

24aaf98


          Modify as per new API

eec0b18


          Modify train and test functions to enable data_feed_handler

a14423a

sidgoyal78 commented


Contributor Author

sidgoyal78 left a comment

@jacquesqiao: Can you please take a look? I modified the train() and test() methods to accept a data_feed_handler parameter.

sidgoyal78 requested a review from nickyfantasy

May 17, 2018 01:19


          Rename script to avoid same names for Cmake

cd788c6

sidgoyal78 mentioned this pull request

[Test-driven] Recommender System: start testing recommender system #10650

Closed

nickyfantasy mentioned this pull request

Add machine translation book with High Level Trainer API #10763

Closed


          Resolve merge conflict

d57afb6

sidgoyal78 commented


python/paddle/fluid/trainer.py

                               event_handler(begin_event)
                               if begin_event.fetch_metrics:
-                                  metrics = exe.run(feed=data,
+                                  metrics = exe.run(feed=data_feed_handler(data),

Contributor Author

sidgoyal78 May 18, 2018 •

edited


@jacquesqiao: I think this way of handling data with data_feed_handler is probably incorrect, right? The handler would need the data coming from the reader, not from feeder.decorate_reader, right?

Contributor Author

sidgoyal78 May 18, 2018 •

edited


I think the answer to the above question is yes.

So do you have any recommendation on how to combine data_feed_handler with decorate_reader? I guess without decorate_reader we can't do parallel fetching, but with it in place, it is tricky to incorporate data_feed_handler.
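One way to reconcile the two, sketched under the assumption that the main thing decorate_reader adds is background prefetching: apply the handler inside the prefetching wrapper, so it still runs on raw reader batches before they are queued. `prefetch_with_handler` is a hypothetical helper built on Python's `threading` and `queue` modules; it is not Fluid's decorate_reader and does not reproduce its behavior.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator

_END = object()  # sentinel marking the end of the stream


def prefetch_with_handler(reader: Callable[[], Iterable],
                          handler: Callable,
                          capacity: int = 4) -> Callable[[], Iterator]:
    """Return a reader whose batches are fetched and transformed by handler in a
    background thread, so the handler sees raw batches from the reader."""
    def decorated() -> Iterator:
        q = queue.Queue(maxsize=capacity)

        def producer() -> None:
            try:
                for batch in reader():
                    q.put(handler(batch))
            except Exception as exc:  # forward errors to the consumer
                q.put(exc)
            finally:
                q.put(_END)

        threading.Thread(target=producer, daemon=True).start()
        while True:
            item = q.get()
            if item is _END:
                return
            if isinstance(item, Exception):
                raise item
            yield item
    return decorated


fast = prefetch_with_handler(lambda: iter([[1, 2], [3], [4, 5, 6]]), sum)
print(list(fast()))  # [3, 3, 15]
```

With a single producer thread and a FIFO queue, batch order is preserved; the bounded queue keeps the producer from running arbitrarily far ahead of training.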


          Fix issues with data_feed_handler

cca4a55

jacquesqiao reviewed


.../paddle/fluid/tests/book/high-level-api/recommender_system/test_recommender_system_newapi.py

		return [avg_cost, scale_infer]


		def func_feed(feeding, place, data):

Member

jacquesqiao May 22, 2018 •

edited


Maybe we can write a new reader on top of the default train reader? Instead of passing a function handle, we can process the data inside the new reader.

Contributor

jetfuel May 22, 2018

I support this approach. Let the Trainer handle the training process to keep it compact and simple. I also think this will make the flow of a Fluid program easier for users to understand.
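The reader-based alternative leaves Trainer untouched: whatever preprocessing func_feed did moves into a reader wrapped around the default one. A minimal sketch in plain Python, following the Paddle convention that a reader is a zero-argument callable yielding samples; `raw_reader`, `preprocessed`, and the score rescaling are hypothetical examples. Paddle's own reader decorators (for example `paddle.reader.map_readers`) may already cover this pattern.

```python
from typing import Callable, Iterable, Iterator


def raw_reader() -> Iterator[tuple]:
    # Hypothetical raw samples: (user_id, category_ids, score on a 1-5 scale).
    yield (1, [3, 5], 4.0)
    yield (2, [1], 2.0)


def preprocessed(reader: Callable[[], Iterable[tuple]],
                 transform: Callable[[tuple], tuple]) -> Callable[[], Iterator[tuple]]:
    """Build a new reader on top of an existing one; the trainer only ever
    sees the already-transformed samples."""
    def new_reader() -> Iterator[tuple]:
        for sample in reader():
            yield transform(sample)
    return new_reader


# Example transform (an assumption): rescale the score to [0, 1].
normalized = preprocessed(raw_reader, lambda s: (s[0], s[1], s[2] / 5.0))
print(list(normalized()))  # [(1, [3, 5], 0.8), (2, [1], 0.4)]
```

Since the new reader has the same shape as the old one, it can be passed to the trainer, batched, or prefetched exactly like the default reader.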

Contributor

daming-lu May 23, 2018 •

edited


I am working on this, and it should be done by the end of Wednesday (May 23).

https://github.com/daming-lu/Paddle/tree/recommend_sys

Sid is on label_semantics, and Nicky on machine_translation, which might also need a new reader.

Contributor Author

sidgoyal78 commented May 24, 2018

Closing this PR (see #10894).

sidgoyal78 closed this
