[High-Level-API] Rewrite Chapter 5 Personalized Recommendation in Book to use new Flui…#526
Conversation
05.recommender_system/README.md
(Outdated)

> First, we must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc).
>
> ## Model Configuration
>
> Our program starts with importing necessary packages and initializes some global variables:

Review comment: "starts with importing necessary packages and initializing"
05.recommender_system/README.md
(Outdated)

Old:
> Movie title, a sequence of words represented by an integer word index sequence, will be feed into a `sequence_conv_pool` layer, which will apply convolution and pooling on time dimension. Because pooling is done on time dimension, the output will be a fixed-length vector regardless the length of the input sequence.

New:
> Movie title, which is a sequence of words represented by an integer word index sequence, will be fed into a `sequence_conv_pool` layer, which applies convolution and pooling on the time dimension. Because pooling is done on the time dimension, the output is a fixed-length vector regardless of the length of the input sequence.
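To make the fixed-length claim concrete, here is a minimal NumPy sketch (not the Fluid `sequence_conv_pool` API itself; the function name and weight shapes are illustrative assumptions) of convolution plus max pooling over the time dimension: titles of different lengths come out as vectors of the same size.

```python
import numpy as np

def seq_conv_pool(seq_emb, filters, filter_size=3):
    """Toy 1-D convolution over the time dimension followed by max pooling.

    seq_emb: (seq_len, emb_dim) word embeddings of one movie title.
    filters: (num_filters, filter_size * emb_dim) convolution weights.
    Returns a (num_filters,) vector regardless of seq_len.
    """
    seq_len, emb_dim = seq_emb.shape
    # Zero-pad the tail so even very short titles produce seq_len windows.
    padded = np.vstack([seq_emb, np.zeros((filter_size - 1, emb_dim))])
    windows = np.array([padded[t:t + filter_size].ravel()
                        for t in range(seq_len)])   # (seq_len, fs * emb_dim)
    conv_out = windows.dot(filters.T)               # (seq_len, num_filters)
    return conv_out.max(axis=0)                     # max-pool over time

rng = np.random.RandomState(0)
filters = rng.randn(32, 3 * 8)                      # 32 filters, emb_dim 8
short_vec = seq_conv_pool(rng.randn(2, 8), filters)   # 2-word title
long_vec = seq_conv_pool(rng.randn(15, 8), filters)   # 15-word title
# Both results are fixed-length vectors of shape (32,).
```

Whatever the title length, max pooling collapses the time axis, which is exactly why the layer's output can be concatenated with the other fixed-size movie features.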
05.recommender_system/README.md
(Outdated)

Old:
> Finally, we can use cosine similarity to calculate the similarity between user characteristics and movie features.

New:
> Finally, we can define a `inference_program` that use cosine similarity to calculate the similarity between user characteristics and movie features.

Review comment: "an `inference_program` that uses"
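For intuition, the cosine similarity itself is simple; here is a pure NumPy sketch (the helper name and the example vectors are mine, not from the chapter — the actual `inference_program` would compute this with a Fluid layer over the learned feature vectors):

```python
import numpy as np

def cos_sim(u, m):
    """Cosine similarity between a user-feature vector and a movie-feature
    vector, in [-1, 1]; higher means the two representations are closer."""
    return float(np.dot(u, m) / (np.linalg.norm(u) * np.linalg.norm(m)))

user_feature = np.array([0.5, 1.0, -0.2])
movie_feature = np.array([0.4, 0.9, -0.1])
similarity = cos_sim(user_feature, movie_feature)
```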
05.recommender_system/README.md
(Outdated)

Old:
> Before jumping into creating a training module, algorithm setting is also necessary. Here we specified Adam optimization algorithm via `paddle.optimizer`.

New:
> Next we define data feeders for test and train. The feeder reads a `BATCH_SIZE` of data each time and feed them to the training/testing process.
> `paddle.dataset.movielens.train` will yield records during each pass, after shuffling, a batch input of `buf_size` is generated for training.

Review comment: The sentence is not clear. Also, `buf_size` is larger than `BATCH_SIZE`, so the logic is reversed: the reader first fills a shuffle buffer of `buf_size` records, and only then draws batches of `BATCH_SIZE` from it.
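The reviewer's point can be seen in a plain-Python sketch of the usual shuffle-then-batch pipeline (a simplified stand-in for `paddle.reader.shuffle` and `paddle.batch`, not their actual implementations): `buf_size` governs the shuffle buffer, and `BATCH_SIZE`-sized batches are drawn from it afterwards.

```python
import random

def shuffle_reader(reader, buf_size):
    """Fill a buffer of up to buf_size records, shuffle it, then yield."""
    def new_reader():
        buf = []
        for record in reader():
            buf.append(record)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for r in buf:
                    yield r
                buf = []
        random.shuffle(buf)          # flush the (shuffled) remainder
        for r in buf:
            yield r
    return new_reader

def batch_reader(reader, batch_size):
    """Group records into batches of batch_size."""
    def new_reader():
        batch = []
        for record in reader():
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch              # final partial batch
    return new_reader

records = lambda: iter(range(10))    # stand-in for movielens.train()
train_reader = batch_reader(shuffle_reader(records, buf_size=8), batch_size=4)
batches = list(train_reader())       # 10 records -> batches of 4, 4, 2
```

So shuffling happens at the granularity of `buf_size` records, and batching at `batch_size`; the two sizes are independent knobs.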
05.recommender_system/README.md
(Outdated)

> ### Create Trainer

Old:
> `paddle.dataset.movielens.train` will yield records during each pass, after shuffling, a batch input is generated for training.

New:
> Create a trainer that takes `train_program` as input and specifies an optimizer.
05.recommender_system/README.md
(Outdated)

>     if step % 100 == 0:  # every 100 batches, update cost plot
>         cost_ploter.plot()

New:
> Use the `create_lod_tensor(data, lod, place)` API to generate a LoD Tensor, where `data` is a list of sequences of index numbers and `lod` is the level-of-detail (LoD) info associated with `data`.
> For example, data = [[10, 2, 3], [2, 3]] means that it contains two sequences of indexes, of length 3 and 2, respectively.
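A plain-Python sketch of the data layout this implies (the helper is hypothetical; note that whether `lod` holds per-sequence lengths, as here, or cumulative offsets like `[0, 3, 5]` depends on the Fluid version):

```python
def build_lod_tensor_data(data):
    """Flatten a list of sequences and record per-sequence lengths,
    mimicking the (data, lod) pair that create_lod_tensor consumes."""
    flat = [x for seq in data for x in seq]
    lod = [[len(seq) for seq in data]]   # one level of detail: seq lengths
    return flat, lod

flat, lod = build_lod_tensor_data([[10, 2, 3], [2, 3]])
# flat == [10, 2, 3, 2, 3] and lod == [[3, 2]]
```

The LoD info is what lets a single flat buffer represent two sequences of different lengths without padding.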
05.recommender_system/README.md
(Outdated)

Old:
> Finally, we can invoke `trainer.train` to start training:

New:
> ### Infer
> Now we can infer with inputs that matched with the yield records that we provide in `feed_order` during training.

Review comment: This sentence is not clear. Maybe break it into two, e.g.: "Now we can run inference. Its inputs must match the fields we listed in `feed_order` during training."
> We use the [MovieLens ml-1m](http://files.grouplens.org/datasets/movielens/ml-1m.zip) to train our model. This dataset includes one million ratings of 4,000 movies from 6,000 users. Each rating is in the range of 1~5. Thanks to GroupLens Research for collecting, processing and publishing the dataset.

> The `paddle.v2.datasets` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens` and `wmt14`, etc. There's no need for us to manually download and preprocess the `MovieLens` dataset.

Review comment: I told Nicki the same. He really has a keen sight 👀
| ``` | ||
|
|
||
| Finally, we can invoke `trainer.train` to start training: | ||
| ### Infer |
| }, | ||
| return_numpy=False) | ||
|
|
||
| print("infer results: ", np.array(results[0])) |
Review comment: Can we show a comparison between prediction and the real data? For example, user 23::M::35::0::90049 rated movie 2278::Ronin (1998)::Action|Crime|Thriller a 4.0 score. Our prediction is 3.458.

Reply: Good suggestion, I think it would be helpful.
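A minimal sketch of such a comparison line, using the numbers from the reviewer's example (the helper and its formatting are illustrative, not part of the chapter):

```python
def format_comparison(user, movie, actual, predicted):
    """Format one model prediction next to the ground-truth rating."""
    return ("user %s rated movie %s a %.1f score. Our prediction is %.3f"
            % (user, movie, actual, predicted))

line = format_comparison("23::M::35::0::90049",
                         "2278::Ronin (1998)::Action|Crime|Thriller",
                         4.0, 3.458)
print(line)
```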
…d API
I will add the plot in the next PR.