[High-Level-API] Rewrite Chapter 5 Personalized Recommendation in Book to use new Flui…#526
Conversation
05.recommender_system/README.md
(Outdated)

> First, we must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc).
>
> ## Model Configuration
>
> Our program starts with importing necessary packages and initializes some global variables:

Review comment: "starts with importing necessary packages and initializing"
05.recommender_system/README.md
(Outdated)

Old:
> Movie title, a sequence of words represented by an integer word index sequence, will be feed into a `sequence_conv_pool` layer, which will apply convolution and pooling on time dimension. Because pooling is done on time dimension, the output will be a fixed-length vector regardless the length of the input sequence.

New:
> Movie title, which is a sequence of words represented by an integer word index sequence, will be fed into a `sequence_conv_pool` layer, which applies convolution and pooling on the time dimension. Because pooling is done on the time dimension, the output is a fixed-length vector regardless of the length of the input sequence.
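To make the fixed-length claim concrete, here is a minimal NumPy sketch (not the Fluid `sequence_conv_pool` API itself; the function name and weight shapes are illustrative assumptions) of convolution plus max pooling over the time dimension: titles of different lengths come out as vectors of the same size.

```python
import numpy as np

def seq_conv_pool(seq_emb, filters, filter_size=3):
    """Toy 1-D convolution over the time dimension followed by max pooling.

    seq_emb: (seq_len, emb_dim) word embeddings of one movie title.
    filters: (num_filters, filter_size * emb_dim) convolution weights.
    Returns a (num_filters,) vector regardless of seq_len.
    """
    seq_len, emb_dim = seq_emb.shape
    # Zero-pad the tail so even very short titles produce seq_len windows.
    padded = np.vstack([seq_emb, np.zeros((filter_size - 1, emb_dim))])
    windows = np.array([padded[t:t + filter_size].ravel()
                        for t in range(seq_len)])   # (seq_len, fs * emb_dim)
    conv_out = windows.dot(filters.T)               # (seq_len, num_filters)
    return conv_out.max(axis=0)                     # max-pool over time

rng = np.random.RandomState(0)
filters = rng.randn(32, 3 * 8)                      # 32 filters, emb_dim 8
short_vec = seq_conv_pool(rng.randn(2, 8), filters)   # 2-word title
long_vec = seq_conv_pool(rng.randn(15, 8), filters)   # 15-word title
# Both results are fixed-length vectors of shape (32,).
```

Whatever the title length, max pooling collapses the time axis, which is exactly why the layer's output can be concatenated with the other fixed-size movie features.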
05.recommender_system/README.md
(Outdated)

Old:
> Finally, we can use cosine similarity to calculate the similarity between user characteristics and movie features.

New:
> Finally, we can define a `inference_program` that use cosine similarity to calculate the similarity between user characteristics and movie features.

Review comment: "an `inference_program` that uses"
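For intuition, the cosine similarity itself is simple; here is a pure NumPy sketch (the helper name and the example vectors are mine, not from the chapter — the actual `inference_program` would compute this with a Fluid layer over the learned feature vectors):

```python
import numpy as np

def cos_sim(u, m):
    """Cosine similarity between a user-feature vector and a movie-feature
    vector, in [-1, 1]; higher means the two representations are closer."""
    return float(np.dot(u, m) / (np.linalg.norm(u) * np.linalg.norm(m)))

user_feature = np.array([0.5, 1.0, -0.2])
movie_feature = np.array([0.4, 0.9, -0.1])
similarity = cos_sim(user_feature, movie_feature)
```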
05.recommender_system/README.md
(Outdated)

Old:
> Before jumping into creating a training module, algorithm setting is also necessary. Here we specified Adam optimization algorithm via `paddle.optimizer`.

New:
> Next we define data feeders for test and train. The feeder reads a `BATCH_SIZE` of data each time and feed them to the training/testing process.
> `paddle.dataset.movielens.train` will yield records during each pass, after shuffling, a batch input of `buf_size` is generated for training.

Review comment: The sentence is not clear. Also, `buf_size` is larger than `BATCH_SIZE`, so the logic is reversed: the reader first fills a shuffle buffer of `buf_size` records, and only then draws batches of `BATCH_SIZE` from it.
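The reviewer's point can be seen in a plain-Python sketch of the usual shuffle-then-batch pipeline (a simplified stand-in for `paddle.reader.shuffle` and `paddle.batch`, not their actual implementations): `buf_size` governs the shuffle buffer, and `BATCH_SIZE`-sized batches are drawn from it afterwards.

```python
import random

def shuffle_reader(reader, buf_size):
    """Fill a buffer of up to buf_size records, shuffle it, then yield."""
    def new_reader():
        buf = []
        for record in reader():
            buf.append(record)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for r in buf:
                    yield r
                buf = []
        random.shuffle(buf)          # flush the (shuffled) remainder
        for r in buf:
            yield r
    return new_reader

def batch_reader(reader, batch_size):
    """Group records into batches of batch_size."""
    def new_reader():
        batch = []
        for record in reader():
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch              # final partial batch
    return new_reader

records = lambda: iter(range(10))    # stand-in for movielens.train()
train_reader = batch_reader(shuffle_reader(records, buf_size=8), batch_size=4)
batches = list(train_reader())       # 10 records -> batches of 4, 4, 2
```

So shuffling happens at the granularity of `buf_size` records, and batching at `batch_size`; the two sizes are independent knobs.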
05.recommender_system/README.md
(Outdated)

> ### Create Trainer

Old:
> `paddle.dataset.movielens.train` will yield records during each pass, after shuffling, a batch input is generated for training.

New:
> Create a trainer that takes `train_program` as input and specifies an optimizer.
05.recommender_system/README.md
(Outdated)

>     if step % 100 == 0:  # every 100 batches, update cost plot
>         cost_ploter.plot()

New:
> Use the `create_lod_tensor(data, lod, place)` API to generate a LoD Tensor, where `data` is a list of sequences of index numbers and `lod` is the level-of-detail (LoD) info associated with `data`.
> For example, data = [[10, 2, 3], [2, 3]] means that it contains two sequences of indexes, of length 3 and 2, respectively.
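A plain-Python sketch of the data layout this implies (the helper is hypothetical; note that whether `lod` holds per-sequence lengths, as here, or cumulative offsets like `[0, 3, 5]` depends on the Fluid version):

```python
def build_lod_tensor_data(data):
    """Flatten a list of sequences and record per-sequence lengths,
    mimicking the (data, lod) pair that create_lod_tensor consumes."""
    flat = [x for seq in data for x in seq]
    lod = [[len(seq) for seq in data]]   # one level of detail: seq lengths
    return flat, lod

flat, lod = build_lod_tensor_data([[10, 2, 3], [2, 3]])
# flat == [10, 2, 3, 2, 3] and lod == [[3, 2]]
```

The LoD info is what lets a single flat buffer represent two sequences of different lengths without padding.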
05.recommender_system/README.md
(Outdated)

Old:
> Finally, we can invoke `trainer.train` to start training:

New:
> ### Infer
> Now we can infer with inputs that matched with the yield records that we provide in `feed_order` during training.

Review comment: This sentence is not clear. Maybe break it into two, e.g.: "Now we can run inference. Its inputs must match the fields we listed in `feed_order` during training."
> We use the [MovieLens ml-1m](http://files.grouplens.org/datasets/movielens/ml-1m.zip) to train our model. This dataset includes one million ratings of 4,000 movies from 6,000 users. Each rating is in the range of 1~5. Thanks to GroupLens Research for collecting, processing and publishing the dataset.

> The `paddle.v2.datasets` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens` and `wmt14`, etc. There's no need for us to manually download and preprocess the `MovieLens` dataset.

Review comment: I told Nicki the same. He really has a keen sight 👀
| ``` | ||
|
|
||
| Finally, we can invoke `trainer.train` to start training: | ||
| ### Infer |
| }, | ||
| return_numpy=False) | ||
|
|
||
| print("infer results: ", np.array(results[0])) |
Review comment: Can we show a comparison between prediction and the real data? For example, user 23::M::35::0::90049 rated movie 2278::Ronin (1998)::Action|Crime|Thriller a 4.0 score. Our prediction is 3.458.

Reply: Good suggestion, I think it would be helpful.
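A minimal sketch of such a comparison line, using the numbers from the reviewer's example (the helper and its formatting are illustrative, not part of the chapter):

```python
def format_comparison(user, movie, actual, predicted):
    """Format one model prediction next to the ground-truth rating."""
    return ("user %s rated movie %s a %.1f score. Our prediction is %.3f"
            % (user, movie, actual, predicted))

line = format_comparison("23::M::35::0::90049",
                         "2278::Ronin (1998)::Action|Crime|Thriller",
                         4.0, 3.458)
print(line)
```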
…d API
I will add the plot in the next PR.