From the course: RAG Fine-Tuning: Advanced Techniques for Accuracy and Model Performance

Generate and save dataset

- [Instructor] Our data is ready, but we need to change its format before we can use it to train our model. We'll transform our data, which contains questions, answers, golden documents, and distractor documents, into a format that is ready for model training. We'll create both a CSV file for easy analysis and a JSON Lines file, which is required for fine-tuning the model. Let's first look at creating a structured data frame. Here we are converting our triplets data. A triplet means we have a question, an answer, and documents; the documents in turn contain golden documents and distractor documents. We convert the triplets into a pandas data frame by first creating an empty list called data. For each triplet, we build a dictionary with the question string, the answer string, the golden document string, and lastly, the concatenated distractor documents string, and append it to the list. We then convert the list into a pandas data frame, and if you want to see the first five records in this data frame, you can use the dot…
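The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the course's actual notebook: the sample triplets, the dictionary keys, the separator used to concatenate distractors, the helper names `triplets_to_dataframe` and `save_dataset`, and the output file names are all assumptions made for the example. The exact JSON Lines schema a fine-tuning job expects depends on the framework, so the columns here may need renaming.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data: these keys and values are assumptions,
# not the course's actual schema.
triplets = [
    {
        "question": "What is RAG?",
        "answer": "Retrieval-augmented generation.",
        "golden_document": "RAG combines retrieval with generation.",
        "distractor_documents": ["Doc about CSV files.", "Doc about pandas."],
    },
    {
        "question": "What is JSON Lines?",
        "answer": "One JSON object per line.",
        "golden_document": "JSON Lines stores one record per line.",
        "distractor_documents": ["Doc about embeddings."],
    },
]


def triplets_to_dataframe(triplets, separator="\n\n"):
    """Flatten each triplet into one row with four string columns."""
    data = []  # start from an empty list, as described in the transcript
    for triplet in triplets:
        data.append(
            {
                "question": str(triplet["question"]),
                "answer": str(triplet["answer"]),
                "golden_document": str(triplet["golden_document"]),
                # Concatenate all distractors into a single string.
                "distractor_documents": separator.join(
                    triplet["distractor_documents"]
                ),
            }
        )
    return pd.DataFrame(data)


def save_dataset(df, out_dir):
    """Write the data frame as CSV (analysis) and JSON Lines (training)."""
    out_dir = Path(out_dir)
    csv_path = out_dir / "dataset.csv"
    jsonl_path = out_dir / "dataset.jsonl"
    df.to_csv(csv_path, index=False)
    # orient="records" with lines=True writes one JSON object per line.
    df.to_json(jsonl_path, orient="records", lines=True, force_ascii=False)
    return csv_path, jsonl_path


df = triplets_to_dataframe(triplets)
print(df.head())  # first five records (here, only two exist)

# A temporary directory keeps the example from cluttering the working folder.
csv_path, jsonl_path = save_dataset(df, tempfile.mkdtemp())

# Read the first JSON Lines record back to confirm the round trip.
first_line = jsonl_path.read_text(encoding="utf-8").splitlines()[0]
first_record = json.loads(first_line)
```

Joining the distractors into one string keeps every column a plain string, which is convenient for CSV inspection; if the individual distractors need to be recovered later, keeping them as a list in the JSON Lines file would be an alternative.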
