From the course: Build a No-Code ETL Pipeline with Google BigQuery
Process data, part two - BigQuery Tutorial
- [Instructor] To summarize what we've done so far, we have created the analytics table with the correct schema to receive our data, and we have written and tested the query that will transform our data. The final step is to take this transformed data and move it into the analytics table, but there is a catch. We have seen that because of the way our data is versioned, every day we receive all of the rows from the previous day, plus a few fresh rows. So for example, if on the 2nd of October we get 1,000 rows, then the next day, the 3rd of October, we will get 1,150 rows, where 1,000 rows are ones we have already seen and 150 rows are new. Now, if we were to just insert all the data from the staging table into the analytics table every day, we would insert a lot of duplicates. For example, at the end of the 3rd of October, we would have 2,150 rows, but 1,000 of these would be duplicates. So this is not what we want to do. What we want to do is that each day, we just insert the new rows into the analytics table…
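As a rough sketch of the logic described here, an "insert only the new rows" step can be expressed in BigQuery as a MERGE from the staging table into the analytics table. The project, dataset, table, and column names below are hypothetical placeholders, not the ones used in the course; adapt them to your own schema.

    -- Insert only rows from staging that the analytics table
    -- has not seen yet (matched on a hypothetical unique key, row_id).
    MERGE `my_project.my_dataset.analytics` AS target
    USING `my_project.my_dataset.staging` AS source
    ON target.row_id = source.row_id
    WHEN NOT MATCHED THEN
      INSERT (row_id, order_date, amount)
      VALUES (source.row_id, source.order_date, source.amount);

Rows whose key already exists in the analytics table are left untouched, so rerunning the statement after each daily delivery adds only the fresh rows and never creates duplicates.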