From the course: Build a No-Code ETL Pipeline with Google BigQuery
Create analytics table - BigQuery Tutorial
- [Instructor] We are finally ready to process our data in BigQuery, and here's how we're going to do it. We've already done the part where our data arrives daily in Google Cloud Storage, and then we use the transfer service to load it into this table, which we can call the staging table. The only function of this table is to receive the raw data so that it gets loaded into BigQuery. This table really doesn't need to do anything else, but of course, once our data is in BigQuery, we want to process and analyze it, and that is the goal of the next step. The way we're going to do this is by running a SQL transformation on top of the raw data in the staging table, and the result of this transformation will be stored in a new table, which we can call the analytics table. The goal of this table is to store our processed data in order to support analysis and visualization. So every day, we will go through this whole process: the data appears in Google Cloud Storage, we use the transfer service to load it into the staging table, and then we run SQL on top of the staging table and feed the results into the analytics table. So let's see how we can implement this next step of the pipeline and create our analytics table.

I'm looking at the stock data table here, which we created previously in order to ingest our raw data. In other words, this is our staging table. You will remember that every field in this table is of type STRING, meaning that every field expects text. The reason we did this is not because text is the most appropriate type for each field, but because text is the easiest type to ingest, and we wanted our raw data ingestion to be really straightforward. But now, when creating our analytics table, we want to choose the correct type for each field. For example, the fields open, high, and low represent prices, so instead of text, we should use some sort of numeric type so that we can run numeric calculations on top of them.

This page from the BigQuery documentation shows us which data types we have at our disposal, so I can come here and pick the correct data type for my data. If I go here on the right, I will find a section for numeric data types. There are several choices here, but the correct one for our prices is NUMERIC, alias DECIMAL. And if I scroll down, I can see that these are numeric values with fixed decimal precision. This means that these values can represent decimal fractions exactly, which makes them suitable for financial calculations. So this is the best type to represent our stock prices.

Now, back in BigQuery, I have written the command that will create our analytics table. You can see that this command will create the table, unless it already exists, in the same dataset, Kaggle stocks, and the table will be called stock data DWH. DWH stands for data warehouse, which is a common term for tables that contain processed data. Here, I have defined the same fields that we have in our staging table, except that now each field is given the appropriate data type. So date is given the TIMESTAMP type, which represents a specific moment in time. And if you are wondering why date is enclosed within these backtick quotes, this is because the word date is reserved in BigQuery: it already refers to something that exists in BigQuery's own syntax. So to be explicit that we mean the name of our field and not that built-in keyword, it is good practice for us to use these quotes.
The fields open, high, low, and close represent prices, so we use the NUMERIC data type to represent them with exact decimal precision. The volume field represents the number of stocks at a given moment, and this is an integer, a whole number, so we can use the INT64 data type for it. For the dividends, we also use NUMERIC. Stock splits is also a whole number, because it counts the number of times that the stock experienced a split. Brand name, ticker, industry tag, and country are all text, so they keep the STRING data type, and capital gains also gets the NUMERIC data type. So if you run this command, you will create the analytics table, which will be ready to receive your data.
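For reference, here is a minimal sketch of what such a CREATE TABLE statement could look like. The dataset and column names used below (kaggle_stocks, stock_data_dwh, and the snake_case field names) are assumptions for illustration; adjust them to match the names in your own staging table.

```sql
-- Sketch of the analytics (data warehouse) table, assuming the dataset is
-- called kaggle_stocks and the columns use snake_case names.
CREATE TABLE IF NOT EXISTS kaggle_stocks.stock_data_dwh (
  `date`        TIMESTAMP,  -- backticks make it clear we mean our field, not BigQuery's date keyword
  open          NUMERIC,    -- NUMERIC (alias DECIMAL): exact decimal values, suitable for prices
  high          NUMERIC,
  low           NUMERIC,
  close         NUMERIC,
  volume        INT64,      -- whole number of stocks
  dividends     NUMERIC,
  stock_splits  INT64,      -- whole number: how many times the stock split
  brand_name    STRING,
  ticker        STRING,
  industry_tag  STRING,
  country       STRING,
  capital_gains NUMERIC
);
```

Because of IF NOT EXISTS, a statement like this is safe to rerun: if the table is already there, BigQuery leaves it in place instead of raising an error.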