From the course: Learning BigQuery
BigQuery services and compute process - BigQuery Tutorial
- [Instructor] BigQuery is a very powerful tool for big data analysis, and we will now take a closer look at the architecture that enables its high scalability and performance. First, though, we need to understand that BigQuery should not be looked at as an isolated offering but as an important piece of a larger picture, given that Google Cloud offers various services for data analysis and business intelligence in general. BigQuery is a crucial piece of this ecosystem, and significantly, it integrates very easily with some of the other services.

What exactly are these services? Well, here is a quick glance. The services have been broken up into four categories: data ingestion, data storage, the processing and analysis of data, and data exploration and visualization, all of which are important phases in business intelligence. You'll note that BigQuery features in each of these stages, including through supporting services such as the Data Transfer Service and the BI Engine.

With that said, let's zoom in on some of the services that come under the BigQuery umbrella on Google Cloud. This includes BigQuery itself, which is the focus of this course and is mainly concerned with the storage and querying of data. To transfer data into BigQuery, there is the Data Transfer Service. To improve query performance, we can make use of the BI Engine, which enables analysis operations to be performed entirely in memory. If your organization has data spanning multiple cloud platforms and you'd like to analyze it with BigQuery, Omni is the offering you should look into. And then, to build and consume machine learning models using SQL queries, we can make use of BigQuery ML. So when thinking of BigQuery, we should consider it as a suite of services.

And now, we can move on to the BigQuery architecture. There are essentially four different components here.
So let's look at them one at a time, starting with the Colossus storage layer. This is a distributed file system where all of the underlying data is stored. The next component is Jupiter, which represents the networking required for BigQuery and, among other things, connects the Colossus storage layer to Dremel, which handles the computing required for query execution. In short, many pieces need to come together to enable efficient as well as cost-effective analysis, and all of this is orchestrated by Borg, which is GCP's cluster manager.

As a user, though, you will not see any of this. All of it is managed by the BigQuery service, because BigQuery adopts a serverless architecture. This is implemented by decoupling storage from compute. Just to remind ourselves, Colossus is a distributed file system where all of the data resides, Borg is the cluster management system that handles all of the compute resources, and Jupiter serves as the networking infrastructure connecting all of the pieces. This decoupling means that each component is capable of scaling independently. So if CPUs are our bottleneck, we can add more compute nodes without having to increase the storage.

Compute operations are optimized by means of Dremel, which serves as the query engine for BigQuery. Among other things, it breaks a query down into smaller pieces that can be executed concurrently. It then also needs to assemble the results from each of the individual pieces in order to present unified query results, and this is accomplished by effectively turning a query run into an execution tree. Again, all of this is orchestrated by Borg, the cluster manager. And with that, we have a fairly clear idea of the different pieces that come together to make BigQuery as scalable and as efficient as it is.
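
To make the execution tree idea a little more concrete, here is a minimal Python sketch. It is a toy model only, not how Dremel is actually implemented: the function names (`leaf_scan`, `merge`, `run_query`), the `fan_in` parameter, and the in-memory "shards" are all assumptions made for illustration. Leaf nodes each scan one shard of data concurrently and return a partial aggregate, and the levels above them merge those partials until a single root result remains.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# A partial aggregate: (sum of values, number of rows). Hypothetical type for this sketch.
Partial = Tuple[int, int]


def leaf_scan(shard: List[int]) -> Partial:
    """Leaf node: scan one shard and return a partial aggregate."""
    return (sum(shard), len(shard))


def merge(partials: List[Partial]) -> Partial:
    """Intermediate or root node: combine partial aggregates from child nodes."""
    return (sum(p[0] for p in partials), sum(p[1] for p in partials))


def run_query(shards: List[List[int]], fan_in: int = 2) -> float:
    """Compute an average over all shards via a tree of partial aggregates."""
    if not shards:
        raise ValueError("at least one shard is required")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    # Leaves run concurrently, one task per shard.
    with ThreadPoolExecutor() as pool:
        level = list(pool.map(leaf_scan, shards))

    # Merge upward, fan_in children at a time, until only the root remains.
    while len(level) > 1:
        level = [merge(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]

    total, count = level[0]
    if count == 0:
        raise ValueError("no rows to aggregate")
    return total / count


shards = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
print(run_query(shards))  # 5.5
```

The point of the sketch is the shape of the computation: because each leaf only returns a small partial result, the expensive scanning happens in parallel, and assembling the final answer is cheap by comparison.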