From the course: Data Engineering Foundations

Understanding database schema

- [Instructor] The next step is to understand how to design and develop databases. To do that, we need to learn about database schema. To reiterate, a database schema describes the structure and relationships between tables of a database. Here, you can see a database schema on the right hand side. This is a very simple example of a restaurant ordering system. We have one thing we call customer and the other called order. You have defined properties of each table, where ID in each table is the primary key which is the unique identifier of each instance of that particular table. The field called customer ID in the order table connects orders with customers. We call this kind of column or field a foreign key, as it refers to another table. Let's see how we can create such a schema in a database using SQL. If you want to get a SQL primary, you can refer to courses in our library that can help you master SQL. The SQL statements here create the tables of the defined schema that we saw earlier. The query on the left creates the customer tables with properties like ID, first name, last name, email, password and ID is the primary key. On the right, you see the SQL statements that creates the order table, where we have customer ID as a foreign key. We can leverage these foreign keys by joining tables using the joint statement as well. Now companies often build data warehouses to handle large volumes of data from various sources for analytical purposes. And multidimensional schema is especially designed to model such data warehouse systems. These schemas are designed to address the unique needs of very large databases that serve our analytical purposes. A schema you'll often see is the star schema. Lot of analytical databases like Redshift have optimizations for these kinds of schemas. According to Wikipedia, the star schema consists of one or more fact tables. For example, the orders table here. And then this fact table is referencing a number of dimension tables. For example customer, dimension restaurant, dimensions time, dimension dish. So dimension tables, hold the information on the world itself. Like customer names and distresses. Fact tables on the other hand contains the records that represent things that happen in the world. So designing a database schema is the first step in building a data pipeline. It often requires lots of planning because an ineffective schema can lead to a database that consumes a lot of memory and resources.

Contents