From the course: Advanced SQL for Data Science: Time Series

Characteristics of time series data - SQL Tutorial

Characteristics of time series data

- [Instructor] Let's begin by looking at the structure of time series data. Time series data is a sequence of data points, and each of these data points includes a timestamp. Timestamps usually include a date and time, and sometimes they go down to the millisecond or even the microsecond. In the time series data that we're going to work with, our data will be generated at regular intervals, and each of these data points will have one or more measurements.

Now, when we talk about intervals, what we're really talking about is frequency. How often is a data point sent to us? That can vary from application to application. So if, for example, you're measuring CPU utilization, you might be measuring in terms of seconds, even milliseconds, but if you're measuring something like births and deaths in the human population, then measuring at an annual frequency would probably be enough. Different types of time series data use different intervals, or different frequencies.

Another thing we want to look at is the unit of measure. In time series data, we have measurements that are numbers, but what do those numbers represent? That's the kind of thing we need to know when we're working with time series data, because the unit of measure is typically not included with the data. It's left out because it's always the same for a given series, so it would be redundant to carry the unit of measure along with every data point.

One common unit of measure is a percentage. For example, if you're looking at CPU utilization or free memory, you might be measuring in percentages. If you're looking at, say, the number of units produced or the number of customers served in a restaurant, the unit of measure is a count. Oftentimes, we have to deal with financial data as well. In the case of financial data, like a company's profit, we'll typically use some kind of monetary unit, like dollars or euros.
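The structure described above (a timestamp, a regular interval, and a measurement whose unit is implied rather than stored) can be sketched in a small table. This is only an illustration, not material from the course: the table name `cpu_utilization`, its columns, and the sample readings are all made up, and Python's built-in `sqlite3` module stands in for whichever relational database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per data point. The unit (percent) is not
# stored per row; here it is only hinted at by the column name.
conn.execute("""
    CREATE TABLE cpu_utilization (
        event_time TEXT PRIMARY KEY,  -- timestamp, 'YYYY-MM-DD HH:MM:SS'
        cpu_pct    REAL NOT NULL      -- measurement, in percent
    )
""")

# Made-up sample readings, one per minute.
rows = [
    ("2024-01-01 00:00:00", 41.5),
    ("2024-01-01 00:01:00", 43.0),
    ("2024-01-01 00:02:00", 47.5),
    ("2024-01-01 00:03:00", 44.0),
]
conn.executemany("INSERT INTO cpu_utilization VALUES (?, ?)", rows)

# Frequency check: pair each data point with the one that follows it and
# measure the gap in seconds between the two timestamps.
gaps = conn.execute("""
    SELECT DISTINCT
           CAST(strftime('%s', nxt.event_time) AS INTEGER)
         - CAST(strftime('%s', cur.event_time) AS INTEGER) AS gap_seconds
    FROM cpu_utilization AS cur
    JOIN cpu_utilization AS nxt
      ON nxt.event_time = (
             SELECT MIN(event_time)
             FROM cpu_utilization
             WHERE event_time > cur.event_time
         )
""").fetchall()

# A single distinct gap means the data arrives at a regular interval.
interval_seconds = [g[0] for g in gaps]
print(interval_seconds)
```

With these sample rows, every gap is the same, so the query returns one distinct value. Real data often has missing or late points, in which case more than one gap would show up here.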
But again, the unit of measure that we deal with will typically vary by application.

Now, we also want to look at different metric types, or different types of measurements. A common one is a counter. Counters monotonically increase. So for example, if we were counting the number of cars that pass through a toll booth, that number will continuously increase. It will never go down. Another common metric is called a gauge, and a gauge is a numerical measure that can go up or down. So for example, the temperature of a room is a gauge. Another type of measure is a summary. Summaries calculate values over some period of time or some time window, and these could be counts, or they could be rates. Less often, we also see histograms, which are used to count items across buckets.

An important thing to note is that there may be different types of timestamps that we have to track in our time series data. There is event time, which is the time the actual event occurred or the data was generated. Ingestion time is the time at which a data point is collected into a storage system or a database. Processing time refers to the time at which the data is being processed, before it becomes available for use. We'll focus primarily on event time, but these other types of timestamps convey important information about acquiring and processing time series data.

Now, in this course, we're going to work with time series in relational databases, so we'll be querying the data with SQL. While we'll use relational databases in this course, other types of data storage systems, often using cloud-based object storage, are becoming more widely used for large-scale time series data analysis. These are particularly useful when ingesting terabytes of data on a daily basis. We'll restrict our attention to relational data stores, but just be aware that if you scale up to very large time series applications, you'll probably want to look at an alternative method of storing and analyzing that data, typically, again, based on cloud-based object storage.
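To make the counter metric and the event time versus ingestion time distinction from earlier more concrete, here is a minimal sketch. Everything in it is assumed for illustration: the `toll_booth` table, its columns, and the sample rows are hypothetical, and it uses Python's built-in `sqlite3` module. The `LAG` window function needs SQLite 3.25 or later; other relational databases offer a similar function, though timestamp arithmetic differs between them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding a counter metric with two kinds of timestamps.
conn.execute("""
    CREATE TABLE toll_booth (
        event_time     TEXT PRIMARY KEY,  -- when the reading was taken
        ingestion_time TEXT NOT NULL,     -- when the row reached the database
        cars_total     INTEGER NOT NULL   -- counter: running total of cars
    )
""")

# Made-up hourly readings. The counter never decreases.
rows = [
    ("2024-01-01 08:00:00", "2024-01-01 08:00:05", 100),
    ("2024-01-01 09:00:00", "2024-01-01 09:00:07", 160),
    ("2024-01-01 10:00:00", "2024-01-01 10:00:04", 250),
    ("2024-01-01 11:00:00", "2024-01-01 11:00:06", 250),
]
conn.executemany("INSERT INTO toll_booth VALUES (?, ?, ?)", rows)

# For each reading, ordered by event time:
#  - subtract the previous counter value to get cars seen in that interval
#    (NULL for the first row, which has no previous reading)
#  - subtract event time from ingestion time to get the ingestion delay
result = conn.execute("""
    SELECT event_time,
           cars_total - LAG(cars_total) OVER (ORDER BY event_time)
               AS cars_in_interval,
           CAST(strftime('%s', ingestion_time) AS INTEGER)
         - CAST(strftime('%s', event_time) AS INTEGER)
               AS ingest_delay_seconds
    FROM toll_booth
    ORDER BY event_time
""").fetchall()

for row in result:
    print(row)
```

Because a counter only ever goes up, the per-interval differences are never negative. Running the same subtraction on a gauge, such as room temperature, could produce negative values, which is one quick way to tell the two metric types apart.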
