Analyze data using pipe syntax
This tutorial shows you how to write queries using pipe syntax to analyze data.
Pipe syntax is an extension to GoogleSQL that supports a linear query
structure designed to make your queries easier to read, write, and maintain.
Pipe syntax consists of the pipe symbol |>
, a
pipe operator
name, and any arguments. For more information, see the following resources:
- For an introduction to pipe syntax, see Work with pipe query syntax.
- For full syntax details, see the Pipe query syntax reference documentation.
In this tutorial, you build a complex query in pipe syntax using the publicly
available
bigquery-public-data.austin_bikeshare.bikeshare_trips
table,
which contains data about bicycle trips.
Objectives
- View table data by starting a query with a
FROM
clause. - Add columns by using the
EXTEND
pipe operator. - Aggregate data by day and week by using the
AGGREGATE
pipe operator. - Aggregate data over a sliding window by using the
CROSS JOIN
pipe operator. - Filter data by using the
WHERE
pipe operator. - Compare the linear query structure of pipe syntax to the nested query structure of standard syntax when performing multi-level aggregations.
Before you begin
To get started using a BigQuery public dataset, you must create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you must also enable billing.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
- BigQuery is automatically enabled in new projects.
To activate BigQuery in a preexisting project,
Enable the BigQuery API.
For more information about the different ways to run queries, see Run a query.
View table data
To retrieve all the data from the bikeshare_trips
table,
run the following query:
Pipe syntax
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`;
Standard syntax
SELECT *
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`;
In pipe syntax, the query can start with a
FROM
clause
without a SELECT
clause to return table results.
The result is similar to the following:
+----------+-----------------+---------+-----------+-------------------------+-----+ | trip_id | subscriber_type | bike_id | bike_type | start_time | ... | +----------+-----------------+---------+-----------+-------------------------+-----+ | 28875008 | Pay-as-you-ride | 18181 | electric | 2023-02-12 12:46:32 UTC | ... | | 28735401 | Explorer | 214 | classic | 2023-01-13 12:01:45 UTC | ... | | 29381980 | Local365 | 21803 | electric | 2023-04-20 08:43:46 UTC | ... | | ... | ... | ... | ... | ... | ... | +----------+-----------------+---------+-----------+-------------------------+-----+
Add columns
In the bikeshare_trips
table, the start_time
column is a timestamp, but you
might want to add a column that only shows the date of the trip. To add a column
in pipe syntax, use the
EXTEND
pipe operator:
Pipe syntax
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
|> EXTEND CAST(start_time AS DATE) AS date;
Standard syntax
SELECT *, CAST(start_time AS DATE) AS date
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`;
The result is similar to the following:
+----------+-----------------+---------+-----------+-------------------------+------------+-----+ | trip_id | subscriber_type | bike_id | bike_type | start_time | date | ... | +----------+-----------------+---------+-----------+-------------------------+------------+-----+ | 28875008 | Pay-as-you-ride | 18181 | electric | 2023-02-12 12:46:32 UTC | 2023-02-12 | ... | | 28735401 | Explorer | 214 | classic | 2023-01-13 12:01:45 UTC | 2023-01-13 | ... | | 29381980 | Local365 | 21803 | electric | 2023-04-20 08:43:46 UTC | 2023-04-20 | ... | | ... | ... | ... | ... | ... | ... | ... | +----------+-----------------+---------+-----------+-------------------------+------------+-----+
Aggregate daily data
You can group by date to find the total number of trips taken and the bikes used per day.
- Use the
AGGREGATE
pipe operator with theCOUNT
function to find the total number of trips taken and bikes used. Use the
GROUP BY
clause to group the results by date.
Pipe syntax
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
|> EXTEND CAST(start_time AS DATE) AS date
|> AGGREGATE
COUNT(*) AS trips,
COUNT(DISTINCT bike_id) AS distinct_bikes
GROUP BY date;
Standard syntax
SELECT
CAST(start_time AS DATE) AS date,
COUNT(*) AS trips,
COUNT(DISTINCT bike_id) AS distinct_bikes
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
GROUP BY date;
The result is similar to the following:
+------------+-------+----------------+
| date | trips | distinct_bikes |
+------------+-------+----------------+
| 2023-04-20 | 841 | 197 |
| 2023-01-27 | 763 | 148 |
| 2023-06-12 | 562 | 202 |
| ... | ... | ... |
+------------+-------+----------------+
Order results
To sort the results in descending order by the date
column, add the
DESC
suffix to the GROUP BY
clause:
Pipe syntax
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
|> EXTEND CAST(start_time AS DATE) AS date
|> AGGREGATE
COUNT(*) AS trips,
COUNT(DISTINCT bike_id) AS distinct_bikes
GROUP BY date DESC;
Standard syntax
SELECT
CAST(start_time AS DATE) AS date,
COUNT(*) AS trips,
COUNT(DISTINCT bike_id) AS distinct_bikes
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
GROUP BY date
ORDER BY date DESC;
The result is similar to the following:
+------------+-------+----------------+
| date | trips | distinct_bikes |
+------------+-------+----------------+
| 2024-06-30 | 331 | 90 |
| 2024-06-29 | 395 | 123 |
| 2024-06-28 | 437 | 137 |
| ... | ... | ... |
+------------+-------+----------------+
In pipe syntax, you can add the sorting suffix directly to the GROUP BY
clause
without using the
ORDER BY
pipe operator.
Adding the suffix to the GROUP BY
clause is one of several optional
shorthand ordering features with AGGREGATE
that pipe syntax supports. In standard syntax, this isn't possible and you
must use the ORDER BY
clause for sorting.
Aggregate weekly data
Now that you have data on the number of bikes used each day, you can build on your query to find the number of distinct bikes used over each seven-day window.
To update the rows in your table to display weeks instead of days, use the
DATE_TRUNC
function
in the GROUP BY
clause and set the granularity to WEEK
:
Pipe syntax
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
|> EXTEND CAST(start_time AS DATE) AS date
|> AGGREGATE
COUNT(*) AS trips,
COUNT(DISTINCT bike_id) AS distinct_bikes,
GROUP BY DATE_TRUNC(date, WEEK) AS date DESC;
Standard syntax
SELECT
DATE_TRUNC(CAST(start_time AS DATE), WEEK) AS date,
COUNT(*) AS trips,
COUNT(DISTINCT bike_id) AS distinct_bikes,
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
GROUP BY date
ORDER BY date DESC;
The result is similar to the following:
+------------+-------+----------------+
| date | trips | distinct_bikes |
+------------+-------+----------------+
| 2024-06-30 | 331 | 90 |
| 2024-06-23 | 3206 | 213 |
| 2024-06-16 | 3441 | 212 |
| ... | ... | ... |
+------------+-------+----------------+
Aggregate over a sliding window
The results in the preceding section show trips in a fixed window between
start and end dates, such as 2024-06-23
through
2024-06-29
. Instead, you might want to see
trips in a sliding window, over a seven-day period that
moves forward in time with each new day. In other words, for any given date you
might want to know about the number of trips taken and bikes used over the
following week.
To apply a sliding window to your data, first copy each trip forward six
additional active days from its start date. Then, compute the dates of the
active days by using the DATE_ADD
function. Finally, aggregate the trips and
bike IDs for each active day.
To copy your data forward, use the
GENERATE_ARRAY
function and a cross join:Pipe syntax
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` |> EXTEND CAST(start_time AS DATE) AS date |> CROSS JOIN UNNEST(GENERATE_ARRAY(0, 6)) AS diff_days;
Standard syntax
SELECT *, CAST(start_time AS DATE) AS date FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` CROSS JOIN UNNEST(GENERATE_ARRAY(0, 6)) AS diff_days;
The
GENERATE_ARRAY
function creates an array with seven elements,0
to6
. TheCROSS JOIN UNNEST
operation creates seven copies of each row, with a newdiff_days
column that contains one of the array element values from0
to6
for each row. You can use thediff_days
values as the adjustment to the original date to slide the window forward by that many days, up to seven days past the original date.To see the calculated active dates for trips, use the
EXTEND
pipe operator with theDATE_ADD
function to create a column calledactive_date
that contains the start date plus the value in thediff_days
column:Pipe syntax
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` |> EXTEND CAST(start_time AS DATE) AS date |> CROSS JOIN UNNEST(GENERATE_ARRAY(0, 6)) AS diff_days |> EXTEND DATE_ADD(date, INTERVAL diff_days DAY) AS active_date;
Standard syntax
SELECT *, DATE_ADD(date, INTERVAL diff_days DAY) AS active_date FROM ( SELECT *, CAST(start_time AS DATE) AS date FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` CROSS JOIN UNNEST(GENERATE_ARRAY(0, 6)) AS diff_days)
For example, a trip that starts on
2024-05-20
is also considered active on each day through2024-05-26
.Finally, aggregate trips IDs and bike IDs and group by
active_date
:Pipe syntax
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` |> EXTEND CAST(start_time AS DATE) AS date |> CROSS JOIN UNNEST(GENERATE_ARRAY(0, 6)) AS diff_days |> EXTEND DATE_ADD(date, INTERVAL diff_days DAY) AS active_date |> AGGREGATE COUNT(DISTINCT bike_id) AS active_7d_bikes, COUNT(trip_id) AS active_7d_trips GROUP BY active_date DESC;
Standard syntax
SELECT DATE_ADD(date, INTERVAL diff_days DAY) AS active_date, COUNT(DISTINCT bike_id) AS active_7d_bikes, COUNT(trip_id) AS active_7d_trips FROM ( SELECT *, CAST(start_time AS DATE) AS date FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` CROSS JOIN UNNEST(GENERATE_ARRAY(0, 6)) AS diff_days) GROUP BY active_date ORDER BY active_date DESC;
The result is similar to the following:
+-------------+-----------------+-----------------+ | active_date | active_7d_bikes | active_7d_trips | +-------------+-----------------+-----------------+ | 2024-07-06 | 90 | 331 | | 2024-07-05 | 142 | 726 | | 2024-07-04 | 186 | 1163 | | ... | ... | ... | +-------------+-----------------+-----------------+
Filter future dates
In the preceding query, the dates extend into the future up to six days beyond the last date in your data. To filter out dates that extend beyond the end of your data, set a maximum date in your query:
- Add another
EXTEND
pipe operator that uses a window function with anOVER
clause to compute the maximum date in the table. - Use the
WHERE
pipe operator to filter out the generated rows that are past the maximum date.
Pipe syntax
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
|> EXTEND CAST(start_time AS DATE) AS date
|> EXTEND MAX(date) OVER () AS max_date
|> CROSS JOIN UNNEST(GENERATE_ARRAY(0, 6)) AS diff_days
|> EXTEND DATE_ADD(date, INTERVAL diff_days DAY) AS active_date
|> WHERE active_date <= max_date
|> AGGREGATE COUNT(DISTINCT bike_id) AS active_7d_bikes,
COUNT(trip_id) AS active_7d_trips
GROUP BY active_date DESC;
Standard syntax
SELECT
DATE_ADD(date, INTERVAL diff_days DAY) AS active_date,
COUNT(DISTINCT bike_id) AS active_7d_bikes,
COUNT(trip_id) AS active_7d_trips
FROM(
SELECT *
FROM (
SELECT *,
DATE_ADD(date, INTERVAL diff_days DAY) AS active_date,
MAX(date) OVER () AS max_date
FROM(
SELECT *, CAST(start_time AS DATE) AS date,
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
CROSS JOIN UNNEST(GENERATE_ARRAY(0, 6)) AS diff_days))
WHERE active_date <= max_date)
GROUP BY active_date
ORDER BY active_date DESC;
The result is similar to the following:
+-------------+-----------------+-----------------+
| active_date | active_7d_bikes | active_7d_trips |
+-------------+-----------------+-----------------+
| 2024-06-30 | 212 | 3031 |
| 2024-06-29 | 213 | 3206 |
| 2024-06-28 | 219 | 3476 |
| ... | ... | ... |
+-------------+-----------------+-----------------+
What's next
- For more information about how pipe syntax works, see Work with pipe query syntax.
- For more technical information, see the Pipe query syntax reference documentation.