Metadata indexing for BigQuery tables
This document describes column metadata indexing in BigQuery and explains how to allocate dedicated resources to improve index freshness and query performance.
BigQuery automatically indexes metadata for BigQuery tables exceeding 1 GiB. This metadata includes file location, partitioning information, and column-level attributes, which BigQuery uses to optimize and accelerate your queries.
By default, metadata indexing in BigQuery is a free background
operation and requires no action on your part. However, index freshness
depends on available free resources and doesn't have
performance service level objectives (SLOs). If index freshness is critical for
your use case, we recommend configuring a
BACKGROUND
reservation.
View the metadata index refresh time
To see the last metadata index refresh time of a table, query the
LAST_METADATA_INDEX_REFRESH_TIME
column of the
INFORMATION_SCHEMA.TABLE_STORAGE
view.
To do so, do the following:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
SELECT project_id, project_number, table_name, last_metadata_index_refresh_time FROM [
PROJECT_ID
.]region-REGION
.INFORMATION_SCHEMA.TABLE_STORAGE;Replace the following:
PROJECT_ID
: the ID of your Google Cloud project. If not specified, the default project is used.REGION
: the region where the project is located—for example,region-us
.
Click
Run.
View column metadata index usage
To view whether the column metadata index was used after a job completes, check
the
TableMetadataCacheUsage
property
of the Job resource. If the
unusedReason
field is empty (not populated), the column
metadata index was used. If it is populated, the accompanying explanation
field provides a reason why the column metadata index wasn't used.
You can also view column metadata index usage with the
metadata_cache_statistics
field in the
INFORMATION_SCHEMA.JOBS
view.
For example, the following displays column metadata index usage for the my-job
job:
SELECT metadata_cache_statistics FROM `region-US`.INFORMATION_SCHEMA.JOBS WHERE job_id = 'my-job';
As another example, the following displays the number of jobs that used column
metadata index for the my-table
table:
SELECT COUNT(*) FROM `region-US`.INFORMATION_SCHEMA.JOBS, UNNEST(metadata_cache_statistics.table_metadata_cache_usage) AS stats WHERE stats.table_reference.table_id='my-table' AND stats.table_reference.dataset_id='my-dataset' AND stats.table_reference.project_id='my-project' AND stats.unusedReason IS NULL;
Set up dedicated indexing resources
To set up resources for metadata indexing updates in your project, you first need to have a reservation assigned to your project. To do so, do the following:
After setting up your reservation, select one of the following methods to assign slots to your metadata indexing job. By default, slots that you allocate in this manner are shared with other jobs if the slots are idle. For more information, see Idle slots.
Console
In the Google Cloud console, go to the Capacity Management page.
Click > Create assignment.
Reservation actionsSelect your reservation project.
Set Job Type to Background.
Click Create.
bq
Use the
bq mk
command.
bq mk \ --project_id=ADMIN_PROJECT_ID \ --location=LOCATION \ --reservation_assignment \ --reservation_id=RESERVATION_NAME \ --assignee_id=PROJECT_ID \ --job_type=BACKGROUND \ --assignee_type=PROJECT
Replace the following:
ADMIN_PROJECT_ID
: the project ID of the administration project that owns the reservation resource.LOCATION
: the location of the reservation.RESERVATION_NAME
: the name of the reservation.PROJECT_ID
: the project ID to assign to this reservation.
SQL
To assign a reservation to a project, use the
CREATE ASSIGNMENT
DDL statement.
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
Replace the following:CREATE ASSIGNMENT ADMIN_PROJECT_ID.region-LOCATION.RESERVATION_NAME.ASSIGNMENT_ID OPTIONS ( assignee = 'projects/PROJECT_ID', job_type = 'BACKGROUND');
ADMIN_PROJECT_ID
: the project ID of the administration project that owns the reservation resource.LOCATION
: the location of the reservation.RESERVATION_NAME
: the name of the reservation.ASSIGNMENT_ID
: the ID of the assignment. The ID must be unique to the project and location, start and end with a lowercase letter or a number, and contain only lowercase letters, numbers, and dashes.PROJECT_ID
: the project ID containing the tables. This project is assigned to the reservation.
Click
Run.
View indexing job information
After you set up your dedicated indexing jobs, you can view information
about the indexing jobs with the
JOBS
view.
The following SQL sample shows the five most recent refresh jobs in
PROJECT_NAME.
SELECT * FROM region-us.INFORMATION_SCHEMA.JOBS WHERE project_id = 'PROJECT_NAME' AND SEARCH(job_id, '`metadata_cache_refresh`') ORDER BY creation_time DESC LIMIT 5;
Replace PROJECT_NAME
with the name of the project
containing your metadata indexing jobs.
Configure metadata indexing alerts
The Cloud Monitoring alerting process notifies you when your BigQuery performance doesn't meet defined criteria. For more information, see Alerting overview. With metadata indexing, you can configure alerts for slot usage and staleness.
Slot usage alert
This alert notifies you when your background reservation exceeds a defined percentage of its allocation. The default value is 95%. You can configure this alert for a specific reservation or for every background reservation. When this alert triggers, we recommend that you increase your reservation size.
To configure this alert for every background reservation, do the following:
- Set up a Monitoring notification channel if you haven't already.
Go to the Integrations page.
Find the BigQuery integration and click View details.
In the Alerts tab, select Slot Usage - Background Metadata Cache Slot Usage Too High.
Optional: To customize this alert further, click Show options > Customize alert policy.
For Configure notifications, select your notification channel.
Click Create.
Staleness alert
This alert notifies you when the average column metadata index staleness increases too much compared to the existing average. The default threshold is if the average over 4 hours is more than double the previous average for more than 30 minutes. When this alert triggers, we recommend that you increase your reservation size or create a background reservation if you don't have one.
To configure this alert, do the following:
- Set up a Monitoring notification channel if you haven't already.
Go to the Integrations page.
Find the BigQuery integration and click View details.
In the Alerts tab, select Column Metadata Index Staleness - Too Much Percent Increase.
Optional: To customize this alert further, click Show options > Customize alert policy.
For Configure notifications, select your notification channel.
Click Create.
Limitations
Metadata query performance enhancements only apply to SELECT
, INSERT
, and
CREATE TABLE AS SELECT
statements. Data manipulation language (DML) statements
won't see improvements from metadata indexing.
What's next
- Learn how to see all jobs in your project with the
JOBS
view. - Learn how to view slot capacity and utilization.