This project demonstrates an end-to-end data engineering pipeline for analyzing Uber trip data. It leverages tools like Mage for ETL (Extract, Transform, Load), BigQuery for data warehousing, and Looker Studio for creating insightful dashboards. The project was inspired by Darshil's YouTube tutorial, and customized to fit specific analytical goals.
- Mage: For ETL pipeline development.
- BigQuery: To store and analyze large datasets efficiently.
- Looker Studio: For visualizing key metrics and trends.
- Python: For scripting and data manipulation.
- Google Cloud Platform (GCP): Hosting BigQuery and Looker Studio.
-
Data Ingestion:
- Imported Uber trip data into Mage pipelines for preprocessing.
-
Data Transformation:
- Cleaned and normalized raw data.
- Created dimensional tables (e.g.,
datetime_dim,rate_code_dim). - Built a
fact_tablefor analytics.
-
Data Export:
- Exported transformed data into BigQuery for efficient querying.
-
Visualization:
- Designed a dynamic dashboard using Looker Studio.
- Filters for Vendor, Payment Type, Rate Code, and Trip Distance.
- Summaries of total revenue, average trip distance, and fare amount.
- Interactive maps for pickup and drop-off locations.
- Extract: Load raw data into Mage.
- Transform: Apply transformations to generate dimension and fact tables.
- Load: Store processed data in BigQuery.
- Visualize: Use Looker Studio for creating interactive dashboards.
-
Mage Setup:
- Built pipelines to transform data into analytics-ready tables.
-
BigQuery Tables:
fact_table: Consolidated facts like revenue and trip distance.datetime_dim,rate_code_dim,pickup_location_dim, etc.: Dimension tables for analytics.
-
Dashboard Features:
- Summary cards for revenue, record count, and average trip distance.
- Interactive filters for customized insights.
- Geographic visualization of pickup/drop-off locations.
The final dashboard highlights:
- Revenue Metrics: Total revenue from trips.
- Trip Analysis: Average trip distance and fare amount.
- Geographic Trends: Pickup and drop-off hotspots visualized on a map.
Special thanks to Darshil for his tutorial that inspired this project.
-
Clone this repository:
git clone https://github.com/yourusername/uber-data-engineering.git
-
Install dependencies:
pip install -r requirements.txt
-
Configure
io_config.yamlfor your BigQuery and Mage setup. -
Run the Mage pipeline to process the data.
-
View the dashboard in Looker Studio by connecting to your BigQuery dataset.
- Add support for real-time data ingestion.
- Implement machine learning models for trip prediction or fraud detection.
- Enhance visualizations with more interactive features.
This project is open-source and available under the MIT License.