This project provides a straightforward way to analyze trip transactions using a data engineering pipeline that follows the Medallion Architecture (Bronze-Silver-Gold). We use Azure Data Factory, Databricks, and Delta Lake to create an automated ETL process, offering real-time monitoring and email notifications through Logic Apps.
No programming knowledge is needed. Just follow the steps below to set it up.
Before you begin, ensure your system meets these requirements:
- Windows or MacOS
- Internet connection
- Azure account (you can create a free one if you do not already have an account)
To get started, visit the releases page to download the project.
- Go to the project's releases page.
- Look for the latest release.
- Click on the file titled AzureDataFactoryDatabricks.zip.
- Download the file and save it to your computer.
- Once the download is complete, extract the contents of the zip file to a folder of your choice.
After downloading the files, you need to set up the project in your Azure environment.
- Sign in to Azure. Open your browser and go to the Azure portal. Log in using your Azure account credentials.
- Create resources:
  - Data Factory: In the Azure portal, search for "Data Factory" and create a new instance.
  - Databricks: Search for "Databricks" and set up a new workspace.
  - SQL Database: Search for "SQL Database" and create one to store your data.
  - Storage: Ensure you set up an Azure Data Lake Storage Gen2 account for data storage.
- Configure ETL Process:
  - In the files you extracted, find Instructions.txt. This file contains detailed steps to configure the data factory and connect it to Databricks.
  - Follow the steps carefully to set up the necessary pipelines and datasets.
This project includes the following features:
- Automated ETL: The project automates the data extraction, transformation, and loading processes.
- Real-Time Monitoring: Get notifications for data pipeline status and errors through Azure Logic Apps.
- Medallion Architecture: Organizes data into different layers (Bronze, Silver, Gold) for better management and reporting.
- Analytics Ready: Designed for business intelligence applications to analyze trip transaction data.
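If you are curious what the Bronze, Silver, and Gold layers do to the data, the idea can be sketched in a few lines of plain Python. This is only an illustration and is not the project's own Databricks or Delta Lake code. The sample records, the field names (trip_id, city, fare, started_at), and the function names are all assumptions made up for this example.

```python
from datetime import datetime

# Hypothetical raw trip transactions; field names are assumptions for this sketch.
RAW_TRIPS = [
    {"trip_id": "1", "city": "Seattle", "fare": "12.50", "started_at": "2024-01-05T08:15:00"},
    {"trip_id": "2", "city": "seattle ", "fare": "7.25", "started_at": "2024-01-05T09:40:00"},
    {"trip_id": "2", "city": "seattle ", "fare": "7.25", "started_at": "2024-01-05T09:40:00"},
    {"trip_id": "3", "city": "Portland", "fare": "not-a-number", "started_at": "2024-01-06T11:00:00"},
    {"trip_id": "4", "city": "Portland", "fare": "20.00", "started_at": "2024-01-06T12:30:00"},
]


def to_bronze(raw_rows):
    """Bronze: keep every record as received, only tagging where it came from."""
    return [dict(row, _source="trips_raw") for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: drop duplicates and invalid rows, normalize text, cast types."""
    seen_ids = set()
    silver = []
    for row in bronze_rows:
        if row["trip_id"] in seen_ids:
            continue  # duplicate trip, already kept once
        try:
            fare = float(row["fare"])
        except ValueError:
            continue  # fare is not a number, so the row is rejected
        seen_ids.add(row["trip_id"])
        silver.append({
            "trip_id": int(row["trip_id"]),
            "city": row["city"].strip().title(),
            "fare": fare,
            "started_at": datetime.fromisoformat(row["started_at"]),
        })
    return silver


def to_gold(silver_rows):
    """Gold: aggregate cleaned trips into a per-city summary for reporting."""
    gold = {}
    for row in silver_rows:
        summary = gold.setdefault(row["city"], {"trips": 0, "total_fare": 0.0})
        summary["trips"] += 1
        summary["total_fare"] = round(summary["total_fare"] + row["fare"], 2)
    return gold


if __name__ == "__main__":
    bronze = to_bronze(RAW_TRIPS)
    silver = to_silver(bronze)
    for city, summary in to_gold(silver).items():
        print(city, summary)
```

In this sketch, Bronze keeps all five raw records, Silver keeps the three that are unique and valid, and Gold reduces them to one summary row per city. In the actual project, each layer would presumably be stored as Delta tables rather than Python lists.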
If you encounter issues during setup or use, check the following:
- Azure Permissions: Ensure you have the right permissions to create resources in your Azure account.
- Network Issues: Check your internet connection.
- Follow Instructions: Review Instructions.txt for any missed steps.
For further assistance, consider visiting the Azure support page or referring to community forums.
Here are some resources to help you understand the project better:
If you have questions or need help, feel free to contact us through the project's GitHub page. You can open an issue for support or suggestions.
We welcome contributions! If you would like to contribute to this project, please follow the guidelines in the project repository.
This project is licensed under the MIT License. Check the LICENSE file for details.
Feel free to get started with this project to streamline your data analysis process. For any questions or issues, please open an issue or check the provided resources. Enjoy exploring the world of data!