From the course: Build with AI: Data Pipelines with Cursor, Neon, and Streamlit
Overview: What you will build
- [Instructor] For this course project, we will work with the Cursor AI assistant to build a powerful data pipeline that extracts data and generates meaningful insights, so let's get a quick overview of what that looks like. We are going to build a dashboard that tracks data about academic papers in AI, allowing us to monitor the production of scientific knowledge on AI. Here is a quick preview of what that dashboard might look like: it shows key metrics for our data, such as publication trends tracking how many AI papers are published each day. The dashboard will be hosted online for everyone to see, and you will be able to use it as a portfolio project.

To build our dashboard, we are going to leverage the OpenAlex API. This is a source of data on millions of academic papers that is updated every day. It is powerful, it is free, and it doesn't require any signup or authentication.

Now, let's get a quick overview of how our data pipeline will be structured. It works in the following steps. First, we use the OpenAlex API to extract fresh data on academic papers. Next, we store these papers in a cloud database, where the data is clean, organized, and always available for our dashboard. We also run data tests on the database to make sure the data looks good and there are no errors or inconsistencies. Then comes the visualization dashboard: we use the Python package Streamlit to build a live data dashboard that displays insights about our data. The dashboard connects to our cloud database to get fresh data whenever it's needed, and it will be deployed online in the cloud for anyone to access. Now, here are a few insights about how we will build our data pipeline and the best practices that we will use.
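Before we get to those practices, here is a minimal sketch of the pipeline steps just described: building an OpenAlex query, running data tests on the stored rows, and computing the daily publication counts behind the trend chart. The function names are my own, and the concept id `C154945302` is assumed to be OpenAlex's id for "Artificial intelligence"; check the OpenAlex documentation before relying on it.

```python
from collections import Counter
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_url(from_date: str, per_page: int = 200) -> str:
    """Build an OpenAlex /works query for recent AI papers.

    C154945302 is assumed here to be the OpenAlex concept id for
    "Artificial intelligence"; verify it against the API docs.
    """
    params = {
        "filter": f"concepts.id:C154945302,from_publication_date:{from_date}",
        "per-page": per_page,
    }
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def run_data_tests(rows: list[dict]) -> list[str]:
    """Return a list of data-quality problems; an empty list means all good."""
    problems = []
    ids = [r.get("id") for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate paper ids")
    if any(not r.get("title") for r in rows):
        problems.append("missing titles")
    if any(not r.get("publication_date") for r in rows):
        problems.append("missing publication dates")
    return problems


def papers_per_day(rows: list[dict]) -> dict[str, int]:
    """Count papers per publication date; this is the series behind the trend chart."""
    return dict(sorted(Counter(r["publication_date"] for r in rows).items()))


# In the real pipeline you would fetch the URL (e.g. with requests),
# store the response's "results" list in the cloud database, and let the
# Streamlit app chart the daily counts, e.g. st.line_chart(papers_per_day(rows)).
```

This keeps the extraction, testing, and aggregation steps as small pure functions, so each can be developed and checked with the AI agent independently before wiring in the database and the dashboard.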
First of all, we will use Git and GitHub for code versioning and code management, meaning that every change we make to the code will be saved, checkpointed, documented, and then pushed to the cloud. Moreover, we will use a Python virtual environment to manage the dependencies, meaning the libraries on which our project depends. They will be fully isolated and will not interfere with other libraries installed elsewhere on our system. We will learn how to implement secure secrets management for credentials such as the confidential password to our database. We will store and use this password without leaking it online and without showing it to the AI agent. And finally, and maybe most importantly, we will work with the powerful Cursor AI agent to get code assistance and develop our codebase agentically. So, I hope you're ready to build your first data pipeline with AI.
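One common way to keep the database password out of the code and away from the AI agent is to read it from an environment variable. The sketch below assumes a variable named `DATABASE_URL` holding the Neon connection string; the name is a convention I chose, not a requirement.

```python
import os


def get_database_url() -> str:
    """Read the Neon connection string from the environment.

    Set DATABASE_URL in your shell or in a git-ignored .env file, so the
    secret never lands in the repository or in a chat with the AI agent.
    """
    url = os.environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError(
            "DATABASE_URL is not set; export it or add it to a git-ignored .env file"
        )
    return url
```

Failing loudly when the variable is missing makes misconfiguration obvious at startup, instead of producing a confusing connection error deeper in the pipeline.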