From the course: Build with AI: Data Pipelines with Cursor, Neon, and Streamlit
OpenAlex quick start: Analyze data for Python data pipelines
- [Instructor] When you are building a pipeline, it's likely that the data you want to access will change all the time. Stock prices update, new research papers appear, weather patterns shift. APIs let you capture these changes automatically, without manual downloads, CSV imports, or copy-paste. You write code once and then get fresh data whenever you need it. This is why you will often use APIs in your data pipelines. For our pipeline, we will be accessing the OpenAlex API, a free and open index of scholarly works: authors, institutions, topics, publishers, all of it accessible through a simple API. And the best part is that this API requires no signup or authentication and has a very generous rate limit. Each user gets 100,000 requests per day, which is more than enough for our needs. Our data pipeline will use the OpenAlex API to keep track of academic developments in artificial intelligence.

Now, we could write code that accesses the API directly by constructing and running HTTP requests. But instead, we're going to use the PyAlex package, a Python library designed to make working with the OpenAlex API easier. So instead of writing raw requests, we will call pre-written Python methods from this library. One of the main reasons I want to use this library is that the Cursor agent works much better with the package than with the raw API: it can see what methods are available, understand them, and get started much faster with OpenAlex.

So the first thing we want to do is get our Cursor agent to install PyAlex for us. I'm going to copy the name and the latest version of the library. Back in Cursor, I'm going to tell my agent, "Add to the current virtual environment," and paste what I copied. Now the agent will add it.
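For context, here is roughly what "constructing HTTP requests directly" would involve, and what PyAlex spares us from. This sketch only builds an OpenAlex query URL with the standard library; the endpoint and the `search` and `per-page` parameter names follow the OpenAlex API documentation, and the query values are just an example:

```python
from urllib.parse import urlencode

def openalex_works_url(search: str, per_page: int = 5) -> str:
    # /works is the OpenAlex endpoint for scholarly works; "search" and
    # "per-page" are documented query parameters.
    base = "https://api.openalex.org/works"
    return f"{base}?{urlencode({'search': search, 'per-page': per_page})}"

print(openalex_works_url("artificial intelligence"))
# https://api.openalex.org/works?search=artificial+intelligence&per-page=5
```

With raw requests you would build URLs like this, send them, and parse the JSON yourself; PyAlex wraps all of that behind Python methods.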
It's installing PyAlex in my current virtual environment, so I'm going to approve this command. The library was installed successfully, and as you can see, the agent has added a new line to requirements.txt, which is exactly what we want, because it lets us keep track of the libraries we install. So I'm going to go down here in the chat and select Keep All. And that was it: our agent successfully installed the library.

Now that the PyAlex library is installed, I want to make sure that our AI agent can use it effectively, so I'm going to ask it to write a quick script. In the chat, I'll say, "Write a quick temporary script that uses PyAlex to display the titles of five AI papers." It's a quick script, because I don't want it to spend too much effort on it, and it's temporary, because after we try it, we're going to delete it. It's a very simple request: I just want the titles of five AI papers. But it will help me confirm that the agent can actually leverage the library we just installed. So let's run this. This is a new chat, so the agent has to look at the code we already have. It found that PyAlex is already installed, and it started writing this temporary script. Now it wants to run it, so I'm going to authorize this. As you can see, the script ran successfully, and in the output we can see the titles of five papers. So we've successfully accessed the API and retrieved data about AI. And because I asked for a temporary script, at the end it even deleted the file, so that we don't keep something we don't need in our code base. Personally, I'm fine with that, so this looks very good to me. We have actually changed one file in our code base, requirements.txt, so we must remember to commit and push on Git to save our changes.
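For reference, the agent's temporary script would look something like this sketch. The `Works().search(...)` and `.get(per_page=...)` calls follow the pyalex package's documented interface; the function names and the fallback for untitled records are my own additions, not what the agent necessarily wrote:

```python
def extract_titles(works):
    # Pure helper: pull the display title out of each OpenAlex work
    # record, with a fallback for records that have no title.
    return [w.get("title") or "(untitled)" for w in works]

def fetch_ai_titles(n=5):
    # Requires the pyalex package and network access. search() runs a
    # full-text search; get(per_page=...) caps how many works come back.
    from pyalex import Works
    works = Works().search("artificial intelligence").get(per_page=n)
    return extract_titles(works)
```

Calling `fetch_ai_titles()` returns five titles, which the script would then print; the function is defined but not invoked here so the sketch has no side effects.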
So I'm going to go here, and next to Changes, select Plus, Stage All Changes, and then the Sparkle symbol to get AI to write a commit message, which looks correct to me. I'll select Commit and then Sync Changes, and now I can see my commit down here. Great, we now have a functional data extraction layer for our pipeline. Whenever we need fresh data on academic papers, we can ask our AI agent, and it will write a script that leverages the PyAlex library to get the data we need. And because this is an API, we will always get fresh data.
Contents
- OpenAlex quick start: Analyze data for Python data pipelines (6m 59s)
- Data extraction layer: Get data from OpenAlex (22m 40s)
- Neon database setup: Cloud PostgreSQL for data pipeline projects (9m 50s)
- Design table schema and create a table in the database (9m 50s)
- Process and load your data (12m 23s)
- Data quality testing (11m 21s)
- Consolidate pipeline logic (8m 38s)
- Build a Streamlit dashboard (7m 58s)
- Deploy the Streamlit dashboard (7m 28s)