From the course: Data Pipeline Automation with GitHub Actions Using R and Python
Solution: Query the API with R - GitHub Tutorial
(upbeat music) - [Instructor] You can find the solution for the Chapter 1 challenge in the Quarto doc R_challenge_solutions, under the chapter-1 folder in the course repository. Let's get started by loading the required libraries. As before, we're going to load EIAapi to query data from the API, dplyr to process the data, lubridate to reformat date and time objects, and plotly to visualize the data. In the first question, we were asked to extract the metadata of the San Diego Gas and Electric balancing authority from the EIA dashboard. So let's go to the eia.gov website. From the main page, scroll down to Features and click the API icon. Next, click Browse the API, which leads you to the API Dashboard. The first thing we want to do is select the route. The main category here is Electricity, so let's go ahead and select it. The subcategory is Electric Power Operation (Daily and Hourly), and since we want the region level, we're going to select Hourly Demand by Subregion. The next step is to select the facets, which narrow down the available series so we can get the metadata of the San Diego balancing authority. So let's click the facets. There are two options here. You can go directly to the Subregion facet and filter for the specific one; you can see there are 83 subregions overall. Or, let's remove that filter first, you can first select the Balancing Authority. We're going to select California Independent System Operator and then select the Subregion. As you can see, it's now narrowed down to the four subregions under that parent balancing authority, so we can go ahead and select San Diego Gas and Electric. Don't forget to save the selection and submit the request. As you can see, the dashboard returns the API metadata. What we can use from here, first, is the API route.
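The setup described above can be sketched roughly as follows. The `eia_metadata()` call and the `EIA_API_KEY` environment-variable name are assumptions based on the EIAapi package's typical interface; the exact function signature may differ between package versions, so check `?eia_metadata` before running.

```r
# Load the libraries used throughout the solution
library(EIAapi)    # query the EIA API (eia_get, eia_backfill, etc.)
library(dplyr)     # data processing
library(lubridate) # reformat date and time objects
library(plotly)    # visualize the data

# The API key is stored as an environment variable (set up earlier
# in the course); the variable name here is an assumption
api_key <- Sys.getenv("EIA_API_KEY")

# A sketch of pulling the route metadata identified on the dashboard;
# eia_metadata's exact arguments may vary across EIAapi versions
metadata <- eia_metadata(
  api_path = "electricity/rto/region-sub-ba-data/",
  api_key  = api_key
)
```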
You can see here the API route that we are going to use, electricity/rto/region-sub-ba-data. We'll have to append data to it, as we're expecting to pull data and not metadata. Next is the header. You can see that the frequency is hourly, and this is what we're going to use as before. As for the facets required for pulling this series, we need to set the parent to CISO and the subba, or subregion, to SDGE. So let's go back to RStudio and set up the first GET request. In the second question, we are required to pull the series using the metadata we extracted in the first question, bounded between January 1st and January 31st, 2024. Let's go ahead and update the parameters here. We're going to use the same method to load the API key and the same API route as before, and we're going to set the frequency to hourly. For the facets, we're going to set the parent to CISO and the subba to San Diego Gas and Electric, SDGE. Let's confirm it here: SDGE. We're also going to set the start and end arguments. Recall that the eia_get function takes strings in the same format as the API, which is year, month, day, and hour, separated by T. So let's set January 1st as the start point, and the end point should be January 31st. Let's go ahead and execute and assign those variables, then call the function and assign the output to df1. Remember, the function returns the period, or timestamp, variable in character format, so we're going to use lubridate's year-month-day-hour function, ymd_h, to reformat it and assign it to index. Recall that we want to set the time zone to UTC. We're then going to arrange the index to be the first column and sort the data by the index. Let's go ahead and run it. As you can see, we got the same output as before, but this time the subba name is San Diego. Last but not least, we can visualize the result with plotly.
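The request described above can be sketched like this. The argument names follow the EIAapi package's documented interface, but treat the exact signature, and whether the value column needs coercion to numeric, as assumptions to verify against `?eia_get`.

```r
# Bound the pull to January 2024; eia_get expects strings in the
# API's own format: year-month-day, then T, then the hour
start <- "2024-01-01T00"
end   <- "2024-01-31T23"

# Hourly demand for the SDGE subregion under the CISO parent
df1 <- eia_get(
  api_key   = api_key,
  api_path  = "electricity/rto/region-sub-ba-data/data/",
  frequency = "hourly",
  facets    = list(parent = "CISO", subba = "SDGE"),
  start     = start,
  end       = end
)

# The period column comes back as character; reformat it to a
# POSIXct index in UTC, move it to the front, and sort by it
df1 <- df1 |>
  mutate(index = ymd_h(period, tz = "UTC"),
         value = as.numeric(value)) |>  # coerce in case value is character
  select(index, everything()) |>
  arrange(index)

# Visualize the series with plotly
plot_ly(df1, x = ~index, y = ~value, type = "scatter", mode = "lines")
```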
And as you can see, this looks as expected: we got the data starting from January 1st all the way to the end of January. Now we can move to the next question. In the third question, we were asked to use the eia_backfill function to pull the data between January 1st, 2020 and February 1st, 2024. Remember that while the eia_backfill function and the eia_get function use the same arguments, the main difference between the two is the class of the start and end inputs: the eia_backfill function uses POSIXct objects to set the start and end. We're going to use the as.POSIXct function to set the start and end according to the time range between January 2020 and February 2024, and we're going to define the time zone as UTC. In addition, we're going to set the offset to 2,000 observations per request for the sequential requests that the function runs on the backend. Let's go ahead and execute the function. It might take a few seconds to run, as we are pulling a couple of thousand observations. Okay, we got the output; we can clear it since it's not really useful. Let's look at the structure of the dataframe that we pulled, df2. You can see we pulled about 35,000 observations and got seven variables. Another difference between the output of eia_backfill and eia_get is that the timestamp, named time, is already reformatted as a POSIXct object. Let's go ahead and plot the output using the plotly function. And we got the series as expected.
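The backfill step above can be sketched as follows. The argument order and the time column name are assumptions based on the EIAapi package's interface as used in the course; confirm them against `?eia_backfill` before running.

```r
# eia_backfill takes POSIXct start/end values, not strings
start <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2024-02-01 00:00:00", tz = "UTC")

# Pull four years of hourly data; offset caps each sequential
# request at 2,000 observations on the backend
df2 <- eia_backfill(
  start    = start,
  end      = end,
  offset   = 2000,
  api_key  = api_key,
  api_path = "electricity/rto/region-sub-ba-data/data/",
  facets   = list(parent = "CISO", subba = "SDGE")
)

# Inspect the result: roughly 35,000 observations, seven variables
str(df2)

# Unlike eia_get, the timestamp (time) is already a POSIXct object,
# so it can be plotted directly
plot_ly(df2, x = ~time, y = ~value, type = "scatter", mode = "lines")
```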
Contents
- EIA API (2m 47s)
- Setting an environment variable (3m 22s)
- The EIA API dashboard (4m 10s)
- GET request structure (5m 41s)
- Querying the data via the browser (4m 4s)
- Querying data with R and Python (2m 50s)
- Pulling metadata from API with R (3m 5s)
- Sending a simple GET request with R (5m 19s)
- API limitations with R (4m 43s)
- Handling a large data request with R (4m 27s)
- Pulling metadata from API with Python (3m 47s)
- Sending a simple GET request with Python (4m 44s)
- API limitations with Python (3m 54s)
- Handling a large data request with Python (3m 10s)
- Challenge: Query the API (1m 2s)
- Solution: Query the API with R (7m 28s)
- Solution: Query the API with Python (7m 45s)