1. Querying for Alaskan data

A sample case of data querying

Once you understand the basics, it helps to walk through a query for a specific data set. In this article, we use the package to work out and run a query for data in Alaska.

What we’re looking for

We want to work with ozone concentrations in Alaska. In particular, we want data for the entire month of January 2019. Writing these requirements as a bulleted list makes it easier to see what the call needs:

  • Start date = January 1, 2019
  • End date = January 31, 2019
  • Pollutant = ozone
  • State = Alaska

What service to use

The EPA API offers several services, so we first need to determine which service provides these data.

Since we’re looking for data for a month, it might make sense to check out the daily summary services.

services$`Daily Summary Data`$Description
## [1] "Returns data summarized at the daily level.  All daily summaries are calculated on midnight to midnight basis inlocal time.  Variables returned include date, mean value, maximum value, etc."

The endpoint for the service

If we want to try using the daily summary service, we need to find an endpoint for it as well as the API variables to fill.

The services object gives us the answer. We can easily select the Daily Summary Data option and subsequently find what endpoint we need to filter by state.

services$`Daily Summary Data`$Filters$`By State`
## $Endpoint
## [1] "dailyData/byState"
## 
## $RequiredVariables
## [1] "email, key, param, bdate, edate, state"
## 
## $OptionalVariables
## [1] "cbdate,cedate"
## 
## $Example
## [1] "Example; returns all benzene daily summaries from North Carolina collected on May 15th, 1995:https://aqs.epa.gov/data/api/dailyData/byState?email=test@aqs.api&key=test&param=45201&bdate=19950515&edate=19950515&state=37"

We can see the endpoint and the API variables we need here. Recall that we don’t need to include an email and key since the package already includes authentication credentials.
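The RequiredVariables field is printed as a single comma-separated string. As a small sketch (the names `required` and `needed` are illustrative and not part of the package), the string can be split into a vector and the credentials dropped to show which variables are left for us to fill:

```r
# The string as printed above for the "By State" filter.
required <- "email, key, param, bdate, edate, state"

# Split on commas, ignoring the spaces that follow them.
required <- strsplit(required, ",\\s*")[[1]]

# Drop the credentials, which the package supplies on our behalf.
needed <- setdiff(required, c("email", "key"))
needed
## [1] "param" "bdate" "edate" "state"
```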

We can use the variables object to find appropriate codes to use for each variable.

variables$bdate
## [1] "bdate"                                                                                                                                                                                                                                                                                                                              
## [2] "The begin date of the data selection in YYYYMMDD format.  Only data on or after this date will be returned.  (Note, for annual data, only the year portion of the bdate and edate are used and only whole years of data are returned.  For example, bdate = 20171231 and edate = 20180101 will return full data for 2017 and 2018.)"
## [3] "20170101"

This tells us bdate should be in a YYYYMMDD format. A similar process also tells us that edate should follow the same YYYYMMDD format.
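If the dates start out as R Date values rather than hand-typed strings, base R can produce the YYYYMMDD form directly. A minimal sketch, assuming the dates from our requirements list:

```r
# Format the first and last day of January 2019 as YYYYMMDD strings.
bdate <- format(as.Date("2019-01-01"), "%Y%m%d")
edate <- format(as.Date("2019-01-31"), "%Y%m%d")
c(bdate, edate)
## [1] "20190101" "20190131"
```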

Now we can move on to finding the codes for the state of Alaska and for ozone.

Finding codes for API variables

We need to find the state code for Alaska. How might we do this? A service in the API might help us here.

services$List$Description
## [1] "Provides information you need to construct other queries.  Valid values for the required variables: parameter code, state code, etc.  (See below.)"

Indeed, the list service description seems to fit our needs.

The state code

After tooling around the filters, it looks like the States filter would give us the info we need to get state codes.

services$List$Filters$States
## $Endpoint
## [1] "list/states"
## 
## $RequiredVariables
## [1] "email, key"
## 
## $OptionalVariables
## [1] ""
## 
## $Example
## [1] "Returns a list of the states and their FIPS codes used for constructing other requests:https://aqs.epa.gov/data/api/list/states?email=test@aqs.api&key=test"

It looks like the suggested endpoint is all we need to get the state codes.

endpoint <- services$List$Filters$States$Endpoint
state.codes <- perform.call(endpoint)
head(state.codes$Data)
##   code value_represented
## 1   01           Alabama
## 2   02            Alaska
## 3   04           Arizona
## 4   05          Arkansas
## 5   06        California
## 6   08          Colorado

Alaska shows up as 02, so we can now set that API variable for our call.
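Reading the code off the first few rows works for Alaska, but a state further down the list would need a lookup. The sketch below uses a stand-in data frame copied from the rows printed above (`states` and `alaska.code` are illustrative names); with a live result, `state.codes$Data` would presumably take its place.

```r
# Stand-in for state.codes$Data, copied from the rows printed above.
states <- data.frame(
  code = c("01", "02", "04", "05", "06", "08"),
  value_represented = c("Alabama", "Alaska", "Arizona",
                        "Arkansas", "California", "Colorado"),
  stringsAsFactors = FALSE
)

# Select the code whose state name matches exactly.
alaska.code <- states$code[states$value_represented == "Alaska"]
alaska.code
## [1] "02"
```

Keeping the code as a character string preserves the leading zero, which would be lost if the value were converted to a number.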

The parameter code

Now, we’re left with determining the parameter code for ozone. We check out the variables object to get more info.

variables$param
## [1] "param"                                                                                                                                                                                                                                                   
## [2] "The AQS parameter code for the data selection.  AQS uses proprietary 5 digit codes.  They may be obtained via the  list parameters service.  Up to 5 parameters may be requested, separated by commas.  Only data for these parameters will be returned."
## [3] "81101,44201"
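Going by the description just printed, a param value is one to five 5-digit codes joined by commas. A small sketch of assembling and checking such a value follows; the helper name `make.param` is hypothetical and not part of the package.

```r
# Hypothetical helper: validate AQS parameter codes and join them with commas.
make.param <- function(codes) {
  codes <- as.character(codes)
  stopifnot(
    length(codes) >= 1,
    length(codes) <= 5,                 # the API accepts up to 5 parameters
    all(grepl("^[0-9]{5}$", codes))     # each code must be exactly 5 digits
  )
  paste(codes, collapse = ",")
}

make.param(c("81101", "44201"))
## [1] "81101,44201"
```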

It looks like we will be using a 5 digit code, but we should use the List service to find out more. We can start by looking at what parameter groups exist.

Using the endpoint for this filter, we see what pollutants are available.

endpoint <- services$List$Filters$`Parameter Classes (groups of parameters, like criteria or all)`$Endpoint
result <- perform.call(endpoint)
head(result$Data)
##             code
## 1    AIRNOW MAPS
## 2            ALL
## 3 AQI POLLUTANTS
## 4      CORE_HAPS
## 5       CRITERIA
## 6       CSN DART
##                                                     value_represented
## 1 The parameters represented on AirNow maps (88101, 88502, and 44201)
## 2                                     Select all Parameters Available
## 3                                 Pollutants that have an AQI Defined
## 4                                          Urban Air Toxic Pollutants
## 5                                                 Criteria Pollutants
## 6     List of CSN speciation parameters to populate the STI DART tool

It looks like several classes of parameters exist. Since we’re looking for ozone, and ozone is a pollutant, we’ll try looking at AQI POLLUTANTS. To search the AQI POLLUTANTS class for ozone, we have to use a different filter.

services$List$Filters$`Parameters in a class (obtain the list of classes from the List - Parameter Classes service)`
## $Endpoint
## [1] "list/parametersByClass"
## 
## $RequiredVariables
## [1] "email, key, class"
## 
## $OptionalVariables
## [1] ""
## 
## $Example
## [1] "Example; returns all parameters in the CRITERIA class:https://aqs.epa.gov/data/api/list/parametersByClass?email=test@aqs.api&key=test&pc=CRITERIA"

Notice that the example passes the parameter class as pc. Now we can search for the ozone parameter code within AQI POLLUTANTS. First we confirm the endpoint:

services$List$Filters$`Parameters in a class (obtain the list of classes from the List - Parameter Classes service)`$Endpoint
## [1] "list/parametersByClass"
endpoint <- "list/parametersByClass"
parameters <- "AQI POLLUTANTS"       
result <- perform.call(endpoint, parameters, "pc")
result$Data
##    code                      value_represented
## 1 42101                        Carbon monoxide
## 2 42401                         Sulfur dioxide
## 3 42602                 Nitrogen dioxide (NO2)
## 4 44201                                  Ozone
## 5 81102                  PM10 Total 0-10um STP
## 6 88101               PM2.5 - Local Conditions
## 7 88502 Acceptable PM2.5 AQI & Speciation Mass

Bingo. Ozone has a code of 44201, so we can set the param API variable for our original query.

Putting it all together

Recall that we set out to find data for the following:

  • Start date = January 1, 2019
  • End date = January 31, 2019
  • Pollutant = ozone
  • State = Alaska

We determined the endpoint and the code to use for each value. Here we set up our API variables based on our findings.

endpoint <- "dailyData/byState"
vars <-  list(bdate = "20190101",
              edate = "20190131",
              param = "44201",    # Ozone
              state = "02")       # Alaska

With the variables list set up, we make the call. The order of the variables in the list doesn’t matter.

alaska <- perform.call(endpoint, vars)
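To see why the order is irrelevant, it may help to picture the request as a URL query string in which every value travels with its name. The sketch below is only an illustration: it is not necessarily how perform.call works internally, and it leaves out the email and key that the package supplies. The base URL is taken from the examples in the service descriptions.

```r
# Base URL as it appears in the service examples.
base <- "https://aqs.epa.gov/data/api/"
endpoint <- "dailyData/byState"

vars <- list(bdate = "20190101",
             edate = "20190131",
             param = "44201",    # Ozone
             state = "02")       # Alaska

# Join each name=value pair with "&" to form the query string.
query <- paste(names(vars), unlist(vars), sep = "=", collapse = "&")
url <- paste0(base, endpoint, "?", query)
url
## [1] "https://aqs.epa.gov/data/api/dailyData/byState?bdate=20190101&edate=20190131&param=44201&state=02"
```

Reordering the list changes only the order of the pairs in the string, not which value belongs to which variable.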

We can check that the returned data match what we asked for.

alaska <- alaska$Data
head(alaska)
##   state_code county_code site_number parameter_code poc latitude longitude
## 1         02         068        0003          44201   1  63.7232 -148.9676
## 2         02         068        0003          44201   1  63.7232 -148.9676
## 3         02         068        0003          44201   1  63.7232 -148.9676
## 4         02         068        0003          44201   1  63.7232 -148.9676
## 5         02         068        0003          44201   1  63.7232 -148.9676
## 6         02         068        0003          44201   1  63.7232 -148.9676
##   datum parameter         sample_duration pollutant_standard date_local
## 1 WGS84     Ozone                  1 HOUR  Ozone 1-hour 1979 2019-01-01
## 2 WGS84     Ozone 8-HR RUN AVG BEGIN HOUR  Ozone 8-Hour 1997 2019-01-01
## 3 WGS84     Ozone 8-HR RUN AVG BEGIN HOUR  Ozone 8-Hour 2008 2019-01-01
## 4 WGS84     Ozone 8-HR RUN AVG BEGIN HOUR  Ozone 8-hour 2015 2019-01-01
## 5 WGS84     Ozone                  1 HOUR  Ozone 1-hour 1979 2019-01-02
## 6 WGS84     Ozone 8-HR RUN AVG BEGIN HOUR  Ozone 8-Hour 1997 2019-01-02
##    units_of_measure event_type observation_count observation_percent
## 1 Parts per million       None                24                 100
## 2 Parts per million       None                24                 100
## 3 Parts per million       None                24                 100
## 4 Parts per million       None                17                 100
## 5 Parts per million       None                24                 100
## 6 Parts per million       None                24                 100
##   validity_indicator arithmetic_mean first_max_value first_max_hour aqi
## 1                  Y        0.042667           0.045              9  NA
## 2                  Y        0.041917           0.044              5  41
## 3                  Y        0.041917           0.044              5  41
## 4                  Y        0.041588           0.044              7  41
## 5                  Y        0.036708           0.040              1  NA
## 6                  Y        0.035708           0.039              0  36
##   method_code                      method                 local_site_name
## 1         047 INSTRUMENTAL - ULTRA VIOLET Denali NP & PRES - Headquarters
## 2         047 INSTRUMENTAL - ULTRA VIOLET Denali NP & PRES - Headquarters
## 3         047 INSTRUMENTAL - ULTRA VIOLET Denali NP & PRES - Headquarters
## 4         047 INSTRUMENTAL - ULTRA VIOLET Denali NP & PRES - Headquarters
## 5         047 INSTRUMENTAL - ULTRA VIOLET Denali NP & PRES - Headquarters
## 6         047 INSTRUMENTAL - ULTRA VIOLET Denali NP & PRES - Headquarters
##           site_address  state  county          city cbsa_code cbsa
## 1 DENALI NATIONAL PARK Alaska Denali  Not in a city      <NA> <NA>
## 2 DENALI NATIONAL PARK Alaska Denali  Not in a city      <NA> <NA>
## 3 DENALI NATIONAL PARK Alaska Denali  Not in a city      <NA> <NA>
## 4 DENALI NATIONAL PARK Alaska Denali  Not in a city      <NA> <NA>
## 5 DENALI NATIONAL PARK Alaska Denali  Not in a city      <NA> <NA>
## 6 DENALI NATIONAL PARK Alaska Denali  Not in a city      <NA> <NA>
##   date_of_last_change
## 1          2020-02-26
## 2          2020-02-26
## 3          2020-02-26
## 4          2020-02-26
## 5          2020-02-26
## 6          2020-02-26
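The output shows several rows per day, one for each pollutant standard, so a typical next step is to keep a single standard and confirm that the dates fall inside the requested window. The sketch below works on a stand-in data frame copied from three columns of the rows printed above (`sample.rows` and `one.standard` are illustrative names); with a live result, the `alaska` data frame would presumably take its place.

```r
# Stand-in for a few columns of the returned data, copied from the output above.
sample.rows <- data.frame(
  date_local = c("2019-01-01", "2019-01-01", "2019-01-01",
                 "2019-01-01", "2019-01-02", "2019-01-02"),
  pollutant_standard = c("Ozone 1-hour 1979", "Ozone 8-Hour 1997",
                         "Ozone 8-Hour 2008", "Ozone 8-hour 2015",
                         "Ozone 1-hour 1979", "Ozone 8-Hour 1997"),
  arithmetic_mean = c(0.042667, 0.041917, 0.041917,
                      0.041588, 0.036708, 0.035708),
  stringsAsFactors = FALSE
)

# Keep one row per day by selecting a single standard (the match is case-sensitive).
one.standard <- sample.rows[sample.rows$pollutant_standard == "Ozone 8-Hour 1997", ]

# Confirm that every date lies within January 2019.
dates <- as.Date(sample.rows$date_local)
all(dates >= as.Date("2019-01-01") & dates <= as.Date("2019-01-31"))
## [1] TRUE
```

Note that the standards are not capitalized consistently ("8-Hour" versus "8-hour"), so an exact comparison has to use the spelling as returned.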