Lab #1 – Statistics – Air Quality Data
Purpose: To learn about the air quality data available online from TCEQ and to practice
basic statistical methods for analyzing environmental data using MATLAB and Excel.
Instructions:
The data set for this exercise will be obtained from the TCEQ website:
http://_______________________________[insert link here]
Follow the above link and under ‘Select a Monitoring Site’, select “CAMS [num] [city]”. It is
possible to access data for a specific day or the entire month. Click the ‘Data Reports’ tab to
retrieve hourly data sets of parameters.
For this exercise, data from two days of [year] will be analyzed. Select the month and day and
the 24-hour time format. Use the dates [month, day, year] and [month, day, year].
The first day represents an ozone-season day with higher-than-average ozone values; the second
day is typical of summer ozone values in [city]. PM2.5 data for each day are also needed for
this assignment.
A couple of exercises require an entire month of ozone and particulate matter (PM2.5) data
for [month, year]. For the PM2.5 data, select the ‘last’ option on the list before generating a
comma-delimited report.
For the report, follow the posted laboratory report guidelines. Label all charts and graphs,
and use appropriate combinations of them to address the exercises and questions.
Exercises:
1. Complete a linear regression analysis for each pair of variables listed in the table
below, for [month, day] and [month, day]. Present a scatter plot with a trend line for
each pair in your report.

Dependent Variable    Independent Variable    Coefficient of Determination
Ozone                 Temperature             R²
Ozone                 Solar Radiation         R²
Ozone                 Particulate Matter      R²
Temperature           Solar Radiation         R²

2. Next, using the same data for those days, perform a multiple regression for [month, day]
and [month, day] for the following models. Note the R² value for each combination.
Ozone = Temperature + Solar Radiation
Ozone = Temperature + Solar Radiation + Particulate Matter 2.5
Particulate Matter 2.5 = Ozone + Temperature + Solar Radiation
3. Create box plots of the 24 hourly ozone values for [month, day] and [month, day] with
MATLAB.
4. Plot histograms of the 24 hourly ozone values for [month, day] and [month, day] with
MATLAB.
5. Use Excel to create a table of summary statistics of ozone for [month, day] and
[month, day].
6. Plot a histogram in MATLAB of the ozone data for the month of [month, year].
7. Plot a histogram in MATLAB of the particulate matter (PM2.5) data for the entire
month of [month, year].
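The coefficient of determination requested in exercise 1 is the same quantity that a trend line reports in Excel. As an illustration only (the lab itself calls for MATLAB and Excel), the following Python sketch shows how the least-squares line and R² can be computed for one variable pair. The function name linear_fit and the sample values are hypothetical placeholders and are not TCEQ data.

```python
def linear_fit(x, y):
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, r2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y values are all identical; R^2 is undefined")
    r2 = 1.0 - ss_res / ss_tot

    return slope, intercept, r2


if __name__ == "__main__":
    # Made-up placeholder values, NOT measured data
    temperature = [70.0, 75.0, 80.0, 85.0, 90.0]
    ozone = [20.0, 28.0, 31.0, 42.0, 45.0]
    slope, intercept, r2 = linear_fit(temperature, ozone)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

Swapping in the hourly values for each pair in the exercise 1 table gives a number that can be compared against the R² shown on the corresponding trend line.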
Questions:
1. For the linear regression plots in exercise 1, with which parameter is ozone best correlated?
2. Is temperature correlated with solar radiation (sunlight)?
3. How do the box plots for ozone in exercise 3 compare?
4. For the multiple regressions in exercise 2, which combination yielded the best coefficient
of determination? Report only the R² value for each combination analyzed.
5. What is the distribution of the monthly ozone data in exercise 6? Is it normal, positively
skewed, or negatively skewed?
6. Does the particulate matter data in exercise 7 look like a normal distribution?
7. How do the histograms of ozone for [month, day] and [month, day] differ? Also, do the
numeric values generated by the summary statistics for skew and kurtosis appear
consistent with the histograms?
8. Is the combination of independent variables tested for particulate matter a reasonable
choice?
Note: For the purposes of the lab report, it is okay to answer these questions in
paragraph format. Also, feel free to add references to strengthen an explanation. The
references can be online sources or published works.
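
For question 7, it can help to see how skewness and kurtosis are computed before comparing them with the histograms. The Python sketch below is an illustration only. It assumes the bias-adjusted sample formulas that Excel documents for its SKEW and KURT functions; the function names and the sample values are hypothetical and are not TCEQ data. With these formulas, kurtosis is reported as excess kurtosis, so a normal distribution gives a value near 0.

```python
import math


def _mean_and_sample_std(values):
    """Return the mean and the sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed to match Excel's SKEW)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least 3 values are required")
    mean, s = _mean_and_sample_std(values)
    if s == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in values)


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed to match Excel's KURT)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 values are required")
    mean, s = _mean_and_sample_std(values)
    if s == 0:
        raise ValueError("all values are identical; kurtosis is undefined")
    fourth = sum(((v - mean) / s) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


if __name__ == "__main__":
    # Made-up placeholder values, NOT measured data
    ozone = [12.0, 15.0, 18.0, 22.0, 30.0, 41.0, 55.0, 38.0, 25.0, 16.0]
    print("skewness:", sample_skewness(ozone))
    print("excess kurtosis:", sample_excess_kurtosis(ozone))
```

A positive skewness should correspond to a histogram with a longer right tail and a negative value to a longer left tail. Other tools may use different conventions (for example, kurtosis that equals 3 rather than 0 for a normal distribution), so check which definition a tool uses before comparing numbers.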