Seasonality analysis

Searching for periodic patterns in the data

The term seasonality means a periodically repeating fluctuation of pollutant concentration, such as a decrease in one part of the year and a peak in another. It is clearly observable in PAHs, which are bound to combustion processes and therefore exhibit a nearly annual fluctuation. The task is to find the time period of these changes by comparing individual values using one of the available methods. The autoregression method uses the lag approach: the time series is shifted repeatedly by one sampling interval and correlated with itself, and the lag giving the highest correlation is sought.

Air concentrations of many POPs are sensitive to weather conditions such as temperature and sunshine, which change continuously in the course of the year. The ratio between the amount of pollutant in the gas phase and the amount bound to particles can depend strongly on temperature, which influences the sampling result. Primary and secondary sources of POPs also play an important role in the seasonal fluctuations of their concentrations.

POPs may be adsorbed on many surfaces, such as leaves, needles and soil, and released into the environment to different extents over the year(s). Moreover, there are also season-dependent primary sources: both industrial and residential combustion devices (this is particularly important for PAHs, whose concentration is higher in winter; see example 1).

There are several methods for detecting seasonality in data, of which the best known are probably the autocorrelation function (ACF) and the discrete Fourier transform (DFT). We used the autoregression method in our example, which provides a relatively smooth curve if a suitable order of the regression is chosen. It is immediately apparent from the curves whether the compound has high or low annual periodicity in concentration, or whether it possibly exhibits some non-annual fluctuations.

Autocorrelation

A basic concept in the search for repeating patterns is autocorrelation, which is defined (for an equidistant time series) as the correlation of the time series with itself shifted in time by a period called the time lag. It is treated as a function of the time lag, the autocorrelation function (ACF). The lag starts at 0 and continues in integer multiples of the interval between two consecutive measurements. For a series x_1, ..., x_n with mean x̄, the ACF at lag k can be estimated as:

$$ r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2} $$

The higher the value of the ACF for the chosen time lag, the stronger the tendency of the series to repeat similar values periodically after that lag. The ACF is also used for the estimation of autoregression coefficients.
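The lag approach can be illustrated with a minimal Python sketch. It assumes the common sample estimator (lagged sum of products of deviations from the mean, divided by the total sum of squared deviations) and uses a synthetic monthly series with a pure annual cycle; the function name `sample_acf` and the data are hypothetical and are not taken from the original R example.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag of an equally spaced series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # total sum of squared deviations
    acf = []
    for k in range(max_lag + 1):
        # products of deviations that are k sampling intervals apart
        num = sum(dev[t] * dev[t + k] for t in range(n - k))
        acf.append(num / denom)
    return acf

# Hypothetical data: 10 years of monthly values with a pure 12-month cycle.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
acf = sample_acf(series, 24)

# Lag (in months) with the highest autocorrelation among lags 1..24.
best_lag = max(range(1, 25), key=lambda k: acf[k])
print(best_lag, round(acf[best_lag], 3))
```

For real measurements the ACF is noisier, and short lags often show high values simply because neighbouring samples are similar, so the curve is usually inspected as a whole rather than reduced to a single maximum.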

Autoregression

Autoregression of order p assumes a time series of the form:

$$ x_t = \sum_{i=1}^{p} \varphi_i x_{t-i} + \varepsilon_t $$

where x_t is the (mean-centered) value at time t and ε_t denotes the influence of random processes in the form of white noise. The constants φ_i thus characterize the shape of the time series and can serve as input for computing a frequency decomposition of the time series. There are several ways to compute the φ_i constants; we used the commonly used Yule-Walker equations in our R example.
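As an illustration of how the coefficients can be obtained, the following Python sketch solves the Yule-Walker equations with the Levinson-Durbin recursion. It is a sketch under stated assumptions, not the code of the original R example: the helper names `autocov` and `yule_walker`, the AR(2) test coefficients and the random seed are all hypothetical.

```python
import random

def autocov(x, max_lag):
    """Biased sample autocovariances c_0..c_max_lag (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def yule_walker(x, order):
    """Return (phi, sigma2): AR coefficients phi_1..phi_p and noise variance.

    Solves the Yule-Walker equations by the Levinson-Durbin recursion.
    """
    c = autocov(x, order)
    phi = []
    sigma2 = c[0]
    for k in range(1, order + 1):
        # reflection coefficient (partial autocorrelation at lag k)
        kappa = (c[k] - sum(phi[j] * c[k - 1 - j] for j in range(k - 1))) / sigma2
        # update the lower-order coefficients and append the new one
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2 *= 1 - kappa * kappa
    return phi, sigma2

# Hypothetical check: simulate an AR(2) series with known coefficients.
rng = random.Random(42)
true_phi = [1.2, -0.6]
simulated = [0.0, 0.0]
for _ in range(2000):
    simulated.append(true_phi[0] * simulated[-1]
                     + true_phi[1] * simulated[-2]
                     + rng.gauss(0, 1))

phi_hat, sigma2_hat = yule_walker(simulated, 2)
print(phi_hat, sigma2_hat)
```

The estimated coefficients should lie close to the values used in the simulation. The order of the regression is a choice of the analyst: too low an order can hide a seasonal peak, while too high an order gives a less smooth curve.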

Spectrum

Assuming that every time series can be decomposed into a set of individual frequencies, we can estimate spectral characteristics that represent the spectrum of the time series. The spectrum can be computed from the estimated autoregression coefficients or by using the Fourier transform.
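The Fourier-transform route can be sketched in Python as a raw periodogram, that is, the squared magnitude of the discrete Fourier transform of the mean-centered series. The function name `periodogram`, the scaling by the series length and the synthetic series (an annual cycle plus a weaker half-year cycle) are assumptions made for illustration only.

```python
import cmath
import math

def periodogram(x):
    """Raw periodogram: list of (frequency, power) for the Fourier frequencies.

    Frequencies are in cycles per sampling interval, from 1/n up to 0.5.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    result = []
    for m in range(1, n // 2 + 1):
        # DFT coefficient at the Fourier frequency m/n
        coef = sum(d * cmath.exp(-2j * math.pi * m * t / n)
                   for t, d in enumerate(dev))
        result.append((m / n, abs(coef) ** 2 / n))
    return result

# Hypothetical monthly series: annual cycle plus a weaker 6-month cycle.
series = [math.sin(2 * math.pi * t / 12) + 0.5 * math.sin(2 * math.pi * t / 6)
          for t in range(120)]

pg = periodogram(series)
peak_freq, peak_power = max(pg, key=lambda fp: fp[1])
peak_period = 1 / peak_freq  # in sampling intervals (months here)
print(peak_period)
```

This direct summation costs on the order of n squared operations, which is acceptable for short monitoring series; a fast Fourier transform routine would normally be used for long ones.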

The higher the spectral density at a given frequency, the stronger the tendency of the time series to repeat with the corresponding period. Obvious peaks in the periodogram therefore indicate that the time series probably repeats with these periods (note that most of the pollutants have a significant peak at a period of about 1 year).
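The spectrum implied by a fitted autoregression can be written, for frequency f in cycles per sampling interval, as:

$$ S(f) = \frac{\sigma^2}{\left| 1 - \sum_{k=1}^{p} \varphi_k e^{-2\pi i f k} \right|^2} $$

A minimal Python sketch of locating the peak follows. The AR(2) coefficients below are hypothetical values chosen so that a monthly series oscillates with a period of roughly one year; they are not estimates from any real data set, and `ar_spectrum` is an illustrative name.

```python
import cmath
import math

def ar_spectrum(phi, sigma2, freqs):
    """Spectral density of an AR(p) process at the given frequencies.

    Frequencies are in cycles per sampling interval (0 < f <= 0.5).
    """
    dens = []
    for f in freqs:
        transfer = 1 - sum(p * cmath.exp(-2j * math.pi * f * (k + 1))
                           for k, p in enumerate(phi))
        dens.append(sigma2 / abs(transfer) ** 2)
    return dens

# Hypothetical AR(2) coefficients for monthly data (assumed, not fitted).
phi = [1.645, -0.9025]
sigma2 = 1.0

freqs = [i / 1000 for i in range(1, 501)]  # 0.001 .. 0.5 cycles per month
dens = ar_spectrum(phi, sigma2, freqs)

# Frequency with the highest spectral density and the matching period.
peak_index = max(range(len(dens)), key=dens.__getitem__)
peak_period = 1 / freqs[peak_index]  # in months
print(round(peak_period, 2))
```

With these assumed coefficients the peak falls at a period close to 12 months, which is the kind of single dominant peak expected for a compound with strong annual periodicity.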


References

Cryer, J. D.; Chan, K.-S., Time Series Analysis With Applications in R. Springer: New York, 2009; Vol. 3., p 491.
