Annual aggregation

Finding the central tendency of individual years to reveal long-term trends

If a monitoring programme yields multiple results within one year, the GMP methodology specifies that the arithmetic mean should be used as the measure of central tendency. Because the concentrations of most POPs in air tend to follow a lognormal distribution, the geometric mean (or the median, as a non-parametric alternative) appears to be an eligible option, but it carries a risk of inappropriate data treatment. Try comparing all of these central tendencies in our example.

There are many measurement programmes worldwide, and they employ slightly different methodologies and approaches. Different methods can be used depending on the purpose of the POPs monitoring, the financial and climatic conditions and the technological equipment available. Nevertheless, once a global initiative comes into play, at least several key points of the methodology must be unified to enable mutual comparison and merging of the data.

Within the Global Monitoring Programme (GMP), a one-year period was chosen as the most appropriate time period for reporting data on environmental pollution by POPs. This choice takes into account the different lengths of sampling periods (usually weeks to months for passive samplers, hours to days for active ones) and the typical annual seasonal fluctuation of many compounds.

Thus, there is a need for a simple tool that computes annual aggregations from arbitrarily long measurement periods and arbitrary numbers of samples. Such a tool should offer a wide set of functions for finding the central tendency within each year; the resulting annual values are then used for computing baseline levels, trends and other statistics.
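The idea of such a tool can be sketched in a few lines. The following is a minimal Python illustration, not the R example referred to in this tutorial; the function name `annual_aggregate`, the sample data and the rule of assigning each sample to the calendar year of its sampling date are all assumptions made for the sketch.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def annual_aggregate(samples, func=mean):
    """Group (sampling_date, concentration) pairs by calendar year and
    reduce each year's values to a single number with `func`.

    Assumption: a sample belongs to the year of its sampling date.
    """
    by_year = defaultdict(list)
    for sampling_date, value in samples:
        by_year[sampling_date.year].append(value)
    return {year: func(values) for year, values in sorted(by_year.items())}


# Hypothetical concentrations (arbitrary units), unevenly spread over two years.
samples = [
    (date(2010, 1, 15), 2.0),
    (date(2010, 6, 15), 4.0),
    (date(2010, 11, 15), 6.0),
    (date(2011, 3, 1), 3.0),
    (date(2011, 9, 1), 7.0),
]

annual_means = annual_aggregate(samples)  # arithmetic mean by default
print(annual_means)
```

Any function that reduces a list of numbers to one value can be passed as `func`, so the same grouping step serves for the mean, the median or any other summary.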

Annual aggregation

Annual aggregation is nothing more than finding one representative value to substitute for a one-year segment of the time series. On the one hand, this process omits and hides certain characteristics of the time series (especially the within-year fluctuations); on the other hand, it makes it possible to compute broader statistics that are not affected by such details, especially long-term time trends.

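As in the previous examples, the best function for aggregating several measurements into one number depends on the distribution of the values. If we are looking for the central tendency, the most commonly used measure, the arithmetic mean (also the one intended by the GMP methodology):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

is suitable for normal (or, more generally, symmetric) distributions, while the geometric mean:

$$\bar{x}_G = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

is better in the case of a lognormal distribution. The non-parametric central tendency is represented by the median:

$$\tilde{x} = \begin{cases} x_{((n+1)/2)} & \text{for odd } n \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{for even } n \end{cases}$$

where $x_1, \dots, x_n$ are the values measured within one year and $x_{(k)}$ denotes the $k$-th smallest of them.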
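To see how the three measures differ on skewed data, the short Python sketch below applies them to a small hypothetical sample in which each value is double the previous one. The data are invented for illustration, and the functions come from Python's standard `statistics` module rather than from the R example of this tutorial.

```python
from statistics import geometric_mean, mean, median

# Hypothetical right-skewed sample: each value is twice the previous one,
# so the logarithms of the values are evenly spaced.
values = [1.0, 2.0, 4.0, 8.0, 16.0]

central = {
    "arithmetic mean": mean(values),
    "geometric mean": geometric_mean(values),
    "median": median(values),
}

for name, result in central.items():
    print(f"{name}: {result:.2f}")
```

For this sample the geometric mean and the median both come out at about 4, while the arithmetic mean (6.2) is pulled upward by the largest value. This is the typical pattern for right-skewed, lognormal-like data.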

Nevertheless, the central tendency is not the only thing that matters when characterizing a one-year period of measurements. Other functions can provide information about the spread and distribution of the values. For example, the maximum tells us the highest concentration during the year, which can be of high importance, and the number of individual measurements describes the design of the pollutant monitoring. Therefore, the R example offers an option to supply your own function.
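Applying several summary functions per year, including a user-defined one, might look like the following Python sketch. It is only an illustration under stated assumptions: the names `summarise_years`, `SUMMARIES` and `value_range` and the data are hypothetical, and the code does not reproduce the R example mentioned above.

```python
from collections import defaultdict


def value_range(values):
    """User-defined summary: spread between the highest and lowest value."""
    return max(values) - min(values)


# Summary functions applied to every year; any callable that takes a list
# of numbers and returns one value can be added here.
SUMMARIES = {"max": max, "n": len, "range": value_range}


def summarise_years(samples, summaries=SUMMARIES):
    """Return {year: {summary name: value}} for (year, concentration) pairs."""
    by_year = defaultdict(list)
    for year, value in samples:
        by_year[year].append(value)
    return {
        year: {name: func(values) for name, func in summaries.items()}
        for year, values in sorted(by_year.items())
    }


# Hypothetical concentrations (arbitrary units).
samples = [(2010, 2.0), (2010, 9.0), (2010, 4.0), (2011, 3.0), (2011, 5.0)]

summary = summarise_years(samples)
for year, stats in summary.items():
    print(year, stats)
```

Here `n` shows that 2010 was sampled three times and 2011 only twice, which is the kind of design information that a central tendency alone does not convey.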
