Visual inspection of the data

Most data on POPs concentrations are roughly log-normally distributed, which corresponds to processes described by first-order chemical kinetics. Usually, if no new source of the pollutant is present, the histogram has a log-normal shape. In contrast, if there is a source of the pollutant in the environment, the concentrations do not decline as quickly and instead settle near a specific value. In such cases, a normal or more complicated distribution tends to occur.
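The two situations can be illustrated with a simple first-order model. This is only a sketch: the symbols y₀ (initial concentration), k (rate constant) and s (a constant source term) are introduced here for illustration and are not taken from the original text. Without a source, the concentration decays exponentially, so its logarithm is linear in time; with a constant source, it settles at the level s/k.

```latex
\frac{dy}{dt} = -k\,y
\quad\Longrightarrow\quad
y(t) = y_0\,e^{-kt},
\qquad
\ln y(t) = \ln y_0 - k\,t

\frac{dy}{dt} = s - k\,y
\quad\Longrightarrow\quad
y(t) = \frac{s}{k} + \left(y_0 - \frac{s}{k}\right)e^{-kt}
\;\longrightarrow\; \frac{s}{k}
\quad (t \to \infty)
```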

Time series plot

This is a simple xy plot with time on the x-axis and concentration on the y-axis. Each POPs measurement is depicted as a point whose x-coordinate is the midpoint of the measurement period and whose y-coordinate is the resulting concentration. It can be useful to connect the points by a polyline, i.e. a set of consecutive line segments of the form

$$y = y_1 + \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1), \qquad x_1 \le x \le x_2,$$

for consecutive points [x₁;y₁], [x₂;y₂].
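A single segment of such a polyline can be evaluated directly. The following is a minimal sketch; the helper segment_value and the variables dat4 and flu4 are names chosen for this example, and the values are the first four dates and fluorene concentrations from the listing in this section.

```r
# Value of the straight segment through (x1, y1) and (x2, y2) at position x.
segment_value <- function(x, x1, y1, x2, y2) {
  y1 + (y2 - y1) / (x2 - x1) * (x - x1)
}

# First four sampling dates and fluorene concentrations from this section.
dat4 <- as.Date(c("2004-01-21", "2004-02-18", "2004-03-17", "2004-04-14"))
flu4 <- c(630.8, 745.0, 246.8, 126.8)

# Midpoint in time between the first two measurements (dates as day numbers).
x_mid <- mean(as.numeric(dat4[1:2]))
y_mid <- segment_value(x_mid,
                       as.numeric(dat4[1]), flu4[1],
                       as.numeric(dat4[2]), flu4[2])

# approx() performs the same piecewise-linear interpolation.
y_ref <- approx(as.numeric(dat4), flu4, xout = x_mid)$y
```

Halfway between the first two measurements the segment gives the average of the two concentrations, and approx() returns the same value.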

Trend plot

The trend plot adds a trend line to the time series plot. If one of the 30 example compounds measured in Kosetice is chosen, an exponential trend line fitted by the least squares method is drawn; otherwise, for user-defined input, a linear trend is depicted.
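Both kinds of trend can be sketched in base R. The example below makes two assumptions that the text does not confirm: the exponential trend is written as y = a * exp(b * x), and it is fitted by ordinary least squares on the logarithm of the concentration. The data are synthetic and noise-free, and all variable names are illustrative.

```r
# Synthetic, noise-free example data (not taken from the data set).
day <- 0:9
y   <- 10 * exp(-0.5 * day)

# Exponential trend y = a * exp(b * day): least squares on log(y).
fit_exp <- lm(log(y) ~ day)
a <- unname(exp(coef(fit_exp)[1]))
b <- unname(coef(fit_exp)[2])

# Linear trend y = c0 + c1 * day: least squares on y itself.
fit_lin <- lm(y ~ day)
c0 <- unname(coef(fit_lin)[1])
c1 <- unname(coef(fit_lin)[2])
```

After plot(day, y), the exponential trend can be overlaid with lines(day, a * exp(b * day)) and the linear one with abline(fit_lin).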

Histogram

Although the histogram is basically an xy plot as well, the axes have a different meaning here. The concentration is now on the x-axis, and the y-axis shows the frequency of concentration values falling inside the ranges defined by the width of the columns. For example, if a column starts at 1 ng/filter and ends at 2 ng/filter on the x-axis and its height is 21, then 21 values are higher than 1 ng/filter and lower than or equal to 2 ng/filter.
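The counting rule for the columns can be checked with the base R function hist. This is a small sketch with made-up concentrations; right = TRUE closes each interval on the right, which matches the rule described above.

```r
# Illustrative concentrations in ng/filter (made-up values).
conc <- c(1.0, 1.2, 1.5, 2.0, 2.1)

# Columns (0,1], (1,2], (2,3]; the first column also includes its left edge.
h <- hist(conc, breaks = 0:3, right = TRUE, plot = FALSE)
counts <- h$counts
```

The value 2.0 is counted in the column from 1 to 2, not in the column from 2 to 3.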

In the case of the example compounds, a log-normal curve is added to the histogram, showing the ideal log-normal distribution that is expected. The curve has the shape of the log-normal density:

$$f(y) = \frac{1}{y\,\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(\ln y - \mu)^2}{2\sigma^2}\right),$$

where the notation is the same as in the previous equations, i.e. the concentration is denoted by y and the frequency by f(y), and μ and σ denote the mean and standard deviation of ln y. Note that there is no variable x in the equation: the distribution of the values is considered time-independent.
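The log-normal density can be written out and compared with the built-in dlnorm. The sketch below assumes that the two parameters mu and sigma are estimated as the mean and standard deviation of the log-concentrations; how genhistogram estimates them is not stated here. The function lnorm_density and the variable flu8 are names chosen for this example, and the values are the first eight fluorene concentrations from the listing in this section.

```r
# First eight fluorene values from the listing in this section.
flu8 <- c(630.8, 745.0, 246.8, 126.8, 89.6, 55.85714, 55.85714, 5.6)

# Assumed parameter estimates: mean and sd of the log-concentrations.
mu    <- mean(log(flu8))
sigma <- sd(log(flu8))

# Log-normal density written out explicitly.
lnorm_density <- function(y, mu, sigma) {
  1 / (y * sigma * sqrt(2 * pi)) * exp(-(log(y) - mu)^2 / (2 * sigma^2))
}

# The explicit formula agrees with the built-in dlnorm().
y_grid    <- seq(10, 800, by = 10)
f_manual  <- lnorm_density(y_grid, mu, sigma)
f_builtin <- dlnorm(y_grid, meanlog = mu, sdlog = sigma)
```

To overlay the curve, the histogram can be drawn on the density scale with hist(flu8, freq = FALSE), followed by lines(y_grid, f_manual).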

Box & whisker plot

In the box & whisker plot, five lines mark the values of interest. The concentration is back on the y-axis. The dark violet line in the middle of each column depicts the median (2nd quartile) of the concentration, the box edges denote the 1st and 3rd quartiles, and the whiskers denote the minimum and maximum measured values (0th and 4th quartiles). As in the previous case, these statistics are time-independent:

$$P\bigl(y < Q(p)\bigr) = p, \quad \text{i.e.} \quad Q(p) = F^{-1}(p),$$

where Q(p) denotes the p-th quantile, i.e. the (4p)-th quartile, P denotes probability and F denotes the distribution function, obtained as the integral of the probability density function.
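The five values and the relation between quantiles and the distribution function can be reproduced in base R. This is a sketch on illustrative numbers; quantile uses its default sample-quantile definition, which may differ slightly from the one used by genwhisker, and the log-normal parameters 1 and 0.5 are arbitrary.

```r
x <- 1:9  # illustrative values, not from the data set

# Quartiles 0 to 4, i.e. the quantiles Q(0), Q(0.25), Q(0.5), Q(0.75), Q(1).
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

# Q(p) = F^-1(p) for a theoretical distribution: qlnorm() inverts plnorm().
p    <- c(0.25, 0.5, 0.75)
back <- plnorm(qlnorm(p, meanlog = 1, sdlog = 0.5), meanlog = 1, sdlog = 0.5)
```

In base R, boxplot(x, range = 0) draws a box plot whose whiskers extend to the minimum and maximum, as described above.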

Base R includes the built-in function plot, which can be used to draw a simple plot with points corresponding to the individual measurements; the function lines then connects them by a polyline:

# Fluorene: points, then the connecting polyline
plot(dat,flu)
lines(dat,flu)

# p,p'-DDE: points, then the connecting polyline
plot(dat,dde)
lines(dat,dde)

The genasis package provides three useful tools for data visualisation. The function genplot draws plain time series and offers several options for adding trends. The function genwhisker helps to assess the variance of concentration values by drawing a box & whisker plot for each compound. The function genhistogram provides several approaches to drawing histograms of concentration values, which can, among other things, help to decide whether to transform the data (usually the logarithmic transformation is used).

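# Time series plot of p,p'-DDE
genplot(dde,dat,il="ls",ci="")

# Combine the dates and both compounds into one data frame
frame<-data.frame(dat,flu,dde)
colnames(frame)<-c("date","fluorene","p,p'-dde")

# Box & whisker plot for each compound, then a histogram of fluorene
genwhisker(frame,distr="lnorm")
genhistogram(flu,distr="lnorm",col="gold")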

This is the content of all three variables:

flu
[1] 630.80000 745.00000 246.80000 126.80000   89.60000   55.85714 55.85714
[8]    5.60000   37.40000 143.44640 442.76790 1853.50000 1853.50000 898.20000
[15] 688.00000 569.80000 216.60000 64.70000 140.00000 138.20000 94.40000
[22] 97.40000 58.00000 417.60000 1354.44000 1261.38000 1003.46000 1392.10000
[29] 612.78000 387.66000 125.82000   99.02000   82.24000   84.34000 53.80000
[36] 75.02000 210.02000 390.46000 663.28000 218.50000 390.56000 256.72000
[43] 176.70000 102.68000 51.40000   33.70000   37.37143   62.52071 122.84000
[50] 342.74000 441.00000 849.75000

dde
[1] 6.904000 4.884000 5.926000 7.446000 3.153000 4.055714 4.055714 4.223000
[9] 6.220000 5.762714 16.882140 15.716250 15.716250 4.049000 5.496000 3.293000
[17] 8.338000 7.298000 5.098000 5.991000 5.902000 6.660000 6.319000 7.097000
[25] 9.851000 5.295000 3.220000 1.640000 2.440000 4.980000 8.000000 10.860000
[33] 11.120000 8.280000 5.500000 8.040000 8.560000 9.560000 3.960000 5.740000
[41] 3.060000 5.040000 2.620000 4.160000 5.260000 2.440000 3.171429 8.589286
[49] 8.900000 3.600000 3.060000 2.720000

dat
[1] "2004-01-21" "2004-02-18" "2004-03-17" "2004-04-14" "2004-05-12" "2004-06-09"
[7] "2004-07-07" "2004-08-04" "2004-09-01" "2004-09-29" "2004-10-27" "2004-11-24"
[13] "2004-12-22" "2005-01-19" "2005-02-16" "2005-03-16" "2005-04-13" "2005-05-11"
[19] "2005-06-08" "2005-07-06" "2005-08-03" "2005-08-31" "2005-09-28" "2005-10-26"
[25] "2005-11-23" "2005-12-21" "2006-01-18" "2006-02-15" "2006-03-15" "2006-04-12"
[31] "2006-05-10" "2006-06-07" "2006-07-05" "2006-08-02" "2006-08-30" "2006-09-27"
[37] "2006-10-25" "2006-11-22" "2006-12-20" "2007-01-17" "2007-02-14" "2007-03-14"
[43] "2007-04-11" "2007-05-09" "2007-06-06" "2007-07-04" "2007-08-01" "2007-08-29"
[49] "2007-09-26" "2007-10-24" "2007-11-21" "2007-12-19"
