Treatment of values under LoQ

Identification and substitution of the values below the limit of quantification

Of course, right-censored data (or simply censored data, if neither the lowest nor the highest values can be quantified) can also occur in statistical analyses, but given the design of measuring devices and the processes of release and dilution of pollutants in the environment, they are very rare in the field of POPs monitoring.
Right-censored data are quite common in survival analysis.

In principle, every method of determining the concentration of a POP has limitations. Very low concentrations can be detected but not reliably quantified, and at even lower concentrations the measuring device may fail to detect the pollutant at all, although a very small amount is present in the surrounding environment.

The two thresholds below which the concentration cannot be quantified, or even detected, by the device are usually called the “Limit of Quantification (LoQ)” and the “Limit of Detection (LoD)”. Because the inequality LoD ≤ LoQ holds in all cases, we simplify the notation and use the term LoQ only. Data in which at least one value is lower than LoQ are called left-censored data, and before further analysis these values must be treated by one of the substitution or exclusion methods.

The simplest, but not recommended, method is to exclude the values under LoQ. The resulting datasets then have fewer (and non-equidistant) records, and the overall statistics are highly biased (try this method in our examples).

Substituting the censored values by a constant seems to be a better approach and is frequently used, typically with one half of the LoQ value. Nevertheless, if the number of censored values is high, this approach can bias aggregated statistics, too.

It is also possible to reconstruct the values under LoQ by a maximum likelihood method, but this requires some assumptions about the statistical distribution of the concentration values (see previous page and the histogram). If we know, for example, that the pollutant is no longer emitted and the ambient concentration decreases by a first-order chemical process, we can assume that the distribution is lognormal and compute the missing values under LoQ using this distribution.

Exclusion method

The worst way to treat values under LoQ is to exclude them. Although we do not know the real concentration value, it is not true that we have no information about it. At least we know that the concentration is very low, and this should be reflected in the analysis. If the lowest values are excluded, the whole concentration level shifts up and most of the statistics are biased.

Nevertheless, there are several specific methods in which very low values are not of interest. The exclusion method is included in the genloq R function and also in our example (to demonstrate its disadvantages). The values under LoQ disappear from the plot if this method is selected.
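The upward shift caused by exclusion can be illustrated with a minimal sketch. The data, the LoQ value and the function name `mean_after_exclusion` are hypothetical and are not taken from the genloq function; the "true" low values are known here only because the censoring is simulated.

```python
from statistics import mean

def mean_after_exclusion(values, loq):
    """Mean of the records that remain after dropping everything under LoQ."""
    kept = [v for v in values if v >= loq]
    return mean(kept)

# Hypothetical "true" concentrations; in real data the low ones are unknown.
true_values = [0.2, 0.4, 0.7, 1.5, 2.0, 3.0]
loq = 1.0  # hypothetical limit of quantification

true_mean = mean(true_values)
excluded_mean = mean_after_exclusion(true_values, loq)

# Excluding the three lowest records raises the mean of the series.
print(f"true mean: {true_mean:.2f}, mean after exclusion: {excluded_mean:.2f}")
```

Half of the records are lost in this example, and the remaining ones are no longer equally spaced in time.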

Substitution methods

The most frequently used treatment of values under LoQ is substitution by a constant. Usually this constant is derived from the limit value of the device (method), but sometimes zero is used instead. This approach has two disadvantages. First, if we do not know the distribution of the concentration values, we cannot decide which constant is appropriate (the most often used 1/2 of LoQ has no theoretical basis; in fact, the divisor also depends on the magnitude of the limit itself). Second, even if we know the best average value of the substitution constant, a large number of identical values distorts the variance of the data and all derived parameters.
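A minimal sketch of constant substitution follows, again on simulated censoring with hypothetical data. The function name `substitute_constant` and its `fraction` parameter are assumptions made for illustration; `fraction=0.5` gives the common LoQ/2 rule and `fraction=0.0` gives substitution by zero.

```python
from statistics import mean, pvariance

def substitute_constant(values, loq, fraction=0.5):
    """Replace every value under LoQ by fraction * LoQ."""
    return [v if v >= loq else fraction * loq for v in values]

# Hypothetical "true" concentrations; in real data the low ones are unknown.
true_values = [0.2, 0.4, 0.7, 1.5, 2.0, 3.0]
loq = 1.0  # hypothetical limit of quantification

substituted = substitute_constant(true_values, loq)  # LoQ/2 rule

# The three substituted records are identical, which changes the spread.
print("mean:    ", mean(true_values), "->", mean(substituted))
print("variance:", pvariance(true_values), "->", pvariance(substituted))
```

In this example the mean stays close to the true one, but the variance is lower because three different values have been replaced by the same number.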

Maximum likelihood method

The most trustworthy method is to reconstruct the values according to the expected distribution of the concentration values. Maximum likelihood estimation (MLE) is usually used to estimate the parameters of the distribution. Once the distribution is known, it is possible to compute all the descriptive statistics, i.e., the aggregated numbers, even without knowing the individual values.

Since we expect lognormal distribution, the likelihood function to maximize looks like this:
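Assuming the standard form for left-censored lognormal data (the authors' exact expression may differ), with x_1, …, x_n the values measured at or above LoQ, m the number of values under LoQ and Φ the standard normal cumulative distribution function (these symbols are introduced here for illustration only), the likelihood can be written as:

```latex
L(\mu, \sigma) =
  \prod_{i=1}^{n} \frac{1}{x_i \, \sigma \sqrt{2\pi}}
    \exp\!\left( -\frac{(\ln x_i - \mu)^2}{2\sigma^2} \right)
  \cdot
  \left[ \Phi\!\left( \frac{\ln \mathrm{LoQ} - \mu}{\sigma} \right) \right]^{m}
```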

where L denotes the likelihood to be maximized and μ and σ denote the parameters of the distribution to be found. The final values are then obtained by dividing the interval (0, f(LoQ)), where f denotes the probability density function, according to the number of searched values y_k and computing f⁻¹(y_k) for each of them.
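The estimation and reconstruction steps might be sketched as follows. This is not the genloq implementation: the function names, the data and the simple compass search used as optimizer are all assumptions. The sketch also takes the interval on the cumulative probability scale, i.e., it splits (0, P(X < LoQ)) into equal parts and maps the midpoints back through the lognormal quantile function, which is one possible reading of the construction described above.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf() and inv_cdf()

def neg_log_likelihood(mu, sigma, observed, n_censored, loq):
    """Negative log-likelihood of a left-censored lognormal sample."""
    if sigma <= 0:
        return math.inf
    p_below = STD_NORMAL.cdf((math.log(loq) - mu) / sigma)
    if p_below <= 0.0:
        return math.inf
    nll = -n_censored * math.log(p_below)  # censored records: P(X < LoQ)
    for x in observed:                     # quantified records: log-density
        z = (math.log(x) - mu) / sigma
        nll += math.log(x * sigma * math.sqrt(2 * math.pi)) + 0.5 * z * z
    return nll

def fit_censored_lognormal(observed, n_censored, loq, tol=1e-8):
    """Estimate (mu, sigma) with a basic compass search (an assumed optimizer)."""
    logs = [math.log(x) for x in observed]
    mu = sum(logs) / len(logs)
    sigma = max(math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs)), 1e-3)
    best = neg_log_likelihood(mu, sigma, observed, n_censored, loq)
    step = 0.5
    while step > tol:
        improved = False
        for d_mu, d_sigma in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = neg_log_likelihood(mu + d_mu, sigma + d_sigma,
                                      observed, n_censored, loq)
            if cand < best:
                mu, sigma, best = mu + d_mu, sigma + d_sigma, cand
                improved = True
        if not improved:
            step /= 2  # no better neighbour found: refine the search
    return mu, sigma

def reconstruct_below_loq(mu, sigma, n_censored, loq):
    """Place n_censored values at evenly spaced probabilities under LoQ."""
    p_loq = STD_NORMAL.cdf((math.log(loq) - mu) / sigma)
    values = []
    for k in range(n_censored):
        p = p_loq * (k + 0.5) / n_censored  # midpoint of the k-th part
        values.append(math.exp(mu + sigma * STD_NORMAL.inv_cdf(p)))
    return values

# Hypothetical series: seven quantified records and three records under LoQ.
observed = [1.2, 1.5, 2.0, 2.7, 3.5, 5.0, 8.0]
loq, n_censored = 1.0, 3

mu_hat, sigma_hat = fit_censored_lognormal(observed, n_censored, loq)
reconstructed = reconstruct_below_loq(mu_hat, sigma_hat, n_censored, loq)
print(mu_hat, sigma_hat, reconstructed)
```

The reconstructed values come out in ascending order and all lie under LoQ; they still have to be assigned to positions in the time series.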

Nevertheless, if we want to compute trend tests, we need to know every individual value of the time series. It is easy to estimate a set of these values, but we need to assign them to individual positions. Here, the following heuristic was used:

We assume that the real concentration changes continuously (according to the physical processes of exhaustion, dilution, adsorption, desorption and decomposition of POPs) and that the slope of the time vs. concentration curve depends on the concentration (i.e., the shape of the curve follows some known (periodic) function). The magnitude of a value under LoQ is therefore related to the last preceding and next following values above LoQ. This means that the substituted values are ordered according to the measurements neighbouring the positions they replace.
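One possible reading of this ordering rule is sketched below. It is an interpretation, not the authors' code: each position under LoQ is scored by the mean of its nearest quantified neighbours, and the smallest reconstructed value goes to the position with the lowest score. The function name `assign_by_neighbours`, the use of `None` as a marker for censored records and the data are hypothetical.

```python
def assign_by_neighbours(series, substitutes):
    """Fill None entries of series with substitutes, ordered by neighbour level."""
    def neighbour_level(i):
        # Last preceding and next following quantified values, if they exist.
        before = next((series[j] for j in range(i - 1, -1, -1)
                       if series[j] is not None), None)
        after = next((series[j] for j in range(i + 1, len(series))
                      if series[j] is not None), None)
        known = [v for v in (before, after) if v is not None]
        if not known:
            raise ValueError("series contains no quantified value")
        return sum(known) / len(known)

    censored = [i for i, v in enumerate(series) if v is None]
    if len(censored) != len(substitutes):
        raise ValueError("one substitute is needed per censored position")

    # Positions with the lowest neighbours receive the lowest substitutes.
    ranked_positions = sorted(censored, key=neighbour_level)
    filled = list(series)
    for position, value in zip(ranked_positions, sorted(substitutes)):
        filled[position] = value
    return filled

# Hypothetical decreasing series; None marks a record under LoQ.
series = [5.0, None, 3.0, None, 1.2, None, 1.1]
substitutes = [0.9, 0.3, 0.6]  # e.g. values reconstructed from the distribution

filled = assign_by_neighbours(series, substitutes)
print(filled)  # [5.0, 0.9, 3.0, 0.6, 1.2, 0.3, 1.1]
```

The largest substitute lands between the two highest measurements and the smallest one between the two lowest, so the filled series keeps the overall decreasing shape.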

References

Hornung, R. W.; Reed, L. D., Estimation of Average Concentration in the Presence of Nondetectable Values. Applied Occupational and Environmental Hygiene 1990, 5 (1), 6.
