Treatment of values under LoQ

Identification and substitution of the values below the limit of quantification

Right-censored data (or simply censored data, if neither the lowest nor the highest values can be quantified) can also occur in statistical analyses. However, given the design of measuring devices and the way pollutants are released and diluted in the environment, they are very rare in POPs monitoring.
Right-censored data are quite common in survival analysis.

In principle, every method of determining the concentration of POPs has limitations. Very low concentrations can be detected but not reliably quantified. If the concentration is lower still, the measuring device may fail to detect the pollutant at all, even though a very small amount is present in the surrounding environment.

The two threshold values below which the concentration cannot be quantified, or even detected, by the device are usually called the “Limit of Quantification (LoQ)” and the “Limit of Detection (LoD)”. Because the inequality LoD ≤ LoQ holds in all cases, we simplify the notation and use the term LoQ only. Data in which at least one value is lower than LoQ are called left-censored data. Before any further steps of the analysis, these values have to be treated by one of the substitution/exclusion methods.

The simplest method, though not a recommended one, is to exclude the values under LoQ. In this case the resulting datasets have fewer (and non-equidistant) records, and the overall statistics are highly biased (try this method in our examples).

Substituting the censored values with a constant seems to be a better approach, and it is frequently used with one half of the LoQ value as the constant. Nevertheless, if the number of censored values is high, this approach can bias the aggregated statistics, too.

It is also possible to reconstruct the values under LoQ by a maximum likelihood method, but in this case some assumptions about the statistical distribution of the concentration values have to be made (see previous page and the histogram). If we know, for example, that the pollutant is no longer emitted and that the ambient concentration decreases by a first-order chemical process, we can assume that the distribution is lognormal and compute the missing values under LoQ using this distribution.


Exclusion method

The worst way to treat values under LoQ is to exclude them. Although we do not know the real concentration value, it is not true that we have no information about it. At the very least we know that the concentration is very low, and this should be reflected in the analysis. If the lowest values are excluded, the whole concentration level shifts up and most of the statistics are biased.
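
The upward shift can be illustrated with a minimal sketch in base R. The data are simulated lognormal values and the LoQ of 2 is an arbitrary assumption; neither comes from the monitoring datasets used on this page.

```r
# Hypothetical data: simulated lognormal concentrations, not real measurements
set.seed(1)
conc <- rlnorm(200, meanlog = 1, sdlog = 1)
loq  <- 2                                # assumed limit of quantification

censored <- conc < loq                   # values the device could not quantify

mean_true     <- mean(conc)              # known only because the data are simulated
mean_excluded <- mean(conc[!censored])   # exclusion method: censored values dropped

c(true = mean_true, excluded = mean_excluded)
```

Because only the lowest values are removed, the mean of the remaining records is necessarily higher than the mean of the complete series.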

Nevertheless, there are several specific methods in which very low values are not of interest. The exclusion method is included in the genloq R function and also in our example (to demonstrate its disadvantages). The values under LoQ disappear from the plot if this method is selected.

Substitution methods

The most frequently used treatment of values under LoQ is their substitution by a constant. Usually this constant is derived from the limit value of the device (method), but sometimes zero is used instead. This approach has two disadvantages. First, if we do not know the distribution of the concentration values, we cannot decide which constant is appropriate (the most often used 1/2 of LoQ has no theoretical basis - in fact the divisor also depends on the magnitude of the limit itself). Second, even if we know the best average value of the constant, a large number of identical values distorts the variance of the data and all parameters derived from it.
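
A minimal sketch of constant substitution in base R follows. The helper substitute_loq is hypothetical (it is not part of the genasis package), and the data and the LoQ of 2 are simulated assumptions.

```r
# Hypothetical data: simulated lognormal concentrations, not real measurements
set.seed(1)
conc <- rlnorm(200, meanlog = 1, sdlog = 1)
loq  <- 2                                # assumed limit of quantification
censored <- conc < loq

# Hypothetical helper: replace every censored value by loq / divisor
substitute_loq <- function(x, censored, loq, divisor) {
  x[censored] <- loq / divisor
  x
}

full  <- substitute_loq(conc, censored, loq, 1)        # LoQ itself
root2 <- substitute_loq(conc, censored, loq, sqrt(2))  # LoQ / sqrt(2)
half  <- substitute_loq(conc, censored, loq, 2)        # LoQ / 2

# Compare the aggregated statistics with the (simulated) true series
variants <- list(true = conc, full = full, root2 = root2, half = half)
sapply(variants, mean)
sapply(variants, sd)

# All censored records now share one single value
length(unique(half[censored]))
```

Comparing the means and standard deviations of the variants with those of the simulated series shows how strongly the result depends on the chosen divisor.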

Maximum likelihood method

The most trustworthy method is to reconstruct the values according to the expected distribution of the concentration values. The maximum likelihood method (MLE) is usually used to estimate the parameters of the distribution. Once the distribution is known, it is possible to compute all the descriptive statistics, i.e., the aggregated numbers, even without knowing the individual values.

Since we expect lognormal distribution, the likelihood function to maximize looks like this:
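
The formula is not reproduced in this text. As an assumption, the standard likelihood for a lognormal sample with values under LoQ can be written as below. Here x_1, ..., x_n stand for the quantified values, m for the number of values under LoQ, f for the lognormal probability density function, F for the corresponding cumulative distribution function and Φ for the standard normal cumulative distribution function; this notation may differ from the original formula.

```latex
L(\mu,\sigma) = \left[ F(\mathrm{LoQ};\mu,\sigma) \right]^{m} \prod_{i=1}^{n} f(x_i;\mu,\sigma),
\qquad
f(x;\mu,\sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right),
\qquad
F(\mathrm{LoQ};\mu,\sigma) = \Phi\!\left( \frac{\ln \mathrm{LoQ} - \mu}{\sigma} \right)
```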

where L denotes the likelihood to be maximized and μ and σ denote the parameters of the distribution to be found. The final values are then obtained by dividing the interval (0, f(LoQ)), where f denotes the probability density function, according to the number of values sought, which gives the points y_k, and computing f^-1(y_k) for each of them.
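
The parameter estimation can be sketched in base R with optim, dlnorm and plnorm. This is a generic censored-likelihood fit on simulated data under an assumed LoQ of 2, not the internal code of genloq, and the function name negloglik is hypothetical.

```r
# Negative log-likelihood of a lognormal sample with left-censored values.
# x: quantified values (>= loq); m: number of values under loq.
negloglik <- function(par, x, m, loq) {
  mu    <- par[1]
  sigma <- exp(par[2])          # optimise log(sigma) so that sigma stays positive
  -(sum(dlnorm(x, mu, sigma, log = TRUE)) +
      m * plnorm(loq, mu, sigma, log.p = TRUE))
}

# Hypothetical data: simulated with meanlog = 1 and sdlog = 1
set.seed(1)
conc <- rlnorm(500, meanlog = 1, sdlog = 1)
loq  <- 2                       # assumed limit of quantification
x <- conc[conc >= loq]          # values the device could quantify
m <- sum(conc < loq)            # only the count of censored values is used

# Start from the naive estimates computed on the quantified values only
start <- c(mean(log(x)), log(sd(log(x))))
fit   <- optim(start, negloglik, x = x, m = m, loq = loq)

mu_hat    <- fit$par[1]
sigma_hat <- exp(fit$par[2])
c(mu = mu_hat, sigma = sigma_hat)
```

The censored records enter the fit only through their count and the LoQ, so no individual value under LoQ has to be known to estimate μ and σ.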

Nevertheless, if we want to compute trend tests, we need to know every individual value of the time series. It is easy to estimate a set of these values, but we also need to assign them to individual positions. Here, the following trick is used:

The assumption is that the real concentration changes continuously (according to the physical processes of exhaustion, dilution, adsorption, desorption and decomposition of POPs), that the slope of the time vs. concentration curve depends on the concentration (i.e., the shape of the curve follows some known (periodic) functions), and that the magnitude of a value under LoQ is therefore related to the last preceding and the next following values above LoQ. This means that the substituting values are sorted according to the measurements neighbouring the substituted ones.
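
One possible reading of this principle is sketched below in base R. The function place_by_neighbours and the scoring rule (the mean of the nearest quantified value on each side) are assumptions made for illustration; the exact rule used by genloq may differ, and the numbers are invented.

```r
# series: concentrations in time order, NA marks a value under LoQ
# fill:   replacement values (one per NA), e.g. produced by the MLE step
place_by_neighbours <- function(series, fill) {
  idx <- which(is.na(series))
  obs <- which(!is.na(series))
  stopifnot(length(fill) == length(idx))
  # Score each gap by the mean of the nearest quantified value on each side
  score <- vapply(idx, function(i) {
    before <- obs[obs < i]
    after  <- obs[obs > i]
    mean(c(if (length(before)) series[max(before)],
           if (length(after))  series[min(after)]))
  }, numeric(1))
  # The smallest replacement goes to the gap with the lowest neighbours
  series[idx[order(score)]] <- sort(fill)
  series
}

# Hypothetical series with three values under LoQ and three replacements
series <- c(120, NA, 90, 70, NA, 65, 300, NA, 250)
fill   <- c(10, 30, 50)
res <- place_by_neighbours(series, fill)
res
```

In this example the gap between 70 and 65 receives the smallest replacement and the gap between 300 and 250 the largest, so the substituted values follow the course of the surrounding measurements.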


The genasis R package provides the function genloq for substituting values under LoQ. Its method parameter selects among several methods, such as exclusion, substitution by LoQ, by one half of LoQ or by LoQ divided by the square root of two, and the maximum likelihood method for the normal and log-normal distributions. The function requires the vector (or data frame column) to be of class character, so the LoQ entries have to be "manufactured" first:

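# Copy each series and mark every value below its LoQ (60 for flu, 4 for dde).
# Assigning a string coerces the whole vector to class character.
flu_with_loq <- flu
flu_with_loq[which(flu < 60)] <- "LoQ"

dde_with_loq <- dde
dde_with_loq[which(dde < 4)] <- "LoQ"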

Now, we can use the genloq function directly.

genloq(flu_with_loq,dat,loq=60,method="mle")$res

genloq(dde_with_loq,dat,loq=4,method="2.0")$res

And the results with substituted LoQs:

genloq(flu_with_loq,dat,loq=60,method="mle")$res

[1]  630.80000  745.00000  246.80000  126.80000   89.60000   40.07971   19.77429   25.17542   
[9]   49.90023  143.44640  442.76790 1853.50000 1853.50000  898.20000  688.00000  569.80000
[17] 216.60000   64.70000  140.00000  138.20000   94.40000   97.40000   54.90277  417.60000
[25] 1354.44000 1261.38000 1003.46000 1392.10000  612.78000  387.66000  125.82000   99.02000
[33]  82.24000   84.34000   35.19422   75.02000  210.02000  390.46000  663.28000  218.50000
[41] 390.56000  256.72000  176.70000  102.68000   44.96888   13.58817   30.25652   62.52071
[49] 122.84000  342.74000  441.00000  849.75000

genloq(dde_with_loq,dat,loq=4,method="2.0")$res

[1]  6.904000  4.884000  5.926000  7.446000  2.000000  4.055714  4.055714  4.223000  6.220000
[10] 5.762714 16.882140 15.716250 15.716250  4.049000  5.496000  2.000000  8.338000  7.298000  
[19] 5.098000  5.991000  5.902000  6.660000  6.319000  7.097000  9.851000  5.295000  2.000000
[28] 2.000000  2.000000  4.980000  8.000000 10.860000 11.120000  8.280000  5.500000  8.040000
[37] 8.560000  9.560000  2.000000  5.740000  2.000000  5.040000  2.000000  4.160000  5.260000
[46] 2.000000  2.000000  8.589286  8.900000  2.000000  2.000000  2.000000

References

Hornung, R. W.; Reed, L. D., Estimation of Average Concentration in the Presence of Nondetectable Values. Applied Occupational and Environmental Hygiene 1990, 5 (1), 6.
