Analytica Chimica Acta 462 (2002) 87–100
Chemometrics characterisation of the quality of river water
Darinka Brodnjak-Voncina a,*, Danilo Dobcnik a, Marjana Novic b, Jure Zupan b
a Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova 17, SI-2000 Maribor, Slovenia
b National Institute of Chemistry, Ljubljana, Slovenia
Received 7 March 2002; accepted 3 April 2002
Abstract
In the period from autumn 1990 to spring 1999 (from October to April in each season), 207 samples
were collected and 19 physical and chemical variables of the Mura river, Slovenia, were measured.
These variables are: river flow, water temperature, air temperature, dissolved oxygen, deficit of
oxygen, oxygen saturation index, chemical oxygen demand (COD) in unfiltered and filtered samples,
biochemical oxygen demand after 5 days (BOD5) in unfiltered and filtered samples, pH, conductivity,
ammonium, nitrite, nitrate, and phosphate concentrations, adsorbable organic halogens (AOX),
dissolved organic carbon (DOC), and suspended solids. Different chemometric methods were employed
for handling the results of all measurements: (i) basic statistical methods for the determination
of mean and median values, standard deviations, minimal and maximal values of the measured
variables, and their mutual correlation coefficients, (ii) principal component analysis (PCA), and
(iii) a clustering method based on the Kohonen neural network. The influences of season, month,
sampling site, and sampling time on the pollutant levels were examined. Before 1993, the pulp and
paper industry was the main source of pollutants because of large amounts of chlorine emission as a
consequence of industrial treatment, the leaching of cellulose. After the year 1993, the technology
was changed and the quality of the river water improved. The improvement could be detected 1 year
after the change of technology. For one part of the water samples the river quality classes based
on biological parameters were also determined. The correlation between the biologically determined
quality classes and chemical measurements was sought. Consequently, the biological classification
of the water samples based on the chemical analyses was studied. © 2002 Elsevier Science B.V. All
rights reserved.
Keywords: Water quality; Mura river; Principal component analysis; Classification
1. Introduction
The physical and chemical studies on surface water, namely the Mura river water, have been
performed during a joint Slovenian–Austrian project since 1966. Here, we present the data collected
in the period 1990–1999. Throughout this period the quality of the water was followed and a
classification was made every year. The classification was based on chemical parameters and
biological analyses. There are four main biological classes: class I, unpolluted to very slightly
polluted (oligosaprobic); class II, moderately polluted (beta-mesosaprobic); class III, heavily
polluted (alpha-mesosaprobic); and class IV, excessively polluted (polysaprobic). Additionally,
there are three subclasses of water quality between the four main classes: I–II for slightly
polluted, II–III for critically polluted, and III–IV for very heavily polluted water. In the
reported time period, all the samples from the Mura river were classified into one of the following
three river quality classes: class II, moderately polluted; class II–III, critically polluted; or
class III, heavily polluted. Since the biological investigations are time consuming, it would be
worthwhile to find a way to reduce the number of biological investigations and replace them by
chemical analyses. The aim of this work is to find the correlation between biological classes and
variables obtained by chemical measurements.

* Corresponding author. Tel.: +386-2-229-44-32; fax: +386-2-252-77-74.
0003-2670/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0003-2670(02)00298-2
Chemometric methods have often been used for the classification and comparison of different
samples [1]. Some examples are the differentiation of rainwater compositional data by principal
component analysis (PCA) [2], the application of chemometric techniques to the analysis of Suquia
River water quality [3], the identification of sources of bottom waters in the Weddell Sea by PCA
and target estimation [4], and the determination of the correlation of chemical and sensory data in
drinking waters by factor analysis [5], to name just a few. Chemometric methods have also been used
for evaluating environmental data of Lagoon water [6], San Francisco Bay and Estuary [7], and
Muggia Bay in the Northern Adriatic Sea [8]. They were also used for the oceanographic
characterisation of the northern São Paulo coast [9]. Partial least squares (PLS) was applied for
the simultaneous spectrophotometric determination of calcium and magnesium in water [10]. PCA and
PLS were used for the characterisation of wastewater in Australia (Melbourne) [11]. An example of
using Kohonen maps is given in a paper discussing the unsupervised training, clustering and
classification of multivariate biological data [12].
The quality of the water of the Mura river was studied through 9 years (during nine seasons), from
1990/1991 to 1998/1999. The measurements were performed from October until April, each week on the
same day, at the same time (hour), and at the same sampling site. During the summer months sampling
was not carried out because of the excessive river flow causing a high water level. Additionally,
twice a year the samples were collected at four different sites along the river stream. The first
sampling site was in the middle of the river Mura, in Spielfeld (Austria). The other three sites
were situated 10 km downstream, in the middle and on both Mura riverbanks, the left one being in
Austria (Bad Radkersburg), and the right one in Slovenia (Gornja Radgona). Altogether 19 variables
were measured for all 207 samples collected and analysed during this period. The measurements were
alternately performed by two institutions, the Faculty of Chemistry and Chemical Engineering,
Maribor, Slovenia, and the Amt der Steiermarkischen Landesregierung, Graz, Austria. Twice a year,
in October and in February, sampling was carried out by both participating institutions and both
sets of results were compared. The average values were used in further treatments.
2. Experimental
2.1. Sampling
A standard method was used for sampling [13]. Water was collected in polyethylene bottles 0.5 m
below the surface at four sample sites along the river stream. All glass and plastic ware used for
sampling and analyses was rinsed with milli-Q water. Filtration was carried out through glass fibre
filters. All measurements were performed on the day the samples were gathered.
2.2. Instruments and reagents
A Dionex ion chromatograph and a Perkin-Elmer spectrophotometer were used for some ion
determinations, a Dohrmann apparatus for AOX and TOC determinations, and a WTW conductivity meter
and a WTW oximeter for the corresponding analyses. All reagents were of analytical grade. The
milli-Q system was used for purifying the water.
2.3. Analytical methods
Standard methods were used for the determination of DOC [14], AOX [15], COD [16], BOD5 [17],
suspended solids [18], and pH measurement [19].
2.4. Spectrophotometric determinations
The samples were filtered before the beginning of the analyses. Ammonium was determined by the
reaction of ammonium with salicylate and hypochlorite ions in the presence of sodium
nitrosopentacyanoferrate [20]; nitrate by the 2,6-dimethylphenol method [21] and also by ion
chromatography [22]; nitrite by the reaction of NO2− ions with sulphanilamide, yielding an
intensively coloured diazonium salt [23], and also by ion chromatography; and orthophosphate was
measured by the ammonium molybdate method [24]. The absorbances were measured at λmax of the
particular component.
2.5. Data analysis
The 207 samples are characterised by 19 physico-chemical variables: river flow, water temperature,
air temperature, dissolved oxygen, deficit of oxygen, oxygen saturation index, chemical oxygen
demand (COD) in unfiltered and filtered samples, biochemical oxygen demand after 5 days (BOD5) in
unfiltered and filtered samples, pH, electrical conductivity, ammonium, nitrite, nitrate, and
phosphate concentrations, adsorbable organic halogens (AOX), dissolved organic carbon (DOC), and
suspended solids (see Table 1). The enumerated variables are the components of the vector
representation of each sample, which is used in further chemometric analysis.
The results of all measurements have been investigated by different chemometric methods [1]: the
basic statistical methods for the determination of mean and median values, standard deviations,
minimal and maximal values of the measured variables and their mutual correlation coefficients.
PCA [1,25] and artificial neural networks [26] were applied for grouping the water samples
according to the measured variables. Among different neural networks the Kohonen self-organising
maps [27] are the most suitable for clustering, while the counterpropagation artificial neural
networks (CP ANNs) are well suited as a modelling method [26,28–31]. All the calculations and plots
in the following (PCA) section were done with the Teach/Me software [25] using the Teach/Me data
analysis option, which is one of the applications of the Teach/Me system, providing very flexible
tools for most fields of data analysis.
3. Results and discussion
3.1. Statistical screening of data
After determining the mean and median values and the standard deviations, the mutual correlation
was sought for all measured variables. The estimation of the pollution should not depend on the
river flow. As most of the measured variables, except for the pH, temperatures, oxygen saturation,
and conductivity, are related to the river flow, they were scaled by multiplication with the river
flow values given in m³ s⁻¹. The units of the scaled variables were changed correspondingly. In
this way, the river flow is eliminated, leaving only 18 variables in the vector representation of
the samples. The maximal correlation coefficient of the scaled data was found between the
measurements of the deficit of oxygen and the oxygen saturation index (R = 0.96), which is to be
expected because the latter is obtained from a known relationship between the oxygen saturation at
different temperatures and the dissolved oxygen. Large correlations (R > 0.84) were found between
filtered and unfiltered values of COD and between filtered and unfiltered values of BOD5, as
expected.
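The flow scaling and the correlation screening described above can be illustrated with a short sketch. It is only a minimal example under stated assumptions: the function name `scale_by_flow`, the choice of columns and all numbers are hypothetical toy values, not data from the study, and NumPy stands in for whatever software was actually used.

```python
import numpy as np

def scale_by_flow(data, flow, flow_dependent):
    """Multiply the flow-dependent columns by the river flow of each sample."""
    scaled = np.asarray(data, dtype=float).copy()
    flow = np.asarray(flow, dtype=float)
    # Each selected concentration column becomes a load (concentration x flow).
    scaled[:, flow_dependent] *= flow[:, None]
    return scaled

# Hypothetical toy data: 4 samples, columns = [pH, COD, AOX]
data = np.array([[7.8, 12.0, 150.0],
                 [7.9, 10.0, 120.0],
                 [8.0,  9.0,  60.0],
                 [7.7, 14.0, 200.0]])
flow = np.array([100.0, 120.0, 150.0, 90.0])  # river flow in m3/s (toy values)

# pH (column 0) is left unscaled; COD and AOX (columns 1 and 2) are scaled.
loads = scale_by_flow(data, flow, flow_dependent=[1, 2])

# Mutual correlation coefficients between the (scaled) variables.
corr = np.corrcoef(loads, rowvar=False)
```

After this step the flow column itself can be dropped, which is how 19 measured variables reduce to 18 in the vector representation.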
The AOX variable shows the greatest change in water quality in the past 9 years. The pulp and paper
industry was the main source of pollutants because of large amounts of chlorine emission as a
consequence of industrial treatment, the leaching of cellulose. After the year 1993, the technology
was changed; in the new leaching process the chlorine was omitted and the quality of the river
water improved. The improvement could be detected no sooner than 1 year after the change of
technology. A good correlation of AOX with the sampling time, expressed by translating the 9-year
period into days, is evident from Fig. 1. The marked fall of the AOX variable after the year 1994,
and consequently the improvement of the water quality, is shown in Fig. 1.
The number of micro-organisms that are responsible for the self-cleaning of water decreased
steadily until the year 1995. The changed industrial procedure of cellulose leaching helped to
improve the water quality, so the number of micro-organisms started growing in 1995. BOD5 values
also increased. It is evident that the river needed 2 years for self-remediation.
3.2. Principal component analysis (PCA)
PCA was performed in order to get an overall impression of the correlation of the 207 water
samples, described with physical and chemical variables, with the quality of water in different
seasons, months, or sampling sites. Original data depending on water flow were multiplied by the
individual water flow values as described above in the paragraph on statistical screening of data
(Section 3.1). PCA was applied to the matrix composed of 207 × 18 elements. The 207 rows represent
water samples, each described by 18 variables. The data were additionally pre-processed in two
different ways. First, column centring of the data was used, which means that the mean value of
each column was subtracted from the individual (207) elements. Second, autoscaling of the
individual variables, also called column standardisation, was performed. With this procedure the
mean of the column elements is subtracted from the individual elements and the result is divided by
the column standard deviation. Consequently, each column has zero mean and unit variance. The
percentages of variance in the resulting eigenvectors (PCs) for both types of pre-processing of the
data are shown in Table 2.

Fig. 1. Plot of the normalised AOX variable against sampling time, expressed by translating the
9-year period into days. The marked fall of this parameter, and consequently the improvement of
water quality after the year 1994, is evident. Samples are numbered from 1 to 207.
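The two pre-processing modes described in this section (column centring and autoscaling) can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the data are random toy values, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np

def column_centre(X):
    """Subtract the column mean from every element of that column."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)

def autoscale(X):
    """Column standardisation: centre, then divide by the column standard deviation."""
    X = np.asarray(X, dtype=float)
    # ddof=1 (sample standard deviation) is an assumption, not stated in the text.
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Toy matrix with the same shape as the data set (207 samples x 18 variables).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0] * 17 + [500.0], size=(207, 18))

Xc = column_centre(X)   # zero column means, original variances kept
Xa = autoscale(X)       # zero column means and unit column variances
```

Centring keeps the original spread of every variable, so a variable with a large numerical range still dominates; autoscaling gives every variable the same weight.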
From Table 2, it can be seen that using column centred data, 99.8% of the variance is gathered in
the first two PCs. However, analysing the composition of the first and the second PCs, it was found
that almost all of the variance is that of AOX (variable 18, v18 in Table 1). Consequently, it
would make little difference to analyse plots of the samples in v18 (AOX) against v7 (COD) or v17
(suspended solids), which are the next two most informative variables. For this reason, only the
PCA using autoscaled variables was further analysed. With the autoscaled variables, 49.5% of the
total variance was captured in the first two principal components. Any conclusion on the basis of
plots shown in the space of PC1 and PC2 would neglect >50% of the total information about the data.
Some rough indications from the obtained distribution of transformed samples were derived anyway;
however, for further evaluation of the water samples other chemometric methods, such as Kohonen and
counterpropagation ANNs, were implemented.

Table 2
Comparison of variances in PCA using two different scaling modes, column centring of data
(m = 0.0) and autoscaling (m = 0.0, s = 1.0)

PC    Column centring of data        Column standardisation (autoscaling) of data
      Variance (%)   Total (%)       Variance (%)   Total (%)
1     99.42          99.42           35.47          35.47
2     0.38           99.80           13.99          49.46
3     0.10           99.90           10.83          60.29
4     0.04           99.94           7.50           67.79
5     0.02           99.96           5.50           73.29
6     0.02           99.98           4.59           77.89
7     0.01           99.99           3.83           81.72
8     0.01           100.00          3.63           85.35
9     0.00           100.00          2.91           88.26
10    0.00           100.00          2.74           91.00
11    0.00           100.00          2.24           93.25
12    0.00           100.00          1.98           95.23
13    0.00           100.00          1.52           96.75
14    0.00           100.00          1.11           97.85
15    0.00           100.00          0.70           98.55
16    0.00           100.00          0.56           99.12
17    0.00           100.00          0.46           99.58
18    0.00           100.00          0.42           100.00
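The contrast between the two scaling modes in Table 2 can be reproduced qualitatively with a small sketch. It assumes PCA computed through the singular value decomposition of the pre-processed matrix; the function name and the random toy data (one variable with a much larger numerical range, playing the role of AOX) are hypothetical and do not reproduce the study's numbers.

```python
import numpy as np

def explained_variance_percent(X_prepared):
    """Percentage of total variance carried by each principal component."""
    # Squared singular values of the pre-processed matrix are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(X_prepared, compute_uv=False)
    var = s ** 2
    return 100.0 * var / var.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(207, 4))   # toy data: 207 samples, 4 variables
X[:, 3] *= 1000.0               # one variable with a much larger range

centred = X - X.mean(axis=0)
autoscaled = centred / X.std(axis=0, ddof=1)

pct_centred = explained_variance_percent(centred)
pct_autoscaled = explained_variance_percent(autoscaled)
```

With centred data the first PC is essentially the large-range variable alone, while with autoscaled data the variance is spread over several components, which mirrors the pattern reported in Table 2.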
In Fig. 2, the biplot resulting from PCA of the water samples represented with 18 variables is
shown. It can be seen that the first component, PC1, is associated with a group of variables such
as nitrite and nitrate concentrations, phosphate, suspended solids, AOX, COD, and BOD. The second
component, PC2, represents mainly the dependence on temperature (variables 1 and 2, printed bold in
Fig. 2, correspond in Table 1 to v2 and v3, respectively). It is evident from Fig. 2 that the
samples separated from the main central cluster and distributed in the region of larger values of
PC1 were all collected before the year 1994 (sample labels
Fig. 2. Biplot (scores and loadings) of 207 samples and 18 variables in the PC1–PC2 co-ordinate
system for water samples of the river Mura. The sample numbers from 1 to 207 are given in Table 1,
while the original variables (1–18, printed bold in the biplot) forming the PC1 and PC2 components
are defined in Table 1 as v2–v19 because the water flow (the variable v1) was previously eliminated
(explained in the paragraph on statistical screening of data).
ANN was trained for 240 epochs, which was sufficient for a satisfactory recognition of the training
samples. The 18 components of each sample's vector representation are the physico-chemical
variables described in Section 2. The maximal and minimal correction factors in the modelling
procedure were 0.4 and 0.01, respectively. The prediction results of the 56 training samples are
shown in Fig. 3.
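To make the training parameters above more concrete, the following is a minimal sketch of a counterpropagation network, that is, a Kohonen layer whose output-layer weights are corrected in the same neighbourhood as the winning neuron. Only the 240 epochs and the correction factors 0.4 and 0.01 come from the text. The grid size, the linear decay of the correction factor, the triangular shrinking neighbourhood, the initialisation and the toy data are all assumptions made for illustration, so this should not be read as the authors' implementation.

```python
import numpy as np

def train_cp_ann(X, y, grid=(4, 4), epochs=240, a_max=0.4, a_min=0.01, seed=0):
    """Train a small counterpropagation network (Kohonen layer + output layer)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every neuron, used for the topological distance.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Assumption: Kohonen weights start at random points inside the data range.
    W = rng.uniform(X.min(axis=0), X.max(axis=0), size=(rows * cols, X.shape[1]))
    # Assumption: output-layer weights start at the mean target value.
    out = np.full(rows * cols, y.mean())
    max_radius = max(rows, cols) - 1
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        alpha = a_max - (a_max - a_min) * frac   # correction factor, a_max -> a_min
        radius = max_radius * (1.0 - frac)       # shrinking neighbourhood
        for i in rng.permutation(len(X)):
            # Winner: neuron whose weights are closest to the sample.
            winner = np.argmin(((W - X[i]) ** 2).sum(axis=1))
            dist = np.abs(coords - coords[winner]).max(axis=1)
            h = np.where(dist <= radius, 1.0 - dist / (radius + 1.0), 0.0)
            # Correct the Kohonen weights and the output weights together.
            W += alpha * h[:, None] * (X[i] - W)
            out += alpha * h * (y[i] - out)
    return W, out

def predict_cp_ann(W, out, X):
    """Return the output weight of the winning neuron for each sample."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return out[d2.argmin(axis=1)]

# Toy data: two groups of samples labelled with class numbers 2 and 3.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 0.3, size=(20, 3)),
                     rng.normal(3.0, 0.3, size=(20, 3))])
y_train = np.array([2.0] * 20 + [3.0] * 20)

W, out = train_cp_ann(X_train, y_train)
y_hat = predict_cp_ann(W, out, X_train)
```

Because every output weight is only ever moved part of the way towards a training target, predictions from such a model always stay within the range of the training class numbers.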
In Fig. 3, the regression line between the experimental and predicted biological class numbers of
the training samples is shown. The standard deviation of prediction residuals, SEP = 0.247, and the
correlation coefficient R = 0.958 indicate that the CP ANN model trained with 56 samples describes
a good correlation between the 18-component vector representation of the samples (physico-chemical
properties) and the biological classes.
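The quantities reported for the training samples (SEP, R, and the intercept A and slope B of the regression line in Fig. 3) can be computed as in the sketch below. The root-mean-square convention used for SEP is an assumption, because the text does not give the exact formula, and the class numbers are hypothetical toy values rather than results from the study.

```python
import numpy as np

def prediction_statistics(y_exp, y_pred):
    """Return SEP, correlation coefficient R, intercept A and slope B."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_pred - y_exp
    # Assumed definition: root mean square of the prediction residuals.
    sep = np.sqrt(np.mean(residuals ** 2))
    r = np.corrcoef(y_exp, y_pred)[0, 1]
    # Least-squares line: y_pred = A + B * y_exp.
    slope, intercept = np.polyfit(y_exp, y_pred, deg=1)
    return sep, r, intercept, slope

# Hypothetical experimental and predicted biological class numbers.
y_exp = np.array([2.0, 2.0, 2.5, 3.0, 3.0])
y_pred = np.array([2.1, 1.9, 2.5, 3.1, 2.9])

sep, r, A, B = prediction_statistics(y_exp, y_pred)
```

A slope close to 1 and an intercept close to 0 indicate that the predicted class numbers follow the experimental ones without systematic bias.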
The constructed model was tested with the remaining 151 samples (out of 207), for which the
biological class was neither known nor determined experimentally. Since there is no information
about the experimental biological classes for these 151 samples, the quality of the prediction
results cannot be confirmed. However, the trend of improving water quality assessed by the
biological classification of the 56 training samples is obvious. The resulting predictions with
respect to the sampling year are shown in Fig. 4.
The biological classes predicted for the 151 samples show the same trend of improvement of the
quality of water as observed for the 56 training samples. The biological investigations are time
consuming in comparison to the determination of physico-chemical parameters, and even a rough
prediction of biological class numbers is helpful.

Fig. 3. Regression line of the predictions of the 56 training samples with the constructed CP ANN
model. A and B are the estimated parameters, intercept and slope, of the regression line. Their
standard errors are also given. S.D. is the estimated standard deviation of the fitting, and R the
correlation coefficient between the experimental and predicted biological classes.

Fig. 4. The prediction of biological class numbers of 151 samples using the CP ANN model. The
samples are discriminated by the year in which they were gathered.
4. Conclusions
The study has given us the opportunity to follow all processes involved in the complex system of
surface water pollution. The time series of overall pollution levels as well as the results for
specific measured parameters are important indicators and can be used for planning short-term and
long-term preventive action. In this work, standard multivariate statistical methods and PCA were
used for pre-screening of the data. It was shown that it is necessary to use autoscaled variables.
From the results, it was concluded that the PCA method is not discriminating enough since the
variables are weakly correlated. Less than 50% of the variance is explained by the first two
principal components. For the classification of this kind of data, non-linear methods such as
artificial neural networks are more suitable. The artificial neural networks were implemented as
the method for clustering all 207 water samples as well as for the prediction of biological
classes. The analysis has shown that AOX content is the parameter with the greatest discriminating
power. The results obtained from the evaluation of data gathered during the 9-year monitoring of
Mura river water confirmed that the improvement of the quality of water during the last 9 years is
significant and, therefore, the Austrian Project for improving the quality of rivers can be
considered successful.
One of the goals of the research presented in this work was to find a correlation between
biological classes and chemical parameters. Because the biological analyses are time consuming,
only a small number of water samples were chosen for the determination of biological classes. The
experience-based CP ANN model was built using the water samples for which the biological activity
was known. The rest of the samples were examined with the constructed model to obtain a prediction
of biological activity. The predicted values were in the same range as the values of the training
samples; in addition, the trend of water quality improvement was evident from the predicted
biological activities. Although the usual validation procedures for estimating the quality of the
model were not applicable because of the low number of available training samples, the overview of
the prediction results indicates that the biological activity obtained from the proposed model is
of significant value when experimental values are not available.
Acknowledgements
The authors thank the Ministry of Education, Science and Sport of the Republic of Slovenia,
contract numbers P1-0507-0104 and P1-0508-0104, for financial support. The Amt der
Steiermarkischen Landesregierung, Graz, Austria, is kindly acknowledged for completing the data on
the Mura river water samples with their results.
References
[1] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J.S. Verbeke,
Handbook of Chemometrics and Qualimetrics: Part A, Elsevier, Amsterdam, 1997.
[2] P. Zhang, N. Dudley, A.M. Ure, D. Littlejohn, Anal. Chim. Acta 258 (1992) 1–10.
[3] W.D. Alberto, D.M. Del Pilar, A.M. Valeria, P.S. Fabiana, H.A. Cecilia, B.M. De Los Angeles,
Water Res. 35 (2001) 2881–2894.
[4] R. Lindegren, M. Josefson, Chemometr. Intell. Lab. Syst. 44 (1998) 403–409.
[5] A.K. Meng, I.H. Suffet, Environ. Sci. Technol. 31 (1997) 337–345.
[6] E. Marengo, M.C. Gennaro, D. Giacosa, C. Abrigo, G. Saini, M.T. Avignone, Anal. Chim. Acta
317 (1995) 53–63.
[7] W.M. Jarman, G.W. Johnson, C.E. Bacon, J.A. Davis, R.W. Risebrough, R. Ramer, Fresenius J.
Anal. Chem. 359 (1997) 254–260.
[8] P. Barbieri, G. Adami, A. Favretto, E. Reisenhofer, Fresenius J. Anal. Chem. 361 (1998)
349–352.
[9] M.M.C. Ferreira, C.G. Faria, E.T. Paes, Chemometr. Intell. Lab. Syst. 47 (1999) 289–297.
[10] J.B. Marzo, M.J.M. Hernandez, S. Sagrado, E. Bonet, R. Gimenes, J. Chemometr. 12 (1998)
323–336.
[11] M.P. Kallio, S.P. Mujunen, G. Hatzimihalis, P. Koutoufides, P. Minkkinen, P.J. Wilkie, M.A.
Connor, Anal. Chim. Acta 393 (1999) 181–191.
[12] M.F. Wilkins, L. Boddy, C.W. Morris, Binary-Comput. Microb. 6 (1994) 64–72.
[13] Water Quality - Sampling - Part 11: Guidance on Sampling of Ground Waters, ISO 5667-11:
1992 (E).
[14] Water Quality, Guidelines for the Determination of Total Organic Carbon (TOC), ISO 8245:
1987 (E).