Review Article Volume 7 Issue 4
1Department of Water and Environmental Engineering, School of Civil Engineering, Faculty of Engineering, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor, Malaysia
2Department of Geography, Federal University Birnin kebbi. P.M.B. 1057. Kebbi State, Nigeria
3Department of Geography, Usmanu Danfodiyo University Sokoto, P.M.B. 2346. Sokoto State, Nigeria
4Department of Water Resources and Environmental Management, National Water Resources Institute, P.M.B. 2309. Kaduna State, Nigeria
5Department of Geology, Ahmadu Bello University Zaria. P.M.B. 1045. Kaduna State, Nigeria
6Department of Physics with Electronics, Federal University Birnin kebbi, P.M.B 1157. Kebbi State, Nigeria
Correspondence: Saadu Umar Wali, Department of Geography, Federal University Birnin kebbi. P.M.B. 1057. Kebbi State, Nigeria
Received: August 13, 2023 | Published: August 30, 2023
Citation: Wali SU, Alias N, Harun SB, et al. An integrated approach for understanding natural - and anthropogenic controls on water quality in arid and semiarid environments. Int J Hydro. 2023;7(4):168-180. DOI: 10.15406/ijh.2023.07.00353
The objective of this review is to highlight the need for an integrated approach to understanding the major processes controlling the hydrochemical composition of water bodies in drylands using multivariate statistics, the water quality index and the heavy metal pollution index. An integrated approach to the hydrochemical investigation of streams and aquifers in drylands is essential owing to their distinctive climate, notably low rainfall and high temperature. Studies on water quality in arid and semi-arid areas using multivariate analysis and water quality indices were scrutinized. Results showed that the hydrochemistry of streams and aquifers is controlled by both natural geogenic processes and anthropogenic activities. However, an in-depth understanding of geochemistry, land use types and climatic vagaries is required to discriminate between these processes, since several ions of rock-mineral origin are increasingly being added to the environment through human activities. While the sources of solutes and the processes controlling the hydrochemistry of streams and aquifers can be established through multivariate analysis, this technique is limited in water quality investigations because it cannot measure the suitability of water for domestic, agricultural and industrial uses. Thus, an integrated approach that combines water quality indices with multivariate analysis is required. This is essential because the suitability of water for various uses is central to any hydrogeochemical investigation in arid and semi-arid environments. It is therefore expected that future hydrochemical studies will apply this approach.
Keywords: natural geogenic and anthropogenic processes, correlation analysis, principal component analysis, hierarchical cluster analysis, water quality index, heavy metal pollution index
Arid and semi-arid areas (Figure 1) drain wide basins and overlie extensive aquifers, e.g., Iullemedden basin, West Africa,1,2 Saladin Province, Iraq,3 Goshen Valley, UT, USA,4 Northwest China,5 and Hamedan-Bahar plain, Iran.6 These areas are largely equipped with irrigation facilities that, combined with the application of agrochemicals, make intensive irrigation farming possible. The survival and productivity of these arid areas depend on the availability of water of acceptable quality. A decline in the accessibility or quality of water will consequently affect human health and wellbeing as well as agricultural output which, given the importance of these areas for food production, could have detrimental effects on the immediate environment and beyond.7 Arid and semi-arid areas cover approximately 40% of the global landmass and are occupied by 37% of the global population.7 Though there is no general definition of the term dryland, FAO8 classifies drylands into four categories based on the ratio of their precipitation (p) to the associated potential evapotranspiration (ETP). The four types of drylands are: (1) p/ETP below 0.03 (hyper-arid), comprising areas of barren land with precipitation below 100 mm; (2) p/ETP between 0.03 and 0.2 (arid), with precipitation below about 300 mm; (3) p/ETP between 0.2 and 0.5 (semi-arid), with p fluctuating between 300 and 500 mm and rising to 800 mm in tropical regions with a very short rainy season; and (4) p/ETP between 0.5 and 0.75 (sub-humid).7
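The four-class scheme quoted above can be expressed as a simple threshold test on the p/ETP ratio. The following is a minimal sketch only: the function name `classify_dryland` and the treatment of values that fall exactly on a class boundary are assumptions made for illustration, not part of the FAO scheme as cited.

```python
def classify_dryland(precip_mm: float, etp_mm: float) -> str:
    """Classify a site by its p/ETP ratio using the four dryland classes in the text.

    Boundary handling (a ratio exactly on a threshold goes to the wetter class)
    is an assumption of this sketch.
    """
    if etp_mm <= 0:
        raise ValueError("potential evapotranspiration must be positive")
    ratio = precip_mm / etp_mm
    if ratio < 0.03:
        return "hyper-arid"
    if ratio < 0.2:
        return "arid"
    if ratio < 0.5:
        return "semi-arid"
    if ratio < 0.75:
        return "sub-humid"
    return "not dryland"


# Hypothetical site: 250 mm of rain against 2000 mm of potential evapotranspiration.
site_class = classify_dryland(250.0, 2000.0)
```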
Figure 1 World map of aridity zones. Retrieved from http://www.unesco.org/mab/ doc.html, 22/07/2019.
In this review, all four subtypes of drylands are included within the term “semi-arid and arid areas” or drylands. The climatic features common to all drylands largely include uneven distribution of annual rainfall, drought phases, a rate of evaporation greater than annual precipitation, a dry and a wet season within a year, and a sparse cover of native vegetation.9,10 Freshwater resources in these regions are over-exploited as a result of low rainfall, limited aquifer recharge and the growing demand for water from industry, domestic water supply, and dry season farming.9,11,12 Water scarcity has become endemic in arid and semi-arid areas, particularly in developing countries.13–15 Groundwater therefore occupies a central role in the water supply of these areas and is increasingly important in the supply of water to both rural and urban centres.16,5 For instance, estimates show that at least one-third of the global population relies on groundwater for drinking.17 An understanding of the hydrogeochemical characteristics of the sources of water supply thus plays a vital role in water resources management, especially as it relates to the quality of water from aquifers and streams, and allows water to be classified for various uses. It is accordingly essential to understand the hydrochemical composition of streams and aquifers and its evolution under environmental processes for effective management and sustainable utilization of water resources.
The quality of water depends on diverse hydrogeochemical and biological processes that take place over space and time in a stream or groundwater aquifer.18–20 The variation of stream or groundwater quality is the joint effect of anthropogenic and natural geogenic processes, such as leaching of organic and inorganic fertilizers, biogeochemical processes, mixing of surface and groundwater, mineral dissolution/precipitation, cation exchange, reduction/oxidation, the composition of precipitation and the geological formations underlying groundwater aquifers or the channels over which streams pass. Other anthropogenic activities affecting water quality include mining, urbanization, industry and improper sewage disposal (Figure 2). Consequently, different water types are produced as a result of the interaction of these factors.17 Generally, studies on the water quality of streams and aquifers are deemed beneficial in detecting the processes that govern the hydrochemical composition of streams and groundwater aquifers.22–24,17
Figure 2 Geogenic and anthropogenic controls on water quality. After Andrade.21
Hydrochemical transformation in surface and groundwater in drylands presents a challenging topic,25–27 owing to the difficulties involved in understanding the interaction between surface and groundwater. Although pollutant wash-off,28 municipal and industrial sewage,29,30 agriculture and mining dominate the pollutant pools of streams and catchments,31,32 floodplain inundation and infiltration of water during and after rainfall events deliver pollutant loads to groundwater aquifers.33–35 Therefore, understanding the factors controlling the hydrochemical composition of streams and aquifers is necessary for the overall management of water resources. This can be achieved by applying simple statistical techniques, which help to identify the types of natural and anthropogenic processes controlling water chemistry.36,22 Likewise, numerous indices such as the water quality index (WQI) and the heavy metal pollution index (HPI)37,38 are used to evaluate water quality and determine its suitability for drinking, agricultural and industrial uses.
In turn, simple statistical techniques such as correlation, PCA, FA and HCA39,40 can be used to identify the origin of salinity in streams and aquifers (Table 1). The application of WQI and HPI in conjunction with multivariate statistics can provide clear evidence relating to the sources of solutes and the geochemical and/or anthropogenic processes related to water composition. These methods offer simple analytical tools for the evaluation and management of water quality in drylands. Notably, a large portion of the world's arid areas lies in less developed regions (e.g. North Africa, Middle East, West Africa), where the resources needed for large-scale water quality evaluation are seldom available. Consequently, the significance of these tools lies in the fact that they are easy to apply, cheap and able to provide the needed results on the hydrochemical composition of streams and aquifers at both local and regional scales. It is against this background that this review aims at designing an integrated approach to water quality investigation in arid and semi-arid areas.
Water quality index
The water quality index (WQI) is regarded as a powerful technique that can offer a comprehensive picture of the quality of sources of potable water. The WQI is a rating that reflects the combined effect of multiple water quality parameters.41–46,38 It is computed by assigning discrete weights (wi) to each chemical parameter on a scale of 1 (smallest impact on water quality) to 5 (greatest impact on water quality), based on their presumed impact on human health and their relative significance in the quality of drinking water.47 Parameters that raise grave health concerns and whose presence beyond the critical concentration limits could affect the usability of the water for domestic and drinking purposes (e.g. Cl, TDS, NO3, Pb, Cd, As and SO4) were allotted the highest weight of 5, whereas parameters that play a minor role in water quality evaluation, such as K, were allotted the minimum weight of 1 (Table 2). The intermediate parameters, including pH, EC, TH, HCO3, Ca and Mg, were assigned weights between 2 and 4 depending on their relative significance in the water quality evaluation. The relative weight (Wi) is calculated from Eq. 1:
S/no | Parameter | Unit | WHO standard (2011) | Weight (wi) | Relative weight (Wi)
1 | Al | mg/l | 0.1-0.2 | 2 | 0.029
2 | B | mg/l | 0.5 | 2 | 0.029
3 | Ba | mg/l | 0.7* | 2 | 0.029
4 | Ca | mg/l | 500 | 2 | 0.029
5 | Cl | mg/l | 200 | 5 | 0.071
7 | Cu | mg/l | 1 | 2 | 0.029
8 | EC | µS/cm | 1000 | 5 | 0.071
9 | F | mg/l | 1.5* | 2 | 0.029
10 | Fe | mg/l | 0.3 | 2 | 0.029
11 | HCO3 | mg/l | 250 | 3 | 0.043
12 | K | mg/l | 12 | 1 | 0.014
13 | Li | mg/l | 0.7** | 2 | 0.029
14 | Mg | mg/l | 125 | 2 | 0.029
15 | Mn | mg/l | 0.4 | 2 | 0.029
16 | Na | mg/l | 200 | 2 | 0.029
17 | NO3 | mg/l | 50 | 5 | 0.071
18 | Pb | mg/l | 0.01* | 4 | 0.057
19 | pH | - | 6.5-8.5 | 4 | 0.057
20 | PO4 | mg/l | 0.2 | 5 | 0.071
21 | SO4 | mg/l | 125-130 | 4 | 0.057
22 | TDS | mg/l | 500 | 5 | 0.071
23 | TH | mg/l | 200 | 3 | 0.043
24 | Zn | mg/l | 3 | 2 | 0.029
Table 2 Example of the relative weight of chemical parameters
*: WHO (2006); **EPA (2018); EC: Electrical Conductivity; TH: Total Hardness
$$W_i = \frac{w_i}{\sum_{i=1}^{n} w_i} \qquad \text{(Eq. 1)}$$
where $W_i$ is the relative weight, $w_i$ is the weight of each parameter, and $n$ is the number of parameters. For instance, the calculated relative weight ($W_i$) values of individual parameters can be given as in Table 2.
The quality rating (qi) for each parameter is obtained by dividing its concentration in each water sample by the reference value given by the World Health Organization (WHO) and multiplying the result by 100 to express it as a percentage (%):
$$q_i = \frac{C_i}{S_i} \times 100 \qquad \text{(Eq. 2)}$$
where $q_i$ is the quality rating, $C_i$ is the concentration of each chemical parameter in an individual water sample in mg/l, and $S_i$ is the drinking water standard for that parameter in mg/l based on the WHO guidelines. The sub-index $SI_i$ is first computed using Eq. 3 before the WQI is calculated from Eq. 4:
$$SI_i = W_i \times q_i \qquad \text{(Eq. 3)}$$
$$WQI = \sum_{i=1}^{n} SI_i \qquad \text{(Eq. 4)}$$
where $SI_i$ is the sub-index of the ith parameter and $q_i$ is the quality rating based on the concentration of the ith parameter. Accordingly, the calculated WQI values are normally grouped into five classes, viz: Excellent Water (<50); Good Water (50-100); Poor Water (100-200); Very Poor Water (200-300); and Unsuitable for Drinking (>300). Table 3 presents examples of literature reports on WQI in arid and semi-arid areas. Evaluation of WQI from 699 sampling sites (Table 3) showed that 235 (33.62%) of water sources in arid and semi-arid areas fall in the Excellent Class, 178 (25.46%) fall in the Good Class, 235 (33.62%) fall in the Poor Class, 37 (5.29%) fall in the Very Poor Class and 24 (3.43%) fall in the Unsuitable Class (Figure 3).
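The WQI calculation described in this section (relative weights, quality ratings, sub-indices and their sum) can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, the three standards and weights are taken from Table 2, the sample concentrations are invented for the example, and range-type standards such as pH are deliberately left out because the simple ratio form does not apply to them directly.

```python
def relative_weights(weights):
    """Relative weight W_i = w_i / sum of all w_i."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def water_quality_index(concentrations, standards, weights):
    """WQI = sum over parameters of W_i * q_i, with q_i = (C_i / S_i) * 100."""
    rel = relative_weights(weights)
    wqi = 0.0
    for name, rel_weight in rel.items():
        quality_rating = concentrations[name] / standards[name] * 100.0
        wqi += rel_weight * quality_rating  # sub-index SI_i added to the total
    return wqi


def classify_wqi(wqi):
    """Five classes as listed in the text; boundary handling is an assumption."""
    if wqi < 50:
        return "Excellent"
    if wqi <= 100:
        return "Good"
    if wqi <= 200:
        return "Poor"
    if wqi <= 300:
        return "Very poor"
    return "Unsuitable for drinking"


# Standards (mg/l) and weights from Table 2; the sample values are hypothetical.
standards = {"TDS": 500.0, "NO3": 50.0, "K": 12.0}
weights = {"TDS": 5, "NO3": 5, "K": 1}
sample = {"TDS": 1000.0, "NO3": 25.0, "K": 6.0}

wqi = water_quality_index(sample, standards, weights)
wqi_class = classify_wqi(wqi)
```

In this made-up sample the TDS rating of 200% dominates because TDS carries one of the highest weights, which is the behaviour the weighting scheme is meant to produce.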
S/no | Study | Region/Country | No. of samples | Excellent (<50) | Good (50-100) | Poor (100-200) | Very poor (200-300) | Unsuitable (>300)
1 | Aminiyan et al.37 | Karoon river, Iran | 14 | - | 15 | - | - | -
2 | Bouteraa et al.38 | Boumerzoug-El Khroub valley, NE Algeria | 26 | - | 25 | 1 | - | -
3 | Eslami et al.42 | Jiroft, Iran | 105 | 105 | - | - | - | -
4 | Machiwal and Jha44 | Udaipur district, Rajasthan, India | 53 | - | 53 | - | - | -
5 | Mahfooz et al.45 | Faisalabad, Pakistan | 34 | 27 | 4 | 3 | - | -
6 | Subba Rao et al.46 | Wanaparthy District, Telangana, India | 15 | 7 | 8 | 1 | - | -
7 | Ketata-Rokbani et al.104 | El Khairat, Tunisian Sahel | 17 | 1 | 2 | 10 | 4 | -
8 | Pei-yue et al.126 | Ningxia, Northwest China | 47 | 53 | 14 | 6 | 1 | -
9 | Rocha et al.130 | Upper Jaguaribe River, Brazil | 16 | 1 | 16 | - | - | -
10 | Sadat-Noori et al.131 | Saveh-Nobaran aquifer, Iran | 58 | 8 | 11 | 16 | 6 | 17
11 | Vasanthavigar et al.140 | Tamilnadu, India | 148 | 5 | 45 | 84 | 15 | -
12 | Wilson et al.143 | Mayo Tsanaga River Basin, Cameroon | 100 | 30 | - | 70 | - | -
13 | Xiao et al.145 | Tarim River Basin, NW China | 42 | 5 | 15 | 15 | 3 | 4
14 | Singh et al.125 | Bokaro, Central African Republic | 14 | - | 4 | 6 | 4 | -
15 | Abbasnia et al.78 | Sistan-Baluchistan, Iran | 10 | 20 | 10 | - | - | -
Total | | | 699 | 235 | 178 | 235 | 37 | 24
Percentage | | | - | 33.62 | 25.46 | 33.62 | 5.29 | 3.43
Table 3 Literature report using WQI in arid and semi-arid areas
Heavy metal pollution index
The heavy metal pollution index (HPI) is an effective tool for evaluating the overall pollution of water bodies with respect to heavy metals.47 The HPI is built on a weighted arithmetic mean of quality ratings (Table 4). Weights ($W_i$) between 0 and 1 are allocated to individual metals, and the critical pollution index value is 100 in this indexing.48–51,40,47 The rating reflects the relative importance of individual quality considerations and is defined as inversely proportional to the recommended standard ($S_i$) for each parameter. The HPI computation first requires the unit weightage ($W_i$) of each parameter, calculated with the equation below:
S/n | Heavy metal | Proportionality constant (k) | Mean concentration (Mi) | Unit weightage (Wi) | Standard permissible value (Si, µg/L) | Sub-index (Qi) | Wi x Qi | HPI (∑ Wi x Qi / ∑ Wi)
1 | Ag | 1 | - | 0.5 | 2 | - | - | -
2 | As | 1 | - | 0.1 | 10 | - | - | -
3 | B | 1 | - | 0.000417 | 2400 | - | - | -
4 | Ba | 1 | - | 0.000769 | 1300 | - | - | -
5 | Cd | 1 | - | 0.333333 | 3 | - | - | -
6 | Cr | 1 | - | 0.02 | 50 | - | - | -
7 | Cu | 1 | - | 0.0005 | 2000 | - | - | -
8 | Fe | 1 | - | 0.0005 | 2000 | - | - | -
9 | Hg | 1 | - | 0.166667 | 6 | - | - | -
10 | Mn | 1 | - | 0.0025 | 400 | - | - | -
11 | Mo | 1 | - | 0.05 | 20 | - | - | -
12 | Ni | 1 | - | 0.014286 | 70 | - | - | -
13 | Pb | 1 | - | 0.1 | 10 | - | - | -
14 | U | 1 | - | 0.033333 | 30 | - | - | -
15 | Zn | 1 | - | 0.000333 | 3000 | - | - | -
Total | | ∑ 15 | | ∑ 1.343 | | | |
Table 4 Example of the computation of relative weight of heavy metal pollution
Note: $HPI = \sum W_i Q_i / \sum W_i$
$$W_i = \frac{k}{S_i} \qquad \text{(Eq. 5)}$$
where $k$ is the proportionality constant and $S_i$ is the standard reference value of the ith parameter (based on the WHO reference standard). Secondly, the quality rating ($Q_i$) is computed for each heavy metal:
$$Q_i = \frac{M_i}{S_i} \times 100 \qquad \text{(Eq. 6)}$$
where $Q_i$ is the sub-index of the ith parameter, $M_i$ is the monitored value of the parameter (µg/L) and $S_i$ is the permissible limit or standard value for the parameter. The sub-indices of the individual pollutants can then be converted into the HPI using Eq. 7, and the computed results presented as outlined in Table 4.
$$HPI = \frac{\sum_{i=1}^{n} W_i Q_i}{\sum_{i=1}^{n} W_i} \qquad \text{(Eq. 7)}$$
Like the WQI, the HPI aids understanding of water quality by indicating the range into which water pollution is likely to fall. However, it is important to note that computing the HPI does not compensate for poor field sampling or laboratory analyses. Appropriate field sampling and laboratory analyses are therefore essential for accurate reporting of HPI. Examples of literature reports on HPI in arid and semi-arid areas are summarized in Table 5.
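The HPI calculation can be sketched in the same way as the WQI. The sketch below assumes that the unit weightage is W_i = k/S_i and that the sub-index takes the simple ratio form Q_i = 100 × M_i/S_i implied by the variable definitions in this section; some published HPI variants also subtract an ideal value, which is not modelled here. Function names and the monitored concentrations are hypothetical; the standards are those listed in Table 4.

```python
def heavy_metal_pollution_index(monitored, standards, k=1.0):
    """HPI = sum(W_i * Q_i) / sum(W_i).

    Assumes W_i = k / S_i and Q_i = 100 * M_i / S_i (simple ratio form),
    with M_i and S_i both in ug/L.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for metal, m_i in monitored.items():
        s_i = standards[metal]
        w_i = k / s_i            # unit weightage, inversely proportional to S_i
        q_i = 100.0 * m_i / s_i  # sub-index of the metal
        weighted_sum += w_i * q_i
        weight_total += w_i
    return weighted_sum / weight_total


def classify_hpi(hpi):
    """Classes follow the ranges used in Table 5; boundaries are an assumption."""
    if hpi <= 25:
        return "Excellent"
    if hpi <= 50:
        return "Good"
    if hpi <= 75:
        return "Poor"
    if hpi <= 100:
        return "Very poor"
    return "Unsuitable"


# Standards (ug/L) from Table 4; monitored values are hypothetical.
standards = {"As": 10.0, "Pb": 10.0, "Cd": 3.0}
monitored = {"As": 20.0, "Pb": 5.0, "Cd": 1.5}

hpi = heavy_metal_pollution_index(monitored, standards)
hpi_class = classify_hpi(hpi)
```

Because the weight is the reciprocal of the standard, metals with strict limits such as Cd dominate the index even at low absolute concentrations.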
S/no | Study | Region/Country | No. of samples | Excellent (0-25) | Good (26-50) | Poor (51-75) | Very poor (76-100) | Unsuitable (>100)
1 | Basahi et al.40 | Wadi Baysh Basin, western Saudi Arabia | 182 | 49 | 17 | 27 | 23 | 66
2 | El-Ameir48 | Damietta Branch of Nile River, Egypt | 4 | - | - | - | - | 4
3 | Ma et al.150 | Yellow River, Northern China | 10 | - | - | - | - | -
4 | Maurya and Srivastava50 | Agra districts of Uttar Pradesh, India | 12 | 12 | - | - | - | -
5 | Ehya and Marbouti55 | Behbahan plain, SW Iran | 30 | - | - | - | 1 | 29
6 | Kumssa et al.110 | North Rift and North Eastern Kenya | 9 | - | 7 | 2 | - | -
7 | Mehrabi et al.115 | Ahangaran mining district, west of Iran | 28 | 28 | - | - | - | -
8 | Vesali Naseh et al.142 | Ghaen Plain, Iran | 16 | 16 | - | - | - | -
9 | Yazidi et al.147 | Ichkeul Lake, Northern Tunisia | 20 | 20 | - | - | - | -
10 | Abu Khatita et al.79 | South Eastern Sinai, Egypt | 35 | 5 | 4 | 2 | 4 | 20
11 | Kwaya et al.49 | Maru town and environs, NW, Nigeria | 29 | 8 | 1 | - | - | 20
12 | Khazaala et al.105 | Lake Habbaniyah, Al-Anbar, Iraq | 50 | 32 | 8 | 5 | 5 | -
Total | | | 425 | 180 | 37 | 36 | 33 | 139
Percentage | | | - | 42.35 | 8.71 | 8.47 | 7.76 | 32.71
Table 5 Example of literature report of water quality classification using HPI
The HPI values computed from the literature reports showed that 180/425 (42.35%) of the sources of water supply in drylands fall in the Excellent Class, 37/425 (8.71%) fall in the Good Class, 36/425 (8.47%) fall in the Poor Class, 33/425 (7.76%) fall in the Very Poor Class and 139/425 (32.71%) fall in the Unsuitable Class (Figure 4). Overall, 51.06% of the sources of water supply fall in the Excellent to Good Classes, whereas 48.94% fall in the Poor to Unsuitable Classes.
Understanding controls on water quality
An integrated assessment of water quality using WQI and HPI, together with statistical techniques, notably Correlation (r), Principal Component Analysis (PCA), Factor Analysis (FA) and Hierarchical Cluster Analysis (HCA), will aid understanding of the origin of solutes and the processes controlling water composition in streams and aquifers in arid and semi-arid environments. Assessment of water quality in drylands is essentially a multivariate problem owing to the wide range of physicochemical parameters (variables) measured at several sampling locations (observations).52–63,39,40 Consequently, Correlation (r), PCA/FA, and HCA are increasingly used to analyze hydrochemical data and have been recognized as suitable statistical techniques for understanding the hydrogeochemical composition of water bodies (Table 1).
Correlation analysis
Correlation analysis provides a basic tool for studying water/rock-mineral interactions. Concentration levels and relationships between elements can reveal the source of solutes and the processes that produced the observed water chemistry.64 It can be assumed that a substantial amount of HCO3 comes from the dissolution of carbonate minerals in streams and aquifers through the action of infiltrating water (recharge) enriched with CO2 after interaction with the atmosphere. The dissolution of carbonate minerals therefore releases Ca into solution, producing a Ca-HCO3 water type.64 Calculating the slopes of Ca, Mg and Na against HCO3 gives valuable evidence on the stoichiometry of the process. All of these can be understood using correlation analysis. For instance, a significant relationship (r ≥ 0.50) between Ca and HCO3 suggests that Ca is derived from calcite.64
However, a poor correlation between these elements (r ≤ 0.40) may suggest that Ca ions originate from the dissolution of gypsum, which can contribute SO4 and Ca ions to streams and aquifers. Thus, a significant correlation (r ≥ 0.50) between Ca and SO4 may suggest that Ca is derived from gypsum.64 If Ca and Mg correlate significantly (r ≥ 0.50), it indicates that the two ions have the same origin. In the same vein, a significant correlation (r ≥ 0.50) between SO4 and Mg is an indicator that parts of the SO4 and Mg are derived from magnesium sulfate minerals.64 Moreover, if HCO3, SO4, Mg and Ca originate from the simple dissolution of gypsum, dolomite and calcite rocks, then a charge balance must exist between the cations and anions. If (HCO3 + SO4) is deficient relative to (Ca + Mg), the excess positive charge of Ca and Mg must be balanced by Cl, the only other major anion. This may further suggest that HCO3, SO4, Mg and Ca are not derived solely from gypsum, dolomite and calcite minerals.64
Though anthropogenic inputs can be measured through variations in TDS between sampling locations, ions including Na, SO4, NO3, and Cl in streams and aquifers can also be derived from anthropogenic sources such as municipal wastes, fertilizer application, and organic wastes. Thus, a significant correlation (r ≥ 0.50) between TDS and these ions is a strong indicator of water pollution from anthropogenic activities.64 However, a significant correlation (r ≥ 0.50) between TDS and ions derived from rock minerals may suggest silicate weathering reactions. It is equally important to note that some elements derived primarily from rock minerals, such as Cd, Cr, and Pb, can also be added to the environment through industrial sewage. For instance, Cd and Cr are added to surface waters through effluent discharges from dyeing plants and the textile, paint, electroplating, and tanning industries. While Pb is primarily derived from ores, substantial amounts of Pb can enter surface and groundwater from effluent discharges.65
Although correlation analysis can be used to establish the origin of ions in surface and groundwater, in-depth analyses of geology and land use are required. Otherwise, conclusions relating to the origin of pollutants in streams and aquifers can be vague. In-depth knowledge of the geochemistry of the study area and its land use types, in addition to correlation analysis, will aid understanding of the origin of heavy metals such as Cd, Cr, and Pb in streams and aquifers. A significant correlation (r ≥ 0.50) between TDS and ions that can be derived from anthropogenic sources, such as NO3, Cl and Na, can indicate water pollution from anthropogenic activities. Thus, a significant correlation (r ≥ 0.50) between the (NO3 + Cl)/Na molar ratio and the (NO3 + Cl)/HCO3 molar ratio can be used to further support anthropogenic inputs or the impact of urbanization on water quality.64
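The screening step described in this section, computing r for pairs of ions and flagging those at or above 0.50, can be sketched as follows. The sketch assumes that r denotes the Pearson correlation coefficient, which the text does not state explicitly. The function names and the five-sample dataset are hypothetical, and a flagged pair is only a prompt for geological and land use interpretation, not evidence of a source on its own.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def significant_pairs(data, threshold=0.50):
    """Return (ion_a, ion_b, r) for every pair with r >= threshold."""
    names = list(data)
    hits = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(data[a], data[b])
            if r >= threshold:
                hits.append((a, b, r))
    return hits


# Hypothetical concentrations (mg/l) at five sampling locations.
data = {
    "Ca": [20.0, 35.0, 50.0, 65.0, 80.0],
    "HCO3": [60.0, 100.0, 150.0, 190.0, 240.0],
    "NO3": [12.0, 3.0, 9.0, 4.0, 10.0],
}
hits = significant_pairs(data)
```

In this invented dataset only the Ca-HCO3 pair passes the threshold, the pattern the text associates with calcite dissolution.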
Principal component analysis
Principal component analysis (PCA) is among the leading multivariate statistical techniques applied in the interpretation of hydrochemical data.65–69,38 PCA is applied to reduce the dimensionality of hydrochemical data, whose variables tend to be intercorrelated, to a smaller set of ‘principal components’ (PCs) that can be interpreted.44 Basically, PCA comprises two steps: standardization of the data and extraction of PCs.44 PCA takes the information contained in a correlation matrix and reorders it in a way that better explains the fundamental processes that produced the observed concentrations of ions. It starts by generating a new set of hydrochemical variables from the original dataset (i.e. the PCs), which are linear combinations of the original parameters. The eigenvalues and eigenvectors of the correlation matrix are extracted first, and the less significant components are then discarded. The PCs of the dataset are subsequently obtained from the eigenvectors. The first PC therefore describes the largest part of the variance, while subsequent PCs describe progressively smaller parts of the variance. The PC loadings reveal how strongly, and in which direction (negative or positive), each hydrochemical variable is related to a PC. For instance, PCs with high positive loadings (≥0.65) of ions such as NO3, Cl, PO4, and Na can be related to anthropogenic inputs, in the absence of geologic sources in the study area.70
In PCA, the Kaiser criterion71 can be used to define the number of PCs to be extracted: only the PCs that best explain the variance of the analyzed hydrochemical data (i.e. eigenvalue >1) are retained for further analysis. How well the variance of a particular hydrochemical variable is explained by a specific set of factors is measured as its ‘communality’. The communality of a variable is obtained by squaring its loadings in the PC matrix and summing them across the retained PCs. Ideally, if a PCA is effective, the PCs will be easily interpretable in terms of specific processes influencing the hydrochemical composition of a stream or groundwater aquifer. Thus, communalities will be high (~1) and the number of PCs will be small. In water quality analysis, PCA is performed on a subset of selected variables (e.g. pH, EC, Temp., TDS, TSS, Ca, Mg, Na, K, HCO3, Cl, SO4, H4SiO4, Al, Ba, Be, Fe, Li, Mn, Pb, Se, and Sr), which may represent the overall water quality profile.
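The eigenvalue >1 rule and the communality calculation can be illustrated with a deliberately small case. For two standardized variables with correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r, and the loadings on the two PCs are ±sqrt((1 ± r)/2). The sketch below uses that closed form with an assumed r = 0.8 between Ca and HCO3; the function names and the value of r are hypothetical, and real datasets would need a numerical eigen-decomposition instead.

```python
from math import sqrt


def kaiser_retained(eigenvalues):
    """Indices of the components whose eigenvalue exceeds 1."""
    return [j for j, ev in enumerate(eigenvalues) if ev > 1.0]


def communalities(loadings, retained):
    """Sum of squared loadings of each variable over the retained PCs."""
    return {var: sum(row[j] ** 2 for j in retained) for var, row in loadings.items()}


# Two-variable illustration with an assumed correlation r = 0.8.
r = 0.8
eigenvalues = [1 + r, 1 - r]
loadings = {
    "Ca": [sqrt((1 + r) / 2), sqrt((1 - r) / 2)],
    "HCO3": [sqrt((1 + r) / 2), -sqrt((1 - r) / 2)],
}

retained = kaiser_retained(eigenvalues)
comm = communalities(loadings, retained)
```

Only the first PC is retained here, and it explains 90% of the variance of each variable, so both communalities are close to 1, as the text expects for an effective PCA.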
Though some information on the variability between objects (parameters) or sampling locations (observations) may be lost through the transformation, the description of the system is considerably simplified and can be readily visualized to derive evidence on the relationships between parameters and observations.72 The PCA bilinear model can be written following the matrix decomposition equation:72
$$X = T P^{T} + E \qquad \text{(Eq. 8)}$$
where $X$ is the data matrix, which is compressed into the scores matrix $T$, $P^{T}$ is the matrix of loadings and $E$ is the residual matrix.72 The scores matrix provides information on the patterns of pollution sources among observations, while the matrix of loadings provides information on the influence of the original variables on each of these physicochemical patterns or sources.
The principle of PCA
For a better understanding of why PCA is commonly applied in water quality studies, the following principles should be noted:73
The data structure of the matrix is often revealed once the boundaries of the PCA technique and the scores-scores plot are examined. A mathematical or statistical technique will process a collection of figures whether or not they are analytically meaningful; it is the duty of the user to ensure that the data are of suitable quality. Although statistical analysis may not always reveal anything noteworthy, PCA is a data transformation technique that simplifies the four points (a-d) outlined above. Following this procedure, the new axes, termed principal components (PCs), are selected based on a linear model (Eq. 9) so that PC1 describes the greatest variance in the data set, followed by PC2, which describes the second greatest amount of variance but is constructed orthogonally to PC1 and is therefore independent of it.
$$PC_{jk} = a_{j1}x_{k1} + a_{j2}x_{k2} + \dots + a_{jn}x_{kn} \qquad \text{(Eq. 9)}$$
where $PC_{jk}$ is the value of principal component j for object k (the score of object k on component j), $a_{j1}$ is the loading of variable 1 on component j, $x_{k1}$ is the measurement value for variable 1 on object k and n is the total number of variables studied. The analysis can be repeated until the number of PCs equals the number of original variables. The benefit of PCA is that the variance in the data set is mostly concentrated in the first few PCs, hence the reduction in the size of the multivariate matrix.
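The linear model for the scores amounts to a weighted sum of the measurements of each object, one sum per component. A minimal sketch, assuming the measurements are already standardized and using hypothetical coefficient rows (chosen to be orthonormal so the two components are orthogonal, as the text requires):

```python
def pc_scores(samples, coefficients):
    """Score of object k on component j: sum over variables of a_ji * x_ki.

    samples: one row of (standardized) measurements per object k.
    coefficients: one row of coefficients a_j1..a_jn per component j.
    """
    n_vars = len(coefficients[0])
    scores = []
    for row in samples:
        if len(row) != n_vars:
            raise ValueError("each sample needs one value per variable")
        scores.append([sum(a * x for a, x in zip(comp, row)) for comp in coefficients])
    return scores


# Hypothetical coefficients for two components over two variables.
coefficients = [
    [0.6, 0.8],   # PC1
    [0.8, -0.6],  # PC2, orthogonal to PC1 (0.6*0.8 - 0.8*0.6 = 0)
]
# Hypothetical standardized measurements for two sampling locations.
samples = [
    [1.0, 2.0],
    [-1.0, 0.5],
]

scores = pc_scores(samples, coefficients)
```

Each inner list holds the PC1 and PC2 scores of one sampling location; plotting them against each other gives the scores-scores plot mentioned above.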
Interpretation of PC loadings
The number of components extracted (often 1-5) is normally based on the cumulative proportion of variance explained, typically a percentage greater than 80%.74,70 For instance, based on a ‘scree test’, the retained PCs with their characteristic loadings suggest that different identifiable contributions are involved in controlling the hydrochemical composition of streams and aquifers. PC 1, which explains the largest variance in the data matrix, may have high positive loadings on particular ions of either rock-mineral or anthropogenic origin, and the factor can then be related to the corresponding source of solutes in streams and aquifers.
However, relating ions to their sources may require a thorough understanding of the geochemistry and land use practices in the study area. For instance, if a particular factor has high positive loadings (≥0.65) on physical parameters such as EC and temperature, this can be deemed reasonable since temperature is closely associated with EC in streams and groundwater aquifers: the latter rises by about 2% for each 1 °C rise in temperature. A temperature range of 5 to 10 °C in gravity-flow water also affects TDS levels, which eventually affects the solubility of gases, ion exchange capacity, redox reactions, sorption processes, complexation, speciation, and pH (EPA, 2001).
For a better interpretation of factor loadings and for relating them to either rock-mineral or anthropogenic sources in drylands, a seasonal sampling approach must be employed, because the behavior of certain ions tends to be correlated with seasonal rainfall and/or recharge. For instance, negative loadings on pH in a particular PC can be deemed reasonable since pH usually has an inverse relationship with ions of carbonate origin.74 The application of PCA can provide the information needed for the hydrochemical and geographical understanding of the data (Kokot and Stewart, 1995; Kokot et al., 1998; Olsen et al., 2012).73,75 However, extracting this type of information may require submitting the PCA scores to another multivariate statistical tool for unsupervised classification, namely hierarchical cluster analysis (HCA).
Hierarchical cluster analysis
The objective of applying HCA in water quality studies is too grouped sampling locations that have similar hydrochemical attributes into classes (i.e. clusters). Such a clustering technique would help in recognizing hydrochemical data sets of the locations base on the sources of solutes, i.e. anthropogenic or natural.74,44 The HCA is an unverified outline identification method that exposes inherent assembly or pattern recognition of a dataset without a prior hypothesis with regards to the data so that the objects of the system can be classified into clusters based on their resemblances.74,44
Basically, there are two major categories of HCA: (i) non-hierarchical; and (ii) hierarchical. The former is a widely used technique which can form clusters consecutively, beginning with the most identical pair of parameters and forming complex clusters after each step which is repeated until a single cluster comprising all the observations is attained.74,44 The results are presented as a dendrogram, which offers a graphic summary presenting an image of the clusters and their closeness with a studied decrease in the dimensionality of original observations.74,44 In clustering, observations with similar characteristics or else observations with dissimilarity would be collected into an identical group.74,44 Often the Ward’s-algorithmic clustering technique subsequent to the squared Euclidean distance, is applied. This is measured as the most influential means of clustering.74,44 Before the clustering analysis, the hydrochemical data, xji is standardized by Z-scale transformation, Eq. 10:
$$z_{ji} = \frac{x_{ji} - \bar{x}_j}{S_j} \qquad \text{(Eq. 10)}$$
where zji = standardized value, xji = value of the jth hydrochemical parameter measured at the ith location, x̄j = mean (spatial) value of the jth parameter and Sj = standard deviation (spatial) of the jth parameter.
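The standardization in Eq. 10 can be illustrated with a minimal Python sketch. The function name `z_standardize` and the EC and pH readings are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form of Sj is intended.

```python
from statistics import mean, stdev

def z_standardize(values):
    """Return z-scores for one parameter measured at several locations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1); an assumption
    if s == 0:
        raise ValueError("parameter has zero variance and cannot be standardized")
    return [(x - m) / s for x in values]

# Hypothetical readings at five sampling locations (illustrative values only)
ec = [250.0, 900.0, 1200.0, 430.0, 1500.0]   # electrical conductivity, uS/cm
ph = [7.1, 7.8, 8.2, 6.9, 8.0]               # pH, dimensionless

z_ec = z_standardize(ec)
z_ph = z_standardize(ph)
```

After the transformation both parameters have zero mean and unit standard deviation, so EC no longer dominates a distance calculation simply because its raw values are numerically larger than those of pH.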
The clustering achieved with standardized data is expected to be less influenced by large and/or small variances in the hydrochemical data. Also, the influence of the diverse measurement units of the data is removed by making the data dimensionless.44 In water quality studies, HCA is performed on a subset of selected variables (e.g. pH, EC, Temp., TDS, TSS, Ca, Mg, Na, K, HCO3, Cl, SO4, H4SiO4, Al, Ba, Be, Fe, Li, Mn, Pb, Se, and Sr) which represent the overall water chemistry profile. Depending on geology, land use and the studied ions, individual clusters can be related to natural geogenic processes or anthropogenic activities.
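The agglomerative procedure described above can be sketched in plain Python as follows. This is a simplified illustration of Ward's criterion, in which the pair of clusters whose merger gives the smallest increase in within-cluster sum of squares (based on squared Euclidean distance between centroids) is merged at each step. It is not the implementation used in any of the reviewed studies; the function names and the standardized values for six sampling locations are hypothetical.

```python
def centroid(points):
    """Mean position of a list of points (each point is a list of floats)."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(rows, k):
    """Merge locations (rows of z-standardized data) until k clusters remain."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of locations")
    clusters = [[i] for i in range(len(rows))]  # start: one cluster per location
    merges = []  # (members_a, members_b, cost) in merge order
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([rows[m] for m in clusters[i]],
                                 [rows[m] for m in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merges.append((clusters[i], clusters[j], cost))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(clusters), merges

# Hypothetical z-standardized values of two parameters at six locations
rows = [[-1.0, -0.9], [-1.1, -1.0], [-0.9, -1.1],
        [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
groups, merges = ward_clusters(rows, 2)
```

The recorded merge order and costs in `merges` carry the information a dendrogram displays. For real datasets an established statistical package would normally be preferred over this exhaustive pairwise search, which becomes slow as the number of locations grows.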
The need for an integrated approach to water quality analyses in drylands
Comparison of the studies in Table 1 showed that 15.57% of studies reported WQI and 12.29% reported HPI. This suggests that most water quality studies in arid and semi-arid areas do not adequately report WQI and HPI. Thus, water suitability for drinking, agriculture, and industrial uses remains poorly known in arid areas (Figure 5). In contrast, 37.70%, 67.21%, and 64.75% of studies applied correlation analysis, PCA and HCA, respectively, in water quality investigations, indicating the wider extent to which the processes controlling water chemistry are understood in global drylands (Figure 6). However, the application of these statistical techniques alone is not enough, because the suitability of ion concentrations in water supply sources for drinking, agriculture, and industrial uses cannot be revealed by statistical applications alone; this calls for an integrated approach to water quality analysis.
Figure 5 Example of studies on water quality using WQI, HPI, Correlation, PCA and HCA in arid and semi-arid environments.
Figure 6 Example of literature reports on major processes controlling water quality in arid and semi-arid environments.
Accordingly, it is imperative to note that, although WQI, HPI, correlation, PCA and HCA provide simple tools for assessing water quality, especially in underdeveloped countries where improved water supply is mostly lacking, particularly in remote areas, the computation of WQI and HPI cannot compensate for poor field sampling or laboratory analyses. Therefore, appropriate field sampling and laboratory analyses are essential for accurate computation and reporting of WQI and HPI. One of the best ways of testing the internal consistency of water quality data is the Chemical Balance Error (CBE).24 Thus, it is beneficial to apply the CBE (Eq. 11) to test the internal consistency of hydrochemical data.
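$$\mathrm{CBE}\ (\%) = \frac{\sum \text{cations} - \sum \text{anions}}{\sum \text{cations} + \sum \text{anions}} \times 100 \qquad \text{(Eq. 11)}$$
where the total concentrations of cations and anions are in meq/l.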
Typically, the cations and anions should balance, and under normal circumstances, where the study recovers most of the elements, the cations and anions must not differ by more than 5%.24 Further, apart from the CBE, the concentrations of certain ions in comparison with the total dissolved ion content, and the consistency between physical parameters (pH, EC, TDS) measured in situ and those determined in the laboratory, can be used to assess the internal consistency of the water quality data.24 The TDS level of each sample, for example, can be checked by adding up the concentrations of the major ions. These summed concentrations can then be compared with the TDS levels determined in situ. When these checks agree, the data are essentially consistent internally and can be employed for further analyses.24
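The balance check can be sketched in Python as below. The sketch assumes the commonly used form of the error, 100 × (Σcations − Σanions) / (Σcations + Σanions), with all concentrations in meq/l. The equivalent weights are rounded standard values (molar mass divided by the absolute ionic charge), while the sample concentrations and the function names are hypothetical.

```python
# Equivalent weights in g/eq (molar mass / |charge|), rounded standard values
EQ_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,   # cations
    "HCO3": 61.02, "Cl": 35.45, "SO4": 48.03,            # anions
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "Cl", "SO4")

def to_meq(sample_mg_l):
    """Convert major-ion concentrations from mg/l to meq/l."""
    return {ion: conc / EQ_WEIGHT[ion] for ion, conc in sample_mg_l.items()}

def chemical_balance_error(sample_mg_l):
    """Return the balance error in percent (assumed form of Eq. 11)."""
    meq = to_meq(sample_mg_l)
    cations = sum(meq[ion] for ion in CATIONS)
    anions = sum(meq[ion] for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)

def is_consistent(sample_mg_l, limit=5.0):
    """True when the absolute balance error is within the 5% limit."""
    return abs(chemical_balance_error(sample_mg_l)) <= limit

# Hypothetical groundwater sample, concentrations in mg/l
sample = {"Ca": 60.0, "Mg": 20.0, "Na": 45.0, "K": 5.0,
          "HCO3": 250.0, "Cl": 50.0, "SO4": 70.0}
cbe = chemical_balance_error(sample)
ok = is_consistent(sample)
```

A negative value indicates an excess of anions over cations. Only the seven major ions listed are considered here; a sample with significant nitrate or other unmeasured ions would need those added before the 5% criterion is applied.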
Though WQI, HPI, correlation, PCA and HCA provide simple tools for the assessment of water quality, an integrated modelling technique incorporating these tools is lacking. Thus, there is a dire need for the joint application of water quality indices and multivariate statistics in future studies on water quality in drylands. In this review, we attempted a model design that can be used for water quality assessments in drylands (Figure 7). The need for the integrated conceptual model design is perhaps due to the distinctive climatic conditions of arid and semi-arid environments, mainly high temperatures and low rainfall. The latter affects the volume of water received by streams and aquifers, whereas the former affects the solubility of gases, ion exchange capacity, redox reactions, sorption processes, complexation, speciation and pH level in both streams and groundwater aquifers. However, several types of trace elements and heavy metals of rock origin are increasingly being added to streams and groundwater as a consequence of anthropogenic activities. Thus, defining the origin of these elements requires an integrated approach and in-depth analyses of the hydrogeochemical configuration of the study area, as well as land use types. Overall, natural geogenic processes, seasonality and environmental change (drought) appear to exert more control on water quality than human activities such as industry, agriculture, mining, and urbanization.
The literature is unanimous about the need for understanding the natural geogenic and anthropogenic processes controlling water quality in arid and semi-arid areas. Drylands constitute a distinctive ecological system, which is mainly characterized by low rainfall, high evaporation rates, and poor vegetation cover. These, coupled with human activities such as mining, irrigation, municipal and industrial water demand, and improper sewage discharge from urban and industrial sources, have threatened water quality. Relating literature reports to the aforementioned drivers of water use results in the following remarks:
While there is significant reporting on water quality in arid and semi-arid areas around the world, water quality investigations are largely influenced by the prevailing environmental conditions as well as anthropogenic activities. These factors are also highly varied in arid and semi-arid environments. Thus, establishing the controls on water quality may be very difficult. Hence, water quality reports must be interpreted within the framework of the existing environmental conditions, the time of sampling, and the standards regularly applied for reporting water quality in the literature.
This study was supported by Universiti Teknologi Malaysia. Sincere thanks to all anonymous contributors.
The authors declare that there is no conflict of interest associated with this paper.
©2023 Wali, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.