Research Article Volume 5 Issue 1
^{1}Associate Professor (PhD, CQRM), Mathematics with Economics Programme, Universiti Malaysia Sabah, Malaysia
^{2}Lecturer, Environmental Science Programme, Universiti Malaysia Sabah, Malaysia
Correspondence: Noraini Abdullah, Associate Professor (PhD, CQRM), Mathematics with Economics Programme, Universiti Malaysia Sabah, Jln.UMS 88400, Kota Kinabalu, Sabah, Malaysia, Tel 0163421243
Received: December 08, 2019  Published: January 22, 2020
Citation: Abdullah N, Tair R. Relationship of heavy metals concentration accumulation via geloina similis physical properties using exponential regression models. MOJ Eco Environ Sci. 2020;5(1):1726. DOI: 10.15406/mojes.2020.05.00172
Modelling applications in the environmental sciences are currently much sought after for identifying the determinants affecting an ecosystem. This study demonstrates the modelling procedures used to examine the relationship between the concentration of heavy metals in the soft tissues and the physical properties of Geloina similis, using an exponential regression modelling approach. Data were collected from a mangrove lagoon in Salut, in the state of Sabah, East Malaysia. Experimental analyses were carried out in the laboratory of the Environmental Science Programme of the Faculty of Science and Natural Resources, Universiti Malaysia Sabah. The physical properties of the mollusc (Geloina similis) considered were differences in length, height, width, wet weight and dry weight, together with the length and width of the soft tissues. The mathematical modelling procedures involved listing all possible models, transforming the nonlinear models into linear models, a multicollinearity test and a coefficient test, followed by the Runs test and a residual normality test. The best model obtained was tested for robustness and predictive accuracy via the Mean Absolute Percentage Error (MAPE). Findings showed that the physical properties of Geloina similis involving height (X_{2}), wet weight (X_{4}), and the interaction between height and length tissue (X_{2}A) contributed significantly to the concentration of heavy metals accumulation (HMA), as given by the equation: $P_{Zn}=0.787e^{0.587X_{2}+0.363X_{4}-0.159X_{2}A}$. Given the absorption habit of Geloina similis and the presence of heavy metals in the soil, it can be concluded that the concentration of heavy metals in the soft tissues of G.similis has a significant relationship with the physical properties of the mollusc.
Keywords: environmental sciences, physical properties, heavy metal accumulation, model transformation, exponential models
MAPE, mean absolute percentage error; HMA, heavy metals accumulation; Pb, Lead; Cu, Copper; Cr, Chromium; Cd, Cadmium; Zn, Zinc; GOF, goodness-of-fit; DV’s, dependent variables; VIF, variance inflation factor; 8SC, eight selection criteria; IV’s, independent variables; WT, width tissue; LT, length tissue; WWt, wet weight; DWt, dry weight
Mangrove forests are among the nurseries for marine creatures, especially the bivalve molluscs living in the sediment. They support the ecosystem by providing habitat for several species of mud clam. Mangrove ecosystems are regarded as important intertidal wetlands that can be found only in tropical or subtropical areas.^{1} The ecosystems provide a diversity of ecological benefits: they are highly productive and also serve as a nursery and a haven for biodiversity. In addition, mangrove forests protect against coastline erosion. Mangrove roots trap soft sediments, and these sediment-trapping root systems cushion the coastal area against wave-induced erosion and protect coastal ecosystems from erosion of the shoreline.^{2} Geloina similis is a rare mud clam found in mangrove swamps, especially in the bottom sediment. Recently, the mud clam has been severely contaminated by heavy metals because intense industrialization contributes to high concentrations of heavy metals in the sediment. G.similis stores these contaminants in its soft tissues.^{3}
Long-term pollution caused by human activities has led to high heavy metal contamination being recorded in mangrove sediments all over the world.^{4,5} Heavy metals are considered among the most harmful pollutants in the natural environment because of their toxicity, persistence and bioaccumulation, besides being non-biodegradable.^{6} Sediments containing high concentrations of metals, once ingested by suspension filter-feeders, therefore become bioavailable sources of metal uptake by the mussels.^{7} Mangrove forests also play an important role in the biogeochemistry of trace metal contaminants in coastal areas: they buffer and immobilize heavy metals before these reach nearby aquatic ecosystems.^{8} Heavy metal pollution in mangroves is associated with anthropogenic inputs such as industrial effluents, agro-based industries, agricultural runoff, sewage treatment plants, leaching from domestic garbage dumps, urbanization, and chemical and oil spills.^{9} Heavy metals such as copper, lead and zinc accumulate in aquatic organisms that are then consumed by humans. Thus, the presence of heavy metals has received significant attention due to their long-term effects on the environment, especially in coastal regions. While various types of Heavy Metals Accumulation (HMA) exist, this paper focuses on the concentrations of Lead (Pb), Copper (Cu), Chromium (Cr), Cadmium (Cd) and Zinc (Zn). Some studies exist on G.similis and other Geloina species. However, no previous research on the bivalve mollusc Geloina similis using a mathematical model is available.
While other works have referred to a linear relationship, this study goes further and determines the nonlinear relationship between the concentrations of heavy metals accumulated in the soft tissues and the physical properties of Geloina similis, using exponential modelling procedures.
Study site
This study collected ninety samples of G.similis from the Salut mangrove forest area in Sabah, at latitude 6˚6’4.18”N and longitude 116˚10’22.78”E, as shown in Figure 1.^{10} The data included the physical properties of the G.similis samples, namely the height and width of the mollusc shells and the wet weight and dry weight of the total soft tissue, together with the accumulated concentration of each heavy metal (Zn, Pb, Cu, Cr and Cd) measured during the laboratory analyses.
Mathematical modelling
Exponential models are frequently used for problems involving changes in populations, pollution, temperature, bank savings, drugs in the bloodstream and radioactive materials, to name a few. A function is classified as exponential when it has a constant base and a variable exponent. The general form of the exponential model is shown in Equation (1) below:
General Exponential Model Equation: ${P}_{i}={a}_{i}{e}^{{b}_{i}({X}_{i})},{b}_{i}>0$ (1)
Where P = dependent variable, X = independent variable, a and b = constants, and e = base of the function, for i=1, …, n, where n is the number of dependent variables. The constant b is required to be greater than zero for the function to be valid; otherwise the value of the dependent variable P cannot be determined. Although the exponential function and the logarithmic function appear similar in form, they are not the same: the logarithmic function is the inverse of the exponential function. Taking logarithms of an exponential model with a general base, $P_{i}=a_{i}b_{i}^{c_{i}(X_{i})}$, gives the linear form shown in Equation (2):
Logarithmic Function: $Y_{i}=\ln P_{i}=\ln\left(a_{i}b_{i}^{c_{i}(X_{i})}\right)=\ln a_{i}+c_{i}\ln b_{i}\,(X_{i})=\beta_{0}+\beta_{i}X_{i}$ (2)
Where b = base of the function, a and c = constants, and X = independent variable, for i=1, 2, …, n, where n is the number of dependent variables.
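The log transformation in Equation (2) can be illustrated with a short sketch. It fits $P=ae^{bX}$ by ordinary least squares on $\ln P$; the function name `fit_exponential` and the data values are hypothetical and are not taken from the study, which carried out its regressions in SPSS.

```python
import numpy as np

def fit_exponential(x, p):
    """Fit P = a * exp(b * x) by least squares on the log-transformed response."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    if np.any(p <= 0):
        raise ValueError("the logarithm requires strictly positive responses")
    y = np.log(p)                                  # Y = ln P
    design = np.column_stack([np.ones_like(x), x]) # columns for beta_0 and beta_1
    (beta0, beta1), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.exp(beta0)), float(beta1)      # a = exp(beta_0), b = beta_1

# Hypothetical, noise-free data generated from a = 2.0 and b = 0.3
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p_demo = 2.0 * np.exp(0.3 * x_demo)
a_hat, b_hat = fit_exponential(x_demo, p_demo)
```

Because the fit is done on the log scale, the intercept must be exponentiated to recover the multiplicative constant a.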
Before the exponential regression equation is applied on any case of study, basic assumptions are important to be identified. This is done to check whether the equation is appropriate to the case, and hence satisfies these assumptions. Three assumptions that are needed to be considered are:
Data analyses
Figure 2 depicts the modelling flowchart showing all the procedures involved in this research, starting with the experimental data (site data collection and laboratory analysis), followed by identification of the mathematical variables, factor analysis and dummy transformation, treatment of outliers, and data partitioning for modelling (90%) and forecasting (10%). Preparing the data facilitates statistical analysis, and this includes checking for data normality, identifying the necessary extracted variables, statistically adjusting for outliers, and data transformation.^{11} Non-normal data were transformed towards normality using basic transformations such as the logarithmic and square root transformations.^{12}
The exponential model obtained then undergoes normality and randomness tests for its goodness-of-fit (GOF), and finally the transformed data are substituted back into the exponential regression equation for interpretation. A procedural summary of the modelling flowchart is shown in Figure 2.
Dummy variables were identified by carrying out Factor Analysis in SPSS version 22. Factor analysis selects factors by extraction so as to identify the importance of the chosen variables. Theory is the first criterion for determining the number of factors to be extracted: the number of factors extracted has to make sense theoretically. Criteria for the practical and statistical significance of factor loadings can be classified by magnitude as follows:
Variables with factor loadings lower than 0.50 are chosen as dummy variables. In this study, quantitative measures are dichotomized at the median value, a practice known as the median split.^{13} An observation that appears to deviate markedly from the other observations in the sample is called an outlier.^{14} Outliers were identified using boxplots and a table of extreme values. Winsorization is a common way of dealing with outliers. It is a statistical transformation that restricts extreme values in order to reduce the influence of possibly spurious outliers: one or more data points in the tails of the distribution are modified to the next highest or lowest value within the distribution that is not suspected to be an outlier. Instead of truncating or trimming the outliers, they are simply replaced by the next lowest or highest value in the corresponding tail. Winsorization was used because the valid data points were derived from a heavy-tailed distribution. Left untreated, the outliers might affect the statistical analyses, in particular the accuracy of the p-values of the significance tests.
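A minimal sketch of this winsorization step is given below. It assumes that outliers are flagged by the common boxplot rule of 1.5 times the interquartile range beyond the quartiles; the exact rule applied by the software used in the study may differ, and the function name and sample values are hypothetical.

```python
import numpy as np

def winsorize_to_nearest(values, k=1.5):
    """Replace flagged outliers by the nearest value that is not flagged.

    Assumption: a point is an outlier if it lies more than k * IQR
    below the first quartile or above the third quartile.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = x[(x >= lower) & (x <= upper)]   # points not suspected to be outliers
    # Outliers in the upper tail become the highest inside value,
    # outliers in the lower tail become the lowest inside value.
    return np.clip(x, inside.min(), inside.max())

# Hypothetical sample with one extreme value in the upper tail
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
treated = winsorize_to_nearest(sample)
```

No observation is dropped, so the sample size available for modelling is unchanged.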
Data distribution
The data collected thus underwent experimental laboratory analyses for the five heavy metals accumulated in the soft tissues. The data set for each metal was then divided into two partitions: 90% for partition P1 (for modelling) and 10% for partition P2 (for prediction).
Descriptive statistics were used to describe the basic features of the data by providing simple summaries of the sample and measures, together with simple graphical analysis.^{15} Quantitative descriptions were also presented in a manageable form.^{16} The model-building procedures follow those in^{17,18} and the Four Phases.^{19}
The phases of the model-building approach in Figure 3 can be described briefly as follows:
Phase 1 - all possible models: In developing the exponential models for these data sets, the concentrations of heavy metals accumulated in the soft tissues are the Dependent Variables (DV’s), denoted by P_{i}, where i=1,2,…,5 for the five heavy metals tested, whereas length (X_{1}), height (X_{2}), width (X_{3}), dry weight (X_{4}) and wet weight (X_{5}) are the Independent Variables (IV’s). Length tissue (A) and width tissue (B) were included in the models as independent dummy variables. The dummy variables were excluded when counting the possible models but were included in the models before the model-building procedures were carried out. The number of all possible exponential models, N, can be calculated using the formula:
$N=\sum_{j=1}^{q} j\left({}^{q}C_{j}\right)$ (3)
Where, ‘N’ is the number of all possible models generated, and ‘q’ is the number of variables, and j=1, 2,…, q.
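As a quick check of Equation (3), the count can be evaluated directly. The sketch below uses Python's standard `math.comb`; the helper name `count_models` is illustrative only.

```python
from math import comb

def count_models(q):
    """Number of all possible models, N = sum over j of j * C(q, j)."""
    return sum(j * comb(q, j) for j in range(1, q + 1))

# Contribution of each subset size j when q = 5 independent variables
per_size = {j: j * comb(5, j) for j in range(1, 6)}
total = count_models(5)
```

For q=5 the contributions by subset size are 5, 20, 30, 20 and 5, which sum to 80.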
Phase 2 - selected models: After the models were run in SPSS, a number of selected models were obtained. The selected models were first determined using the multicollinearity test. A multicollinearity test can be carried out in two ways: by examining the correlation values or through the Variance Inflation Factor (VIF). In this study, a VIF threshold of 5 was used. The overall procedure for removing variables affected by multicollinearity using the VIF is shown in Figure 4. In SPSS this is done via Analyze → Regression → Linear, entering the required variables. The variable(s) with VIF>5.0 were eliminated first, and elimination continued until all remaining variables had VIF values of less than 5.0.^{18} After the multicollinearity test, the next step was the coefficient test, in which variables with p-values of more than 0.05 were eliminated sequentially, one at a time.
The coefficient test was applied to the coefficients of the remaining variables: at each step, the variable with the highest p-value greater than α=0.05 was removed. Models are labelled by their elimination history; for example, M19.5.2 denotes the ‘parent’ model M19 with 5 multicollinearity variables removed and 2 insignificant variables eliminated in the coefficient test.^{21}
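The VIF-based elimination described above can be sketched outside SPSS as follows. This is an illustrative NumPy version, not the SPSS procedure itself; it assumes that the variable with the largest VIF is dropped at each step, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X: 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_collinear(X, names, threshold=5.0):
    """Remove the variable with the largest VIF until all VIFs <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        values = vif(X)
        worst = int(np.argmax(values))
        if values[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

# Synthetic example: x3 is almost a copy of x1, x2 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + 0.01 * rng.normal(size=50)
kept = drop_collinear(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
```

Only one of the two nearly identical variables survives, while the unrelated variable is kept.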
Phase 3 - best model: To identify the best model, the Eight Selection Criteria (8SC) in Table 1 were used.^{17}

| Criterion | Formula | Source |
|---|---|---|
| AIC | $\left(\frac{SSE}{n}\right)e^{\frac{2(k+1)}{n}}$ | Akaike^{22} |
| GCV | $\left(\frac{SSE}{n}\right)\left(1-\frac{k+1}{n}\right)^{-2}$ | Golub^{26} |
| FPE | $\left(\frac{SSE}{n}\right)\frac{n+(k+1)}{n-(k+1)}$ | Akaike^{23} |
| SHIBATA | $\left(\frac{SSE}{n}\right)\frac{n+2(k+1)}{n}$ | Shibata^{27} |
| SCHWARZ | $\left(\frac{SSE}{n}\right)n^{\frac{k+1}{n}}$ | Schwarz^{24} |
| RICE | $\left(\frac{SSE}{n}\right)\left(1-\frac{2(k+1)}{n}\right)^{-1}$ | Rice^{28} |
| HQ | $\left(\frac{SSE}{n}\right)(\ln n)^{\frac{2(k+1)}{n}}$ | Hannan & Quinn^{25} |
| SGMASQ | $\left(\frac{SSE}{n}\right)\left(1-\frac{k+1}{n}\right)^{-1}$ | Ramanathan^{29} |

Table 1 Eight Selection Criteria (8SC)
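The eight criteria can be computed from SSE, n and k+1 with a few lines of code. The sketch below takes GCV, RICE and SGMASQ with negative exponents and FPE with denominator n-(k+1); this reading is an assumption, supported by the fact that it reproduces the AIC, GCV, HQ, SCHWARZ and SGMASQ values reported for model M1.0.2 in Table 14. The function name is hypothetical.

```python
import math

def eight_selection_criteria(sse, n, k_plus_1):
    """Return the eight selection criteria (8SC) as a dictionary."""
    if n <= 2 * k_plus_1:
        raise ValueError("n must exceed 2(k+1) so that RICE stays positive")
    base = sse / n              # common factor SSE / n
    ratio = k_plus_1 / n        # (k+1) / n
    return {
        "AIC": base * math.exp(2 * ratio),
        "FPE": base * (n + k_plus_1) / (n - k_plus_1),
        "GCV": base * (1 - ratio) ** -2,
        "HQ": base * math.log(n) ** (2 * ratio),
        "RICE": base * (1 - 2 * ratio) ** -1,
        "SCHWARZ": base * n ** ratio,
        "SGMASQ": base * (1 - ratio) ** -1,
        "SHIBATA": base * (n + 2 * k_plus_1) / n,
    }

# SSE, n and k+1 as reported for model M1.0.2 in Table 14
criteria = eight_selection_criteria(sse=15.12, n=81, k_plus_1=2)
```

Under every criterion a smaller value is preferred, so candidate models can be ranked by comparing these dictionaries entry by entry.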
Phase 4 - goodness-of-fit: Lastly, goodness-of-fit (GOF) tests were used to assess how well the model fits the data, based on the standardized residuals. A randomness (Runs) test is carried out to determine whether the residuals are random. If the p-value obtained is larger than 0.05, the null hypothesis of randomness is not rejected; otherwise it is rejected.
The Mean Absolute Percentage Error (MAPE) is used to check the accuracy of the model, as it produces a measure of relative overall fit.^{30} The purpose of MAPE is to verify the reliability of the best model obtained in Phase 3. MAPE measures the error size of the model and usually expresses accuracy as a percentage.^{31} MAPE is defined by the formula shown in (4):
$MAPE=\frac{1}{m}\sum_{t=1}^{m}\left|\frac{A_{t}-F_{t}}{A_{t}}\right|\times 100\%$ (4)
Where m = sample size of the reserved data, A_{t} = actual value of the dependent variable, and F_{t} = estimated value of the dependent variable. The interpretation of different values of MAPE is shown in Table 2.
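The randomness check can be sketched as a Wald-Wolfowitz runs test with a normal approximation. This is an illustrative implementation, not the SPSS routine used in the study; the cut point of zero, the treatment of values equal to the cut point, and the example residuals are assumptions.

```python
import math

def runs_test(residuals, cut=0.0):
    """Two-sided runs test; returns (number of runs, z statistic, p-value)."""
    signs = [r >= cut for r in residuals]   # values equal to the cut count as "above"
    n1 = sum(signs)
    n2 = len(signs) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("all residuals fall on one side of the cut point")
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var <= 0:
        raise ValueError("too few observations for the normal approximation")
    z = (runs - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return runs, z, p_value

# Hypothetical residual patterns
alternating = [1, -1] * 10
mixed = [1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, -1, -1, 1]
runs_alt, z_alt, p_alt = runs_test(alternating)
runs_mix, z_mix, p_mix = runs_test(mixed)
```

A p-value above 0.05 means the hypothesis of random residuals is not rejected, which is the outcome a well-fitting model is expected to give.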
| MAPE | Criterion |
|---|---|
| MAPE < 10% | Very Good |
| 10% < MAPE < 20% | Good |
| 20% < MAPE < 50% | Reasonable |
| MAPE ≥ 50% | Not Accurate |

Table 2 Interpretation of MAPE Values
The best model is accepted if its MAPE value is below 10%, or up to 15%. However, the model is still acceptable if the MAPE is less than 25%. A lower MAPE value indicates that the best model can be used for forecasting or prediction; otherwise, the best model is rejected.
Variables identification
In this study, the concentrations of the heavy metals Zinc, Lead, Copper, Chromium and Cadmium (atomic symbols Zn, Pb, Cu, Cr and Cd respectively) in the total soft tissue of G.similis were used as dependent variables (DV’s). Meanwhile, seven independent variables (IV’s), all physical factors, were studied, as shown in Table 3. The data were labelled with the symbols given below to ease the data preparation procedures. The symbols were assigned before the relevant transformations were performed.

| No. | Variables | Symbol | Type of variable |
|---|---|---|---|
| 1 | Concentration of Heavy Metal Zinc | Zn | Dependent |
| 2 | Concentration of Heavy Metal Copper | Cu | Dependent |
| 3 | Concentration of Heavy Metal Lead | Pb | Dependent |
| 4 | Concentration of Heavy Metal Cadmium | Cd | Dependent |
| 5 | Concentration of Heavy Metal Chromium | Cr | Dependent |
| 6 | Length (cm) | L | Independent |
| 7 | Height (cm) | H | Independent |
| 8 | Width (cm) | W | Independent |
| 9 | Dry Weight (g) | DWt | Independent |
| 10 | Wet Weight (g) | WWt | Independent |
| 11 | Length Tissue (cm) | LT | Independent |
| 12 | Width Tissue (cm) | WT | Independent |

Table 3 List of dependent and independent variables
Factor analysis
Table 4 shows the results of the factor analysis carried out to identify the dummy variables. Five independent variables, namely length, height, width, wet weight and dry weight, were more important since they showed factor loadings greater than 0.5. Meanwhile, the width tissue and length tissue variables were of lesser importance, with factor loadings lower than 0.5. Hence, these two variables were chosen as dummy variables and were converted into categorical variables in this study.

Rotated component matrix^{a}

| Variable | Component 1 | Component 2 |
|---|---|---|
| Length (cm) | 0.675 | 0.558 |
| Height (cm) | 0.606 | 0.643 |
| Width (cm) | 0.876 | 0.182 |
| Wet weight (g) | 0.774 | 0.549 |
| Dry weight (g) | 0.740 | 0.465 |
| Width tissue (cm) | 0.446 | 0.805 |
| Length tissue (cm) | 0.231 | 0.895 |

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Table 4 Rotated component matrix for physical properties of G.similis
Dummy transformation
With 90 observations considered, the median values for the dummy variables (width tissue and length tissue) were computed in SPSS, as shown in Table 5. Dummy code 0 is assigned to observations below the median values of 3.800 and 4.050 for width tissue (WT) and length tissue (LT) respectively, while code 1 is assigned to values greater than the median. These dummy codes are interpreted as 0 for a small G.similis and 1 for a big G.similis. In the regression equations, the length tissue variable is labelled A and the width tissue variable is labelled B, as listed in Table 11.
Statistics

| | Width tissue | Length tissue |
|---|---|---|
| N Valid | 90 | 90 |
| N Missing | 0 | 0 |
| Median | 3.8 | 4.05 |

Table 5 Median value for variables width tissue and length tissue
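The median split used for the dummy transformation can be sketched as follows. Observations exactly equal to the median are coded 0 here, which is an assumption because the text only specifies values below and above the median; the function name and measurements are hypothetical.

```python
import numpy as np

def median_split(values):
    """Dichotomize a quantitative variable at its median.

    Returns the 0/1 codes and the median used. Values above the median
    are coded 1; values at or below it are coded 0 (assumed convention).
    """
    x = np.asarray(values, dtype=float)
    med = float(np.median(x))
    return (x > med).astype(int), med

# Hypothetical tissue measurements in cm
tissue = [3.1, 3.8, 4.2, 3.5, 4.6]
codes, med = median_split(tissue)
```

The resulting codes can be entered directly as a dummy variable in the regression equations.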
Checking and treating outliers
Figure 5 and Figure 6 depict the outliers among the physical properties of G.similis and among the heavy metal concentrations in the soft tissues, respectively. Wet weight and dry weight both show outliers in the boxplots, while the other physical properties show none. The variables in Figure 6 are arranged by heavy metal: Zinc (Zn), Copper (Cu), Lead (Pb), Cadmium (Cd) and Chromium (Cr).
Outliers need to be treated rather than removed so as to improve data robustness for modelling. Table 6 shows that only one outlier was found in Wet Weight (WWt), in case number 71 with a value of 28.16380. However, several extreme values and outliers were detected in the Dry Weight (DWt) variable. Table 7 details the extreme values and outliers of the heavy metal concentrations, giving the case numbers among the 90 observations and the observed values. A total of six extreme values and eight outliers were detected, and these have to be treated for robust estimation and prediction.
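| Physical properties | Symbol | Type | Case number | Value |
|---|---|---|---|---|
| Wet Weight (g) | WWt | Outlier | 71 | 28.1638 |
| Dry Weight (g) | DWt | Extreme values | 67 | 3.9455 |
| | | | 73 | 3.7667 |
| | | | 65 | 3.7198 |
| | | Outliers | 70 | 3.5852 |
| | | | 66 | 3.1421 |
| | | | 69 | 3.01 |
| | | | 63 | 2.94 |
| | | | 71 | 2.93 |
| | | | 68 | 2.81 |
| | | | 72 | 2.7 |
| | | | 62 | 2.63 |

Table 6 Extreme values and outliers of physical properties wet weight and dry weight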
| Concentration | Symbol | Type | Case number | Value |
|---|---|---|---|---|
| Zinc | Zn | Extreme Value | 82 | 235.71 |
| | | Outliers | 49 | 196.52 |
| | | | 85 | 186.82 |
| | | | 84 | 159.21 |
| Copper | Cu | Extreme Value | 11 | 81.32 |
| | | Outlier | 14 | 46.04 |
| Lead | Pb | Extreme Value | 23 | 42.51 |
| | | | 28 | 25.57 |
| | | Outliers | 29 | 17.77 |
| | | | 60 | 17.5 |
| | | | 53 | 14.73 |
| | | | 54 | 13.66 |
| Cadmium | Cd | Extreme Value | 31 | 7.04 |
| Chromium | Cr | Extreme Value | 5 | 11.35 |
| | | Outliers | 47 | 8.82 |

Table 7 Extreme values and outliers of heavy metal concentrations in G.similis
The outliers detected were treated by a standard statistical procedure called winsorization. Winsorization replaces extreme values with less extreme values, effectively moving the original extreme values toward the centre of the distribution (Table 8).^{32}
| Physical properties | Symbol | Case number | Next highest value |
|---|---|---|---|
| Wet Weight | WWt | 63 | 26.4773 |
| Dry Weight | DWt | 64 | 2.43 |

Table 8 Winsorization based on the next highest value within the distribution
For the independent variable wet weight (WWt), only one outlier was found; its value of 28.16380 is replaced by 26.47730, the next highest value within the distribution, which comes from case number 63. The extreme values and outliers of dry weight (DWt) are replaced by 2.43, the highest value within the distribution, from case number 64. After the outliers of the independent variables were treated, the outliers and extreme values of each dependent variable were likewise replaced with the values in Table 9 below:
| Concentration of heavy metal | Symbol | Case number | Next highest value |
|---|---|---|---|
| Zinc | Zn | 50 | 147.43 |
| Copper | Cu | 38 | 37.21 |
| Lead | Pb | 25 | 11.55 |
| Cadmium | Cd | 3 | 3.69 |
| Chromium | Cr | 14 | 7.92 |

Table 9 Winsorization based on the next highest value within the distribution
Descriptive statistics
For the heavy metal concentrations, the summaries show that all the variables are positively skewed, with the mean higher than the median. All the kurtosis values lie within the range of the proposed rule of thumb, indicating that the tails of the distributions are not extreme. Only Zn and Cu show approximately symmetric distributions, with skewness values of 0.4420 and 0.4050 respectively. Further graphical tests will be conducted to verify the distribution of each variable. Table 10 gives the descriptive statistics of the heavy metal concentrations in this study, while Table 11 lists the variables, types and symbols used for further modelling.
| Statistic | Zn | Cu | Pb | Cd | Cr |
|---|---|---|---|---|---|
| Mean | 72.35 | 15.7 | 4.1301 | 1.209 | 2.3502 |
| Standard Error | 3.6 | 1.1739 | 0.3709 | 0.09421 | 0.2013 |
| Median | 72.51 | 15.14 | 3.5 | 0.87 | 2.19 |
| Std. Deviation | 32.4 | 10.565 | 3.338 | 0.8479 | 1.8118 |
| Sample Variance | 1049.91 | 111.623 | 11.144 | 0.719 | 3.282 |
| Kurtosis | 0.04 | 0.667 | 0.066 | 0.217 | 0.368 |
| Skewness | 0.442 | 0.405 | 0.923 | 0.866 | 0.623 |
| Range | 133.53 | 6.23 | 11.45 | 3.36 | 6.51 |
| Maximum | 147.43 | 37.21 | 11.55 | 3.47 | 6.56 |
| Minimum | 13.9 | 0.098 | 0.1 | 0.11 | 0.05 |

Table 10 Descriptive statistics of the heavy metal concentrations
| No | Variables | Symbol | Type of variable |
|---|---|---|---|
| 1 | Concentration of Heavy Metal Zinc | Y_{1} | Dependent |
| 2 | Concentration of Heavy Metal Copper | Y_{2} | Dependent |
| 3 | Concentration of Heavy Metal Lead | Y_{3} | Dependent |
| 4 | Concentration of Heavy Metal Cadmium | Y_{4} | Dependent |
| 5 | Concentration of Heavy Metal Chromium | Y_{5} | Dependent |
| 6 | Length (cm) | X_{1} | Independent |
| 7 | Height (cm) | X_{2} | Independent |
| 8 | Width (cm) | X_{3} | Independent |
| 9 | Dry Weight (g) | X_{4} | Independent |
| 10 | Wet Weight (g) | X_{5} | Independent |
| 11 | Length Tissue (cm) | A | Dummy |
| 12 | Width Tissue (cm) | B | Dummy |

Table 11 Variables, types and symbols used in model equations
For this study, with q=5 (excluding the 2 dummy variables), the number of all possible models is $N=1({}^{5}C_{1})+2({}^{5}C_{2})+3({}^{5}C_{3})+4({}^{5}C_{4})+5({}^{5}C_{5})=80$, as shown in Table 12 below. The transformation of the nonlinear model equations into linear ones is illustrated in part below. From Equation (1), the nonlinear exponential equation is given as $P_{i}=a_{i}e^{b_{i}(X_{i})},\ b_{i}>0$. From Equation (2), the transformed equation has the form $\ln P=\ln\alpha+(\beta_{1}X_{1}+\beta_{A}A+\beta_{B}B+\mu)$. Letting $Y=\ln P$ and $\beta_{0}=\ln\alpha$, the equation becomes $Y=\beta_{0}+\beta_{1}X_{1}+\beta_{A}A+\beta_{B}B+\mu$. The 80 transformed model equations are applied to each of the five data sets of dependent variables (the heavy metals), giving 400 possible exponential model equations in total.
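| Number of variables | Zero-order | First-order interaction | Second-order interaction | Third-order interaction | Fourth-order interaction | Total |
|---|---|---|---|---|---|---|
| 1 | 5 | | | | | 5 |
| 2 | 10 | 10 | | | | 20 |
| 3 | 10 | 10 | 10 | | | 30 |
| 4 | 5 | 5 | 5 | 5 | | 20 |
| 5 | 1 | 1 | 1 | 1 | 1 | 5 |
| Total | 31 | 26 | 16 | 6 | 1 | 80 |
| Model | M1-M31 | M32-M57 | M58-M73 | M74-M79 | M80 | |

Table 12 The total number of all possible models for five independent variables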
Model building procedures of Phase 1 to Phase 4 were carried out on the regression equations. Table 13 below showed the summary of the selected models of Phase 2 on heavy metal Zn denoted by Data set 1.
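Concentration of heavy metal Zn, Y_{1}

| Selected model | Summary | (k + 1) |
|---|---|---|
| M1.0.2 | $Y=\beta_{0}+\beta_{A}A+\mu$ | 2 |
| M6.0.1 | $Y=\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{A}A+\mu$ | 4 |
| M11.0.1 | $Y=\beta_{0}+\beta_{2}X_{2}+\beta_{4}X_{4}+\beta_{A}A+\mu$ | 4 |
| M12.0.1 | $Y=\beta_{0}+\beta_{2}X_{2}+\beta_{5}X_{5}+\beta_{A}A+\mu$ | 4 |
| M33.3.3 | $Y=\beta_{0}+\beta_{3A}X_{3}A+\mu$ | 2 |
| M45.9.4 | $Y=\beta_{0}+\beta_{1A}X_{1}A+\mu$ | 2 |
| M48.9.2 | $Y=\beta_{0}+\beta_{2}X_{2}+\beta_{4}X_{4}+\beta_{2A}X_{2}A+\mu$ | 4 |
| M49.8.3 | $Y=\beta_{0}+\beta_{2}X_{2}+\beta_{5}X_{5}+\beta_{3A}X_{3}A+\beta_{5B}X_{5}B+\mu$ | 5 |
| M52.14.3 | $Y=\beta_{0}+\beta_{2}X_{2}+\beta_{4}X_{4}+\beta_{3A}X_{3}A+\mu$ | 4 |
| M56.11.5 | $Y=\beta_{0}+\beta_{2}X_{2}+\beta_{5}X_{5}+\beta_{3A}X_{3}A+\mu$ | 4 |
| M69.17.3 | $Y=\beta_{0}+\beta_{2}X_{2}+\beta_{35}X_{35}+\beta_{3A}X_{3}A+\beta_{5B}X_{5}B+\mu$ | 5 |
| M76.17.5 | $Y=\beta_{0}+\beta_{2}X_{2}+\beta_{4}X_{4}+\beta_{A}A+\beta_{15}X_{15}+\mu$ | 4 |
| M79.33.6 | $Y=\beta_{0}+\beta_{15}X_{15}+\beta_{3A}X_{3}A+\mu$ | 3 |

Table 13 Summary for selected Models in Data set 1 (Zn)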
Table 14 shows the values of the eight selection criteria from Phase 3 of the model-building procedures. Model M48.9.2 shows the lowest values among all the models for Zinc. Therefore, it can be concluded that M48.9.2 is the best model with respect to the heavy metal (Zinc) concentration. The selected general equation of M48.9.2 is given as: $\hat{Y}_{1}=\beta_{0}+\beta_{2}X_{2}+\beta_{4}X_{4}+\beta_{2A}X_{2}A$. Similar procedures were carried out for the other dependent variables, Lead, Copper, Cadmium and Chromium. Table 15 lists the best model equations for all the heavy metal concentrations except Lead, for which all model equations were removed due to high multicollinearity and insignificant variables.
| Model | SSE | R2 | k+1 | n | AIC | FPE | GCV | HQ | RICE | SCHWARZ | SGMASQ | SHIBATA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M1.0.2 | 15.12 | 0.298 | 2 | 81 | 0.19612 | 0.3892 | 0.19624 | 0.20082 | 0.212656 | 0.20806 | 0.19139 | 0.38255 |
| M6.0.1 | 13.219 | 0.386 | 4 | 81 | 0.17525 | 0.4092 | 0.17569 | 0.18376 | 0.19547 | 0.19724 | 0.16701 | 0.65074 |
| M11.0.1 | 12.86 | 0.403 | 4 | 81 | 0.17875 | 0.40292 | 0.1792 | 0.18743 | 0.19938 | 0.20119 | 0.17035 | 0.66375 |
| M12.0.1 | 13.117 | 0.391 | 4 | 81 | 0.19792 | 0.40003 | 1900 | 0.20267 | 0.20846 | 0.20997 | 0.19315 | 0.38607 |
| M33.3.3 | 15.259 | 0.292 | 2 | 81 | 0.19992 | 0.39295 | 0.20004 | 0.24172 | 0.21057 | 0.21209 | 0.1951 | 0.38996 |
| M45.9.4 | 15.413 | 0.285 | 2 | 81 | 0.17014 | 0.38065 | 0.17057 | 0.1784 | 0.18977 | 0.19149 | 0.16214 | 0.63177 |
| M48.9.2 | 12.485 | 0.42 | 4 | 81 | 0.17011 | 0.39333 | 0.18086 | 0.19115 | 0.17723 | 0.19184 | 0.1617 | 0.31577 |
| M49.8.3 | 13.246 | 0.385 | 5 | 81 | 0.18502 | 0.18505 | 0.18576 | 0.19632 | 0.21284 | 0.21449 | 0.17429 | 0.83784 |
| M52.14.3 | 13.072 | 0.393 | 4 | 81 | 0.18051 | 0.32296 | 0.18096 | 0.18928 | 0.20134 | 0.20316 | 0.17203 | 0.67027 |
| M56.11.5 | 13.246 | 0.385 | 4 | 81 | 0.17901 | 0.38591 | 0.17973 | 0.18995 | 0.20593 | 0.20753 | 0.16863 | 0.81064 |
| M69.17.3 | 12.816 | 0.405 | 5 | 81 | 0.18656 | 0.36653 | 0.18703 | 0.19562 | 0.20309 | 0.20997 | 0.17779 | 0.69274 |
| M76.17.5 | 13.69 | 0.365 | 4 | 81 | 0.20006 | 0.37164 | 0.20034 | 0.20731 | 0.21669 | 0.21861 | 0.19292 | 0.57103 |

Table 14 The corresponding eight selection criteria for data set 1 (Zn, Y_{1})
| Data set | Y_{j} | Best model and exponential regression equation |
|---|---|---|
| 1 | Y_{1} | M48.9.2: $P_{Zn}=0.787e^{0.587X_{2}+0.363X_{4}-0.159X_{2}A}$ |
| 2 | Y_{2} | M34.3.5: $P_{Cu}=0.698e^{0.254X_{3}-0.779X_{23}}$ |
| 4 | Y_{4} | M1.0.2: $P_{Cd}=0.034e^{0.617X_{4}}$ |
| 5 | Y_{5} | M69.17.3: $P_{Cr}=0.045e^{0.142X_{4}+0.216X_{5}}$ |

Table 15 Summary of the list of best models
Table 16 shows the goodness-of-fit tests, namely the Runs test and the normality test, for Zn. Since the asymptotic significance of the Runs test is 0.105, which is >0.05, the null hypothesis is not rejected; in other words, the standardized residuals, u_{i}, are randomly distributed. Since the sample size is more than 50, the normality test is based on the Kolmogorov-Smirnov statistic, which gives a p-value of more than 0.05. This indicates that the residuals are normally distributed.
Table 16 Goodness-of-fit tests on data set 1 (Zn): Runs test and normality test
Exponential smoothing was conducted to examine the accuracy of the model. Exponential smoothing is one of the most important quantitative forecasting techniques. Its forecasting accuracy depends on the exponential smoothing constant, so choosing an appropriate value of this constant is crucial for minimizing the forecasting error.
For illustration purposes, Table 17 shows the actual values and the calculated forecast values for Zinc. The values were then substituted into Equation (4) to calculate the MAPE as shown below.
$MAPE_{Zn}=\frac{1}{m}\sum_{t=1}^{m}\left|\frac{A_{t}-F_{t}}{A_{t}}\right|\times 100\%=\frac{1}{9}(0.3089999969)\times 100\%=3.43\%$
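| m | $A_{t}$ | $F_{t}$ | $\lvert A_{t}-F_{t}\rvert / A_{t}$ |
|---|---|---|---|
| 1 | 25.81 | 24.99 | 0.031771 |
| 2 | 54.83 | 53.83 | 0.018238 |
| 3 | 93.62 | 95.1 | 0.015809 |
| 4 | 29.57 | 30.2 | 0.021305 |
| 5 | 38.49 | 39.22 | 0.018966 |
| 6 | 85.61 | 87.76 | 0.025106 |
| 7 | 92.26 | 91.1 | 0.012573 |
| 8 | 101.14 | 111.15 | 0.098972 |
| 9 | 72.98 | 77.82 | 0.066261 |
| Total | | | 0.309 |

Table 17 MAPE table for best model data set 1 (Zn) (M48.9.2)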
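The MAPE calculation for Zinc can be reproduced from the tabulated values with a short sketch such as the one below. The actual and forecast values are copied from Table 17; the function name `mape` is an assumption. Because the table values are rounded to two decimals, the recomputed figure may differ from the reported 3.43% in the third decimal place.

```python
# Actual (A_t) and forecast (F_t) Zinc values as listed in Table 17.
actual = [25.81, 54.83, 93.62, 29.57, 38.49, 85.61, 92.26, 101.14, 72.98]
forecast = [24.99, 53.83, 95.10, 30.20, 39.22, 87.76, 91.10, 111.15, 77.82]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


print(f"MAPE for Zn: {mape(actual, forecast):.2f}%")
```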
Similarly, the MAPE values for the other heavy metals, namely Cu (Y_{2}), Cd (Y_{4}), and Cr (Y_{5}), were calculated as 44.3%, 19.98%, and 9.32%, respectively. With the lowest MAPE (3.43%), the exponential model for Zinc is therefore the best for forecasting the heavy metal concentration accumulation of G. similis.
The best models were obtained by calculating the 8SC for each heavy metal data set, namely Zinc, Copper, Cadmium, and Chromium. Lead was excluded, since all of its variables were highly correlated and insignificant. Based on the results in Table 15, the best models for the heavy metals (Zn, Cu, Cd, Cr) produced exponential curves that are almost linear. The physical properties that affect the concentration accumulation are height (X_{2}), width (X_{3}), wet weight (X_{4}), dry weight (X_{5}), and length tissue (A), but not length (X_{1}) or width tissue (B).
The best model for Zinc concentration accumulation is M48.9.2. Its reduced model equation is M48.9.2: ${\widehat{Y}}_{1}=a{e}^{{\beta}_{2}{X}_{2}+{\beta}_{4}{X}_{4}+{\beta}_{2A}{X}_{2}A}$ . With the estimated coefficient values, the model can be written as ${\widehat{Y}}_{1}=0.787+0.587{X}_{2}+0.363{X}_{4}-0.159{X}_{2}A$ , where ${\widehat{Y}}_{1}$ = concentration of heavy metal Zinc; X_{2} = height; X_{4} = wet weight; and X_{2}A = interaction between height and length tissue. In terms of the log transformation, the exponential equation is thus given by ${P}_{Zn}=0.787{e}^{0.587{X}_{2}+0.363{X}_{4}-0.159{X}_{2}A}$ . The model has two single independent variables and one first-order interaction. The positive coefficients show that the concentration of Zinc increases as the corresponding variables, X_{2} and X_{4}, increase. The model equation thus shows that Zinc concentration is positively affected by the single variables height (X_{2}) and wet weight (X_{4}) of G. similis, and negatively affected by the first-order interaction between height (X_{2}) and length tissue (A). This indicates that height and wet weight are the main single contributors to the concentration of Zinc accumulation in the soft tissue of G. similis.
The predicted concentration of Zinc is positive regardless of the values of height and wet weight, because the constant of the model, 0.787, is positive. In the estimated model, every additional unit in height, X_{2}, directly increases the Zinc response by 0.587, and every additional unit in wet weight, X_{4}, directly increases it by 0.363, with the other terms held fixed. This also shows that X_{2} is more dominant than X_{4}.
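The behaviour of the Zinc model described above can be explored with a small sketch that evaluates the exponential form of M48.9.2. The coefficients are taken from Table 15; the function names and the input values are hypothetical, and the sketch assumes the predictors are supplied on the same scale that was used when the model was fitted, which the source does not specify.

```python
import math

# Coefficients of model M48.9.2 as reported in Table 15.
A_ZN = 0.787               # multiplicative constant
B_HEIGHT = 0.587           # coefficient of X2 (height)
B_WET_WEIGHT = 0.363       # coefficient of X4 (wet weight)
B_HEIGHT_TISSUE = -0.159   # coefficient of X2*A (height x length tissue)


def predict_zn(x2, x4, a_len):
    """Evaluate P_Zn = 0.787 * exp(0.587*X2 + 0.363*X4 - 0.159*X2*A)."""
    exponent = B_HEIGHT * x2 + B_WET_WEIGHT * x4 + B_HEIGHT_TISSUE * x2 * a_len
    return A_ZN * math.exp(exponent)


def height_multiplier(a_len):
    """Factor applied to P_Zn per one-unit increase in X2 at a given A."""
    return math.exp(B_HEIGHT + B_HEIGHT_TISSUE * a_len)


# Hypothetical inputs, for illustration only.
print(predict_zn(0.0, 0.0, 0.0))   # equals the constant when all inputs are 0
print(height_multiplier(0.0))      # exp(0.587): effect of height when A = 0
print(height_multiplier(2.0))      # smaller factor once the interaction acts
```

In the exponential form the coefficients act multiplicatively: a one-unit increase in X_{2} scales the prediction by exp(0.587 - 0.159A), so the positive effect of height weakens as A grows and reverses once A exceeds 0.587/0.159, roughly 3.69 in the model's units.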
The active feeding behaviour of the mollusc may raise the concentration of heavy metals in its tissue.^{33} The mollusc is also exposed to different food suspensions consisting of mixtures of sediment and particulate matter. Different concentrations of pollutants may affect the growth of the mollusc in the aquatic environment as well. In real-world conditions, many factors can affect the concentration of heavy metals in Geloina. It is therefore recommended that the study of heavy metal concentration should not be based on physical properties alone. Hence, further work on these aquatic environmental factors at this study site and at other similar sites is recommended.^{34}
This study has identified the relationship between the concentration of heavy metals accumulation and the physical properties of Geloina similis via exponential regression models. Exponential regression modelling techniques were exemplified and illustrated. Modelling procedures together with statistical tests were employed, and were shown to yield a robust model for prediction and forecasting. The study has identified the physical properties of G. similis that affect the concentration of heavy metals accumulation via nonlinear exponential regression. The relationships between the concentration of heavy metals and the physical properties of G. similis involve height, width, wet weight, dry weight, and length of soft tissue; for Zinc in particular, height and wet weight are significant contributors. The validity of the model was tested using goodness-of-fit tests, and its accuracy and reliability were assessed via MAPE (3.43%), which indicates that the exponential model is a very good model for estimation and prediction. These statistical tests and analyses thus indicate that exponential regression models are robust and give good estimates for prediction.
The authors would like to thank Universiti Malaysia Sabah for funding this research under the grant number SGK0009STWN2015. We also thank Ms. Lau Wei Eng for her partial assistance with the statistical analyses performed during modelling.
None.
The authors declare that no conflict of interest exists.
©2020 Abdullah, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work noncommercially.