Research Article Volume 3 Issue 5
Department of Civil Engineering, Cross River University of Technology, Calabar, Nigeria
Correspondence: Nkpa M Ogarekpe, Department of Civil Engineering, Cross River University of Technology, Calabar, Nigeria
Received: August 26, 2019 | Published: October 2, 2019
Citation: Ogarekpe NM. Comparison of predicted annual maximum rainfall for Calabar metropolis using statistical approach. Int J Hydro. 2019;3(5):400–406. DOI: 10.15406/ijh.2019.03.00205
The frequency factor approach was used to predict the annual maximum rainfall based on the Log Pearson Type III (LP3) and Pearson Type III (P3) distributions, while the sample moments and the reduced variate were used to develop the Gumbel Type 1 (EVI) model for Calabar metropolis. The EVI, LP3 and P3 predicted values were compared with the observed values using regression analysis. A twenty-three-year annual maximum rainfall dataset was obtained and analyzed using the Weibull plotting position. The adjusted R2, F, t, p and Durbin–Watson (DW) statistics corresponding to the P3, EVI and LP3 distributions are, respectively: adjusted R2 of 0.999, F value of 4473.175, t=66.882, p<0.01, DW value of 1.033; adjusted R2 of 0.998, F value of 3548.123, t=59.566, p<0.01, DW value of 1.199; and adjusted R2 of 0.988, F value of 509.372, t=22.569, p<0.01, DW value of 0.867. The DW value of 0.867 obtained for the LP3 predictions indicates possible positive autocorrelation between adjacent residuals. Therefore, the P3 and EVI distributions are more suitable for fitting the rainfall of Calabar metropolis than the LP3 distribution. The Breusch–Pagan and Koenker tests of homoscedasticity revealed no heteroscedasticity (p>0.05) in the data, which supports the explanatory power of the models.
Keywords: rainfall, distribution, frequency factor, comparison
Intensity–Duration–Frequency (IDF) relationships and curves have been developed for several parts of the world. Dupont & Allen stated that rainfall intensity duration curves are graphical representations of the amount of water that falls within a given period of time in a catchment area. IDF relationships of rainstorm events have been used in the sizing of hydraulic structures and in the general planning, design and development of water resources schemes. Consequently, an inaccurate IDF curve would lead to erroneously sized hydraulic structures and inaccurate forecasts. Ogarekpe1 opined that reliable estimates of these values for localities have become necessary because of the devastation caused by floods in different parts of the world. Uncertainties occasioned by climate change further underscore the need for research on IDF curves. Tfwala2 suggested that it is important to analyze the trends of annual precipitation maxima before developing IDF curves, on the premise that climate change will alter the spatial and temporal variability of precipitation patterns, which may lead to inaccuracies in the estimation of IDF curves. Nhat et al.3 developed IDF curves for seven stations in the monsoon area of Vietnam and a generalized IDF formula using base rainfall depth and base return period for the Red River Delta (RRD) of Vietnam. Berhanu et al.4 determined IDF curves for homogeneous regions identified in Botswana. Rambabu et al. developed an equation analyzing rainfall characteristics for some stations. Antigha and Ogarekpe5 developed IDF curves for Calabar metropolis using EVI. Reed6 studied rainfall frequency analysis for flood design. Ferreri and Ferro7 studied the applicability of Bell’s rainfall-duration relation in Sicily and Sardinia.
Fitzgerald8 carried out the analysis of rainfall extremes for a single station and at a regional scale using a Generalized Pareto Distribution (GPD) for the exceedances. Naghavi et al.9 compared five popular distributions and three parameter estimation methods using Louisiana rainfall data sets.
The variability of stochastic events such as rainfall necessitates analysis on a spatiotemporal basis. Frequency distribution approaches to the analysis of historical annual maximum rainstorm events have been extensively researched. Storm rainfalls are most commonly modeled by the EVI distribution.10,11 For this study, the EVI, P3 and LP3 distributions were considered. The parameters of the EVI distribution can be evaluated in terms of the sample moments thus12
$\alpha = \dfrac{\sqrt{6}}{\pi}\, s$ (1)
$u = \bar{x} - 0.5772\,\alpha$ (2)
Gumbel13 showed that expressing the reduced variate y in terms of the return period T, used as an alternate axis to y, yields
$y_T = -\ln\left[\ln\left(\dfrac{T}{T-1}\right)\right]$ (3)
For the EVI distribution, $x_T$ is related to $y_T$ by Eq. 4
$x_T = u + \alpha\, y_T$ (4)
where $u$ and $\alpha$ are the location and scale parameters of the Gumbel distribution and $s$ is the standard deviation of the sample. The P3 distribution is computed using the frequency factor approach. Chow14 stated that most frequency functions can be generalized to
$X = \bar{x} + K s$ (5)
where $X$ is the rainfall intensity of specified probability, $\bar{x}$ is the mean of the observed series, $s$ is the standard deviation of the series and $K$, a frequency factor defined by the specific distribution, is a function of the probability level of $X$. The frequency factors are read at the corresponding probabilities at the appropriate skewness. The LP3 computational procedure entails the conversion of the observed annual maximum rainfall intensities to a logarithmic (log to base 10) series, $y_i = \log_{10} x_i$. The mean, standard deviation and skewness of the log series are determined as shown in Eqs. 6, 7 and 8.
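$\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$ (6)
$s_y = \sqrt{\dfrac{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}{n-1}}$ (7)
$C_s = \dfrac{n\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{3}}{(n-1)(n-2)\,s_y^{3}}$ (8)
where $y_i = \log_{10} x_i$ is the log transformed observation, $C_s$ is the coefficient of skewness, $s_y$ is the standard deviation and $\bar{y}$ is the mean of the log transformed data. The values of $\log X_T$ for probability levels corresponding to return periods of 2, 5, 10, 25, 50, 100 and 200 years were computed using Eq. 9.
$\log X_T = \bar{y} + K s_y$ (9)
where $K$ is the frequency factor, read at the corresponding probabilities at the appropriate skewness. The rainfall intensities $X_T$ were obtained for the corresponding recurrence intervals using Eq. 10
$X_T = 10^{\log X_T}$ (10)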
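The LP3 procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the moments use the usual unbiased sample estimators, and the frequency factor is obtained from Kite's polynomial approximation rather than from the Pearson tables that the study actually used. The approximation is generally adequate for small to moderate skew.

```python
import math
from statistics import NormalDist


def log_moments(intensities):
    """Mean, standard deviation and skew coefficient of the log10 series."""
    y = [math.log10(v) for v in intensities]
    n = len(y)
    mean = sum(y) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    cs = n * sum((v - mean) ** 3 for v in y) / ((n - 1) * (n - 2) * s ** 3)
    return mean, s, cs


def kite_frequency_factor(return_period, cs):
    """Approximate Pearson Type III frequency factor K (assumption:
    Kite's approximation stands in for the table lookup)."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)  # standard normal variate
    k = cs / 6.0
    return (z + (z * z - 1.0) * k + (z ** 3 - 6.0 * z) * k * k / 3.0
            - (z * z - 1.0) * k ** 3 + z * k ** 4 + k ** 5 / 3.0)


def lp3_quantile(intensities, return_period):
    """log X_T = mean + K * s, followed by the antilog step."""
    mean, s, cs = log_moments(intensities)
    log_xt = mean + kite_frequency_factor(return_period, cs) * s
    return 10.0 ** log_xt
```

For a P3 fit, the same frequency factor would be applied to the moments of the untransformed series, with no antilog step.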
There is no physical rationale for the selection of a function.15 The choice of probability distribution model is made on an arbitrary basis.16 This paper therefore seeks to compare the EVI, LP3 and P3 distributions for the prediction of annual maximum rainfall for Calabar metropolis using regression analysis. The results of each distribution were compared with the observed values considering the adjusted R2, p, F, t and Durbin–Watson (DW) statistics.
Description of area of study
Calabar Metropolis, which comprises Calabar Municipality and Calabar South Local Government Areas (Figure 1), has a total land area of 328.23km2.4 Calabar is the administrative capital of Cross River State. The town is flanked on its eastern and western borders by two large perennial streams, namely the Great Kwa River and the Calabar River, respectively. The climate is equatorial and semi-equatorial in nature, characterized by high humidity and substantial rainfall.17 Precipitation characteristically occurs during the wet season (April–October) and shows two peaks, in June/July and September/October. A short dry spell, usually called the “August break”, separates the peaks. In some years, annual rainfall has been observed to exceed 3000mm.18
Description of materials and methods
The material used for this work is a rainfall data set. One of the major challenges encountered in rainfall analysis is inadequate or missing data, especially in third world countries like Nigeria. Twenty-three (23) years of rainfall data were obtained from the Nigeria Meteorological Centre (NIMET) office, Calabar, Cross River State, Nigeria. This office is statutorily responsible for rainfall data gathering in Calabar and its environs. The total land area of Calabar metropolis (328.23km2) yields an acceptable rain gauge density with respect to the recommendation of the World Meteorological Organization (WMO, 1965). According to the WMO (1965) report cited in the study by Ngene et al.,19 Nigeria requires 600–900km2/gauge. The twenty-three-year annual maximum rainfall intensity dataset was used to fit the LP3 and P3 distributions using the frequency factor approach, while the sample moments and the reduced variate were used to develop the Gumbel Type 1 model of the study area. The annual maximum rainfall intensities were arranged in decreasing order of magnitude. The ranked annual maximum series of rainfall intensity values was analyzed using the Weibull plotting position. Annual maximum rainfall intensities were predicted for return periods of 2, 5, 10, 25, 50, 100 and 200 years for each distribution under review. The observed and predicted intensities were compared using simple linear regression as shown in Section 4. The test for normality was carried out using the Shapiro-Wilk test. The EVI distribution parameters were computed using Eqs. 1–3. The determined parameters were then substituted in Eq. 4. For the P3 distribution, the mean, standard deviation and coefficient of skewness were calculated from the dataset.
The frequency factors K corresponding to the probabilities of exceedance were read off the Pearson tables as a function of the coefficient of skewness. For the LP3 distribution, the observed annual maximum rainfall intensities were converted to a logarithmic series, and the mean, standard deviation and coefficient of skewness were calculated for the logarithms of the data. The frequency factors K of the log transformed dataset corresponding to the probabilities of exceedance were read off the Pearson tables as a function of the coefficient of skewness. The values of log X for any probability level were computed from Eq. 9. The rainfall intensities XT were obtained for the corresponding recurrence intervals using Eq. 10.
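The ranking and plotting-position step described above can be illustrated with a short Python sketch. The function name and the sample values in the usage note are hypothetical. The Weibull formula assigns the m-th largest of n values an exceedance probability of m/(n+1) and a return period of (n+1)/m.

```python
def weibull_plotting_positions(annual_maxima):
    """Rank the annual maxima in decreasing order and attach the Weibull
    exceedance probability and return period to each ranked value."""
    ranked = sorted(annual_maxima, reverse=True)
    n = len(ranked)
    rows = []
    for m, value in enumerate(ranked, start=1):
        probability = m / (n + 1)      # P(X >= value)
        return_period = (n + 1) / m    # T = 1 / P
        rows.append((m, value, probability, return_period))
    return rows
```

For example, with three hypothetical values `[30.0, 10.0, 20.0]` the largest value receives rank 1, an exceedance probability of 0.25 and a return period of 4 years. With the 23-year record used here, the largest observed value would correspond to a return period of 24 years.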
The EVI probability distribution, which plots linearly against the reduced variate yT, is given for the study area and data under consideration in Equation (11). The reduced variate yT was computed at return periods of 2, 5, 10, 25, 50, 100 and 200 years, yielding the corresponding rainfall intensities shown in Figure 2.
(11)
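A minimal sketch of how an EVI model of this kind can be evaluated is given below. It assumes the standard method-of-moments relations for the Gumbel parameters (scale equal to √6·s/π and location equal to the mean minus 0.5772 times the scale). The function names are hypothetical, and the fitted coefficients of Equation (11) are not reproduced here.

```python
import math


def gumbel_parameters(sample):
    """Method-of-moments estimates of the Gumbel location u and scale alpha."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    alpha = math.sqrt(6.0) * s / math.pi
    u = mean - 0.5772 * alpha
    return u, alpha


def reduced_variate(return_period):
    """Gumbel reduced variate y_T for a return period T > 1."""
    return -math.log(math.log(return_period / (return_period - 1.0)))


def gumbel_quantile(u, alpha, return_period):
    """x_T = u + alpha * y_T, linear in the reduced variate."""
    return u + alpha * reduced_variate(return_period)
```

Because x_T is linear in y_T, evaluating `gumbel_quantile` at the return periods of interest gives points that fall on a straight line when plotted against the reduced variate.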
The P3 distribution as a function of the frequency factors (which depend on the coefficient of skew) for the study area and data under consideration is given in Equation (12). The rainfall intensities XT for the P3 distribution were computed at return periods of 2, 5, 10, 25, 50, 100 and 200 years as shown in Figure 2.
(12)
Where KT is the frequency factor which depends on the recurrence interval T.
The LP3 distribution as a function of the frequency factors (which depend on the coefficient of skew) for the study area and data under consideration is given in Eq. (13). The rainfall intensities XT were obtained for the corresponding recurrence intervals (Figure 2) using Eq. (10).
(13)
To draw conclusions about a population based on a regression analysis done on a sample, several assumptions must be true.20 Consequently, tests of normality, homoscedasticity and lack of autocorrelation (Durbin–Watson test) were carried out, and the adjusted coefficient of determination was examined.
Test of normality
The frequency statistics of the observed data and the predicted values were obtained using the Statistical Package for the Social Sciences (IBM SPSS Statistics 22) software. The normality of a data set can be assessed using several statistics such as skewness and kurtosis.21 Skewness and kurtosis were computed for the observed and predicted values. These statistics are important because they establish the asymmetry and peakedness of a data set.22,23 Skewness values of -0.143, 0.175, -0.261 and -0.253 were obtained for the observed data set and the LP3, EVI and P3 predicted data sets, respectively. The sample distributions of the observed data and of the EVI and P3 fitted data were negatively skewed. This implies that the frequent scores are clustered at the higher end and the tail points towards the lower scores.24 In contrast, the LP3 fitted data yielded a positive skew, implying that the frequent scores are clustered at the lower end and the tail points towards the higher scores.25 The skewness results showed that the EVI and P3 models resemble the observed data in the direction of skew, whereas the LP3 distribution does not. However, it is worth noting that the skewness of the observed data set is closer to zero, and hence closer to that of a normal distribution, than that of any of the fitted models. Likewise, kurtosis values of -1.181, -0.961, -1.006 and -1.057 were obtained for the observed data set and the LP3, EVI and P3 fitted data sets, respectively. Kurtosis measures peakedness, and the negative values indicate that the distributions are platykurtic.
The probability values of the z-scores of skewness and kurtosis were tested using the following hypothesis:
Ho: That data is from a normal distribution
Ha: That data is not from a normal distribution.
The skewness z-scores of -0.180, 0.220, -0.329 and -0.319 were obtained for the observed data set and the LP3, EVI and P3 distributions, respectively. Also, kurtosis z-scores of -0.744, -0.606, -0.634 and -0.666 were obtained for the observed data set and the LP3, EVI and P3 distributions, respectively. All z-scores lie within ±1.96, indicating non-significant skewness and kurtosis (p>0.05). The null hypothesis of normality therefore cannot be rejected for the observed data set or for the LP3, EVI and P3 data sets.
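The z-scores quoted above can be checked with a short Python sketch. It assumes that each z-score is the statistic divided by its standard error, that the standard errors follow the small-sample formulas commonly reported by SPSS, and that each data set holds n = 7 values (the df reported in Table 1). The function names are hypothetical.

```python
import math


def skew_kurtosis_standard_errors(n):
    """Standard errors of sample skewness and excess kurtosis for n values."""
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1.0) / ((n - 3) * (n + 5)))
    return se_skew, se_kurt


def normality_z_scores(skewness, kurtosis, n):
    """z-scores of skewness and kurtosis; |z| < 1.96 is non-significant at 5%."""
    se_skew, se_kurt = skew_kurtosis_standard_errors(n)
    return skewness / se_skew, kurtosis / se_kurt


# Observed data set: skewness -0.143, kurtosis -1.181, assumed n = 7.
z_skew, z_kurt = normality_z_scores(-0.143, -1.181, 7)
```

Under these assumptions the standard errors are about 0.794 and 1.587, so the observed skewness and kurtosis give z-scores of about -0.180 and -0.744, matching the reported values.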
Shapiro-Wilk test
The Shapiro-Wilk test was used to determine whether the observed data and fitted data are normally distributed. The null hypothesis (Ho) and alternate hypothesis (Ha) of the Shapiro-Wilk test of normality for the observed and predicted intensities are stated thus:
Ho: That data is from a normal distribution given that p>0.05
Ha: That data is not from a normal distribution given otherwise
For the observed and predicted data sets, the results of the Shapiro-Wilk test (Table 1) revealed that the p-values were greater than 0.05 in every case, implying that the data sets do not depart significantly from a normal distribution.
| Data set | Shapiro-Wilk statistic | df | Sig. |
| Observed_Intensities | 0.976 | 7 | 0.941 |
| LogPearsonTypeIII | 0.984 | 7 | 0.975 |
| Gumbel | 0.977 | 7 | 0.945 |
| PearsonTypeIII | 0.976 | 7 | 0.937 |
Table 1 Test of normality
P-P plots
The expected cumulative probability of normality was plotted against the observed and fitted data as shown in Figure 3. The plots show only minimal deviations from the diagonal line; the points fall very close to the ideal, implying that the sample data are approximately normal.
Comparison of the observed and predicted data sets
The results of the predicted and observed values were compared using the approach of simple regression.
Log Pearson Type III distribution
The comparison of the observed rainfall intensities with the intensities predicted by the LP3 probability model yielded a Pearson correlation coefficient of 0.995 with an adjusted R2 of 0.988. The R2 value of 0.990 suggests that the observed data account for 99% of the variation in the simulated data (i.e. when the LP3 probability distribution model is used). For these data, F is 509.372, which is significant at p<0.01 (Tables 2&3). Therefore, we can conclude that the regression model predicts the simulated intensities significantly better than the mean value of the simulated intensities. The regression coefficient shows that if the observed intensity increases by 1 unit, the simulated intensity increases by 1.136 units. For this model, the observed intensity (t=22.569, p<0.01) is a significant predictor. In addition, the DW statistic of the model is 0.867. This dimensionless statistic has been reported by Montgomery et al.25 as a means of investigating correlated errors of residuals during modelling, and it has been used to identify multicollinearity of variables or values in simulation.21,22 A DW value below unity indicates possible positive autocorrelation between adjacent residuals; the value of 0.867 is therefore a cause for concern.24 Homoscedasticity was tested using the Breusch–Pagan and Koenker tests. The results (Table 4) revealed no heteroscedasticity in the data, which supports the explanatory power of the regression model.
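For readers unfamiliar with the DW statistic, the following Python sketch shows how it is computed from an ordered sequence of regression residuals. The function name and the residuals in the usage note are hypothetical, since the residuals of the fitted models are not listed in this paper.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: the sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator
```

The statistic ranges from 0 to 4. Residuals that alternate in sign, such as `[1, -1, 1, -1]`, give a value of 3 (negative autocorrelation), whereas residuals that change slowly push the value towards 0 (positive autocorrelation). A value near 2 indicates no first-order autocorrelation.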
| Model | R | R square | Adjusted R square | Std. error of the estimate | Durbin-Watson |
| 1 | 0.995 | 0.99 | 0.988 | 17.33601 | 0.867 |
Table 2 Model summary for Log Pearson Type III distribution
| Model | Source | Sum of squares | df | Mean square | F | Sig. |
| 1 | Regression | 153085.3 | 1 | 153085.3 | 509.372 | 0.000 |
| | Residual | 1502.686 | 5 | 300.537 | | |
| | Total | 154588 | 6 | | | |
Table 3 ANOVA for Log Pearson Type III distribution
| Test | LM | Sig |
| BP | 0.729 | 0.393 |
| Koenker | 1.81 | 0.178 |
Table 4 Results of homogeneity of variance test for Log Pearson Type III distribution
Breusch-Pagan and Koenker test statistics and sig-values. Null hypothesis: heteroskedasticity not present (homoskedasticity). If the sig-value is less than 0.05, reject the null hypothesis.
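The entries of the model summary and ANOVA tables are linked by simple identities, which the Python sketch below uses to recompute the LP3 statistics from the sums of squares in Table 3. The function name is hypothetical, and n = 7 is taken from the total degrees of freedom (6) plus one. The relation t² = F holds only for a single predictor.

```python
import math


def regression_summary(ss_regression, ss_residual, n, n_predictors=1):
    """Recompute F, R2, adjusted R2, standard error and |t| from an ANOVA table."""
    df_residual = n - n_predictors - 1
    ms_regression = ss_regression / n_predictors
    ms_residual = ss_residual / df_residual
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_residual
    f_value = ms_regression / ms_residual
    return {
        "F": f_value,
        "R2": r2,
        "adjusted_R2": adjusted_r2,
        "std_error": math.sqrt(ms_residual),
        "abs_t": math.sqrt(f_value),  # valid for one predictor only
    }


# Sums of squares reported for the LP3 regression (Table 3).
lp3 = regression_summary(153085.3, 1502.686, n=7)
```

With these inputs the sketch returns F of about 509.37, adjusted R2 of about 0.988, a standard error of about 17.336 and |t| of about 22.569, in agreement with Tables 2 and 3. The same function can be applied to the sums of squares of the other two models, with small differences expected from rounding in the published tables.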
Gumbel Type I distribution
The comparison of the observed and predicted values yielded a Pearson correlation coefficient of 0.999 with an adjusted R2 of 0.998. The R2 value of 0.999 suggests that the observed data account for 99.9% of the variation in the simulated data. For these data, F is 3548.123, which is significant at p<0.01 (Tables 5&6). Therefore, we conclude that the regression model predicts the simulated intensities significantly better than the mean value of the predicted intensities. The regression coefficient shows that if the observed intensity increases by 1 unit, the simulated intensity increases by 0.686 units. For this model, the observed intensity (t=59.566, p<0.01) is a significant predictor. In addition, the DW statistic of the model is 1.199, which is neither less than 1 nor greater than 3. Therefore, autocorrelation is not a concern for this model.24 Homoscedasticity was tested using the Breusch–Pagan and Koenker tests. The results (Table 7) revealed no heteroscedasticity in the data, which supports the explanatory power of the regression model.
| Model | R | R square | Adjusted R square | Std. error of the estimate | Durbin-Watson |
| 1 | 0.999 | 0.999 | 0.998 | 3.9672 | 1.199 |
Table 5 Model summary for Gumbel Type I distribution
| Model | Source | Sum of squares | df | Mean square | F | Sig. |
| 1 | Regression | 55842.87 | 1 | 55842.87 | 3548.123 | 0.000 |
| | Residual | 78.694 | 5 | 15.739 | | |
| | Total | 55921.56 | 6 | | | |
Table 6 ANOVA for Gumbel Type I distribution
| Test | LM | Sig |
| BP | 1.891 | 0.169 |
| Koenker | 3.596 | 0.058 |
Table 7 Results of homogeneity of variance test for Gumbel distribution
Breusch-Pagan and Koenker test statistics and sig-values. Null hypothesis: heteroskedasticity not present (homoskedasticity). If the sig-value is less than 0.05, reject the null hypothesis.
Pearson Type III distribution
The comparison of the observed rainfall intensities with the intensities predicted by the P3 probability model yielded a Pearson correlation coefficient of 0.999 with an adjusted R2 of 0.999. The R2 value of 0.999 suggests that the observed data account for 99.9% of the variation in the simulated data. For these data, F is 4473.175, which is significant at p<0.01 (Tables 8&9). Therefore, we can conclude that the regression model predicts the simulated intensities significantly better than the mean value of the simulated intensities. The regression coefficient shows that if the observed intensity increases by 1 unit, the simulated intensity increases by 0.746 units. For this model, the observed intensity (t=66.882, p<0.01) is a significant predictor. In addition, the DW statistic of the model is 1.033, which is neither less than 1 nor greater than 3. Therefore, autocorrelation is not a concern for this model.24 Homoscedasticity was tested using the Breusch–Pagan and Koenker tests. The results (Table 10) revealed no heteroscedasticity in the data, which supports the explanatory power of the regression model.
| Model | R | R square | Adjusted R square | Std. error of the estimate | Durbin-Watson |
| 1 | 0.999 | 0.999 | 0.999 | 3.84246 | 1.033 |
Table 8 Model summary for Pearson Type III distribution
| Model | Source | Sum of squares | df | Mean square | F | Sig. |
| 1 | Regression | 66044.34 | 1 | 66044.34 | 4473.175 | 0.000 |
| | Residual | 73.823 | 5 | 14.765 | | |
| | Total | 66118.16 | 6 | | | |
Table 9 ANOVA for Pearson Type III distribution
| Test | LM | Sig |
| BP | 0.898 | 0.343 |
| Koenker | 2.112 | 0.146 |
Table 10 Results of homogeneity of variance test for Pearson Type III distribution
Null hypothesis: heteroskedasticity not present (homoskedasticity). If the sig-value is less than 0.05, reject the null hypothesis.
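The sig-values in Tables 4, 7 and 10 are consistent with referring each LM statistic to a chi-square distribution with one degree of freedom (one explanatory variable). A minimal Python check, using only the standard library and a hypothetical function name, is sketched below. The one-degree-of-freedom reference distribution is an assumption inferred from the reported values.

```python
import math


def chi_square_sf_1df(lm_statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(lm_statistic / 2.0))


# (LM statistic, reported sig-value) pairs from Tables 4, 7 and 10.
reported = {
    "LP3 BP": (0.729, 0.393), "LP3 Koenker": (1.81, 0.178),
    "EVI BP": (1.891, 0.169), "EVI Koenker": (3.596, 0.058),
    "P3 BP": (0.898, 0.343), "P3 Koenker": (2.112, 0.146),
}

for label, (lm, sig) in reported.items():
    print(f"{label}: LM={lm}, reported sig={sig}, recomputed={chi_square_sf_1df(lm):.3f}")
```

Every recomputed p-value exceeds 0.05, so the null hypothesis of homoscedasticity is not rejected for any of the three models. The Koenker result for the EVI model (about 0.058) is the closest to the threshold.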
Comparison of the models
The presence of autocorrelation between the observed and LP3 data sets necessitated a comparison of the P3, LP3 and EVI distributions. The comparison of the three models was carried out using analysis of variance. The EVI was used as the baseline group, while dummy codes 1 and 2 compared the LP3 versus EVI and the P3 versus EVI, respectively. The null hypothesis is that the group means are the same, while the alternative hypothesis is that the group means differ. The result revealed no significant difference in the group means, F=0.204, p>0.05 (Table 11). The critical F-value at 2 and 18 degrees of freedom and the 5% level of significance is 3.55. Since the observed F-value is smaller than the critical F-value, the null hypothesis is not rejected; that is, the group means do not predict the data better than the overall mean.
| Model | Source | Sum of squares | df | Mean square | F | Sig. |
| 1 | Regression | 6266.36 | 2 | 3133.18 | 0.204 | 0.817 |
| | Residual | 276627.7 | 18 | 15368.21 | | |
| | Total | 282894.1 | 20 | | | |
Table 11 ANOVA
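A regression on dummy codes with one baseline group gives the same F statistic as a one-way ANOVA on the groups. The Python sketch below computes that F statistic from raw group values. The function name and the groups in the usage note are hypothetical, since the predicted intensities themselves are not tabulated here.

```python
def one_way_anova(groups):
    """Between-group and within-group sums of squares, degrees of freedom and F."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value
```

For three hypothetical groups `[1, 2, 3]`, `[2, 3, 4]` and `[3, 4, 5]`, the function returns F = 3.0 on 2 and 6 degrees of freedom. In Table 11 the corresponding ratio of mean squares is 3133.18/15368.21, or about 0.204, on 2 and 18 degrees of freedom, well below the critical value of 3.55.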
A comparison of the EVI, LP3 and P3 distributions for the prediction of annual maximum rainfall for Calabar metropolis was carried out using regression analysis. The results were compared considering the adjusted R2, p, F, t and Durbin–Watson (DW) statistics. The results revealed that both the P3 and the EVI distributions are suitable for fitting the annual maximum rainfall of the study area. The DW test result indicated possible positive autocorrelation in the residuals of the regression between the LP3 predictions and the observed values. Therefore, comparatively long-term data are needed to fit the LP3 distribution. Similar research carried out by Olofintoye et al.26 for Calabar revealed that the EVI and P3 distributions performed better than the LP3 distribution. Olofintoye et al.26 studied fifty-four years of data using goodness of fit tests such as chi-square, Fisher’s test, correlation coefficient and coefficient of determination to determine how well the data fit the models. The comparison of the results of the three models was carried out using ANOVA. The result revealed no significant difference in the group means, F=0.204, p>0.05. In other words, the group means do not predict the data better than the overall mean.
The observed rainfall data set was compared with its predicted counterparts, obtained using the frequency factor approach for the LP3 and P3 distributions and the sample moments and reduced variate method for the EVI distribution, for Calabar metropolis using regression analysis. The results were compared considering the adjusted R2, F, t and DW statistics. The results revealed that the P3 and EVI distributions are suitable for the prediction of rainfall intensities for Calabar Metropolis. The results obtained using the regression analysis approach were further corroborated by the Breusch–Pagan and Koenker tests, which revealed no heteroscedasticity (p>0.05) in the data. The presence of autocorrelation between the observed values and the LP3 predictions necessitates the use of comparatively long-term data to fit this model. The comparison of the results of the three models was carried out using ANOVA. The result revealed no significant difference in the group means, F=0.204, p>0.05. In other words, the group means do not predict the data better than the overall mean.
None.
The author declares that there is no conflict of interest.
None.
©2019 Ogarekpe. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.