Research Article Volume 2 Issue 4
^{1}College of Hydraulic & Environmental Engineering, China Three Gorges University Yichang, China
^{2}International Water Management Institute (IWMI) Lahore, Pakistan
^{3}Center of Excellence in Water Resources Engineering (CEWRE), UET, Lahore, Pakistan
^{4}Centre for Integrated Mountain Research (CIMR), University of the Punjab, Pakistan
Correspondence: Muhammad Imran Azam, College of Hydraulic & Environmental Engineering, China Three Gorges University Yichang, China, Tel 861 5672489878
Received: July 30, 2018  Published: August 30, 2018
Citation: Azam MI, Bhatti MT, Xiaotao S, et al. Flood occurrence exploration for ungauged river catchment at Jhelum river basin of Pakistan. Int J Hydro. 2018;2(4):520-526. DOI: 10.15406/ijh.2018.02.00120
The Jhelum River catchment includes gauged and ungauged subcatchments with hydrological similarities. In this study, Canonical Correlation Analysis (CCA) was used to explore the correlation between ungauged and gauged subcatchments. Flow data from fifteen gauging stations were obtained for detailed analysis. Linear regression was applied to transfer flood data from gauged catchments to the ungauged subcatchments that showed high correlation based on CCA. The flood series of the ungauged subcatchments were analyzed using different flood frequency methods. Floods of different return periods were compared with floods of identical return periods estimated through the graphical method. The results of CCA showed that the two main characteristics of the gauged and ungauged subcatchments, i.e. catchment area and main channel slope, possess different levels of correlation. High correlation (R^{2}=0.95) was observed for catchment area, while main channel slope showed limited correlation (R^{2}=0.58). Furthermore, the relationship between dependent and independent variables showed high correlation between length of channel and catchment area (R^{2}=0.94) and between rainfall and main channel slope (R^{2}=0.80). In general, the results of CCA based on selected multivariate tests of significance showed that four ungauged subcatchments are fairly well correlated with the gauged catchments, and that flow data from these gauged subcatchments can be transferred to the ungauged subcatchments through linear regression. The flood frequency analysis for the ungauged subcatchments showed that the CCA method is better suited to the estimation of floods of different return periods than the graphical method of estimation.
Keywords: Jhelum, catchment, flood, canonical correlation analysis, regression
In developing countries like Pakistan, stream flow recording stations are few, poorly managed and of low standard. Many of the available flow time series are either too short to allow reliable estimation of extreme events, or no flow record is available at the site of concern.^{1} The planning and design of hydraulic structures (dams, barrages, headworks, etc.) require long flow records. However, in many cases where limited stream flow data are available, flows are generated at the desired location using crude and simple techniques such as the area reduction method. The area reduction method considers the catchment area as the only parameter for flow estimation at the ungauged site, or uses a coefficient representing the catchment characteristics. This technique therefore does not establish a direct relationship between the characteristics of gauged and ungauged catchments. The Jhelum River is one of the western rivers allocated to Pakistan under the Indus Waters Treaty of 1960. It is an eastern tributary of the Indus Basin River System (IBRS), with a catchment area of 33,000 square kilometers and a length of 500 kilometers up to the Mangla Dam, lying in the disputed territory of Kashmir.^{2} In regional frequency analysis, a region that is hydrologically homogeneous from the statistical point of view is considered. Long-term data from neighboring catchments are tested for homogeneity, and a group of hydrological stations is formed to establish a region. Data from all hydrological stations of this region are pooled and analyzed as a group to find the frequency characteristics of the region.^{3} Dalrymple^{4} discussed a test to determine flood frequency curves in a region considered homogeneous. Ouarda et al.^{5} used Canonical Correlation Analysis to estimate the flood characteristics of ungauged basins in Ontario (Canada).
This method emphasizes graphical and quantitative analysis of the relationship between the flood variables before the data of the gauged basin are used for estimating the flood variables at ungauged sites. Ribeiro-Correa et al.^{6} presented a theoretical framework for determining the hydrological neighborhood of a drainage basin based on CCA. Kjeldsen et al.^{7} examined the use of the index flood method for estimating index flood parameters at ungauged sites. Crochet^{8} used regional flood frequency analysis to estimate the T-year flood peak discharge with fixed duration for poorly gauged and ungauged catchments. That study scaled a regional flood frequency distribution by the so-called index flood of the catchment (Index Flood Method). Zakaria et al.^{9} used a support vector machine (SVM) model for river flow forecasting at ungauged sites and compared the performance of the SVM with that of multiple linear regression (MLR). Burn^{10} used the region of influence (ROI) framework and information derived from flood magnitudes to examine the homogeneity of flood regions. Chavoshi & Soleiman^{11} applied conventional cluster analysis as well as fuzzy logic theory to the regionalization of 70 catchments in the north of Iran. Grandry et al.^{12} worked on low flow calculations in an ungauged catchment. Mirghani et al.^{13} worked on the regionalization of the Nile water resources for low flow frequency estimation. Ouarda et al.^{14} presented an adaptation of some regional assessment approaches and a comparison of their performance on the basis of their application to data from the Balsas, Lerma and Pánuco River Basins located in Mexico. Four approaches were used in that study for the delineation of homogeneous regions.
A data set of 29 stations from numerous Mexican river catchments of the Balsas region was used. Results demonstrated that the CCA-based methods gave the best performance compared with hierarchical clustering, which seemed generally to lead to less biased quantile estimates; the lowest root mean square error values were almost consistently obtained for the CCA-based methods. The method of canonical kriging did not seem to be more sensitive to the database quality than the other two CCA-based methods. Badyalina & Shabri^{15} studied a model based on canonical correlation analysis (CCA) and the group method of data handling (GMDH) to obtain better estimates of flood magnitude at ungauged catchments. CCA was used to form a canonical physiographical space by relating the site characteristics of gauged stations. Chebana et al.^{16} aimed to account for nonlinearity by introducing the generalized additive model (GAM) in the estimation step of regional frequency analysis (RFA).
A neighborhood approach using canonical correlation analysis (CCA) was used to delineate homogeneous regions. GAMs offer a number of advantages, such as flexibility in the shapes of the relationships as well as in the distribution of the output variable. The regional model was applied to a dataset of 151 hydrometric stations located in the province of Québec, Canada. A stepwise procedure was employed to select the appropriate physio-meteorological variables. A comparison was performed based on different elements (regional model, variable selection and delineation). In Badyalina & Shabri,^{15} the GMDH model was used to identify the functional relationship between flood quantiles and the physiographic variables in the CCA space. The proposed model was applied to 70 catchments in Peninsular Malaysia, and the jackknife procedure was used to evaluate its results. The results were compared with those of the traditional CCA model, a linear regression (LR) model and the GMDH model. They indicated that the proposed CCA-GMDH model delivered the best performance among all models in terms of extrapolation precision. Komi et al.^{17} noted that flood frequency estimates are important for disaster risk management. Their study aimed to improve knowledge of flood frequencies in the Volta River Basin through regional frequency analysis based on L-moments. Three homogeneous groups were identified based on cluster analysis and a homogeneity test. Using L-moment diagrams and goodness-of-fit tests, the generalized extreme value and generalized Pareto distributions were found suitable to yield accurate flood quantiles in the Volta River Basin. Finally, regression models of the mean annual flood with the size of the drainage area, mean basin slope and mean annual rainfall were proposed to enable flood frequency estimation at ungauged sites within the study area.
Hailegeorgis & Alfredsen^{18} performed regional flood frequency analysis (RFFA) using the L-moments method and annual maximum series (AMS). They used similarity in at-site and regional parameters of distributions, high flow regime and seasonality, and runoff response from rainfall-runoff models to identify homogeneous catchments, bootstrap resampling to estimate uncertainty, and regression methods for prediction in ungauged basins (PUB). The rigorous similarity criteria were useful for identification of catchments, while resemblance in runoff response had the least identification power. For the PUB, a linear regression between index flood and catchment area (R^{2}=0.95) performed better than a power law (R^{2}=0.80) and than a linear regression between at-site quantiles and catchment area (e.g. R^{2}=0.88 for a 200-year flood). There was considerable uncertainty in the regional growth curves (e.g. −6.7% to −13.5% and +5.7% to +24.7%, respectively, for the 95% lower and upper confidence limits (CL) for 2–1000 year return periods). The peaks of the hourly AMS were 2–47% higher than those of the daily series. Quantile estimates from at-site flood frequency analysis (ASFFA) for some catchments were outside the 95% CL. Uncertainty estimation, sampling of flood events from instantaneous observations and comparative evaluation of RFFA with ASFFA are important. Similarly, many other researchers have worked on canonical correlation and flood frequency analysis to improve the hydrological analysis of an area. The present study was conducted to identify hydrological similarities between ungauged and gauged sites for flood frequency analysis using Canonical Correlation Analysis.
The Jhelum River catchment was selected for detailed analysis. The catchment area of the Jhelum River and its tributaries is 33,000 km^{2}. Fourteen gauging stations (Sopore, Chinari, Domel, Dudhnial, Nosheri, Muzaffarabad, Naran, Garhi Habibullah, Dollai, Kohala, Azad Pattan, Palote, Kotli and Mangla) were selected for detailed analysis, of which eight were located on the Jhelum River, three on the Neelum River, two on the Kunhar River and one each on the Poonch and Kanshi Rivers. The location of these subcatchments within the Jhelum River catchment is shown in Figure 1.
Data collection
Flow records: For the present study, the mean daily flow records of the selected stations were collected from the Surface Water Hydrology Project (SWHP) of WAPDA. Characteristics of the selected gauging sites are given in Table 1. Jhelum at Azad Pattan has the largest subcatchment among the selected subcatchments, with an area of 26485 km^{2}. On the other hand, Jhelum at Sopore is the smallest catchment, with a catchment area of only 4905 km^{2}, lying in Indian occupied Kashmir. The lengths of the flow records at the stream gauging stations of the Jhelum River catchment are also shown in Table 1. The longest flow records are available at Kotli (1961-2009). Domel, Dollai, Dudhnial and Sopore have short records.
| Sr. no. | Station | Catchment area (km^{2}) | Records | No. of years | River |
|---|---|---|---|---|---|
| 1 | Chinari | 13598 | 1970-2012 | 42 | Jhelum |
| 2 | Azad Pattan | 26485 | 1978-2012 | 35 | Jhelum |
| 3 | Mangla | 33411 | 1967-2012 | 46 | Jhelum |
| 4 | Kohala | 24890 | | | Jhelum |
| 5 | Nosheri | 6809 | 1980-2009 | 29 | Jhelum |
| 6 | Muzaffarabad | 7278 | 1963-2012 | 50 | Neelum |
| 7 | Garhi Habibullah | 2382 | 1961-2012 | 52 | Kunhar |
| 8 | Palote | 1111 | 1971-2012 | 42 | Kanshi |
| 9 | Kotli | 3238 | 1961-2012 | 52 | Poonch |
| 10 | Naran | 1036 | 1970-2012 | 42 | Kunhar |
| 11 | Domel | 14504 | 1976-1977, 1984-2001 | 19 | Jhelum |
| 12 | Dollai | 24406 | 1990-1994, 1996 | 5 | Jhelum |
| 13 | Sopore | 4905 | 1970-1988 | 18 | Jhelum |
| 14 | Dudhnial | 6500 | 1982-1992 | 11 | Jhelum |

Table 1 List of the gauged and ungauged stations in the Jhelum River catchment
Catchment characteristics data
Catchment characteristics are important in regional studies. The selected characteristics (catchment area, main channel length, channel slope, mean elevation of the catchment and mean annual precipitation) were obtained from topographic maps prepared by the Soil Survey of Pakistan and by processing digital elevation models in ArcGIS software. Standard 1:50,000 scale topographic maps were used.
Canonical correlation analysis (CCA)
CCA was used to establish hydrological similarities between gauged and ungauged catchments of the Jhelum River. Canonical correlation is considered to be the general model on which many other multivariate techniques are based, because it can use both metric and nonmetric data for either the dependent or the independent variables. The general form of canonical analysis is given in equation 1:
$Y_{1}+Y_{2}+Y_{3}+\dots+Y_{n}=X_{1}+X_{2}+X_{3}+\dots+X_{n}$ (1)
where the left-hand side contains the dependent variables and the right-hand side the independent variables.
The details of CCA for the analysis of ungauged catchments can be found in Ouarda et al.^{5} Statistical Product and Service Solutions (SPSS) software was used for the canonical correlation analysis, to determine the correlation and significance level between dependent and independent variables among gauged and ungauged catchments. All the selected catchments (gauged and ungauged) were analyzed for the dependent and independent variables (Table 2) to establish hydrological similarities between the subcatchments.
| Sr. no. | Variable name | Nature | Catchment |
|---|---|---|---|
| 1 | Latitude (dd) | Independent | Gauged |
| 2 | Longitude | Independent | Gauged |
| 3 | Elevation (m) | Independent | Gauged |
| 4 | Length of the channel (km) | Independent | Gauged |
| 5 | Catchment area (km^{2}) | Dependent | Ungauged |
| 6 | Main channel slope (m/km) | Dependent | Ungauged |
| 7 | Mean annual rainfall (mm) | Independent | Gauged |

Table 2 Dependent and independent variables used for canonical correlation analysis
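To make the canonical correlation step more concrete, the following Python sketch computes canonical correlations between two blocks of variables with NumPy. It is not the SPSS procedure used in the study; the function name `canonical_correlations` and the synthetic data are assumptions introduced only for illustration.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between variable blocks X (n x p) and Y (n x q).

    Assumes both blocks have full column rank and more rows than columns.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Centre every variable so that only co-variation matters.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the two column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Synthetic (hypothetical) data: 30 sites, 3 independent and 2 dependent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
Y = np.column_stack([2.0 * X[:, 0] - X[:, 1],   # exact linear function of X
                     rng.normal(size=30)])      # unrelated noise
r = canonical_correlations(X, Y)
print(r)
```

In this toy case the first canonical correlation is essentially 1, because one dependent variable is an exact linear combination of the independent variables; the second reflects only chance association with the noise column.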
Linear regression
Generally, the objective of such a model is to provide a means of predicting or estimating one variable (the dependent variable) from information on a second variable (the independent variable). The general form of linear regression is given in equation 2:
$Y=\alpha+\beta X$ (2)
where $\alpha$ and $\beta$ are constants, $X$ is the independent variable and $Y$ is the dependent variable.
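As an illustration of equation 2, the sketch below fits $\alpha$ and $\beta$ by ordinary least squares and reports R^{2}. The function name `fit_line` and the flow values are hypothetical and are not taken from the study's records.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta, r2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, alpha = np.polyfit(x, y, 1)        # highest power first
    y_hat = alpha + beta * x
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return alpha, beta, 1.0 - ss_res / ss_tot

# Hypothetical annual peak flows for overlapping years at a donor (gauged)
# station and a target station with a short record.
donor = [820.0, 640.0, 1010.0, 760.0, 900.0]
target = [410.0, 335.0, 520.0, 390.0, 455.0]
alpha, beta, r2 = fit_line(donor, target)

# Fill a year that is missing at the target station from the donor's value.
estimated_target = alpha + beta * 950.0
print(alpha, beta, r2, estimated_target)
```

The fitted line is only as trustworthy as the overlap period it is based on, so the R^{2} of the overlap is worth reporting alongside any transferred values.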
Analytical frequency analysis
Two probability distributions, Gumbel and Log Pearson Type III, were used to determine the flood magnitudes at the ungauged as well as the gauged sites.
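For reference, a Gumbel (extreme value type I) quantile for return period $T$ can be sketched with the frequency-factor form $x_T=\bar{x}+K_T s$. The source does not state how the distributions were fitted, so the method-of-moments fit, the function names and the sample peaks below are assumptions for illustration only.

```python
import math
import statistics

def gumbel_frequency_factor(T):
    """Gumbel frequency factor K_T for return period T > 1 (years).

    Large-sample form: K_T = -(sqrt(6)/pi) * (0.5772 + ln(ln(T / (T - 1)))).
    """
    return -(math.sqrt(6.0) / math.pi) * (0.5772 + math.log(math.log(T / (T - 1.0))))

def gumbel_quantile(annual_peaks, T):
    """Flood magnitude for return period T from a series of annual peaks."""
    mean = statistics.mean(annual_peaks)
    std = statistics.stdev(annual_peaks)  # sample standard deviation
    return mean + gumbel_frequency_factor(T) * std

# Hypothetical annual peak flows (cumec), not taken from the study.
peaks = [620.0, 710.0, 540.0, 880.0, 765.0, 690.0, 1020.0, 600.0]
for T in (2, 10, 50, 100):
    print(T, round(gumbel_quantile(peaks, T), 1))
```

With this form the frequency factor is almost exactly zero at $T=2.33$ years, which is why the mean annual flood is associated with the 2.33-year return period.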
Regional flood frequency analysis
Regional flood frequency analysis is a commonly used method to overcome the problems associated with ungauged catchments. The application of this method consists of developing two curves using flood data of the gauged sites in the region. The first curve shows the mean annual peak flood versus the catchment area. The second shows the ratio of peak flood to mean annual flood versus the return period. Once these two curves are developed for the region, a flood frequency curve for any ungauged catchment in the same region can be constructed. The procedure to develop the two curves can be found in standard hydrology textbooks.
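The two-curve procedure can be sketched in a few lines of Python. This is a minimal illustration that assumes a straight-line relation between catchment area and mean annual flood in arithmetic space and uses the median of the site ratios as the regional growth factor; the function name `regional_flood_estimate` and all numbers are hypothetical.

```python
import numpy as np

def regional_flood_estimate(areas, q_mean, q_by_T, area_ungauged):
    """Index-flood style estimate of floods at an ungauged site.

    areas         : catchment areas of the gauged sites
    q_mean        : mean annual flood (Q_2.33) at each gauged site
    q_by_T        : floods at each gauged site, shape (n_sites, n_return_periods)
    area_ungauged : catchment area of the ungauged site
    """
    areas = np.asarray(areas, dtype=float)
    q_mean = np.asarray(q_mean, dtype=float)
    q_by_T = np.asarray(q_by_T, dtype=float)
    # Curve 1: mean annual flood versus catchment area (straight line assumed).
    slope, intercept = np.polyfit(areas, q_mean, 1)
    # Curve 2: regional growth curve, the median of Q_T / Q_2.33 across sites.
    growth = np.median(q_by_T / q_mean[:, None], axis=0)
    # Read Q_2.33 for the ungauged area and scale it by the growth factors.
    q_index = intercept + slope * area_ungauged
    return growth * q_index

# Hypothetical region with three gauged sites and three return periods.
areas = [1000.0, 2000.0, 3000.0]
q_mean = [100.0, 200.0, 300.0]
q_by_T = [[100.0, 150.0, 200.0],
          [200.0, 300.0, 400.0],
          [300.0, 450.0, 600.0]]
estimate = regional_flood_estimate(areas, q_mean, q_by_T, 1500.0)
print(estimate)
```

If the area and flood relation is instead a straight line on logarithmic axes, the same sketch applies after taking logarithms of `areas` and `q_mean` before the fit.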
Canonical correlation analysis
The relationship between the independent variables and the dependent variables, catchment area (CA) and main channel slope (MCS), is given in Table 3. Positive values show a direct relation between dependent and independent variables, whereas negative values denote an inverse relationship. Length of the channel (LC) is directly correlated with CA, with a value of 0.94, and rainfall is directly related to MCS, with a value of 0.80. To test the goodness of fit between dependent and independent variables, the confidence level was checked in SPSS. The variables are well related, showing a direct impact on the hydrological homogeneity of the catchment. The significance test of the canonical correlations is straightforward in principle: the canonical correlations were tested one by one, beginning with the largest. Table 4 gives the multivariate tests of significance of the canonical correlation; the acceptable significance level for interpretation is 0.05. The F test was applied in SPSS to find the significance level, and Table 4 shows that all the tests applied indicate a high level of significance. Among the selected tests, the Wilks test shows the best result, with the minimum test value of 0.016. Once the correlation between dependent variables of ungauged and gauged catchments was established, the next step was to transfer data from the neighboring gauged catchments to the ungauged catchments using linear regression.
| Covariate | CA | MCS |
|---|---|---|
| Latitude | 0.40 | 0.30 |
| Longitude | 0.18 | 0.20 |
| Elevation | 0.51 | 0.41 |
| Length of channel | 0.94 | 0.26 |
| Rainfall | 0.27 | 0.80 |

Table 3 Correlations between dependent and independent variables
| Sr. no. | Test name | Value |
|---|---|---|
| 1 | Pillais | 1.52 |
| 2 | Hotellings | 26.76 |
| 3 | Wilks | 0.016 |

Table 4 Multivariate tests of significance
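The three statistics in Table 4 are standard functions of the canonical correlations $r_i$: Pillai's trace is $\sum r_i^{2}$, the Hotelling-Lawley trace is $\sum r_i^{2}/(1-r_i^{2})$ and Wilks' lambda is $\prod (1-r_i^{2})$. The sketch below evaluates them for a hypothetical pair of canonical correlations; it does not attempt to reproduce the SPSS output, and the function name `multivariate_tests` is an assumption.

```python
import numpy as np

def multivariate_tests(canonical_corrs):
    """Pillai, Hotelling-Lawley and Wilks statistics from canonical correlations.

    All correlations must be strictly below 1 in absolute value.
    """
    r2 = np.asarray(canonical_corrs, dtype=float) ** 2
    return {
        "pillai": float(np.sum(r2)),
        "hotelling": float(np.sum(r2 / (1.0 - r2))),
        "wilks": float(np.prod(1.0 - r2)),
    }

# Hypothetical canonical correlations for two dependent variables.
stats = multivariate_tests([0.9, 0.6])
print(stats)
```

A Wilks' lambda close to zero means that little of the variation in the dependent set is left unexplained by the independent set, so small values indicate a strong association.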
Linear regression analysis
To transfer the data of gauged subcatchments to ungauged subcatchments, linear regression analysis was performed. For Dudhnial station (ungauged), the data were transferred from the Nosheri site. Figure 2(A) shows the regression between Nosheri and Dudhnial for a similar length of record; the regression equation was used to find the missing data at Dudhnial station. The R^{2} of 0.91 shows a strong correlation between the Dudhnial and Nosheri data. Figure 2(B) shows the regression between Sopore and Chinari for a similar length of record; the regression equation was used to find the missing data at Sopore station. The R^{2} of 0.91 shows a strong relation between the Sopore and Chinari data. Figure 2(C) shows the regression between Chinari and Domel for a similar length of record; the regression equation was used to find the missing data at Domel station. The R^{2} of 0.98 shows a strong relation between the Chinari and Domel data. Figure 2(D) shows the regression between Kohala and Dollai for a similar length of record; the regression equation was used to find the missing data at Dollai station. The R^{2} of 0.98 shows a strong relation between the Dollai and Kohala data.
Flood frequency analysis
Ungauged subcatchments
Flood frequency analysis was performed using past records of peak flow to provide guidance about the probable behavior of future flooding. The analysis provided information about the possible flood magnitudes at different return periods and the frequency with which a given flood occurred. The Gumbel and Log Pearson Type III distributions were applied to the historical flow records of Sopore, Dudhnial, Dollai and Domel. The chi-square test was performed to test the goodness of fit of the selected distributions. The chi-square test value should be less than or equal to 12. Table 5 shows that the chi-square values at all four stations do not exceed 12.
| Sr. no. | Ungauged station | Chi-square test (Gumbel) | Chi-square test (Log Pearson III) | Selected distribution |
|---|---|---|---|---|
| 1 | Sopore | 5 | 9 | Gumbel |
| 2 | Domel | 9 | 12 | Gumbel |
| 3 | Dollai | 6 | 9 | Gumbel |
| 4 | Dudhnial | 4 | 4.1 | Gumbel |

Table 5 Chi-square test values of Gumbel and Log Pearson Type III distributions at ungauged stations
Gauged subcatchments
Flood frequency analysis was also performed for the gauged subcatchments, using past records of peak flow to provide guidance about the probable behavior of future flooding. Table 6 presents the summary of chi-square test results for the Gumbel and Log Pearson Type III distributions applied at the gauged stations. The chi-square statistic is computed as
$\chi_{c}^{2}=\sum \frac{(O-E)^{2}}{E}$
where $O$ denotes the observed values and $E$ the expected values.
| Sr. no. | Gauged station | Chi-square test (Gumbel) | Chi-square test (Log Pearson III) | Selected distribution |
|---|---|---|---|---|
| 1 | Chinari | 26 | 6 | Log Pearson III |
| 2 | Azad Pattan | 9.7 | 6.09 | Log Pearson III |
| 3 | Nosheri | 9 | 12 | Gumbel |
| 4 | Muzaffarabad | 10.2 | 14.2 | Gumbel |
| 5 | Garhi Habibullah | 2 | 4 | Gumbel |
| 6 | Palote | 11.5 | 9.6 | Log Pearson III |
| 7 | Kotli | 9.6 | 7.1 | Log Pearson III |
| 8 | Kohala | 12 | 10 | Log Pearson III |
| 9 | Mangla | 18 | 10.5 | Log Pearson III |
| 10 | Naran | 4 | 5.4 | Gumbel |

Table 6 Chi-square test values at gauged stations
The chi-square test value helps in selecting the appropriate frequency distribution; it should be less than 12.^{19} If the values for several distributions remain below 12, the distribution with the lowest value is selected. In our analysis, the chi-square values for the selected distributions are mostly below 12, except at Chinari and for Log Pearson Type III at Muzaffarabad.
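The selection rule described here can be written as a short Python sketch: compute the chi-square statistic from observed and expected class frequencies, then pick the distribution with the lowest value and check it against the limit of 12. The function names are hypothetical, and the class frequencies in the first call are invented; the chi-square values in the second and third calls are those reported for Chinari and Muzaffarabad in Table 6.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def select_distribution(chi2_by_distribution, limit=12.0):
    """Return the distribution with the lowest chi-square and whether it meets the limit."""
    name, value = min(chi2_by_distribution.items(), key=lambda item: item[1])
    return name, value <= limit

# Invented class frequencies, only to show the statistic itself.
example_value = chi_square([10, 20], [15, 15])

# Chi-square values as reported in Table 6.
chinari = select_distribution({"Gumbel": 26, "Log Pearson III": 6})
muzaffarabad = select_distribution({"Gumbel": 10.2, "Log Pearson III": 14.2})
print(example_value, chinari, muzaffarabad)
```

Returning the pass or fail flag together with the name keeps visible the cases where even the best-fitting candidate exceeds the limit.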
Graphical method for regional flood frequency analysis
Figure 3 shows the curve of catchment area versus mean annual flood. The mean annual flood is the flood with a return period of 2.33 years at the gauged sites of the Jhelum River basin. The relation between catchment area and mean annual flood is assumed to be a straight line. As the next step, peak flood ratios were calculated: the peak flood Q at different return periods (i.e. 1.25, 2, 5, 10, 20, 50, 100, 200 and 1000 years) was divided by the mean annual flood (Q_{2.33}). The regional flood frequency curve was then plotted between the median peak flood ratio and the selected return periods, as shown in Figure 4. The catchment areas of the ungauged sites were measured from topographic maps as well as from the DEM using GIS software. For the catchment area of each ungauged site, the mean annual flood (Q_{2.33}) was read from Figure 3. The median ratio Q/Q_{2.33} is available from Figure 4 at different return periods. This ratio was multiplied by Q_{2.33} to obtain the flood magnitude at different return periods for the selected ungauged subcatchments. The results are presented in Table 7. The CCA method showed a slightly increasing trend compared with the graphical method. Figure 5(C) compares the flood magnitudes at different return periods at Sopore obtained with the graphical method and the CCA method. Both methods gave the same value at the 4-year return period, with a flood magnitude of 700 cumec. Beyond the 4-year return period the flood magnitude was overestimated, and there was a large difference between the two methods at higher return periods. Figure 5(D) presents the same comparison at Dudhnial. Graphical Method over estimating the flood magnitudes
Flood peak, $(Q/Q_{2.33})\times Q_{2.33}$

| Return period (years) | Domel | Dollai | Sopore | Dudhnial |
|---|---|---|---|---|
| 1.25 | 828 | 864 | 720 | 852 |
| 2 | 952.2 | 993.6 | 828 | 979.8 |
| 10 | 1110.9 | 1159.2 | 966 | 1143.1 |
| 20 | 1131.6 | 1180.8 | 984 | 1164.4 |
| 50 | 1621.5 | 1692 | 1410 | 1668.5 |
| 100 | 1518 | 1584 | 1320 | 1562 |
| 200 | 1104 | 1152 | 960 | 1136 |
| 1000 | 1035 | 1080 | 900 | 1065 |

Table 7 Flood peaks at ungauged subcatchments
The graphical method estimates lower flood peaks than the canonical correlation method, especially for Dollai, where a substantial difference (148-295%) exists between the floods estimated by the two methods at all return periods. For Domel, Sopore and Dudhnial, the graphical method estimated floods close to those estimated by the canonical correlation method at smaller return periods. At higher return periods, however, the estimated floods are again much lower than those estimated by the CCA method. The comparison of the floods estimated by the two methods suggests that the graphical method may have underestimated the flood peaks. The CCA method appears to be the better approach to flood estimation because it takes into consideration the hydrological similarities of the ungauged and gauged subcatchments. Moreover, the frequency analysis is performed on the transferred data using the most appropriate frequency distribution. Therefore, it can be concluded that the canonical correlation method should be preferred over the graphical method.
This research was supported by the National Natural Science Foundation of China. The authors would also like to acknowledge the Pakistan Water and Power Development Authority (WAPDA) and the Pakistan Meteorological Department (PMD) for providing data for the study.
None.
©2018 Azam, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work noncommercially.