Research Article Volume 1 Issue 1
^{1}Department of Civil and Environmental Engineering and Water Resources Research Center, University of Hawaii at Manoa, USA
^{2}Department of Water Engineering, University of Tabriz, Iran
^{3}Department of Soil Science, University of Tehran, Iran
Correspondence: Sayed M Bateni, Department of Civil and Environmental Engineering and Water Resources Research Center, University of Hawaii at Manoa, Hawaii, USA, Tel 8089564249, Fax 8089565014
Received: May 19, 2017  Published: July 27, 2017
Citation: Lum M, Bateni SM, Shiri J, et al. Estimation of reference evapotranspiration from climatic data. Int J Hydro. 2017;1(1):25-30. DOI: 10.15406/ijh.2017.01.00005
This study investigated the capability of the M5 Model Tree (M5MT) to predict reference evapotranspiration (ET_{0}). M5MT was trained and tested with climatic data from eight weather stations located in coastal areas of Iran for the years 2000-2008. It was validated with climatic data from seven California Irrigation Management Information System (CIMIS) weather stations for the year 2015. Four different data combinations were utilized to train, test, and validate the M5MT model: daily mean air temperature, wind speed, relative humidity, and solar radiation (configuration 1); daily mean air temperature and solar radiation (configuration 2); daily mean air temperature and relative humidity (configuration 3); and daily maximum, minimum, and mean air temperature, and extraterrestrial radiation (configuration 4). The Penman-Monteith (PM) equation was used as the standard method to provide target ET_{0} values. Mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R^{2}) were used to evaluate the performance of the M5MT models developed with the different input configurations. Results indicated that M5MT was able to estimate ET_{0} successfully. Configuration 1 provided the most accurate results. Configuration 2 outperformed configuration 3, indicating that its variables have a greater influence on ET_{0}. Configuration 4 performed the worst when the models were validated in California. In that validation, the MAE of ET_{0} estimates from M5MT_{2}, M5MT_{3}, and M5MT_{4} was respectively 29%, 55%, and 91% larger than that of M5MT_{1}. Likewise, the RMSE from M5MT_{2}, M5MT_{3}, and M5MT_{4} was respectively 29%, 59%, and 125% larger than that of M5MT_{1}.
Evapotranspiration (ET) is an important component of the hydrologic cycle, which significantly influences crop water requirement and water resource management.^{1} Accurate ET estimates enable the proper determination of water budgeting and allocation, and thus improve the water use efficiency of irrigation systems. In situ methods are often used to measure ET in a controlled crop area, but they are costly, labor intensive, and only provide localized estimates.^{2,3} To avoid the high costs, empirical, artificial intelligence, and physical models have been developed to estimate reference ET (ET_{0}).^{4} ET_{0} is the combined process of evaporation and transpiration from a theoretical grass surface with an assumed height of 0.12 m, a surface resistance of 70 s/m, and a surface albedo of 0.23.^{5} The Penman-Monteith (PM) equation has been accepted as a standard approach to estimate ET_{0}. However, this method requires many climatic variables that are typically unavailable.^{6}
Due to the drawbacks of in situ methods and the PM equation, Artificial Intelligence (AI)-based approaches have been used to estimate ET_{0}. One study^{7} utilized an Artificial Neural Network (ANN) to approximate ET_{0} in the arid, semiarid, and subhumid regions of Inner Mongolia. In comparison with Multiple Linear Regressions (MLRs), ANN produced more accurate estimates. Another study^{4} estimated daily ET_{0} in Northern Spain using Gene Expression Programming (GEP) and compared its performance with those of the Adaptive Neuro-Fuzzy Inference System (ANFIS), Hargreaves-Samani, and Priestley-Taylor models. Results indicated that GEP provided the most accurate estimates, followed by ANFIS. ANN was also used to predict ET_{0} in arid and semiarid areas of northwest China,^{8} where it was found to estimate ET_{0} more accurately than MLRs and the Priestley-Taylor, Hargreaves-Samani, and Penman-Monteith (PM) equations. ET_{0} in the northern, middle, and southern parts of Iraq was predicted using Extreme Learning Machines (ELM);^{9} compared to the PM equation and Feed Forward Back Propagation (FFBP) models, ELM estimated ET_{0} better.
Recently, the M5 Model Tree (M5MT) has been used in many engineering problems and has shown promising results.^{10−12} M5MT is an extension of a regression tree and provides the user with multiple linear functions.^{1‒13} This approach is capable of handling high-dimensional datasets, and the resulting model tree is significantly smaller and more precise than regression trees.^{14} Moreover, M5MT is not a black box and provides an explicit relationship between the independent and dependent variables.^{11} Several studies have shown M5MT to be an effective technique. One study^{15} showed that M5MT is advantageous over ANN because it generated more accurate wave height estimates. Another^{11} found that the performance of M5MT was comparable to that of ANN, but that the training process of M5MT was faster. A comparison of M5MT and Support Vector Machines (SVM) in forecasting daily river flow^{14} showed that M5MT performed similarly to SVM but was computationally less expensive. A further study^{16} concluded that M5MT was better than ANN because it provided a more straightforward structure consisting of linear regression equations.
The objective of this study is to estimate ET_{0} from climatic data using M5MT. Four different combinations of climatic data were used in M5MT. These combinations were daily mean air temperature, wind speed, relative humidity, and solar radiation (configuration 1); daily mean air temperature and solar radiation (configuration 2); daily mean air temperature and relative humidity (configuration 3); and daily maximum, minimum, and mean air temperature, and extraterrestrial radiation (configuration 4). The data combination that carries the most information about ET_{0} was then identified.
Studied sites and data: Daily climatic data as well as ET_{0} estimates from the PM equation were used to train, test, and validate the M5MT model. The training dataset consisted of data from eight coastal weather stations in Iran, collected from 2000 to 2007. The testing dataset contained data from the same eight stations, but for 2008. Performance of the M5MT models was validated with data from seven California Irrigation Management Information System (CIMIS) weather stations in 2015. The CIMIS dataset was used for model validation to evaluate the feasibility of the models in other regions and to examine whether they are applicable in areas in which they were not trained. Figure 1 & Figure 2 show the spatial distribution of the utilized weather stations in Iran and California, respectively. The recorded data consisted of daily average relative humidity (RH_{mean}) and wind speed (W_{s}), daily maximum, minimum, and mean air temperature (T_{max}, T_{min}, and T_{mean}), and incoming solar radiation (R_{s}). Table 1 lists the geographical coordinates of each weather station and the corresponding annual averages of the collected data. The commonly used equations for the estimation of ET_{0} are presented in Table 2, where ET_{0} is the reference evapotranspiration (mm/d), Δ is the slope of the saturation vapor pressure function (kPa/°C), R_{n} is the net radiation (MJ/m^{2}/d), R_{a} is the extraterrestrial radiation (mm/d), G is the soil heat flux density (MJ/m^{2}/d), γ is the psychrometric constant (kPa/°C), λ is the latent heat of vaporization, T_{mean} is the mean air temperature (°C), T_{max} is the daily maximum air temperature (°C), T_{min} is the daily minimum air temperature (°C), W_{s} is the daily mean wind speed at a height of 2 m (m/s), RH is the relative humidity (%), e_{s} is the saturation vapor pressure (kPa), and e_{a} is the actual vapor pressure (kPa). Based on the equations in Table 2 and a previous study,^{4} four input combinations were used to predict ET_{0}.
The following data configurations were used to train, test, and validate M5MT:
Configuration 1: W_{s}, RH_{mean}, T_{mean}, and R_{s} [M5MT_{1}]
Configuration 2: T_{mean} and R_{s} [M5MT_{2}]
Configuration 3: T_{mean} and RH_{mean} [M5MT_{3}]
Configuration 4: T_{mean}, T_{max}, T_{min} and R_{a} [M5MT_{4}]
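The four configurations above can be written down as a simple lookup from model name to input variables. The following is only a minimal sketch: the dictionary keys, the field names, and the helper `build_inputs` are hypothetical and are not taken from the study's WEKA setup. The sample record uses the Abadan annual averages from Table 1, except for `Ra`, which is a placeholder value because Table 1 does not list it.

```python
# Input variables for each M5MT model, as listed in configurations 1-4.
# Field names ("Ws", "RHmean", ...) are illustrative assumptions.
CONFIGURATIONS = {
    "M5MT1": ["Ws", "RHmean", "Tmean", "Rs"],
    "M5MT2": ["Tmean", "Rs"],
    "M5MT3": ["Tmean", "RHmean"],
    "M5MT4": ["Tmean", "Tmax", "Tmin", "Ra"],
}


def build_inputs(records, model):
    """Return one input row per record, keeping only the model's variables."""
    columns = CONFIGURATIONS[model]
    return [[record[name] for name in columns] for record in records]


# Abadan annual averages from Table 1; Ra is a placeholder, not a reported value.
abadan = {"Ws": 3.2, "RHmean": 64.7, "Tmean": 26.3, "Rs": 19.4,
          "Tmax": 34.0, "Tmin": 18.9, "Ra": 30.0}

for model in CONFIGURATIONS:
    print(model, build_inputs([abadan], model))
```

Keeping the configurations in one table makes it straightforward to train all four models on the same records while changing only the selected columns.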
Three statistical metrics, namely mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R^{2}), were used to compare the performance of the four M5MT models (i.e., M5MT_{1}, M5MT_{2}, M5MT_{3}, and M5MT_{4}). These statistical metrics are given below:
$\text{MAE}=\frac{\sum_{i=1}^{n}\left|{O}_{i}-{P}_{i}\right|}{n}$ ;(1)
$\text{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}{\left({O}_{i}-{P}_{i}\right)}^{2}}{n}}$ ;(2)
${\text{R}}^{2}={\left[\frac{\sum_{i=1}^{n}\left({O}_{i}-\overline{O}\right)\left({P}_{i}-\overline{P}\right)}{\sqrt{\sum_{i=1}^{n}{\left({O}_{i}-\overline{O}\right)}^{2}}\sqrt{\sum_{i=1}^{n}{\left({P}_{i}-\overline{P}\right)}^{2}}}\right]}^{2}$ ;(3)
| Country | Station | Altitude (m) | Latitude (°) | Longitude (°) | T_{max} (°C) | T_{min} (°C) | T_{mean} (°C) | R_{s} (MJ/m^{2}/d) | W_{s} (m/s) | RH_{mean} (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Iran | Abadan | 6.6 | 30.2 | 48.2 | 34 | 18.9 | 26.3 | 19.4 | 3.2 | 64.7 |
| | Ahwaz | 22.5 | 31.2 | 48.4 | 34 | 19.4 | 26.6 | 18.5 | 2.4 | 65.6 |
| | BandareAbbas | 9.8 | 27.1 | 56.2 | 32 | 23.4 | 27.5 | 17.4 | 3.7 | 78.6 |
| | BandareLenge | 22.7 | 26.3 | 54.2 | 33 | 21.9 | 27.2 | 19.4 | 3.7 | 73.2 |
| | Bushehr | 9 | 28.6 | 50.5 | 30 | 20.6 | 25.3 | 18.3 | 3.5 | 75.8 |
| | Gorgan | 13.3 | 36.5 | 54.1 | 23 | 12.9 | 18.2 | 15.8 | 2.6 | 63.6 |
| | Rasht | 8.6 | 37.1 | 49.4 | 21 | 12.3 | 16.6 | 15.8 | 1.6 | 76.5 |
| | Sari | 23 | 36.3 | 53 | 23 | 13.5 | 18 | 16.3 | 2.2 | 75.6 |
| California, USA | Atascadero | 269.8 | 35.5 | 121 | 24 | 5.8 | 14.1 | 16.5 | 1.2 | 64.2 |
| | Delano | 91.4 | 35.8 | 119 | 27 | 9.8 | 17.6 | 18.6 | 1.4 | 55.6 |
| | Gilroy | 56.4 | 37 | 122 | 24 | 7.5 | 14.8 | 17.4 | 2.2 | 67.2 |
| | Arleta | 298.7 | 34.3 | 118 | 26 | 11.6 | 18.3 | 18.6 | 1.6 | 49.7 |
| | Gerber South | 75 | 40 | 122 | 25 | 9.9 | 17.2 | 18.3 | 2.3 | 59.8 |
| | Woodland | 25 | 38.7 | 122 | 25 | 9.8 | 17.1 | 17.5 | 2.2 | 54.8 |
| | Diamond Springs | 624.8 | 38.6 | 121 | 22 | 10.5 | 16.1 | 17.7 | 1.7 | 49.9 |
Table 1 Geographical location of stations and annual averages of climatic data
| Study | ET_{0} equation |
|---|---|
| Penman-Monteith et al. [2] | ${\text{ET}}_{0}=\frac{0.408\Delta \left({R}_{n}-G\right)+\gamma \frac{900}{{T}_{mean}+273}{W}_{s}\left({e}_{s}-{e}_{a}\right)}{\Delta +\gamma \left(1+0.34{W}_{s}\right)}$ |
| Makkink [20] | ${\text{ET}}_{0}=0.61\frac{\Delta {R}_{s}}{\left(\Delta +\gamma \right)\lambda}-0.12$ |
| Romanenko [21] | ${\text{ET}}_{0}=0.0018{\left({T}_{mean}+25\right)}^{2}\left(100-RH\right)$ |
| Hargreaves & Samani [22] | ${\text{ET}}_{0}=0.0023\frac{{R}_{a}}{\lambda}\left({T}_{mean}+17.8\right)\sqrt{{T}_{\mathrm{max}}-{T}_{\mathrm{min}}}$ |
Table 2 Different ET0 equations
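As an illustration of how the equations in Table 2 are evaluated, the sketch below codes the Penman-Monteith and Hargreaves-Samani expressions term by term. The function names and the input values are illustrative assumptions, not data from the studied stations. For the Hargreaves-Samani call it is further assumed that R_{a} is supplied in MJ/m^{2}/d and that λ = 2.45 MJ/kg, so that R_{a}/λ is an equivalent evaporation depth in mm/d.

```python
import math


def penman_monteith(delta, rn, g, gamma, t_mean, ws, es, ea):
    """ET0 (mm/d) from the Penman-Monteith expression in Table 2."""
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * ws * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * ws))


def hargreaves_samani(ra, lam, t_mean, t_max, t_min):
    """ET0 (mm/d) from the Hargreaves & Samani expression in Table 2."""
    return 0.0023 * (ra / lam) * (t_mean + 17.8) * math.sqrt(t_max - t_min)


# Illustrative inputs only (not taken from Table 1).
et0_pm = penman_monteith(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
                         t_mean=16.9, ws=2.078, es=1.997, ea=1.409)
et0_hs = hargreaves_samani(ra=40.0, lam=2.45, t_mean=25.0, t_max=32.0, t_min=18.0)

print(f"PM ET0 = {et0_pm:.2f} mm/d")  # about 3.88 mm/d
print(f"HS ET0 = {et0_hs:.2f} mm/d")  # about 6.01 mm/d
```

The contrast in argument lists reflects the motivation for the reduced input configurations: the PM expression needs radiation, humidity, and wind terms, whereas the Hargreaves-Samani expression needs only temperatures and extraterrestrial radiation.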
In Equations (1)-(3), n is the number of data points, O_{i} and P_{i} are the i^{th} ET_{0} values from the PM and M5MT models, respectively, and $\overline{O}$ and $\overline{P}$ are the corresponding mean ET_{0} values. R^{2} measures the strength of the linear relationship between the PM and M5MT estimates. If all points fall on a straight line, the variation between the variables is fully explained by a linear relationship and R^{2} attains its optimal value of one. MAE is the average of the absolute errors, with an optimal value of zero. Since only the magnitude of the error is considered, MAE is nonnegative and has no upper bound. RMSE is a measure of the difference between the observed and simulated values. The more closely the data concentrate around the 1:1 line, the lower the RMSE becomes. RMSE does not have an upper bound and its optimal value is zero.
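The three metrics in Equations (1)-(3) can be computed directly from paired lists of PM values (O) and M5MT values (P). The sketch below is a plain-Python illustration; the function names and the four sample values are made up for the example and are not results from the study.

```python
import math


def mae(obs, pred):
    """Mean absolute error, Equation (1)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error, Equation (2)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Squared correlation between obs and pred, Equation (3)."""
    o_bar = sum(obs) / len(obs)
    p_bar = sum(pred) / len(pred)
    cross = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    o_ss = sum((o - o_bar) ** 2 for o in obs)
    p_ss = sum((p - p_bar) ** 2 for p in pred)
    return (cross / (math.sqrt(o_ss) * math.sqrt(p_ss))) ** 2


# Made-up ET0 values (mm/d) for illustration only.
pm_values = [1.0, 2.0, 3.0, 4.0]
m5mt_values = [1.5, 2.5, 2.5, 4.5]

print(mae(pm_values, m5mt_values))        # 0.5
print(rmse(pm_values, m5mt_values))       # 0.5
print(round(r_squared(pm_values, m5mt_values), 4))  # 0.8526
```

Note that R^{2} as defined in Equation (3) equals one for any perfectly linear relationship, even a biased one, which is why it is reported together with MAE and RMSE.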
M5 Model Tree (M5MT): M5MT is an improvement of a regression tree, which replaces the constant values at the leaves with linear regression functions relating input variables to the corresponding output variable.^{12,13} Two stages are involved in generating the final model tree. The first stage divides the input space into regions that correspond to nodes within a tree-like structure. The standard deviation of the output values in each region is calculated and is treated as a measure of the error at the corresponding node. Next, the expected error reduction is calculated for every candidate split at a node. The calculation uses the following formula, known as the standard deviation reduction (SDR):^{1}
$\text{SDR}=sd\left(T\right)-\sum_{i}\frac{\left|{T}_{i}\right|}{\left|T\right|}sd\left({T}_{i}\right)$ ;(4)
where T is the set of values that reach a node, T_{i} is the subset of values that have the i^{th} outcome of a potential split, |T| and |T_{i}| are the numbers of values in T and T_{i}, and sd is the standard deviation. The splitting is applied recursively, generating child nodes whose standard deviation is lower than that of their parent nodes. The algorithm continues to iterate, considering all possible splits, and ends when the least expected error is attained.^{17} At the end of the first stage the model tree has a large structure, so the overgrown tree is pruned in the second stage of M5MT.^{12} A subtree is pruned if the estimated error of the nodes branching below a given node is greater than that of a linear model fitted at that node.^{1−18} Linear regression equations replace the pruned nodes, resulting in a simpler and more accurate model tree.^{15‒19}
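The splitting criterion described above can be sketched in a few lines. The example assumes the usual form of the standard deviation reduction, sd(T) minus the size-weighted sum of sd(T_{i}), and searches a single input variable for the threshold that maximizes it. The function names and the toy data are hypothetical, and the sketch covers only the split search, not the pruning stage or WEKA's implementation.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction achieved by splitting parent into subsets."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_split(x, y):
    """Return (threshold, sdr) of the best binary split of y on one input x."""
    pairs = sorted(zip(x, y))
    targets = [target for _, target in pairs]
    best_threshold, best_gain = None, 0.0
    for k in range(1, len(pairs)):
        left_x, right_x = pairs[k - 1][0], pairs[k][0]
        if left_x == right_x:
            continue  # no threshold separates identical input values
        gain = sdr(targets, [targets[:k], targets[k:]])
        if gain > best_gain:
            best_threshold, best_gain = (left_x + right_x) / 2, gain
    return best_threshold, best_gain


# Toy data: the target jumps from 1 to 5 once the input exceeds 3.
x_values = [1, 2, 3, 4, 5, 6]
y_values = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(best_split(x_values, y_values))  # (3.5, 2.0)
```

In a full model tree this search would be repeated over every input variable at every node, and the resulting regions would then be fitted with linear regression equations.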
Building M5 Model Tree: This study used WEKA (Waikato Environment for Knowledge Analysis), a data mining software package, to estimate ET_{0}. WEKA includes a wide variety of machine learning algorithms, including M5MT. Its interface provides different testing options (i.e., percentage split, train-test, and cross validation) to assist in the modeling process. Among these three options, the train-test method was selected because of its better performance (Table 3).
| Method | MAE (mm/d) | RMSE (mm/d) | R^{2} |
|---|---|---|---|
| Percentage split | 0.255 | 0.3422 | 0.9904 |
| Train-test | 0.2328 | 0.3189 | 0.9914 |
| Cross validation | 0.2396 | 0.3337 | 0.9906 |
Table 3 Performance of M5MT for different testing options
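The three testing options compared in Table 3 differ only in how the records are partitioned before a model is fitted. The sketch below illustrates the idea in plain Python and does not reproduce WEKA's own implementation; the function names, the 0.66 training fraction, the fold assignment, and the dummy records are all assumptions made for the example.

```python
import random


def percentage_split(records, train_fraction=0.66, seed=0):
    """Shuffle the records and keep the first fraction for training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def train_test_by_year(records, train_years, test_years):
    """Use separate, user-supplied periods for training and testing."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test


def k_fold_indices(n, k):
    """Return (train_indices, test_indices) for each of k folds."""
    folds = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        train_idx = [i for i in range(n) if i % k != fold]
        folds.append((train_idx, test_idx))
    return folds


# One dummy record per year, 2000-2008; real records would hold daily data.
records = [{"year": year} for year in range(2000, 2009)]

train, test = train_test_by_year(records, range(2000, 2008), {2008})
print(len(train), len(test))             # 8 1
print(len(percentage_split(records)[0])) # 6
print(k_fold_indices(10, 5)[0])          # ([1, 2, 3, 4, 6, 7, 8, 9], [0, 5])
```

The train-test partition by year mirrors the setup described for the Iranian stations, where 2000-2007 is used for training and 2008 for testing.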
Performance of M5MT models in Iran (training and testing stages): Table 4 shows the MAE, RMSE, and R^{2} of ET_{0} estimates from the four M5MT models for the training and testing stages. The training process resulted in MAE, RMSE, and R^{2} values ranging between 0.33-0.76 mm/d, 0.47-1.03 mm/d, and 0.81-0.96, respectively. The testing stage showed similar values, ranging between 0.38-0.77 mm/d (MAE), 0.55-1.06 mm/d (RMSE), and 0.80-0.95 (R^{2}). The presence or absence of certain input parameters influenced the performance of the models. Comparing the models with two input variables, M5MT_{3} (whose inputs were T_{mean} and RH_{mean}) had higher MAE (0.76 mm/d) and RMSE (1.03 mm/d) than M5MT_{2} (whose inputs were T_{mean} and R_{s}) in the training stage. Solar radiation therefore appears to have a greater effect on ET_{0} than relative humidity: in the training stage, the MAE and RMSE of M5MT_{3} were respectively 23% and 21% higher than those of M5MT_{2}. This is in agreement with the results from the testing stage, in which the MAE and RMSE of M5MT_{3} were respectively 20% and 15% higher than those of M5MT_{2}. Among the M5MT models with four input variables, M5MT_{4} (whose inputs were T_{mean}, T_{max}, T_{min}, and R_{a}) had larger MAE (0.53 mm/d) and RMSE (0.73 mm/d) and lower R^{2} (0.91) than M5MT_{1} (whose inputs were W_{s}, RH_{mean}, T_{mean}, and R_{s}) during the training stage. This is consistent with the testing stage, in which the MAE and RMSE of M5MT_{4} were respectively 47% and 49% higher than those of M5MT_{1}. Figure 3A & Figure 3B show estimated ET_{0} values from the four M5MT models versus PM ET_{0} estimates for the training and testing stages, respectively. The concentration of data around the 1:1 line in Figure 3A & Figure 3B reflects the low RMSE values obtained in both stages.
In general, the small MAE and RMSE and the high R^{2} suggest that M5MT can accurately estimate ET_{0}. Overall, the results in Figure 3A & Figure 3B and Table 4 indicate that M5MT_{1} provided the most accurate ET_{0} estimates during the training and testing stages. In the training stage, the MAE (RMSE) of ET_{0} estimates from M5MT_{2}, M5MT_{3}, and M5MT_{4} was respectively 88% (81%), 130% (119%), and 61% (55%) higher than that of M5MT_{1}. A similar tendency was seen in the testing stage, in which the MAE (RMSE) values of M5MT_{2}, M5MT_{3}, and M5MT_{4} were respectively 68% (67%), 103% (93%), and 47% (49%) higher than those of M5MT_{1}.

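| Metric | Training, Iran (2000-2007): M5MT_{1} | Training: M5MT_{2} | Training: M5MT_{3} | Training: M5MT_{4} | Testing, Iran (2008): M5MT_{1} | Testing: M5MT_{2} | Testing: M5MT_{3} | Testing: M5MT_{4} |
|---|---|---|---|---|---|---|---|---|
| MAE (mm/d) | 0.33 | 0.62 | 0.76 | 0.53 | 0.38 | 0.64 | 0.77 | 0.56 |
| RMSE (mm/d) | 0.47 | 0.85 | 1.03 | 0.73 | 0.55 | 0.92 | 1.06 | 0.82 |
| R^{2} | 0.96 | 0.87 | 0.81 | 0.91 | 0.95 | 0.85 | 0.8 | 0.88 |
Table 4 Statistical metrics of M5MT models for training and testing stages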
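The percentage differences quoted for the training and testing stages can be checked from Table 4 by expressing each model's error relative to that of M5MT_{1}, i.e., 100 × (value − baseline) / baseline. The sketch below does this for MAE and RMSE; the dictionary layout and function names are illustrative choices, while the numbers are the Table 4 entries.

```python
# MAE and RMSE (mm/d) from Table 4, ordered M5MT1, M5MT2, M5MT3, M5MT4.
TABLE4 = {
    "training": {"MAE": [0.33, 0.62, 0.76, 0.53], "RMSE": [0.47, 0.85, 1.03, 0.73]},
    "testing": {"MAE": [0.38, 0.64, 0.77, 0.56], "RMSE": [0.55, 0.92, 1.06, 0.82]},
}


def percent_higher(value, baseline):
    """Relative increase of value over baseline, rounded to a whole percent."""
    return round(100.0 * (value - baseline) / baseline)


def versus_m5mt1(stage, metric):
    """Percent by which M5MT2, M5MT3, and M5MT4 exceed M5MT1."""
    baseline, *others = TABLE4[stage][metric]
    return [percent_higher(v, baseline) for v in others]


for stage in TABLE4:
    for metric in TABLE4[stage]:
        print(stage, metric, versus_m5mt1(stage, metric))
# training MAE [88, 130, 61]
# training RMSE [81, 119, 55]
# testing MAE [68, 103, 47]
# testing RMSE [67, 93, 49]
```

Because the baseline is the smaller M5MT_{1} error, the percentages can exceed 100%, which is why they are best read as increases relative to M5MT_{1}.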
Performance of M5MT models in California (validation stage): To assess the robustness of the M5MT models, they were applied to seven CIMIS weather stations in California. It should be noted that the CIMIS data were not used to train the M5MT models. The statistical metrics of the M5MT models are given in Table 5. MAE, RMSE, and R^{2} values ranged between 0.65-1.24 mm/d, 0.80-1.80 mm/d, and 0.68-0.90, respectively. Figure 4 plots the ET_{0} estimates from the four M5MT models versus the PM ET_{0} estimates at the seven CIMIS stations. From best to worst, the models can be ranked as follows: M5MT_{1}, M5MT_{2}, M5MT_{3}, and M5MT_{4}. Similar to the training and testing phases, M5MT_{1} outperformed the other models when tested at the California stations. Although no CIMIS data were used to train the M5MT_{1} model, it performed well when applied to the stations in California. This suggests that M5MT_{1} can provide accurate results in regions other than the one in which it was trained. Figure 4 & Figure 5 show that the M5MT_{1} model, with MAE, RMSE, and R^{2} values of 0.65 mm/d, 0.80 mm/d, and 0.90, respectively, can be selected as the best M5MT model for ET_{0} estimation. Of the two models with only two input parameters (i.e., M5MT_{2} and M5MT_{3}), M5MT_{2} provided better results than M5MT_{3}. This implies that a combination of R_{s} and T_{mean} contributes more significantly to the estimation of ET_{0} than a combination of RH_{mean} and T_{mean}. Figure 5 shows time series of ET_{0} estimates from the M5MT_{1} and PM models at four CIMIS stations (i.e., Delano, Gerber South, Woodland, and Diamond Springs). As shown, the estimated ET_{0} values from M5MT_{1} agree well with those of the PM equation. Notably, the ET_{0} estimates from M5MT_{1} captured the fluctuations of the PM ET_{0} values. Compared to M5MT_{1}, the MAE values of M5MT_{2}, M5MT_{3}, and M5MT_{4} were respectively 29%, 55%, and 91% larger. Also, the RMSE values from M5MT_{2}, M5MT_{3}, and M5MT_{4} were respectively 29%, 59%, and 125% greater than that of M5MT_{1}.

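| Metric | Validation, California (2015): M5MT_{1} | Validation: M5MT_{2} | Validation: M5MT_{3} | Validation: M5MT_{4} |
|---|---|---|---|---|
| MAE (mm/d) | 0.65 | 0.84 | 1.01 | 1.24 |
| RMSE (mm/d) | 0.8 | 1.03 | 1.27 | 1.8 |
| R^{2} | 0.9 | 0.93 | 0.74 | 0.68 |
Table 5 Statistical metrics of M5MT models for validation stage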
This study examined the ability of the M5 Model Tree (M5MT) to estimate reference evapotranspiration (ET_{0}) from climatic data. Four combinations of data were used in the M5MT model. These combinations were daily mean air temperature, wind speed, relative humidity, and solar radiation (configuration 1); daily mean air temperature and solar radiation (configuration 2); daily mean air temperature and relative humidity (configuration 3); and daily maximum, minimum, and mean air temperature, and extraterrestrial radiation (configuration 4). The objective was to determine which data combination carries the most information on ET_{0}. The results indicated that M5MT can estimate ET_{0} accurately. Data combination 1 generated the most accurate ET_{0} estimates and thus consists of the variables that carry the most information on ET_{0} compared to data combinations 2, 3, and 4. Comparing the M5MT models with two input variables, configuration 2 resulted in more accurate ET_{0} estimates than configuration 3. This suggests that solar radiation combined with air temperature is more important for estimating ET_{0} than relative humidity combined with air temperature.
©2017 Lum, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work noncommercially.