Research Article Volume 2 Issue 4
^{1}Assistant Professor, College of Sericulture, Assam Agricultural University, India
^{2}Professor, Department of Statistics, Assam University, India
Correspondence: Hemanta Saikia, Assistant Professor, College of Sericulture, Assam Agricultural University, India
Received: June 06, 2018  Published: July 2, 2018
Citation: Saikia H, Bhattacharjee D. Survival ability of Indian and overseas batsmen on the cricket pitch in Indian premier league. MOJ Sports Med. 2018;2(3):113‒116. DOI: 10.15406/mojsm.2018.02.00057
The Twenty20 format of cricket is a fast-paced game compared to the other formats, viz. Test and One-day International (50 overs a side). The Indian Premier League (IPL) is a national franchise-based Twenty20 cricket tournament initiated by the Board of Control for Cricket in India (BCCI). In this format, each batsman tries to score the maximum number of runs in the minimum number of balls, which increases the probability of his dismissal. Because the fall of wickets means a loss of resources for the batting side, it has an impact on the result of the game. This study examines the survival ability of Indian and overseas batsmen in the IPL 2012 season using a probabilistic model. The proposed model can also be used to forecast the survival rate of batsmen on the pitch in other formats of cricket while the game is in progress. The findings of the study can be used to arrange the batting order of a team in Twenty20 cricket based on the match situation.
Keywords: cricket, probability, survival analysis, sport
Cricket is an outdoor game played between two teams of eleven (11) players each on a circular ground. It is governed by a set of rules and regulations, and the interaction between bat and ball takes place on a 22-yard hard surface in the middle of the ground called the cricket pitch. Unlike many other sports, cricket has different versions, which can be broadly classified as unlimited overs cricket (Test matches) and limited overs cricket (One-day and Twenty20). The Indian Premier League (IPL) is a national franchise-based Twenty20 cricket league initiated by the Board of Control for Cricket in India (BCCI) in 2008. In the IPL, each team faces only twenty (20) overs in a match, so within these limited overs every batsman tries to score the maximum number of runs in the minimum number of balls. In the process of scoring runs quickly, batsmen are exposed to the risk of losing their wicket, which increases the probability of dismissal. Because the fall of wickets means a loss of resources for the batting side, it has an impact on the result of the game. However, this does not mean that the mere stability of a batsman on the pitch helps a team to win the match; he must also score runs as quickly as possible. Still, in Twenty20 cricket one can at least measure how long a batsman survives, that is, how many balls he faces on the cricket pitch while batting. Therefore, this study attempts to measure and compare the standing capability of Indian and overseas batsmen on the cricket pitch in IPL 2012 using survival analysis.
Survival analysis is defined as a set of methods for analyzing data where the outcome variable is the time until the occurrence of a particular event of interest.^{1} The event could be death due to cancer, occurrence of a disease, relief from severe back pain, etc. Let us take an example to explain the mathematical definition of the survival function. Suppose the actual survival time of an individual is t, which can be regarded as the value of a variable T associated with the survival time. T can take any non-negative value. The different values that T can take have a probability distribution, so T can be considered a random variable. The probability distribution function of T is denoted by F(t) and is given by
$F(t)=P(T<t)=\int_{0}^{t} f(x)\,dx$ (1)
which represents the probability that the survival time is less than some value t. The survival function is defined as the probability that the survival time is greater than or equal to t. It is usually denoted by S(t) and given by
$S(t)=P(T\ge t)=1-P(T<t)=1-F(t)=1-\int_{0}^{t} f(x)\,dx$ (2)
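As a small illustration of equations (1) and (2), the sketch below estimates F(t) and S(t) empirically from a handful of survival times. The data and the function names are illustrative assumptions and are not taken from the study; the example also assumes that every time is fully observed (no censoring).

```python
def empirical_cdf(times, t):
    """Empirical F(t) = P(T < t): share of observed times strictly below t."""
    return sum(1 for x in times if x < t) / len(times)


def empirical_survival(times, t):
    """Empirical S(t) = P(T >= t) = 1 - F(t)."""
    return 1.0 - empirical_cdf(times, t)


# Hypothetical, fully observed survival times (illustrative only).
sample_times = [3, 5, 8, 8, 12, 20, 25, 31]

f_10 = empirical_cdf(sample_times, 10)       # four of eight times are below 10
s_10 = empirical_survival(sample_times, 10)  # the remaining four are >= 10
```

With censored observations this simple proportion is no longer adequate, which is why a product-limit type estimator is needed.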
Therefore, the survival function can be used to represent the probability that an individual survives from the time origin to some time beyond t. The survival time, or time to an event of interest, can be measured in days, weeks, years, etc., with the objects or subjects followed over a specified period of time until the event of interest occurs. Although survival analysis is used extensively in medicine, clinical trials, actuarial science, etc., its application in sport (especially in cricket) is still limited. The few studies found in this regard are mentioned here. Danaher^{2} applied survival analysis to estimate a cricketer’s unknown batting average^{3} based on the product-limit estimator. A similar product-limit estimation technique was adopted by Kimber and Hansford^{3} for assessing the batting performance of cricketers based on runs scored. The product-limit estimator (PLE) is a nonparametric estimator originally proposed by Kaplan and Meier^{4} and it is defined as
$PLE=\prod_{t_i<t}\left(1-\frac{d_i}{n_i}\right)$ , $t_1<t_2<t_3<\dots<t_n$ (3)
where t_{i} are the observed times, in a sample of size n from a given population, at which the event of interest occurred, and d_{i} is the number of events at t_{i}. Sometimes, however, an observation carries some information about the event of interest but the information is not complete. This incomplete information is termed censoring. If there is no censoring, n_{i} is the number of survivors prior to time t_{i}. If there is censoring, n_{i} is the number of survivors minus the number of censored cases in the sample. When measuring the survival capability of cricketers on the cricket pitch, censoring arises when a batsman remains not-out in limited overs cricket. Such data are so-called right-censored data.
Following the work of Kimber and Hansford,^{4} the product-limit estimator was used by Das^{6} to estimate the adjusted batting average of some selected cricketers. He argues that past information reveals that batsmen have a variable risk of getting dismissed depending on their current score in the innings. Thus, he proposed to model the batsmen’s scores using a generalized geometric distribution. A similar problem was addressed by van Staden,^{7} who developed a new batting criterion named ‘survival rate’. It is defined as the number of balls faced in all innings divided by the number of completed innings. Symbolically,
$SV=\frac{\text{Number of balls faced}}{\text{Number of completed innings}}=\frac{1}{n}\left(\sum_{i=1}^{n}b_i+\sum_{i=n+1}^{n+m}b_i^{*}\right)$ (4)
where b_{i} (i = 1, 2, … , n) represents the number of balls faced by the batsman in the n completed innings and ${b}_{i}^{*}$ (i = n+1, n+2, …, n+m) represents the number of balls faced by the batsman in the m not-out innings. On the basis of equation (2) and the batting average developed by Maini and Narayanan (2007), which is based on average exposure-to-risk (AV_{exposure}), van Staden et al (2010) proposed a new batting average. It is based upon exposure using the survival rate and is defined as
$AV_{\text{survival}}=\frac{SV}{AV_{\text{exposure}}}$
where $AV_{\text{exposure}}=\frac{\sum_{i=1}^{n}x_i+\sum_{i=n+1}^{n+m}x_i^{*}}{\sum_{i=1}^{n}r_i+\sum_{i=n+1}^{n+m}r_i^{*}}$ (5)
where $r_1,\,r_2,\,\dots,\,r_n$ and $r_{n+1}^{*},\,r_{n+2}^{*},\,\dots,\,r_{n+m}^{*}$ denote the batsman’s exposure in the n completed innings and the m not-out innings respectively. Here, $r_i=1$ if the score in the i^{th} innings is a completed score, which means that the exposure is one for all completed innings.
Otherwise, $r_i^{*}=\begin{cases}b_i^{*}/\overline{b}, & \text{if the score is a not-out score and } b_i^{*}<\overline{b}\\ 1, & \text{else}\end{cases}$
where $\overline{b}$ is the average number of balls faced by a batsman in his (m+n) innings and it is defined as
$\overline{b}=\frac{\text{Number of balls faced}}{\text{Number of innings}}=\frac{1}{n+m}\left(\sum_{i=1}^{n}b_i+\sum_{i=n+1}^{n+m}b_i^{*}\right)$
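The measures above can be traced through a small numerical sketch. The innings data, the function name and the dictionary keys below are hypothetical. The text does not define x_{i}, so the sketch assumes that x_{i} and ${x}_{i}^{*}$ are the runs scored in completed and not-out innings respectively.

```python
def survival_measures(completed, not_out):
    """Compute SV, b-bar, AV_exposure and AV_survival as defined in the text.

    completed, not_out: lists of (runs, balls) tuples, one per innings.
    Assumption: x_i in AV_exposure is read as the runs scored in innings i.
    """
    n, m = len(completed), len(not_out)
    total_balls = sum(b for _, b in completed) + sum(b for _, b in not_out)
    total_runs = sum(x for x, _ in completed) + sum(x for x, _ in not_out)

    sv = total_balls / n            # balls faced per completed innings
    b_bar = total_balls / (n + m)   # balls faced per innings

    # Exposure is 1 for a completed innings; a not-out innings shorter than
    # b-bar only counts as the fraction b*/b-bar.
    exposure = n + sum(b / b_bar if b < b_bar else 1.0 for _, b in not_out)

    av_exposure = total_runs / exposure
    return {
        "SV": sv,
        "b_bar": b_bar,
        "exposure": exposure,
        "AV_exposure": av_exposure,
        "AV_survival": sv / av_exposure,
    }


# Hypothetical batsman: three completed and two not-out innings as (runs, balls).
measures = survival_measures(
    completed=[(30, 20), (10, 10), (45, 30)],
    not_out=[(12, 8), (50, 32)],
)
```

In this made-up example the short not-out innings of 8 balls contributes an exposure of 8/20 = 0.4, while the longer one of 32 balls counts as a full innings.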
At this point, it is important to clarify that the term ‘survival rate’ is used here to describe the capability of a batsman to remain on the cricket pitch during a match, so that there is no confusion with the measure defined by van Staden^{7} (c.f. equation (4)). When the word “survival” is applied to a batsman in cricket, it refers to the time spent by the batsman on the pitch before dismissal. As mentioned earlier, time to an event of interest can be measured in days, weeks, etc. Similarly, a batsman’s ability to stand on the pitch in the match(es) can be measured in terms of the number of balls faced by him, which can be treated as the outcome variable in a survival analysis. Thus, a batsman who consistently faces a large number of balls in the match(es) has a higher probability of survival on the pitch.
The approach of this study differs from earlier studies, as it does not focus on any single performance statistic of a batsman such as batting average or strike rate. Instead, the nonparametric estimator established long ago by Kaplan and Meier (1958) is used to calculate the survival probability of a batsman on the pitch, based on the number of balls faced and on whether the player was out or not-out in the match(es). The methodology applied here is discussed below in detail. Let O_{i} = 1 if a batsman is out on the i^{th} ball of a match and 0 otherwise, and let n_{i} be the number of batsmen surviving prior to b_{i} balls of a match, where b_{i} is the observed number of balls faced by the batsmen in a match. As mentioned earlier, censoring (more specifically, right censoring) arises in those situations of a match when a batsman remains not-out in limited overs cricket. Therefore, if any batsman remains not-out, n_{i} is the number of batsmen surviving minus the number of batsmen not-out prior to b_{i} balls of a match. The Kaplan-Meier (KM) estimator in terms of the game of cricket is then defined as
$S(b)=\prod_{b_i<b_{i+1}}\left(1-\frac{O_i}{n_i}\right)$ where b_{1} < b_{2} < … and i = 1, 2, …, 120 (4)
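A minimal sketch of this estimator is given below, under stated assumptions. The balls-faced data are invented and the function name is hypothetical. The risk set n_{i} is taken in the usual Kaplan-Meier sense, that is, every batsman whose innings lasted at least b_{i} balls, so a not-out batsman leaves the risk set only after his last ball.

```python
def kaplan_meier(balls, out):
    """Kaplan-Meier survival curve for balls faced.

    balls: balls faced in each innings.
    out:   1 if the innings ended in a dismissal, 0 if not-out (censored).
    Returns a list of (b, n_at_risk, n_out, S(b)), one entry per distinct
    ball count at which at least one dismissal occurred.
    """
    records = list(zip(balls, out))
    dismissal_balls = sorted({b for b, o in records if o})

    survival = 1.0
    curve = []
    for b in dismissal_balls:
        # Assumed risk set: innings that lasted at least b balls.
        n_at_risk = sum(1 for bb, _ in records if bb >= b)
        n_out = sum(1 for bb, o in records if bb == b and o)
        survival *= 1.0 - n_out / n_at_risk
        curve.append((b, n_at_risk, n_out, survival))
    return curve


# Hypothetical innings: balls faced and dismissal indicator (0 = not-out).
example_balls = [4, 7, 7, 12, 15, 15, 20, 30]
example_out = [1, 1, 0, 1, 1, 1, 0, 1]
km_curve = kaplan_meier(example_balls, example_out)
```

For these eight invented innings the estimated survival probability steps down at 4, 7, 12, 15 and 30 balls; the two not-out innings reduce the risk set without producing a step.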
Thus, the survival probabilities of the batsmen on the cricket pitch are computed using the estimator defined above. The precision of any estimate is reflected in its standard error. Therefore, the standard error of the KM estimate is computed as an essential aid to its interpretation. In this regard, Peto et al. (1977) proposed a formula to compute the standard error of $\widehat{S}(b)$ and it is defined as
$se\left\{\widehat{S}(b)\right\}=\frac{\widehat{S}(b)\,\sqrt{1-\widehat{S}(b)}}{\sqrt{n_i}}$ (5)
Expression (5) is conservative for the standard error of $\widehat{S}(b)$, because the standard errors will tend to be larger than they ought to be. Thus, the formula proposed by Greenwood (1926) for the standard error of $\widehat{S}(b)$ is usually recommended. Greenwood’s (1926) standard error formula can be defined as
$se\left\{\widehat{S}(b)\right\}=\widehat{S}(b)\left\{\sum_{i=1}^{k}\frac{O_i}{n_i\left(n_i-O_i\right)}\right\}^{\frac{1}{2}}$ for b_{k} ≤ b ≤ b_{k+1} (6)
A confidence interval for the corresponding value of the survival function can then be obtained from the above standard error of the KM estimate. Finally, based on the KM estimate, a survival plot is drawn to identify the survival ability of Indian and overseas batsmen on the cricket pitch.
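The two standard error formulas can be compared with a short sketch. The (n_{i}, O_{i}) pairs and the function name below are hypothetical. The interval shown is a plain normal approximation, $\widehat{S}(b)\pm 1.96\,se$, truncated to [0, 1]; this particular interval form is an assumption, since the text does not say which one was used.

```python
import math


def km_standard_errors(steps, z=1.96):
    """KM estimate with Peto and Greenwood standard errors.

    steps: (n_i, O_i) pairs at successive balls where a dismissal occurs,
           with n_i batsmen at risk and O_i of them out.
    Returns one dict per step. The interval is a normal approximation based
    on the Greenwood standard error (an assumed choice), clipped to [0, 1].
    """
    survival = 1.0
    greenwood_sum = 0.0
    rows = []
    for n_i, o_i in steps:
        if not 0 <= o_i < n_i:
            # Greenwood's term O_i / (n_i (n_i - O_i)) is undefined otherwise.
            raise ValueError("each step needs 0 <= O_i < n_i")
        survival *= 1.0 - o_i / n_i
        greenwood_sum += o_i / (n_i * (n_i - o_i))
        se_peto = survival * math.sqrt(1.0 - survival) / math.sqrt(n_i)
        se_greenwood = survival * math.sqrt(greenwood_sum)
        rows.append({
            "S": survival,
            "se_peto": se_peto,
            "se_greenwood": se_greenwood,
            "ci": (max(0.0, survival - z * se_greenwood),
                   min(1.0, survival + z * se_greenwood)),
        })
    return rows


# Hypothetical risk sets and dismissals at four successive dismissal balls.
se_rows = km_standard_errors([(8, 1), (7, 1), (5, 1), (4, 2)])
```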
Hypothesis testing
Hypothesis testing is a simple way of examining the survival ability of overseas and Indian batsmen on the cricket pitch. It allows one to assess the extent to which an observed set of data is consistent with a particular hypothesis. Here the working hypothesis is that there is no difference between the survival ability of overseas and Indian batsmen on the cricket pitch. The well-known log-rank test is used to test the working hypothesis. This is the appropriate nonparametric test when the right-censored data are non-informative, as is the case here for not-out batsmen. In order to apply the log-rank test, the survival ability of overseas and Indian batsmen is computed separately. The test compares the observed and expected numbers of events of interest in the two groups. The groups in the log-rank test are labeled as overseas (coded as “1”) and Indian (coded as “2”) players. Suppose there are k distinct balls, b_{1} < b_{2} < … < b_{k}, at which batsmen are out across the two groups. Let O_{1i} be the number of batsmen out at ball b_{i} in the overseas group and O_{2i} the number out in the Indian group. Further, let n_{1i} be the number of batsmen in the overseas group still surviving (at risk) just before ball b_{i} and n_{2i} the corresponding number in the Indian group. Consequently, at ball b_{i} there are O_{i} = O_{1i} + O_{2i} batsmen out from a total of n_{i} = n_{1i} + n_{2i} batsmen. Under the null hypothesis, the test statistic is defined as
$W_L=\frac{U_L^{2}}{V_L}\sim\chi_{1}^{2}$ (7)
where $U_L=\sum_{i=1}^{k}\left(O_{1i}-E_{1i}\right)$ and $V_L=Var\left(U_L\right)=\sum_{i=1}^{k}v_{1i}$ , with $E_{1i}=n_{1i}O_i/n_i$ the number of overseas batsmen expected to be out at b_{i} under the null hypothesis.
Since the different balls are independent of one another, the term $V_L$ (i.e. the variance of $U_L$ ) is the sum of the variances of $O_{1i}$ , each of which is given by
$v_{1i}=\frac{n_{1i}\,n_{2i}\,O_i\left(n_i-O_i\right)}{n_i^{2}\left(n_i-1\right)}$ (8)
The statistic W_{L} summarizes the extent to which the observed survival of the two groups of batsmen on the cricket pitch deviates from that expected under the null hypothesis of no group differences. The larger the value of the log-rank test statistic (i.e. W_{L}), the greater the evidence against the null hypothesis.
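Equations (7) and (8) can be sketched as follows. The two small groups of innings are invented, the function name is hypothetical, and the risk sets are assumed to contain every innings that lasted at least b_{i} balls. The p-value uses the identity that, for one degree of freedom, the upper tail of the chi-square distribution equals erfc(√(W_{L}/2)).

```python
import math


def log_rank_test(balls_1, out_1, balls_2, out_2):
    """Two-group log-rank test; returns (U_L, V_L, W_L, p_value).

    balls_g: balls faced per innings in group g (lists).
    out_g:   1 if dismissed, 0 if not-out (censored).
    """
    all_records = list(zip(balls_1 + balls_2, out_1 + out_2))
    dismissal_balls = sorted({b for b, o in all_records if o})

    u_l, v_l = 0.0, 0.0
    for b in dismissal_balls:
        n1 = sum(1 for x in balls_1 if x >= b)   # at risk, group 1
        n2 = sum(1 for x in balls_2 if x >= b)   # at risk, group 2
        o1 = sum(1 for x, o in zip(balls_1, out_1) if x == b and o)
        o2 = sum(1 for x, o in zip(balls_2, out_2) if x == b and o)
        n, o = n1 + n2, o1 + o2

        e1 = n1 * o / n                          # expected dismissals, group 1
        u_l += o1 - e1
        if n > 1:                                # variance term is 0/0 at n = 1
            v_l += n1 * n2 * o * (n - o) / (n ** 2 * (n - 1))

    w_l = u_l ** 2 / v_l
    p_value = math.erfc(math.sqrt(w_l / 2.0))    # chi-square upper tail, 1 df
    return u_l, v_l, w_l, p_value


# Hypothetical groups (1 = overseas, 2 = Indian), illustrative only.
u_l, v_l, w_l, p_value = log_rank_test(
    [5, 10, 20, 30], [1, 1, 0, 1],
    [3, 6, 8, 12], [1, 1, 1, 0],
)
```

With such tiny samples the test has very little power; the sketch is only meant to show how the observed and expected counts accumulate over the distinct dismissal balls.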
In IPL 2012, each franchise could have a maximum of eight overseas players in its squad, of whom only four could be included in the playing XI for each match. Each IPL team usually played 14 league matches prior to the knockout stage of the tournament. Therefore, a large number of performances by Indian as well as overseas players (i.e. number of balls faced per match and whether out or not-out) can be collected from the scorecards of the matches. All the scorecard information for the 2012 season of the IPL was collected from the website www.espncricinfo.com. A total of 1123 cases are considered for the study, of which 459 belong to overseas players and 664 to Indian players. The results of the analyses based on the methodology discussed above are provided below. Table 1 shows that 103 and 162 cases are censored (i.e. not-out cases) for overseas and Indian players respectively. Overall, 23.6% of the 1123 cases are not-out cases (Table 1) (Table 2) (Table 3) (Table 4).
| Player Type | Total N | N of Events | Censored N | Censored Percent |
|---|---|---|---|---|
| Overseas | 459 | 356 | 103 | 22.40% |
| Indian | 664 | 502 | 162 | 24.40% |
| Total | 1123 | 858 | 265 | 23.60% |

Table 1 Descriptive information of players in IPL 2012
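The censoring counts and percentages in Table 1 follow directly from the totals and the number of events. The short check below, with hypothetical variable names, recomputes them from the reported figures.

```python
# (total cases, dismissals) per group as reported in Table 1.
table1 = {"Overseas": (459, 356), "Indian": (664, 502)}

censoring = {}
for group, (total, events) in table1.items():
    censored = total - events                  # not-out cases
    censoring[group] = (censored, 100.0 * censored / total)

grand_total = sum(total for total, _ in table1.values())
grand_events = sum(events for _, events in table1.values())
grand_censored = grand_total - grand_events
censoring["Total"] = (grand_censored, 100.0 * grand_censored / grand_total)
```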
| Player Type | Mean Estimate | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|
| Overseas | 20.581 | 0.853 | 18.908 | 22.253 |
| Indian | 18.444 | 0.673 | 17.124 | 19.763 |
| Overall | 19.356 | 0.534 | 18.31 | 20.403 |

Table 2 Group means for survival time in terms of number of balls faced
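The confidence bounds in Table 2 are consistent with a normal-approximation interval, estimate ± 1.96 × standard error. This reading is an assumption, since the text does not state how the intervals were computed. The sketch below, with hypothetical names, recomputes the bounds from the reported estimates and standard errors and measures the largest gap to the reported bounds.

```python
# (estimate, std. error, reported lower, reported upper) from Table 2.
table2 = {
    "Overseas": (20.581, 0.853, 18.908, 22.253),
    "Indian": (18.444, 0.673, 17.124, 19.763),
    "Overall": (19.356, 0.534, 18.310, 20.403),
}


def normal_ci(estimate, std_error, z=1.96):
    """Assumed interval form: estimate +/- z * standard error."""
    return estimate - z * std_error, estimate + z * std_error


recomputed = {g: normal_ci(est, se) for g, (est, se, _, _) in table2.items()}

# Largest absolute gap between recomputed and reported bounds.
max_gap = max(
    max(abs(recomputed[g][0] - low), abs(recomputed[g][1] - high))
    for g, (_, _, low, high) in table2.items()
)
```

The remaining gaps are of the size expected from rounding the reported estimates and standard errors to three decimals.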
| Player Type | Median Estimate | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|
| Overseas | 15 | 1.058 | 12.927 | 17.073 |
| Indian | 15 | 0.728 | 13.573 | 16.427 |
| Overall | 15 | 0.626 | 13.774 | 16.226 |

Table 3 Group medians for survival time in terms of number of balls faced
Overall Comparisons

| Test | Chi-Square | df | Sig. |
|---|---|---|---|
| Log Rank (Mantel-Cox) | 4.289 | 1 | 0.038 |

Table 4 Hypothesis testing based on log-rank test
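The significance value in Table 4 can be reproduced from the chi-square statistic alone. For one degree of freedom, P(χ² > w) = erfc(√(w/2)), so only the standard library is needed; the function name below is hypothetical.

```python
import math


def chi2_upper_tail_1df(w):
    """P(X > w) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(w / 2.0))


# Log-rank statistic reported in Table 4.
p_value = chi2_upper_tail_1df(4.289)
```

The result agrees with the reported significance of 0.038 to three decimals and lies below the 0.05 level.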
From the survival plot, it can be observed that there is very little difference between the standing ability of overseas and Indian batsmen on the pitch in IPL 2012. Up to 18 balls there is no difference between overseas and Indian batsmen. However, after 20 balls and up to 57 balls, overseas batsmen have moderately more standing ability on the cricket pitch than Indian players in IPL 2012. The log-rank test (Table 4) also confirms that there is a significant difference in standing ability on the cricket pitch between overseas and Indian batsmen in IPL 2012 (as the p-value is 0.038 < 0.05). Earlier researchers such as Stefani and Clarke,^{8} Harville and Smith,^{9} Clarke and Norman,^{10} Clarke and Allsopp^{11} and Allsopp and Clarke^{12} have acknowledged that there is a significant home advantage for the home team in the game of cricket. Here, however, both the survival plot and the log-rank test indicate an advantage for overseas batsmen over Indian batsmen. Thus, the advantage for Indian batsmen of playing in their home country or on their home ground is called into question in IPL 2012. It may be that there was no significant home advantage for Indian batsmen in IPL 2012; this question can be considered as future scope of this research (Figure 1).
This study examines the survival ability of Indian and overseas batsmen on the cricket pitch in the Indian Premier League using survival analysis. The study finds that overseas batsmen have moderately more standing ability on the cricket pitch than Indian batsmen after they have faced 20 balls in IPL 2012. However, up to 20 balls, there is no difference between overseas and Indian batsmen in terms of standing ability on the cricket pitch. Since overseas batsmen have a moderate advantage over Indian batsmen in IPL 2012, the home advantage for Indian players in IPL 2012 can be considered as future scope of the study. The proposed survival model can also be used to forecast the survival rate of batsmen in other formats of cricket while the game is in progress. It can also be used to arrange the batting order of a team in the game of cricket based on the match situation.^{13‒15}
The authors declare that there is no conflict of interest.
©2018 Saikia, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work noncommercially.