Many statistical distributions are available for modelling lifetime data, but each comes with its own limitations. In recent years, extensive research has gone into finding distributions that model data better, and trigonometric transformation has emerged as one way forward.
Trigonometric transformations provide versatility and flexibility because the periodic function controls how the distribution curve behaves as the parameter(s) change.
Composing functions with different behaviour gives a better basis for modelling real-world phenomena, something that earlier generalised statistical distributions lacked. With a suitable composition of trigonometric functions and generalised distributions, and with parameters chosen to suit the data, this shortcoming is minimised: the resulting distribution can model data of different natures and gives rise to further transformed distributions. The article refers to several goodness-of-fit and model-selection criteria, namely the goodness-of-fit criterion of Anderson and Darling,1 the Kolmogorov-Smirnov (KS) and Cramér-von Mises (CVM) tests by Darling,2 and the corrected Akaike Information Criterion (AICc) by Konishi and Kitagawa,3 all of which are used extensively when discussing model fit.
The paper reviews distributions that involve a trigonometric transformation and discusses, for each, the distribution proposed by the respective authors, its applications, and a few properties that show why trigonometric transformation of statistical distributions matters for data of varied nature. The section ‘Trigonometrically transformed distributions’ gives a glimpse of statistical distributions based on different trigonometric functions. The section ‘Comparative approach to trigonometric classes of distributions’ compares the essential trigonometric transformations of statistical distributions mentioned in the second section. Finally, concluding remarks are given in the last section.
Trigonometrically transformed distributions
This section briefly discusses trigonometric transformations of statistical distributions based on the sine, cosine, and tangent functions.
1)Sine transformations
The statistical literature contains a number of distributions based on the sine function. This subsection covers them one by one.
1.1)Sine square distribution: The sine square distribution, a continuous distribution, was proposed by Faris and Khan.4 As the name suggests, it is based on the sine function.
The probability density function (pdf) of sine square distribution is given by:
(1)
The domain of the distribution depends upon the lone parameter λ, except when λ = n/π. The parameter λ is also called the growth parameter or the shape parameter. This distribution can easily be used to model truncated data. The value of λ is inversely related to the growth rate of the pdf curve, i.e., the smaller the value of the parameter, the higher the growth rate of the pdf curve, and vice versa.
The mean and median of the Sine square distribution are:
So, the mean and median of the distribution depend on the growth parameter, and both are increasing functions of λ. For this distribution, the maximum likelihood estimator (MLE) of λ is hard to obtain, but the method-of-moments estimator is much easier to obtain.
1.2)SS transformation and SSE(ϴ) distribution: Kumar et al.5 introduced a new class of distributions obtained by composing a baseline life distribution with the sine function, called the SS transformation.
If f(x) is the pdf of some baseline distribution and F(x) is the corresponding cumulative distribution function (cdf), then the cdf G(x) of the new life distribution is given by:
(2)
The corresponding pdf, obtained by differentiating the cdf G(x), is:
Also, the hazard rate function (hrf) h(x) corresponding to G(x) is given by:
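For reference, the SS transformation is usually written in the form below in the sine-generated literature. This is a sketch under that assumption rather than a transcription of the authors' displays; F and f denote the baseline cdf and pdf, and the first line corresponds to Equation (2).

```latex
\begin{align}
G(x) &= \sin\!\left(\frac{\pi}{2}\,F(x)\right), \\
g(x) &= \frac{\pi}{2}\, f(x)\, \cos\!\left(\frac{\pi}{2}\,F(x)\right), \\
h(x) &= \frac{g(x)}{1-G(x)}
      = \frac{\frac{\pi}{2}\, f(x)\, \cos\!\left(\frac{\pi}{2}\,F(x)\right)}
             {1-\sin\!\left(\frac{\pi}{2}\,F(x)\right)} .
\end{align}
```

The pdf follows from the chain rule applied to G(x), and the hrf is the ratio of the pdf to the survival function 1 - G(x).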
The authors focused on the SS transform of the exponential distribution and showed that it fitted better than other known distributions. SSE(ϴ) denotes the SS transform of the exponential distribution, and it was explored to solve the estimation problem for bladder cancer patients’ data.
If the pdf f(x) is that of the exponential distribution with parameter ϴ, then SSE(ϴ) from Equation (2) is given by:
(3)
Estimators of the parameter ϴ under loss functions such as the General Entropy Loss Function (GELF) and the Squared Error Loss Function (SELF) are derived, and it is shown that with an appropriate substitution the GELF reduces to the SELF. Using sample data on cancer patients, the paper also showed that the SSE(ϴ) distribution provided a better fit than previously used lifetime distributions such as the Transmuted Inverse Rayleigh Distribution (TIRD) by Ahmad et al.,6 the Transmuted Inverse Weibull Distribution (TIWD) by Khan and King,7 the Transmuted Inverse Exponential Distribution (TIED) by Oguntunde and Adejumo8 and the Inverse Weibull Distribution (IWD), because SSE(ϴ) had the lowest Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Kolmogorov-Smirnov (KS) and negative log-likelihood (-LL) values among them.
On evaluating the risks of the estimator of ϴ under GELF, the data showed that the Bayes estimator of ϴ for SSE(ϴ) can be chosen according to the level of confidence in the guessed value of ϴ. The SS transform proposed by the authors has been of vital use in the discovery of new distributions based on trigonometric functions.
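The SSE(ϴ) distribution can be explored numerically with a short sketch. It assumes the usual sine-generated form G(x) = sin((π/2)F(x)) with the exponential baseline F(x) = 1 - exp(-ϴx); the function names are illustrative and do not come from the cited paper.

```python
import math

def sse_cdf(x, theta):
    """Assumed SSE(theta) cdf: sin(pi/2 * (1 - exp(-theta * x))) for x > 0."""
    if x <= 0:
        return 0.0
    return math.sin(0.5 * math.pi * (1.0 - math.exp(-theta * x)))

def sse_pdf(x, theta):
    """Derivative of sse_cdf; uses cos(pi/2 - a) = sin(a)."""
    if x < 0:
        return 0.0
    e = math.exp(-theta * x)
    return 0.5 * math.pi * theta * e * math.sin(0.5 * math.pi * e)

def sse_hrf(x, theta):
    """Hazard rate: pdf divided by the survival function."""
    return sse_pdf(x, theta) / (1.0 - sse_cdf(x, theta))

def sse_quantile(u, theta):
    """Inverse of sse_cdf for 0 <= u < 1, useful for simulation."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return -math.log(1.0 - (2.0 / math.pi) * math.asin(u)) / theta

if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0, 2.0):
        print(x, round(sse_cdf(x, 1.0), 4), round(sse_hrf(x, 1.0), 4))
```

The closed-form quantile function makes inverse-transform sampling straightforward, which is convenient when checking estimators by simulation.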
1.3)SS transform for Lindley distribution: Kumar et al.9 explored the SS transform given by Equation 2 with the Lindley distribution taken as the baseline distribution.
The Lindley distribution with parameter ϴ is given by:
where
The cdf of the SS Transform for Lindley Distribution (SSL(ϴ)) is given by:
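For orientation, the one-parameter Lindley distribution and the resulting SSL(ϴ) cdf are commonly written as follows. The block assumes the standard Lindley parametrisation and the sine-generated form G(x) = sin((π/2)F(x)); it is a sketch, not a transcription of the authors' displays.

```latex
\begin{align}
f(x;\theta) &= \frac{\theta^{2}}{1+\theta}\,(1+x)\,e^{-\theta x},
  \qquad x>0,\ \theta>0, \\
F(x;\theta) &= 1-\frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}, \\
G_{\mathrm{SSL}}(x;\theta) &= \sin\!\left(\frac{\pi}{2}
  \left[1-\frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}\right]\right).
\end{align}
```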
On calculating the moments, it is shown that the distribution is positively skewed and leptokurtic.
The authors took a real data set and tested the proposed distribution against the SSE(ϴ), DUSLindley and Lindley distributions using the K-S, AIC and BIC values. SSL(ϴ) was the better fit for the data, as it had the smallest values on these criteria. They observed that the distribution was a good fit for data with a non-increasing failure rate.
1.4)Sine Kumaraswamy G family of distributions: Chesneau and Jamal10 introduced a new family of distributions based on trigonometric functions, called the Sine Kumaraswamy G family. A special case is the Sine Kumaraswamy Exponential (SKE) distribution, obtained when the baseline distribution is the exponential distribution.
The cdf of the Sine Kumaraswamy G family with parameters a and b (both positive) is defined by:
where F(x) is the cdf for baseline distribution.
At a =1, b =1, G(x) reduces to the SS transform.
1.4.1)Sine Kumaraswamy exponential (SKE) distribution: When F(x) is the cdf of exponential distribution (taken as baseline) then the cdf of SKE distribution is given by:
At a=1, b=1; G(x) reduces to SSE(λx) (Equation 3).
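The reduction at a = 1, b = 1 can be checked numerically. The sketch below assumes the family has the form G(x) = sin((π/2)[1 - (1 - F(x)^a)^b]), which is consistent with the reduction to the SS transform stated above; the function names and the rate parametrisation of the exponential baseline are illustrative assumptions.

```python
import math

def sine_kumaraswamy_g_cdf(F, a, b):
    """Assumed Sine Kumaraswamy-G cdf evaluated at a baseline cdf value F in [0, 1]."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return math.sin(0.5 * math.pi * (1.0 - (1.0 - F ** a) ** b))

def ske_cdf(x, lam, a, b):
    """SKE cdf: exponential baseline F(x) = 1 - exp(-lam * x), an assumed parametrisation."""
    F = 1.0 - math.exp(-lam * x) if x > 0 else 0.0
    return sine_kumaraswamy_g_cdf(F, a, b)

if __name__ == "__main__":
    x, lam = 0.8, 1.5
    ss = math.sin(0.5 * math.pi * (1.0 - math.exp(-lam * x)))
    print(ske_cdf(x, lam, 1.0, 1.0), ss)   # the two values agree
    print(ske_cdf(x, lam, 2.0, 0.5))       # other (a, b) change the shape
```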
Two real-life data sets were used to test the fit of the SKE distribution. The SKE distribution provided a better fit than the Kumaraswamy Weibull, Beta Weibull, CS1 and Exponential distributions, as it had the lowest values of AIC, BIC, KS, CVM, Anderson-Darling (AD), Consistent Akaike Information Criterion (CAIC) and Hannan-Quinn Information Criterion (HQIC). For the second real data set, the SKE distribution also provided a better fit than the SSE distribution (Equation 3).
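Most of the comparisons in this review rest on information criteria computed from the maximised log-likelihood. A minimal sketch of the textbook definitions follows; the function name is illustrative, and CAIC is left out because its definition varies between papers.

```python
import math

def information_criteria(log_lik, k, n):
    """Textbook information criteria from the maximised log-likelihood.

    log_lik: maximised log-likelihood, k: number of fitted parameters,
    n: sample size. Smaller values indicate a better trade-off between
    fit and complexity.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    hqic = 2 * k * math.log(math.log(n)) - 2 * log_lik
    return {"AIC": aic, "BIC": bic, "AICc": aicc, "HQIC": hqic}

if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(information_criteria(log_lik=-100.0, k=2, n=50))
```

Because every criterion adds a penalty that grows with k, a model with more parameters must improve the log-likelihood enough to offset it before it is preferred.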
1.5)Sinh inverted exponential distribution: Hemeda and Abdallah11 introduced a new probability distribution based on the hyperbolic sine function with the exponential distribution as baseline, called the Sinh Inverted Exponential (SIE) distribution.
The cdf FSIE(x) of SIE distribution is given by:
FSIE(x) =
The authors carried out a simulation study using the MLE method and noted that the bias and MSE decrease as the sample size increases, which indicates the accuracy and consistency of the MLEs of the parameters.
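The kind of simulation study described here can be sketched in a few lines. Because the SIE cdf is not reproduced in this text, the sketch swaps in the one-parameter SSE(ϴ) distribution, assumed to have cdf sin((π/2)(1 - exp(-ϴx))), and maximises its log-likelihood with a golden-section search. All names, the seed and the settings are illustrative assumptions, not the authors' setup.

```python
import math
import random

def sse_sample(theta, n, rng):
    """Draw n values from SSE(theta) by inverting the assumed cdf."""
    return [-math.log(1.0 - (2.0 / math.pi) * math.asin(rng.random())) / theta
            for _ in range(n)]

def sse_loglik(theta, xs):
    """Log-likelihood of SSE(theta) under the assumed pdf."""
    total = len(xs) * math.log(0.5 * math.pi * theta)
    for x in xs:
        e = math.exp(-theta * x)
        total += -theta * x + math.log(math.sin(0.5 * math.pi * e))
    return total

def golden_section_max(f, lo, hi, iters=40):
    """Maximise a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:                       # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                             # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

def simulate(theta_true, n, reps, seed):
    """Return (bias, MSE) of the MLE over reps simulated samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        xs = sse_sample(theta_true, n, rng)
        est = golden_section_max(lambda t: sse_loglik(t, xs), 1e-3, 20.0)
        errors.append(est - theta_true)
    bias = sum(errors) / reps
    mse = sum(e * e for e in errors) / reps
    return bias, mse

results = {n: simulate(1.5, n, reps=100, seed=2024) for n in (20, 200)}
for n, (bias, mse) in results.items():
    print(f"n={n:4d}  bias={bias:+.4f}  MSE={mse:.4f}")
```

The golden-section search is adequate here because the assumed SSE log-likelihood is concave in ϴ; a multi-parameter model such as SIE would need a general-purpose optimiser instead.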
A real-life data set was used to compare the SIE distribution with the Weibull and Inverted Exponential distributions through AIC, BIC, log-likelihood and KS values. The SIE distribution had the lowest values among them, so it was concluded to be the better fit.
1.6)Sine Topp-Leone G family of distributions: Al-Babtain et al.12 introduced a new family based on the composition of the Sine-G family and the Topp-Leone generated family (TL-G), termed the Sine Topp-Leone G family (STLG).
The cdf of TL-G with parameter λ is given by:
, where
,
The cdf of STLG distribution with parameter is given by:
FSTLG(x) =
The STLG family is highly flexible and gives rise to further flexible distributions.
Also, as
and as
1.6.1)Sine Topp Leone inverse Lomax (STLIL) distribution: One special case of STLG arises when the baseline distribution is the Inverse Lomax distribution. The resulting three-parameter distribution is called Sine Topp Leone Inverse Lomax (STLIL) and is defined by the cdf (for x>0):
Also, as
The hazard curves are also quite flexible which can be observed by varying the values of the parameters.
1.6.2)Sine Topp-Leone exponentiated exponential (STLEE) distribution: When the baseline distribution is taken as the Exponentiated exponential distribution, the cdf is given by
At λ=1, the FSTLEE reduces to SSE( /2) given by Equation 3.
At
, the FSTLEE reduces to Sine Exponentiated Exponential distribution (
)
1.6.3)Sine Topp Leone Lindley (STLELi) distribution: Here the baseline cdf is that of the Lindley distribution, and the cdf of STLELi is given by:
Through simulation studies for the STLIL and STLEE models, the authors showed that as the sample size increases, the MSE and average length (AL) converge to zero and the ML estimates converge to the corresponding parameter values, which establishes the asymptotic efficiency of the method.
Two real-life data sets are used to compare the fit of the STLIL model with other distributions. For the first data set, the STLIL model was compared with the Sin Inverse Lomax by Souza,13 Topp Leone Weibull Lomax by Jamal et al.,14 Topp Leone Lomax by Hanif et al.,15 Weibull Lomax by Tahir et al.,16 Kumaraswamy Lomax by Shams,17 Beta Lomax by Eugene et al.,18 Gompertz Lomax by Oguntunde et al.,19 Exponential Lomax by Moniem and Hamed,20 Kumaraswamy Exponential by Cordeiro et al.,21 Marshall Olkin Exponential by Alice and Jose,22 Burr X Exponential by Oguntunde et al.,23 Beta exponential by Nadarajah and Kotz24 and Kumaraswamy Marshall Olkin Exponential by George and Thobias25 models. The STLIL model was found to be the best fit among them, as it had the smallest AIC, CAIC and HQIC values.
For the second data set, the fit of the STLIL model was compared with various other distributions: SIL, Sine Weibull, Kumaraswamy Log Logistic by Santana et al.,26 Transmuted Complementary Weibull Geometric by Afify et al.,27 Kumaraswamy Exponentiated BurrXII by Mead and Afify,28 Beta exponentiated BurrXII by Mead,29 Generalised Inverse Gamma by Mead,30 Beta Fréchet by Nadarajah and Gupta,31 Exponentiated Transmuted Generalised Rayleigh by Afify et al.,32 Transmuted Modified Weibull by Khan and King,33 Transmuted additive Weibull by Elbatal and Aryal,34 Generalised Transmuted Weibull by Nofal et al.,35 Exponentiated Weibull by Mudholkar and Srivastava,36 Beta Extended Pareto by Mahmoudi,37 Kumaraswamy Weibull by Cordeiro et al.,38 Exponentiated Exponential by Gupta and Kundu,39 Beta Weibull by Lee et al.,40 Gamma Weibull by Saboor et al.,41 Kumaraswamy Gamma by Cordeiro et al.,42 Beta Gamma by Kong et al.43 and Gamma models. For this data set too, the STLIL model was found to be the best fit, as it had the smallest AIC, CAIC, HQIC and CVM values.
So, the STL-G family provides a better fit than the previously mentioned Sine G family and can be used for modelling in various fields.
1.7)Sine power Lomax (SPL) model: Nagarjuna et al.44 introduced a new distribution, Sine Power Lomax (SPL), by composing the Sine-G family with the Power Lomax distribution of Rady et al.45
The cdf of SPL distribution with the parameters α (shape parameter), β (scale parameter), λ (scale parameter) is given by:
where
For different values of parameter α, β, λ various forms of distributions can be derived. fSPL(x; α, β, λ) decreases for
and when
only a single mode exists. Hrf is increasing
and Hrf is decreasing for
and when
hrf has only a single mode, which all shows the flexibility of SPL distribution.
From simulation the authors found that as the sample size increases, the bias and SE of the MLEs decrease, and the MMLE converges approximately to the parameter value. To assess the fit of the proposed model, the authors used nine real data sets and compared it with the Topp Leone Lomax by Oguntunde et al.,46 Power Lomax, Exponentiated Lomax by Cordeiro and Lemonte47 and Lomax distributions through AIC, CVM, CAIC, BIC and HQIC. The SPL distribution had the smallest test values and the highest p value among these models, so it provides a better fit and can be used efficiently for large data analysis.
1.8)The transformed sine G (TSG) family: Jamal et al.48 introduced a new family called the transformed Sine G (TSG) family of distributions, which extends the Sine G family using a new polynomial-trigonometric function along with additional parameters to provide more versatility.
The polynomial-trigonometric function is given by:
where
The cdf of TSG family is given by
where G(x) is the baseline cdf which has a parameter set
When
When
1.8.1)Transformed sine Weibull (TSW) distribution: When Weibull distribution is taken as the baseline, the cdf of TSW distribution for x>0 is given by:
As
The TSW distribution is suited to fitting heterogeneous data sets, as its hrf takes varied shapes and the distribution exhibits a range of skewness and kurtosis.
From simulation, it can be concluded that as the sample size increases the MLEs converge to the true parameter values. Two real-life data sets are used to compare the fit of the model against four-parameter models (Generalised Modified Weibull, Kumaraswamy Weibull and Beta Weibull), three-parameter models (Transmuted Weibull and Modified Weibull) and a two-parameter model (Sine Weibull) using the CVM, AD, KS and AIC statistics. The TSW model had the smallest test values and the largest p value among the models, which is why it was judged a better fit than the other distributions.
2)Cosine transformations
This subsection covers statistical distributions in the literature that are based on the cosine function.
2.1)Cos-G class of distribution: Souza et al.49 proposed the Cos-G class of distributions, one of the more recent classes of trigonometric distributions. A particular case of the Cos-G class, with the Weibull distribution as baseline, is termed the Cosine Weibull (CosW) distribution. The Weibull distribution has its drawbacks, and modified forms of it have appeared over time.
The Cos-G class for baseline cdf G(x) is given by:
(4)
2.1.1)CosW distribution: When G(x) is taken as the cdf of the Weibull distribution, the cdf of the CosW distribution is given by:
The hazard curve of the CosW distribution overcomes a shortcoming of the Weibull distribution: it can be increasing, non-increasing or unimodal.
The authors considered three real data sets and compared the CosW model with the Weibull, Gumbel, Exponentiated exponential and Weibull exponentiated models using the AD and CVM statistics. They found that CosW was quite flexible and provided the best fit among these models.
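The hazard behaviour of CosW can be inspected numerically. The sketch below assumes the Cos-G cdf has the form F(x) = 1 - cos((π/2)G(x)) and uses the shape-scale Weibull baseline G(x) = 1 - exp(-(x/scale)^shape); both the form and the function names are assumptions made for illustration.

```python
import math

def weibull_cdf(x, shape, scale):
    """Weibull baseline cdf in the shape-scale parametrisation (x > 0)."""
    return 1.0 - math.exp(-((x / scale) ** shape))

def weibull_pdf(x, shape, scale):
    """Weibull baseline pdf in the shape-scale parametrisation (x > 0)."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))

def cosw_cdf(x, shape, scale):
    """Assumed Cos-G cdf with Weibull baseline: 1 - cos(pi/2 * G(x))."""
    return 1.0 - math.cos(0.5 * math.pi * weibull_cdf(x, shape, scale))

def cosw_pdf(x, shape, scale):
    """Derivative of cosw_cdf by the chain rule."""
    G = weibull_cdf(x, shape, scale)
    return 0.5 * math.pi * weibull_pdf(x, shape, scale) * math.sin(0.5 * math.pi * G)

def cosw_hrf(x, shape, scale):
    """Hazard rate: pdf divided by the survival function."""
    return cosw_pdf(x, shape, scale) / (1.0 - cosw_cdf(x, shape, scale))

if __name__ == "__main__":
    grid = [0.25, 0.5, 1.0, 1.5, 2.0]
    for shape in (0.5, 1.0, 3.0):
        values = [round(cosw_hrf(x, shape, 1.0), 3) for x in grid]
        print(f"shape={shape}: {values}")
```

Printing the hazard over a grid for several shape values is a quick way to see which hazard forms a given parameter region produces before fitting real data.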
2.2)Cosine geometric (CGD) distribution: Chesneau et al.50 introduced a new discrete distribution by composing the cosine function with the weighted geometric distribution, called the Cosine Geometric Distribution (CGD).
The probability mass function (pmf) of CGD with two parameters is given by:
where
From the hrf plot, they identified the flexible behaviour of CGD for different parameter values. The index of dispersion increases with the parameter p and remains above one as the second parameter increases, so the CGD is concluded to be overdispersed.
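The index of dispersion is the ratio of the variance to the mean, and a value above one indicates overdispersion. Since the CGD pmf is not reproduced in this text, the sketch below uses the plain geometric pmf (1 - p)p^k on k = 0, 1, 2, ... as a stand-in; that parametrisation and the function names are assumptions chosen for illustration.

```python
def geometric_pmf(p):
    """Return the pmf k -> (1 - p) * p**k on k = 0, 1, 2, ... (stand-in for CGD)."""
    return lambda k: (1.0 - p) * p ** k

def index_of_dispersion(pmf, support_size=5000):
    """Variance-to-mean ratio of a count distribution, by truncated summation."""
    probs = [pmf(k) for k in range(support_size)]
    mean = sum(k * q for k, q in enumerate(probs))
    second = sum(k * k * q for k, q in enumerate(probs))
    return (second - mean ** 2) / mean

if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(p, round(index_of_dispersion(geometric_pmf(p)), 4))
```

For this stand-in the ratio equals 1/(1 - p), so it exceeds one for every p and grows with p, mirroring the qualitative behaviour reported for the CGD.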
From simulation, the authors concluded that the estimates are consistent, as the MSE and variance decrease with increasing sample size.
Through a real data study, the authors concluded that CGD fitted the data better than competing distributions, including equi-dispersed ones, such as the negative binomial, weighted negative binomial Lindley, Poisson and geometric distributions, as it had the smallest AIC, HQIC, CAIC, AICc and KS values.
3)Sine-Cosine transformations
This subsection covers statistical distributions based on both the sine and cosine functions simultaneously.
3.1)Cosine sine distribution: Chesneau et al.51 introduced a new class of distributions obtained by composing a baseline distribution with the sine and cosine functions. The paper gives a generalisation of the SS transformation using sine and cosine functions, which can be used to derive further trigonometric distributions. Two transformations based on the Cosine Sine transformation (CS transform) are introduced by changing the parameter set, and a particular case is derived by taking the exponential distribution as the baseline.
The CS transform for F(x), the cdf of the baseline distribution, is given by:
(5)
3.1.1)CS1 Transformation: It is a particular case of Equation 5 when the parameters are taken such that
and the CS1 transformation is given by:
The corresponding pdf and hrf can be easily found with the help of G(x). When
, G(x) becomes the SS transform given by Equation 2.
3.1.1.1)CS1E: When F(x) is taken as the cdf of the exponential distribution, the CS1E distribution arises with
. The cdf is given by:
When
and
then G(x) here becomes the SS transform with F(x) as the cdf of exponential distribution.
3.1.2)CS2 Transformation: It is another case of the CS transform, in which the parameter set is taken as
,
,
and
. Then the CS2 Transformation is given by:
3.1.2.1)CS2E: When F(x) is taken as the cdf of the exponential distribution, the CS2E distribution arises with cdf,
;
Through a simulation study, the authors showed that for both the CS1E and CS2E distributions the MSE decreases as the sample size increases.
Four real-life data sets were used by the authors to demonstrate the versatility of CS1E in fitting data against the TIWD, Transmuted Modified Inverse Rayleigh (TMIR) by Khan and King,52 Modified Weibull (MW) by Lai et al.,53 Generalised Linear Failure Rate (GLFR) by Sarhan and Kundu54 and SS transform (Equation 2) distributions, using AIC, CAIC, HQIC and AICc. The proposed distribution modelled the data better, as it had the smallest test values and the highest p values among these distributions.
3.2)New exponential trigonometric distribution: Bakouch et al.55 proposed a new exponential distribution defined with trigonometric functions (NET), built from exponential and trigonometric functions. It has three shape parameters, which provide comparatively more flexibility than the other distributions available at the time. The cdf F(x) of the NET distribution depends on sinusoidal functions and is given by:
where
The NET Distribution belongs to the family of weighted exponential distribution.
The values of the parameters
also determine the generation of oscillation of varied types and magnitude. When
, the NET distribution reduces to Exponential Distribution. Also, the plotting of graphs denoted that
is directly proportional to the increase in number of peaks i.e., on increasing
the number of peaks also increases and on decreasing
the number of peaks also decreases, which is a vital characteristic for constructing models with the NET distribution. The authors presented an algorithm for generating random variates from the NET model. They used a real data set to show that the NET distribution fits better than the exponential, Weibull, Gamma and Exponentiated exponential distributions according to the AIC and BIC values.
3.3)TransSC distribution: Chesneau and Jamal56 introduced a new family of polynomial-exponential-trigonometric distributions with three parameters, called the Transformation of distribution using Sine and Cosine functions (TransSC(F;a,b,c)). The distribution combines a variety of functions: polynomial, exponential and trigonometric. A particular case for modelling data was also introduced, in which the baseline distribution is the Weibull distribution.
The cdf G(x) of the proposed distribution for baseline cdf F(x) is given by:
where
Changing the values of the parameters a, b, c changes the behaviour of TransSC(F;a,b,c) and yields different established distributions.
3.3.1)TSCW(α,β,b) distribution: It is a special case of TransSC(F;a,b,c) where a=0, c=0, and the baseline cdf F(x) is that of the Weibull distribution. Substituting gives the cdf of TSCW(α,β,b), and differentiating it gives the pdf.
They considered three real-life data sets and showed that the TSCW distribution was more flexible and a better fit. On the AIC, CVM, AD and KS measures, the TSCW distribution had the smallest values among the competing distributions, namely the CS1E, Transmuted Weibull, SSE and Odd Burr Weibull distributions.
4)Tan transformations
This subsection covers statistical distributions in the literature that are based on the tangent function.
4.1)Tan G class: Souza et al.57 proposed a new class of trigonometric distributions called the Tan G class; as the name suggests, it is based on the trigonometric tangent function. The paper also introduced a member of the Tan G class, termed the Tan BXII model, whose baseline is the Burr XII distribution. The cdf of the Tan G class for baseline G(x) is given by:
(6)
4.1.1)Tan-BXII distribution: When the baseline distribution is the Burr XII distribution, with cdf given by
where
, the Tan-BXII distribution is obtained with cdf for x>0 given by:
The Tan-BXII distribution is shown to be flexible, as it exhibits a wide variety of skewness and kurtosis for different values of the parameter set. Using simulation, the authors observed that the biases and MSEs of the MLEs decrease as the sample size increases, and noted that the MLEs tend to overestimate the parameters of the distribution.
The authors took a real-life data set to compare the fit of the model with the Kumaraswamy BXII, Burr XII and Kumaraswamy Weibull distributions through the AIC, CAIC, BIC, CVM and AD statistics, and concluded that Tan-BXII was the best among the models as it had the smallest test values.
4.2)Type II Tan G class: Chesneau and Artault58 introduced a new trigonometric class of distributions based on the tangent function, called the type II Tan G class (TIIT-G). The cdf of the TIIT-G class with baseline cdf G(x) is given by:
(7)
When
When
The Tan-G and TIIT-G classes are interrelated and flexible. If S(x) is the cdf of the Tan-G class and F(x) is the cdf of the TIIT-G class, then
4.2.1)Type II Tan Weibull (TIITW) distribution:
If G(x) is the cdf of the baseline Weibull distribution, the cdf F(x) of TIITW is given by:
The TIITW distribution proved to be flexible, as the plots of its pdf and hrf take a variety of shapes.
Comparative approach to trigonometric classes of distributions
This section reviews an article that compares trigonometric classes of distributions, namely Sin-G (Equation 2), Cos-G (Equation 4), Tan-G (Equation 6) and TIIT-G (Equation 7). Chesneau and Artault58 conducted a comparison study to find which class was most suitable, using the Weibull distribution as a common baseline on a series of twelve real-life data sets and statistical tools such as AD, CVM, AIC, CAIC, BIC, HQIC and KS with its p value. For nine data sets the Cos-W and TIITW distributions gave a suitable fit, and for the remaining three the Sin-W and Tan-W distributions did. The conclusion drawn was that when the data are symmetric there is no predominant model, when the data are positively skewed the Cos-W or TIITW model is suitable, and when the data are negatively skewed the Sin-W or Tan-W model is suitable.
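A comparison of this kind can be sketched by applying each class to a common Weibull baseline and computing the KS statistic for every candidate. The sketch assumes the class forms sin((π/2)G), 1 - cos((π/2)G), tan((π/4)G) and 1 - tan((π/4)(1 - G)); the data values are made up for illustration, the Weibull parameters are fixed rather than estimated, and none of it reproduces the published study.

```python
import math

def weibull_cdf(x, shape, scale):
    """Weibull baseline cdf in the shape-scale parametrisation."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

# Assumed class forms, each mapping a baseline cdf value g in [0, 1] to [0, 1].
TRANSFORMS = {
    "Sin-W": lambda g: math.sin(0.5 * math.pi * g),
    "Cos-W": lambda g: 1.0 - math.cos(0.5 * math.pi * g),
    "Tan-W": lambda g: math.tan(0.25 * math.pi * g),
    "TIITW": lambda g: 1.0 - math.tan(0.25 * math.pi * (1.0 - g)),
}

def class_cdf(name, x, shape, scale):
    """cdf of the named class with a Weibull baseline."""
    return TRANSFORMS[name](weibull_cdf(x, shape, scale))

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf and a model cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

if __name__ == "__main__":
    data = [0.3, 0.7, 0.9, 1.1, 1.4, 1.8, 2.5]   # made-up illustrative values
    for name in TRANSFORMS:
        d = ks_statistic(data, lambda x: class_cdf(name, x, 1.5, 1.5))
        print(f"{name}: KS = {d:.4f}")
```

In a real comparison the Weibull parameters would be estimated separately for each class, for example by maximum likelihood, before the criteria are computed.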