Millions of people worldwide suffer from the effects of volcanic gases. Volcanic gases1 such as SO2, H2S, and CO2 are health hazards that can cause fatalities from asphyxiation.2 Chronic exposure to H2S increases respiratory diseases.3,4 Natural hazards such as the 14 April 2010 eruption of the Eyjafjallajokull volcano, Iceland, cause public health hazards of global reach, because the wind carries the ash to various distances from the epicenter in directions measurable as angles.5,6 The metallic and heavy substances in the ash trigger illness.4 Learning from such natural calamity data may not prevent a calamity, but it surely helps to reduce the damage to health.7 Developing an appropriate model for the data is a starting point. Modeling such trivariate volcanic data has been a challenge to those who wish to analyze and interpret the data evidence.8 One reason is that the variables seem independent, yet the data show that they are correlated (Table 1, Figures 1 and 2). This is a conflict. Such a conflict is not unique to volcanic data analysis; it also arises in tsunami, cyclone, earthquake, and cancer data analysis. In an analogous manner, breast cancer research encounters a similar scenario: malignant cells spread over an area, to some distance and in some direction, with a varying carcinogenic intensity. An appropriate model for the data collected in a specific scenario is necessary to interpret the data evidence. What is a model? A model is an abstraction of reality. To echo reality, the model ought to have appropriate ingredients. The aim of this article is to show how a data analyst can create a model that integrates seemingly independent but actually correlated random variables with meaningful versatility and interpretability. To attain this aim, this article introduces a new non-negative bonding function
with a flexing parameter
. When the flexing parameter
, the model reduces to the trivial special case in which the data variables are mutually independent. In modeling volcanic debris data, the affected distance (d) and the wind direction angle ( ) are seemingly independent random variables. Assume that their probability density functions (PDFs) are
,
with shape and rate parameters
and
.
Day | Wind direction (degrees) | Distance (km) where ashes are found | Percent ash mass (more than 31, less than 63)
14 Apr 2010 | 90 | 1 | 21
14 Apr 2010 | 90 | 2 | 24
14 Apr 2010 | 90 | 10 | 13
14 Apr 2010 | 90 | 10 | 17
15 Apr 2010 | 90 | 58 | 44
15 Apr 2010 | 90 | 60 | 56
15 Apr 2010 | 90 | 58 | 70
15 Apr 2010 | 90 | 56 | 65
16 Apr 2010 | 90 | 21 | 26
16 Apr 2010 | 90 | 11 | 47
22 Apr 2010 | 135 | 4 | 7
5 May 2010 | 135 | 30 | 46
8 May 2010 | 135 | 13 | 12
10 May 2010 | 135 | 13 | 12
13 May 2010 | 135 | 10 | 38
13 May 2010 | 225 | 14 | 10
14 May 2010 | 135 | 8 | 42
Average | 113.8 | 22.3 | 32.3
Variance | 1295.4 | 462.2 | 407.7
Flexing parameter | |
Table 1 Volcanic eruption of Eyjafjallajokull, 14 April to 14 May 2010.5
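The Average and Variance rows of Table 1 can be recomputed from its seventeen records. The following is a minimal sketch using only Python's standard `statistics` module; the column names are ours.

```python
import statistics

# Table 1 columns in row order: wind direction (degrees), distance (km), percent ash mass.
ANGLE = [90] * 10 + [135] * 5 + [225, 135]
DISTANCE = [1, 2, 10, 10, 58, 60, 58, 56, 21, 11, 4, 30, 13, 13, 10, 14, 8]
PERCENT = [21, 24, 13, 17, 44, 56, 70, 65, 26, 47, 7, 46, 12, 12, 38, 10, 42]


def summarize(values):
    """Return (average, sample variance); the variance uses the n - 1 denominator."""
    return statistics.mean(values), statistics.variance(values)


for name, column in (("angle", ANGLE), ("distance", DISTANCE), ("percent", PERCENT)):
    average, variance = summarize(column)
    print(f"{name}: average = {average:.2f}, variance = {variance:.1f}")
```

The recomputed averages are 113.82, 22.29, and 32.35 (Table 1 reports the last one as 32.3). The variances agree with Table 1 only under the n - 1 denominator, which is what `statistics.variance` uses.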
Figure 1 Box plots of distance and percent ash for each wind direction.
Figure 2 The 3-dimensional inter-relations of
.
The third variable is the percent,
of ash mass, and it is assumed to follow, independently, a beta distribution,
with parameters
.
The variance of the distance is
, where the expected distance is
. The parameter
is the proportionality factor that links the variance to the expected distance. Furthermore, the entropy “
” of the distance is minimally “
” but it increases at a rate
, where
is the well-known digamma function. The parameter
governs this increase.
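The gamma facts above can be checked numerically. This sketch assumes the distance PDF is gamma with shape `shape` and rate `rate` (our names for the stripped symbols) and uses hypothetical parameter values. Python's `math` module has no digamma function, so a small approximation stands in for it: the recurrence ψ(x) = ψ(x + 1) - 1/x up to x ≥ 10, then the asymptotic series.

```python
import math


def digamma(x):
    """Digamma psi(x), x > 0: recurrence up to x >= 10, then the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def gamma_summary(shape, rate):
    """Mean, variance and differential entropy of a gamma(shape, rate) distance."""
    mean = shape / rate
    # Variance = shape / rate**2 = mean * (1 / rate): proportional to the mean.
    variance = mean / rate
    entropy = shape - math.log(rate) + math.lgamma(shape) + (1.0 - shape) * digamma(shape)
    return mean, variance, entropy


# Hypothetical parameter values, for illustration only.
print(gamma_summary(2.0, 0.1))
```

With `shape = 1` the gamma reduces to the exponential distribution, whose entropy is 1 - ln(rate), a convenient check on the entropy formula.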
The variance of the angle is
, where the expected angle of the wind direction is
. The entropy “
” of the angle is “
”.
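A sketch of the circular uniform facts follows. It assumes that the angle is measured in radians on [0, 2π) and that its variance is computed as for a linear variable (our assumption; the stripped expressions may follow another convention). A Monte Carlo draw cross-checks the closed forms.

```python
import math
import random

TWO_PI = 2.0 * math.pi  # assumption: angle in radians on [0, 2*pi)


def uniform_angle_summary():
    """Mean, variance and differential entropy of a uniform angle on [0, 2*pi)."""
    mean = math.pi                 # midpoint of the interval
    variance = TWO_PI ** 2 / 12.0  # (interval length)^2 / 12, which equals pi^2 / 3
    entropy = math.log(TWO_PI)     # log of the interval length
    return mean, variance, entropy


# Monte Carlo cross-check of the closed forms.
rng = random.Random(2010)
draws = [rng.uniform(0.0, TWO_PI) for _ in range(100_000)]
mc_mean = sum(draws) / len(draws)
mc_var = sum((t - mc_mean) ** 2 for t in draws) / (len(draws) - 1)
print(uniform_angle_summary(), (mc_mean, mc_var))
```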
The variance of the percent ash is
,
where the expected ash is
.
The entropy “
” of the percent ash spread is
.
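The beta moments and entropy can be sketched the same way. This sketch assumes the percent ash is rescaled to a proportion in (0, 1) and calls the beta parameters `a` and `b` (hypothetical names). The entropy uses the standard closed form ln B(a, b) - (a - 1)ψ(a) - (b - 1)ψ(b) + (a + b - 2)ψ(a + b), with a standard-library approximation of the digamma function ψ.

```python
import math


def digamma(x):
    """Digamma psi(x), x > 0: recurrence up to x >= 10, then the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def beta_summary(a, b):
    """Mean, variance and differential entropy of a beta(a, b) ash proportion."""
    total = a + b
    mean = a / total
    variance = a * b / (total ** 2 * (total + 1.0))
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(total)
    entropy = (log_beta - (a - 1.0) * digamma(a) - (b - 1.0) * digamma(b)
               + (total - 2.0) * digamma(total))
    return mean, variance, entropy


# Hypothetical parameters; percent ash treated as a proportion in (0, 1).
print(beta_summary(2.0, 3.0))
```

With a = b = 1 the beta is the uniform distribution on (0, 1), so the entropy must be zero and the variance 1/12.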
In other words,
, where
,
and
denote, respectively, the marginal PDFs of the data variables y,
and
. See Shanmugam and Chattamvelli9 for derivations and statistical details of the beta, gamma, and uniform distributions.
Contrary to the impression that the three random variables y,
and
are independent, their data (Table 2) are correlated, which contradicts the assumption of independence. This data-based clue calls for a realistic trivariate PDF for the collected data. The result is a new model with a bonding function
in which
serves as a flexing parameter that adds versatility.
This trivariate PDF (1) is new to the literature; hence, it is named the flexing and bonding trivariate distribution (FBTD). The statistical properties of the FBTD are derived in Section 2 and illustrated in Section 3. Final comments are made in Section 4.10
Variable | Wind direction (degrees) | Distance (km) where ashes are found | Percent ash (mass more than 31)
Wind direction (degrees) | 1 | -0.35 (p value = 0.12) | -0.44 (p value = 0.05)
Distance (km) where ashes are found | -0.35 | 1 | 0.77 (p value = 0.0001)
Percent ash (mass more than 31) | -0.44 | 0.77 | 1
Table 2 Correlation among the three random variables
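Pearson correlations can be recomputed from the Table 1 records with a short standard-library sketch. This is our cross-check, not the procedure behind Table 2.

```python
import math

# Table 1 columns in row order.
ANGLE = [90] * 10 + [135] * 5 + [225, 135]
DISTANCE = [1, 2, 10, 10, 58, 60, 58, 56, 21, 11, 4, 30, 13, 13, 10, 14, 8]
PERCENT = [21, 24, 13, 17, 44, 56, 70, 65, 26, 47, 7, 46, 12, 12, 38, 10, 42]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


pairs = {
    "angle-distance": pearson(ANGLE, DISTANCE),
    "angle-percent": pearson(ANGLE, PERCENT),
    "distance-percent": pearson(DISTANCE, PERCENT),
}
for name, r in pairs.items():
    print(f"{name}: {r:.2f}")
```

The distance-percent value comes out at 0.77, as in Table 2. The two correlations involving the wind angle come out at about -0.29 and -0.40 rather than -0.35 and -0.44, so Table 2 may rest on a different correlation measure or version of the data; the sketch does not settle which.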
Realistically, the data collection process for natural calamities such as volcanic eruptions is sometimes tilted unevenly. Such tilted sampling is known in the statistics literature as length-biased sampling with a weight factor
. What, then, is an appropriate weight factor in our scenario? One rationale for selecting it is as follows. The area in which the volcanic debris is found is proportional to the circumference
of a circle with radius
. This proportionality is also tied to the angle,
of the wind direction; hence, the weight factor is
.
In addition to this proportionality, the weight factor needs the flexibility to condense or expand it, which is achieved by introducing a finite, non-negative flexing parameter
so that the weight function becomes
to accompany the PDF
. To account for the third variable Y, the sampling-bias weight function is expanded to
. In other words, the trivariate PDF of the percent ash, radial distance, and wind angle in the collected data is
(1)
It is straightforward to check that
in (1) is a bona fide PDF, since
and
.
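Length-biased sampling reweights a density f by a weight w and renormalizes: f_w(x) = w(x) f(x) / E[w(X)]. As a one-variable illustration only (the trivariate bonding function of (1) is not reproduced here), this sketch applies a hypothetical weight d^δ to a gamma distance PDF with shape α and rate β. It checks numerically that the weighted PDF integrates to one and that it equals a gamma PDF with shape α + δ; δ = 0 recovers the unweighted PDF.

```python
import math


def gamma_pdf(d, shape, rate):
    """Gamma PDF with shape and rate parameters."""
    return rate ** shape * d ** (shape - 1.0) * math.exp(-rate * d) / math.gamma(shape)


def weighted_gamma_pdf(d, shape, rate, delta):
    """Length-biased gamma PDF with hypothetical weight d**delta, normalized by E[D**delta]."""
    expected_weight = math.gamma(shape + delta) / (math.gamma(shape) * rate ** delta)
    return d ** delta * gamma_pdf(d, shape, rate) / expected_weight


def trapezoid(f, lower, upper, n=20_000):
    """Composite trapezoidal rule on [lower, upper]."""
    h = (upper - lower) / n
    inner = sum(f(lower + i * h) for i in range(1, n))
    return h * (0.5 * (f(lower) + f(upper)) + inner)


SHAPE, RATE, DELTA = 2.0, 0.5, 1.5  # hypothetical values
total = trapezoid(lambda d: weighted_gamma_pdf(d, SHAPE, RATE, DELTA), 0.0, 150.0)
print(round(total, 6))  # close to 1 up to quadrature and truncation error
print(weighted_gamma_pdf(5.0, SHAPE, RATE, DELTA), gamma_pdf(5.0, SHAPE + DELTA, RATE))
```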
With no flexibility (that is,
), the FBTD (1) reduces to the product of the three bona fide marginal PDFs (gamma, circular uniform, and beta), implying that the three data variables (percent volcanic ash Y,
affected distance and wind direction angle
) are mutually and stochastically independent (as,
). Otherwise (that is, when
), the data variables are mutually and stochastically dependent (that is,
). The flexing parameter
helps to construct a contour map of places similarly affected by volcanic debris. The product moment of the FBTD (1) is
(2)
Note that, with
, expression (2) equals one, as it should. With
, the trivariate product moment,
is obtained as
. (3)
In the absence of flexibility, or equivalently under independence of the three random variables (that is,
), the product moment (3) factors into the product
of their marginal moments. The expected amount,
in (3) is at its base
when
and increases at a rate
,
when
. In this article, we define the trivariate product variance
as
. Using (2) and (3), we obtain
(4.a)
The variance
in (4.a) is at its base
(4.b)
when
and it changes when
. Prediction becomes less precise as the variance increases, and vice versa.
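Under independence (flexing parameter equal to zero), the product moment factors into marginal moments, which a Monte Carlo sketch can confirm. The sketch assumes that the trivariate product variance is the ordinary variance of the product YDΘ, uses hypothetical parameter values, treats Y as a proportion, and takes Θ uniform on [0, 2π). Python's `random.gammavariate` takes a scale parameter, so the rate is inverted.

```python
import math
import random

# Hypothetical parameter values with the flexing parameter equal to zero (independence).
A_Y, B_Y = 2.0, 3.0         # beta parameters of the ash proportion Y
SHAPE_D, RATE_D = 2.0, 0.5  # gamma shape and rate of the distance D
TWO_PI = 2.0 * math.pi      # Theta is uniform on [0, 2*pi)

# Closed-form first and second marginal moments.
ey = A_Y / (A_Y + B_Y)
ey2 = A_Y * (A_Y + 1.0) / ((A_Y + B_Y) * (A_Y + B_Y + 1.0))
ed = SHAPE_D / RATE_D
ed2 = SHAPE_D * (SHAPE_D + 1.0) / RATE_D ** 2
et = math.pi
et2 = TWO_PI ** 2 / 3.0

# Under independence the moments of the product factor into marginal moments.
product_mean = ey * ed * et
product_var = ey2 * ed2 * et2 - product_mean ** 2

# Monte Carlo check; random.gammavariate takes (shape, scale), so scale = 1 / rate.
rng = random.Random(42)
products = [
    rng.betavariate(A_Y, B_Y)
    * rng.gammavariate(SHAPE_D, 1.0 / RATE_D)
    * rng.uniform(0.0, TWO_PI)
    for _ in range(200_000)
]
mc_mean = sum(products) / len(products)
mc_var = sum((p - mc_mean) ** 2 for p in products) / (len(products) - 1)
print(f"mean: exact {product_mean:.3f}, simulated {mc_mean:.3f}")
print(f"variance: exact {product_var:.3f}, simulated {mc_var:.3f}")
```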
Of interest to healthcare researchers is the ability to predict one of the three data variables:
based on patterns in the other two. This requires their conditional PDFs. Suppose that a healthcare researcher at a known distance from the epicenter of a volcano, with an observable wind direction
, wonders about the average amount of ash to be received. For this purpose, the conditional PDF,
is needed. That is,
The expected ash amount,
starts at a base value
and increases at a
depending on the wind direction and distance. The rate is greater than one, meaning that
. What does this imply? The conditional average predicted percent,
of ash based on the known wind direction angle,
and the distance,
exceeds the unconditional average predicted percent,
of ash obtained without knowing the wind direction and the distance. Likewise, we notice that
. The implication is that the conditional average predicted percent of ash based on the known wind direction angle,
and the distance,
is more precise (a smaller variance means greater precision) than the unconditional average predicted percent of ash obtained without knowing the wind direction and the distance.
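The conditional PDFs in this section come from slicing the joint PDF at the observed values and renormalizing. Below is a generic numerical sketch of that step, with a hypothetical independent (flexing parameter zero) joint PDF standing in for (1), which is not reproduced here. In that case the conditional mean and variance of Y must equal the marginal beta values, which makes a useful check.

```python
import math


def conditional_moments(slice_density, lower, upper, n=4000):
    """Mean and variance of the conditional PDF obtained by normalizing a joint-PDF slice.

    slice_density(y) returns f(y, d, theta) at the fixed, observed (d, theta); dividing
    by its integral over y gives f(y | d, theta). Integrals use the trapezoidal rule.
    """
    h = (upper - lower) / n
    grid = [lower + i * h for i in range(n + 1)]
    weights = [h * (0.5 if i in (0, n) else 1.0) for i in range(n + 1)]
    values = [slice_density(y) for y in grid]
    norm = sum(w * v for w, v in zip(weights, values))
    mean = sum(w * v * y for w, v, y in zip(weights, values, grid)) / norm
    second = sum(w * v * y * y for w, v, y in zip(weights, values, grid)) / norm
    return mean, second - mean ** 2


def independent_joint(y, d, theta, a=2.0, b=3.0, shape=2.0, rate=0.5):
    """Hypothetical independent joint PDF: beta(a, b) x gamma(shape, rate) x uniform[0, 2*pi)."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    beta_pdf = y ** (a - 1.0) * (1.0 - y) ** (b - 1.0) / math.exp(log_beta)
    gamma_pdf = rate ** shape * d ** (shape - 1.0) * math.exp(-rate * d) / math.gamma(shape)
    uniform_pdf = 1.0 / (2.0 * math.pi) if 0.0 <= theta < 2.0 * math.pi else 0.0
    return beta_pdf * gamma_pdf * uniform_pdf


# Observed (hypothetical) distance 10 km and wind angle pi/2 radians.
mean_y, var_y = conditional_moments(lambda y: independent_joint(y, 10.0, math.pi / 2), 0.0, 1.0)
print(mean_y, var_y)  # under independence these match the marginal beta(2, 3) mean and variance
```

The same slice-and-normalize step applies to the conditional PDFs of the distance (integrating over a truncated range of d) and of the angle (integrating over [0, 2π)) that follow; only the FBTD joint PDF (1) itself would show the conditional shifts described above.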
Agencies responsible for protecting public health often want to project the expected distance,
based on knowing the angle,
of the wind direction and the percent
of spreading ash. This requires the conditional PDF,
of the distance
from the epicenter of a volcano, given an observable wind direction
and a measurable percent of volcanic ash,
which is
The expected distance,
starts at a baseline
and increases at a
which is greater than one. This means that
. What does this imply? The conditional average predicted distance,
for the ash, based on the known wind direction angle,
and the perceived percent,
of ash, exceeds the unconditional average predicted distance,
obtained without knowing the wind direction and the percent of spreading ash. Likewise, we notice that
implying
. The conditional average projected distance to receive ash based on the known wind direction angle,
and the percent of spreading ash,
is less precise (a larger variance means less precision) than the unconditional average projected distance to receive ash without knowing the wind direction angle,
and the percent of spreading ash,
.
Proceeding likewise, having observed a percent, y, of volcanic ash at a known distance, d, from the epicenter of the volcano, an environmental researcher could make an educated guess of the wind direction angle on the eruption day. For this purpose, the conditional PDF,
of the angle is needed; it is
The educated guess,
of the wind direction angle starts at a baseline
with an
which is greater than one, meaning that
. What does this imply? The educated conditional average guess,
of the angle, based on the known percent of ash,
at distance,
exceeds the unconditional average guess of the angle,
made without knowing the percent of ash at distance,
. Furthermore, we notice that
implying
. The educated average guess of the wind direction angle based on the known percent,
of ash at distance,
is more precise than the uneducated average guess of the wind direction made without knowing the percent of ash at distance d.
We now estimate the model parameters from the collected data. Consider a random sample
of size
from the FBTD (1). Let
and
denote, respectively, their sample average and variance. The log-likelihood function is
. The maximum likelihood estimators (MLEs) are then the simultaneous solutions of the score equations
,
,
,
and
. They yield
(5.a)
, (5.b)
, (5.c)
, (5.d)
, (5.e)
and
, (5.f)
where the initial values
,
,
,
,
are obtained from the sample averages and variances. In the next section, all expressions derived in this section are illustrated.
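The text states that the initial values come from the sample averages and variances. This sketch assumes they are the usual method-of-moments estimators, applied to Table 1 with the percent ash rescaled to proportions. Because the FBTD score equations (5.a) to (5.f) are not reproduced here, the sketch solves only the two gamma score equations of the independence (flexing parameter zero) submodel for the distance; all function names are hypothetical.

```python
import math
import statistics

DISTANCE = [1, 2, 10, 10, 58, 60, 58, 56, 21, 11, 4, 30, 13, 13, 10, 14, 8]
PROPORTION = [p / 100.0 for p in
              [21, 24, 13, 17, 44, 56, 70, 65, 26, 47, 7, 46, 12, 12, 38, 10, 42]]


def digamma(x):
    """Digamma psi(x), x > 0: recurrence up to x >= 10, then the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def gamma_moment_start(values):
    """Method-of-moments (shape, rate): shape = mean^2 / var, rate = mean / var."""
    m, v = statistics.mean(values), statistics.variance(values)
    return m * m / v, m / v


def beta_moment_start(values):
    """Method-of-moments (a, b) for proportions in (0, 1)."""
    m, v = statistics.mean(values), statistics.variance(values)
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def gamma_mle(values, rel_tol=1e-10):
    """Solve the two gamma score equations (independence submodel only).

    The rate score gives rate = shape / mean; substituting it into the shape score
    leaves log(shape) - digamma(shape) = log(mean) - mean(log x), solved by bisection.
    """
    mean = statistics.mean(values)
    target = math.log(mean) - statistics.mean([math.log(x) for x in values])
    lo, hi = 1e-6, 1e6
    while hi - lo > rel_tol * hi:
        mid = math.sqrt(lo * hi)  # bisection on the log scale
        if math.log(mid) - digamma(mid) > target:
            lo = mid  # the left side decreases in shape, so the root lies above mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    return shape, shape / mean


print("gamma start:", gamma_moment_start(DISTANCE))
print("gamma MLE:  ", gamma_mle(DISTANCE))
print("beta start: ", beta_moment_start(PROPORTION))
```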
From (5.a), we note that the product variables to be considered are
,….,
.
With the MLE
of the flexing parameter and expressions (3) and (4.b), an approximate
confidence interval for
can be constructed as
(6)
where
is the standard error of the product variables.
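Interval (6) has the familiar large-sample form centre ± z × standard error. This sketch works under stated assumptions: z = 1.96 (about 95%), the sample mean and variance of the Table 1 products standing in for the model-based expressions (3) and (4.b) at the MLE, percent as a proportion, and angle in radians. The helper `normal_interval` is hypothetical; model-based values can be passed to it instead.

```python
import math
import statistics

# Table 1 columns; unit conversions below are assumptions.
ANGLE_DEG = [90] * 10 + [135] * 5 + [225, 135]
DISTANCE = [1, 2, 10, 10, 58, 60, 58, 56, 21, 11, 4, 30, 13, 13, 10, 14, 8]
PERCENT = [21, 24, 13, 17, 44, 56, 70, 65, 26, 47, 7, 46, 12, 12, 38, 10, 42]


def normal_interval(centre, variance, n, z=1.96):
    """Approximate interval centre +/- z * sqrt(variance / n); z = 1.96 gives about 95%."""
    half_width = z * math.sqrt(variance / n)
    return centre - half_width, centre + half_width


# Product variables y_i * d_i * theta_i (proportion x km x radians).
products = [
    (p / 100.0) * d * math.radians(a)
    for p, d, a in zip(PERCENT, DISTANCE, ANGLE_DEG)
]
# Sample stand-ins for the model-based mean (3) and variance (4.b).
lower, upper = normal_interval(statistics.mean(products), statistics.variance(products), len(products))
print(lower, upper)
```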