Mini Review Volume 8 Issue 1
Department of Statistics, Wachemo University, Ethiopia
Correspondence: Getachew Tekle, MSc. in Biostatistics, Department of Statistics, Wachemo University, Ethiopia
Received: October 22, 2018 | Published: January 9, 2019
Citation: Tekle G. Application of GLM (logistic regression) on serological data of malaria infection. Biom Biostat Int J. 2019;8(1):1-4. DOI: 10.15406/bbij.2019.08.00261
As a nation reduces the burden of falciparum malaria, identifying areas of transmission becomes increasingly difficult. Over the past decade, the use of malaria serological assays to measure exposure has grown rapidly, and a variety of serological methods for data acquisition and analysis of human IgG against falciparum antigens are available.1
The main objective of this case study is to model the probability of malaria infection (the prevalence) as a function of age.
Variables
The predictor variable (age) is continuous, and the dependent variable, serological (disease) status, is binary (sero-positive or sero-negative).
Data: Serological data of malaria
Serology is the scientific study of plasma serum and other bodily fluids. In practice, the term usually refers to the diagnostic identification of antibodies in the serum. Serological tests may be performed for diagnostic purposes when an infection is suspected, in rheumatic illnesses, and in many other situations, such as checking an individual's blood type.2
Antibodies produced in response to an infectious disease like malaria remain in the body after the individual has recovered from the disease. A serological test detects the presence or absence of such antibodies. An individual with such antibodies is termed sero-positive.
The sample was taken at a single time point. The information recorded for each individual is age and serological status (sero-positive or sero-negative).
Binary data may occur in two forms:
Ungrouped, in which the variable can take one of two values, say success/failure.
Grouped, in which the variable is the number of successes in a given number of trials.
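When age is the only predictor, the two forms carry the same information. The following is a minimal Python sketch of collapsing ungrouped 0/1 outcomes into grouped counts per age. The records are made up for illustration (they are not the study data), and the function name `group_binary` is hypothetical.

```python
from collections import defaultdict

def group_binary(records):
    """Collapse ungrouped (age, status) pairs into grouped counts.

    records: iterable of (age, status), status 1 = sero-positive, 0 = sero-negative.
    Returns {age: (number of positives, number of trials)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for age, status in records:
        counts[age][0] += status  # successes (sero-positives)
        counts[age][1] += 1       # trials (individuals sampled)
    return {age: tuple(c) for age, c in sorted(counts.items())}

# Hypothetical ungrouped data: one (age, status) pair per individual
ungrouped = [(5, 0), (5, 1), (5, 0), (20, 1), (20, 1), (20, 0), (40, 1)]
grouped = group_binary(ungrouped)
```

Each grouped entry is then a binomial observation: the number of sero-positives out of the number sampled at that age.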
(1)
Generalized linear models (GLM)
Generalized linear models (GLM) are used to fit fixed effect models to certain types of data that are not normally distributed. "Generalized" means the models are not limited to normally distributed data; "linear" means they use a linear combination of variables to "predict" the response. The binomial distribution is a member of the exponential family (Dobson).3
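For reference, the standard way to see that the binomial distribution belongs to the exponential family is to rewrite its probability mass function as below. The symbols y (number of successes), n (number of trials) and P (success probability) are notation chosen for this sketch.

```latex
\begin{aligned}
f(y; P) &= \binom{n}{y} P^{y} (1 - P)^{n - y} \\
        &= \exp\left\{ y \log\frac{P}{1 - P} + n \log(1 - P) + \log\binom{n}{y} \right\}
\end{aligned}
```

The natural parameter is log(P/(1 - P)), which is why the logit is the canonical link for binomial data.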
The link function
(2.2)
Components of GLM
Random component
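If each individual response $Y_{ij}$ in age group $i$ follows a Bernoulli distribution with success probability $P_i$, then the group total $Y_i = \sum_{j=1}^{n_i} Y_{ij}$ is a sum of independent Bernoulli variables and is therefore binomially distributed:
$Y_i \sim \mathrm{Binomial}(n_i, P_i)$ (2.3)
where $Y_i$ is the number of sero-positive individuals in age group $i$ and $n_i$ is the sample size of that age group.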
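$P_i$ is the probability of being infected (the prevalence) in age group $i$. We use logistic regression to model the prevalence as a function of age:
$\mathrm{logit}(P_i) = \log\left(\dfrac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1\,\mathrm{age}_i$ (2.4)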
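A logistic model for prevalence as a function of age, logit(P_i) = b0 + b1 * age_i, is usually fitted by iteratively reweighted least squares (Fisher scoring). The following is a minimal sketch of that procedure for grouped data, not the article's own computation. The data are hypothetical, and the function name `fit_grouped_logistic` is an illustrative choice.

```python
import math

def fit_grouped_logistic(ages, positives, totals, n_iter=25, tol=1e-10):
    """Fit logit(P_i) = b0 + b1 * age_i to grouped binomial data
    by Fisher scoring (equivalent to IRLS for the canonical link)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        s0 = s1 = 0.0          # score vector
        i00 = i01 = i11 = 0.0  # Fisher information matrix entries
        for x, y, n in zip(ages, positives, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)  # binomial variance = IRLS weight
            r = y - n * p          # observed minus expected positives
            s0 += r
            s1 += r * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out)
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical grouped data (NOT the study data):
# mid-age of each group, number sero-positive, sample size
ages = [5, 15, 25, 35, 45]
positives = [4, 7, 10, 13, 16]
totals = [20, 20, 20, 20, 20]

b0, b1 = fit_grouped_logistic(ages, positives, totals)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in ages]
```

In practice a statistical package would be used for this, but the sketch shows what the fitting routine does: it repeatedly solves a weighted least-squares problem until the estimates stop changing.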
Binomial link functions
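Mean of the response with the logit link: $P_i = \dfrac{e^{\eta_i}}{1 + e^{\eta_i}}$, where $\eta_i$ is the linear predictor for age group $i$.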
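The four link functions compared in this article (logit, complementary log-log, log and identity) can be written out explicitly together with their inverses. The following is a minimal sketch; the names `LINKS` and `mean_response` are illustrative and do not come from the article.

```python
import math

# Each entry maps a link name to (link g, inverse link g^{-1}).
LINKS = {
    "logit": (lambda p: math.log(p / (1 - p)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "cloglog": (lambda p: math.log(-math.log(1 - p)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
    "log": (lambda p: math.log(p),
            lambda eta: math.exp(eta)),
    "identity": (lambda p: p,
                 lambda eta: eta),
}

def mean_response(link, eta):
    """Return the probability implied by linear predictor eta under a link."""
    return LINKS[link][1](eta)

# Round trip: applying g and then g^{-1} recovers the probability
round_trips = {name: inv(g(0.3)) for name, (g, inv) in LINKS.items()}
```

Only the logit and complementary log-log inverses are guaranteed to return values between 0 and 1. With the log and identity links, fitted probabilities can fall outside that range for extreme values of the predictor.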
Analysis of designed matrices
Define a design matrix $X$ so that, for the response variable $Y$, $g(E[Y]) = X\beta$, where $g$ is the link function, $\beta$ is a vector of parameters and $X$ is the design matrix of predictors.
The most commonly used model selection criteria, the Akaike Information Criterion (AIC) (Sakamoto, 1986) and the log-likelihood, were used. The AIC is defined as
$\mathrm{AIC} = -2 \log L + 2p$
where $-2 \log L$ is twice the negative log-likelihood value for the model and $p$ is the number of estimated parameters. The model with the smallest AIC is the best.
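As a quick check of the definition, the AIC values in Table 1 can be recomputed from the reported log-likelihoods. The sketch below makes two assumptions based on the surrounding text and the reported AIC values: the second row of the table belongs to the log link, and the c-log-log log-likelihood is -30.59063.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 log L + 2p."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Log-likelihoods as reported in Table 1 (two parameters per model).
# The row labels "log" and the sign of the c-log-log value are assumptions.
log_liks = {
    "logit": -31.1941,
    "log": -29.9179,
    "identity": -33.3445,
    "cloglog": -30.59063,
}
aics = {name: aic(ll, 2) for name, ll in log_liks.items()}
best = min(aics, key=aics.get)  # model with the smallest AIC
```

Because every model here has the same number of parameters, ranking by AIC is equivalent to ranking by log-likelihood.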
Exploratory analysis of data
The plot (Figure 1) indicates that the prevalence of malaria infection increases with age: as age increases, the probability of infection increases. The relationship between the probability of malaria infection and age is almost linear. The line shows the linearly fitted proportion of infection, as given below:
(3.1)
Model Diagnosis
As the plot shows (Figure 2), there is a pattern in the residuals, and their spread is not constant across the fitted values; the variation in the predicted probability of infection is not the same throughout. This indicates that the constant-variance assumption of the model is not satisfied.
The normal probability plot shows that the normality assumption is satisfied (Figure 3).
Models with different link functions
Model with logit link
Model with complementary log-log (c-log-log) link
Model with log link
Model with identity link
Models Comparison
Selection of terms for deletion or inclusion is based on Akaike's information criterion (AIC). In R, the function extractAIC(model) gives the AIC (Table 1). According to the AIC and the log-likelihood, the model with the log link function is chosen as the best model: although its coefficient estimate is only the second smallest (after the identity link), its AIC is the smallest and its log-likelihood the largest of the four models. Hence, the chosen model with the log link function is given as follows:
Model | Estimate (age coefficient) | Log-likelihood | No. parameters | AIC
Logit | 0.044672 | -31.1941 | 2 | 66.388
Log | 0.034705 | -29.9179 | 2 | 63.836
Identity | 0.006354 | -33.3445 | 2 | 70.689
C-log-log | 0.039671 | -30.59063 | 2 | 65.181
Table 1 Model comparison
In the fitted log-link model the coefficient of age is 0.0347, which indicates that for a one-unit increase in age the probability of developing the antibodies is multiplied by exp(0.0347) ≈ 1.035, an increase of about 3.5%.
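The conversion from a coefficient to a ratio can be checked numerically. The sketch below exponentiates two age coefficients from Table 1, assuming that 0.034705 is the log-link estimate and 0.044672 the logit-link estimate. Under a log link the exponentiated coefficient is a prevalence ratio; under a logit link it is an odds ratio. The variable and function names are illustrative.

```python
import math

coef_log_link = 0.034705    # age coefficient, log-link model (assumed row of Table 1)
coef_logit_link = 0.044672  # age coefficient, logit-link model (Table 1)

# Multiplicative change per one-unit increase in age
prevalence_ratio = math.exp(coef_log_link)  # change in the probability itself
odds_ratio = math.exp(coef_logit_link)      # change in the odds

def percent_change(ratio):
    """Convert a ratio into a percentage change."""
    return (ratio - 1.0) * 100.0

pct_prevalence = percent_change(prevalence_ratio)
pct_odds = percent_change(odds_ratio)
```

The two ratios are close because the coefficients are small, but they answer different questions: one describes the change in prevalence, the other the change in odds.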
The odds ratio: point estimator
How is the odds ratio calculated? For a continuous predictor, the odds ratio for a one-unit increase is given by exp(B), where B is the regression coefficient. The meaning of a logistic regression coefficient is not as straightforward as that of a linear regression coefficient. While B is convenient for testing the usefulness of predictors, exp(B) is easier to interpret: it represents the ratio change in the odds of the event of interest for a one-unit change in the predictor. Here exp(0.0347) = 1.0353, so each one-unit increase in age multiplies the estimated risk of sero-positivity by about 1.035, an increase of about 3.5%.5 Because 0.0347 comes from the log-link model, this ratio is strictly a prevalence ratio rather than an odds ratio; the logit-link coefficient in Table 1 gives an odds ratio of exp(0.0447) ≈ 1.046.
The serological data were explored and analyzed as shown above. The model summaries indicate that in every fitted model the p-value is very small, so age is a significant predictor of the prevalence of malaria. Comparison of the four models shows that the model with the log link function is the best according to the AIC. Its estimated coefficient is 0.0347, which indicates that for a one-unit increase in mid-age the log of the prevalence of malaria infection increases by 0.0347, that is, the prevalence increases by about 3.5%.
Acknowledgments: None.
Author declares that there is no conflict of interest.
©2019 Tekle. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.