Research Article Volume 3 Issue 2
Università degli Studi di Roma La Sapienza, Rome, Italy
Correspondence: Guido Travaglini, Università degli Studi di Roma La Sapienza, Rome, Via Attilio Friggeri, 94/8, Italy, Tel +393809061265
Received: December 20, 2017 | Published: March 12, 2019
Citation: Travaglini G. Five snippets of signal analysis. Phys Astron Int J. 2019;3(2):60-64. DOI: 10.15406/paij.2019.03.00158
In this article, we introduce five statistical procedures in the field of Signal Analysis. They are practitioner's exercises, mere snippets in fact, intended to supplement renowned statistical tests and selected procedures from the received literature. As such, the procedures range from simple explanatory devices to novel algorithmic routines. The first procedure introduces and develops regime-switch testing applied to linear and nonlinear signals, whereas the second provides a novel approach to efficient and automatic time-series smoothing. The third and fourth procedures consider Bayesian analysis for the purpose of back/forecasting a vector dataset through a given time domain. The fifth procedure represents a novel method for computing the Augmented Dickey-Fuller stationarity test in a context of instability of both trend and subsample variances, which frequently characterizes nonlinear signals.
Keywords: Bayesian models, regime switching, stationarity, smoothing techniques. JEL classes: C1, C4
Signal Analysis currently features tests and procedures aimed at verifying certain hypotheses, such as linearity, stationarity, changes of regime, and the kind of information utilized in back/forecasting time series, possibly by calibration. In such a context, methodological updates are required, especially when it comes to dealing with series characterized by high irregularities or when unobservables are involved, as in the case of Kalman filtering.1,2 Many Regime-Switching techniques, for instance, are affected by strict requirements such as stationarity and single-point shift,3,4 which require amelioration or updating. Similarly, time-varying parameters utilized for back/forecasting purposes are better computed in a Bayesian context, in order to exploit prior pieces of evidence based on their known distribution. Finally, current stationarity tests5,6 in a nonlinear setting with dramatic changes in trend and subsample variances may require substantial corrections. Section 1 is devoted to defining random signals, whereas Section 2 tackles the issue of Regime-Switch testing. Section 3 introduces a novel technique in the field of signal smoothing. The following two sections illustrate the steps for implementing Bayesian Monte Carlo simulation and calibration, whereas the final section supplies corrected stationarity tests. Section 7 concludes.
Definition of signal
All of the aforementioned procedures share the assumption of a signal represented by a real-valued random
sequence which, for the discrete time notation t∈[1,T] , T<∞, may be expressed as follows
$\{X_t\}_{t=1}^{T} \sim \mathrm{I.I.D.}(\bar{X},\sigma^2_X)$  (1)
with unknown distribution, mean $\bar{X}=T^{-1}\sum_{t=1}^{T}X_t$ and finite variance $\sigma^2_X<\infty$. More specifically, Equation (1) may be decomposed, following Cramér,7 in the following fashion
$X_t=X^{*}_t+S_t+\varepsilon_t$  (2)
where $X^{*}_t=E(X_t|\Omega_{t-1})$, $\Omega_{t-1}=\{X_{t-j}\}_{j=1}^{J}$ is the information set available at $t-j$, $j\in[1,J]$, $J\le T$ is a maximum preselected lag, and $\varepsilon_t\sim \mathrm{I.I.D.}(0,\sigma^2_{\varepsilon})$. The first term $X^{*}_t$ of Equation (2) is the slow-mode component of the signal: a linear trend, a smoother, or a combination of the two. The second component $S_t$ is the periodical or quasi-periodical seasonal cycle, and the last component is a random Gaussian fast mode. Obviously, independent of $S_t$, if the linear and/or smoothed trends are not time-dependent, the first component zeroes out in mean and the process described by Equation (1) collapses to $\varepsilon_t$. All components of $X_t$ are unobservable and must be retrieved from the raw dataset by means of appropriate computational methods. In particular, the slow-mode term, if not solely represented by a linear trend, requires applying an efficient (i.e. minimum-variance) smoothing procedure. In addition, if $S_t$ is a high- or low-frequency phenomenon, it may be embedded in either $\varepsilon_t$ or $X^{*}_t$ and thus difficult to disentangle from either component. The signal so characterized constitutes a Markov Chain Monte Carlo process8 where each observation of the signal $X_t$ may be unconditional or conditional upon a known distribution, abridged from prior information, or upon a transition probability $\pi_t\in[0,1]$.9,10
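The decomposition in Equation (2) can be illustrated with a short synthetic example; the trend slope, seasonal period, and noise scale below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
t = np.arange(1, T + 1)

slow = 0.01 * t                              # slow mode X*_t: a linear trend
seasonal = 0.5 * np.sin(2 * np.pi * t / 50)  # seasonal cycle S_t, period 50
eps = rng.normal(0.0, 0.3, T)                # Gaussian fast mode eps_t

X = slow + seasonal + eps                    # Eq. (2): X_t = X*_t + S_t + eps_t
```

Dropping the trend term reproduces the degenerate case discussed above, in which the process collapses toward the fast mode.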
Sequential and cluster-based regime switching
Regime switching is intended as a single or multiple change of pattern of the signal in terms of mean, variance, and autocorrelation structure.9,10 It is often associated with rare events with permanent effects, such as earthquake aftermaths or financial crises.11 Changes associated with regime switching are generally abrupt and of sizable magnitude, and may be tested for statistical significance by means of some preselected critical-value standard.12 Regime switching is closely associated with structural-break testing.13 It is more pervasive, however, in that it attempts to establish not only the date and size of breaks, but also their moment effects on the signal of interest. Moreover, regime switching is deeply involved with threshold behavior, which analyzes the event of a variable taking an extreme value at some time of its evolution. In practice, the parameterization of the signal after each break is tested for in the context of a Markov switching model. Many tests exist for the detection of regime switches,2 whereby the different states of the signal may be identified within a given degree of likelihood. Among these tests, two stand out for their usefulness and simplicity:
The first kind of test, in the case of linearity (nonlinearity), is modeled on a local-to-global basis by comparing the variance (the tied-rank score sum) of subsamples to the variance (the mean tied-rank score sum) of the entire sample. Let $\{x_{m,\tau}\}_{\tau=1}^{\mathrm{T}}$ be the $m$-th subsample of Equation (1), for $m\in[1,M]$, $1\le M\ll T$, with $M$ the maximum number of subsamples, and $\tau\in[1,\mathrm{T}]$, $\mathrm{T}<T$. The subsample length is established arbitrarily, usually on a peak-to-peak basis. The F-test version for establishing the number of regime switches is given by the following statistic
$F_m=\operatorname{Var}(x_{m,\tau})\,/\operatorname{Var}(X_t),\quad \forall m\in M$  (3)
where Var(.) represents the variance of the braced argument. From Equation (3), the M-sized string of F statistics and associated p-values may be obtained. A regime switch, namely a subsample variance that is an outlier with respect to the sample variance, is said to exist significantly if its $m$-th p-value falls short of some preselected critical value, usually 5%.
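A minimal sketch of the F-test of Equation (3): each subsample variance is compared to the full-sample variance, with a two-sided F p-value attached. The subsample boundaries and the use of `scipy.stats.f` for the p-values are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np
from scipy import stats

def regime_switch_F(X, bounds):
    """Eq. (3): F_m = Var(subsample) / Var(full sample), with a
    two-sided F p-value; `bounds` is a hypothetical list of
    (start, end) index pairs, e.g. chosen on a peak-to-peak basis."""
    T = len(X)
    var_full = np.var(X, ddof=1)
    results = []
    for a, b in bounds:
        x = X[a:b]
        F = np.var(x, ddof=1) / var_full
        # two-sided p-value: small when the subsample variance is an outlier
        p = 2 * min(stats.f.sf(F, len(x) - 1, T - 1),
                    stats.f.cdf(F, len(x) - 1, T - 1))
        results.append((F, p))
    return results
```

A regime switch is then flagged for subsample $m$ whenever its p-value falls short of the 5% level.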
In the case of a nonlinear signal, the test statistic corresponding to $F_m$ is represented by the two-sample Ansari-Bradley test for equal sample tied-rank scores, also defined as quasi-variances.15 The Ansari-Bradley test requires several steps, departing from the padded signal vector $Z_\theta=\{x_{m,\tau};X_t\}$, $\theta=\tau+t\in\Theta$, $\Theta=\mathrm{T}+T$, $\forall m\in M$, and from the associated vector $R_\theta$ of the locations of the values of $Z_\theta$ ranked in increasing order. Next, the locations in vector $R_\theta$ are parted into two group vectors depending on their origin, namely their belonging to the subsample or to the entire sample. Formally, $R_1=R_\theta|\theta=\tau$ and $R_2=R_\theta|\theta=t$. Subsequently, the tied-rank vector $U_\theta$ is computed from each end of $Z_\theta$, so that the smallest and largest values of $U_\theta$ get rank 1, the next smallest and largest get rank 2, and so on. Finally, the Ansari-Bradley test statistic15 is computed as follows
$W_m=\sum_{\tau=1}^{\mathrm{T}}U_\tau\,|\,R_1$  (4)
which is the sum of the tied ranks associated with the $m$-th subsample. From Equation (4), the test statistic $W^{*}_m$ may be obtained by subtraction of the mean tied-rank score sum, together with the corresponding p-value. The statistic is asymptotically normal under the null hypothesis that the two stochastic processes come from the same distribution, against the alternative that they come from distributions sharing the same median but different variances. In the present context, it may also be loosely viewed as a test for heteroskedasticity.16 The QLR test, in its simplest form, checks the null hypothesis of one versus two regime switches, and is built up by forming two non-sequential clusters, each possibly characterized by a different distribution. Thereafter, the QLR is computed upon the two samples, with the proviso that these produce a Gaussian mixture for which standard Likelihood-Ratio testing cannot be implemented. For this purpose, the relevant literature provides appropriate critical values.12 Because of its stricture in admitting a maximum of two regime switches, and because of its computational complexity, the QLR test receives no applied treatment here.
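The Ansari-Bradley statistic described above is available off the shelf in SciPy. The snippet below is a minimal sketch: it tests a hypothetical dispersion-inflated subsample directly against the full sample, a simplification of the padded-vector construction of Equation (4); the heavy-tailed data and the scale factor are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.standard_t(df=3, size=400)  # heavy-tailed, nonlinear-style signal
x_sub = X[100:200] * 3.0            # hypothetical regime with inflated dispersion
                                    # (median 0 is preserved by the scaling)

# Ansari-Bradley test: null of equal dispersion, same median assumed
res = stats.ansari(x_sub, X)
switch = res.pvalue < 0.05          # flag a regime switch at the 5% level
```

As with the F-test, the subsample is declared a distinct regime when the p-value falls below the preselected critical value.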
Spectral representation theorem for signal decomposition and optimal smoothing
A large number of signal smoothers (i.e. smoothed trends) is nowadays available for applied work in many different scientific fields. The most notable are the Savitzky-Golay, Daubechies, and Hodrick-Prescott filters and the Hilbert-Huang Transform.17–23 All of these methods, however, share serious problems when it comes to data resolution. In fact, they all require prior input of specific parameters: the polynomial degree for the first two methods, together with the window width for the Savitzky-Golay filter and the decomposition level for the Daubechies wavelet; a preselected smoothing coefficient for the Hodrick-Prescott filter; and, finally, a ‘sifting’ grid search for the Hilbert-Huang Transform. For details of such inputs, the reader is redirected to the respective references listed in this paper. Suffice it here to warn that an arbitrary, and thus casual, choice of any of these parameters carries the risk of over/under-fitting the slow mode $X^{*}_t$ with respect to the original signal $X_t$ of Equation (1). The immediate consequence is inefficient estimation in terms of suboptimal denoising, namely, loosely stated, a bang-bang signal extraction yielding a smoother that is either too flat, and thus statistically inconsistent with the original peak/trough pattern, or one that too closely replicates the original signal itself. We propose here a fast, automatic, and efficient smoothing procedure that applies the Spectral Representation Theorem to a signal $X_t$ of whichever nature: stationary or not, linear or nonlinear. The signal may be approximated by means of the following periodic function
$\hat{X}_{t,k}=\mu+\sum_{j=1}^{k}\left[\phi_j\sin(\omega_j\cdot(t-1))+\varphi_j\cos(\omega_j\cdot(t-1))\right]$  (5)
where $\hat{X}_{t,k}$ is a fitted sequence of the original signal, $\mu$ is the estimated mean of the fitted signal, $k\in[1,K]$, $2\le K\ll T$ is a maximum lag integer, and $\{\phi_k\}_{k=1}^{K}$, $\{\varphi_k\}_{k=1}^{K}$ are real-valued random coefficient sequences, both I.I.D. with finite mean and variance. Finally, $\{\omega_k\}_{k=1}^{K}$ is a frequency sequence such that
$\omega_k=\begin{cases}2\pi k/T & \text{if } \Delta X_t=f(X_{t-1}),\ \lim_{k\to 0.5T}\omega_k=\pi\\[2pt] \pi k/T & \text{if } \Delta X_t=f(X^{3}_{t-1}),\ \lim_{k\to T}\omega_k=\pi\end{cases}$  (6)
where $\Delta$ is the first-difference operator, $f(.)$ is a functional form, and $\Delta X_t=f(.)$ expresses the linearity or nonlinearity of the process, as evidenced by the exponent attached to the lagged signal. The fitted signal of Equation (5), which is the smoother estimable by Ordinary Least Squares (OLS), may be defined as the Spectral Representation Transform of the original signal with time-varying amplitude (Hamilton, 1994). If the signal is affected by more than one regime switch and we let $e_{t,k}=X_t-\hat{X}_{t,k}$, where $e_{t,k}\sim \mathrm{I.I.D.}(0,\sigma^2_{e,k})$, the Central Limit Theorem (CLT) is empirically found not to hold asymptotically as $k\to T$. In fact, while for $k=1$ the smoother $\hat{X}_{t,k}$ converges towards the linear or nonlinear trend of $X_t$, the CLT holds only for a limited range $K\in[2,\alpha T)$, for $\alpha\in(0.02,0.04)$. In other words, for all lags $k\ge 2$, $e^2_{t,k}$ tapers off, but then widens up again as $K$ gets larger. Hence the CLT holds for $k<\alpha T$ but fails for $k=\beta T$, with $1>\beta>\alpha$, where the values of $\alpha$ are obtained by means of 1,000 Monte Carlo random normal and first-order autoregressive simulations of Equation (5). If the signal is not affected by regime switches, then the CLT applies and we have $\lim_{k\to T}(\hat{X}_{t,k}=X_t)$. The optimal smoother is therefore defined as
$\hat{X}_{t,K^{*}}=\hat{X}_{t,k}\,|\min(P_k);\quad \forall k\in[2,\,0.04T)$  (7)
for the optimal lag $K^{*}$ of Equation (5), with $P_k$ a performance index closely akin to the Percent Root Mean Square Error (PRMSE), defined as
$P_k=\|e_{t,k}\|\,/\,\|X_t\|$  (8)
where $\|.\|$ is the Euclidean norm of its argument.
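A minimal sketch of the optimal smoother of Equations (5)-(8) for the linear case, with frequencies $\omega_k=2\pi k/T$: each candidate $k$ is fitted by OLS on sine/cosine regressors, and the fit minimizing $P_k$ over $2\le k<0.04T$ is retained. The cumulative sine/cosine design is one reading of Equation (5), and the $0.04T$ cap (guarding against overfitting the fast mode) follows the range stated above.

```python
import numpy as np

def spectral_smoother(X, alpha=0.04):
    """Sketch of Eqs. (5)-(8), linear case omega_k = 2*pi*k/T: fit an
    intercept plus sine/cosine pairs up to frequency k by OLS, then
    keep the fit minimising P_k = ||e|| / ||X|| over 2 <= k < alpha*T."""
    T = len(X)
    t = np.arange(T)
    best = None
    for k in range(2, max(3, int(alpha * T))):
        omegas = 2 * np.pi * np.arange(1, k + 1) / T
        # design matrix of Eq. (5): intercept plus sine/cosine pairs
        Z = np.column_stack([np.ones(T)]
                            + [np.sin(w * t) for w in omegas]
                            + [np.cos(w * t) for w in omegas])
        coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
        fit = Z @ coef
        Pk = np.linalg.norm(X - fit) / np.linalg.norm(X)  # Eq. (8)
        if best is None or Pk < best[0]:
            best = (Pk, k, fit)
    return best  # (P_{K*}, K*, optimal smoother)
```

The routine is automatic in the sense of the text: no window width, polynomial degree, or smoothing coefficient is supplied by the user.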
Bayesian Monte Carlo simulation with bootstrapping for back/forecasting
Bayesian Monte Carlo simulation with bootstrapping is the first of the two signal back/forecasting procedures proposed herein; the other is shown in Section 6. Whereas the latter is a full-blown Kalman filter with time-varying parameters extracted from Gibbs sampling,2 the present procedure is based on bootstrapping known dates and values of the signal to produce a feasible MCMC representing the back/forecasted signal along selected $P<\infty$ periods. The procedure is thus a game of chance involving all of the elements of Equation (2), such that we may proceed to back/forecast a signal by means of two different techniques:
While the first technique works well only under the assumption of fully or fairly regular cycles in amplitude and length, and is likely to be affected by substantial uncertainties,26–28 the second preserves the stochastics of the original signal and is consequently more trustworthy from the distributional viewpoint. It follows that Bayesian prior values drawn from the known distribution of the original signal can be utilized along the entire signal sample so as to forestall predictive uncertainties.
Implementation of this preferred technique requires three consecutive steps:
The latter is the lower-date trough of the subsequent cycle, and so on. Each triplet encapsulates a specific cycle of given length. Bootstrap $N$ times the rows of matrix $F$ so as to obtain a matrix $G(P,D)$ of the reshuffled values of the signal $X_t$. For each row of this matrix, such values might be further simulated up to $D$ draws to produce Bayesian priors from the distribution of $X_t$ and generate a new matrix $G$ of posteriors. From this matrix, find the series that most closely approximates the original signal $X_t$ in terms of mean and/or variance. This is the optimal prediction signal $\hat{X}^{**}_p$, for $p\in[1,P]$, commonly defined as the “mode” since it is the closest to the actual data. This procedure is quick and easy to implement, as opposed to the lengthier procedure featured in the next section, which needs a larger, often cumbersome, informational set provided by the proxy instrument(s) utilized.
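The bootstrap steps above can be sketched as follows; the cycle boundaries, the number of replications, and the mean/variance distance used to pick the “mode” are illustrative assumptions, and the Bayesian prior-simulation stage (the matrix $G$ of posteriors) is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_mode(X, cycle_bounds, n_boot=1000, P=None):
    """`cycle_bounds` is a hypothetical list of (trough, trough) index
    pairs delimiting full cycles of X. Cycles are resampled with
    replacement and chained until P predicted periods are filled; the
    replication closest to the original signal in mean and variance is
    kept as the 'mode'."""
    cycles = [X[a:b] for a, b in cycle_bounds]
    P = P if P is not None else len(X)
    best, best_dist = None, np.inf
    for _ in range(n_boot):
        path = []
        while sum(len(c) for c in path) < P:
            path.append(cycles[rng.integers(len(cycles))])
        path = np.concatenate(path)[:P]
        # distance from the original signal in mean and variance
        d = abs(path.mean() - X.mean()) + abs(path.var() - X.var())
        if d < best_dist:
            best, best_dist = path, d
    return best
```

Because whole cycles are resampled, the reshuffled series preserves the stochastics of the original signal, as argued above.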
Bayesian calibration for back/forecasting
Calibration is a technique for back/forecasting the signal $X_t$ by means of a matrix of proxy instruments $Z_\tau:(\mathrm{T},n)$, where $\mathrm{T}>T$ and $n\ge 1$ are respectively the sample length and size, with $\tau\in[1,\mathrm{T}]$. It assumes a statistically significant relationship between the two datasets over the overlapping sample length $T$, such that back/forecasting $P$ periods ahead (behind) by means of $Z_p$ is expected to produce a mode $\hat{X}^{**}_p$. The relationship is customarily expressed in terms of a fixed-time parameter set $B_0:(n,1)$, usually estimated by means of OLS.29 Bayesian calibration in the present context is an extension of the above model: it contemplates time-varying parameters connecting the two datasets and lets the prediction process be estimated by a dynamic state-space (SS) model whose upshot is an MCMC process characterized by ergodicity.2 The model is represented by two equations whose workings in a Bayesian framework are better detailed elsewhere30
$B_{p+1}=B_p\,|\,\Omega_p+e_p$
$X_p=C_pB_p+v_p$  (9)
where, for $p$ given as above, $B_p$ and $X_p$ are respectively the state(s) and the observable(s), whereas $e_p$ and $v_p$ are orthogonal I.I.D. zero-mean disturbances. More precisely, $B_p:(P,n)$ is the time-varying parameter set whose first input is given by the Bayesian prior $B_0$, conditional upon a normal distribution. The variable $C_p$ is a slope parameter set connecting the contemporary state(s) and observable(s), and has the same size as $B_p$.
The real-time information matrix $\Omega_p=\{\Pi_p,Q_p,R_p\}$ includes the covariances $\Pi_p=B_pB_p'$, $Q_p=e_p'e_p$ and $R_p=v_p'v_p$, all of which are positive definite and of size $(n,n)$, except the last, which is a scalar. The first component of $\Omega_p$ is the Riccati matrix of the state covariance, distributed as an Inverse Wishart, whereas the other two components are respectively the state and observable error covariances, distributed as Inverse Gamma. Matrix $\Omega_p$ may be defined as the prior information set for predicting $B_{p+1}$ and its estimation errors $e_p$. Once this value is computed, conditional on the covariances and on their distributional properties, the ensuing posterior $\Omega_{p+1}$ is utilized for computing $B_{p+2}$ in the same fashion, and so on along the process. This prior-posterior-prior sequence of the information set is at the heart of the conditional adaptive Kalman Filter, whereby the conditional parameters of Equation (9) are optimally estimated at any stage, ending as soon as $P$ is hit.1 Needless to say, optimality refers in the present context to the valuable information set $\Omega_p$ delivered stage-wise onto the parameters. More explicitly, let the Kalman Gain be
$H_p=(C_p\Pi_pC_p'+R_p)^{-1}$  (10)
such that the Riccati matrix may be represented over time as the following nonlinear and recursive process
$\Pi_p=\Pi_{p-1}+Q_p-\Pi_p'C_p'H_pC_p\Pi_p$  (11)
whereby the sequential MCMC parameter series may be obtained as follows
$B_{p+1}=B_p+\Pi_pC_pH_p\nu_p'$  (12)
All of the elements in Equations (10) to (12) enter the real-time information matrix $\Omega_p$. Optionally, in order to gain further information by letting $B_p$ and $B_{p+1}$ be conditioned directly by the observable $X_p$, the covariance matrix $Q_p$ in Equation (11) may be replaced by the variance of the observable itself.
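The recursion of Equations (10)-(12) can be sketched for a scalar observable as follows. For simplicity the covariances $Q_p$ and $R_p$ are held fixed here, an assumption of this sketch, whereas the text updates them from their Inverse Wishart / Inverse Gamma priors at each stage.

```python
import numpy as np

def tvp_filter(X, C, B0, Pi0, Q, R):
    """Recursion of Eqs. (10)-(12) for a scalar observable X_p with
    instrument row C[p] of size (1, n). Q and R are held fixed here
    rather than drawn from their priors at each stage."""
    P, n = C.shape
    B, Pi = B0.astype(float), Pi0.astype(float)
    Bs = np.zeros((P, n))
    for p in range(P):
        c = C[p:p + 1, :]                    # (1, n)
        H = 1.0 / (c @ Pi @ c.T + R)         # Eq. (10), scalar case
        v = X[p] - (c @ B).item()            # observation error v_p
        B = B + (Pi @ c.T * H).ravel() * v   # parameter update, Eq. (12)
        Pi = Pi + Q - Pi @ c.T @ H @ c @ Pi  # Riccati update, Eq. (11)
        Bs[p] = B
    return Bs
```

Stacking the stage-wise estimates row by row yields the sequential MCMC parameter series $B_p$ discussed above.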
Proposed method for correction of stationarity test statistics
In this last section a novel method is proposed for correcting the conventional stationarity test statistic in a setting of non-flat linear trends and non-constant smoothers and cycles. Among such tests, the best known applicable to linear signals is the t-type Augmented Dickey-Fuller (ADF) test,5,6 which embeds an option for an intercept and/or a linear trend. Similarly, for nonlinear signals the Kapetanios-Shin-Snell (KSS) t-type test is commonly applied with similar options.31,32 The asymptotic critical values of the test statistics, with preselected p-values, are tabulated in the respective references. The null hypothesis implied by both test statistics involves first differencing of the signal, which in Econometrics is dealt with by means of Autoregressive Conditional Heteroskedasticity (ARCH) testing in the case of changing subsample variances.33,34 We endeavor in the present context to follow a richer avenue, given the plethora of ad hoc tests and statistics available for deeply scrutinizing the performance of the signal in terms of changing trends, means and/or variances, and cyclical shapes. Among these we test, in a two-sample setting, for major changes that might have occurred within the sample. Specifically, we test from Equation (2) the evolution of the slow-mode term $X^{*}_t$, which includes a linear trend $\tilde{X}_t$ estimable by OLS, and the smoother $\hat{X}_{t,K^{*}}$ of Equation (7). We also test for equal variances by means of the F- or Ansari-Bradley tests of Equations (3) and (4), respectively. We finally add a test for different subsample shapes in terms of cyclical amplitude (e.g. the Schwabe cycle).
Let the following: $\Psi_1=\tilde{X}_T/\tilde{X}_1$, $\Psi_2=\hat{X}_{T,K^{*}}/\hat{X}_{1,K^{*}}$, $\Psi_3=\mathrm{FFT}\{X_i\}_{i=1}^{N}/\mathrm{FFT}\{X_j\}_{j=N+1}^{T}$, where FFT stands for the Fast Fourier Transform of the associated signal and $N$ is the length of the first subsample. Let also $\Psi_4$ be the p-value of the $F_m$ or of the $W^{*}_m$ test statistic for $m=1$, as from Equation (3) or (4). Under the null of constant trends, variances, and cyclical amplitudes over the entire sample we have $E(\Psi_j)=1,\ \forall j\in[1,4]$, where $E(.)$ is the expectation operator of the braced argument. Under the alternative we have $\infty>\Psi_j>0,\ \forall j\in[1,4]$. Under either or all of the nulls we have, for $TS=$ ADF or KSS of the raw dataset, $E\left(TS-\left[\sum_{j=1}^{4}E(\Psi_j)-4\right]\right)=TS$, such that the corrected stationarity test statistic is
$TS^{*}=-\left(|TS|-\sum_{j=1}^{4}\left|\Psi_j-1\right|\right)$  (13)
where the computed TS is penalized whenever one or more of the coefficients $\Psi_j$ exceeds or falls short of unity. Penalization consists of a reduction in the absolute value of the computed TS, possibly below the associated critical value or even into positive territory. In practice, the farther away the coefficients $\Psi_j$ are from their expected values, the greater the risk of nonstationarity, in spite of stationarity originally detected by the conventional test statistics. Incidentally, the asymptotic 10%, 5%, and 1% critical values of the computed TS are -2.22, -2.93, and -3.40.35–37 Such values are close to those obtained from simulated experiments over 10,000 replications of the nonlinear version of Equation (2).
Five statistical practitioner's procedures in support of signal analysis have been introduced and discussed. These snippets were shown to be useful for the better and more efficient implementation of the corresponding procedures and of selected hypothesis tests existing in the received literature. Of notable consideration are the optimal smoothing and corrected ADF test procedures, both characterized by significant novelty in their implementation techniques. The Bayesian stepwise procedures for Monte Carlo simulation and calibration are also of interest because of their ability to perform back/forecasting of signals over long periods of time, if necessary. Of the two, the former may be preferred because it is quick and easy to implement. The first and last snippets were found to be particularly useful when the signal is highly erratic. The latter, in particular, is worthy of note because of its capability to reverse the results of traditional stationarity test statistics.
None.
The author declares there is no conflict of interest.
©2019 Travaglini. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and building upon the work non-commercially.