
Time Series Analysis Baby Steps Using R

This tutorial consists of my baby steps towards time series analysis. I will provide lots of supporting links, along with this tutorial, that helped me a lot to get a grip on this topic.

First things first: what is a “Time Series”?

A sequence of measurements/observations of the same variable over time.

Examples of Time Series can be:

  • Daily closing price of a particular stock for the last year
  • Yearly number of computer science graduates for the last 5 years

All right, then, what is “Time Series Analysis”?

The aims of time series analysis are to describe and summarize time series data, fit low-dimensional models, and make forecasts.

In simple words, as I understand it, we have to understand the nature of the time series data and then fit a model to it in order to forecast.

Let's get our hands dirty and have some fun. The data set I am going to use is the “Nile River Flow” data set, which is available at this link.

Open RStudio (as I am going to use RStudio), create a new R script and put in the following code to read the “Nile River Flow” data set.
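A minimal sketch of this step. It assumes R's built-in `Nile` dataset (the same annual flow values, 1871 to 1970) in place of the downloaded CSV, and rebuilds a data frame with the three columns described in this tutorial. The name `NileData` is only illustrative.

```r
# Assumption: use R's built-in Nile dataset instead of the downloaded CSV.
# With a local copy of the CSV, read.csv("<path to file>") would replace this.
NileData <- data.frame(
  X    = seq_along(Nile),          # serial number
  time = as.integer(time(Nile)),   # year of the observation
  Nile = as.numeric(Nile)          # annual river flow
)

# Show the first six rows in the console.
head(NileData)
```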

In the console you are going to see the following values:

X time Nile
1 1871 1120
2 1872 1160
3 1873 963
4 1874 1210
5 1875 1160
6 1876 1160

The first column is the serial number, the second is the time (the year in this case) and the last column is the value of the river flow in that year.

Now, let's convert this data to a time series object. You can check the details of the ts function at this link.
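A minimal sketch of the conversion, assuming the flow values come from R's built-in `Nile` dataset rather than the CSV. The data is annual, so `frequency = 1`, and the first observation is from 1871.

```r
# Assumption: take the raw flow values from R's built-in Nile dataset.
flow <- as.numeric(Nile)

# One observation per year, starting in 1871.
NileTimeSeriesObject <- ts(flow, start = 1871, frequency = 1)

print(NileTimeSeriesObject)
plot(NileTimeSeriesObject, xlab = "Year", ylab = "Flow",
     main = "Nile River Flow")
```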

The output will look like the following:

[Figure: plot of the Nile time series object]

Before building a model, we should check whether the data is stationary, because models such as ARIMA assume that the series (after any differencing) is stationary. If the data is not stationary, we should first transform it into stationary data using differencing, detrending, etc.

A series x_t is said to be stationary if it satisfies the following properties:

  • The mean E(x_t) is the same for all t.
  • The variance of x_t is the same for all t.
  • The covariance (and also correlation) between x_t and x_(t-h) is the same for all t, for each lag h.

Please check the details about stationary and non-stationary data from this link and how to calculate mean, variance and standard deviation from this link.

We can check this by simply looking at the time series plot in the figure above, or with the help of the “Augmented Dickey-Fuller Test”, using the following code to check whether the series is stationary enough for time series modeling.
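A minimal sketch of the test, assuming the `tseries` package (which provides `adf.test()`) is installed; the block skips the test otherwise. The null hypothesis is that the series is non-stationary, so a small p-value is evidence in favour of stationarity.

```r
# Assumption: the series is rebuilt from R's built-in Nile dataset.
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)

if (requireNamespace("tseries", quietly = TRUE)) {
  # Null hypothesis: the series has a unit root (is non-stationary).
  adfResult <- tseries::adf.test(NileTimeSeriesObject,
                                 alternative = "stationary")
  print(adfResult)
} else {
  message("Package 'tseries' is not installed; ",
          "run install.packages(\"tseries\") first.")
}
```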

Although the test suggests that our data is stationary, if we try to fit a straight line through the series we will notice that the mean is not constant over time. The lm() function is used to fit a linear model.

[Figure: Nile time series with the fitted line]
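A minimal sketch of fitting and drawing such a line with `lm()`, assuming the series is rebuilt from R's built-in `Nile` dataset. The name `trendFit` is illustrative.

```r
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)

# Regress the flow on the time index (the year) to get a straight-line trend.
trendFit <- lm(NileTimeSeriesObject ~ time(NileTimeSeriesObject))

plot(NileTimeSeriesObject, xlab = "Year", ylab = "Flow")
abline(trendFit, col = "red")  # a sloped line suggests the mean changes over time

# Intercept and slope of the fitted line.
coef(trendFit)
```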

We can also use the Auto Correlation Function (ACF) to check the stationarity of our data set. This is a widely used tool in time series analysis for assessing stationarity and seasonality.
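A minimal sketch using base R's `acf()`, assuming the series is rebuilt from the built-in `Nile` dataset. The dashed horizontal lines on the plot mark the approximate significance limits.

```r
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)

# Compute and plot the sample autocorrelations.
nileAcf <- acf(NileTimeSeriesObject, main = "ACF of Nile River Flow")
```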

This line of code will give us output something like this. Please note that the ACF should show exponential decay for a stationary series.

[Figure: ACF of the Nile time series]

We can use differencing to make our series more stationary with the following code. I found this link useful for understanding the diff function.
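A minimal sketch of first-order differencing with base R's `diff()`, assuming the series is rebuilt from the built-in `Nile` dataset. The name `NileDiff` is illustrative.

```r
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)

# First difference: x_t - x_(t-1). The result is one observation shorter.
NileDiff <- diff(NileTimeSeriesObject, differences = 1)

plot(NileDiff, xlab = "Year", ylab = "Change in flow",
     main = "Differenced Nile series")
acf(NileDiff, main = "ACF after differencing")
```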

[Figure: differenced time series]

[Figure: ACF after differencing]

Now it looks good, doesn't it? Check the details about stationarity and differencing at this link.

As all significant correlation has been removed after lag 1, it is time to fit a time series model to the data using an Autoregressive Integrated Moving Average (ARIMA) model.

Why ARIMA? Why not Exponential Smoothing?

The best answer I got:

While exponential smoothing methods are useful for forecasting, they make no assumptions about the correlations between successive values of the time series. We can sometimes make better models by utilizing these correlations in the data, using Autoregressive Integrated Moving Average (ARIMA) models.

The arima function takes in 3 parameters (p,d,q), which correspond to the Auto-Regressive order, degree of differencing, and Moving-Average order.

To select the best ARIMA model we will inspect the Auto Correlation Function (ACF) and the Partial Auto Correlation Function (PACF).
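A minimal sketch of producing both plots for the differenced series with base R's `acf()` and `pacf()`, assuming the series is rebuilt from the built-in `Nile` dataset.

```r
# Differenced series (first difference of the annual Nile flow).
NileDiff <- diff(ts(as.numeric(Nile), start = 1871, frequency = 1))

# ACF: helps suggest the moving-average order q.
acfDiff <- acf(NileDiff, main = "ACF after differencing")

# PACF: helps suggest the autoregressive order p.
pacfDiff <- pacf(NileDiff, main = "PACF after differencing")
```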

[Figure: ACF after differencing]

[Figure: PACF after differencing]

The ACF/PACF plots give us suggestions on what orders to use for the parameters.

Identification of an AR model is often best done with the PACF. 

Identification of an MA model is often best done with the ACF rather than the PACF.

The two most basic rules are:

  • If the ACF has a sharp cutoff, the q component is equal to the last significant lag. This type of model also often has a PACF with a tapering pattern.
  • If the PACF has a sharp cutoff, the p component is equal to the last significant lag. Again, the ACF may exhibit a tapering pattern.

Now we can test the following two models first, based on the basic rules mentioned above:

arima(NileTimeSeriesObject, order=c(0, 1, 2))
arima(NileTimeSeriesObject, order=c(7, 1, 0))

The Akaike Information Criterion (AIC) is a widely used measure of the quality of a statistical model. It combines 1) the goodness of fit and 2) the simplicity/parsimony of the model into a single statistic. When comparing two models, the one with the lower AIC is generally “better”.

The first model arima(NileTimeSeriesObject, order=c(0, 1, 2)) gives AIC value 1265.75 and the second model arima(NileTimeSeriesObject, order=c(7, 1, 0)) gives AIC value 1285.87.
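A minimal sketch of comparing the two candidates side by side, assuming the series is rebuilt from R's built-in `Nile` dataset. The names `fitMA` and `fitAR` are illustrative.

```r
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)

# Candidate suggested by the ACF: two moving-average terms.
fitMA <- arima(NileTimeSeriesObject, order = c(0, 1, 2))

# Candidate suggested by the PACF: seven autoregressive terms.
fitAR <- arima(NileTimeSeriesObject, order = c(7, 1, 0))

# Lower AIC is generally preferred.
aicValues <- c("ARIMA(0,1,2)" = AIC(fitMA), "ARIMA(7,1,0)" = AIC(fitAR))
print(aicValues)
```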

So, we can say that arima(NileTimeSeriesObject, order=c(0, 1, 2)) is the better fit of the two. We can test our findings using the auto.arima() function.
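A minimal sketch of this check, assuming the `forecast` package (which provides `auto.arima()`) is installed; the block skips the search otherwise.

```r
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)

if (requireNamespace("forecast", quietly = TRUE)) {
  # Automatic order selection; prints the chosen model and its AIC.
  autoFit <- forecast::auto.arima(NileTimeSeriesObject)
  print(autoFit)
} else {
  message("Package 'forecast' is not installed; ",
          "run install.packages(\"forecast\") first.")
}
```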

The model chosen by auto.arima() has an AIC value of 1267.25, which is higher than 1265.75, so arima(NileTimeSeriesObject, order=c(0, 1, 2)) remains the best-fitting of the models tried for our data set.

If we plot the ACF of the residuals, we will see that all the spikes are now within the significance limits, so the residuals appear to be white noise.
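A minimal sketch of this residual check with base R, assuming the series is rebuilt from the built-in `Nile` dataset and the ARIMA(0,1,2) model is refitted here. The names `fit` and `fitResiduals` are illustrative.

```r
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)
fit <- arima(NileTimeSeriesObject, order = c(0, 1, 2))

# Residuals = observed values minus the model's one-step-ahead fits.
fitResiduals <- residuals(fit)

plot(fitResiduals, ylab = "Residual", main = "Residuals of ARIMA(0,1,2)")
# Spikes beyond the dashed limits would indicate leftover autocorrelation.
acf(fitResiduals, main = "ACF of residuals")
```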

To know more about Residual Diagnostics please check this link and to understand White Noise please check this link.

[Figure: residuals]

A Ljung-Box test also shows that the residuals have no remaining autocorrelations.
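A minimal sketch of the Ljung-Box test with base R's `Box.test()`. The choice of `lag = 20` is an assumption for illustration, and `fitdf = 2` accounts for the two estimated moving-average coefficients. A large p-value means there is no evidence of remaining autocorrelation.

```r
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)
fit <- arima(NileTimeSeriesObject, order = c(0, 1, 2))

# lag = 20 is an illustrative choice; fitdf = number of fitted ARMA terms.
ljung <- Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = 2)
print(ljung)
```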

As we have found the best-fitting model for our data set, let's forecast the next 5 years and plot the result using the following command.

I also found that another R package, astsa, gives a better view for ARIMA modeling. The following commands can be used for ARIMA modeling and forecasting.
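A minimal sketch of a 5-year forecast. It uses base R's `predict()` rather than a forecasting package, which is a swapped-in choice for illustration, and draws approximate 95% limits as the forecast plus or minus 1.96 standard errors.

```r
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)
fit <- arima(NileTimeSeriesObject, order = c(0, 1, 2))

# Point forecasts ($pred) and standard errors ($se) for the next 5 years.
nileForecast <- predict(fit, n.ahead = 5)
upper <- nileForecast$pred + 1.96 * nileForecast$se
lower <- nileForecast$pred - 1.96 * nileForecast$se

# Observed series, forecast, and approximate 95% limits on one plot.
ts.plot(NileTimeSeriesObject, nileForecast$pred, upper, lower,
        gpars = list(lty = c(1, 2, 3, 3),
                     col = c("black", "blue", "red", "red"),
                     xlab = "Year", ylab = "Flow"))
print(nileForecast$pred)
```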
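A minimal sketch of the astsa workflow, assuming the `astsa` package is installed; the block skips these steps otherwise. It uses `acf2()` for the combined ACF/PACF plot, `sarima()` for fitting with diagnostics, and `sarima.for()` for forecasting, with the (0, 1, 2) order chosen earlier.

```r
NileTimeSeriesObject <- ts(as.numeric(Nile), start = 1871, frequency = 1)

if (requireNamespace("astsa", quietly = TRUE)) {
  # ACF and PACF of the differenced series in one figure.
  astsa::acf2(diff(NileTimeSeriesObject))

  # Fit ARIMA(0,1,2) and show the diagnostic plots.
  astsa::sarima(NileTimeSeriesObject, p = 0, d = 1, q = 2)

  # Forecast the next 5 years from the same model.
  astsa::sarima.for(NileTimeSeriesObject, n.ahead = 5, p = 0, d = 1, q = 2)
} else {
  message("Package 'astsa' is not installed; ",
          "run install.packages(\"astsa\") first.")
}
```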

[Figure: astsa ACF and PACF plot]

[Figure: astsa ARIMA diagnostics]

[Figure: astsa sarima forecast]

To conclude, the flow chart for Time Series Analysis is like the following (taken from http://www.analyticsvidhya.com).

[Figure: flow chart for time series analysis]

The complete version of my R script is as follows:

 


Md Shiefuzzaman • September 1, 2016

