I believe it is possible to use statistical models on web traffic data and use them to forecast the traffic that a website will have. I will first outline the basic model. Then I will comment on certain aspects that you may want to change depending on the specific website. Lastly, I will make some general remarks about the model itself.
The Basic Model
Google Analytics is a wealth of data. As with any large data set, it seems reasonable that you can use it to predict the future. The basic premise is to run a regression analysis on a historical data set, with time as the x variable and the number of sessions as the y variable (you could also use the percentage change in sessions instead), to determine a linear equation for expected traffic. This lets the user plug in an upcoming time period to get a sense of what the traffic should be. You can also use the model's interval outputs to build a range that traffic should fall within. For a single future period, the prediction interval, which is wider than the confidence interval for the mean, is the appropriate range to use.
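To make the regression step concrete, here is a minimal sketch in plain Python. It assumes you have already exported a list of session counts, one per period, each paired with a time index t. The names `fit_trend`, `forecast`, and `TrendModel` are hypothetical helpers, not part of any Google Analytics API. The interval uses the standard simple-regression prediction-interval formula. As an additional assumption, a normal quantile stands in for the Student t quantile, which is a close approximation once you have around 100 points.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class TrendModel:
    intercept: float
    slope: float
    resid_sd: float  # residual standard error, using n - 2 degrees of freedom
    t_mean: float    # mean of the time index
    sxx: float       # sum of squared deviations of t from its mean
    n: int           # number of data points used in the fit


def fit_trend(ts, sessions):
    """Ordinary least squares fit of sessions = intercept + slope * t."""
    n = len(ts)
    if n < 3:
        raise ValueError("need at least 3 points to fit a line and estimate spread")
    t_mean = sum(ts) / n
    y_mean = sum(sessions) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, sessions))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, sessions))
    return TrendModel(intercept, slope, math.sqrt(sse / (n - 2)), t_mean, sxx, n)


def forecast(model, t_future, level=0.95):
    """Point forecast plus an approximate prediction interval for one future period."""
    point = model.intercept + model.slope * t_future
    # Assumption: normal quantile used in place of the t quantile (fine for large n).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Prediction-interval standard error: widens as t_future moves away from t_mean.
    se = model.resid_sd * math.sqrt(
        1 + 1 / model.n + (t_future - model.t_mean) ** 2 / model.sxx
    )
    return point, point - z * se, point + z * se
```

Usage would be along the lines of `model = fit_trend(list(range(len(weekly))), weekly)` followed by `forecast(model, len(weekly))` for the next week. The returned tuple is (expected sessions, lower bound, upper bound). Because of the `(t_future - t_mean) ** 2` term, the range gets wider the further ahead you forecast.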
Changes to Implement
Anyone who uses Google Analytics can tell you that daily traffic varies a great deal. I would recommend using weekly data at a minimum. Aggregating this way smooths out day-to-day noise and makes the underlying trend much easier to detect. I would also recommend some data smoothing. Certain times of the year naturally bring less traffic. For example, over Christmas many sites see a drop-off in traffic because people are spending time with family and are not working. If there is a reasonable explanation for why your traffic is down in a particular week, I would recommend removing that week from the data set. This will lead to better predictive results.

The other thing to keep in mind is seasonality. Most businesses go through busy and slow periods during the year. You should account for this seasonality in your model. Otherwise, at certain points you will think you are doing better than you really are, and at other points you will think you are doing worse. There are a number of ways to handle this. If you know your seasonal pattern, you can build a separate model using only data from the corresponding season. You could also incorporate a moving average. Seasonal models would be best in terms of accuracy, but a moving average is simpler. If you go with the moving average, note that it will not be perfectly accurate, especially when a seasonal downturn starts and ends. It will still be more accurate than not accounting for seasonality at all.
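The preparation steps above are weekly aggregation, dropping weeks with a known explanation, and a moving average. A minimal sketch of them follows. It assumes the daily session counts are a plain list that starts on the first day of a week. The function names are hypothetical, and the 4-week window is an arbitrary assumption you would tune for your own site.

```python
def weekly_totals(daily_sessions):
    """Sum daily session counts into consecutive 7-day totals; drop a trailing partial week."""
    full = len(daily_sessions) // 7 * 7
    return [sum(daily_sessions[i:i + 7]) for i in range(0, full, 7)]


def drop_explained_weeks(weekly, excluded):
    """Return (t, sessions) pairs, skipping week indices flagged as explained outliers.

    Each remaining week keeps its original index t, so the regression's
    time axis is not shifted by the removed weeks.
    """
    return [(t, y) for t, y in enumerate(weekly) if t not in excluded]


def trailing_moving_average(values, window=4):
    """Average of each value and the window - 1 values before it.

    The output starts once a full window is available, so it is
    len(values) - window + 1 items long.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]
```

A trailing average only uses past weeks, so it lags behind turning points. This is one concrete reason it is least accurate exactly when a seasonal downturn starts and ends.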
General Remarks
- Generally, you should try to have at least 100 data points (roughly two years of weekly data)
- You should refit your models regularly so they include the most recent data possible
- You shouldn’t create one model and keep using it to forecast 100 weeks out
- The shorter the gap between when you build your model and the period you are predicting, the more accurate the prediction will generally be (i.e., if the model is built at t=0, a forecast for t=1 will be more accurate than one for t=10)
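One way to check these remarks against your own data is a rolling-origin backtest. The idea is to refit the trend on all weeks up to each point in time, forecast a fixed number of weeks ahead, and record the error. The sketch below is a minimal version under that assumption. `rolling_forecast_errors` and `fit_line` are hypothetical names, and `min_points` defaults to 100 to match the first remark.

```python
def fit_line(ts, ys):
    """Ordinary least squares intercept and slope for ys = intercept + slope * ts."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sxx
    return y_mean - slope * t_mean, slope


def rolling_forecast_errors(weekly, horizon, min_points=100):
    """Refit on weeks [0, origin) for each origin and forecast `horizon` weeks past the last one.

    Returns the absolute error of each forecast. The last fitted week plays
    the role of t=0, so horizon=1 is the following week.
    """
    errors = []
    for origin in range(min_points, len(weekly) - horizon + 1):
        intercept, slope = fit_line(list(range(origin)), weekly[:origin])
        target = origin - 1 + horizon  # index of the week being forecast
        predicted = intercept + slope * target
        errors.append(abs(weekly[target] - predicted))
    return errors
```

Comparing the average of `rolling_forecast_errors(weekly, 1)` with that of `rolling_forecast_errors(weekly, 10)` shows how much accuracy your site actually loses at longer horizons. Treat the result as something to verify on your own traffic rather than a guaranteed outcome.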