How to Forecast Data Containing Outliers

An outlier is a data point that falls outside of the expected range of the data (i.e., it is an unusually large or small data point). If you ignore outliers in your data, there is a danger that they can have a significant adverse impact on your forecasts. This article surveys three different approaches to forecasting data containing outliers, discusses the pros and cons of each and makes recommendations about when it is best to use each approach.

Option #1: Outlier Correction

A simple solution to lessen the impact of an outlier is to replace the outlier with a more typical value prior to generating the forecasts. This process is often referred to as Outlier Correction. Many forecasting solutions, including Forecast Pro, offer automated procedures for detecting outliers and “correcting” the history prior to forecasting.

Correcting the history for a severe outlier will often improve the forecast; however, if the outlier is not truly severe, corrections may do more harm than good. When you correct an outlier, you are rewriting the history to be smoother than it actually was, and this will change the forecasts and narrow the confidence limits. If the correction was not necessary, you may end up with poor forecasts and unrealistic confidence limits.

The screenshot above shows Forecast Pro TRAC’s ability to “correct” potential outliers.
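To make the idea of detecting and "correcting" an outlier concrete, here is a minimal sketch in Python. It is not Forecast Pro's procedure; it assumes a simple rule in which a point is flagged when it sits more than `threshold` median absolute deviations (MAD) from the series median, and it replaces flagged points with the median of the remaining values. The function name, the default threshold, and the demand figures are all hypothetical, and the rule ignores trend and seasonality, which a production procedure would need to account for.

```python
from statistics import median


def correct_outliers(history, threshold=5.0):
    """Return (corrected_series, flagged_indices).

    Assumed rule (illustrative only): flag a point when its distance from
    the series median exceeds `threshold` times the median absolute
    deviation, then replace it with the median of the unflagged values.
    """
    center = median(history)
    mad = median(abs(x - center) for x in history)
    if mad == 0:
        # No spread to measure against, so nothing is flagged.
        return list(history), []
    flagged = [i for i, x in enumerate(history)
               if abs(x - center) / mad > threshold]
    typical = median(x for i, x in enumerate(history) if i not in flagged)
    corrected = [typical if i in flagged else x
                 for i, x in enumerate(history)]
    return corrected, flagged


# Hypothetical monthly demand with one unusually large value.
demand = [102, 98, 105, 97, 101, 480, 99, 103, 100, 96]
corrected, flagged = correct_outliers(demand)
print("flagged positions:", flagged)
print("corrected history:", corrected)
```

Note that the corrected history is smoother than the original by construction, which is exactly why any forecast and confidence limits built on it will look tighter.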

Recommendations for outlier correction

1. If the cause of an outlier is known, alternative approaches (such as Option #2 and Option #3 below) should be considered prior to resorting to outlier correction.

2. Outlier correction should be performed sparingly. Using an automated detection algorithm to identify potential candidates for correction is very useful; however, the detected outliers should ideally be individually reviewed by the forecaster to determine whether a correction is appropriate.

3. In cases where an automated outlier detection and correction procedure must be used (for example, if the sheer number of forecasts to be generated precludes human review), the thresholds for identifying and correcting an outlier should be set very high. Ideally the thresholds would be calibrated empirically by experimenting with a subset of the data.
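One way to approach the empirical calibration mentioned in the last recommendation is to run the detection rule over a sample of series at several candidate thresholds and see what share of points each one would flag. The sketch below assumes the same MAD-based score as a stand-in for whatever detection rule your software uses; the function names, thresholds, and data are hypothetical.

```python
from statistics import median


def robust_scores(history):
    """Distance of each point from the median, in MAD units (assumed rule)."""
    center = median(history)
    mad = median(abs(x - center) for x in history)
    if mad == 0:
        return [0.0 for _ in history]
    return [abs(x - center) / mad for x in history]


def flagged_share(series_list, threshold):
    """Fraction of all points in the sample that exceed the threshold."""
    total = sum(len(s) for s in series_list)
    flagged = sum(score > threshold
                  for s in series_list
                  for score in robust_scores(s))
    return flagged / total


# Hypothetical subset of item histories used for calibration.
subset = [
    [102, 98, 105, 97, 101, 480, 99, 103, 100, 96],
    [20, 22, 19, 21, 30, 20, 18, 21, 22, 20],
]
for threshold in (3.0, 5.0, 10.0):
    print(threshold, flagged_share(subset, threshold))
```

Reviewing the points flagged at each level by hand, for the subset only, gives a basis for choosing a threshold that corrects the truly severe cases and leaves ordinary variation alone.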

Option #2: Separate the Demand Streams

At times, when the cause of an outlier is known, it may be useful to separate a time series into two different demand streams and forecast them separately. Consider the following three examples.

Example A: A pharmaceutical company’s demand for a given drug consists of both prescription fills (sales) and free goods (e.g., samples distributed free of charge to physicians). The timing of the distribution of free goods introduces outliers in the time series representing total demand. Separating the demand streams yields an outlier-free prescription fills series and allows different forecasting approaches to be used for each series—which is appropriate since the drivers generating the demand are different for the two series.

Example B: A manufacturing company’s demand normally consists of orders from its distributors. In response to an unusual event, the government places a large one-time order that introduces a significant outlier into the demand series, but does not impact base demand from the distributors. Separating the demand streams yields an outlier-free distributor demand series and allows the forecast for the government’s demand series to be simply set to zero.

Example C: A food and beverage company sells its products from both store shelves and promotional displays (e.g., end caps, point-of-sale displays, etc.). It has access to data for the two separate demand streams. It is tempting to forecast these two series separately, but this may not be the best approach: although the promotional displays will increase total demand, they will also cannibalize base demand, so the two streams are not independent. In this example it may be better to forecast total demand using a forecasting method that can accommodate the promotions (e.g., event models, regression, etc.).
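Example B can be sketched in a few lines of Python. The sketch assumes simple exponential smoothing as the forecasting method and uses made-up numbers; the function name and the smoothing weight are hypothetical choices, not a recommendation.

```python
def ses_forecast(history, alpha=0.3):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


# Hypothetical demand streams: regular distributor orders plus a
# one-time government order in the fourth period.
distributor = [100, 104, 98, 101, 99, 102]
government = [0, 0, 0, 500, 0, 0]
total = [d + g for d, g in zip(distributor, government)]

# Forecast the streams separately; the government forecast is set to zero.
forecast_separate = ses_forecast(distributor) + 0.0

# Forecast the combined series, outlier included.
forecast_total = ses_forecast(total)

print("separate streams:", round(forecast_separate, 1))
print("combined series: ", round(forecast_total, 1))
```

With the streams combined, the one-time order keeps inflating the smoothed level for several periods afterward. Forecasting the distributor series on its own avoids that without rewriting any history.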

Recommendations for separating demand streams

1. Separating the demand streams should only be considered when you understand the different sources of demand that are introducing the outliers.

2. If the demand streams can be separated in a “surgically-clean” manner, you should consider separating the demand streams and forecasting them separately.

3. In cases where the demand streams cannot be cleanly separated, you are often better off working with a single time series.

Option #3: Use a Forecasting Method Capable of Modeling the Outliers

Outliers can be caused by events of which you have knowledge (e.g., promotions, one-time orders, strikes, catastrophes, etc.) or can be caused by events of which you have no knowledge (i.e., you know that the point is unusual, but you don’t know why). If you have knowledge of the events that created the outliers, you should consider using a forecasting method that explicitly models these events.

Event models are an extension of exponential smoothing that are particularly well suited to this task. They are easy to build and lend themselves well to automation. Another option is dynamic regression.
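As a rough illustration of the idea, the sketch below extends simple exponential smoothing with a single additive event adjustment. It is a simplified stand-in, not the event model implemented in any particular product: it assumes one event type, an additive effect, and hypothetical smoothing weights `alpha` and `gamma`. Event periods update the event effect, and non-event periods update the base level.

```python
def event_smoothing(history, events, alpha=0.3, gamma=0.5):
    """Additive event-adjusted simple exponential smoothing (illustrative).

    Returns (level, effect): the smoothed base level and the smoothed
    uplift observed in periods flagged as events.
    """
    # Start the level at the first non-event observation.
    level = next(x for x, e in zip(history, events) if not e)
    effect = None
    for x, is_event in zip(history, events):
        if is_event:
            uplift = x - level
            effect = uplift if effect is None else (
                gamma * uplift + (1 - gamma) * effect)
        else:
            level = alpha * x + (1 - alpha) * level
    return level, (0.0 if effect is None else effect)


# Hypothetical history with promotions in the third and sixth periods.
history = [100, 102, 150, 99, 101, 155, 100, 98]
events = [0, 0, 1, 0, 0, 1, 0, 0]
level, effect = event_smoothing(history, events)

# Forecast two periods ahead: no promotion, then a planned promotion.
future_events = [0, 1]
forecasts = [level + effect * e for e in future_events]
print([round(f, 1) for f in forecasts])
```

Because the promoted periods are attributed to the event rather than to the base level, they neither distort the baseline forecast nor need to be "corrected" away.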

Unlike time series methods, which base the forecasts solely on the items’ past history, event models and dynamic regression are causal models, which allow you to bring in additional information such as promotional schedules, the timing of business interruptions and (in the case of dynamic regression) explanatory variables.

By capturing the response to the events as part of the overall forecasting model, these techniques often improve the accuracy of the forecasts and provide insight into the impact of the events.
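The simplest version of this idea is an ordinary least squares regression of demand on a 0/1 event indicator. The sketch below is a plain static regression, which is a deliberate simplification of dynamic regression (no lagged terms or autocorrelated errors); the function name and data are hypothetical. The fitted coefficient is the estimated impact of the event, and the known promotional schedule supplies the indicator for future periods.

```python
from statistics import mean


def fit_event_regression(demand, promo):
    """OLS fit of demand = base + lift * promo for a 0/1 promo indicator."""
    x_bar, y_bar = mean(promo), mean(demand)
    sxx = sum((x - x_bar) ** 2 for x in promo)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(promo, demand))
    lift = sxy / sxx
    base = y_bar - lift * x_bar
    return base, lift


# Hypothetical history with promotions in the third and sixth periods.
demand = [100, 102, 150, 99, 101, 155, 100, 98]
promo = [0, 0, 1, 0, 0, 1, 0, 0]
base, lift = fit_event_regression(demand, promo)

# Future promotional schedule: no promotion, then a planned promotion.
future_promo = [0, 1]
forecasts = [base + lift * p for p in future_promo]
print("base demand:", round(base, 1), "promo lift:", round(lift, 1))
print("forecasts:", [round(f, 1) for f in forecasts])
```

With a single binary regressor, the fitted base is the average of the non-promoted periods and the lift is the difference between the promoted and non-promoted averages, which makes the estimated impact easy to sanity-check.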

Recommendations for Using Forecasting Methods Capable of Outlier Modeling

In instances where the causes of the outliers are known, you should consider using a forecasting method that explicitly models the events.

Summary

Ignoring large outliers in your data often leads to poor forecasts. The best approach to forecasting data containing outliers depends on the nature of the outliers and the resources of the forecaster. In this article, we have discussed three approaches—outlier correction, separating the demand streams and modeling the outliers—which can be used when creating forecasts based on data containing outliers.

If you would like to see how Forecast Pro TRAC can help manage outliers and address other forecasting challenges, schedule a personalized Web-based demo with one of our specialists.
