Why should I measure forecast accuracy?
1. Improving your forecasting process requires the ability to track accuracy.
Forecasting should be viewed as a continuous improvement process. Your forecasting team should be constantly striving to improve the forecasting process and forecast accuracy. Doing so requires knowing what is working and what is not.
For example, many organizations generate baseline forecasts using statistical approaches and then make judgmental adjustments to them to capture their knowledge of future events. Organizations that track the accuracy of both the statistical and adjusted forecasts learn where the adjustments improve the forecasts and where they make them worse. This knowledge allows them to focus their time and attention on the items where the adjustments are adding value.
2. Tracking accuracy provides insight into expected performance.
A forecast is more than a number. To use a forecast effectively you need an understanding of the expected accuracy.
Within-sample statistics and confidence limits provide some insight into expected accuracy; however, they almost always underestimate the actual (out-of-sample) forecasting error. This is because the parameters of a statistical model are selected to minimize the fitted error over the historic data. The parameters are thus adapted to the historic data and reflect its peculiarities. Put another way, the model is optimized for the past, not for the future.
Generally speaking, out-of-sample statistics (i.e., historic forecast errors) yield a better measure of expected forecast accuracy than within-sample statistics.
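The within-sample vs. out-of-sample distinction can be made concrete with a short holdout sketch. The history values below are invented for illustration, and the "model" is just the training mean; any real forecasting model would take its place.

```python
# Holdout sketch: "fit" on the first part of the history, then measure
# error on a held-out tail. The model here is simply the training mean,
# and the numbers are invented purely for illustration.
history = [100, 104, 96, 102, 98, 100, 112, 90]
train, test = history[:6], history[6:]

fitted_level = sum(train) / len(train)  # parameter chosen to fit the past

in_sample_mad = sum(abs(a - fitted_level) for a in train) / len(train)
out_of_sample_mad = sum(abs(a - fitted_level) for a in test) / len(test)

print(in_sample_mad, out_of_sample_mad)  # → 2.0 11.0
```

The data here were chosen to show the typical pattern: the held-out error is several times the fitted error, which is why within-sample statistics tend to flatter the model.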
3. Tracking accuracy allows you to benchmark your forecasts.
If you are lucky enough to be in an industry with published statistics on forecast accuracy, comparing your accuracy to these benchmarks provides insight into your forecasting effectiveness. If industry benchmarks are not available (usually the case), periodically benchmarking your current forecast accuracy against your earlier forecast accuracy allows you to measure your improvement.
4. Monitoring forecast accuracy allows you to spot problems early.
An abrupt unexpected change in forecast accuracy is often the result of some underlying event. For example, if unbeknownst to you, a key customer decides to carry a competing product, your first indication might be an unusually large forecast error. Routinely monitoring forecast errors allows you to spot, investigate and respond to these changes early on—before they turn into bigger problems.
A Brief Guide to Forecast Accuracy Metrics and How to Use Them
The MAPE. The MAPE (Mean Absolute Percent Error) measures the size of the error in percentage terms. It is calculated as the average of the unsigned percentage errors: for each period, divide the absolute error by the actual, then average these percentages across periods.
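As a minimal sketch, the MAPE can be computed in a few lines of Python (the function name and sample numbers are our own, not from any particular library):

```python
def mape(actuals, forecasts):
    """Mean Absolute Percent Error: average unsigned percentage error."""
    pct_errors = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return 100.0 * sum(pct_errors) / len(pct_errors)

# Illustrative numbers only.
print(round(mape([100, 120, 80, 110], [110, 115, 70, 100]), 2))  # → 8.94
```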
Many organizations focus primarily on the MAPE when assessing forecast accuracy. Since most people are comfortable thinking in percentage terms, the MAPE is easy to interpret. It can also convey information when you don’t know the item’s demand volume. For example, telling your manager “we were off by less than 4%” is more meaningful than saying “we were off by 3,000 cases” if your manager doesn’t know an item’s typical demand volume.
The MAPE is scale sensitive and should not be used when working with low-volume data. Notice that because “Actual” is in the denominator of the equation, the MAPE is undefined when Actual demand is zero. Furthermore, when the Actual value is not zero, but quite small, the MAPE will often take on extreme values. This scale sensitivity renders the MAPE ineffective as an error measure for low-volume data.
The MAD. The MAD (Mean Absolute Deviation) measures the size of the error in units. It is calculated as the average of the unsigned errors: take the absolute error in each period and average across periods.
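A minimal sketch of the calculation (names and numbers are illustrative, not from a library):

```python
def mad(actuals, forecasts):
    """Mean Absolute Deviation: average unsigned error, in units."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

# Illustrative numbers only: unit errors of 10, 5, 10, and 10.
print(mad([100, 120, 80, 110], [110, 115, 70, 100]))  # → 8.75
```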
The MAD is a good statistic to use when analyzing the error for a single item; however, if you aggregate MADs over multiple items you need to be careful about high-volume products dominating the results—more on this later.
The MAPE and the MAD are by far the most commonly used error measurement statistics. There are a slew of alternative statistics in the forecasting literature, many of which are variations on the MAPE and the MAD. A few of the more important ones are listed below:
MAD/Mean Ratio. The MAD/Mean ratio is an alternative to the MAPE that is better suited to intermittent and low-volume data. As stated previously, percentage errors cannot be calculated when the Actual equals zero and can take on extreme values when dealing with low-volume data. These issues are magnified when you start to average MAPEs over multiple time series. The MAD/Mean ratio tries to overcome this problem by dividing the MAD by the Mean—essentially rescaling the error to make it comparable across time series of varying scales. The statistic is calculated exactly as the name suggests—it is simply the MAD divided by the Mean.
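A minimal sketch, using a hypothetical intermittent series with zero-demand periods (exactly the situation where the MAPE breaks down):

```python
def mad_mean_ratio(actuals, forecasts):
    """MAD divided by the mean of the actuals: a scale-free error measure."""
    n = len(actuals)
    mad = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / n
    return mad / (sum(actuals) / n)

# Intermittent demand with zeros; the MAPE would be undefined here.
print(mad_mean_ratio([0, 3, 0, 5], [1, 2, 1, 4]))  # → 0.5
```

Because the denominator is the mean of the whole series rather than each period's actual, individual zero-demand periods no longer cause a division by zero.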
GMRAE. The GMRAE (Geometric Mean Relative Absolute Error) is used to measure out-of-sample forecast performance. It is calculated as the geometric mean of the relative absolute errors, i.e., the ratio of the current model’s absolute error to the absolute error of the naïve model (next period’s forecast is this period’s actual) in each period. A GMRAE of 0.54 indicates that the size of the current model’s error is only 54% of the size of the error generated using the naïve model for the same data set. Because the GMRAE is based on a relative error, it is less scale sensitive than the MAPE and the MAD.
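A minimal sketch of the calculation (the handling of zero errors here is a simplifying assumption of this sketch, since the logarithm of zero is undefined; names and numbers are illustrative):

```python
import math

def gmrae(actuals, forecasts):
    """Geometric mean of |model error| / |naive error|.

    The naive forecast for period t is the actual for period t-1,
    so the first period has no ratio. Zero errors are skipped in
    this sketch because log(0) is undefined.
    """
    ratios = []
    for t in range(1, len(actuals)):
        model_err = abs(actuals[t] - forecasts[t])
        naive_err = abs(actuals[t] - actuals[t - 1])
        if model_err == 0 or naive_err == 0:
            continue
        ratios.append(model_err / naive_err)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Illustrative numbers; a value below 1 means the model beats the naive.
print(round(gmrae([100, 120, 80, 110], [100, 115, 70, 100]), 3))  # → 0.275
```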
SMAPE. The SMAPE (Symmetric Mean Absolute Percentage Error) is a variation on the MAPE that is calculated using the average of the absolute value of the actual and the absolute value of the forecast in the denominator. This statistic is preferred to the MAPE by some and was used as an accuracy measure in several forecasting competitions.
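A minimal sketch of one common SMAPE formulation (several variants exist in the literature; the function name and numbers are illustrative):

```python
def smape(actuals, forecasts):
    """Symmetric MAPE: |A - F| over the average of |A| and |F|, as a percent."""
    terms = [abs(a - f) / ((abs(a) + abs(f)) / 2)
             for a, f in zip(actuals, forecasts)]
    return 100.0 * sum(terms) / len(terms)

# Illustrative numbers only.
print(round(smape([100, 120, 80, 110], [110, 115, 70, 100]), 2))  # → 9.16
```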
Measuring Error for a Single Item vs. Measuring Errors Across Multiple Items
Measuring forecast error for a single item is pretty straightforward.
If you are working with an item which has reasonable demand volume, any of the aforementioned error measurements can be used. You should select the one that you and your organization are most comfortable with—for many organizations this will be the MAPE or the MAD. If you are working with a low-volume item then the MAD is a good choice, while the MAPE and other percentage-based statistics should be avoided.
Calculating error measurement statistics across multiple items can be quite problematic.
Calculating an aggregated MAPE is a common practice. A potential problem with this approach is that the lower-volume items (which will usually have higher MAPEs) can dominate the statistic. This is usually not desirable. One solution is to first segregate the items into different groups based upon volume (e.g., ABC categorization) and then calculate separate statistics for each group. Another approach is to establish a weight for each item’s MAPE that reflects the item’s relative importance to the organization—this is an excellent practice.
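The weighting idea can be sketched as follows; the per-item MAPEs and demand volumes below are hypothetical, and volume is only one possible choice of weight (revenue or margin would work the same way):

```python
def weighted_mape(item_mapes, weights):
    """Aggregate per-item MAPEs using importance weights (e.g., volume)."""
    return sum(m * w for m, w in zip(item_mapes, weights)) / sum(weights)

# Hypothetical numbers: a high-volume item forecast well and a
# low-volume item forecast poorly.
item_mapes = [5.0, 40.0]
volumes = [10000, 100]

print(round(weighted_mape(item_mapes, volumes), 2))  # → 5.35
```

A simple average of the two MAPEs would be 22.5%, dominated by the low-volume item; the volume-weighted figure of about 5.3% better reflects performance where it matters.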
Since the MAD is a unit error, calculating an aggregated MAD across multiple items only makes sense when using comparable units. For example, if you measure the error in dollars then the aggregated MAD will tell you the average error in dollars.