In our previous post, we explored the potential of ChatGPT as a forecasting support tool. In this post, we put ChatGPT to the test and evaluate the predictions it makes entirely on its own, without any human assistance. Our evaluation metric is the normalized mean square error (NMSE), which measures the accuracy of a set of predictions by dividing their mean square error (MSE) by the variance of the true values. The NMSE is generally preferred over the MSE when you want to compare predictions made on datasets with different variances. In Python, it can be computed as follows:
    def calc_nmse(true_values, predicted_values):
        """Calculate the normalized mean square error (NMSE)."""
        n = len(true_values)
        # Calculate the mean square error (MSE)
        mse = sum((y - y_hat) ** 2 for y, y_hat in zip(true_values, predicted_values)) / n
        # Calculate the sample variance (n - 1 denominator) of the true values
        mean_true = sum(true_values) / n
        variance = sum((y - mean_true) ** 2 for y in true_values) / (n - 1)
        # Calculate the NMSE
        return mse / variance
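Written out as a formula, the function above computes the following. Note that the denominator is the sample variance (dividing by n - 1), which is simply what the code does; other NMSE conventions divide by n instead, so treat this as one possible definition rather than the only one.

```latex
\mathrm{NMSE}
  = \frac{\mathrm{MSE}}{s_y^2}
  = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
         {\frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},
\qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

Here y_i are the true values, ŷ_i the predicted values, and ȳ the mean of the true values.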
If you want to make your own estimates and compare them to ChatGPT's, stop scrolling here and write down your answers to the following questions first:
- How many cars are there in the United States?
- How many minutes of video are uploaded to YouTube every day?
- How many flights take off from airports around the world every day?
- How many babies are born every day?
- How many people visit Disneyland every year?
- How many cells are there in the human body?
- How many words are there in the English language?
We then asked ChatGPT to estimate the same quantities, using the following chat message for each one: “Estimate via Fermi quiz method QUESTION.”
- How many cars are there in the United States?
Estimated: 495 million cars
Actual: 276 million cars
- How many minutes of video are uploaded to YouTube every day?
Estimated: 333,333,333 hours
Actual: 720,000 hours
- How many flights take off from airports around the world every day?
Estimated: 250,000 flights/day
Actual: 100,000 flights/day
- How many babies are born every day?
Estimated: 400,000 people
Actual: 385,000 babies
- How many people visit Disneyland every year?
Estimated: 18 million people
Actual: 8.5 million visitors
- How many cells are there in the human body?
Estimated: 100 trillion
Actual: 30 trillion
- How many words are there in the English language?
Estimated: 500,000
Actual: 171,146 words
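Across these seven questions, ChatGPT's NMSE is 5.44.
A value of 0 indicates a perfect fit. A value greater than 1 indicates a poor fit: the squared error of the predictions is larger than the variance of the true values, so the predictions do roughly no better than always guessing the mean of the true values.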
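The 5.44 figure can be reproduced from the numbers listed above. The following is a minimal sketch, assuming the raw values are fed into the metric exactly as listed (with the YouTube figures in hours, as given) and that the variance uses the n - 1 denominator. The short labels and the `results` list are introduced here purely for illustration.

```python
def calc_nmse(true_values, predicted_values):
    """NMSE: mean square error divided by the sample variance of the true values."""
    n = len(true_values)
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(true_values, predicted_values)) / n
    mean_true = sum(true_values) / n
    # Assumption: sample variance with an (n - 1) denominator
    variance = sum((y - mean_true) ** 2 for y in true_values) / (n - 1)
    return mse / variance


# (label, ChatGPT estimate, actual value), copied from the list above.
# The labels are shorthand for the questions, not part of the original data.
results = [
    ("cars in the US",            495e6,       276e6),
    ("YouTube uploads per day",   333_333_333, 720_000),   # in hours, as listed
    ("flights per day",           250_000,     100_000),
    ("babies born per day",       400_000,     385_000),
    ("Disneyland visitors/year",  18e6,        8.5e6),
    ("cells in the human body",   100e12,      30e12),
    ("words in English",          500_000,     171_146),
]

estimates = [estimate for _, estimate, _ in results]
actuals = [actual for _, _, actual in results]

nmse = calc_nmse(actuals, estimates)
print(f"NMSE: {nmse:.2f}")  # NMSE: 5.44
```

To score your own guesses, replace the estimate column with your numbers and rerun. One thing to keep in mind when comparing scores: because the raw values span many orders of magnitude, the cell-count question contributes almost all of the squared error, so the result mostly reflects how close that single estimate was.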
Have you calculated the NMSE for your forecasts? If so, please leave a comment with your result or send me your result directly. It would be interesting to see how ChatGPT’s performance compares to that of a human forecaster.