Evaluating ChatGPT’s Forecasts

In our previous post, we explored the potential of ChatGPT as a forecasting support tool. In this post, we put ChatGPT to the test and evaluate the predictions it makes entirely on its own, without any human assistance. As our evaluation metric we use the normalized mean square error (NMSE), which measures the accuracy of a set of predictions. It is calculated by dividing the mean square error (MSE) of the predictions by the variance of the true values. The NMSE is generally preferred over the MSE when you want to compare the accuracy of predictions made on datasets with different variances, because the division removes the dependence on the scale of the data.

def calc_nmse(true_values, predicted_values):
    """Calculate the normalized mean square error (NMSE)"""
    n = len(true_values)

    # Calculate the mean square error (MSE)
    mse = sum((y - y_hat)**2 for y, y_hat in zip(true_values, predicted_values)) / n

    # Calculate the (sample) variance of the true values
    mean = sum(true_values) / n
    variance = sum((y - mean)**2 for y in true_values) / (n - 1)

    # Calculate the NMSE
    nmse = mse / variance

    return nmse
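
To see why the normalization matters, here is a minimal sketch with made-up toy numbers (they are not part of the quiz). The helper `calc_mse` and the sample data are assumptions added for illustration only; `calc_nmse` repeats the definition above in compact form. Rescaling the data by a factor of 1,000 multiplies the MSE by a million but leaves the NMSE unchanged.

```python
import math


def calc_mse(true_values, predicted_values):
    """Mean square error (hypothetical helper for this illustration)."""
    n = len(true_values)
    return sum((y - p) ** 2 for y, p in zip(true_values, predicted_values)) / n


def calc_nmse(true_values, predicted_values):
    """MSE divided by the sample variance of the true values."""
    n = len(true_values)
    mean = sum(true_values) / n
    variance = sum((y - mean) ** 2 for y in true_values) / (n - 1)
    return calc_mse(true_values, predicted_values) / variance


# Toy data, invented for this example
true_small = [1.0, 2.0, 3.0, 4.0]
pred_small = [1.5, 2.0, 2.5, 4.5]

# The same data expressed in units that are 1,000 times smaller
true_large = [y * 1000 for y in true_small]
pred_large = [p * 1000 for p in pred_small]

mse_small = calc_mse(true_small, pred_small)
mse_large = calc_mse(true_large, pred_large)
nmse_small = calc_nmse(true_small, pred_small)
nmse_large = calc_nmse(true_large, pred_large)

print(mse_small, mse_large)    # MSE grows with the square of the scale
print(nmse_small, nmse_large)  # NMSE stays the same up to rounding
```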

If you want to make your own estimates and compare them with ChatGPT’s, stop scrolling and write down your answers to these questions first:

  1. How many cars are there in the United States?
  2. How many minutes of video are uploaded to YouTube every day?
  3. How many flights take off from airports around the world every day?
  4. How many babies are born every day?
  5. How many people visit Disneyland every year?
  6. How many cells are there in the human body?
  7. How many words are there in the English language?

We then asked ChatGPT to estimate the same quantities, using the following chat message: “Estimate via Fermi quiz method QUESTION.”

  1. How many cars are there in the United States?
    Estimated: 495 million cars
    Actual: 276 million cars
  2. How many minutes of video are uploaded to YouTube every day?
    Estimated: 333,333,333 hours
    Actual: 720,000 hours
  3. How many flights take off from airports around the world every day?
    Estimated: 250,000 flights/day
    Actual: 100,000 flights/day
  4. How many babies are born every day?
    Estimated: 400,000 babies
    Actual: 385,000 babies
  5. How many people visit Disneyland every year?
    Estimated: 18 million people
    Actual: 8.5 million visitors
  6. How many cells are there in the human body?
    Estimated: 100 trillion
    Actual: 30 trillion
  7. How many words are there in the English language?
    Estimated: 500,000
    Actual: 171,146 words

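The NMSE of ChatGPT’s estimates is 5.44.
A value of 0 indicates a perfect fit. A value around 1 is roughly what you would get by guessing the mean of the true values for every question, so a value greater than 1 indicates a poor fit.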
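
As a rough check, the reported NMSE can be recomputed from the numbers listed above. This is a sketch under two assumptions: the YouTube figures are taken in hours, exactly as listed, and the variable names (`actual`, `estimated`, `largest_share`) are chosen here for illustration.

```python
def calc_nmse(true_values, predicted_values):
    """MSE divided by the sample variance of the true values."""
    n = len(true_values)
    mse = sum((y - p) ** 2 for y, p in zip(true_values, predicted_values)) / n
    mean = sum(true_values) / n
    variance = sum((y - mean) ** 2 for y in true_values) / (n - 1)
    return mse / variance


# Values in the order of the seven questions above
actual = [276e6, 720_000, 100_000, 385_000, 8.5e6, 30e12, 171_146]
estimated = [495e6, 333_333_333, 250_000, 400_000, 18e6, 100e12, 500_000]

nmse = calc_nmse(actual, estimated)
print(round(nmse, 2))  # 5.44

# Share of the total squared error contributed by the single largest term
squared_errors = [(y - p) ** 2 for y, p in zip(actual, estimated)]
largest_share = max(squared_errors) / sum(squared_errors)
print(largest_share)
```

Because the seven quantities span many orders of magnitude, the question with the largest value (the number of cells in the human body) accounts for almost all of the squared error, so the score mostly reflects that one estimate.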

Have you calculated the NMSE for your forecasts? If so, please leave a comment with your result or send me your result directly. It would be interesting to see how ChatGPT’s performance compares to that of a human forecaster.
