What is Curve Fitting (Overfitting) in Trading?

Ok, yes, I understand that many of you experienced traders feel that curve fitting (aka overfitting, aka data fitting) is a rudimentary (and over-blogged) topic. However, understanding this concept is extremely important for designing and testing effective trading strategies. So, for those of you who are new to trading, make sure you thoroughly understand this foundational concept!

Understanding Curve Fitting

Definition (from Wikipedia[1])

In statistics and machine learning, overfitting (curve fitting) occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations.
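One way to make "too many parameters relative to the number of observations" concrete is with polynomials. This is our own illustrative choice, not something the definition specifies. If a polynomial has as many coefficients as there are data points, and the x values are distinct, the Vandermonde system below has exactly one solution:

```latex
% Illustrative only: a polynomial with p = n coefficients fitted to n observations (x_i, y_i)
\hat{y}(x) = \sum_{k=0}^{n-1} c_k x^k,
\qquad
\underbrace{\begin{pmatrix}
1 & x_1 & \cdots & x_1^{n-1} \\
\vdots & \vdots & & \vdots \\
1 & x_n & \cdots & x_n^{n-1}
\end{pmatrix}}_{V}
\begin{pmatrix} c_0 \\ \vdots \\ c_{n-1} \end{pmatrix}
=
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},
\qquad
\det V = \prod_{1 \le i < j \le n} (x_j - x_i) \neq 0 .
```

So the fitted curve passes through every observation and has zero in-sample error, whether each y_i is signal or noise. That perfect fit is exactly what the definition warns about.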

Definition in Simpler Words

Curve fitting is the process of adapting a trading system so closely to historical data (which includes both noise and signals) that it becomes ineffective in the future.

Why is Curve Fitting Bad?

The past does not predict the future perfectly, especially in financial markets.

Adapting a strategy too closely to past data makes it too rigid to cope with a future that differs from the past. Hence, it leads to poor performance going forward.

What Does It Mean to Reduce Curve Fitting?

We need to adapt our trading strategies to signals in historical data, not noise.

Curve Fitting in Nature

The idea of curve fitting can be seen in nature. We shall use 3 animals as case studies to understand curve fitting in nature.

I die easily…

Case 1: Giraffe (Curve Fitted)

Can only survive on land. Prefers tall, leafy trees.

We consider giraffes curve-fitted animals. If we throw them into the ocean, they die. If we throw them into Antarctica, they die. If we remove the leafy trees and leave them only patches of grass, they will struggle to survive.

Hence, their future environment needs to resemble their past one. If the giraffes’ environment starts changing adversely, they will be in trouble.

Case 2: Turtles (Reasonably Robust[2])

Can survive on land and water, but they are cold-blooded.

Turtles are much hardier creatures. The downside is that they are cold-blooded: their body temperature depends on their surroundings. That said, these creatures can be considered reasonably robust. They can survive most changes in their environmental conditions.

Case 3: Tardigrade aka Water Bears (Extremely Robust)

Can survive in extreme temperatures [1 K (−458 °F; −272 °C) to about 420 K (300 °F; 150 °C)], pressure, radiation and outer space.

Fig 2: Look at this cutie!

What kind of monster is this?! Tardigrades are tiny (around 1 mm long), yet they are among the hardiest animals on Earth. Just a few days ago, I read that a tardigrade was brought back to life after being frozen for 30 years[3]! And it gave birth to 14 healthy babies!

Tardigrades are extremely robust. They can survive almost anything that the future decides to throw at them. In general, we should design trading strategies in the same way that nature designs tardigrades.

Data-centric View of Curve Fitting

Fig 3: Modelling data points (chart view of curve fitting).

Now that we have a better understanding of curve fitting, let’s leave the animal kingdom and head back to Wall Street. Curve fitting in nature is interesting, but what we really need to understand is how it shows up in trading.

In Fig 3, we have 3 charts with the same data. We are trying to create a model that fits the shape of the data. We can clearly see that the data forms a U-shape. This model will be used to predict future data points.

In the leftmost chart, our model is a straight line. This is not representative of the data at all. It will have poor predictive abilities. We describe this model as an underfitted model.

In the rightmost chart, our model passes through every data point. This model fits the data perfectly. On paper, this seems like the perfect model, and it is, as long as we do not use it to predict the future. You see, unless future data points follow the past exactly, this model will have very poor predictive value. This is an overfitted model.

In the middle chart, our model describes the general shape of the data points. This model does have some error: it does not pass through every data point. However, this is fine. We need our models to have a certain degree of error, because it means the model does not rigidly follow the past. This model captures the signal in the data (here, the signal is the U shape formed by the data points). Thus, it should be able to cope with minor changes in the data structure in the future. This model is a good fit, aka it is robust.
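The three charts can be mimicked with a small toy experiment. This is our own sketch, not the data behind Fig 3. It assumes a true U-shaped signal y = x², adds a deterministic zig-zag of ±0.1 as stand-in "noise" (so the result is reproducible), and fits three models to 12 points: a straight line, a quadratic, and a degree-11 polynomial that passes through every point. Each model is then scored on new points lying between the original ones, against the clean signal. The helper names are hypothetical.

```python
# Toy comparison of an underfitted, a well-fitted and an overfitted model on U-shaped data.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares_poly(xs, ys, degree):
    """Least-squares polynomial fit of the given degree (via the normal equations)."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    coeffs = solve(ata, aty)
    return lambda t: sum(c * t ** i for i, c in enumerate(coeffs))


def interpolating_poly(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 that passes through every point."""
    def p(t):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (t - xj) / (xi - xj)
            total += term
        return total
    return p


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def signal(x):
    return x * x  # assumed "true" U-shaped relationship


n = 12
train_x = [-1 + 2 * i / (n - 1) for i in range(n)]
# Assumption: a deterministic +/-0.1 zig-zag stands in for random noise.
train_y = [signal(x) + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(train_x)]
# "Future" points lie between the training points; score against the clean signal.
test_x = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
test_y = [signal(x) for x in test_x]

models = {
    "underfit (straight line)": least_squares_poly(train_x, train_y, 1),
    "good fit (quadratic)": least_squares_poly(train_x, train_y, 2),
    "overfit (degree 11)": interpolating_poly(train_x, train_y),
}
results = {name: (mse(m, train_x, train_y), mse(m, test_x, test_y)) for name, m in models.items()}
for name, (train_err, test_err) in results.items():
    print(f"{name:26s} train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

The degree-11 polynomial has zero training error but swings wildly between the points near the edges, so its test error ends up far larger than the quadratic's. The straight line does poorly on both. Exact numbers depend entirely on the assumed data.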

Demonstrating Curve Fitting

Enough theory! Let’s see some action! Ok sir, let’s curve fit some stuff on purpose.

In this exercise, we will curve fit Belinda, a basic trading robot we use at AlgoTrading101.

Disclaimer: This is the WRONG way to conduct your optimisation[4]. Do not try this at home!

Step 1:

We run an optimisation for Belinda by varying 3 variables: sma_short, sma_long and atr_period.
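An optimisation like this is essentially a grid search: every combination of parameter values is backtested on the same data. The post does not state the ranges used for Belinda, so the ranges below are hypothetical. The sketch only enumerates the combinations and does not run any backtests.

```python
from itertools import product

# Hypothetical ranges: the actual ranges used for Belinda are not given in the post.
sma_short_values = range(5, 51, 5)     # 5, 10, ..., 50
sma_long_values = range(50, 201, 10)   # 50, 60, ..., 200
atr_period_values = range(10, 31, 5)   # 10, 15, ..., 30

# Every (sma_short, sma_long, atr_period) triple is one backtest on the same data.
grid = [
    (short, long_, atr)
    for short, long_, atr in product(sma_short_values, sma_long_values, atr_period_values)
    if short < long_  # the "short" average must be shorter than the "long" one
]
print(len(grid), "parameter combinations to backtest")
```

With these ranges there are 795 combinations. The more combinations we try on one slice of history, the better the odds that the top scorer is simply the one that happens to fit that slice's noise best.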

Step 2:

We run the optimisation from 1st April 2014 to 1st January 2015.

Step 3:

We find the optimised parameter values, i.e. the parameter values that produce the best objective function value.

Step 4:

Using the optimised parameter values, we run a backtest[5] to see the performance and equity curve in detail. We use the same backtest dates as before: 1st April 2014 to 1st January 2015. We should expect to see a profit of $3,549.18.

Step 5:

Now we test Belinda with the optimised parameter values using data from the future. As mentioned, the future rarely reflects the past perfectly. Thus, we do not expect this backtest to be profitable.

Let’s run the backtest over the future period: 1st January 2015 to 1st October 2015 (the 9 months immediately after the optimisation period).

What a disaster! (Unsurprisingly)
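The same effect can be reproduced without any proprietary robot. The minimal sketch below is not Belinda. It assumes a hypothetical long/flat SMA crossover (no ATR component), simulates a random-walk price series, grid-searches the first 300 prices, picks the parameters with the best in-sample profit, and then trades them on the remaining prices. Because a random walk contains no exploitable signal, whatever edge the optimiser finds in-sample is fitted noise. Across many seeds the out-of-sample profit should average out near zero, though any single seed can land above or below it.

```python
import random


def sma(prices, period, t):
    """Simple moving average of the `period` prices ending at index t."""
    return sum(prices[t - period + 1 : t + 1]) / period


def backtest(prices, short, long_, first, last):
    """Hypothetical long/flat SMA crossover (NOT Belinda's actual rules).

    Holds one unit over the move prices[t] -> prices[t + 1] whenever
    SMA(short) > SMA(long) at t, for first <= t < last. Returns total profit.
    """
    profit = 0.0
    for t in range(max(first, long_ - 1), last):
        if sma(prices, short, t) > sma(prices, long_, t):
            profit += prices[t + 1] - prices[t]
    return profit


# Simulated random walk: pure noise, so there is no real signal to find.
rng = random.Random(2015)
prices = [100.0]
for _ in range(499):
    prices.append(prices[-1] + rng.gauss(0, 1))
split = 300  # first 300 prices = in-sample, the rest = "the future"

# Steps 1-4: grid search and pick the best result on in-sample data only.
grid = [(s, lg) for s in range(2, 21, 2) for lg in range(10, 101, 10) if s < lg]
in_sample = {p: backtest(prices, p[0], p[1], 0, split - 1) for p in grid}
best = max(in_sample, key=in_sample.get)

# Step 5: trade the "optimised" parameters on data the optimiser never saw.
out_of_sample = backtest(prices, best[0], best[1], split - 1, len(prices) - 1)

print("best (sma_short, sma_long):", best)
print(f"in-sample profit:     {in_sample[best]:.2f}")
print(f"out-of-sample profit: {out_of_sample:.2f}")
```

Changing the seed changes the numbers, but the pattern to look for is the gap between the flattering in-sample figure and the out-of-sample one.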

Ok, now that we know the negative effects of curve fitting, the next question is “How do we minimise curve fitting?” At AlgoTrading101 we have designed 10 methods to reduce curve fitting. Unfortunately, that is paid content, so if you are keen, go sign up!

Before you go…

We need to show you something called performance manipulation. Occasionally, you may see some people selling trading robots over the web. They claim that their robot can make 100% returns overnight. And they manage to produce some equity charts and performance tables that do indeed indicate 100% returns in a night.

So, is their robot legitimate? Well, I have never bought such robots, so I can’t refute their claim. However, I may have some insight into how they produce such incredible performance.

Step 1:

Run an optimisation and find the optimised parameter values. Run a backtest using these parameter values and the same dates as used in the optimisation. The difference now is that we increase our position size to insane levels. We use 20% risk per trade.

Short version: Repeat Step 4 in the above example but trade much bigger!
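Why does the same optimised robot suddenly look so spectacular? Here is a minimal sketch. It assumes "risk per trade" means the fraction of current equity lost when a trade hits its stop (fixed-fractional sizing). The trade results are made up and expressed in R multiples, where +2 means the trade won twice the amount risked and -1 means a full loss.

```python
def compound(start_equity, r_multiples, risk_fraction):
    """Fixed-fractional sizing: each trade risks `risk_fraction` of current equity,
    so a trade returning R multiples changes equity by a factor of 1 + risk_fraction * R."""
    equity = start_equity
    for r in r_multiples:
        equity *= 1 + risk_fraction * r
    return equity


# Hypothetical trade results in R multiples.
trades = [2, -1, 2, -1, 2]
losing_streak = [-1] * 5

for risk in (0.01, 0.20):
    print(f"risk {risk:.0%}: winning sequence -> {compound(10_000, trades, risk):,.2f}, "
          f"five straight losses -> {compound(10_000, losing_streak, risk):,.2f}")
```

The identical trades turn $10,000 into about $10,401 at 1% risk but about $17,562 at 20% risk. Meanwhile, five straight losses cost under 5% of the account at 1% risk and roughly two-thirds of it at 20% risk. Bigger size magnifies the backtest, not the strategy's edge.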

Look at that performance! Did we just turn $10,000 into $269,086 over 9 months?!

Oops, it is not $269,086. It is $2,544,741.19! I’m going to be a zillionaire!!! (And yes, zillionaire is a real word!)
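For perspective, here is a quick calculation on the figures above. It uses the $10,000 starting equity and the 9-month optimisation period. Expressing the result as a constant monthly compound rate is simply our way of making the number readable.

```python
start_equity = 10_000
end_equity = 2_544_741.19
months = 9  # 1st April 2014 to 1st January 2015

multiple = end_equity / start_equity
total_return = multiple - 1
monthly_rate = multiple ** (1 / months) - 1  # constant monthly rate giving the same end equity

print(f"{multiple:.1f}x the starting equity, {total_return:.0%} total return")
print(f"equivalent to {monthly_rate:.1%} compounded every month")
```

That works out to roughly 254 times the starting equity, or about 85% growth every single month for nine months. A result like that is a strong hint of curve fitting combined with oversized positions rather than a genuine edge.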


AlgoTrading101 is an Investopedia-featured algorithmic trading course that doesn’t suck. Learn more about us at AlgoTrading101.


[1] https://en.wikipedia.org/wiki/Overfitting

[2] When we use the word robust in this post, we are referring to Optimisation Robustness (i.e. Maximising objective function while minimising curve fitting.)

[3] http://www.sciencealert.com/a-frozen-tardigrade-has-been-brought-back-to-life-after-30-years

[4] http://www.investopedia.com/terms/o/optimization.asp

[5] http://www.investopedia.com/terms/b/backtesting.asp

 

Lucas Liew

This dude runs AlgoTrading101.com, an algorithmic trading academy with over 13,000 students.