Stauth common pitfalls_stock_market_modeling_pqtc_fall2018

Data Modeling the Stock Market Today - Common Pitfalls to Avoid
The lure of creating models to predict the stock market has drawn talent from fields beyond finance and economics, reaching into disciplines such as physics, computational chemistry, applied mathematics, electrical engineering and, perhaps most recently, statistics and what we now refer to as data science. The attraction is clear: the stock market (and the economy and internet at large) throws off massive and ever-increasing reams of data, from garden-variety time series, to complex structured data sets like quarterly financials, to unstructured data sets like conference call transcripts, news articles and, of course, tweets! While all this data holds promise, it also holds traps and blind alleys that can be deceptively tricky to avoid. In this session we’ll review some of the common (but not easy!) pitfalls to avoid in creating models for predicting stock returns: overfitting and exploding model complexity, non-stationary processes, time-travel illusions, underestimation of real-world costs, and as many more as we have time to cover.


  1. Modeling the Stock Market: Common pitfalls… and how to avoid them! Jess Stauth, Portfolio Management and Research / @jstauth
  2. Disclaimer: Quantopian provides this presentation to help people write trading algorithms - it is not intended to provide investment advice. More specifically, the material is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory or other services by Quantopian. In addition, the content neither constitutes investment advice nor offers any opinion with respect to the suitability of any security or any specific investment.
  3. Motivation
     • Building a beautiful backtest is easy! But… don’t expect anyone to pay you for it!
     • Building a model that predicts the future is HARD! But… many people will fight to pay you a lot for doing that!
  4. Ok, so it’s hard. I love hard work! What’s the catch?
     • It can be hard to know when you have what you want, aka a “future predictor”!
     • We “simulate” the future (usually using the past!) to validate our model.
     • But what if our simulation doesn’t match reality?
     • Or our data was flawed?
     • Or we just got lucky?
     • Or…
     [Workflow diagram: Idea → Data → Research / Build model → Simulate → Trade → $$$]
  5. Common pitfalls that turn into
     1. Overfitting
     2. Overtrading
     3. Non-stationary processes / regime changes
     4. Lookahead, aka “time travel illusion”
     5. Model complexity
  6. Let’s talk about overfitting
  7. Pitfall 1: Overfitting. Real-world example: the incredible shrinking portfolio (from a Quantopian author / model developer in diligence).
     • A robust, ‘information rich’ signal should show stable or improving performance (Sharpe ratio) as you increase the number of assets included.
     • Fundamental law of active management*: IR = IC * sqrt(N), where IR is the information ratio, IC is the information coefficient (the correlation between forecasts and realized returns) and N is the number of independent bets.
     • Finding that your signal degrades as you expand the number of assets scored is a red flag that you may have identified an unstable, noisy, or spurious effect.
     • How to avoid: Take care not to ‘over optimize’ your model on a small number of data points (in our use case, those are assets / stock tickers).
     *Grinold and Kahn, Active Portfolio Management.
  8. This phenomenon of overly concentrated portfolios turned out to be prevalent in the submissions to Quantopian’s daily contest. In a July ‘tearsheet feedback’ thread and webinar we highlighted this pitfall. We ran a second feedback session earlier this month and…
  9. Pitfall 2: Overtrading (three real examples)
     [Charts: Algo A, Algo B and Algo C under “low” and “high” cost assumptions]
     • Trading algorithms developed with the assumption of “low” (or no) cost of trading in the markets often show unrealistically good returns.
     • How to avoid: Use conservative cost estimates, and check how sensitive your stock market model is to the underlying assumption of what your costs will be. This can be the difference between profits and losses in the real world!
  10. Pitfall 3: Regime shift / non-stationarity
      • Many common time-series techniques assume data are stationary (constant mean and variance).
      • Imagine doing all your research on data from 2016/17 and evaluating a model that makes money shorting volatility…
      • How to avoid: Know that markets are always changing, and backtest over time ranges long enough to include regime changes that might impact your model.
      • See: Vol Regimes, Quantopian Blog
  11. Pitfall 4: Time travel illusion. What did you know and when did you know it?
      • Classic date alignment fail examples:
        • Drop the timestamp from close prices and build a daily technical factor... You’ll prove that knowing the 4pm price at 9am would be super valuable!
        • Model earnings surprises and assume your model knows actual reported earnings on quarter-end dates, when in real life you don’t get them until 45+ DAYS after…
      • How to avoid: Same principle as with modeling market impact: be conservative with your assumptions about data timeliness, and check your strategy’s robustness to lagged data over a range of lags.
  12. Pitfall 5: Model complexity
  13. Resources:
      • backtests-to-lie-by-tucker-balch-ph-d/
      • ^Tucker Balch recorded talk, QuantCon 2015
      • Lots of other great stuff on Quantopian’s YouTube channel
      • Quantopian Lectures
      Thank you! Questions? Do you have a unique idea and think you’ve avoided these pitfalls? Enter the Quantopian Contest and find out if you’re smarter than the average Quantopian user! / @jstauth
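The overfitting slide (slide 7) quotes the fundamental law of active management, IR = IC * sqrt(N). A minimal Python sketch of how that relation could serve as a sanity check is given here. The function names and the "largest universe versus smallest universe" comparison are illustrative assumptions, not the procedure used in the talk.

```python
import math
from typing import Dict


def expected_ir(ic: float, n_bets: int) -> float:
    """Information ratio implied by the fundamental law: IR = IC * sqrt(N)."""
    if n_bets < 0:
        raise ValueError("n_bets must be non-negative")
    return ic * math.sqrt(n_bets)


def degrades_with_breadth(sharpe_by_n: Dict[int, float]) -> bool:
    """Hypothetical red-flag check: is the Sharpe ratio measured on the
    largest universe lower than the one measured on the smallest universe?

    sharpe_by_n maps universe size (number of assets scored) to the
    Sharpe ratio observed in a backtest on that universe.
    """
    if len(sharpe_by_n) < 2:
        raise ValueError("need at least two universe sizes to compare")
    sizes = sorted(sharpe_by_n)
    return sharpe_by_n[sizes[-1]] < sharpe_by_n[sizes[0]]
```

Under the law, a signal with a fixed IC should gain information ratio as N grows, so a backtest where `degrades_with_breadth` returns True deserves a closer look before anyone trusts it.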
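The overtrading slide (slide 9) recommends looking at how sensitive a model is to its cost assumption. One simple way to do that, sketched here under loud assumptions, is to charge a flat cost per unit of turnover and recompute the Sharpe ratio over a grid of cost levels. The flat basis-point cost model, the 252-day annualization and all function names are hypothetical choices for illustration, not Quantopian's cost model.

```python
import math
import statistics
from typing import Dict, List, Sequence


def net_returns(gross: Sequence[float], turnover: Sequence[float],
                cost_bps: float) -> List[float]:
    """Subtract a flat trading cost (in basis points of traded value)
    from each period's gross return. turnover is the fraction of the
    portfolio traded in that period."""
    cost = cost_bps / 10_000.0
    return [g - t * cost for g, t in zip(gross, turnover)]


def annualized_sharpe(returns: Sequence[float],
                      periods_per_year: int = 252) -> float:
    """Mean over sample standard deviation, scaled to a yearly figure."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def cost_sensitivity(gross: Sequence[float], turnover: Sequence[float],
                     cost_grid_bps: Sequence[float]) -> Dict[float, float]:
    """Sharpe ratio after costs for each assumed cost level."""
    return {c: annualized_sharpe(net_returns(gross, turnover, c))
            for c in cost_grid_bps}
```

A strategy whose Sharpe ratio collapses between, say, 0 and 5 basis points is relying heavily on the "low cost" assumption, which is the situation the slide warns about.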
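The regime shift slide (slide 10) notes that many techniques assume constant mean and variance. A rough way to see whether a return series violates the constant-variance part is to compare volatility across non-overlapping windows. The sketch below is only a diagnostic under assumed names and an assumed window scheme. It is not a formal stationarity test.

```python
import statistics
from typing import List, Sequence


def volatility_by_window(returns: Sequence[float], window: int) -> List[float]:
    """Sample standard deviation of returns in consecutive,
    non-overlapping windows. A trailing partial window is ignored."""
    if window < 2:
        raise ValueError("window must be at least 2")
    return [statistics.stdev(returns[start:start + window])
            for start in range(0, len(returns) - window + 1, window)]


def vol_regime_ratio(returns: Sequence[float], window: int) -> float:
    """Ratio of the most volatile window to the least volatile one.
    A value far above 1 hints that the sample spans several regimes."""
    vols = volatility_by_window(returns, window)
    if not vols:
        raise ValueError("not enough data for a single window")
    lowest = min(vols)
    if lowest == 0:
        return float("inf")
    return max(vols) / lowest
```

If the ratio stays close to 1 over the whole research sample, that may simply mean the sample is too short to contain a regime change, which is the 2016/17 short-volatility trap described on the slide.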
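The time travel slide (slide 11) suggests checking a strategy's robustness to lagged data over a range of lags. A minimal sketch of that check follows, assuming the position held in each period is simply the signal value and that `returns[t]` is the return earned in period t. The names and the mean-PnL metric are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def mean_pnl_with_lag(signal: Sequence[float], returns: Sequence[float],
                      lag: int) -> float:
    """Average per-period PnL when the position in period t is the
    signal from period t - lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(signal) != len(returns):
        raise ValueError("signal and returns must have the same length")
    # Pair signal[t - lag] with returns[t] for every t >= lag.
    pairs = list(zip(signal[:len(signal) - lag], returns[lag:]))
    if not pairs:
        raise ValueError("lag is too large for the amount of data")
    return sum(s * r for s, r in pairs) / len(pairs)


def lag_robustness(signal: Sequence[float], returns: Sequence[float],
                   lags: Iterable[int]) -> Dict[int, float]:
    """Mean PnL for each lag, to show how quickly the edge decays."""
    return {lag: mean_pnl_with_lag(signal, returns, lag) for lag in lags}
```

A signal that looks excellent at lag 0 and worthless at lag 1 is a strong hint that it is peeking at information it would not have had in time, as in the 4pm-price-at-9am example.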