The field of quantitative finance is, as a rule, intensely competitive and maniacally secretive. The tendency toward secrecy is perhaps unsurprising given that the smallest of competitive advantages can translate to substantial profits. Indeed, over the past decade a growing list of prosecutions for alleged code theft or misuse has underscored how high the stakes can be for developers looking to leverage and contribute to open source projects. Notable exceptions to this secretive approach include Wes McKinney and Travis Oliphant, whose work on open source projects like pandas and NumPy has gained widespread adoption. In this talk we will review some of the costs and benefits of engaging with open source as a “two-way street” and frame the modern quant workflow as a mosaic of open source, third-party, and proprietary components.
Our story starts almost a decade ago, on July 3, 2009, at Newark International Airport. Elite programmer Serge Aleynikov steps off a flight from Chicago (where he has just accepted a new job at an HFT firm) and is greeted on the jetway by three men in dark suits.
- The men are FBI agents.
- Serge is handcuffed and escorted off the jetway and into a waiting town car.
- Only later does he learn what he is charged with: stealing computer code owned by his former employer, Goldman Sachs.
- Serge’s story became the inspiration for Michael Lewis’ Vanity Fair article on the topic and for the best-selling book Flash Boys.
Serge’s story is hardly the only scary tale of programmers being accused of viewing or stealing “secret code worth millions of dollars”.
A quick Google search turned up half a dozen hits. Not all of them have so clear a connection to open source; nonetheless, these stories highlight the high stakes of determining what counts as proprietary software in the quant trading world.
Being open (source) in the traditionally secretive field of quant finance.
A beginner’s guide to being open
(source) in the traditionally secretive
field of quantitative finance.
PyData NYC October 2018
Jess Stauth, PhD
Portfolio Management and Research, Quantopian
Quantopian provides this presentation to help people write trading algorithms -
it is not intended to provide investment advice. More specifically, the material is
provided for informational purposes only and does not constitute an offer to
sell, a solicitation to buy, or a recommendation or endorsement for any security
or strategy, nor does it constitute an offer to provide investment advisory or
other services by Quantopian. In addition, the content neither constitutes
investment advice nor offers any opinion with respect to the suitability of any
security or any specific investment. Quantopian makes no guarantees as to
accuracy or completeness of the views expressed in the website. The views are
subject to change, and may have become unreliable for various reasons,
including changes in market conditions or economic circumstances.
Quants and open source – let’s start with a scary story
Excerpted from: https://www.vanityfair.com/news/2013/09/michael-lewis-goldman-sachs-programmer
“…A one-way relationship with open source.”
What followed was a decade of legal battle…
• Aleynikov was convicted in 2010 of violating the Economic Espionage Act and
the Interstate Transportation of Stolen Property Act.
• In 2012, the U.S. Court of Appeals in New York reversed the conviction.
• Aleynikov was rearrested in August 2012 on state charges.
• In 2015, he was convicted of one count of unlawfully using secret scientific
material, but the judge threw out the verdict and acquitted him.
• The district attorney’s office appealed and the conviction was reinstated by an
intermediate appellate court.
• In May 2018 the conviction was upheld on appeal, and no additional jail time
was sought. All told, Aleynikov served one year in prison.
Can open-source be a two-way street, even in
traditionally secretive quant finance land?
Short answer: YES!
Longer answer: Yes, but…
• Effort and some expertise are required to separate source code that is truly
proprietary from that which is effectively commoditized.
• Some companies just don’t see the benefits as large enough to outweigh the costs.
• The first real example of this in the quant finance space was, of course, Wes McKinney’s
open-sourcing of pandas while at AQR, built on NumPy (Travis Oliphant’s work)!
Costs and risks – both real and perceived
• Engaging users on mailing lists
• Reviewing pull requests
• Making proper releases
• Navigating interdependencies between open and closed source
• May fail to get engagement if response times are slow
• IP leakage, if you misjudge the line between commodity and proprietary code
• Perception of IP leakage: non-technical stakeholders need to trust that what is
open sourced is not a ‘trade secret’
• Embarrassment: “What if everyone thinks my code is crap!?!”
Why it’s worth it to be open, and not just with
• Transparency builds trustful relationships
that extend far beyond the walls of your
• Robustness scales with use.
• Linus’ Law – Given enough eyes, all bugs are shallow.
• Talent is globally distributed.
• Some of our best hires have come from OSS
• Our asset management business model relies
on contributions from the community in the
form of (closed source) algorithms.
• Life comes at you fast
h/t Thomas Wiecki’s open source talk
Ok so it’s worth it, but there remain costs and
risks – so how does this all come together?
The modern quant finance workflow – a mosaic of
open and closed source components
• Ingest both publicly available data such as company financial statements and proprietary data sources
• Open source tools (e.g. Jupyter, pandas, NumPy, SciPy, Alphalens, qgrid, qdb)
• Product is proprietary signals/factors
• Open source event simulation tools (e.g. Zipline)
• May exploit vendor models/tools for cost modeling.
• Product is a proprietary alpha model (or trading algo)
• Vendor tools for risk management and portfolio construction (e.g. Axioma, Barra, Northfield), or
Quantopian’s free Risk Model
• Product is a *very* proprietary trade list and possibly an execution strategy
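The three stages listed above (research, alpha model, portfolio construction) can be sketched as a toy pipeline. This is only an illustration under stated assumptions, not Quantopian's implementation: the momentum signal, the equal-weight long/short rule, the made-up prices, and every name (`momentum_factor`, `target_weights`, `trade_list`) are hypothetical, and the standard library stands in for the open source and vendor tools named in the list.

```python
def momentum_factor(prices):
    """Research stage: a toy signal, the total return over each price history."""
    return {sym: series[-1] / series[0] - 1.0 for sym, series in prices.items()}


def target_weights(factor, n_long=1, n_short=1, gross=1.0):
    """Alpha stage: long the top-ranked names, short the bottom-ranked names."""
    ranked = sorted(factor, key=factor.get)  # ascending by factor value
    shorts, longs = ranked[:n_short], ranked[-n_long:]
    weights = {sym: 0.0 for sym in factor}
    for sym in longs:
        weights[sym] = 0.5 * gross / n_long
    for sym in shorts:
        weights[sym] = -0.5 * gross / n_short
    return weights


def trade_list(current, target):
    """Portfolio stage: trades are target weights minus current weights."""
    symbols = sorted(set(current) | set(target))
    trades = {s: target.get(s, 0.0) - current.get(s, 0.0) for s in symbols}
    # Drop names whose weight does not change.
    return {s: w for s, w in trades.items() if abs(w) > 1e-12}


# Hypothetical price histories (made-up numbers, for illustration only).
prices = {
    "AAA": [100.0, 110.0, 120.0],
    "BBB": [50.0, 50.0, 50.0],
    "CCC": [80.0, 70.0, 60.0],
}
factor = momentum_factor(prices)
weights = target_weights(factor)
trades = trade_list({"AAA": 0.5, "BBB": 0.25}, weights)
print(trades)
```

In this framing the function bodies are the proprietary part; the plumbing around them (data handling, simulation, risk tooling) is what can come from open source or third-party vendors.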
Developer / Researcher
Proprietary signals, factors, insights from data
3rd party vendors
• ITG Market Impact Model
• Executing Brokers
Proprietary logic combining individual factors into a predictive model or algo
Proprietary execution strategy if HFT, else rebalance
Same mosaic, fewer words…
Some (more) Open source projects we maintain
• qdb - A debugger for Python that allows users to debug code executing on a remote machine.
• empyrical - A Python library for computing common financial risk and performance metrics. Used by
zipline and pyfolio.
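As a rough illustration of what "common financial risk and performance metrics" means, here are two such metrics written with only the Python standard library. This is a sketch, not empyrical's code or API: the function names, the 252-period annualization default, and the sample returns are assumptions made for this example.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=252, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation, annualized."""
    excess = [r - risk_free for r in returns]
    vol = statistics.stdev(excess)
    if vol == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / vol * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the cumulative return curve (<= 0)."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


daily_returns = [0.01, -0.02, 0.015, -0.005, 0.02]  # made-up data
sharpe = annualized_sharpe(daily_returns)
drawdown = max_drawdown(daily_returns)
print(f"Sharpe: {sharpe:.2f}, max drawdown: {drawdown:.2%}")
```

Metrics like these are effectively commoditized, which is what makes them natural candidates for a shared open source library.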
We have committers on, or have made significant contributions to:
Smaller contributions to:
Acknowledgement to our amazing OSS
… and counting!
Contribute $ as well
Thank you and Happy Halloween!
Email: jess at quantopian dot com @jstauth
Zipline is a Python library for algorithmic backtesting and trading. Zipline is the backtesting
and live-trading engine powering Quantopian – a free, community-centered, hosted
platform for researching quantitative trading strategies. The best strategies are eligible for
performance-based royalty licenses.
Pyfolio is a Python library for performance and risk analysis of financial portfolios. It works
well with the Zipline open source backtesting library. At the core of pyfolio is a tearsheet
consisting of individual plots that provide a comprehensive overview of the performance of
a trading algorithm.
Alphalens is a Python library for performance analysis of predictive (alpha) stock factors.
Alphalens is compatible with the Zipline open source backtesting library
and with Pyfolio, which provides performance and risk analysis of financial portfolios.
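One common way to quantify how predictive a factor is, is the information coefficient: the Spearman rank correlation between factor values and the returns that follow. The sketch below computes that statistic with only the standard library. It is an illustration, not Alphalens code or its API; the helper names (`ranks`, `pearson`, `information_coefficient`) and the sample numbers are assumptions made for this example.

```python
import math


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; NaN if either series has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return cov / math.sqrt(sxx * syy)


def information_coefficient(factor_values, forward_returns):
    """Spearman rank correlation: Pearson correlation of the two rankings."""
    return pearson(ranks(factor_values), ranks(forward_returns))


# Made-up factor scores and next-period returns for four hypothetical stocks.
factor_values = [0.9, 0.1, 0.5, 0.3]
forward_returns = [0.02, -0.01, 0.04, 0.00]
ic = information_coefficient(factor_values, forward_returns)
print(round(ic, 3))
```

A value near +1 means the factor ranked stocks in nearly the same order as their subsequent returns; a value near 0 means the ranking carried little information.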
qgrid - An interactive grid for sorting, filtering, and editing DataFrames in Jupyter