The Alpha Scientist

Discovering alpha in the stock market using data science

Reddit for Fun and Profit [part 2]

The "Roaring Kitty" episode of January/February 2021 may have been the most remarkable financial story of the year. It demonstrated the amazing capacity of social media to move markets. This post is part 2 in a series exploring how to scrape Reddit data to increase returns and/or avoid volatility in your portfolio.

Reddit for Fun and Profit [part 1]

The "Roaring Kitty" episode of January/February 2021 may have been the most remarkable financial story of the year. It demonstrated the amazing capacity of social media to move markets. This series of posts demonstrates the basics of how anyone can programmatically extract posts from forums like r/wallstreetbets to identify stocks that may be driven more by internet mobs than by Wall Street tycoons.

COVID By The Numbers

During the past 18 months of the COVID pandemic, we’ve all become armchair epidemiologists. Those with quantitative minds and Python capabilities may want to “crunch the numbers” in a way that goes beyond widely-used web dashboards. This post walks through how to download, prepare, and analyze a rich historical dataset using the COVID Act Now API.

Contribute to The Alpha Scientist blog!

The Alpha Scientist is seeking submissions from bloggers interested in contributing to a conversation about the use of data science for investment research. If you are interested, please contact thealphascientist@gmail.com.

Stock Prediction with ML: Ensemble Modeling

This final post in the six-part tutorial series presents an ensemble modeling technique called "stacked generalization". This technique focuses on improving out-of-sample generalization of models, making it extremely well suited to messy, non-stationary, regime-shifting markets. Like the prior five parts of the series, this post presents a practical framework for applying ML to real-world financial time series prediction problems.

Towards Better Keras Modeling

If deep learning is part art and part science, one of the more "artistic" aspects must be model topology design. Deep learning books and courses are generally vague on the subject. Practitioners are often forced to resort to tedious trial and error. In this post, I'll walk through talos, a package which aims to automate the process of hyperparameter optimization for keras models.

Swimming Against the Current

October 2018 has started with a frenzy of activity in the markets. This post explores which way the funds have been flowing in October so far. It is a follow-up to a prior post which found that fund flows can provide a contra-indicator of future market direction.

Listening to the Short Sellers

Short sellers are often thought to be among the most informed market participants. Can a systematic, quantitative trader find a meaningful edge by following (or fading) the actions of short sellers? This article crunches the numbers for 2000 stocks across a full year of daily data to look for signal in short-selling activity.

Stock Prediction with ML: Model Evaluation

This post is going to delve into the mechanics of walk-forward modeling, which is, in my view, the most robust way to train and apply machine learning models in inherently sequential domains like finance. The overriding objective of the methods described here is to overcome the issues inherent in traditional cross-validation approaches.

Stock Prediction with ML: Walk-forward Modeling

This fourth post in a series on transforming data into alpha provides the mechanics of walk-forward modeling, which is often regarded as the most robust way to train and apply machine learning models in inherently sequential domains like finance.
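As a rough illustration of the walk-forward idea (not the code from the post itself), the sketch below generates expanding-window train/test index splits in which each test block comes strictly after the data used for training. The function name `walk_forward_splits` and its parameters `min_train` and `test_size` are hypothetical, chosen only for this example.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, min_train: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Hypothetical helper for illustration: every test block starts where
    the training window ends, so the model never sees future observations.
    """
    if min_train <= 0 or test_size <= 0:
        raise ValueError("min_train and test_size must be positive")
    start = min_train
    while start < n_samples:
        stop = min(start + test_size, n_samples)
        # Train on everything before `start`, test on the next block.
        yield list(range(0, start)), list(range(start, stop))
        start = stop


# Example: 10 time-ordered observations, first 4 reserved for initial training.
splits = list(walk_forward_splits(n_samples=10, min_train=4, test_size=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

In practice the indices would refer to time-ordered rows (for example, trading days), and a model would be refit on each training window before predicting the block that follows it.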

Stock Prediction with ML: Feature Selection

This is the third post in my tutorial series on applying machine learning to stock prediction. This post is going to delve into the mechanics of feature selection - a critical step towards improving model robustness. This will cover a systematic approach for choosing between the many variations of features you've created during the feature engineering stage.

Stock Prediction with ML: Feature Engineering

This post is going to delve into the mechanics of feature engineering for the sorts of time series data that you may use as part of a stock price prediction modeling system. I'll cover the basic concept, then offer some useful Python code "recipes" for transforming your raw source data into features which can be fed directly into an ML algorithm or ML pipeline.

Stock Prediction with ML: Data Management

This first post of a tutorial on machine learning in finance will present a framework for organizing and working with data. Perhaps not the most electrifying of topics, but it's the foundation for later modeling tutorials. It's also critically important. I've heard it said that 90% of time in real-world quant finance is spent on data rather than models.

Our Own Worst Enemy

We've all had the feeling - buying or selling at the worst possible time. Rest assured, you're not alone. The average investor significantly underperforms the market indices because the market consistently lures them into buying high and selling low. This post delves into the data and finds that typical ETF investors underperform the funds they use by 1 to 4% per year.

Sell in May and Go Away?

There's an old Wall Street axiom which advises to "sell in May and go away". Has this advice ceased to work in modern markets? Are there other seasonal patterns in the broader market? This analysis evaluates seasonal patterns on the S&P 500 from 1993 to present.

We Are All FX Traders Now

Investors in the US have the luxury of ignoring exchange rates when trading in markets quoted in US dollars. However, this can create a blind spot to the giant impact that exchange rates have on prices and returns.