Part III – Scikit Classification Algorithms


  1. Introduction and Discussion of the Problem
  2. Feature Generation
  3. Classification Algorithms
  4. Feature/Model Selection
  5. Results on Test Set
  6. Trading Algorithm and Portfolio Performance

So finally, as a result of last post, we have a dataframe to play with.

Before diving into model and feature selection I would like to make a little overview of the Classification Algorithms I tested. In the next post I’ll show the techniques and the decision process followed to select the best model and the best features. Anyway to keep it simple I’ll present actual results only for the best model, so it is worth to introduce the other ones I implemented.

Classification ML

The following is a pretty awesome algorithm cheat-sheet provided as part of the Scikit-Learn Documentation. I’ll cover the Classification branch of the tree, going through the code needed to have the selected algorithms running.


First of all we need to prepare our data for the proper Machine Learning stuff. So let’s take care of that with prepareDataForClassification(). This function takes care of

  1. turning S&P 500 daily returns from a float to a string column (Up, Down)
  2. encoding it as a binary integer 1,0 (the only form accepted by Scikit-Learn Classifiers).
  3. selecting the features used for prediction (basically all the columns except for the first one – float returns – and the last one – string returns).
  4. splitting the whole dataframe into train and test set (based on a date passed as argument) and returns for each of them predictors and actual prediction.

performClassification() is in charge of calling the selected algorithm and returning the classification result

Following are the code snippets implementing the algorithms

Random Forest

K- Nearest Neighbors

Support Vector Machines

Adaptive Boosting

Gradient Tree Boosting

Quadratic Discriminant Analysis

Well, time to turn to some Model and Feature Selection!

by Francesco Pochetti