Tag Archives: Python

Recommendation Engines

All posts in the series:

  1. Linear Regression
  2. Logistic Regression
  3. Neural Networks
  4. The Bias vs. Variance Tradeoff
  5. Support Vector Machines
  6. K-means Clustering
  7. Dimensionality Reduction and Recommender Systems
  8. Principal Component Analysis
  9. Recommendation Engines

Here is my Pythonic playground for Recommendation Engines.
The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning course on Coursera.
I had some fun translating everything into Python!
Find the full code here on Github and the nbviewer version here.

by Francesco Pochetti

K-means Clustering

All posts in the series:

  1. Linear Regression
  2. Logistic Regression
  3. Neural Networks
  4. The Bias vs. Variance Tradeoff
  5. Support Vector Machines
  6. K-means Clustering
  7. Dimensionality Reduction and Recommender Systems
  8. Principal Component Analysis
  9. Recommendation Engines

Here is my Pythonic playground for K-means Clustering.
The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning course on Coursera.
I had some fun translating everything into Python!
Find the full code here on Github and the nbviewer version here.

by Francesco Pochetti

Support Vector Machines

All posts in the series:

  1. Linear Regression
  2. Logistic Regression
  3. Neural Networks
  4. The Bias vs. Variance Tradeoff
  5. Support Vector Machines
  6. K-means Clustering
  7. Dimensionality Reduction and Recommender Systems
  8. Principal Component Analysis
  9. Recommendation Engines

Here is my Pythonic playground for Support Vector Machines.
The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning course on Coursera.
I had some fun translating everything into Python!
Find the full code here on Github and the nbviewer version here.

by Francesco Pochetti

The Bias vs. Variance Tradeoff

All posts in the series:

  1. Linear Regression
  2. Logistic Regression
  3. Neural Networks
  4. The Bias vs. Variance Tradeoff
  5. Support Vector Machines
  6. K-means Clustering
  7. Dimensionality Reduction and Recommender Systems
  8. Principal Component Analysis
  9. Recommendation Engines

Here is my Pythonic playground for Bias vs. Variance in Machine Learning.
The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning course on Coursera.
I had some fun translating everything into Python!
Find the full code here on Github and the nbviewer version here.

by Francesco Pochetti

Pythonic Logistic Regression

All posts in the series:

  1. Linear Regression
  2. Logistic Regression
  3. Neural Networks
  4. The Bias vs. Variance Tradeoff
  5. Support Vector Machines
  6. K-means Clustering
  7. Dimensionality Reduction and Recommender Systems
  8. Principal Component Analysis
  9. Recommendation Engines

Here is my implementation of Logistic Regression in NumPy.
The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning course on Coursera.
I had some fun translating everything into Python!
Find the full code here on Github and the nbviewer version here.

by Francesco Pochetti

Pythonic Linear Regression

All posts in the series:

  1. Linear Regression
  2. Logistic Regression
  3. Neural Networks
  4. The Bias vs. Variance Tradeoff
  5. Support Vector Machines
  6. K-means Clustering
  7. Dimensionality Reduction and Recommender Systems
  8. Principal Component Analysis
  9. Recommendation Engines

Here is my implementation of Linear Regression in NumPy.
The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning course on Coursera.
I had some fun translating everything into Python!
Find the full code here on Github and the nbviewer version here.

by Francesco Pochetti

Predict physical and chemical properties of soil using spectral measurements

Check out on NBViewer the work I’ve done with Pandas, Scikit-Learn and Matplotlib, wrapped up in an IPython notebook, about predicting physical and chemical properties of African soil using spectral measurements on Kaggle.

The code and the files are also available on Github.

Here is the challenge: “Advances in rapid, low cost analysis of soil samples using infrared spectroscopy, georeferencing of soil samples, and greater availability of earth remote sensing data provide new opportunities for predicting soil functional properties at unsampled locations. Soil functional properties are those properties related to a soil’s capacity to support essential ecosystem services such as primary productivity, nutrient and water retention, and resistance to soil erosion. Digital mapping of soil functional properties, especially in data sparse regions such as Africa, is important for planning sustainable agricultural intensification and natural resources management.

Diffuse reflectance infrared spectroscopy has shown potential in numerous studies to provide a highly repeatable, rapid and low cost measurement of many soil functional properties. The amount of light absorbed by a soil sample is measured, with minimal sample preparation, at hundreds of specific wavebands across a range of wavelengths to provide an infrared spectrum (Fig. 1). The measurement can be typically performed in about 30 seconds, in contrast to conventional reference tests, which are slow and expensive and use chemicals.

Conventional reference soil tests are calibrated to the infrared spectra on a subset of samples selected to span the diversity in soils in a given target geographical area. The calibration models are then used to predict the soil test values for the whole sample set. The predicted soil test values from georeferenced soil samples can in turn be calibrated to remote sensing covariates, which are recorded for every pixel at a fixed spatial resolution in an area, and the calibration model is then used to predict the soil test values for each pixel. The result is a digital map of the soil properties.

This competition asks you to predict 5 target soil functional properties from diffuse reflectance infrared spectroscopy measurements.”

by Francesco Pochetti

PiPad – How to build a tablet with a Raspberry Pi

The Project

When I first came across the Raspberry Pi on the web, I immediately started thinking about a cool application for this amazing mini computer. There are a ton of very interesting projects you can dive into with the Pi, ranging from a relatively simple web server to pretty complicated home automation setups. The one which ultimately caught my full attention was without doubt the PiPad, whose name and idea I am borrowing from Michael K Castor. So, first of all, thank you very much Michael for pioneering this application and for sharing your fantastic experience on your blog! I took inspiration from his work, personalizing the pipeline and components. Adding the Pi Camera Module is a good example (thanks Amandine Esser for pushing me to always raise the bar!).

I also owe a huge thanks to Pierre Esser, who helped me out with the electronic part of the project. My electronics skills are unfortunately very limited (work in progress on that), and his help was absolutely fundamental in putting together an ON/OFF button that could power both the board and the screen at the same time.

Before diving into the technical details, I thought it would be worth sharing what the tablet looks like right now, just a couple of days after I finished putting everything together. Here is a demo video. It seems to work pretty well!

I think we are done for the intro, so let’s get started.

The basic idea is to do the following:

  1. Get a Raspberry Pi
  2. Get a touchscreen
  3. Power both board and screen with an external battery
  4. Connect to the Pi all the necessary devices (WiFi dongle, bluetooth, camera, audio output etc)
  5. Build a wooden enclosure with enough room for everything and with an easy-to-open, book-like structure, so that any broken or malfunctioning pieces can be replaced in the future.

The plan sounds a little oversimplified as stated above, but those are the main points.

What I needed

How I did it

I started with the electronics. I plugged the mouse, WiFi dongle, keyboard, Camera Module and SD Card (with NOOBS – I planned to install Raspbian) into the Raspberry Pi. Then I focused on the screen. Here the first issues started arising. As soon as I began checking the cables I realized I had made quite a big mistake. As explained here on the Chalk-Elec website, the screen can be powered either by an external power supply (5V/2A) or by USB. The second option was the one I was looking for, as I planned to plug the screen directly into the Pi and draw power from there. However, by default the LCD can only be run via an external power supply, which is not exactly what I had in mind. You don’t really want a tablet which needs to be constantly attached to a plug – not very portable, I would say. For USB power to work some soldering is needed, as a 0R resistor has to be detached from a specific position and moved to another dedicated place on the board. Not too complicated, but still a bit risky, as I was not even sure I had enough voltage to power the LCD via the Pi. I was, however, pretty sure that 5V was definitely enough to power the Raspberry or the screen alone. Hence I went for powering them in parallel, also in light of the future need for an ON/OFF button. The idea was the following:

  1. Cut a USB cable and solder the red wire (one of the two carrying power) to one of the external ON/OFF switch connectors. This USB cable links the battery to the switch, carrying 5V.
  2. Cut a second piece of wire and solder it to the central connector to get the power out of the switch. This piece of wire will work as a bridge between the ON/OFF button and the screen/Pi.
  3. Cut the external power cable provided with the LCD. The pin side of the cable will be plugged into the screen board, while the other end will be soldered to the wire from point 2, ensuring 5V to the LCD.
  4. Cut an external power cable of the kind used to recharge mobile phones. The micro-USB side will be plugged into the Raspberry while the other end will be soldered to the wire from point 2, ensuring 5V to the main board.

I followed “my instructions” and there you go: I had a fully functioning ON/OFF switch for my tablet. I connected everything as needed, flipped the switch, and both the Pi and the screen powered up.

I won’t spend too many words on the software side of the project. The first experience with Raspbian was pretty smooth. I needed to tweak the system a little to get the WiFi dongle working and the screen running at full size. I also had to calibrate the LCD to adjust to my touch. Nothing impossible, actually. Everything was pretty straightforward, and without too much effort I had a fully working touchscreen.

After making sure all the electronics were in place I started working on the enclosure. I wish I had a CNC machine to cut the plywood cleanly. Unfortunately this was not the case. I knew I had to sacrifice precision, but it was an acceptable trade-off in the absence of more accurate machines. Hence I began with the frame, following this strategy:

  1. I cut 8 pieces of wood from a long regular plywood stick. Those would make the external part of my case. I glued them together 4 pieces at a time to obtain 2 separate frames.
  2. I connected the 2 frames with the hinges, making sure the folding was smooth enough to ensure a comfortable closing/opening of the tablet.
  3. The time for a first check had arrived. I needed to put all the electronics in place to achieve two main results: optimize as much as possible the limited room available under the screen and, consequently, decide where to carve the frame to expose the key components (SD Card, USB, battery recharge, audio). As soon as I figured out the exact position of all the pieces, I decided where to cut the enclosure and went on with the carving. Specifically, I created holes for the battery charger (bottom frame), ON/OFF switch (both frames), audio jack (bottom frame), USB exit (bottom frame), SD Card (bottom frame) and neodymium magnets (both frames). For all this woodwork I used nothing more than a drill, an X-Acto knife and a saw, cleaning everything up with a rasp and sandpaper.
  4. I had my frame almost ready. Now, given that the top of it would be covered by the screen, I still needed a back. So I took another piece of thin plywood and cut a rectangular slice just for this purpose. I glued it to the bottom frame and, after making sure everything had dried correctly, I drilled the last 5 holes: 1 for the Pi Camera Module (which would then be used as a back camera), the other 4 for the status lights of the external battery (to be able to check if and when to recharge it).
  5. Time for some varnishing. The enclosure was ready, so I moved on to varnishing the whole thing (a couple of coats were enough).
  6. Then I proceeded with putting all the electronics in place. I screwed in the Raspberry and the Camera. I connected all the cables to the board and glued the relevant pieces into their respective holes (magnets included). I laid down the battery and fixed it with extra-strong double-sided tape strips. The same strips were quite useful for fixing the screen to the top frame and finally closing the enclosure.
  7. And now the moment of truth. I switched it on and... the screen lit up and the PiPad booted! Fantastic! It was (and it still is) working!

Next Steps

I am pretty happy with the result, I must admit. For the moment I don’t have any specific plans to upgrade the tablet with new hardware. I still need to focus on the software side of the project and solve a couple of annoying issues: first of all the sound, which is not working, and then wrapping the Camera Module into a more user-friendly interface rather than having to go to the command line every time. The touchscreen works smoothly (at least with Raspbian) and the virtual keyboard I have installed is not too bad either.

I also have to mention that a couple of months ago saw the official release of the 7″ Raspberry Pi Touchscreen, which will definitely be a game changer and will probably soon make DIY tablet solutions like mine obsolete. This is of course very cool, as the community is always working hard to continuously raise the bar.

There is still work to be done, but the first results are pretty awesome! I’ll keep you posted!

by Francesco Pochetti

Stock Trading Algorithm on top of Market Event Study

This post is the result of the first six weeks of class from the Computational Investing course I’m currently following on Coursera. The course is an introduction to Portfolio Management and Optimization in Python and lays the foundations for the second part of the track, which will deal with Machine Learning for Trading. Let’s move quickly to the core of the business.

The question I want to answer is the following:

  • Is it possible to exploit event studies in a trading strategy?

First of all we should clarify what an event study is. As Wikipedia states, an event study is a statistical method to assess the impact of an event on the value of a firm. This definition is very broad and can easily encompass facts directly concerning the company (e.g. the private life of the CEO, mergers with other firms, confidential news from insiders) or anomalous fluctuations in the price of the stock. I naively (and maybe incorrectly) categorized events regarding a company into these two types, news related and market related, though there should be no real difference, as they are generally tightly correlated. In any case, since it is not easy to access and parse news feeds in real time, we will focus on market-related events, meaning that in the rest of the post an event must be understood as an anomalous behavior in the price of a stock whose consequences we could exploit to trade more efficiently.

Now that we have properly defined an event, we can go back to the beginning and think a little more about what studying an event really means. To understand it, let’s walk through a complete example and suppose that we have an event whenever the closing price of a stock at the end of day i is less than $10 while at the end of day i-1 it was more than $10. Thus we are examining a significant drop in the price of the stock. Given this definition, the question is: what statistically happens to the prices of stocks experiencing this kind of fluctuation? Is there a trend that could somehow be exploited? The reasoning behind these questions is that if we knew in advance that a stock followed a specific pattern as a consequence of some event, we could adjust our trading strategy accordingly. If statistics suggests that the price is bound to increase, maybe it is a good idea to go long on the shares, whereas in the opposite case the best decision is to short.
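To make the event definition concrete, here is a minimal sketch in plain Python. The prices below are made up purely for illustration (the real course data comes from QSTK, not shown here):

```python
# Hypothetical closing prices for a single stock (illustrative only).
closes = [11.2, 10.8, 10.4, 9.7, 9.9, 10.1, 9.5]

def find_drop_events(prices, threshold=10.0):
    """Return each index i (day 0 of an event) where the close drops
    below `threshold` on day i after being at or above it on day i-1."""
    return [i for i in range(1, len(prices))
            if prices[i] < threshold and prices[i - 1] >= threshold]

print(find_drop_events(closes))  # -> [3, 6]
```

Note that day 4 (9.9) is not an event: the price was already below $10 the day before, so there is no crossing.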

In order to run an event study we take advantage of the EventProfiler class in the QSTK library. This class allows us to define an event and then, given a time interval and a list of stocks, it works in the following way: it scrolls through firm after firm and whenever it finds an event it sets that day as day 0. Then it goes 20 days ahead of and 20 days before the event and saves the timeframe. After having analyzed all the stocks, it aligns the events on day 0, averages all the prices before and after, and scales the result by the market (SPY). The output is a chart which basically answers this question: what happens on average when the closing price of a stock at the end of day i is less than $10 while at the end of day i-1 it was more than $10? The test period was 1 January 2008 to 31 December 2009 (in the middle of the financial crisis), while the stocks chosen were the 500 contained in the S&P index in 2012. The graph is shown below and yields the following information: first, 461 such events were registered during the investigated time frame. Second, on the day of the event there is a drop of about 10% in the stock price with respect to the day before. Third, the price seems to recover after day zero, even though the confidence intervals of the daily increase are huge.

[Figure: average market-relative price around day 0 for the $10-drop event]

Now the idea is the following. If the observed behavior holds, what we can do is build a trading strategy consisting of buying on the day of the event and selling, let’s say, after 5 days (we don’t want to hold too long, despite the price increasing almost monotonically). Just to recap, here you find the whole pipeline from event definition to portfolio assessment.

[Figure: pipeline from event definition to portfolio assessment]

Now that we have a plan let’s dive into the code (you can find all the code on Github).

First of all I’ll introduce one after the other the two main functions.

find_events(ls_symbols, d_data, shares=100): given the list of the stocks in the portfolio, their historical prices and the number of shares to be traded, identifies events and issues a Buy order on the day of the event and a Sell order 5 trading days later. Eventually it returns a csv file to be passed to the market simulator. The first lines of the csv file are previewed below (year, month, day, stock, order, shares).
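The original function is built on QSTK data structures; as a simplified, self-contained sketch of the same event-to-orders logic (the symbol, dates and prices here are hypothetical), it might look like this:

```python
import datetime as dt

def find_events(symbols, closes, dates, shares=100, hold_days=5):
    """For each symbol, issue a Buy order on the day the close crosses
    below $10 and a Sell order `hold_days` trading days later (clamped
    to the last available day), then return the orders sorted by date."""
    orders = []
    for sym in symbols:
        prices = closes[sym]
        for i in range(1, len(prices)):
            if prices[i] < 10.0 and prices[i - 1] >= 10.0:
                sell_day = min(i + hold_days, len(prices) - 1)
                orders.append((dates[i], sym, "Buy", shares))
                orders.append((dates[sell_day], sym, "Sell", shares))
    return sorted(orders, key=lambda o: o[0])

dates = [dt.date(2008, 1, 2) + dt.timedelta(days=i) for i in range(10)]
closes = {"AAPL": [12.0, 11.0, 9.5, 9.8, 10.2, 10.5, 10.1, 9.9, 9.7, 9.6]}
for d, sym, side, qty in find_events(["AAPL"], closes, dates):
    print(d.year, d.month, d.day, sym, side, qty)
```

Writing these tuples out row by row gives exactly the (year, month, day, stock, order, shares) csv layout previewed above.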

[Image: first lines of the orders csv file]
marketsim(investment, orders_file, out_file): given the initial investment in dollars ($50,000 in our case), the csv file containing all the orders (the output of find_events()) and the file in which to save the results of the simulation, this function places the orders in chronological order and automatically updates the value of the portfolio. It returns a csv file with the portfolio value over time, a plot comparing the portfolio performance against the market benchmark, and prints to screen a summary of the main financial metrics used to evaluate the portfolio.
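Stripped of the csv I/O and plotting, the core bookkeeping of such a simulator can be sketched as follows (day-indexed orders and made-up prices stand in for the real order file and market data):

```python
def marketsim(investment, orders, closes):
    """Minimal portfolio simulator. `orders` is a chronological list of
    (day_index, symbol, side, shares); `closes` maps each symbol to its
    list of daily closing prices. Orders execute at the closing price.
    Returns the daily portfolio value: cash plus marked-to-market holdings."""
    n_days = len(next(iter(closes.values())))
    cash = float(investment)
    holdings = {sym: 0 for sym in closes}
    values, next_order = [], 0
    for day in range(n_days):
        # Execute every order scheduled for this day at the closing price.
        while next_order < len(orders) and orders[next_order][0] == day:
            _, sym, side, qty = orders[next_order]
            sign = 1 if side == "Buy" else -1
            holdings[sym] += sign * qty
            cash -= sign * qty * closes[sym][day]
            next_order += 1
        values.append(cash + sum(holdings[s] * closes[s][day] for s in closes))
    return values

closes = {"XYZ": [10.0, 9.0, 9.5, 10.5, 11.0]}
orders = [(1, "XYZ", "Buy", 100), (4, "XYZ", "Sell", 100)]
print(marketsim(50000, orders, closes))
# -> [50000.0, 50000.0, 50050.0, 50150.0, 50200.0]
```

The daily `values` series is what gets compared against the SPY benchmark and fed into the summary metrics.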

main(): this function calls the previous two after getting and cleaning all the relevant data.

This is the output, as promised:

[Figure: portfolio value over time vs. market benchmark]

Well, despite the huge crisis (-19% market return), our trading strategy earned us a remarkable +19%! This was just an example, but a very powerful one, showing the possibilities of event studies in finance.

by Francesco Pochetti

Community Detection in Social Networks

In this post I would like to share a very basic approach to Community Detection in Social Networks.

[Image: detected communities in an ego network]

I came across this fascinating topic while following the superb course on Mining Massive Datasets provided on Coursera by Stanford University. The specific field of finding overlapping clusters in graphs is introduced and treated in depth during the third week of classes (links to the PDF slides available for Part 1 and Part 2). I immediately found it extremely interesting and decided to play around with it myself. There were at least two very strong reasons to check the potential of this group of algorithms directly: first of all my complete lack of knowledge in the field, and secondly the data I found a couple of weeks ago on the Kaggle competition “Learning Social Circles in Networks“. The contest challenges participants to correctly infer Facebook users’ social circles. Such circles may be disjoint, overlap, or be hierarchically nested. To do this, machine learners have access to:

  1. A list of the user’s friends
  2. Anonymized Facebook profiles of each of those friends
  3. A network of connections between those friends (their “ego network”)

This is exactly what I needed for my learning purposes!

Overview

The approach I propose below is structured in two main parts:

  1. Build the graph of the ego networks, extracting nodes and edges from the Kaggle data. I implemented this step in Python, generating the graphs with Networkx and saving the adjacency matrix of each of them to a separate file.
  2. Community detection on top of the undirected graph. I performed this step in R, loading the graphs as adjacency matrices and then running a bunch of clustering algorithms available in R-igraph.

The use of both Python and R was not planned in the first place. I dived directly into the former, supported by Networkx, but as soon as I started digging into the community detection algorithms I realized that R-igraph had a wonderful ensemble of methods directly available. Note that igraph supports Python as well, but apparently the two libraries do not offer the same features, and the R one seems to be much fancier. I was a bit disappointed at the very beginning, but in the end I grabbed the opportunity to learn a new package.

Enough words, I’d say. Let’s go for some code.

Building Ego-Networks

The Kaggle data (available here) is organized in 110 .egonet files (corresponding to 110 anonymized Facebook users), each containing the network of a user’s friends. A practical example may help to clarify the data structure.

Let’s focus on the file 0.egonet, which contains all the information on user 0‘s network. Each row of the file lists the friends of the first user on the line, who is himself directly part of the ego’s network. Below, the first 4 lines are shown (for clarity, only the first 5 connections on each line are reported).

0 has 1 as friend who has 146-189-229… as friends as well.

0 has 2 as friend who has 146-191-229… as friends as well.

0 has 3 as friend who has 185-80-61… as friends as well.

0 has 4 as friend who has 72-61-187… as friends as well.

Well I guess you get the point…

Below I attach the Python code which accesses every egonet file and builds a list of nodes and edges to be fed to the Networkx constructor. Just to be clear, [0, 1, 2, 3, 4 …] are vertices of the graph while [(0-1), (1-146), (1-189), (1-229) …] are edges, or connections. After a graph has been constructed, its adjacency matrix is computed and saved to a csv file.
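The full notebook is on Github; as an illustrative stand-in (the exact .egonet format is inferred from the example above, so the function name and details are assumptions), the parsing step might look like this:

```python
def parse_egonet(lines, ego="0"):
    """Turn egonet-style lines ("friend: friend friend ...") into node
    and edge lists for the ego network. The ego is connected to every
    friend that starts a line; each such friend is connected to the
    users that follow on the same line."""
    nodes, edges = {ego}, set()
    for line in lines:
        head, _, rest = line.partition(":")
        friend = head.strip()
        nodes.add(friend)
        edges.add(tuple(sorted((ego, friend), key=int)))
        for other in rest.split():
            nodes.add(other)
            edges.add(tuple(sorted((friend, other), key=int)))
    return (sorted(nodes, key=int),
            sorted(edges, key=lambda e: (int(e[0]), int(e[1]))))

# First two lines of a hypothetical 0.egonet, truncated for brevity.
nodes, edges = parse_egonet(["1: 146 189 229", "2: 146 191 229"])
print(len(nodes), len(edges))  # -> 7 8
```

Feeding `edges` to Networkx’s Graph constructor and exporting its adjacency matrix then produces one csv file per ego network.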

The result of the provided code is 110 CSV files containing the adjacency matrices of the ego-network graphs. Let’s move on to the real clustering part.

Detecting Communities

First of all, let’s plot a graph and see what it looks like before community detection. Below is the R code to load the data from the CSV file, build the network (we stick to 0.egonet) and draw it.

[Image: user 0’s ego network before clustering]

Time for some clustering.

R-igraph provides several powerful community detection algorithms. Each of them works in a different way, and I highly encourage you to have a look at this very informative post on Stack Overflow describing all of them in detail. I decided to go for the whole bunch of algorithms, as I wanted to compare their performance somehow, which I did with the help of modularity. This metric measures the strength of the division of a network into modules: networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules.

Modularity is basically the fraction of the edges that fall within the given groups minus the expected such fraction if edges were distributed at random. So the higher the better.
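As a toy illustration of that definition (pure Python, with a made-up graph rather than an egonet), modularity for a given partition can be computed directly:

```python
def modularity(edges, communities):
    """Newman modularity: Q = sum over communities c of
    e_c/m - (d_c / (2*m))**2, where e_c is the number of edges inside
    community c, d_c the total degree of c's nodes, and m the total
    number of edges in the graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for comm in communities:
        members = set(comm)
        e_c = sum(1 for u, v in edges if u in members and v in members)
        d_c = sum(degree[n] for n in members)
        q += e_c / m - (d_c / (2 * m)) ** 2
    return q

# Two triangles joined by one bridge edge: an obvious 2-community split.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(round(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 4))  # -> 0.3571
```

Grouping the whole graph into a single community gives Q = 0, while the two-triangle split scores well above it, matching the intuition that a good partition keeps edges inside groups.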

Here are the results on the user-0 network.

The spinglass.community algorithm (based on a statistical physics approach) is the best one, with a modularity of 0.4649. It turns out that for this particular problem of community detection in small ego social networks, the spinglass method beats the others in all 110 egonet graphs.

Below you can find a nice visualization of the detected clusters, again in R. By the way, the plot at the top of the post is exactly the same as the following one, visualized in a fancier way.

[Image: detected communities, colored by cluster]

by Francesco Pochetti