Are there any invasive plant species? (1/28/2018) - Here is the Kaggle overview of the competition I played with over the last week or so: “Tangles of kudzu overwhelm trees in Georgia while cane toads threaten habitats in over a dozen countries worldwide. These are just two invasive species of many which can have damaging effects on the environment, the economy, and …

Given a picture, would you be able to identify which camera took it? (1/22/2018) - [Link to Jupyter Notebook] “Finding footage of a crime caught on tape is an investigator’s dream. But even with crystal clear, damning evidence, one critical question always remains - is the footage real? Today, one way to help authenticate footage is to identify the camera that the image was taken with. Forgeries often require splicing together content …

Ship or iceberg, can you decide from space? (12/31/2017) - Statoil/C-CORE Iceberg Classifier Challenge. Quoting the Kaggle website: “Drifting icebergs present threats to navigation and activities in areas such as offshore of the East Coast of Canada. Currently, many institutions and companies use aerial reconnaissance and shore-based support to monitor environmental conditions and assess risks from icebergs. However, in remote areas with particularly harsh weather, …

Recommendation Engines (4/5/2017) - All posts in the series: Linear Regression, Logistic Regression, Neural Networks, The Bias v.s. Variance Tradeoff, Support Vector Machines, K-means Clustering, Dimensionality Reduction and Recommender Systems, Principal Component Analysis, Recommendation Engines. Here is my Pythonic playground about Recommendation Engines. The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning …

Principal Component Analysis (3/30/2017) - Here is my Pythonic playground about PCA. The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning course …
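
As a rough illustration of the PCA pipeline from that course (normalize the features, take the SVD of the covariance matrix, project onto the top K directions, then map back), here is a minimal NumPy sketch. It is not the code from the post; the function names are illustrative assumptions that loosely echo the course exercise.

```python
import numpy as np


def feature_normalize(X):
    """Scale every column to zero mean and unit (sample) standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    return (X - mu) / sigma, mu, sigma


def pca(X_norm):
    """Return the principal directions U (as columns) and the singular values S."""
    m = X_norm.shape[0]
    cov = X_norm.T @ X_norm / m  # covariance matrix; valid because the columns are centred
    U, S, _ = np.linalg.svd(cov)
    return U, S


def project_data(X_norm, U, K):
    """Coordinates of each sample along the top K principal directions."""
    return X_norm @ U[:, :K]


def recover_data(Z, U, K):
    """Approximate reconstruction of the normalized data from its K-dim projection."""
    return Z @ U[:, :K].T


# Usage: points lying on the line y = 2x compress to one dimension without loss.
x = np.linspace(-1.0, 1.0, 11)
X_norm, mu, sigma = feature_normalize(np.column_stack([x, 2 * x]))
U, S = pca(X_norm)
X_rec = recover_data(project_data(X_norm, U, 1), U, 1)
```

Because the two toy features are perfectly correlated, the second singular value is essentially zero and the one-dimensional reconstruction matches the normalized data.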

Bridging Recommender Systems and Dimensionality Reduction (3/26/2017) - At first sight, Recommender Systems (RS) and Dimensionality Reduction (DR) have pretty much nothing in common. They solve different problems in different domains of …

K-means Clustering (3/14/2017) - Here is my Pythonic playground about K-means Clustering. The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning …
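
For orientation, the two alternating K-means steps (assign every sample to its nearest centroid, then move each centroid to the mean of its samples) can be sketched in NumPy as below. This is a minimal sketch, not the post's code; the function names are hypothetical, and keeping an empty cluster's previous centroid is one assumed convention among several.

```python
import numpy as np


def find_closest_centroids(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every sample."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # shape (m, K)
    return d2.argmin(axis=1)


def compute_centroids(X, idx, old_centroids):
    """Move each centroid to the mean of the samples assigned to it."""
    centroids = old_centroids.copy()
    for k in range(len(centroids)):
        members = X[idx == k]
        if len(members) > 0:  # an empty cluster keeps its previous centroid
            centroids[k] = members.mean(axis=0)
    return centroids


def init_centroids(X, K, rng):
    """Random initialization: use K distinct training examples as starting centroids."""
    return X[rng.choice(len(X), size=K, replace=False)].astype(float)


def run_kmeans(X, initial_centroids, max_iters=10):
    """Alternate the assignment and update steps for a fixed number of iterations."""
    centroids = np.asarray(initial_centroids, dtype=float)
    for _ in range(max_iters):
        idx = find_closest_centroids(X, centroids)
        centroids = compute_centroids(X, idx, centroids)
    return centroids, find_closest_centroids(X, centroids)


# Usage: two well-separated blobs, starting from one point in each blob.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, idx = run_kmeans(X, X[[0, 3]])
# idx.tolist() == [0, 0, 0, 1, 1, 1]
```

With random starts from `init_centroids`, K-means can settle in a poor local optimum, which is why it is common to run several random restarts and keep the result with the lowest distortion.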

Support Vector Machines (3/5/2017) - Here is my Pythonic playground about Support Vector Machines. The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine …

The Bias v.s. Variance Tradeoff (2/25/2017) - Here is my Pythonic playground about Bias vs. Variance in Machine Learning. The code below was originally written in MATLAB for the programming assignments of …

Pythonic Neural Networks (2/20/2017) - Here is my implementation of Neural Networks in NumPy. The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning …

Pythonic Logistic Regression (1/31/2017) - Here is my implementation of Logistic Regression in NumPy. The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine Learning …
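
The core of a NumPy logistic regression is the sigmoid hypothesis plus the (optionally regularized) cross-entropy cost and its gradient. The sketch below shows those pieces under stated assumptions: the function names are hypothetical, and plain batch gradient descent stands in for whatever optimizer the post actually uses.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost_function(theta, X, y, lam=0.0):
    """Regularized logistic regression cost and gradient.

    X must already contain a leading column of ones; theta[0] (the bias) is not regularized.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12  # keeps log() finite if h saturates at 0 or 1
    cost = -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / m
    cost += lam / (2 * m) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += lam / m * theta[1:]
    return cost, grad


def fit(X, y, alpha=1.0, num_iters=2000, lam=0.0):
    """Plain batch gradient descent (a stand-in for an off-the-shelf optimizer)."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        _, grad = cost_function(theta, X, y, lam)
        theta -= alpha * grad
    return theta


def predict(theta, X):
    """Label 1 when the predicted probability is at least 0.5."""
    return (sigmoid(X @ theta) >= 0.5).astype(int)


# Usage: one feature, the class flips at x = 0.
x = np.linspace(-3.0, 3.0, 40)
X = np.column_stack([np.ones_like(x), x])
y = (x > 0).astype(float)
theta = fit(X, y, lam=0.1)
```

The small regularization term keeps `theta` bounded on this perfectly separable toy data; without it, gradient descent would keep growing the weights indefinitely.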

Pythonic Linear Regression (1/23/2017) - Here is my implementation of Linear Regression in NumPy. The code below was originally written in MATLAB for the programming assignments of Andrew Ng’s Machine …
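
A NumPy port of the course's linear regression exercise typically centers on the squared-error cost and batch gradient descent. The following is a minimal sketch of those two pieces, not the post's code. The function names are hypothetical, and the default learning rate and iteration count are arbitrary choices.

```python
import numpy as np


def add_intercept(X):
    """Prepend the column of ones that multiplies theta[0]."""
    return np.column_stack([np.ones(X.shape[0]), X])


def compute_cost(Xb, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((Xb @ theta - y) ** 2)."""
    residuals = Xb @ theta - y
    return residuals @ residuals / (2 * len(y))


def gradient_descent(Xb, y, alpha=0.01, num_iters=1500):
    """Batch gradient descent; returns theta and the cost after every iteration."""
    m = len(y)
    theta = np.zeros(Xb.shape[1])
    history = []
    for _ in range(num_iters):
        # simultaneous update of every parameter from the full-batch gradient
        theta = theta - (alpha / m) * (Xb.T @ (Xb @ theta - y))
        history.append(compute_cost(Xb, y, theta))
    return theta, history


# Usage: noiseless data generated from y = 1 + 2x.
x = np.linspace(0.0, 1.0, 50)
Xb = add_intercept(x)
y = 1 + 2 * x
theta, history = gradient_descent(Xb, y, alpha=0.5, num_iters=5000)
```

Tracking `history` is a cheap way to check that the learning rate is small enough: with a suitable `alpha` the cost decreases at every iteration.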

Airbnb & Machine Learning: Where will a new guest book their first travel experience? (2/2/2016) - “Instead of waking to overlooked “Do not disturb” signs, Airbnb travelers find themselves rising with the birds in a whimsical treehouse, having their morning coffee on the deck of a houseboat, or cooking a shared regional breakfast with their hosts. New users on Airbnb can book a place to stay in 34,000+ cities across 190+ countries. By accurately predicting where a …

Predict physical and chemical properties of soil using spectral measurements (12/26/2015) - Check out on NBViewer the work I’ve done with Pandas, Scikit-Learn and Matplotlib, wrapped up in IPython, on predicting the physical and chemical properties of African soil from spectral measurements on Kaggle. The code and the files are also available on Github. Here is the challenge: “Advances in rapid, low cost analysis of soil samples using infrared spectroscopy, …

PiPad – How to build a tablet with a Raspberry Pi (12/22/2015) - The Project. When I first came across the Raspberry Pi on the web, I immediately started thinking about a cool application for this amazing mini computer. There are a ton of very interesting projects you can dive into with the Pi, ranging from a relatively simple web server to …

How to build a Recipe Finder Web Application with Ruby on Rails (10/19/2015) - The purpose of this post is to walk you through the creation of a basic but fully functional Ruby on Rails Web Application. By the end of the tutorial we will have a live Recipe Finder App like the one I built myself, which you can find and explore here. Specifically, the process I will …

Stock Trading Algorithm on top of Market Event Study (10/26/2014) - This post is the result of the first six weeks of the Computational Investing course I’m currently taking on Coursera. The course is an introduction to Portfolio Management and Optimization in Python and lays the foundations for the second part of the track, which will deal with Machine Learning for Trading. Let’s move quickly to the …

Community Detection in Social Networks (10/19/2014) - In this post I would like to share a very basic approach to Community Detection in Social Networks. I came across this fascinating topic while following the superb course on Mining Massive Datasets offered on Coursera by Stanford University. The specific problem of finding overlapping clusters in graphs is introduced and treated in depth in the third …

Image Text Recognition in Python (10/14/2014) - In this post I’m going to summarize the work I’ve done on Text Recognition in Natural Scenes as part of my second portfolio project at Data Science Retreat. The importance of image processing has grown considerably in recent years. Especially with the growing smartphone market, people have started producing a huge …

Part VI – Trading Algorithm and Portfolio Performance (9/20/2014) - Index: Introduction and Discussion of the Problem, Feature Generation, Classification Algorithms, Feature/Model Selection, Results on Test Set, Trading Algorithm and Portfolio Performance. Now that we have a prediction, we can also develop a trading strategy and test it against the real markets. Trading Strategy. The idea is the following: I built a forecasting algorithm and …

Part V – Results on Test Set (9/20/2014) - We closed the previous post with the results of Cross Validation. In the end we decided that our best combination is the following: Algorithm: Random Forests (n_estimators = 100); Features: n = 9 / delta …

Part IV – Model/Feature Selection (9/20/2014) - In the last post I introduced the classification algorithms tested for the project and presented the function in charge of data preparation and splitting. We are now ready for …

Part III – Scikit Classification Algorithms (9/20/2014) - So finally, as a result of the last post, we have a dataframe to play with. Before diving into model and feature selection, I would like to give a brief overview of the Classification …

Part II – Feature Generation (9/20/2014) - In the last post I went through the project’s introduction and the data collection, together with a little bit of feature analysis. In this article I’ll deal with additional feature generation and …

Part I – Stock Market Prediction in Python Intro (9/20/2014) - This is the first of a series of posts summarizing the work I’ve done on Stock Market Prediction as part of my portfolio project at Data Science Retreat. The aim of this post is to give an overview of the whole project, walking through its foundations and core ideas. First of all I provide …

Pythonic Cross Validation on Time Series (9/16/2014) - Working with time series has always been a serious challenge. Because the data is naturally ordered, we cannot directly apply the common Machine Learning methods, which by default shuffle the entries and so lose the time information. While working on Stock Market Prediction I had to face this kind of challenge which, despite …
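
One common way to cross-validate without shuffling is an expanding-window ("walk-forward") split: every fold trains on a prefix of the series and tests on the block that immediately follows it. The sketch below assumes that scheme; the post's own approach may differ, and `walk_forward_splits` is a hypothetical helper. scikit-learn's `TimeSeriesSplit` implements a comparable expanding-window split if you prefer a library.

```python
def walk_forward_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs for ordered data, without shuffling.

    The series is cut into n_folds + 1 consecutive chunks. Fold k trains on
    chunks 0..k-1 and tests on chunk k, so every test sample lies strictly
    after every training sample (the model never "sees the future").
    """
    if n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    chunk = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * chunk
        # the last fold also takes any leftover samples at the end of the series
        test_end = n_samples if k == n_folds else (k + 1) * chunk
        yield list(range(train_end)), list(range(train_end, test_end))


# Usage: 10 ordered observations, 4 folds with a growing training window.
for train_idx, test_idx in walk_forward_splits(10, 4):
    print(train_idx, "->", test_idx)
# [0, 1] -> [2, 3]
# [0, 1, 2, 3] -> [4, 5]
# [0, 1, 2, 3, 4, 5] -> [6, 7]
# [0, 1, 2, 3, 4, 5, 6, 7] -> [8, 9]
```

The growing training window mimics how a trading model is actually used: it is refit on all past data and evaluated on the period that comes next.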

Financial Sentiment Analysis Part II – Sentiment Extraction (9/15/2014) - As promised, I’ll devote this second post to walking through the remaining part of the Financial Sentiment Analysis pipeline. Just to recap, the steps we wanted to clarify are the following: scrape the historical archives of a financial web blog to get the date, keywords and text of each post. Save all …

Financial Sentiment Analysis Part I – Web Scraping (8/31/2014) - It’s been a while since Mr Why’s last post! I apologize, but quite a lot has happened in the meantime. I quit my job in Italy and moved to Berlin to attend a three-month course in Data Analysis and Machine Learning. It has been an amazing experience, which started at the beginning of August and will end on the …

Is there a statistically significant correlation between religious faith and total family income in the US? (4/5/2014) - Introduction and Aim of the Study. The main aim of this study (which is available here as a PDF) is to investigate any possible relation between religion and income in the US over the last decade. More precisely, I decided to focus on Protestants, Catholics and those who claimed to belong to no religious …