Here is the Kaggle overview of the competition I have been playing with for the last week or so:
“Tangles of kudzu overwhelm trees in Georgia while cane toads threaten habitats in over a dozen countries worldwide. These are just two invasive species of many which can have damaging effects on the environment, the economy, and even human health. Despite widespread impact, efforts to track the location and spread of invasive species are so costly that they’re difficult to undertake at scale.
Currently, ecosystem and plant distribution monitoring depends on expert knowledge. Trained scientists visit designated areas and take note of the species inhabiting them. Using such a highly qualified workforce is expensive, time inefficient, and insufficient since humans cannot cover large areas when sampling.
Because scientists cannot sample a large quantity of areas, some machine learning algorithms are used in order to predict the presence or absence of invasive species in areas that have not been sampled. The accuracy of this approach is far from optimal, but still contributes to approaches to solving ecological problems.
In this playground competition, Kagglers are challenged to develop algorithms to more accurately identify whether images of forests and foliage contain invasive hydrangea or not. Techniques from computer vision alongside other current technologies like aerial imaging can make invasive species monitoring cheaper, faster, and more reliable.”
In my last post I promised I would dive into TensorFlow. Turns out I did not keep my word. I could not resist the temptation to have some more fun with Keras. It is just too easy to build models with it!
So, this time, it was the turn of figuring out whether an image contained invasive hydrangea or not. For those who have no clue what that is (I certainly did not), here is a nice picture of our new vegetal friend. Nice flowers, right?
Ok, let’s dive directly into the model and provide some details about the pipeline.
Deep Learning approach
In terms of the modeling pipeline, here is what I did.
Balancing positive and negative classes: The original dataset was split into Invasive/Not Invasive images with a ratio of 60%-40%. This is not a heavily unbalanced problem; still, I wanted to make sure the classes were equally distributed, so I artificially created new Not Invasive pictures by flipping/rotating the original ones.
Input image size: The original pictures were 1154px x 866px. This is too big to fit into the GPU memory of my Paperspace instance, considering that a batch would certainly contain more than one image. Kagglers tried several sizes, averaging the results across the various inputs. I decided to stick to 300px x 300px. This is probably not optimal, as I was bound to lose some relevant information from the originals, but it did the job, especially considering I wanted to prioritize fast prototyping over high accuracy.
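To make the balancing step concrete, here is a minimal sketch of oversampling the smaller class with flipped/rotated copies. It is not the code I used in the competition: the `oversample` helper, the toy image sizes and the choice of transforms are all assumptions made for illustration.

```python
import numpy as np

# Shape-preserving transforms, so rectangular images keep their H x W.
TRANSFORMS = [
    np.fliplr,                     # horizontal flip
    np.flipud,                     # vertical flip
    lambda img: np.rot90(img, 2),  # 180-degree rotation
]

def oversample(images, n_target, seed=0):
    """Grow a list of (H, W, C) images to n_target items.

    Hypothetical helper: the extra items are randomly transformed
    copies of randomly chosen originals.
    """
    rng = np.random.default_rng(seed)
    augmented = list(images)
    while len(augmented) < n_target:
        source = images[rng.integers(len(images))]
        transform = TRANSFORMS[rng.integers(len(TRANSFORMS))]
        augmented.append(transform(source))
    return augmented

# Toy stand-ins for the 60%-40% split: 6 Invasive vs 4 Not Invasive images.
rng = np.random.default_rng(42)
invasive = [rng.random((4, 6, 3)) for _ in range(6)]
not_invasive = [rng.random((4, 6, 3)) for _ in range(4)]

balanced_not_invasive = oversample(not_invasive, n_target=len(invasive))
```

Only flips and a 180-degree rotation are used here because they keep the image shape unchanged; 90-degree rotations would swap height and width on non-square pictures.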
Batch Size: 8 randomly shuffled pics per batch.
Cross Validation: I went for a standard 5-fold CV. Each model was trained and evaluated 5 times, and the test set predictions were then averaged across the 5 folds.
Deep Learning Models: I opted for fine-tuning ImageNet-pre-trained Inception V3 and ResNet50 architectures, with the latter giving slightly worse results than the former (both above 97% accuracy). Here are the summary charts.
Attention (Saliency) Maps: this is something I had been waiting to test for a while. The concept is simple but really powerful. I don’t want to reinvent the wheel, so, as I ended up using the (AWESOME) Keras Visualization Toolkit, let me “steal” a blurb from the official package documentation: “Suppose that all the training images of bird class contains a tree with leaves. How do we know whether the CNN is using bird-related pixels, as opposed to some other features such as the tree or leaves in the image? This actually happens more often than you think and you should be especially suspicious if you have a small training set.
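The fold-averaging logic can be sketched roughly as follows. This is only an illustration under stated assumptions: `fit` is a hypothetical stand-in for training a CNN (it just memorizes the positive rate of the training fold), and the toy arrays replace the real images.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs over a shuffled index range."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

def fit(x_train, y_train):
    """Hypothetical stand-in for model training.

    Returns a predict function that outputs the training-fold
    positive rate for every sample it is given.
    """
    rate = y_train.mean()
    return lambda x: np.full(len(x), rate)

def cross_validate(x, y, x_test, n_folds=5):
    """Train once per fold, score on the held-out part, average test predictions."""
    test_preds, val_scores = [], []
    for train_idx, val_idx in kfold_indices(len(x), n_folds):
        predict = fit(x[train_idx], y[train_idx])
        val_labels = predict(x[val_idx]) > 0.5
        val_scores.append(np.mean(val_labels == y[val_idx]))
        test_preds.append(predict(x_test))
    return np.mean(test_preds, axis=0), val_scores

# Toy data: 20 labelled samples (12 positive, 8 negative) and 3 test samples.
x = np.arange(20, dtype=float).reshape(20, 1)
y = np.array([1] * 12 + [0] * 8)
x_test = np.zeros((3, 1))

avg_test_pred, fold_scores = cross_validate(x, y, x_test)
```

Each sample lands in the validation part of exactly one fold, and the final test prediction is the plain mean of the five per-fold predictions.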
Saliency maps was first introduced in the paper: Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
The idea behind saliency is pretty simple in hindsight. We compute the gradient of output category with respect to input image.
This should tell us how the output value changes with respect to a small change in inputs. We can use these gradients to highlight input regions that cause the most change in the output. Intuitively this should highlight salient image regions that most contribute towards the output.”
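To see the "gradient of the output with respect to the input" idea in isolation, here is a tiny sketch with a toy model. The assumption is loud: this is a logistic score over pixels, not a CNN, and the gradient is written by hand, whereas in a real network a deep learning framework computes it through backpropagation.

```python
import numpy as np

def score(image, weights, bias=0.0):
    """Toy 'classifier': sigmoid of a weighted sum of pixels (not a CNN)."""
    z = np.sum(weights * image) + bias
    return 1.0 / (1.0 + np.exp(-z))

def saliency(image, weights, bias=0.0):
    """Absolute gradient of the score with respect to each input pixel."""
    s = score(image, weights, bias)
    grad = s * (1.0 - s) * weights  # chain rule: dsigmoid/dz * dz/dpixel
    return np.abs(grad)

# The toy model only "looks at" the top-left 2x2 patch of a 4x4 image.
weights = np.zeros((4, 4))
weights[:2, :2] = 1.0
image = np.full((4, 4), 0.5)

saliency_map = saliency(image, weights)
```

The resulting map is non-zero only in the top-left patch, which is exactly the region whose pixels can change the output of this toy model.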
This is cool indeed! It is interesting to check what Deep Nets focus their attention on when training. For this specific use case I actually ended up following the approach suggested in one of the package example notebooks. In a nutshell, as proposed in this paper, Grad-CAM may work better than Saliency because the layer it uses to compute gradients is not the final Dense one but the last Conv (or Pooling) one. This ensures the gradients keep more “spatial details”, which are inevitably lost in a Dense layer.
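For intuition, the core Grad-CAM combination step can be sketched in a few lines of NumPy. This is an illustration, not what the Keras Visualization Toolkit does internally: the `grad_cam` function and the toy arrays are hypothetical, and in practice the activations and gradients would come from the last Conv layer of the trained network.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine last-conv-layer feature maps into a coarse heatmap.

    activations: (H, W, K) feature maps of the chosen layer.
    gradients:   (H, W, K) gradient of the class score w.r.t. those maps.
    """
    # One weight per channel: the spatial average of its gradient.
    channel_weights = gradients.mean(axis=(0, 1))            # shape (K,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence.
    cam = np.maximum((activations * channel_weights).sum(axis=-1), 0.0)
    # Scale to [0, 1] for display, unless the map is all zeros.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: channel 0 fires at the centre, channel 1 at a corner.
activations = np.zeros((3, 3, 2))
activations[1, 1, 0] = 1.0
activations[0, 0, 1] = 1.0

# Assume channel 0 pushes the class score up and channel 1 pushes it down.
gradients = np.stack([np.ones((3, 3)), -np.ones((3, 3))], axis=-1)

heatmap = grad_cam(activations, gradients)
```

The heatmap has the spatial resolution of the Conv layer, so it is normally upsampled to the input size before being overlaid on the picture.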
Check out my NbViewer notebook for a nice visualization!
by Francesco Pochetti