
Detecting Pneumonia in chest radiographs with fast.ai


I recently started looking into Part 2 of the fast.ai Deep Learning MOOC. I went through the first two lessons, diving directly into object detection, and, without much surprise, I was pleased to confirm that the level of the material is top-notch. Jeremy Howard & team put together world-class lectures.

My approach (to date) towards this course has been to watch the videos, read through the notebooks and then try to apply the learnings to a brand new data set (generally from Kaggle), challenging myself NOT to use the fast.ai library at all. Basically, I dig a level of abstraction below the surface and use either plain PyTorch or convert the code to a completely different deep learning framework (MXNet, Keras). I found this strategy very rewarding, as it allowed me to explore as many tools as possible while keeping the focus on the core technical and theoretical take-home messages from the course.

This time I opted for the easy way, though. As the MOOC is moving into more advanced territory and I need more time to figure the entire code out, I decided to test myself using the Jupyter notebooks made available after the in-class lectures and re-run the necessary parts on a different data set. On top of that, I stuck to the (easier) approach presented in the first lesson, which consists of detecting a single object in an image, rather than going all-in with the state-of-the-art (and more complex) SSD technique introduced in the second video. This does not mean I am giving up on it. Not at all. I just want to take smaller steps in this more advanced section.

To recap, this is what I did:

  1. Watched first and second lesson of the fast.ai Deep Learning MOOC Part 2.
  2. Borrowed the data set from the RSNA Pneumonia Detection Challenge on Kaggle.
  3. Adapted the fast.ai code of the relevant Jupyter notebook to fit my data.

Here is a quick summary of how this exercise unfolded.

  • Code: the Jupyter notebook containing the entire implementation is available here on GitHub.
  • Data: the data set is composed of 25,684 unique chest radiographs (1024 x 1024, grayscale). 5,659 (22%) of these present lung opacities, suggesting pneumonia in the patient. The x-rays tagged as pneumonia can contain multiple opacities, hence several bounding-boxes (one per opacity) might be provided. To simplify my task (and follow the fast.ai-Part-2-MOOC-first-lesson approach) I extracted the biggest rectangle per image, reducing the standard multi-object-detection problem to a single-object-detection one (a small sketch of this step is included after this list). Here is a random sample of the provided radiographs.

  • Image pre-processing: to speed up my development cycle I randomly selected roughly half of the images (13,039) and saved them as PNG. I dropped the original DICOM format mainly because fast.ai cannot read it: instead of spending time hacking the library, I thought it would be more productive to adapt the data to it. The radiographs ship at (1024, 1024). Due to memory constraints and Resnet34-compatibility they are downsized to (512, 512) and artificially converted to RGB, i.e. the single original grayscale channel is triplicated. The pneumonia bounding-box gets scaled accordingly (a conversion sketch follows the list).
  • Data augmentation: fast.ai provides very sophisticated data augmentation pipelines for images. You might have guessed that augmenting a picture for object detection is not as easy as in the standard classification context. The additional (non-trivial) complexity comes from the bounding-boxes, which need to be adjusted according to the transformation the underlying image undergoes. The fast.ai library makes this whole process incredibly easy: it basically suffices to add the (pretty self-explanatory) tfm_y=TfmType.COORD argument to the data set transformations and you are all set (see the augmentation sketch after this list). The results on a sample image are below.

  • Batch size: I went for a batch size of 8. Such a low number was driven by my initial experiments, in which I had used the unscaled, original (1024, 1024) images: anything bigger than 8 would cause the GPU to run out of memory. I eventually had to downsize the x-rays anyway, so, technically, the OOM issue would not be as severe as at the beginning. I stuck to 8 though, mainly for training-speed reasons.
  • Architecture: Resnet34 topped by a couple of ad-hoc “regression-classification” layers (let’s call it the head). The output consists of 6 numbers per image: 4 x-y coordinates of the top-left/bottom-right corners of the bounding-box + 2 probabilities (not pneumonia/pneumonia). It is important to focus on the size of the input to the head. We need to define it manually and it is far from being a trivial guess, as it depends on where we cut the lower CNN. fast.ai’s behavior, when dealing with custom heads, is to truncate the pre-trained network at its last conv layer (BasicBlock-122 in the case of Resnet34). Therefore the input to the head needs to be equal to the output of this layer. How do we get that shape, considering that it depends on the input image size too? fast.ai provides a very handy way of doing this: it suffices to call summary on the learn object. This method runs a forward pass through the network, spitting out the input-output shapes at each layer. Pretty neat! A sketch of the head is included after this list. Below you can find
    • a screenshot from my Jupyter notebook showing the summary trick and how to identify the correct (flattened) input shape for the head
    • what the head topping Resnet34 looks like

  • Loss function: this is probably the real core of the entire exercise. How do I tell the optimizer it is going in the right direction while training? Specifically, I want the minimization to proceed on both the regression (bounding-box) and classification (not pneumonia/pneumonia) sides at the same time. PyTorch custom losses to the rescue here: we just sum up l1_loss and cross_entropy (appropriately scaled). Pretty neat (see the loss sketch after this list). Here is the result.

  • Training: this step is made trivial by the fast.ai library. The approach for a Computer Vision task is always the same: fine-tune a pre-trained Convolutional Neural Network (i.e. Transfer Learning). The steps, sketched in code after this list, are:
    1. Freeze the entire CNN and train just the model sitting on top.
    2. Reduce the learning rate.
    3. Unfreeze the last 2-3 convolutional layers and keep training.
    4. Reduce the learning rate.
    5. Unfreeze the whole CNN and keep training.
    6. NOTE: all the previous magic happens thanks to some additional fast.ai-specific tricks, such as
      1. adapting the learning rate (LR) during training (cosine annealing, cyclical LR, etc.)
      2. using different LRs for different “groups” of layers. The top ones get the highest, as their weights need to be optimized the most. The deeper we descend into the core convolutional layers, though, i.e. those trained on ImageNet, the more we decrease the LR, as those weights do not need to move much around the minimum.
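
Below are a few code sketches for the main steps above. First, the biggest-rectangle extraction: a minimal pandas sketch, assuming the labels ship in the challenge's stage_2_train_labels.csv layout (patientId, x, y, width, height, Target); the file name and the helper area column are my own.

```python
import pandas as pd

# RSNA labels: one row per opacity, hence several rows per pneumonia-positive patient
labels = pd.read_csv("stage_2_train_labels.csv")
labels["area"] = labels["width"] * labels["height"]

# Keep only the largest opacity per radiograph (single-object detection).
# Target == 0 rows have no box (NaN area) and simply survive as they are.
biggest = (labels.sort_values("area", ascending=False)
                 .drop_duplicates(subset="patientId")
                 .drop(columns="area"))
```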
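
Next, the DICOM-to-PNG conversion. A minimal sketch, assuming pydicom and Pillow are available; the function name is mine and the interpolation choice is arbitrary.

```python
import pydicom
from PIL import Image

def dicom_to_png(dcm_path, png_path, size=512):
    # read the original (1024, 1024) grayscale radiograph
    dcm = pydicom.dcmread(str(dcm_path))
    img = Image.fromarray(dcm.pixel_array)          # grayscale ("L") image
    img = img.convert("RGB")                        # triplicate the single channel
    img = img.resize((size, size), Image.BILINEAR)  # (1024, 1024) -> (512, 512)
    img.save(png_path)

# the bounding-box coordinates must be scaled by the same factor (512 / 1024 = 0.5)
```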
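
The augmentation and data set setup, roughly following the fast.ai v0.7 idiom from the course notebook. PATH, IMG_DIR and BB_CSV are placeholders for my file locations, and the specific transforms and their parameters are illustrative.

```python
from fastai.conv_learner import *   # fast.ai v0.7, as used in the 2018 course
from fastai.dataset import *

f_model = resnet34   # pre-trained backbone
sz = 512             # input size after pre-processing
bs = 8               # batch size (see above)

# every transform that moves pixels must move the bounding-box too,
# hence tfm_y=TfmType.COORD on both the augmentations and the base tfms
augs = [RandomFlip(tfm_y=TfmType.COORD),
        RandomRotate(3, p=0.5, tfm_y=TfmType.COORD),
        RandomLighting(0.05, 0.05, tfm_y=TfmType.COORD)]

tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO,
                       tfm_y=TfmType.COORD, aug_tfms=augs)
md = ImageClassifierData.from_csv(PATH, IMG_DIR, BB_CSV, bs=bs,
                                  tfms=tfms, continuous=True)
```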
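
The custom head, sketched in plain PyTorch. The flattened size (512 channels x 16 x 16 for a (512, 512) input to Resnet34) is exactly the number read off learn.summary(); the 256 hidden units and dropout values are illustrative, and the course code uses fast.ai's own Flatten layer rather than nn.Flatten.

```python
import torch.nn as nn

# flattened output of Resnet34's last conv block for a (512, 512) input
flat_size = 512 * 16 * 16

head = nn.Sequential(
    nn.Flatten(),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(flat_size, 256),
    nn.ReLU(),
    nn.BatchNorm1d(256),
    nn.Dropout(0.5),
    nn.Linear(256, 4 + 2),   # 4 bounding-box coordinates + 2 class scores
)
```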
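
The combined loss. A minimal PyTorch sketch: the sigmoid-times-image-size trick maps the raw box activations onto pixel coordinates, and the weighting factor balancing the two terms (20 here) is an assumption that needs tuning.

```python
import torch
import torch.nn.functional as F

def detn_loss(pred, target, img_sz=512, cls_weight=20.0):
    # assumes the data loader yields (bounding-box, class) target pairs
    bbox_true, label_true = target
    bb_pred, cls_pred = pred[:, :4], pred[:, 4:]
    # squash the raw activations into [0, img_sz] so prediction and
    # target bounding-boxes live on the same pixel scale
    bb_pred = torch.sigmoid(bb_pred) * img_sz
    # L1 on the box + (scaled) cross-entropy on the class, minimized jointly
    return F.l1_loss(bb_pred, bbox_true) + cls_weight * F.cross_entropy(cls_pred, label_true)
```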
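
Finally, the training schedule, again in the fast.ai v0.7 idiom. It assumes a data object that yields both the box and the class label (the course notebook builds this with a concatenated-label data set, not shown here); learning rates and cycle lengths are illustrative.

```python
import numpy as np

learn = ConvLearner.pretrained(f_model, md, custom_head=head)
learn.crit = detn_loss

learn.fit(2e-3, 1, cycle_len=3, use_clr=(32, 5))   # 1. backbone frozen, train the head only

lrs = np.array([2e-5, 2e-4, 2e-3])                 # discriminative LRs per layer group
learn.freeze_to(-2)                                # 3. unfreeze the last conv blocks
learn.fit(lrs / 5, 1, cycle_len=2)

learn.unfreeze()                                   # 5. fine-tune the whole network
learn.fit(lrs / 10, 1, cycle_len=3)
```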

I have to admit: I am really starting to enjoy this Deep Learning hype 🙂
