
Training and Deploying a fully Dockerized License Plate Recognition app with IceVision, Amazon Textract and FastAPI


Note: link to the GitHub directory with the code and model artifacts

Context

In this post, I’ll walk you through building and deploying a fully Dockerized car license plate recognition app. Both training and deployment will happen inside a Docker container, proving how easy it is to achieve full end-to-end reproducibility.

What you will learn:

  1. Train a license plate detector from scratch using IceVision in a Docker environment
  2. Detect the license plate in an image, crop it and run OCR with Amazon Textract
  3. Deploy the end-to-end application behind a FastAPI server inside a Docker container

Let’s get started.

The project

Here is a screenshot of the directory structure of our project. Starting from the top:

  • models: the license plate detection model we’ll train
  • sample_images: the car images we’ll use to test our pipeline
  • serve: everything we need to serve our application, from the Dockerfile to the model and the Python code to invoke it.
  • train: the Dockerfile needed to set up the development environment
  • car_license_plates.ipynb: the Jupyter notebook where the training happens

Development environment

I coded the application on an Amazon EC2 g4dn.xlarge instance, powered by an NVIDIA T4 Tensor Core GPU. I SSH into it via VSCode and enjoy the full IDE experience while running code on AWS (here is a previous post of mine on how to do just that). That said, the cloud is not a prerequisite for this project. Docker is. What we want is to run our training pipeline inside a Docker container. It turns out that VSCode makes this whole “write and run code in a container” thing extremely easy. I wrote about it here, so head to the post and enjoy.

The Dockerfile we use for the development environment could not be simpler. We start off from a Python base image and manually install the relevant deep learning libraries for training. The awscli and boto3 packages are there to connect to Amazon Textract.

FROM python:3.8

RUN apt-get update
RUN apt-get install ffmpeg libsm6 libxext6  -y

RUN pip install torch==1.10.0+cu111 torchvision==0.11.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html --upgrade
RUN pip install icevision[all]==0.12.0 -U
RUN pip install yolov5-icevision -U
RUN pip install awscli -U
RUN pip install boto3==1.21.32 -U

We then build the image and run a container on top of it, making sure to:

  • grant GPU access (--gpus all)
  • mount the relevant local directories to be able to access them from within the container (-v /home/ubuntu/:/root/)

We then attach a VSCode session to the running container and we automagically import IceVision (with CUDA support) from within the isolated Docker environment.
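
For reference, the build and run steps might look something like this (the image tag plate-dev and the build context ./train are my own labels, not taken from the post):

docker build -t plate-dev ./train
docker run -it --gpus all -v /home/ubuntu/:/root/ plate-dev bash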

Training the license plate detection model

The idea is the following:

  1. We train an object detection model to recognize car plates
  2. Once the car plate is detected, we crop the bounding box from the input image and delegate the OCR part (i.e. extracting the plate number) to Amazon Textract. There are plenty of very solid OCR engines out there. I tested Textract, Tesseract, and EasyOCR here. PaddleOCR is quite promising too.

As always, the first step in training a model is to get data. In our case, I opted for this Kaggle dataset: 433 car images with annotated bounding boxes around the plates. Here is what they look like.

We use the YoloV5 medium architecture from Ultralytics. Its tight integration with IceVision makes the training process a breeze. Fine-tuning for 20 epochs is largely sufficient, as you can see from the following ground-truths and predictions from the trained model (check out the relevant code in the notebook). The model is pretty spot on!
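
The full training code lives in the notebook; below is a minimal sketch of what the IceVision training loop looks like, assuming the Kaggle dataset ships Pascal-VOC-style XML annotations (the paths, image size, and hyperparameters are illustrative, not the exact values from the notebook).

from icevision.all import *

# Parse the dataset: images + Pascal-VOC XML annotations
# (paths are placeholders for wherever the dataset was unzipped)
parser = parsers.VOCBBoxParser(
    annotations_dir="archive/annotations", images_dir="archive/images"
)
train_records, valid_records = parser.parse()

# Standard IceVision augmentations + normalization
image_size = 384
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)

# YoloV5 medium backbone, tightly integrated with IceVision
model_type = models.ultralytics.yolov5
model = model_type.model(
    backbone=model_type.backbones.medium(pretrained=True),
    num_classes=len(parser.class_map),
    img_size=image_size,
)

train_dl = model_type.train_dl(train_ds, batch_size=16, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=16, num_workers=4, shuffle=False)

# fastai learner + 20 epochs of fine-tuning
learn = model_type.fastai.learner(
    dls=[train_dl, valid_dl],
    model=model,
    metrics=[COCOMetric(metric_type=COCOMetricType.bbox)],
)
learn.fine_tune(20, 1e-4)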

Run OCR with Amazon Textract

Ok, great. We have an accurate license plate detector. How do we read the plate number now?

With the following procedure:

  1. Run the model on an input image
  2. Select the bounding box with the largest area. The assumption here is that we’ll have one main car in the foreground. There could be false detections or smaller cars with visible plates in the background, so we need a criterion to pick a plate in case multiple boxes are predicted (see the extract_biggest_bbox function in the notebook).
  3. Extract the coordinates of the selected bounding box (extract_coords_from_bbox function in the notebook).
  4. Crop the region of the input image containing the car plate and enhance it (turn to B&W and increase contrast) to facilitate the work of the OCR engine (crop_and_enhance function in the notebook).
  5. Send the enhanced plate to Amazon Textract (read_text_from_roi function in the notebook; a rough sketch of these helpers follows this list). Here as well, we need a post-processing strategy similar to the one used for the object detector. Very rarely does a license plate contain only the plate number. It’s quite common to find all sorts of additional tags added by the car maker or local authorities for various reasons.
    1. Look at the first example below: we are just interested in CCC 444, but, depending on how tight the bounding box is around the plate, S and TESLA.COM might also be included in the cropped region. Textract will return those strings too. We can address this issue the same way we solved the multiple-detected-plates problem: Textract returns the height and width of the various regions of text, so we can compute the area of each of those and pick the biggest, which will most likely contain the plate number.
    2. The second example poses another problem. We want to extract N KR 992 from that one. There are some vertical symbols between N and K though, meaning that Textract will likely read N and KR 992 as two separate strings. How do we merge them? This is tricky. If we apply the strategy from point 5.1, i.e. selecting the biggest text box, we’ll always come up with one string, in this case KR 992. All the others, N included, would be discarded. Two potential solutions here:
      • the predicted car plate region is too loose. Our Yolo model should predict much tighter bounding boxes, isolating just the plate number. If that were the case, we wouldn’t have the problem raised in point 5.1, i.e. we would be confident that any string Textract reads is actually part of the plate number, allowing us to simply concatenate them without further assumptions. For this to happen, we would need to go back to our dataset labels, fix them (manually draw tighter boxes), and retrain the object detector.
      • N and KR 992 are two separate text regions with approximately the same height. What we could do is: identify the region with the largest area (KR 992), then loop over the other regions (N, D at the bottom-left corner, etc.) and check if there are any strings of the same height as the main text box. If yes, grab those and concatenate them in the appropriate order (Textract returns coordinates too, so we can figure out the spatial ordering).
        • In the interest of time, we didn’t implement either of the two previous approaches, so our car plate reader will return N and KR 992 as separate strings in the notebook, whereas the deployed app will return only the largest text box (KR 992).
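
The notebook holds the actual implementations of these helpers; here is a rough sketch of what they do, under my own assumptions about the details (the contrast factor and the PNG round-trip to Textract are illustrative choices):

import io

import boto3
from PIL import Image, ImageEnhance, ImageOps


def extract_biggest_bbox(bboxes):
    """Pick the detection with the largest area.
    bboxes: list of (xmin, ymin, xmax, ymax) tuples in pixel space."""
    return max(bboxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))


def crop_and_enhance(image, coords, contrast_factor=2.0):
    """Crop the plate region and make life easier for the OCR engine:
    turn it to black & white and boost the contrast."""
    plate = image.crop(coords)
    plate = ImageOps.grayscale(plate)
    return ImageEnhance.Contrast(plate).enhance(contrast_factor)


def read_text_from_roi(plate):
    """Send the enhanced crop to Amazon Textract and return the text of
    the LINE block with the largest area (most likely the plate number)."""
    buffer = io.BytesIO()
    plate.save(buffer, format="PNG")
    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": buffer.getvalue()})

    lines = [b for b in response["Blocks"] if b["BlockType"] == "LINE"]
    if not lines:
        return None
    box_area = lambda b: b["Geometry"]["BoundingBox"]["Width"] * b["Geometry"]["BoundingBox"]["Height"]
    return max(lines, key=box_area)["Text"]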

Let’s see the logic in action. First we run inference with the trained object detector on an input image (car3.jpeg). Note the green bounding box drawn around the car plate in the visualization.
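
In code, this step might look something like the snippet below (reusing model, model_type, valid_tfms, and parser from the training sketch, plus the helpers sketched above; end2end_detect applies the validation transforms and rescales the predicted boxes back to the original image size):

from PIL import Image

# Run the trained detector on a test image
img = Image.open("sample_images/car3.jpeg")
pred_dict = model_type.end2end_detect(
    img, valid_tfms, model, class_map=parser.class_map, detection_threshold=0.5
)
bboxes = pred_dict["detection"]["bboxes"]

# Pick the biggest detection, crop + enhance it, and send it to Textract
coords = extract_biggest_bbox([(b.xmin, b.ymin, b.xmax, b.ymax) for b in bboxes])
plate_text = read_text_from_roi(crop_and_enhance(img, coords))
print(plate_text)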

Then we follow the procedure highlighted before, ending up with Amazon Textract predicting N and KR 992. Nice! It seems we have everything we need to move to the next stage: making this pipeline accessible outside of the training notebook. Let’s head to deployment.

Deployment with FastAPI and Docker

I have already experimented with deploying an ML app with Flask and Docker here. This time, to learn something new, I decided to go with FastAPI for our license plate recognition model.

To deploy what we have, we need the following:

  1. A Python file defining the FastAPI app that wraps up the inference logic
  2. A Dockerfile defining the serving Python environment and spinning up the FastAPI app when the container is launched

FastAPI app

FastAPI ships with tons of goodies. Among my favorites is the automatic endpoint documentation, built on top of Swagger UI, with interactive exploration, calling, and testing of the API directly from the browser. Providing a pydantic response_model to the endpoints enriches the experience even further. We’ll see how all of that works pretty much out of the box.

The app.py file is here. Let’s highlight the most important bits:

  • We load the object detector once when the Docker container is started, then keep it in memory, ready to be invoked by the predict endpoint.
  • OcrResponse: the pydantic model defining what the app’s prediction looks like: a JSON with the selected output fields. Keep in mind that we return the string with the largest bounding box area as per Amazon Textract, which is why the output is a dictionary and not a list of dictionaries.
  • The health_check GET endpoint. When we ping the port the app is listening on, if everything is fine, we’ll receive "Status": "Application up and running." back.
  • The predict POST endpoint. This is where the inference logic happens, from loading the image we receive in the payload, to running the car plate object detection model and invoking Amazon Textract. We use fastapi.UploadFile as input to the endpoint. We then read the byte contents of the file and convert them to a PIL image. A stripped-down sketch of the app follows this list.
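
Here is what a stripped-down version of app.py might look like. The field name in OcrResponse, the model filename, and the load_detector / detect_crop_and_read helpers are placeholders of mine standing in for the real code; refer to the linked file for the actual implementation.

import io

import uvicorn
from fastapi import FastAPI, File, UploadFile
from PIL import Image
from pydantic import BaseModel

app = FastAPI()

# Load the trained plate detector once, when the container starts.
# load_detector is a placeholder for the IceVision loading code in app.py.
model = load_detector("models/license_plate_detector.pth")


class OcrResponse(BaseModel):
    # Illustrative field: the string read by Textract for the biggest text box
    plate_number: str


@app.get("/")
def health_check():
    return {"Status": "Application up and running."}


@app.post("/predict", response_model=OcrResponse)
async def predict(image: UploadFile = File(...)):
    contents = await image.read()                  # raw bytes of the uploaded file
    pil_image = Image.open(io.BytesIO(contents))   # convert to PIL
    # detect_crop_and_read is a placeholder for the detection + Textract
    # pipeline described in the training section
    plate_number = detect_crop_and_read(model, pil_image)
    return OcrResponse(plate_number=plate_number)


if __name__ == "__main__":
    # No port specified: uvicorn defaults to 8000
    uvicorn.run(app, host="0.0.0.0")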

Dockerfile

The serving Dockerfile is very similar to the training one. The differences are:

  • installing the serving libraries (fastapi, uvicorn and python-multipart for file uploads)
  • copying all the files under the /serve directory into the /app folder inside the container and setting /app as the working directory
  • defining what we want the container to execute when it’s started, i.e. python app.py

FROM python:3.8

RUN apt-get update
RUN apt-get install ffmpeg libsm6 libxext6  -y

RUN pip install torch==1.10.0+cu111 torchvision==0.11.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html --upgrade
RUN pip install icevision[all]==0.12.0 -U
RUN pip install yolov5-icevision -U
RUN pip install awscli -U
RUN pip install boto3==1.21.32 -U

RUN pip install fastapi==0.78.0
RUN pip install uvicorn==0.17.6
RUN pip install python-multipart==0.0.5

COPY . /app
WORKDIR /app

ENTRYPOINT ["python"]
CMD ["app.py"]

Putting everything together

We have the Dockerfile, the trained model, and the inference logic. How do we expose those behind an endpoint?

This could not be easier.

  1. We build the Docker image and…
  2. … we spin up the container, making sure to map port 8000 on the hosting machine (the EC2 instance where the VSCode session is running) to port 8000 on the container. Given we didn’t specify a port when running uvicorn, it defaults to 8000 (see more settings here). The commands look roughly like the sketch after this list; the first screenshot 👇 shows the result.
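
For reference, the two steps might look something like this (the image tag plate-serve is my own label, and I’m leaving out any extra flags you might need for GPU access or AWS credentials):

docker build -t plate-serve ./serve
docker run -p 8000:8000 plate-serve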

Notice how VSCode informs us that it has forwarded port 8000 from the AWS EC2 instance (itself listening to the Docker container running the FastAPI app) to my local MacBook. Wait… does this mean I can invoke the car plate model endpoints from a local browser? Yes 🤯

Let’s check. I open up a Chrome window on my MacBook, navigate to localhost:8000, and there it is! The health check works. We get back a {"Status": "Application up and running."} message.

But that’s not all. As promised, we can also visualize the docs, powered by Swagger UI and nicely integrated into FastAPI. How so? It’s as easy as navigating to localhost:8000/docs, which renders as 👇. In the screenshot, I have already expanded the /predict endpoint. Notice how it provides an Example Value | Schema, matching the OcrResponse here.

The UI also offers a Try it out option. Let’s see if it works. We upload an image (car6.jpeg) and hit Execute. A couple of seconds later we get the correct response in the UI! 🎉

We should check with curl too, from within the EC2 instance running the app. We open up another terminal in VSCode (we already have one running the container and printing logs) and type 👇, which returns the screenshot below. Great! It works!

>>> cd KagglePlaygrounds/car_license_plates
>>> curl http://127.0.0.1:8000/predict -F "image=@sample_images/car6.jpeg"

Well, I guess we are done. I hope that was useful! Happy hacking everybody!
