
Fast Neural Style Transfer: deploying PyTorch models to AWS Lambda

  • Links to code: repo, Jupyter Notebook, and Lambda function.
  • Disclaimer: this is a less technical post compared to previous ones. I see it mostly as a wrap-up of a learning experience rather than a thorough scientific walkthrough.

Why Fast Style Transfer again?

I have discussed Neural Style Transfer (NST) more than a couple of times over the last year or so: from the theory, to the implementations in fast.ai, PyTorch and TensorFlow, to model serving applied to both images and PNGs on VisualNeurons.com. The most recent set of experiments focused on the fast version of this technique, which involves training a neural network to learn a style upfront and then applying it to new images, as opposed to running an optimization loop from scratch each time. The results are discussed here (deep learning part) and here (production part). Admittedly, the solution was far from optimal, for at least two reasons:

  1. I had deployed my models on SageMaker, which was not bad per se; it was just very expensive, at least in the context of my project, which serves the purpose of a fun side-learning experience. To be more precise, I deployed 3 endpoints (Van Gogh, Picasso, and Kandinsky) on the cheapest available CPU-based EC2 instances (ml.t2.medium). Those machines are basically up and running 24/7, regardless of whether the model is invoked or not. Each one adds up to 50 USD/month, which means getting billed 150 USD/month for a toy project nobody is actually using. In the interest of keeping the budget under control, I had to shut the endpoints down within a couple of weeks of publishing the blog post. This was a real shame, as I see VisualNeurons.com as a showcase of my achievements, which means those proofs-of-concept really need to work, all the time, even if the website gets only one hit a month.
  2. The end result was not the most visually appealing, no matter how much effort I put into fine-tuning the content and style weights. Not awful, obviously, but not the best around either. Also, the loss weirdly plateaued within 1.5k training images, i.e. around 400 batches, as I was using a batch size of 4. If I tried to force the network to keep training, the end results degraded significantly. This was puzzling, as the original Johnson et al. paper employed the entire COCO dataset (80k images) for 2 epochs. I reckoned I had a bug in my code, as the behavior I experienced was completely unexpected.

Point 2 is definitely the trickiest to address, as it involves debugging a codebase that is not really bugged in the first place, at least not in the classical software-development sense: no error is thrown anywhere. Still, something was off. Spoiler: I could not figure it out. I was eventually obliged to fork the NST scripts from the official PyTorch examples repo, drop my code and restart from there. More on this below.

Point 1 turned out to be a relatively simple fix. I needed an easy deployment solution with fast inference, for which I would get charged only when the model was actually invoked. This is basically the description of AWS Lambda, which is exactly the service I ended up opting for.

What the web app looks like

Nothing changed in terms of user interface from my previous deployment; I just updated the styles to play around with. In terms of performance, SageMaker is quite a bit faster than Lambda, but only on the first invocation. I guess this is because a SageMaker endpoint is nothing other than an EC2 instance up and running 24/7, with no warm-up required, e.g. no container to provision and no libraries to load. Within a serverless call, instead, those steps need to be taken care of. Again, as soon as Lambda gets “warm”, execution is almost instantaneous. Check it out.

Deploying PyTorch models to AWS Lambda

AWS diagram showing how the current Fast NST architecture works

This felt almost like a breeze. I had already fiddled around (and failed) with deploying ML models to AWS Lambda in a recent piece of work about serving a pre-trained NLP model. I covered all the details in the “Why not going entirely serverless” section of that post, so I will avoid repeating myself and will only summarize the key learnings.

The reason why the Lambda approach had failed in the NLP case was that there was not enough storage space on the `/tmp/` local directory: 512 MB in total. The required Python libraries plus the weights of the pre-trained network easily added up to more than that.

This is not the case in the Fast NST context, though. As shown in the screenshot below, taken from the S3 bucket containing the NST models, each of them weighs only 6.4 MB, quite far from the ~300 MB of the smallest Hugging Face GPT-2.
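If you want to double-check that your own models fit comfortably within that limit, a quick boto3 sketch like the following does the job (the bucket name is the one used by the Lambda handler further down; adjust it to your setup):

# Quick sanity check of the model sizes stored on S3, to confirm everything
# fits within Lambda's 512 MB /tmp storage.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="visualneurons.com-fast-nst")
for obj in resp.get("Contents", []):
    print(f"{obj['Key']}: {obj['Size'] / 1e6:.1f} MB")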

This makes deploying the solution to Lambda entirely feasible. I once again turned to the publicly available PyTorch layer Matt McClean gifted the community. Adding this stack to Lambda involves 2 steps: 1) grabbing the layer’s ARN (`arn:aws:lambda:<YOUR REGION>:934676248949:layer:pytorchv1-py36:2`) and adding it to the function, as shown in the following three screenshots.

2) Putting lines 3-6 at the top of your Lambda function. This is needed to unzip the layer and make the libraries available for import.
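For reference, the snippet in question typically looks like the sketch below: the layer ships an `unzip_requirements` helper that extracts the zipped PyTorch packages into `/tmp` the first time it is imported (a minimal sketch; the exact lines in the deployed function may differ).

# Layer bootstrap placed at the top of the handler file. The `unzip_requirements`
# module is provided by the PyTorch Lambda layer and unpacks the zipped
# site-packages into /tmp so that `import torch` works.
try:
    import unzip_requirements  # noqa: F401
except ImportError:
    pass  # running locally, where PyTorch is installed normally

import torch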

The Layer part is definitely the hardest. After that, what is left is to implement the inference logic, meaning:

  1. Read the image from `event['body']` and decode it from base64 format
  2. Pre-process the image to get it ready for the model (convert to RGB, scale it to 0-1 range and normalize it using ImageNet’s stats)
  3. Download the model’s weights from S3 and load them into the appropriate neural architecture
  4. Run the image through the network
  5. Encode the result in base64 format and send it back to the frontend
Everything is wrapped in the `lambda_handler` below. Full code here.
class="wp-block-syntaxhighlighter-code"># OMITTING THE IMPORTS AND THE CLASSES/FUNCTIONS DECLARATIONS
# FIND THE COMPLETE IMPLEMENTATION HERE
# https://github.com/gabrielelanaro/ml-prototypes/blob/master/prototypes/styletransfer/fast_image_no_sage_lambda.py
def lambda_handler(event, context):
    # Parse the request: extract the chosen style and the base64-encoded image
    body = json.loads(event['body'])
    style = body["style"][:-4]          # drop the last 4 characters (the file extension)
    model = styles_map[style]           # map the style to its model file on S3
    img = base64.b64decode(body['data'])
    img = Image.open(BytesIO(img)).convert('RGB')
    img = content_transform(img)        # pre-processing pipeline (defined elsewhere)
    img = img.unsqueeze(0).to(device)   # add the batch dimension

    # Fetch the model weights from S3 and load them into the TransformerNet architecture
    s3.download_file("visualneurons.com-fast-nst", model, f"/tmp/{model}")
    style_model = TransformerNet()
    state_dict = torch.load(f"/tmp/{model}")

    # Drop the saved (deprecated) running statistics of the InstanceNorm layers
    for k in list(state_dict.keys()):
        if re.search(r'in\d+\.running_(mean|var)$', k):
            del state_dict[k]
    style_model.load_state_dict(state_dict)
    style_model.to(device)

    # Forward pass without tracking gradients
    with torch.no_grad():
        output = style_model(img)

    # Post-process: clamp to the valid pixel range, convert back to a PIL image, PNG-encode it
    img = output[0].clone().clamp(0, 255).numpy()
    img = img.transpose(1, 2, 0).astype("uint8")
    img = Image.fromarray(img)
    fd = BytesIO()
    img.save(fd, format="PNG")

    # Send the stylized image back to the frontend as a base64 string
    return format_response(base64.b64encode(fd.getvalue()).decode(), 200)
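The `format_response` helper is omitted above. A helper like this typically wraps the payload into the response format API Gateway expects, CORS headers included; here is a minimal sketch under those assumptions (check the repo linked above for the actual implementation):

# Hypothetical sketch of the omitted `format_response` helper, assuming an
# API Gateway proxy integration; the real implementation may differ.
import json

def format_response(body, status_code):
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",  # let the web app's frontend call the API
        },
        "body": json.dumps({"data": body}),
    }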

Lambda warm-up (or cold start)

This is the only minor issue (if we can even call it that) of dealing with a Lambda deployment. The first time you run it, AWS gets to work behind the scenes to provision the container with the selected runtime and make the relevant Python packages available. This takes time, causing the first execution to be significantly slower than expected. After this run, though, AWS keeps the environment up and running for a certain period, which means that any subsequent execution will likely be quite fast. You can notice this behavior by playing with my Fast NST web app: be patient during the very first interaction, then experience styling in the blink of an eye.
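A common trick to get even more out of warm containers, sketched below under my own assumptions (the handler above does not do this and re-downloads the weights on every call), is to cache the loaded models at module level, so that a warm container skips both the S3 download and the weight loading:

# Sketch of module-level caching to exploit warm Lambda containers.
# TransformerNet comes from the fast_neural_style example code linked above.
import boto3
import torch

s3 = boto3.client("s3")
BUCKET = "visualneurons.com-fast-nst"
_model_cache = {}  # survives across invocations while the container stays warm

def get_style_model(model_key):
    if model_key not in _model_cache:
        local_path = f"/tmp/{model_key}"
        s3.download_file(BUCKET, model_key, local_path)  # cold path: fetch from S3 once
        net = TransformerNet()
        state_dict = torch.load(local_path)
        net.load_state_dict(state_dict, strict=False)    # ignore the deprecated InstanceNorm running stats
        net.eval()
        _model_cache[model_key] = net
    return _model_cache[model_key]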

To dive deeper, here is a great write-up on Lambda’s cold start.

What went wrong in the previous iteration

As mentioned in the introductory paragraph, the second reason behind my coming back to Fast NST was my visual dissatisfaction with the results of my first attempt, together with the unexpected behavior my models showed during the training phase (the loss plateauing too quickly), which made me reckon I had a bug somewhere.

I actually spent quite some time (~2 weeks) trying to figure this out. First, I literally checked my code line by line (more or less…) without finding any obvious screwup. For reference, here are the notebook and the helper functions/classes from the playground `fra-fst` branch of the repo. Obviously, this does not guarantee that the image pre/post-processing steps, specific parts of the training loop, or anything else are free of subtle mistakes. Truth is, I could not find any. I also ported fast.ai’s learning rate finder and fit-one-cycle policy to plain PyTorch. Those are two truly amazing goodies in the bag of tricks the fast.ai library offers. The former allows you to stop guessing an appropriate learning rate and actually pick a reasonable one from the start (the find_lr function in this notebook). The latter implements a smart learning rate and momentum scheduler during the learning phase (calc_lr_mom_schedule in this notebook).

The learning rate (LR) finder from fast.ai computes the loss for increasing values of the LR. The above is what we get when we plot one against the other. What we want is the LR at the point of maximum steepness of the loss curve; 3e-4 is a good guess here. This guarantees we pick the LR for which the loss decreases the fastest, allowing quicker training. Read more here.
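To give an idea of the mechanics (a minimal, generic sketch in plain PyTorch, not the find_lr implementation from the notebook), the finder exponentially increases the LR over a number of mini-batches and records the loss at each step:

# Minimal, generic learning-rate finder sketch in plain PyTorch (illustrative only;
# the real find_lr lives in the linked notebook). Assumes a supervised-style
# (inputs, targets) data loader for simplicity.
import torch

def find_lr(model, criterion, data_loader, lr_min=1e-7, lr_max=10, num_steps=100, device="cpu"):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr_min)
    gamma = (lr_max / lr_min) ** (1 / num_steps)  # multiplicative LR increase per step
    lr, lrs, losses = lr_min, [], []
    model.train()
    for _, (x, y) in zip(range(num_steps), data_loader):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        lrs.append(lr)
        losses.append(loss.item())
        lr *= gamma
        for g in optimizer.param_groups:  # bump the LR for the next mini-batch
            g["lr"] = lr
    return lrs, losses  # plot losses vs lrs and pick the LR of steepest descent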
The fit-one-cycle policy consists of starting the training phase with a very low LR, linearly increasing it to the optimal rate obtained by the LR finder, and then annealing it to almost zero in a cosine fashion. At the same time, momentum (MOM) follows the opposite schedule, based on the logic that higher LRs, i.e. bigger jumps on the loss landscape, should not be constrained by high MOMs. Hence high LR-low MOM, and vice versa. Read more here.
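A minimal sketch of such a schedule, with an illustrative 30%-up/70%-down split (my own numbers, not necessarily those used by calc_lr_mom_schedule):

# Illustrative one-cycle schedule for LR and momentum (not the notebook's
# calc_lr_mom_schedule): LR ramps up linearly, then anneals with a cosine;
# momentum mirrors it in the opposite direction.
import math

def one_cycle(step, total_steps, lr_max=3e-4, lr_start_div=25,
              mom_max=0.95, mom_min=0.85, pct_up=0.3):
    up_steps = int(total_steps * pct_up)
    lr_start = lr_max / lr_start_div
    if step < up_steps:                        # warm-up: LR goes up, momentum goes down
        t = step / max(1, up_steps)
        lr = lr_start + t * (lr_max - lr_start)
        mom = mom_max - t * (mom_max - mom_min)
    else:                                      # annealing: cosine LR decay, momentum climbs back
        t = (step - up_steps) / max(1, total_steps - up_steps)
        cos = (1 + math.cos(math.pi * t)) / 2  # goes from 1 down to 0
        lr = lr_max * cos
        mom = mom_min + (mom_max - mom_min) * (1 - cos)
    return lr, mom

# Usage, at each training step (with Adam, momentum maps to beta1):
# lr, mom = one_cycle(step, total_steps)
# for g in optimizer.param_groups:
#     g["lr"], g["betas"] = lr, (mom, 0.999)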

None of them actually moved the needle. Then again, who knows whether that was the right place to be looking at all.

Eventually, to avoid getting stuck forever on my code base, I ended up forking the NST scripts from the official PyTorch examples repo and starting over from there. I brought my own artworks, tuned the style and content weights and successfully trained new models from scratch, as shown here.

Overall, this whole thing reminded me (once again) of the fact that Machine Learning projects are incredibly hard to deliver. They always encompass an unexpectedly large amount of details to get right. Not nailing all of them translates into degraded results; skipping only a few might be a recipe for disaster. This brings me to the next, maybe obvious, point: the importance of working in a team. Whatever I miss, my mates catch. I would have paid to have someone look at my code and brainstorm potential culprits for my issues! Anyway, if it were easy, it wouldn’t be fun, right?
