TL;DR
- SageMaker Studio Lab is completely free. No credit card required for sign-up. Go wild and never again be woken up at night by that daunting “Did I switch off my p3.2xlarge EC2 instance?” thought.
- Native Jupyter Lab. For real.
- 15 GB of persistent storage. Not huge but hey, everything you do gets saved. Files and, most importantly, your dev environments. Install packages once. Enjoy `import from whatever` anytime you use SMSL again.
- When you log in for the first time, your env is completely empty. 15 virgin GB of disk space. No pre-installed libraries. Just the Python basics. No need to adapt your installation scripts to whatever annoying `pytorch` version AWS thought would make sense for you. You know what makes sense for you. Period. Heard that, Colab?
- 12 hours of CPU or 4 hours of Tesla T4 GPU time. Uninterrupted. No timeouts. Yes, you heard me right, Colab. No timeouts. As soon as you open a project, that project will “live” undisturbed for either 12 or 4 hours. You lose connectivity. You log off. Not a problem. Log in again and pick up where you left off. Very handy.
What is it?
AWS re:Invent 2021 has just come to a close and, as every year, it brought a landslide of announcements with it. Among the plethora of new and updated services introduced at the conference, one specifically tickled my curiosity: SageMaker Studio Lab (SMSL).
If I had to spoil it I’d say that SageMaker Studio Lab is AWS’ attempt to replicate Google Colab. There you go, I said it. With a few very important differences though, but we’ll get there. SMSL offers a Jupyter-Lab-based environment to run code completely free of charge. No credit card required. Actually, the service is not part of a user’s standard AWS account, and, as a matter of fact, you don’t need an AWS account at all to access SMSL. You sign up with an email, wait for AWS to approve your request and you are in.
What does it include? 12 hours of CPU or 4 hours of GPU compute, plus 15 GB of persistent storage. It is obvious that the audience AWS is trying to reach is the one composed of ML enthusiasts, hobbyists, and learners who need a free, ready-to-go platform to conduct experiments.
Free is the keyword here. To be fair, AWS offers a free tier when you create an account on the platform but, let’s be honest, there are always a bunch of hidden fees here and there. If you stick to the basic services, they are mostly negligible, but they are there. Also, the fact that you are obliged to provide a credit card is not reassuring. You’d better have a solid billing alert system in place to catch the dollar-ometer spinning into dangerous territory. I am sure every AWS customer has had, at least once, nightmares about ever-running EC2 instances (I had those)!
Understandably, this kind of scenario puts a lot of people off. If you are a student, or simply someone who’d like to run a couple of experiments every now and then, you just want something free and usable, something letting you hit the ground running. I hear you: that’s Google Colab! Or maybe not. As soon as you have taken it for a spin, you might think about sticking to SageMaker Studio Lab (spoiler alert, you should!).
First things first. Let’s see what it takes to get access to SMSL and what you’ll find when landing there.
Getting access
1. Head to https://studiolab.sagemaker.aws/ and click `Request free account`.
2. Fill in the form, verify your email and wait for AWS to approve your request.
3. Complete your registration by creating an account. Then verify your email (yes, again).
4. Sign in and…
5. … there you go!
What’s in there?
Once you are in, you can select either a CPU or a GPU-based project. I picked GPU and, as you can see, my 4-hour countdown started (it’s showing 3h 57m in the screenshot below).
The pop-up (hovering on `Time remaining`) brings up a very interesting point: “The project runtime and all running processes will automatically stop after […] Your work will be saved”. Aka persistent storage. This is absolutely great and the first stark difference with Colab. In practice, it means that whatever you do gets saved to disk (within the 15 GB limit): on top of the obvious things (e.g. files), what this entails is that any library you install or any specific `venv` or `conda` environment you create doesn’t get wiped away if you log off or if your session times out. How useful is that!
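Since environments and files share the same 15 GB, it can be handy to keep an eye on how much of it you have eaten up. Below is a minimal sketch, not an official tool: the helper names `dir_size_bytes` and `quota_used_fraction` are made up for illustration, and I am assuming the quota can be read as 15 GiB and that everything counting against it lives under your home directory.

```python
import os

# The 15 GB mentioned in this post, read here as GiB (an assumption).
QUOTA_BYTES = 15 * 1024**3


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable: ignore it.
                pass
    return total


def quota_used_fraction(used_bytes, quota_bytes=QUOTA_BYTES):
    """Fraction of the (assumed) quota taken by used_bytes."""
    return used_bytes / quota_bytes
```

In a notebook cell, something like `quota_used_fraction(dir_size_bytes(os.path.expanduser("~")))` gives a rough idea of where you stand. Walking a home directory full of conda environments can take a while, so don't run it in a loop.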
I installed IceVision** and all its dependencies (because what’s the purpose of testing with `sklearn` which “just” works? Let’s go with a full deep learning object detection stack!). That’s a lengthy and (due to some nastier packages) potentially painful process. I did it once. Then, upon logging into SMSL again, imagine my sense of gratitude when `from icevision.all import *` worked out of the box (on GPU!).
**Note: I experienced some hiccups when trying to install directly from a GitHub repo, which I do a lot for IceVision as I want the latest version on master. `%pip install git+git://github.com/<your repo>.git` didn’t work. The HTTPS version did though (thanks Antje Barth for the hint!), e.g. `%pip install git+https://github.com/<your repo>.git`. Also, note how AWS recommends using `%pip` instead of `!pip` to install packages. It’s apparently more robust when it comes to copying the relevant folders into the right environment’s directories. For reference, here are the commands I used to install IceVision in SMSL.
%pip install torch==1.10.0+cu111 torchvision==0.11.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html --upgrade
%pip install mmcv-full==1.3.17 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html --upgrade -q
%pip install mmdet==2.17.0 --upgrade -q
%pip install git+https://github.com/airctic/icevision.git\#egg=icevision[all] --upgrade
%pip install git+https://github.com/airctic/icedata.git -U -q
%pip install git+https://github.com/airctic/yolov5-icevision.git -U -q
%pip install sahi -q
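If you want to double-check that the install survived a logout, a quick sanity check along these lines does the job. It is just a sketch: `installed` and `gpu_available` are helper names I made up, and the list of import names is my assumption based on the commands above (e.g. `mmcv-full` is imported as `mmcv`).

```python
import sys
from importlib.util import find_spec

# Import names assumed to match the packages installed above.
PACKAGES = ["torch", "torchvision", "mmcv", "mmdet", "icevision", "sahi"]


def installed(names):
    """Map each top-level module name to whether it can be found."""
    return {name: find_spec(name) is not None for name in names}


def gpu_available():
    """True if torch is installed and reports a usable CUDA device."""
    if find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works without torch
    return torch.cuda.is_available()


# The interpreter the kernel runs on, i.e. the env %pip installs into.
print("Python:", sys.executable)
print("Packages:", installed(PACKAGES))
print("GPU:", gpu_available())
```

On a CPU project, or before installing anything, the GPU line simply prints `False` instead of raising an error.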
On top of that, when you log in for the first time, your 15 GB of disk space is completely virgin. You won’t find any major libraries pre-installed. I am mostly thinking about deep learning frameworks such as `pytorch` or `tensorflow`. There is a school of thought praising pre-setup environments featuring those libraries. I don’t belong there though. More often than I would have liked, I was obliged to create brand new envs to be able to pick the exact versions of the packages I wanted. Mainly to avoid conflicts with other dependencies. Also, pre-installed envs eat up a lot of space. A real waste. In SMSL, those 15 GB of storage are truly yours.
One more super important thing I haven’t emphasized so far. We are actually working inside Jupyter Lab. I mean, the real Jupyter Lab, not Colab’s fake one. This means all the cell magics, widgets, shortcuts and extensions you are used to, the ones that probably made you fall in love with Jupyter in the first place, are there. AWS is not reinventing the wheel, which makes the transition to SMSL super smooth.
Last but not least, you are not subject to timeouts disrupting your work. Yes, you heard this one right. As soon as you hit the `Open project` button and the 12-CPU-or-4-GPU-hour clock starts ticking, the session keeps going without interruptions, until the countdown hits zero. No matter what you do. Wi-Fi acting up? Obliged to log off? Laptop going to sleep? No problemo. SageMaker Studio Lab has your back. Log in again and pick up your work where you left off. How handy is that?
One of the huge pluses (at least for me) of working with Colab is that it makes collaboration trivial. Sharing a notebook is as easy as granting access to whoever you feel like and then sending the link over. It cannot get smoother than that. In SMSL instead, you have to rely on Jupyter Lab, which means sharing is not as straightforward. You have to go through GitHub. Once you do that, e.g. push your notebook to a repo, you can take advantage of the Studio Lab badge (very similar to Colab’s), as described here (search for `GitHub integration`). Everyone clicking the badge can preview the notebook in Studio Lab. If the receiver has an SMSL account, they can select “Copy to project” to either import just the notebook, or clone the whole repo. Quite neat. Regardless, that’s more work than just copy-pasting a link to the notebook you are currently working on, without having to push to GitHub. Far from being a deal-breaker, but somehow removing the GitHub dependency would be a very nice touch to add.
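To give an idea of what the badge boils down to, here is a small sketch that builds the Markdown snippet you would paste into a README. The URL pattern is how I remember it from the Studio Lab docs, so treat it as an assumption and verify it there. `studiolab_badge` is a made-up helper, and the org, repo and notebook names in the usage line are placeholders.

```python
BASE_URL = "https://studiolab.sagemaker.aws"


def studiolab_badge(org, repo, branch, notebook_path):
    """Build the Markdown for an 'Open in Studio Lab' badge.

    The import URL layout below is assumed, not taken from this post.
    """
    target = (
        f"{BASE_URL}/import/github/"
        f"{org}/{repo}/blob/{branch}/{notebook_path}"
    )
    image = f"{BASE_URL}/studiolab.svg"
    return f"[![Open In SageMaker Studio Lab]({image})]({target})"


# Placeholder names, purely for illustration.
print(studiolab_badge("my-org", "my-repo", "master", "demo.ipynb"))
```

Paste the printed line into the README (or the first Markdown cell) of the repo hosting the notebook.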
Conclusion
I haven’t worked with SageMaker Studio Lab extensively yet, but my feedback is very positive so far. The product checks all the boxes on its way to potentially becoming a real hit in the ML-enthusiast community. It also covers the major drawbacks of its main competitor, Google Colab, which is quite a significant achievement. I am sure that, if I start working on it more seriously, I’ll stumble upon issues here and there, but, hey, that’s life. All in all, it’s a promising service, so check it out and keep learning!