For this article we talked with Felipe Huici from Unikraft about optimizing cloud deployments with unikernels, and about how to achieve millisecond scale to 0 and scale to 1, with KraftCloud examples.
Disclaimer: No financial compensation has been received for this mutual post.
Why
It is a reality that traffic on the Internet is intermittent and unpredictably so: peaks of high request rates are followed by periods of little to no traffic. Contrast this with your billing, which is usually always on: it is precisely this dissonance that KraftCloud is looking to solve.
The high-level goal is clear: if there’s no traffic, would it be possible to have a mechanism that could detect this and automatically scale your app to zero? And then detect when traffic comes back and wake the app back up, quickly enough that your users don’t notice (i.e., within the milliseconds of the Internet’s round-trip times)?
In traditional cloud providers, the answer is a categorical no: you’d have to go to FaaS offerings, and even those are not silver bullets, since they come with issues like cold starts, limits on run duration, limitations on what apps and languages you can run (they are, after all, function-based, not container/VM-based), memory limits, etc. Outside of the FaaS world, start times can range from seconds to minutes, so implementing a fast scale to zero mechanism is next to impossible.
But Don’t Cloud Providers Already Offer Scale to Zero?
To be clear, many cloud providers offer scale to zero, but the definition of what they offer, and of what the mechanism actually does, is a bit loose.
To a first approximation, any platform that can transparently shut down your app when it’s idle and can then transparently wake it up when traffic for it arrives once again can claim to provide a scale to zero mechanism.
But let’s zoom out and think about the high-level purpose of scale to 0: to make sure that you don’t pay for idle time, when your app isn’t receiving any traffic, and that when traffic does arrive your app’s users don’t notice that a scale to 0 mechanism is in place. In short, what we’d like is:
Scale to 0: The cloud platform should quickly, in milliseconds, detect when an app is idle and immediately put it to sleep so you don’t get overcharged. The reality of current offerings is, however, that it can take seconds or even minutes for this detection to actually happen; in the meantime, you’re getting billed for your cloud provider’s overhead.
Scale to 1: When traffic for your app shows up again, the platform should detect it, wake up your app, and have your app respond, all ideally within a small fraction of the request’s RTT, so that your end user doesn’t notice your app was ever sleeping. Once again, current offerings fall short of this, often taking seconds to bring your app up, causing a degraded experience for your users.
Breaking out of the Status Quo
Fundamentally, what KraftCloud is aiming to achieve is a scale to zero mechanism (and subsequent scale from 0 to 1) that operates at millisecond timescales, to ensure that the mechanism is completely transparent to end users. This means that the instances that run your apps have to be able to cold start in milliseconds.
But before we dive into how this can be achieved, let me show that on KraftCloud it already exists and can be easily enabled: a simple -0 flag to the OSS kraft CLI tool is sufficient to tell the platform that an instance should scale to zero:
kraft cloud deploy -0 -p 443:8080 .
And that’s it! Now we have an NGINX instance running with scale to 0. But wait, why is its state set to running if it has no traffic and we passed the -0 flag? When deployed, the instance’s state is initially set to running, but since it has no traffic, it is then scaled to 0 almost immediately (in this case after 500 milliseconds of inactivity, though this is configurable). We can easily check that this instance has been scaled to 0 by using the kraft cloud instance ls command:
In this output you can see that the instance’s current state is set to standby, meaning that it’s now dormant, consuming no resources, and waiting for the next request to come in before waking back up. You can test this by putting a watch on this command, e.g., watch --color -n 0.5 kraft cloud instance ls, then pointing your browser to the instance’s URL: the instance replies immediately, and its state goes from standby to running and back to standby.
Also note that in this case we used a web server (NGINX) as an example, but you can use the scale-to-zero flag with any other app/lang from this examples repo.
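To make the state transitions described above concrete, here is a minimal toy model in Python. It is only a sketch of the observable behaviour (running, then standby after an idle timeout, then running again on a request), not KraftCloud's implementation; the class name ToyInstance and its methods are hypothetical, and only the 500 millisecond default comes from the example above.

```python
class ToyInstance:
    """Hypothetical model of an instance with scale to zero enabled."""

    def __init__(self, idle_timeout_ms=500, now_ms=0):
        self.idle_timeout_ms = idle_timeout_ms  # configurable, 500 ms in the example
        self.state = "running"                  # a fresh deployment starts as running
        self.last_activity_ms = now_ms          # deploy time counts as last activity

    def tick(self, now_ms):
        """Scale to 0 once the idle timeout has elapsed without traffic."""
        idle_ms = now_ms - self.last_activity_ms
        if self.state == "running" and idle_ms >= self.idle_timeout_ms:
            self.state = "standby"
        return self.state

    def on_request(self, now_ms):
        """A request wakes the instance (scale to 1) and resets the idle timer."""
        self.state = "running"
        self.last_activity_ms = now_ms
        return self.state


inst = ToyInstance()
states = [
    inst.tick(100),        # 100 ms idle: still running
    inst.tick(500),        # 500 ms idle: scaled to zero
    inst.on_request(900),  # traffic arrives: woken up
    inst.tick(1399),       # 499 ms idle: still running
    inst.tick(1400),       # 500 ms idle: scaled to zero again
]
print(states)
```

This mirrors what the watch command shows: the state flips to running when a request arrives and drops back to standby once the idle timeout passes.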
How Does it Work?
To have a millisecond scale to zero mechanism, several components have to come together, all of which need to carry out their tasks fast. When an instance is scaled to zero (in the standby state) and a request for it arrives, the first thing that happens is that a custom, front-end load balancer buffers the request and signals the platform’s controller.
The controller is built in-house to react within milliseconds and to scale to potentially thousands of instances. Upon receiving the notification, the controller identifies the instance the request corresponds to and immediately asks Firecracker, the virtual machine monitor, to wake the instance up.
The instance also has to wake up quickly, of course. To do so, KraftCloud leverages extremely specialized virtual machines called unikernels, based on the Linux Foundation OSS Unikraft project. These unikernels provide the same strong, hardware-level isolation that VMs do (after all, they are VMs), but can cold start in a few milliseconds; you can read the blog post about cold starts for more details.
Finally, once up, the unikernel notifies the controller, which then signals the load balancer to send the request to the (now running) instance so it can answer, all of this within a small fraction of an Internet RTT so end users don’t notice.
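The sequence above can be sketched as a small, synchronous Python simulation. This is an illustration of the ordering of steps only, under stated assumptions: the class names (ToyUnikernel, ToyController, ToyLoadBalancer), the hostname app.example, and the state flip that stands in for the virtual machine monitor's wake-up are all hypothetical, and none of it reflects KraftCloud's or Firecracker's actual APIs.

```python
class ToyUnikernel:
    """Hypothetical instance that starts out scaled to zero."""

    def __init__(self, name):
        self.name = name
        self.state = "standby"

    def handle(self, request):
        return f"{self.name} answered {request}"


class ToyController:
    """Hypothetical controller: maps hosts to instances and wakes them."""

    def __init__(self, instances, log):
        self.instances = instances  # hostname -> instance
        self.log = log

    def wake_for(self, host):
        instance = self.instances[host]
        self.log.append("controller: identify instance, ask VMM to wake it")
        instance.state = "running"  # stand-in for the VMM waking the VM
        self.log.append("unikernel: up, notify controller")
        self.log.append("controller: signal load balancer to forward")
        return instance


class ToyLoadBalancer:
    """Hypothetical front end that buffers requests while the instance wakes."""

    def __init__(self, controller, log):
        self.controller = controller
        self.log = log
        self.buffer = []

    def receive(self, host, request):
        self.buffer.append(request)
        self.log.append("load balancer: buffer request, signal controller")
        instance = self.controller.wake_for(host)
        self.log.append("load balancer: forward request to instance")
        return instance.handle(self.buffer.pop(0))


log = []
unikernel = ToyUnikernel("nginx")
controller = ToyController({"app.example": unikernel}, log)
balancer = ToyLoadBalancer(controller, log)

response = balancer.receive("app.example", "GET /")
print(response)
print("\n".join(log))
```

In a real system each of these hops is asynchronous and has its own latency; the point of the sketch is that the request is held at the load balancer rather than dropped, so the client sees only a slightly longer response time.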
Conclusion
In the cloud, you can have fast autoscale for functions, or slow autoscale for apps, but not both. In this blog post we have shown that on KraftCloud millisecond autoscale for apps is not only possible, but also available now.
Don't take our word for it; take it out for a spin.