One of the advantages of Serverless touted by cloud providers is seamless autoscaling. Ever since cloud computing gained traction, autoscaling has been seen as one of the biggest advantages organizations can leverage to handle unexpected traffic. It was a major leap from traditional IT, where capacity planning is critical and resource inefficiencies (and hence increased costs) are the norm. With the success of web applications and the unpredictability of their traffic patterns, traditional approaches to scaling servers stopped working. In this post, we will talk about how autoscaling works in the cloud and the advantage Serverless has in scaling up and down to meet demand.
Autoscaling in Virtual Machines, Kubernetes and Serverless
With cloud computing and programmatic access to infrastructure, autoscaling became the norm. Users need not worry about whether their applications can meet a sudden spike in demand, because the underlying infrastructure can be scaled up to meet it without any manual intervention. Once the demand cools down, the resources can be automatically scaled back down. As a result, resources are used more efficiently, with cost savings compared to traditional approaches that provision capacity in advance.
Autoscaling in the cloud comes in different flavors depending on the type of cloud service you are using:
- Unlike traditional IT, where servers must be procured in advance to meet scaling needs, the cloud gives programmatic access to scale up as demand spikes and scale down once it subsides. With virtual machines in the cloud, users can use an autoscaling service that automatically adds virtual machines based on metrics such as CPU, memory, and network usage. While these services can seamlessly bring up virtual machines and route traffic to them, developers still carry significant operational overhead in ensuring that applications and their dependencies behave well in a scale-out architecture. Virtual machines also take a few minutes to boot, so scaling is not instantaneous and some minimal capacity planning remains critical for high performance. Finally, autoscaling with virtual machines can lead to resource inefficiencies if it is not managed well.
- With containers gaining traction and Kubernetes becoming the de facto container orchestration tool, scaling became much more seamless: Kubernetes uses a declarative model, and developers simply define the desired end state in a YAML file. Kubernetes then takes care of autoscaling, drastically reducing the operational overhead on developers. However, if Kubernetes is deployed on virtual machines in the cloud (say, on top of Amazon EC2), there is still operational overhead in scaling the underlying nodes. Services like AWS Fargate take away the complexity of managing those virtual machines, but developers still face YAML complexity in ensuring that the Kubernetes environment meets the scaling needs of the application.
- Serverless computing, especially the hosted offerings, takes the pain out of autoscaling and makes it seamless for developers to scale their applications to meet demand. Serverless offerings like AWS Lambda, Azure Functions, Catalyst, and others scale the infrastructure without any operational overhead or YAML complexity. Developers can focus on the business logic and code, and the Serverless compute offering handles the scaling needs. Since compute costs are calculated based on invocations and a function's resources are released automatically after execution, autoscaling requires little to no operational knowledge. Moreover, resource usage in Serverless is more fine-grained, so less capacity sits idle and the cost savings are greater.
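To make the metric-driven scaling described above more concrete, here is a minimal Python sketch of a proportional scaling rule. It follows the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)); the function name, the parameter names, and the default min/max bounds are assumptions made for illustration, and real autoscalers add tolerances, cooldowns, and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Proportional scaling rule (simplified sketch).

    current_metric and target_metric are in the same unit, for example
    average CPU utilization in percent. The bounds are illustrative defaults.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the replica count by how far the metric is from its target.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds so a spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 replicas at 90% CPU with a 60% target: scale out.
    print(desired_replicas(4, 90, 60))
    # 4 replicas at 20% CPU with a 60% target: scale in.
    print(desired_replicas(4, 20, 60))
```

With virtual machines or Kubernetes, someone has to choose the metric, the target, and the bounds; with a hosted Serverless offering, the platform makes the equivalent decision per request.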
Autoscaling Patterns in Serverless
With Serverless compute, also known as Functions as a Service, there are some distinct autoscaling patterns that can be used to meet demand. The first is to invoke the function once per request and let the platform scale with the number of requests. With certain Serverless offerings like AWS Lambda, this pattern runs into the cold start problem: a request that arrives when no initialized instance is available has to wait for one to start. To avoid that delay you need to keep a warm pool of instances, but a warm pool costs more money on hyperscale providers like AWS. Next-gen Serverless platforms like Catalyst, Nimbella, IBM Functions, and others optimize resources in the backend to avoid the delays due to cold start.
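The trade-off between cold starts and a warm pool can be illustrated with a small simulation. This is a toy model under stated assumptions, not how any particular provider works: each instance serves one request at a time, every request takes the same `duration`, a cold start adds a fixed `cold_start` delay, and instances are never scaled back down. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    busy_until: float  # time at which this instance becomes idle again


def simulate(arrivals, duration, cold_start, warm_pool=0):
    """Return (cold_starts, latencies) for a one-request-per-instance model."""
    # Pre-warmed instances are idle from time 0.
    instances = [Instance(busy_until=0.0) for _ in range(warm_pool)]
    cold_starts = 0
    latencies = []
    for t in sorted(arrivals):
        # Reuse any instance that is idle when the request arrives.
        chosen = next((i for i in instances if i.busy_until <= t), None)
        if chosen is None:
            # No idle instance: start a new one and pay the cold start delay.
            cold_starts += 1
            chosen = Instance(busy_until=0.0)
            instances.append(chosen)
            latency = cold_start + duration
        else:
            latency = duration
        chosen.busy_until = t + latency
        latencies.append(latency)
    return cold_starts, latencies


if __name__ == "__main__":
    burst = [0, 0, 0, 5]  # three simultaneous requests, then one later
    print(simulate(burst, duration=1, cold_start=2, warm_pool=0))
    print(simulate(burst, duration=1, cold_start=2, warm_pool=3))
```

In this model a burst of three simultaneous requests pays three cold starts without a warm pool and none with a pool of three, but the pool has to be kept running (and paid for) even when no requests arrive.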
Another approach to scaling with AWS Lambda and other Serverless offerings that have the cold start problem is to let a single function invocation handle multiple requests. While this reduces the number of cold starts and can even save some money on invocation costs, it doesn't completely avoid them. It also adds overhead, because the application has to be architected around this more complex invocation pattern.
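As a rough sketch of this multiple-requests-per-invocation pattern, the Python snippet below groups requests into batches and counts how many invocations are needed. The `handle_batch` handler and the `dispatch` helper are hypothetical stand-ins; a real deployment would typically rely on a queue or event source that delivers batches to the function, and would also need per-request error handling.

```python
def handle_batch(requests):
    """Hypothetical function handler: processes several requests in one invocation."""
    return [r.upper() for r in requests]


def dispatch(requests, batch_size):
    """Group requests into batches and invoke the handler once per batch.

    Returns the combined results and the number of invocations used.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    invocations = 0
    for start in range(0, len(requests), batch_size):
        batch = requests[start:start + batch_size]
        results.extend(handle_batch(batch))
        invocations += 1
    return results, invocations


if __name__ == "__main__":
    reqs = ["a", "b", "c", "d", "e"]
    print(dispatch(reqs, batch_size=1))  # one invocation per request
    print(dispatch(reqs, batch_size=2))  # fewer invocations, same results
```

Fewer invocations mean fewer chances to hit a cold start and fewer billed invocations, but the first batch can still land on a cold instance, and the application now has to deal with batching, partial failures, and ordering.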
Serverless offers the most efficient way, in terms of both resources and cost, to autoscale to meet demand. When using services like AWS Lambda, it is important for developers to understand constraints such as cold start and the complexity associated with concurrency. There are other platforms available that eliminate the cold start problem and give you a more straightforward way to autoscale your applications.