Serverless & Big Data: Is It A Good Fit?

Serverless gained traction among developers because it let them easily deploy small chunks of code without having to worry about bringing up servers. The early success of seamlessly deployed, event-driven functions, together with the constraints that early Serverless platforms like AWS Lambda imposed on developers, gave the impression that these platforms suit only a limited set of use cases. Since then, Serverless platforms have evolved to support stateful applications. A question on the minds of users, especially enterprise developers, is whether these platforms can support big data workloads.

Data is overwhelming every organization, big or small. While large organizations can afford to spend resources building big data processing pipelines using Hadoop or Spark, smaller organizations have always been at a disadvantage. The gap became glaring as data science and machine learning gained traction and reshaped the business landscape. Can smaller organizations take advantage of big data with the help of Serverless? Can enterprises save money by using Serverless to process large data volumes? After all, the Serverless model, in which functions are triggered by events, is an ideal candidate for processing big data.

For many use cases, one can use a Serverless platform like AWS Lambda, Catalyst, Google Cloud Functions, or Azure Functions to do real-time or batch processing on large volumes of data. Data flowing in from a data source or firehose can trigger a Serverless function to process it. In the past, big data processing required a large number of powerful servers to run MapReduce or similar jobs. This was expensive, with certain architectures costing millions of dollars, so only large enterprises could afford big data pipelines built on Hadoop or Spark. With cloud computing, virtual machines replaced physical servers, and because these virtual machines can be brought up on demand, costs dropped dramatically. Even so, such pipelines still cost tens to hundreds of thousands of dollars, well beyond the reach of many small businesses. By tapping into Serverless Functions, it is easy to set up big data processing for a fraction of the cost.
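As a sketch of the event-driven pattern described above, the Python handler below processes records that arrive with a triggering event. The event shape here is a simplified, hypothetical stand-in; each platform (AWS Lambda, Azure Functions, etc.) defines its own event schema, so treat this as illustrative rather than as any provider's actual API:

```python
import json

def handler(event, context=None):
    """Entry point the platform would invoke whenever new data arrives.

    `event` uses a simplified, made-up shape for illustration; real
    platforms each pass their own event structure to the handler.
    """
    processed = 0
    for record in event.get("records", []):
        payload = json.loads(record["body"])
        # Example transformation: keep only rows above a threshold.
        if payload.get("value", 0) > 10:
            processed += 1
    return {"processed": processed}

# Simulated invocation with two records; only the first passes the filter.
event = {"records": [
    {"body": json.dumps({"value": 42})},
    {"body": json.dumps({"value": 3})},
]}
print(handler(event))  # → {'processed': 1}
```

Because the platform invokes the handler once per event, the same function scales from a trickle of records to a flood without any change to the code.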

Serverless Functions have gone a long way toward democratizing big data, empowering organizations of any size to take advantage of their data. Whether it is a simple image processing application or a complex MapReduce job, Serverless Functions can help organizations build powerful big data pipelines.
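To make the MapReduce idea concrete, here is a minimal word-count sketch in which each map "invocation" handles one chunk of data and a reduce step merges the partial results. The invocations are simulated locally with plain function calls; in a real deployment each would be a separate Serverless Function invocation running concurrently:

```python
from collections import Counter

def map_invocation(chunk: str) -> Counter:
    """One simulated function invocation: count words in a single chunk."""
    return Counter(chunk.split())

def reduce_invocation(partials: list) -> Counter:
    """Merge the partial counts produced by the map invocations."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

# Simulate three concurrent invocations, one per data chunk.
chunks = ["big data big", "data pipelines", "big pipelines"]
partials = [map_invocation(c) for c in chunks]
print(reduce_invocation(partials))  # big: 3, data: 2, pipelines: 2
```

The split into small, independent map steps is exactly what makes the workload a natural fit for per-invocation billing and automatic concurrency.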

The advantages of Serverless for big data are:

  • Cost savings are a big factor. There are no servers or virtual machines to manage, and no standing operational costs. Serverless functions are invoked when the data is ready and billed per invocation. Offerings like Catalyst even provide a simpler pricing model than the large cloud providers.
  • Seamless scaling is another big advantage. As data flows in, Serverless Functions are triggered to match the data volume, and the concurrent scaling built into these platforms keeps scaling seamless, without bottlenecks.
  • Developers can focus on code without worrying about the availability or setup of servers and virtual machines, so data pipelines carry little DevOps overhead.
  • Security improves because physical and network security are handled by the cloud provider; developers need to focus only on securing their own code.
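The pay-per-invocation cost model lends itself to a quick back-of-envelope estimate. The helper below is a rough sketch; the default prices are illustrative assumptions (roughly in line with published AWS Lambda list pricing at the time of writing), so check your provider's current rates before trusting the numbers:

```python
def estimate_monthly_cost(invocations, avg_duration_s, memory_gb,
                          price_per_million=0.20,
                          price_per_gb_s=0.0000166667):
    """Back-of-envelope serverless cost estimate.

    Default prices are illustrative assumptions, not a current quote:
    a per-request charge plus a charge per GB-second of compute.
    """
    request_cost = invocations / 1_000_000 * price_per_million
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_s
    return request_cost + compute_cost

# 10M invocations/month, 200 ms each, 512 MB of memory.
cost = estimate_monthly_cost(10_000_000, 0.2, 0.5)
print(f"${cost:.2f}/month")  # → $18.67/month
```

At these assumed rates, even tens of millions of short invocations a month land in the tens of dollars, which is the cost gap the article is pointing at.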

Serverless Functions are well suited to many big data use cases. The drastic reduction in costs democratizes big data and allows any organization to build data-driven applications.
