
Lambda pipelines for serverless data processing

You get tens of thousands of events every hour. How do you process them?

You've got a shitload of data. What do you do?

Users send hundreds of messages per minute. Now what?

You could learn Elixir and Erlang – purpose-built languages for message processing, used in 4G routers and packet-switching networks. But is that really where you want your career to go?

You could try Kafka or Hadoop. Tools designed for big data, used by large organizations for mind-boggling amounts of data. Are you ready for that?

Elixir, Erlang, Kafka, Hadoop and such are all wonderful tools. If you know how to use them. For most, they're a pain in the ass with a significant learning curve and plenty of devops work to keep them running.

You have to maintain servers, write code in obscure languages, and deal with problems we're trying to avoid.

Serverless data processing

What you can do instead is use your existing skills to build a data processing pipeline.

I've used this approach to process millions of daily events in production with just 0.0007% data loss. A rate of 7 events lost per 1,000,000.¹

We used it to gather business and engineering analytics. Like a distributed console.log that writes to a central database.

High level architecture for distributed logging and tracing

This system would accept batches of events, process them to add additional info about user and server state, then save each event for easy retrieval.

It got so cushy, we even used it for tracing and debugging hard-to-track-down bugs in production. Pepper your code with console.log, wait for an error, see what happened.

You can build a similar system to process almost anything.

Works great for problems you can split into independent tasks. Avoid for large interdependent operations.

Great for preparing data. Bad for machine learning.

Architectures for serverless data processing

You can think of serverless data processing as using .map and .reduce at scale. It's inspired by Google's famous MapReduce programming model – the same approach big data processing frameworks use behind the scenes.

Work happens in 3 steps:

  1. Accept chunks of data
  2. Map over your data
  3. Reduce into output format

Let's say you're building an adder. Multiply every number by 2 then sum.

Using functional programming patterns in JavaScript, you'd write something like this:

const result = [1, 2, 3, 4, 5] // input array
  .map((n) => n * 2) // multiply each by 2
  .reduce((sum, n) => sum + n, 0); // sum together

Each of those steps is independent.

The n => n*2 function doesn't need to know anything other than its n. The (sum, n) => sum+n function only needs the current sum and n.

That means you can distribute them. Run each on a separate Lambda in parallel. Thousands at a time.

You go from a slow algorithm to one as fast as a single operation. With infinite scale, you could process an array of 10,000,000 items as fast as an array of 10.

There are limitations, of course.

Lambda limits you to 1,000 concurrent invocations by default. Individual steps could be slow (like when transcoding video). And you're fundamentally limited by the reduce step.

Performance is best when reduce is unnecessary. Performance is slowest when reduce needs to iterate over every element in a single call and can't be distributed.

Our adder is commutative so we can split the reduce step into parallel operations using chunks of data.
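
Here's a local sketch of that chunked approach in plain code – each chunk could be its own Lambda invocation:

// a local sketch of the distributed adder
// works because addition is commutative and associative
const input = [1, 2, 3, 4, 5, 6, 7, 8]

// split the input into chunks of 2
const chunks: number[][] = []
for (let i = 0; i < input.length; i += 2) {
  chunks.push(input.slice(i, i + 2))
}

// map and partially reduce each chunk independently
const partialSums = chunks.map((chunk) =>
  chunk.map((n) => n * 2).reduce((sum, n) => sum + n, 0)
)

// the final reduce combines far fewer values than the original input
const result = partialSums.reduce((sum, n) => sum + n, 0) // 72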

Distributed adder architecture

In practice you'll find some transformations don't need a reduce step and some need multiple map steps. They all follow this basic architecture. :)

And for the comp sci nerds 👉 this has no impact on big-O complexity. The underlying algorithm is just as complex; you're only changing its real-world performance.

Build a distributed data processing pipeline

Let's build that adder and learn how to construct a robust massively distributed data processing pipeline. We're keeping the operation itself simple so we can focus on the architecture.

This is the JavaScript code we're distributing.

const result = [1, 2, 3, 4, 5] // input array
  .map((n) => n * 2) // multiply each by 2
  .reduce((sum, n) => sum + n, 0); // sum together

Following principles from the Architecture Principles chapter, we're building a system that is:

  • easy to understand
  • robust against errors
  • debuggable
  • replayable
  • always inspectable

We're going to use 3 Serverless Elements to get there:

  1. lambdas
  2. queues
  3. storage

You can see the full code on GitHub. I'll share the important parts here.

Throughout, I'll share some ideas on how you can make the system even more robust.

The elements

We're using 3 lambdas, 2 queues, and 2 DynamoDB tables.

3 Lambdas

  1. sumArray – this is our API-facing lambda. It accepts a request and kicks off the process
  2. timesTwo – this is the map lambda. It accepts a number, multiplies by 2, and triggers the next step.
  3. reduce – this lambda combines intermediary steps into the final result. It is the most complex.

Our lambdas are all written in TypeScript and each does just 1 part of the process. That keeps them easier to understand.

2 Queues

  1. TimesTwoQueue – holds messages from sumArray and uses them to trigger timesTwo.
  2. ReduceQueue – holds messages from timesTwo and uses them to trigger reduce. Also used by reduce to trigger itself.

We're using SQS – Simple Queue Service – queues for their reliability. An SQS queue stores messages for up to 14 days and keeps retrying your Lambda until it succeeds.

When something goes wrong, you don't have to think about it. There's almost no chance a message gets lost.

Messages only vanish when your Lambda accepts a message, fails to process it, and doesn't throw an error. SQS interprets this as success.
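
In practice that means your handler should let errors bubble up instead of swallowing them. A minimal sketch – processPacket is a hypothetical stand-in for whatever work your lambda does:

// let failures propagate so SQS retries the message
import { SQSEvent } from "aws-lambda"

export const handler = async (event: SQSEvent) => {
  for (const record of event.Records) {
    // if processPacket throws, the invocation fails and SQS
    // returns the message to the queue for another attempt
    await processPacket(JSON.parse(record.body))
  }
  // a clean return is what tells SQS the batch succeeded
  return true
}

// hypothetical worker function – whatever your pipeline does with a message
async function processPacket(packet: unknown) {
  // ...
}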

You can configure max retries and how long a message sticks around. When a message exceeds those limits, you can send it to a so-called Dead Letter Queue instead.

DLQs are useful for alternate processing of bad messages: send yourself an alert, store the message in a different table, anything that helps you debug what's going on.
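
For example, you might attach a DLQ to TimesTwoQueue like this in serverless.yml – the DLQ name and maxReceiveCount here are illustrative, not part of the example repo:

# serverless.yml – a sketch, not part of the example repo
resources:
  Resources:
    TimesTwoDLQ:
      Type: "AWS::SQS::Queue"
      Properties:
        QueueName: "TimesTwoDLQ-${self:provider.stage}"
        MessageRetentionPeriod: 1209600 # keep failed messages the full 14 days
    TimesTwoQueue:
      Type: "AWS::SQS::Queue"
      Properties:
        QueueName: "TimesTwoQueue-${self:provider.stage}"
        VisibilityTimeout: 60
        RedrivePolicy:
          deadLetterTargetArn:
            Fn::GetAtt:
              - TimesTwoDLQ
              - Arn
          maxReceiveCount: 5 # after 5 failed attempts, move the message to the DLQ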

2 tables

  1. scratchpad table for intermediary results from timesTwo. Using a table made reduce easier to implement and debug.
  2. sums table for final results

Step 1 – the API

Before you can process data, you need to get data into the system. We use a Serverless REST API for this purpose.

# serverless.yml
functions:
  sumArray:
    handler: dist/sumArray.handler
    events:
      - http:
          path: sumArray
          method: POST
          cors: true
    environment:
      timesTwoQueueURL:
        Ref: TimesTwoQueue

sumArray is an endpoint that accepts POST requests and pipes them into the sumArray.handler function. We give it the URL for our TimesTwoQueue as an environment variable.

We define the queue like this:

# serverless.yml
resources:
  Resources:
    TimesTwoQueue:
      Type: "AWS::SQS::Queue"
      Properties:
        QueueName: "TimesTwoQueue-${self:provider.stage}"
        VisibilityTimeout: 60

It's an SQS queue postfixed with the current stage so we can split between production and development.

The VisibilityTimeout says that when your Lambda accepts a message, it has 60 seconds to process it. After that SQS will assume you never actually received the message and give it to someone else.

Distributed systems are fun like that.

The sumArray.ts Lambda accepting requests looks like this:

// src/sumArray.ts
export const handler = async (event: APIGatewayEvent): Promise<APIResponse> => {
  const arrayId = uuidv4()

  if (!event.body) {
    return response(400, {
      status: "error",
      error: "Provide a JSON body",
    })
  }

  const array: number[] = JSON.parse(event.body)

  // split array into elements
  // trigger timesTwo lambda for each entry
  for (let packetValue of array) {
    await sendSQSMessage(process.env.timesTwoQueueURL!, {
      arrayId,
      packetId: uuidv4(),
      packetValue,
      arrayLength: array.length,
      packetContains: 1,
    })
  }

  return response(200, {
    status: "success",
    array,
    arrayId,
  })
}

Get API request, create an arrayId, parse JSON body, iterate over the input, return success and the new arrayId. Consumers can use this ID to later identify their result.
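
For example, a client could call the endpoint like this – the URL is made up, use whatever API Gateway gives you on deploy:

// hypothetical client call – swap in your deployed endpoint URL
async function sumMyArray() {
  const res = await fetch(
    "https://example.execute-api.us-east-1.amazonaws.com/dev/sumArray",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify([1, 2, 3, 4, 5]),
    }
  )
  const { arrayId } = await res.json()
  // use arrayId later to look up the final result in the sums table
  return arrayId
}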

Triggering the next step

We turn each element of the input array into a processing packet and send it to our SQS queue as a message.

// src/sumArray.ts
for (let packetValue of array) {
  await sendSQSMessage(process.env.timesTwoQueueURL!, {
    arrayId,
    packetId: uuidv4(),
    packetValue,
    arrayLength: array.length,
    packetContains: 1,
  })
}

Each packet contains:

  • arrayId to identify which input it belongs to
  • packetId to identify the packet itself
  • packetValue as the value. You could use this to store entire JSON blobs or whatever is meaningful to you.
  • arrayLength to help reduce know how many packets to expect
  • packetContains to help reduce know when it's done

Almost everything here is metadata to help our processing pipeline. packetValue is the real data.
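
For reference, a Packet type covering those fields might look like this – the example repo defines its own version, this is just a sketch:

// a sketch of the Packet type the pipeline passes around
type Packet = {
  arrayId: string // which input array this packet belongs to
  packetId: string // unique id for this packet
  packetValue: number // the actual data
  arrayLength: number // how many values the original input had
  packetContains: number // how many original values this packet aggregates
}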

sendSQSMessage is a helper method that sends an SQS message using the AWS SDK.
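
The real helper lives in the example repo. A minimal version, assuming the AWS SDK v2 SQS client, could look like this:

// a sketch of sendSQSMessage using the AWS SDK v2
import { SQS } from "aws-sdk"

const sqs = new SQS()

export async function sendSQSMessage(QueueUrl: string, body: unknown) {
  // SQS messages are plain strings, so we JSON-encode the payload
  await sqs
    .sendMessage({
      QueueUrl,
      MessageBody: JSON.stringify(body),
    })
    .promise()
}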

Step 2 – the map

Our map step uses the timesTwo lambda, which handles each packet in isolation.

# serverless.yml
functions:
  timesTwo:
    handler: dist/timesTwo.handler
    events:
      - sqs:
          arn:
            Fn::GetAtt:
              - TimesTwoQueue
              - Arn
          batchSize: 1
    environment:
      reduceQueueURL:
        Ref: ReduceQueue

timesTwo handles events from the TimesTwoQueue using the timesTwo.handler method. It gets the reduceQueueURL as an environment variable so it can trigger the next step.

AWS and SQS ensure our lambda gets called and keeps retrying when something goes wrong. Perfect to let you re-deploy when there's a bug ✌️

The lambda itself looks like this:

// src/timesTwo.ts
export const handler = async (event: SQSEvent) => {
  // grab messages from queue
  // depending on batchSize there could be multiple
  let packets: Packet[] = event.Records.map((record: SQSRecord) =>
    JSON.parse(record.body)
  )

  // iterate packets and multiply by 2
  // this would be a more expensive operation usually
  packets = packets.map((packet) => ({
    ...packet,
    packetValue: packet.packetValue * 2,
  }))

  // store each result in scratchpad table
  // in theory it's enough to put them on the queue
  // an intermediary table makes the reduce step easier to implement
  await Promise.all(
    packets.map((packet) =>
      db.updateItem({
        TableName: process.env.SCRATCHPAD_TABLE!,
        Key: { arrayId: packet.arrayId, packetId: packet.packetId },
        UpdateExpression:
          "SET packetValue = :packetValue, arrayLength = :arrayLength, packetContains = :packetContains",
        ExpressionAttributeValues: {
          ":packetValue": packet.packetValue,
          ":arrayLength": packet.arrayLength,
          ":packetContains": packet.packetContains,
        },
      })
    )
  )

  // trigger next step in calculation
  const uniqueArrayIds = Array.from(
    new Set(packets.map((packet) => packet.arrayId))
  )
  await Promise.all(
    uniqueArrayIds.map((arrayId) =>
      sendSQSMessage(process.env.reduceQueueURL!, arrayId)
    )
  )

  return true
}

Accept an event from SQS, parse JSON body, do the work, store intermediary results, trigger reduce step for each input.

Depending on your batchSize config, SQS might call your lambda with multiple messages. This is to help you optimize cost and find the right balance between number of executions and execution time.

Triggering the next step

Triggering the next step happens in 2 parts here:

  1. store intermediary result
  2. trigger next lambda

Storing intermediary results makes implementing the next lambda easier and means we're kind of implementing a queue on top of DynamoDB. That's okay because it also makes the system easier to debug.

Something went weird? Check the intermediary table, see what's up.

To store results, we iterate over them, fire a DynamoDB updateItem query, and await all queries to finish.

// src/timesTwo.ts
// store each result in scratchpad table
// in theory it's enough to put them on the queue
// an intermediary table makes the reduce step easier to implement
await Promise.all(
  packets.map((packet) =>
    db.updateItem({
      TableName: process.env.SCRATCHPAD_TABLE!,
      Key: { arrayId: packet.arrayId, packetId: packet.packetId },
      UpdateExpression:
        "SET packetValue = :packetValue, arrayLength = :arrayLength, packetContains = :packetContains",
      ExpressionAttributeValues: {
        ":packetValue": packet.packetValue,
        ":arrayLength": packet.arrayLength,
        ":packetContains": packet.packetContains,
      },
    })
  )
)

Each entry in this table is uniquely identified with a combination of arrayId and packetId.
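
That maps to a composite primary key – arrayId as the partition key, packetId as the sort key. A sketch of how the table might be defined in serverless.yml (the name and billing mode are illustrative):

# serverless.yml – illustrative table definition
resources:
  Resources:
    ScratchpadTable:
      Type: "AWS::DynamoDB::Table"
      Properties:
        TableName: "ScratchpadTable-${self:provider.stage}"
        BillingMode: PAY_PER_REQUEST
        AttributeDefinitions:
          - AttributeName: arrayId
            AttributeType: S
          - AttributeName: packetId
            AttributeType: S
        KeySchema:
          - AttributeName: arrayId
            KeyType: HASH # partition key
          - AttributeName: packetId
            KeyType: RANGE # sort key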

Triggering the next step happens in another loop.

// src/timesTwo.ts
// trigger next step in calculation
const uniqueArrayIds = Array.from(
  new Set(packets.map((packet) => packet.arrayId))
)
await Promise.all(
  uniqueArrayIds.map((arrayId) =>
    sendSQSMessage(process.env.reduceQueueURL!, arrayId)
  )
)

We use an ES6 Set to get a list of unique array ids from our input messages. You never know what gets jumbled up on the queue – a single batch might contain packets from multiple input arrays.

Distributed systems are fun :)

For each unique input, we trigger a reduce lambda via ReduceQueue. Having a single reduce stream makes some things easier for this example. You should go for parallelism in real life.

Step 3 – reduce

Combining intermediary steps into the final result is the most complex part of our example.

The simplest approach is to take the entire input and combine it in 1 step. With large datasets that becomes impossible, especially if combining elements is a costly operation.

Instead, you can combine 2 elements at a time and run your reduce step in parallel.

Fine-tuning for number of invocations and speed of execution will land you on an N somewhere between 2 and all of them. The best number depends on what exactly you're doing.

Our reduce function uses the scratchpad table as a queue:

  1. Take 2 elements
  2. Combine
  3. Write the new element
  4. Delete the 2 originals

Like this:

How rows in the scratchpad table get combined

At each invocation we take 2 packets like this:

{
  arrayId: // ...
  packetId: // ...
  packetValue: 2,
  packetContains: 1,
  arrayLength: 10
}

{
  arrayId: // ...
  packetId: // ...
  packetValue: 4,
  packetContains: 1,
  arrayLength: 10
}

And combine them into a new packet

// +
{
  arrayId: // ...
  packetId: // ...
  packetValue: 6,
  packetContains: 2,
  arrayLength: 10
}

When the count of included packets – packetContains – matches the total array length – arrayLength – we know this is the final result and write it into the results table.

Like I said, this is the complicated bit.

The reduce step code

In code, the reduce step starts like any other lambda:

// src/reduce.ts
export const handler = async (event: SQSEvent) => {
  // grab messages from queue
  // depending on batchSize there could be multiple
  let arrayIds: string[] = event.Records.map((record: SQSRecord) =>
    JSON.parse(record.body)
  )

  // process each ID from batch
  await Promise.all(arrayIds.map(reduceArray))
}

Grab arrayIds from the SQS event and wait until all reduceArray calls are done.

Then the reduceArray function does its job:

// src/reduce.ts
async function reduceArray(arrayId: string) {
  // grab 2 entries from scratchpad table
  // IRL you'd grab as many as you can cost-effectively process per execution
  // depends what you're doing
  const packets = await readPackets(arrayId)

  if (packets.length > 0) {
    // sum packets together
    const sum = packets.reduce(
      (sum: number, packet: Packet) => sum + packet.packetValue,
      0
    )

    // add the new sum packet to scratchpad table
    // we do this first so we don't delete rows if it fails
    const newPacket = {
      arrayId,
      packetId: uuidv4(),
      arrayLength: packets[0].arrayLength,
      packetValue: sum,
      packetContains: packets.reduce(
        (count: number, packet: Packet) => count + packet.packetContains,
        0
      ),
    }
    await db.updateItem({
      TableName: process.env.SCRATCHPAD_TABLE!,
      Key: {
        arrayId,
        packetId: newPacket.packetId,
      },
      UpdateExpression:
        "SET packetValue = :packetValue, arrayLength = :arrayLength, packetContains = :packetContains",
      ExpressionAttributeValues: {
        ":packetValue": newPacket.packetValue,
        ":arrayLength": newPacket.arrayLength,
        ":packetContains": newPacket.packetContains,
      },
    })

    // delete the 2 rows we just summed
    await cleanup(packets)

    // are we done?
    if (newPacket.packetContains >= newPacket.arrayLength) {
      // done, save sum to final table
      await db.updateItem({
        TableName: process.env.SUMS_TABLE!,
        Key: {
          arrayId,
        },
        UpdateExpression: "SET resultSum = :resultSum",
        ExpressionAttributeValues: {
          ":resultSum": sum,
        },
      })
    } else {
      // not done, trigger another reduce step
      await sendSQSMessage(process.env.reduceQueueURL!, arrayId)
    }
  }
}

The reduceArray function gets 2 entries from the DynamoDB table² and combines them into a new packet.

This packet must have a new packetId. It's a new entry in the faux processing queue on DynamoDB.

We insert that new packet and, if the insert succeeds, remove the previous 2 elements using their arrayId/packetId combination.

If our new packetContains is greater than or equal to the total arrayLength, we know the computation is now complete. Write results to the final table.
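
readPackets and cleanup are helpers from the example repo. Minimal sketches of both, assuming the DynamoDB DocumentClient and the Packet type from earlier:

// sketches of the readPackets and cleanup helpers
import { DynamoDB } from "aws-sdk"

const client = new DynamoDB.DocumentClient()

// grab up to 2 packets for this array from the scratchpad table
async function readPackets(arrayId: string): Promise<Packet[]> {
  const result = await client
    .query({
      TableName: process.env.SCRATCHPAD_TABLE!,
      KeyConditionExpression: "arrayId = :arrayId",
      ExpressionAttributeValues: { ":arrayId": arrayId },
      Limit: 2, // only read what we're about to combine
    })
    .promise()
  return (result.Items || []) as Packet[]
}

// delete the packets we just combined, by their arrayId/packetId key
async function cleanup(packets: Packet[]) {
  await Promise.all(
    packets.map((packet) =>
      client
        .delete({
          TableName: process.env.SCRATCHPAD_TABLE!,
          Key: { arrayId: packet.arrayId, packetId: packet.packetId },
        })
        .promise()
    )
  )
}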

Conclusion

Lambda processing pipelines are a powerful tool that can help you process large amounts of data or user events in near real-time. I have yet to find a way to swamp one of these in production hard enough that it started lagging.

Slowdowns usually happen when you have to introduce throttling because a downstream system can't take that much load.

✌️


  1. Our data loss happened because of our use of Postgres. In some cases it would fail to insert a row without throwing an error. We mitigated that with workarounds, but some of the problem remained. At that point we decided it was okay enough.
  2. We use a Limit: 2 argument to limit how much of the table we scan through. Don't need more than 2, don't read more than 2. Keeps everything snappy.
