The Zen of Data Pipelines is under development and review. If you have any ideas, criticism or suggestions, or have spotted a typo, please open an issue in the book's GitHub repository.
The Zen of Data Pipelines is a compilation of essays about how to design, implement and maintain scalable data pipelines. The ideas and best practices described in the essays were learned while building a scalable and robust data pipeline from the bottom up.
The main goal of this book is to help you make good decisions early and to foster discussion of details that engineering teams often overlook. By following these principles, you'll more easily dodge the roadblocks that appear when handling large amounts of data, bogus data input, security requirements and system complexity.
The essays in this book discuss:
- How to design your data pipeline so that scaling it is a no-brainer;
- How to design the system to handle and recover from bogus data input;
- Strategies to store the data so it can be easily ingested by different services;
- How to manage the configuration of processing services;
- Autoscaling best practices that will save you money;
- How to create a development environment in which the data pipeline stays easy to develop and test as the system grows;
- and more...
Keep a copy of this book by your side while thinking about and discussing the architecture of your data system!
A warm thank you to everyone on the RDS backend team. Thank you for the support and the great times while designing and developing the data pipeline and all supporting services for F-Secure's Rapid Detection Service.
Thank you, Maiia, for reviewing the essays.