ruk·si

Data Pipelines
Basics

Updated at 2017-09-16 18:02

A data pipeline, or data "bus", is the flow of your data from collection to consumption.

Collect  >  Store  >  Process  >  Store  >  Process  >  Consume
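The flow above can be sketched as a chain of small steps that only talk to each other through storage. This is a minimal illustration, not a real implementation: the `store` dictionary stands in for durable storage, and the function names, dataset keys and the word-count transform are all hypothetical.

```python
from collections import Counter


def collect(store, events):
    """Collect: append incoming events to the raw dataset."""
    store.setdefault("raw", []).extend(events)


def clean(store):
    """First Process step: read raw data, write a cleaned dataset."""
    store["clean"] = [e.strip().lower() for e in store["raw"] if e.strip()]


def aggregate(store):
    """Second Process step: read cleaned data, write per-event counts."""
    store["counts"] = dict(Counter(store["clean"]))


def consume(store):
    """Consume: read the final dataset, e.g. to visualize it."""
    return store["counts"]
```

Because every step reads from and writes to the store, any single step can be rerun or replaced without touching the others.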

In general, collecting and consuming (e.g. visualizing) data is trivial.

Focus on decoupling steps in your data pipelines. You don't want to get caught in a situation where all your data passes through a single unscalable service.
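One way to picture this decoupling is a buffer between two steps, so that neither side calls the other directly and the consuming side can be scaled by adding workers. The sketch below uses Python's standard `queue.Queue` inside one process purely as an illustration; in a real pipeline the buffer would more likely be a managed queue or stream service. The names `collect`, `process`, `run_pipeline` and the doubling transform are assumptions made for the example.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def collect(records, out_q):
    """Producer: push raw records into the buffer, then signal completion."""
    for record in records:
        out_q.put(record)
    out_q.put(_DONE)


def process(in_q, results, lock):
    """Worker: pull records from the buffer until the sentinel arrives."""
    while True:
        record = in_q.get()
        if record is _DONE:
            in_q.put(_DONE)  # re-post so sibling workers also stop
            break
        with lock:
            results.append(record * 2)  # placeholder transform


def run_pipeline(records, workers=2):
    """Run one producer and `workers` consumers around a bounded buffer."""
    buffer = queue.Queue(maxsize=100)  # bounded, so slow workers slow the producer
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=process, args=(buffer, results, lock))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    collect(records, buffer)
    for t in threads:
        t.join()
    return sorted(results)
```

Raising `workers` scales the processing step without any change to the collecting step, which is the property the decoupling is meant to buy.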

Make sure all of the steps are scalable. Also, the less management required, the better. Especially with a small team, try to use a managed service for everything you can afford.

If something feels slow or inflexible, you are probably using the wrong tool. Data structure, latency, throughput, access pattern and cost all help you decide which services, applications and design patterns to use.

Sources

  • Big Data Architectural Pattern, AWS Loft Big Data Day, 2017-09-12