🌿 Apache Druid
Apache Druid is an analytics data store for event-driven data. In more technical terms, Druid is a real-time, columnar time-series database designed to scale horizontally.
It has a web application component to visualize the data.
It is common to use Apache Kafka as a buffer and feed events from Kafka to Druid. Druid should be able to ingest any event-oriented or time-series Kafka topic without much hassle.
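To make the Kafka feed a bit more concrete, here is a minimal sketch that builds a Kafka supervisor spec as a plain Python dict. The datasource name, topic, broker address and column names are made up for illustration, and the spec only covers the basic fields, so treat it as a starting point rather than a complete configuration.

```python
import json
from typing import Any, Dict, List


def build_kafka_supervisor_spec(
    datasource: str,
    topic: str,
    bootstrap_servers: str,
    dimensions: List[str],
    timestamp_column: str = "timestamp",
    segment_granularity: str = "HOUR",
) -> Dict[str, Any]:
    """Build a minimal Kafka ingestion supervisor spec for Druid."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Which event field holds the event time, and how it is encoded.
                "timestampSpec": {"column": timestamp_column, "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": segment_granularity,
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical names, used only for illustration.
spec = build_kafka_supervisor_spec(
    datasource="user_activity",
    topic="user-activity-events",
    bootstrap_servers="kafka:9092",
    dimensions=["user_id", "action", "country"],
)
print(json.dumps(spec, indent=2))
```

The resulting JSON is what you would POST to the supervisor endpoint (`/druid/indexer/v1/supervisor`) to start the ingestion.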
Druid's join support is limited (early versions had none), so joins are usually done in a pre-processing phase, e.g. with Spark or Flink.
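The idea of joining before ingestion can be sketched in a few lines. This is plain Python standing in for a Spark or Flink job, and the field names (`user_id`, `country`, `plan`) are made up for illustration: each event is enriched with attributes from a lookup table so that Druid receives one flat, denormalized record.

```python
from typing import Any, Dict, Iterable, Iterator


def denormalize(
    events: Iterable[Dict[str, Any]],
    users: Dict[str, Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Attach user attributes to each event (a left join on user_id)."""
    for event in events:
        user = users.get(event["user_id"], {})
        yield {
            **event,
            # Missing users get a placeholder instead of dropping the event.
            "country": user.get("country", "unknown"),
            "plan": user.get("plan", "unknown"),
        }


# Hypothetical sample data.
users = {"u1": {"country": "FI", "plan": "pro"}}
events = [
    {"timestamp": "2024-03-15T13:45:00Z", "user_id": "u1", "action": "login"},
    {"timestamp": "2024-03-15T13:46:00Z", "user_id": "u2", "action": "click"},
]

flat_events = list(denormalize(events, users))
for row in flat_events:
    print(row)
```

The flat records are then what gets written to the Kafka topic that Druid reads.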
Common use-cases:
- Network flows
- User activity
- Device metrics
- Application performance
- Digital marketing advertisement data
- Business intelligence
Data is stored in segments. Segments are immutable, and you configure the time granularity at which they are created, e.g. hourly, daily, or monthly.
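To illustrate what the granularity setting means, here is a small sketch that computes the time interval an event would fall into for hourly, daily and monthly granularity. It is only an illustration of the bucketing idea, not Druid's own code, and the function name is made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def segment_interval(ts: datetime, granularity: str) -> Tuple[datetime, datetime]:
    """Return the [start, end) time bucket that contains ts."""
    if granularity == "HOUR":
        start = ts.replace(minute=0, second=0, microsecond=0)
        end = start + timedelta(hours=1)
    elif granularity == "DAY":
        start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        end = start + timedelta(days=1)
    elif granularity == "MONTH":
        start = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        # Jump safely into the next month, then snap back to its first day.
        end = (start + timedelta(days=32)).replace(day=1)
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return start, end


event_time = datetime(2024, 3, 15, 13, 45, tzinfo=timezone.utc)
for g in ("HOUR", "DAY", "MONTH"):
    start, end = segment_interval(event_time, g)
    print(g, start.isoformat(), end.isoformat())
```

All events whose timestamps land in the same bucket end up in segments for that interval, so a coarser granularity means fewer but larger segments.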
Druid installation is quite heavy. You need a SQL database for metadata, ZooKeeper, deep storage such as S3, and a bunch of servers. Kubernetes + Helm can help with the setup.