2. Hello!
- “Committer” of Fluentd
- Treasure Data, Inc.
- Former Algorithmic Trader
- Stanford Math and CS
3. Table of Contents
1. Why you should care
2. Data Collection v. Data Ingestion
3. Examples: Data Collection Tools
4. Examples: Data Ingestion Tools
5. Case Study: Async App Logging
Links to be added after the talk.
8. Data Collection
- Happens where data originates
- “logging code”
- Batch v. Streaming
- Pull v. Push
log.error("FUUUUU....WHY!?")
cln.send({"uid":1,"action":"died"})
200 GET a.com/?utm=big%20data
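To make "Batch v. Streaming" concrete for push-style logging code, here is a minimal sketch in Python. The class names StreamingSender and BatchSender, the transport callable, and the batch_size parameter are all hypothetical and chosen for illustration; they do not come from any of the tools discussed in this talk.

```python
import json


class StreamingSender:
    """Push model, streaming: forward each event as soon as it is emitted."""

    def __init__(self, transport):
        # transport is a hypothetical callable that accepts a list of JSON strings
        self.transport = transport

    def send(self, event):
        self.transport([json.dumps(event)])


class BatchSender:
    """Push model, batch: buffer events and forward them in chunks."""

    def __init__(self, transport, batch_size=3):
        self.transport = transport
        self.batch_size = batch_size
        self.buffer = []

    def send(self, event):
        self.buffer.append(json.dumps(event))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Hand over whatever is buffered, then start a fresh buffer
        if self.buffer:
            self.transport(self.buffer)
            self.buffer = []


def demo():
    streamed, batched = [], []  # each entry is one delivery to the transport
    s = StreamingSender(streamed.append)
    b = BatchSender(batched.append, batch_size=3)
    for uid in range(1, 5):
        event = {"uid": uid, "action": "died"}
        s.send(event)
        b.send(event)
    b.flush()  # deliver the partial final batch
    return streamed, batched
```

With four events, the streaming sender makes four deliveries of one event each, while the batch sender makes two deliveries (three events, then one). The trade-off sketched here is latency against per-delivery overhead.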
14. Flume-ng
- Successor to Flume; written and maintained by Cloudera
- Commercial support from Cloudera; proven track record with Hadoop
- Java can be heavyweight for some orgs/cases
15. Logstash
- Pluggable architecture, rich ecosystem
- The “L” of the ELK stack by Elastic
- JRuby
- HA uses Redis as a queue
http://apuntesdetrabajo.es/?p=263
16. Heka
- Developed at Mozilla
- Written in Go, extensible w/ Lua
- Plugin system, but plugins must be compiled in (a Go limitation that may change)
17. Fluentd
- Plugin architecture
- Built-in HA
- CRuby (JRuby on the roadmap)
- google-fluentd, td-agent
- Lightweight multi-source, multi-destination log routing
21. Kafka
- Started at LinkedIn, now backed by Confluent
- Topic-based message broker: producer/broker/consumer
- Distributed design
- At-least-once or at-most-once delivery, depending on how consumers commit offsets
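The difference between at-least-once and at-most-once on the consumer side comes down to whether the offset is committed before or after a message is processed. The following is a pure-Python simulation of that ordering, not Kafka client code; the functions consume and run and the crash_on parameter are hypothetical names used only for this sketch.

```python
def consume(messages, start, commit_first, crash_on=None):
    """Simulate one consumer run; return (processed, committed_offset).

    commit_first=True  -> at-most-once  (commit the offset, then process)
    commit_first=False -> at-least-once (process, then commit the offset)
    The consumer "crashes" between the two steps when it handles crash_on.
    """
    processed = []
    committed = start
    steps = ["commit", "process"] if commit_first else ["process", "commit"]
    for offset in range(start, len(messages)):
        msg = messages[offset]
        for i, step in enumerate(steps):
            if step == "commit":
                committed = offset + 1
            else:
                processed.append(msg)
            if i == 0 and msg == crash_on:
                # Simulated crash after the first step, before the second
                return processed, committed
    return processed, committed


def run(commit_first):
    messages = ["a", "b", "c"]
    first, committed = consume(messages, 0, commit_first, crash_on="b")
    # Restart from the last committed offset, this time without a crash
    second, _ = consume(messages, committed, commit_first)
    return first + second
```

Under these assumptions, run(True) yields ["a", "c"] (message "b" is lost, so each message is seen at most once), while run(False) yields ["a", "b", "b", "c"] (message "b" is processed twice, so each message is seen at least once).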
22. Fluentd!?
- Used (abused?) as a bus/MQ
- tag-based event routing
- Can be combined with RabbitMQ/Kafka, etc.
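Tag-based event routing can be sketched in a few lines of Python. This is a simplified model loosely following Fluentd's match patterns, where "*" matches exactly one tag part, "**" matches zero or more tag parts, and the first matching rule wins; it is not Fluentd's implementation, and the Router class and tag_matches function are hypothetical names for illustration only.

```python
def tag_matches(pattern, tag):
    """Return True if a dot-separated tag matches a dot-separated pattern."""
    return _match(pattern.split("."), tag.split("."))


def _match(pat, parts):
    if not pat:
        return not parts
    head, rest = pat[0], pat[1:]
    if head == "**":
        # "**" may consume zero or more tag parts
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        # "*" consumes exactly one tag part
        return _match(rest, parts[1:])
    return False


class Router:
    def __init__(self):
        self.rules = []  # (pattern, handler) pairs, checked in order

    def match(self, pattern, handler):
        self.rules.append((pattern, handler))

    def emit(self, tag, record):
        for pattern, handler in self.rules:
            if tag_matches(pattern, tag):
                handler(tag, record)
                return True  # first matching rule wins
        return False


def demo():
    errors, app, rest = [], [], []
    router = Router()
    router.match("app.error", lambda tag, rec: errors.append(rec))
    router.match("app.*", lambda tag, rec: app.append(rec))
    router.match("**", lambda tag, rec: rest.append(rec))
    router.emit("app.error", {"uid": 1, "action": "died"})
    router.emit("app.login", {"uid": 2})
    router.emit("nginx.access", {"path": "/"})
    return errors, app, rest
```

In this sketch, each handler plays the role of a destination. Because every event carries a tag and rules are evaluated in order, one process can fan events out to several destinations, which is what makes the "bus/MQ" usage possible.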