2. Who are you?
• Masahiro Nakagawa
• github: @repeatedly
• Treasure Data Inc.
• Fluentd / td-agent developer
• Fluentd Enterprise support
• I love OSS :)
• D Language, MessagePack, The organizer of several meetups, etc…
3. Fluentd
• Pluggable streaming event collector
• Lightweight, robust and flexible
• Lots of plugins on rubygems
• Used by AWS, GCP, MS and more companies
• Resources
• http://www.fluentd.org/
• Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk
6. Push vs Pull
• Push:
• Easy to transfer data to multiple destinations
• Hard to control stream ratio in multiple streams
• Pull:
• Easy to control stream flow / ratio
• Should manage consumers correctly
7. There are 2 ways
• fluent-plugin-kafka
• kafka-fluentd-consumer
8. fluent-plugin-kafka
• Input / Output plugin for kafka
• https://github.com/htgc/fluent-plugin-kafka
• in_kafka, in_kafka_group, out_kafka, out_kafka_buffered
• Pros
• Easy to use and output support
• Cons
• Performance is not primary
9. Configuration example
<source>
@type kafka
topics web,system
format json
add_prefix kafka.
# more options
</source>
<match kafka.**>
@type kafka_buffered
output_data_type msgpack
default_topic metrics
compression_codec gzip
required_acks 1
</match>
https://github.com/htgc/fluent-plugin-kafka#usage
10. kafka fluentd consumer
• Stand-alone kafka consumer for fluentd
• https://github.com/treasure-data/kafka-fluentd-consumer
• Send cosumed events to fluentd’s in_forward
• Pros
• High performance and Java API features
• Cons
• Need Java runtime
11. Run consumer
• Edit log4j and fluentd-consumer properties
• Run following command:
$ java
-Dlog4j.configuration=file:///path/to/log4j.properties
-jar path/to/kafka-fluentd-consumer-0.2.1-all.jar
path/to/fluentd-consumer.properties
12. Properties example
fluentd.tag.prefix=kafka.event.
fluentd.record.format=regexp # default is json
fluentd.record.pattern=(?<text>.*) # for regexp format
fluentd.consumer.topics=app.* # can use Java Rege
fluentd.consumer.topics.pattern=blacklist # default is whitelist
fluentd.consumer.threads=5
https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties
13. With Fluentd example
<source>
@type forward
</source>
<source>
@type exec
command java -
Dlog4j.configuration=file:///path/to/
log4j.properties -jar /path/to/kafka-
fluentd-consumer-0.2.1-all.jar /path/
to/config/fluentd-
consumer.properties
tag dummy
format json
</source>
https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec
14. Conclusion
• Kafka is now becomes important component
on data platform
• Fluentd can communicate with Kafka
• Fluentd plugin and kafka consumer
• Building reliable and flexible data pipeline with
Fluentd and Kafka