SlideShare a Scribd company logo
1 of 43
Download to read offline
The basics of Fluentd

            Masahiro Nakagawa
                  Treasuare Data, Inc.
             Senior Software Engineer
Structured logging

                       Reliable forwarding

http://ļ¬‚uentd.org/   Pluggable architecture
Agenda



>   Background
>   Overview
>   Product Comparison
>   Use cases
Background
Data Processing
                       Data source


   Collect   Store   Process   Visualize




 Reporting
Monitoring
Related Products
            easier & shorter time


  Collect   Store Process            Visualize




???         Cloudera                Excel
            Horton Works            Tableau
            Treasure Data           R
Before Fluentd
 Server1           Server2               Server3

Application       Application          Application


           ļ½„ļ½„ļ½„               ļ½„ļ½„ļ½„                   ļ½„ļ½„ļ½„




                                   High Latency!
                                   must wait for a day...
                  Fluent
                 Log Server
After Fluentd
 Server1                Server2              Server3

Application        Application              Application


 Fluentd   ļ½„ļ½„ļ½„          Fluentd   ļ½„ļ½„ļ½„        Fluentd   ļ½„ļ½„ļ½„




                                        In streaming!

              Fluentd             Fluentd
Overview
In short


>   Open sourced log collector written in Ruby
>   Using rubygems ecosystem for plugins


    Itā€™s like syslogd, but
uses JSON for log messages
Time       2012-12-11 07:26:27
     Apache                                                                Tag          apache.log
                                                                          Record {
                                                                                     "host": "127.0.0.1",
                                                                      tail           "method": "GET",
                                                                                     "path": "/",
                     write                                                           ...
                                                                                 }
                                                                                           insert
127.0.0.1
127.0.0.1
127.0.0.1
            -
            -
            -
                -
                -
                -
                    [11/Dec/2012:07:26:27]
                    [11/Dec/2012:07:26:30]
                    [11/Dec/2012:07:26:32]
                                             "GET
                                             "GET
                                             "GET
                                                    /
                                                    /
                                                    /
                                                        ...
                                                        ...
                                                        ...
                                                                     Fluentd
127.0.0.1   -   -   [11/Dec/2012:07:26:40]   "GET   /   ...
127.0.0.1   -   -   [11/Dec/2012:07:27:01]   "GET   /   ...
                             ...



                                                               event
                                                              buffering
                                                                                     Mongo
Event structure(log message)


āœ“ Time                    āœ“ Record
>   second unit           >   JSON format
>   from data source or       >   MessagePack
    adding parsed time            internally
āœ“ Tag                         >   non-unstructured
>   for message routing
Architecture
Pluggable      Pluggable   Pluggable



  Input         Buffer     Output

> Forward      > Memory    > Forward
> HTTP         > File      > File
> File tail                > Amazon S3
> dstat                    > MongoDB
> ...                      > ...
Client libraries
> Ruby
> Java           Application
> Perl
> PHP
                                  Time:Tag:Record
> Python
>D
> Scala
> ...
                 Fluentd
# Ruby
Fluent.open(ā€œmyappā€)
Fluent.event(ā€œloginā€, {ā€œuserā€ => 38})
#=> 2012-12-11 07:56:01 myapp.login     {ā€œuserā€:38}
Conļ¬guration and operation


>   No central / master node
    >   HTTP include helps conf sharing
>   Operation depends on your environment
    >   Use your deamon management
    >   Use chef in Treasure Data
>   Scribe like syntax
# receive events via HTTP       # save alerts to a ļ¬le
<source>                        <match alert.**>
 type http                       type ļ¬le
 port 8888                       path /var/log/ļ¬‚uent/alerts
</source>                       </match>

# read logs from a ļ¬le          # forward other logs to servers
<source>                        <match **>
 type tail                       type forward
 path /var/log/httpd.log         <server>
 format apache                     host 192.168.0.11
 tag apache.access                 weight 20
</source>                        </server>
                                 <server>
# save access logs to MongoDB      host 192.168.0.12
<match apache.access>              weight 60
 type mongo                      </server>
 database apache                </match>
 collection log
</match>                        include http://example.com/conf
Reliability (core + plugin)

>   Buffering
    >   Use file buffer for persistent data
    >   buffer chunk has ID for idempotent
>   Retrying
>   Error handling
    >   transaction, failover, etc on forward plugin
    >   secondary
Plugins - use rubygems


$ fluent-gem search -rd fluent-plugin


$ fluent-gem search -rd fluent-mixin


$ fluent-gem install fluent-plugin-mongo

ā€» Today, donā€™t talk the plugin development
http://fluentd.org/plugin/
in_tail


     Apache                    Fluentd
                                                āœ“ read a log ļ¬le
                                                āœ“ custom regexp
                                                āœ“ custom parser in Ruby
       access.log

Supported format:
 >   apache         >   json
 >   apache2        >   csv
 >   syslog         >   tsv
 >   nginx          >   ltsv (since v0.10.32)
out_mongo


  Apache        Fluentd



   access.log      buffer


                            āœ“ retry automatically
                            āœ“ exponential retry wait
                            āœ“ persistent on a ļ¬le
out_webhdfs                                 āœ“ custom text formatter



  Apache                   Fluentd



   access.log                    buffer
                                                  HDFS


    āœ“ slice ļ¬les based on time            āœ“ retry automatically
      2013-01-01/01/access.log.gz         āœ“ exponential retry wait
      2013-01-01/02/access.log.gz         āœ“ persistent on a ļ¬le
      2013-01-01/03/access.log.gz
      ...
out_copy + other plugins
                                       Hadoop
  Apache        Fluentd



   access.log      buffer

                                    Amazon S3

                          āœ“ routing based on tags
                          āœ“ copy to multiple storages
out_forward            āœ“ automatic fail-over
                       āœ“ load balancing

                                       Fluentd
    apache
  Apache        Fluentd
                                       Fluentd

                                       Fluentd
   access.log      buffer


                            āœ“ retry automatically
                            āœ“ exponential retry wait
                            āœ“ persistent on a ļ¬le
Forward topology


Fluentd   send/ack


Fluentd          Fluentd   send/ack

                                 Fluentd
Fluentd
                 Fluentd

Fluentd
Access logs                               Alerting
 Apache                                     Nagios

App logs                                  Analysis
 Frontend                                  MongoDB
 Backend                                   MySQL

System logs                                Hadoop
  syslogd
              filter / buffer / routing   Archiving
Databases                                  Amazon S3
Access logs                               Alerting
 Apache                                     Nagios

App logs                                  Analysis
 Frontend                                  MongoDB
 Backend                                   MySQL

System logs                                Hadoop
  syslogd
              filter / buffer / routing   Archiving
Databases                                  Amazon S3
Access logs                               Alerting
 Apache                                     Nagios

App logs                                  Analysis
 Frontend                                  MongoDB
 Backend                                   MySQL

System logs                                Hadoop
  syslogd
              filter / buffer / routing   Archiving
Databases                                  Amazon S3
td-agent

>   Open sourced distribution package of fluentd
>   ETL part of Treasure Data
>   Including useful components
    >   ruby, jemalloc, fluentd
    >   3rd party gems: td, mongo, webhdfs, etc...
        td plugin is for TD

>   http://packages.treasure-data.com/
v11

>   Breaking source code compatibility
    >   Not protocol.
>   Windows support
>   Error handling enhancement
>   Better DSL configuration
>   etc: https://gist.github.com/frsyuki/2638703
Product Comparison
Scribe
     Scribe: log collector by Facebook
Pros and Cons

>
ā—   Pros
    >   Fast (written in C++)
>
ā—   Cons
    >   Hard to install and extend
        Are you a C++ magician?
    >   Deal with unstructured logs
    >   No longer maintained
        Replaced with Calligraphus at Facebook
Flume
Flume: distributed log collector by Cloudera

  Phisical           Flume Master
 Topology

             Flume      Flume       Flume



  Logical
 Topology                                      Hadoop
                                                HDFS
Network topology
                    Master
           Agent                        ack

           Agent   Collector
Flume OG                                Collector
           Agent   Collector     send
           Agent

                    Master     Option
           Agent
                                 send/ack
           Agent   Collector
Flume NG                                Collector
           Agent   Collector
           Agent
Pros and Cons

>
ā—   Pros
    >    Using central master to manage all nodes
>
ā—   Cons
    >    Java culture (Pros for Java-er?)
        Difficult configuration and setup
    >    Difficult topology
    >    Mainly for Hadoop
        less plugins?
Use cases
Treasure Data

   Frontend                           Worker
                          Job Queue                      Hadoop




                                                         Hadoop

 Applications push
 metrics to Fluentd
                                                   sums up data minutes
 (via local Fluentd)       Fluentd    Fluentd      (partial aggregation)


       Treasure                                 Librato
           Data                                 Metrics
for historical analysis                         for realtime analysis
Cookpad
hundreds of app servers


  Rails app           td-agent
               sends event logs                            Daily/Hourly      Google
                                                           Batch           Spreadsheet

  Rails app           td-agent             Treasure Data
               sends event logs
                                                                            MySQL

  Rails app           td-agent
                                  Logs are available
               sends event logs
                                  after several mins.

                                                                     KPI
                                  Feedback rankings        visualization
  Unlimited scalability
  Flexible schema
  Realtime
  Less performance impact               āœ“ Over 100 RoR servers (2012/2/4)
NHN Japan
                                                                                       Archive
                                                                                       Storage
        Web
       Servers                               Fluentd                                  (scribed)
                                             Cluster
                                                                                      Notiļ¬cations
                               STREAM                                                    (IRC)
                                                           Fluentd
                                                           Watchers
                                                                                          Graph
                                                                                          Tools

                               webhdfs                                        SCHEDULED
                                                              BATCH              BATCH
āœ“ 16 nodes               Hadoop Cluster               hive
                                                     server
āœ“ 120,000+ lines/sec
                             CDH4                                        Shib              ShibUI
āœ“ 400Mbps at peak                                  Huahin
āœ“ 1.5+ TB/day (raw)      (HDFS, YARN)              Manager

           http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 by @tagomoris
Other companies
Conclusion


>   Fluentd is a widely-used log collector
    >   There are many use cases
    >   Many contributors and plugins
>   Keep it simple
    >   Easy to integrate your environment

More Related Content

What's hot

Apache kafka ėŖØė‹ˆķ„°ė§ģ„ ģœ„ķ•œ Metrics ģ“ķ•“ ė° ģµœģ ķ™” ė°©ģ•ˆ
Apache kafka ėŖØė‹ˆķ„°ė§ģ„ ģœ„ķ•œ Metrics ģ“ķ•“ ė° ģµœģ ķ™” ė°©ģ•ˆApache kafka ėŖØė‹ˆķ„°ė§ģ„ ģœ„ķ•œ Metrics ģ“ķ•“ ė° ģµœģ ķ™” ė°©ģ•ˆ
Apache kafka ėŖØė‹ˆķ„°ė§ģ„ ģœ„ķ•œ Metrics ģ“ķ•“ ė° ģµœģ ķ™” ė°©ģ•ˆ
SANG WON PARK
Ā 
Docker 101: Introduction to Docker
Docker 101: Introduction to DockerDocker 101: Introduction to Docker
Docker 101: Introduction to Docker
Docker, Inc.
Ā 

What's hot (20)

cLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHousecLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHouse
Ā 
Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12
Ā 
nexus helm ģ„¤ģ¹˜, docker/helm repo ģ„¤ģ •ź³¼ ģ˜ˆģ œ
nexus helm ģ„¤ģ¹˜, docker/helm repo ģ„¤ģ •ź³¼ ģ˜ˆģ œnexus helm ģ„¤ģ¹˜, docker/helm repo ģ„¤ģ •ź³¼ ģ˜ˆģ œ
nexus helm ģ„¤ģ¹˜, docker/helm repo ģ„¤ģ •ź³¼ ģ˜ˆģ œ
Ā 
Inside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseInside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source Database
Ā 
Cloud Monitoring with Prometheus
Cloud Monitoring with PrometheusCloud Monitoring with Prometheus
Cloud Monitoring with Prometheus
Ā 
Airflowė„¼ ģ“ģš©ķ•œ ė°ģ“ķ„° Workflow ź“€ė¦¬
Airflowė„¼ ģ“ģš©ķ•œ  ė°ģ“ķ„° Workflow ź“€ė¦¬Airflowė„¼ ģ“ģš©ķ•œ  ė°ģ“ķ„° Workflow ź“€ė¦¬
Airflowė„¼ ģ“ģš©ķ•œ ė°ģ“ķ„° Workflow ź“€ė¦¬
Ā 
[ģ˜¤ķ”ˆģ†ŒģŠ¤ģ»Øģ„¤ķŒ…] ģæ ė²„ė„¤ķ‹°ģŠ¤ģ™€ ģæ ė²„ė„¤ķ‹°ģŠ¤ on ģ˜¤ķ”ˆģŠ¤ķƒ ė¹„źµ ė° źµ¬ģ¶• ė°©ė²•
[ģ˜¤ķ”ˆģ†ŒģŠ¤ģ»Øģ„¤ķŒ…] ģæ ė²„ė„¤ķ‹°ģŠ¤ģ™€ ģæ ė²„ė„¤ķ‹°ģŠ¤ on ģ˜¤ķ”ˆģŠ¤ķƒ ė¹„źµ  ė° źµ¬ģ¶• ė°©ė²•[ģ˜¤ķ”ˆģ†ŒģŠ¤ģ»Øģ„¤ķŒ…] ģæ ė²„ė„¤ķ‹°ģŠ¤ģ™€ ģæ ė²„ė„¤ķ‹°ģŠ¤ on ģ˜¤ķ”ˆģŠ¤ķƒ ė¹„źµ  ė° źµ¬ģ¶• ė°©ė²•
[ģ˜¤ķ”ˆģ†ŒģŠ¤ģ»Øģ„¤ķŒ…] ģæ ė²„ė„¤ķ‹°ģŠ¤ģ™€ ģæ ė²„ė„¤ķ‹°ģŠ¤ on ģ˜¤ķ”ˆģŠ¤ķƒ ė¹„źµ ė° źµ¬ģ¶• ė°©ė²•
Ā 
Monitoring with prometheus
Monitoring with prometheusMonitoring with prometheus
Monitoring with prometheus
Ā 
Apache kafka ėŖØė‹ˆķ„°ė§ģ„ ģœ„ķ•œ Metrics ģ“ķ•“ ė° ģµœģ ķ™” ė°©ģ•ˆ
Apache kafka ėŖØė‹ˆķ„°ė§ģ„ ģœ„ķ•œ Metrics ģ“ķ•“ ė° ģµœģ ķ™” ė°©ģ•ˆApache kafka ėŖØė‹ˆķ„°ė§ģ„ ģœ„ķ•œ Metrics ģ“ķ•“ ė° ģµœģ ķ™” ė°©ģ•ˆ
Apache kafka ėŖØė‹ˆķ„°ė§ģ„ ģœ„ķ•œ Metrics ģ“ķ•“ ė° ģµœģ ķ™” ė°©ģ•ˆ
Ā 
Docker 101: Introduction to Docker
Docker 101: Introduction to DockerDocker 101: Introduction to Docker
Docker 101: Introduction to Docker
Ā 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
Ā 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
Ā 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
Ā 
Twitterģ˜ snowflake ģ†Œź°œ ė° ķ™œģš©
Twitterģ˜ snowflake ģ†Œź°œ ė° ķ™œģš©Twitterģ˜ snowflake ģ†Œź°œ ė° ķ™œģš©
Twitterģ˜ snowflake ģ†Œź°œ ė° ķ™œģš©
Ā 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
Ā 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
Ā 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
Ā 
Logs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK StackLogs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK Stack
Ā 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Ā 
Best practices for ansible
Best practices for ansibleBest practices for ansible
Best practices for ansible
Ā 

Viewers also liked

Life of an Fluentd event
Life of an Fluentd eventLife of an Fluentd event
Life of an Fluentd event
Kiyoto Tamura
Ā 
Fluentd in Co-Work
Fluentd in Co-WorkFluentd in Co-Work
Fluentd in Co-Work
Makoto Haruyama
Ā 
Should we write such like plugin or not?
Should we write such like plugin or not?Should we write such like plugin or not?
Should we write such like plugin or not?
SATOSHI TAGOMORI
Ā 
Fluentd message forwarding with authentication and encryption
Fluentd message forwarding with authentication and encryptionFluentd message forwarding with authentication and encryption
Fluentd message forwarding with authentication and encryption
SATOSHI TAGOMORI
Ā 
å¦‚ä½•é€‰ę‹© Docker ē›‘ęŽ§ę–¹ę”ˆ
å¦‚ä½•é€‰ę‹© Docker ē›‘ęŽ§ę–¹ę”ˆå¦‚ä½•é€‰ę‹© Docker ē›‘ęŽ§ę–¹ę”ˆ
å¦‚ä½•é€‰ę‹© Docker ē›‘ęŽ§ę–¹ę”ˆ
Leo Zhou
Ā 

Viewers also liked (20)

Life of an Fluentd event
Life of an Fluentd eventLife of an Fluentd event
Life of an Fluentd event
Ā 
Fluentd Hacking Guide at RubyKaigi 2014
Fluentd Hacking Guide at RubyKaigi 2014Fluentd Hacking Guide at RubyKaigi 2014
Fluentd Hacking Guide at RubyKaigi 2014
Ā 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API Details
Ā 
Fluentd in Co-Work
Fluentd in Co-WorkFluentd in Co-Work
Fluentd in Co-Work
Ā 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and Kafka
Ā 
Should we write such like plugin or not?
Should we write such like plugin or not?Should we write such like plugin or not?
Should we write such like plugin or not?
Ā 
Fluentd vs. Logstash for OpenStack Log Management
Fluentd vs. Logstash for OpenStack Log ManagementFluentd vs. Logstash for OpenStack Log Management
Fluentd vs. Logstash for OpenStack Log Management
Ā 
Fluentd/LogStash + elastic search + kibana
Fluentd/LogStash + elastic search + kibanaFluentd/LogStash + elastic search + kibana
Fluentd/LogStash + elastic search + kibana
Ā 
Fluentd Meetup 2016 - ServerEngine Integration & Windows support
Fluentd Meetup 2016 - ServerEngine Integration & Windows supportFluentd Meetup 2016 - ServerEngine Integration & Windows support
Fluentd Meetup 2016 - ServerEngine Integration & Windows support
Ā 
Distributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdDistributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentd
Ā 
Fluentd v0.14 Overview
Fluentd v0.14 OverviewFluentd v0.14 Overview
Fluentd v0.14 Overview
Ā 
Fluentd message forwarding with authentication and encryption
Fluentd message forwarding with authentication and encryptionFluentd message forwarding with authentication and encryption
Fluentd message forwarding with authentication and encryption
Ā 
Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017
Ā 
Docker and Fluentd
Docker and FluentdDocker and Fluentd
Docker and Fluentd
Ā 
å¦‚ä½•é€‰ę‹© Docker ē›‘ęŽ§ę–¹ę”ˆ
å¦‚ä½•é€‰ę‹© Docker ē›‘ęŽ§ę–¹ę”ˆå¦‚ä½•é€‰ę‹© Docker ē›‘ęŽ§ę–¹ę”ˆ
å¦‚ä½•é€‰ę‹© Docker ē›‘ęŽ§ę–¹ę”ˆ
Ā 
Ibm dnt-dcos-v9-3
Ibm dnt-dcos-v9-3Ibm dnt-dcos-v9-3
Ibm dnt-dcos-v9-3
Ā 
fluentdčØ­å®šč”Œę•°ćØć‚·ć‚¹ćƒ†ćƒ č¤‡é›‘ę€§ć®ć‚«ć‚øćƒ„ć‚¢ćƒ«ćŖ話
fluentdčØ­å®šč”Œę•°ćØć‚·ć‚¹ćƒ†ćƒ č¤‡é›‘ę€§ć®ć‚«ć‚øćƒ„ć‚¢ćƒ«ćŖ話fluentdčØ­å®šč”Œę•°ćØć‚·ć‚¹ćƒ†ćƒ č¤‡é›‘ę€§ć®ć‚«ć‚øćƒ„ć‚¢ćƒ«ćŖ話
fluentdčØ­å®šč”Œę•°ćØć‚·ć‚¹ćƒ†ćƒ č¤‡é›‘ę€§ć®ć‚«ć‚øćƒ„ć‚¢ćƒ«ćŖ話
Ā 
Fluentd and docker monitoring
Fluentd and docker monitoringFluentd and docker monitoring
Fluentd and docker monitoring
Ā 
Fluentd meetup dive into fluent plugin (outdated)
Fluentd meetup dive into fluent plugin (outdated)Fluentd meetup dive into fluent plugin (outdated)
Fluentd meetup dive into fluent plugin (outdated)
Ā 
Fluentd v0.12 master guide
Fluentd v0.12 master guideFluentd v0.12 master guide
Fluentd v0.12 master guide
Ā 

Similar to The basics of fluentd

Fluentd meetup at Slideshare
Fluentd meetup at SlideshareFluentd meetup at Slideshare
Fluentd meetup at Slideshare
Sadayuki Furuhashi
Ā 
Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011
Rich Bowen
Ā 

Similar to The basics of fluentd (20)

Fluentd meetup
Fluentd meetupFluentd meetup
Fluentd meetup
Ā 
Fluentd - RubyKansai 65
Fluentd - RubyKansai 65Fluentd - RubyKansai 65
Fluentd - RubyKansai 65
Ā 
Fluentd meetup at Slideshare
Fluentd meetup at SlideshareFluentd meetup at Slideshare
Fluentd meetup at Slideshare
Ā 
Fluentd and Embulk Game Server 4
Fluentd and Embulk Game Server 4Fluentd and Embulk Game Server 4
Fluentd and Embulk Game Server 4
Ā 
fluentd -- the missing log collector
fluentd -- the missing log collectorfluentd -- the missing log collector
fluentd -- the missing log collector
Ā 
Like loggly using open source
Like loggly using open sourceLike loggly using open source
Like loggly using open source
Ā 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
Ā 
Fluentd Project Intro at Kubecon 2019 EU
Fluentd Project Intro at Kubecon 2019 EUFluentd Project Intro at Kubecon 2019 EU
Fluentd Project Intro at Kubecon 2019 EU
Ā 
Melbourne Infracoders: Compliance as Code with InSpec
Melbourne Infracoders: Compliance as Code with InSpecMelbourne Infracoders: Compliance as Code with InSpec
Melbourne Infracoders: Compliance as Code with InSpec
Ā 
Fluentd meetup #2
Fluentd meetup #2Fluentd meetup #2
Fluentd meetup #2
Ā 
upload test 1
upload test 1upload test 1
upload test 1
Ā 
IT Operations for Web Developers
IT Operations for Web DevelopersIT Operations for Web Developers
IT Operations for Web Developers
Ā 
Why Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container TechnologyWhy Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container Technology
Ā 
Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)
Ā 
Odoo command line interface
Odoo command line interfaceOdoo command line interface
Odoo command line interface
Ā 
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Ā 
Fluentd Unified Logging Layer At Fossasia
Fluentd Unified Logging Layer At FossasiaFluentd Unified Logging Layer At Fossasia
Fluentd Unified Logging Layer At Fossasia
Ā 
Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011
Ā 
Fluentd meetup in japan
Fluentd meetup in japanFluentd meetup in japan
Fluentd meetup in japan
Ā 
How to collect Big Data into Hadoop
How to collect Big Data into HadoopHow to collect Big Data into Hadoop
How to collect Big Data into Hadoop
Ā 

More from Treasure Data, Inc.

źø€ė”œė²Œ ģ‚¬ė”€ė”œ ė³“ėŠ” ė°ģ“ķ„°ė”œ ėˆ ė²„ėŠ” ė²• - ķŠøė ˆģ €ė°ģ“ķ„° (Treasure Data)
źø€ė”œė²Œ ģ‚¬ė”€ė”œ ė³“ėŠ” ė°ģ“ķ„°ė”œ ėˆ ė²„ėŠ” ė²• - ķŠøė ˆģ €ė°ģ“ķ„° (Treasure Data)źø€ė”œė²Œ ģ‚¬ė”€ė”œ ė³“ėŠ” ė°ģ“ķ„°ė”œ ėˆ ė²„ėŠ” ė²• - ķŠøė ˆģ €ė°ģ“ķ„° (Treasure Data)
źø€ė”œė²Œ ģ‚¬ė”€ė”œ ė³“ėŠ” ė°ģ“ķ„°ė”œ ėˆ ė²„ėŠ” ė²• - ķŠøė ˆģ €ė°ģ“ķ„° (Treasure Data)
Treasure Data, Inc.
Ā 

More from Treasure Data, Inc. (20)

GDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersGDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for Marketers
Ā 
AR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketAR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and Market
Ā 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data Platforms
Ā 
Hands On: Javascript SDK
Hands On: Javascript SDKHands On: Javascript SDK
Hands On: Javascript SDK
Ā 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Ā 
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsBrand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Ā 
How to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataHow to Power Your Customer Experience with Data
How to Power Your Customer Experience with Data
Ā 
Why Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataWhy Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without Data
Ā 
Connecting the Customer Data Dots
Connecting the Customer Data DotsConnecting the Customer Data Dots
Connecting the Customer Data Dots
Ā 
Harnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessHarnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company Success
Ā 
źø€ė”œė²Œ ģ‚¬ė”€ė”œ ė³“ėŠ” ė°ģ“ķ„°ė”œ ėˆ ė²„ėŠ” ė²• - ķŠøė ˆģ €ė°ģ“ķ„° (Treasure Data)
źø€ė”œė²Œ ģ‚¬ė”€ė”œ ė³“ėŠ” ė°ģ“ķ„°ė”œ ėˆ ė²„ėŠ” ė²• - ķŠøė ˆģ €ė°ģ“ķ„° (Treasure Data)źø€ė”œė²Œ ģ‚¬ė”€ė”œ ė³“ėŠ” ė°ģ“ķ„°ė”œ ėˆ ė²„ėŠ” ė²• - ķŠøė ˆģ €ė°ģ“ķ„° (Treasure Data)
źø€ė”œė²Œ ģ‚¬ė”€ė”œ ė³“ėŠ” ė°ģ“ķ„°ė”œ ėˆ ė²„ėŠ” ė²• - ķŠøė ˆģ €ė°ģ“ķ„° (Treasure Data)
Ā 
Keynote - Fluentd meetup v14
Keynote - Fluentd meetup v14Keynote - Fluentd meetup v14
Keynote - Fluentd meetup v14
Ā 
Introduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of HivemallIntroduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of Hivemall
Ā 
Scalable Hadoop in the cloud
Scalable Hadoop in the cloudScalable Hadoop in the cloud
Scalable Hadoop in the cloud
Ā 
Using Embulk at Treasure Data
Using Embulk at Treasure DataUsing Embulk at Treasure Data
Using Embulk at Treasure Data
Ā 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big Data
Ā 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Ā 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to Redshift
Ā 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the Cloud
Ā 
Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker container
Ā 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
Ā 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
Ā 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
Ā 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
Ā 

Recently uploaded (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Ā 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Ā 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
Ā 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Ā 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
Ā 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
Ā 
šŸ¬ The future of MySQL is Postgres šŸ˜
šŸ¬  The future of MySQL is Postgres   šŸ˜šŸ¬  The future of MySQL is Postgres   šŸ˜
šŸ¬ The future of MySQL is Postgres šŸ˜
Ā 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Ā 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
Ā 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Ā 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Ā 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
Ā 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Ā 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Ā 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Ā 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
Ā 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
Ā 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Ā 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Ā 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
Ā 

The basics of fluentd

  • 1. The basics of Fluentd Masahiro Nakagawa Treasuare Data, Inc. Senior Software Engineer
  • 2. Structured logging Reliable forwarding http://ļ¬‚uentd.org/ Pluggable architecture
  • 3. Agenda > Background > Overview > Product Comparison > Use cases
  • 5. Data Processing Data source Collect Store Process Visualize Reporting Monitoring
  • 6. Related Products easier & shorter time Collect Store Process Visualize ??? Cloudera Excel Horton Works Tableau Treasure Data R
  • 7.
  • 8. Before Fluentd Server1 Server2 Server3 Application Application Application ļ½„ļ½„ļ½„ ļ½„ļ½„ļ½„ ļ½„ļ½„ļ½„ High Latency! must wait for a day... Fluent Log Server
  • 9. After Fluentd Server1 Server2 Server3 Application Application Application Fluentd ļ½„ļ½„ļ½„ Fluentd ļ½„ļ½„ļ½„ Fluentd ļ½„ļ½„ļ½„ In streaming! Fluentd Fluentd
  • 11. In short > Open sourced log collector written in Ruby > Using rubygems ecosystem for plugins Itā€™s like syslogd, but uses JSON for log messages
  • 12. Time 2012-12-11 07:26:27 Apache Tag apache.log Record { "host": "127.0.0.1", tail "method": "GET", "path": "/", write ... } insert 127.0.0.1 127.0.0.1 127.0.0.1 - - - - - - [11/Dec/2012:07:26:27] [11/Dec/2012:07:26:30] [11/Dec/2012:07:26:32] "GET "GET "GET / / / ... ... ... Fluentd 127.0.0.1 - - [11/Dec/2012:07:26:40] "GET / ... 127.0.0.1 - - [11/Dec/2012:07:27:01] "GET / ... ... event buffering Mongo
  • 13. Event structure(log message) āœ“ Time āœ“ Record > second unit > JSON format > from data source or > MessagePack adding parsed time internally āœ“ Tag > non-unstructured > for message routing
  • 14. Architecture Pluggable Pluggable Pluggable Input Buffer Output > Forward > Memory > Forward > HTTP > File > File > File tail > Amazon S3 > dstat > MongoDB > ... > ...
  • 15. Client libraries > Ruby > Java Application > Perl > PHP Time:Tag:Record > Python >D > Scala > ... Fluentd # Ruby Fluent.open(ā€œmyappā€) Fluent.event(ā€œloginā€, {ā€œuserā€ => 38}) #=> 2012-12-11 07:56:01 myapp.login {ā€œuserā€:38}
  • 16. Conļ¬guration and operation > No central / master node > HTTP include helps conf sharing > Operation depends on your environment > Use your deamon management > Use chef in Treasure Data > Scribe like syntax
  • 17. # receive events via HTTP # save alerts to a ļ¬le <source> <match alert.**> type http type ļ¬le port 8888 path /var/log/ļ¬‚uent/alerts </source> </match> # read logs from a ļ¬le # forward other logs to servers <source> <match **> type tail type forward path /var/log/httpd.log <server> format apache host 192.168.0.11 tag apache.access weight 20 </source> </server> <server> # save access logs to MongoDB host 192.168.0.12 <match apache.access> weight 60 type mongo </server> database apache </match> collection log </match> include http://example.com/conf
  • 18. Reliability (core + plugin) > Buffering > Use file buffer for persistent data > buffer chunk has ID for idempotent > Retrying > Error handling > transaction, failover, etc on forward plugin > secondary
  • 19. Plugins - use rubygems $ fluent-gem search -rd fluent-plugin $ fluent-gem search -rd fluent-mixin $ fluent-gem install fluent-plugin-mongo ā€» Today, donā€™t talk the plugin development
  • 21. in_tail Apache Fluentd āœ“ read a log ļ¬le āœ“ custom regexp āœ“ custom parser in Ruby access.log Supported format: > apache > json > apache2 > csv > syslog > tsv > nginx > ltsv (since v0.10.32)
  • 22. out_mongo Apache Fluentd access.log buffer āœ“ retry automatically āœ“ exponential retry wait āœ“ persistent on a ļ¬le
  • 23. out_webhdfs āœ“ custom text formatter Apache Fluentd access.log buffer HDFS āœ“ slice ļ¬les based on time āœ“ retry automatically 2013-01-01/01/access.log.gz āœ“ exponential retry wait 2013-01-01/02/access.log.gz āœ“ persistent on a ļ¬le 2013-01-01/03/access.log.gz ...
  • 24. out_copy + other plugins Hadoop Apache Fluentd access.log buffer Amazon S3 āœ“ routing based on tags āœ“ copy to multiple storages
  • 25. out_forward āœ“ automatic fail-over āœ“ load balancing Fluentd apache Apache Fluentd Fluentd Fluentd access.log buffer āœ“ retry automatically āœ“ exponential retry wait āœ“ persistent on a ļ¬le
  • 26. Forward topology Fluentd send/ack Fluentd Fluentd send/ack Fluentd Fluentd Fluentd Fluentd
  • 27. Access logs Alerting Apache Nagios App logs Analysis Frontend MongoDB Backend MySQL System logs Hadoop syslogd filter / buffer / routing Archiving Databases Amazon S3
  • 28. Access logs Alerting Apache Nagios App logs Analysis Frontend MongoDB Backend MySQL System logs Hadoop syslogd filter / buffer / routing Archiving Databases Amazon S3
  • 29. Access logs Alerting Apache Nagios App logs Analysis Frontend MongoDB Backend MySQL System logs Hadoop syslogd filter / buffer / routing Archiving Databases Amazon S3
  • 30. td-agent > Open sourced distribution package of fluentd > ETL part of Treasure Data > Including useful components > ruby, jemalloc, fluentd > 3rd party gems: td, mongo, webhdfs, etc... td plugin is for TD > http://packages.treasure-data.com/
  • 31. v11 > Breaking source code compatibility > Not protocol. > Windows support > Error handling enhancement > Better DSL configuration > etc: https://gist.github.com/frsyuki/2638703
  • 33. Scribe Scribe: log collector by Facebook
  • 34. Pros and Cons > ā— Pros > Fast (written in C++) > ā— Cons > Hard to install and extend Are you a C++ magician? > Deal with unstructured logs > No longer maintained Replaced with Calligraphus at Facebook
  • 35. Flume Flume: distributed log collector by Cloudera Phisical Flume Master Topology Flume Flume Flume Logical Topology Hadoop HDFS
  • 36. Network topology Master Agent ack Agent Collector Flume OG Collector Agent Collector send Agent Master Option Agent send/ack Agent Collector Flume NG Collector Agent Collector Agent
  • 37. Pros and Cons > ā— Pros > Using central master to manage all nodes > ā— Cons > Java culture (Pros for Java-er?) Difficult configuration and setup > Difficult topology > Mainly for Hadoop less plugins?
  • 39. Treasure Data Frontend Worker Job Queue Hadoop Hadoop Applications push metrics to Fluentd sums up data minutes (via local Fluentd) Fluentd Fluentd (partial aggregation) Treasure Librato Data Metrics for historical analysis for realtime analysis
  • 40. Cookpad hundreds of app servers Rails app td-agent sends event logs Daily/Hourly Google Batch Spreadsheet Rails app td-agent Treasure Data sends event logs MySQL Rails app td-agent Logs are available sends event logs after several mins. KPI Feedback rankings visualization Unlimited scalability Flexible schema Realtime Less performance impact āœ“ Over 100 RoR servers (2012/2/4)
  • 41. NHN Japan Archive Storage Web Servers Fluentd (scribed) Cluster Notiļ¬cations STREAM (IRC) Fluentd Watchers Graph Tools webhdfs SCHEDULED BATCH BATCH āœ“ 16 nodes Hadoop Cluster hive server āœ“ 120,000+ lines/sec CDH4 Shib ShibUI āœ“ 400Mbps at peak Huahin āœ“ 1.5+ TB/day (raw) (HDFS, YARN) Manager http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 by @tagomoris
  • 43. Conclusion > Fluentd is a widely-used log collector > There are many use cases > Many contributors and plugins > Keep it simple > Easy to integrate your environment