This presentation discusses how logs and stream-processing can form a backbone for data flow, ETL, and real-time data processing. It will describe the challenges and lessons learned as LinkedIn built out its real-time data subscription and processing infrastructure. It will also discuss the role of real-time processing and its relationship to offline processing frameworks such as MapReduce.
6. Three principles
1. One pipeline to rule them all
2. Stream processing >> messaging
3. Clusters not servers
7. Characteristics
• Scalability of a filesystem
– Hundreds of MB/sec/server throughput
– Many TB per server
• Guarantees of a database
– Messages strictly ordered
– All data persistent
• Distributed by default
– Replication
– Partitioning model
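The characteristics above can be sketched as a tiny in-memory log: an append-only, strictly ordered sequence of records indexed by offset. This is an illustrative sketch with hypothetical names, not Kafka's API; real Kafka adds persistence, replication, and partitioning on top of this abstraction.

```python
class Log:
    """Minimal sketch of an append-only, strictly ordered log."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Appends preserve arrival order; the offset is the record's
        # permanent position in the log.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        # Consumers read sequentially from an offset, in strict order.
        return self._records[offset:offset + max_records]

log = Log()
assert log.append({"user": "alice", "action": "login"}) == 0
assert log.append({"user": "bob", "action": "click"}) == 1
assert log.read(0)[0]["user"] == "alice"  # ordering preserved
```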
8. Kafka At LinkedIn
• 175 TB of in-flight log data per colo
• Low-latency: ~1.5 ms
• Replicated to each datacenter
• Tens of thousands of data producers
• Thousands of consumers
• 7 million messages written/sec
• 35 million messages read/sec
• Hadoop integration
9. Open source
• Apache Software Foundation
• Very healthy usage outside LinkedIn
• Broad base of committers
• 30 clients in 15 languages
• Great ecosystem of supporting tools
10. The Plan
1. Apache Kafka
2. Logs and Distributed Systems
3. Logs and Data Integration
4. Logs and Stream Processing
Who are you? What is this talk about?
What is a log and what is it good for?
Exciting topic.
Producers, consumers distributed
First project was an open source clone of Amazon Dynamo (Project Voldemort).
Makes explaining things easier.
1 pipeline for database data.
1 pipeline for metrics.
1 pipeline for events.
1 pipeline for real-time processing.
No pipeline for application logs.
300 ActiveMQ brokers.
200 Kafka-related projects on GitHub.
1000+ emails/month.
The log is the fundamental abstraction Kafka provides.
You can use a log as a drop-in replacement for a messaging system, but it can also do a lot more.
What is a log? Traditional uses? Non-traditional uses…
Time ordered. Semi-structured.
List of changes.
Contents of a record don’t matter.
Indexed by “time”.
Not an application log (i.e. a text file).
Data model of Kafka: a topic.
Partitions can be spread over machines, replicated.
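The topic/partition data model can be sketched as a set of partitioned logs where a record's key determines its partition, so all records for one key stay in one ordered log. The hashing below is illustrative (Kafka's actual default partitioner differs in detail), and the names are hypothetical.

```python
import zlib

class Topic:
    """Sketch of a topic: a fixed set of partitioned logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Hash the key to pick a partition; the same key always lands in
        # the same partition, so per-key ordering is preserved.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

topic = Topic(num_partitions=4)
p1 = topic.produce("user-42", "login")
p2 = topic.produce("user-42", "logout")
assert p1 == p2  # both records live in one partition, in order
```

In a real deployment, the partitions (not the topic as a whole) are the unit of parallelism, spreading and replication across machines.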
The whole system is one big distributed system
Paxos, Zookeeper (Zab), Raft, etc.
Traditional databases, HBase/Bigtable, Spanner, HDFS namenode, etc.
Log has two purposes: replication and consistency.
Very important problem
What if a replica is down?
Ordering is important. Time is important.
Log is a list of changes.
Key point: can re-create any point in time.
In banking: credits and debits.
In software: the version control changelog.
State-machine replication: log the incoming requests (logical logging)
Log the changed rows (physical logging)
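The two replication styles can be sketched side by side: logical logging records the incoming commands and the replica re-executes them, while physical logging records the resulting row values and the replica just overwrites. Either way, replaying the log from the start re-creates the state. Function names and the banking domain below are illustrative.

```python
def apply_logical(state, log):
    # Logical logging: each entry is a command the replica re-executes.
    for op, key, amount in log:
        if op == "credit":
            state[key] = state.get(key, 0) + amount
        elif op == "debit":
            state[key] = state.get(key, 0) - amount
    return state

def apply_physical(state, log):
    # Physical logging: each entry is the changed row itself.
    for key, new_value in log:
        state[key] = new_value
    return state

logical_log = [("credit", "alice", 100), ("debit", "alice", 30)]
physical_log = [("alice", 100), ("alice", 70)]
# Both logs describe the same history, so both replicas converge.
assert apply_logical({}, logical_log) == apply_physical({}, physical_log)
```

The trade-off: logical logs are compact but require deterministic replay; physical logs are larger but trivially safe to apply.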
Outside of distributed systems internals…
AKA ETL.
Many systems. Event data.
Most important problem for data-centric companies.
Integration >> ML.
Two exacerbating factors
One-size fits all
Database cache coherency.
Data deployment from Hadoop.
Never get to full connectivity.
Metcalfe’s law.
All data in multi-subscriber, real-time logs.
The company is a big distributed system.
Batch is the dominant paradigm for data processing. Why?
First thing you want when you have real-time data streams is real-time transformations.
1790 census: 3,929,214 people, $44k.
Data collected by horses and wagons, a high-latency, batch channel.
Networks => stream processing.
Service: one input = one output.
Batch job: all inputs = all outputs.
Stream computing: any window = output for that window.
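The three computation styles above can be sketched as a spectrum: a service maps one input to one output, a batch job maps all inputs to all outputs, and stream processing emits an output per window. The doubling function and tumbling-window count below are illustrative placeholders, not anything from the talk.

```python
from collections import defaultdict

def service(x):
    return x * 2                      # one input => one output

def batch(xs):
    return [x * 2 for x in xs]        # all inputs => all outputs

def stream(timestamps, window_size):
    # Tumbling windows: count events per window of `window_size` time units.
    windows = defaultdict(int)
    for ts in timestamps:
        windows[ts // window_size] += 1
    return dict(windows)              # any window => output for that window

assert service(3) == 6
assert batch([1, 2]) == [2, 4]
assert stream([0, 1, 5, 6, 7], window_size=5) == {0: 2, 1: 3}
```

Batch is then just the special case where the window is the whole dataset, and a service is the case where the window is a single event.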
Importance of the log: buffering, multi-subscriber.
Output goes to a live serving system or another batch processing system (Hadoop, DWH).
Examples: recommendations, email, monitoring, security.
Storm and Samza.
About process management; both integrate with Kafka.
MapReduce and HDFS.