SlideShare a Scribd company logo
1 of 44
Download to read offline
Logisland
event-mining@scale
@baile'homas
Schedule
• Introduc*on
• Core concepts
• Knowledge Paradigm
• API Design
• Quick start
Introduc)on
Logisland Big picture
• Mul$-Purpose real'me analy'cs framework
• High scalability and Fault-tolerant.
• High throughput (billions messages / day).
• Easy to operate on Hadoop or on standalone containers
• Easily Extensible to build high level apps
• Open source, ini'ated by Hurence
Use cases
• Log aggrega(on : low latency log processing
• Stream processing : mul4ple stages of processing (enriching, ...)
• Complex Event processing : write custom business Rules to
generate alerts, for fraud detec4on
• click stream tracking : capture user click stream data
• SIEM : security manager for intrusion detec4on
• IoT : generate alerts based on outliers and forecas4ng.
Challengers
• ELK is great to start with, but hard to centralize processing and
lacks of real offline ML
• Splunk is fantas;c but clients are not rich enough to afford it ;)
• NIFI is a great tool but doesn't play well with distributed
processing
• Metron, Eagle are security centric
Features
• out-of-the-box components (no code required)
• high level extensible framework
• raw data to structured records automa9c conversion
• alert percola9on or query matching
• event governance with Avro schema management
• online predic9on with offline trained ML models
Features 2
• I/O to Elas,csearch, HBase, HDFS, RocksDB, ...
• telemetry sources (bro, pcap, neAlow)
• live enrichement (geoip, custom lookups)
• SQL aggrega,ons
• Time series sampling
• Outliers detec,on
• Network footprint clustering
Core
Concepts
event =
chronological change
in the system state
log =
centralized registry of
chronologically ordered events
Log centric architecture
• async event produc-on and
consump-on.
• uncorrelated publishers and
subscribers.
• acts as a Messaging system.
• replay the log from any point in -me.
• real2me event availability.
How to handle distributed logs ?
Logisland =
high level stream analy/cs solu/on
to handle massive scale
event processing
Pyramidal Knowledge
Paradigm
Logisland con,nuously transforms
data into informa+on &
informa(on into knowledge
by using asynchronous processing on
increasingly abstract
and meaningful records.
(credits : David McCandless, Informa7on is Beau7ful)
Data Driven Compu/ng
API
Design
Record
The basic unit of processing is the Record.
A Record is a collec7on of Field, while a Field has a name, a
type and a value.
String id = "firewall_record1";
String type = "cisco";
Record record = new Record(type).setId(id);
assertTrue(record.isEmpty());
assertEquals(record.size(), 0);
Field
A record holds a collec,on of fields.
record.setStringField("url_host", "origin-www.20minutes.fr")
.setField("method", FieldType.STRING, "GET")
.setField("response_size", FieldType.INT, 452)
.setField("is_outside_office_hours", FieldType.BOOLEAN, false)
.setField("tags",
FieldType.ARRAY,
Arrays.asList("spam", "filter", "mail"));
assertEquals( record.getField("method").asString(), "GET");
assertTrue( record.getField("response_size").asInteger() - 452 == 0);
record.removeField("is_outside_office_hours");
assertFalse( record.hasField("is_outside_office_hours"));
Special Field
A Record also has some special fields (type, 5me and id).
// shortcut for id
assertEquals(record.getId(), id);
assertEquals(record.getField(FieldDictionary.RECORD_ID).asString(),
id);
// shortcut for time
assertEquals(record.getTime(),
record.getField(FieldDictionary.RECORD_TIME).asLong());
// shortcut for type
assertEquals(record.getType(), type);
Field typing and valida.on
Fields are strongly typed, you can validate them
Record record = new StandardRecord();
record.setField("request_size", FieldType.INT, 1399);
assertTrue(record.isValid());
record.setField("request_size", FieldType.INT, "tom");
assertFalse(record.isValid());
record.setField("request_size", FieldType.DOUBLE, 45.5d);
assertTrue(record.isValid());
record.setField("request_size", FieldType.STRING, 45L);
assertFalse(record.isValid());
Processor
Logisland is a component centric framework,
It's built over an abstrac1on layer to build configurable
components.
A component can be Configurable and Configured.
The most common component you'll use is the Processor which
takes a collec4on of Record and publish another collec4on of
records
A configurable component that process
Records
public interface Processor extends ConfigurableComponent {
void init(final ProcessContext context);
Collection<Record> process(ProcessContext context,
Collection<Record> records);
}
SplitText implementa-on
Define PropertyDescriptor to handle components config.
@Tags({"parser", "regex", "log", "record"})
@CapabilityDescription("This is a processor that is used ...")
@DynamicProperty(name = "alternative regex & mapping", ...)
public class SplitText extends AbstractProcessor {
public static final PropertyDescriptor VALUE_REGEX =
new PropertyDescriptor.Builder()
.name("value.regex")
.description("the regex to match for the message value")
.required(true)
.addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
.build();
...
}
SplitText config
Use the components with simple yaml blocs.
- processor: apache_parser
component: com.hurence.logisland.processor.SplitText
type: parser
documentation: a parser for apache log REGEX
configuration:
record.type: apache_log
value.regex: (S+)s+(S+)s+(S+)s+[([w:/] ...
value.fields: src_ip,identd,user,record_time,http_method, ...
Stream
a record Stream basically :
• is a configurable Component
• reads a distributed collec5on of Record from Ka8a input topics
• transmits them to a chain of Processor
• write the output collec5on of Record to some Ka8a output
topics
Streaming paradigm
You can handle par.onned data in 2 ways :
• fully in parrallel, eg. a thread by par//on, like with
KafkaRecordStreamParallelProcessing, when records
have no link with each other
• by joining par//ons like with
KafkaRecordStreamSQLAggregator or
KafkaRecordStreamHDFSBurner when you need to join
related records (costly join and shuffling opera/ons)
Sample Stream configura0on
Define a processing pipeline
- stream: parsing_stream
component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing
type: stream
documentation: a processor that links
configuration:
kafka.input.topics: logisland_raw
kafka.output.topics: logisland_events
kafka.error.topics: logisland_errors
kafka.input.topics.serializer: none
kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer
kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer
...
processorConfigurations:
Engine
• The Engine manage a collec-on of Stream
• this is the abstrac-on of the execu-on model, mainly in Spark
actually but plans are to integrate Beam to move on Storm and
Ka?a Streams
• you configure here your Spark job parameters
Sample engine configura0on
Define a processing job
engine:
component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine
type: engine
documentation: Index some apache logs with logisland
configuration:
spark.app.name: IndexApacheLogsDemo
spark.master: yarn-cluster
spark.driver.memory: 1G
spark.driver.cores: 1
spark.executor.memory: 2G
spark.executor.instances: 4
spark.executor.cores: 2
spark.yarn.queue: default
...
streamConfigurations:
Transverse service injec,on : ControllerService
we o%en need to share access to external Services across the
Processors, for example
• bulk buffers or client connec0ons to external data
• a cache service that could cache K/V tuple across the worker
node.
Sample ControllerService component
We need to provide an interface API for this service :
public interface CacheService<K,V> extends ControllerService {
PropertyDescriptor CACHE_SIZE = new PropertyDescriptor.Builder()
.name("cache.size")
.description("The maximum number of element in the cache.")
.required(false)
.defaultValue("16384")
.addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
.build();
public V get(K k);
public void set(K k, V v);
}
Inject service in Processor
You can then use this service in a custom processor :
public class TestProcessor extends AbstractProcessor {
static final PropertyDescriptor CACHE_SERVICE = new PropertyDescriptor.Builder()
.name("cache.service")
.description("CacheService")
.identifiesControllerService(CacheService.class)
.required(true)
.build();
@Override
public boolean hasControllerService() {
return true;
}
}
Define service in config
The injec)on is done through yaml config files by injec)ng the
instance of lru_cache Service.
controllerServiceConfigurations:
- controllerService: lru_cache
component: com.hurence.logisland.service.elasticsearch.LRUKeyValueCacheService
configuration:
cache.size: 5000
streamConfigurations:
- stream: parsing_stream
processorConfigurations:
- processor: mock_processor
component: com.hurence.logisland.processor.TestProcessorhing
configuration:
cache.service: lru_cache
Quick
Start
Ge#ng started (Hadoop cluster)
Download the latest release from github
tar -xzf logisland-0.10.1-bin.tar.gz
Create a job configura/on
vim conf/index-apache-logs.yml
Run the job
export SPARK_HOME=/usr/hdp/current/spark-client
bin/logisland.sh --conf conf/index-apache-logs.yml
Ge#ng started (lightweight container)
Pull & run the image from Docker Repository
docker pull hurence/logisland
docker run -it --name logisland 
-p 8080:8080 -p 5601:5601 -p 9200:9200 
-h sandbox hurence/logisland bash
Run the job
bin/logisland.sh --conf conf/index-apache-logs.yml
Next ?
Roadmap
• Ambari Agent for job dynamic interac3on (REST Api)
• visual Stream configura3on / dashboards through Ambari views
• Auto-scaling to op3mize cluster resources
• Density based automa3c Usage profiling
• PaHern discovery through Deep Learning
• App store, per use-case knowledge bundles (cybersecurity,
fraud, ...)
Resources
• source : h%ps://github.com/Hurence/logisland/releases
• Docker : h%ps://hub.docker.com/r/hurence/logisland/tags/
• Documenta-on : h%p://logisland.readthedocs.io/en/latest/
concepts.html
• support : h%ps://gi%er.im/logisland/logisland
• contact : bailet.thomas@gmail.com
Q&A

More Related Content

What's hot

Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...HostedbyConfluent
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorCask Data
 
Getting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics servicesGetting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics servicesEastBanc Tachnologies
 
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...HostedbyConfluent
 
First Steps with Apache Kafka on Google Cloud Platform
First Steps with Apache Kafka on Google Cloud PlatformFirst Steps with Apache Kafka on Google Cloud Platform
First Steps with Apache Kafka on Google Cloud Platformconfluent
 
Scaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformScaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformLynn Langit
 
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...HostedbyConfluent
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKSungmin Kim
 
FSI301 An Architecture for Trade Capture and Regulatory Reporting
FSI301 An Architecture for Trade Capture and Regulatory ReportingFSI301 An Architecture for Trade Capture and Regulatory Reporting
FSI301 An Architecture for Trade Capture and Regulatory ReportingAmazon Web Services
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...HostedbyConfluent
 
Ingesting IoT data in Food Processing
Ingesting IoT data in Food ProcessingIngesting IoT data in Food Processing
Ingesting IoT data in Food Processingconfluent
 
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...HostedbyConfluent
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...HostedbyConfluent
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
Cloud Native Application Development - build fast, low TCO, scalable & agile ...
Cloud Native Application Development - build fast, low TCO, scalable & agile ...Cloud Native Application Development - build fast, low TCO, scalable & agile ...
Cloud Native Application Development - build fast, low TCO, scalable & agile ...Lucas Jellema
 
Big Data, HPC and Streaming
Big Data, HPC and StreamingBig Data, HPC and Streaming
Big Data, HPC and StreamingAnjani Phuyal
 

What's hot (20)

Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
 
Getting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics servicesGetting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics services
 
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
 
First Steps with Apache Kafka on Google Cloud Platform
First Steps with Apache Kafka on Google Cloud PlatformFirst Steps with Apache Kafka on Google Cloud Platform
First Steps with Apache Kafka on Google Cloud Platform
 
Scaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformScaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud Platform
 
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
Extracting Value from IOT using Azure Cosmos DB, Azure Synapse Analytics and ...
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
 
Log Analysis At Scale
Log Analysis At ScaleLog Analysis At Scale
Log Analysis At Scale
 
FSI301 An Architecture for Trade Capture and Regulatory Reporting
FSI301 An Architecture for Trade Capture and Regulatory ReportingFSI301 An Architecture for Trade Capture and Regulatory Reporting
FSI301 An Architecture for Trade Capture and Regulatory Reporting
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
 
Ingesting IoT data in Food Processing
Ingesting IoT data in Food ProcessingIngesting IoT data in Food Processing
Ingesting IoT data in Food Processing
 
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
CCT Check and Calculate Transfer
CCT Check and Calculate TransferCCT Check and Calculate Transfer
CCT Check and Calculate Transfer
 
AWS Kinesis Streams
AWS Kinesis StreamsAWS Kinesis Streams
AWS Kinesis Streams
 
Cloud Native Application Development - build fast, low TCO, scalable & agile ...
Cloud Native Application Development - build fast, low TCO, scalable & agile ...Cloud Native Application Development - build fast, low TCO, scalable & agile ...
Cloud Native Application Development - build fast, low TCO, scalable & agile ...
 
Big Data, HPC and Streaming
Big Data, HPC and StreamingBig Data, HPC and Streaming
Big Data, HPC and Streaming
 

Viewers also liked

Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Lucidworks
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...DataWorks Summit/Hadoop Summit
 
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...Kai Wähner
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningKai Wähner
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Kai Wähner
 

Viewers also liked (6)

Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
 
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep Learning
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
 

Similar to Logisland "Event Mining at scale"

From Kafka to BigQuery - Strata Singapore
From Kafka to BigQuery - Strata SingaporeFrom Kafka to BigQuery - Strata Singapore
From Kafka to BigQuery - Strata SingaporeOfir Sharony
 
Building a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless frameworkBuilding a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless frameworkLuciano Mammino
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdfStephanie Locke
 
NoSQL Endgame DevoxxUA Conference 2020
NoSQL Endgame DevoxxUA Conference 2020NoSQL Endgame DevoxxUA Conference 2020
NoSQL Endgame DevoxxUA Conference 2020Thodoris Bais
 
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...confluent
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWAREFIWARE
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...confluent
 
Digital Forensics and Incident Response in The Cloud
Digital Forensics and Incident Response in The CloudDigital Forensics and Incident Response in The Cloud
Digital Forensics and Incident Response in The CloudVelocidex Enterprises
 
How to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinHow to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinSigma Software
 
Stream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data MicroservicesStream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data Microservicesmarius_bogoevici
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.Renzo Tomà
 
ELK - What's new and showcases
ELK - What's new and showcasesELK - What's new and showcases
ELK - What's new and showcasesAndrii Gakhov
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flinkRenato Guimaraes
 
Elk presentation 2#3
Elk presentation 2#3Elk presentation 2#3
Elk presentation 2#3uzzal basak
 
Architecting for Microservices Part 2
Architecting for Microservices Part 2Architecting for Microservices Part 2
Architecting for Microservices Part 2Elana Krasner
 
Divide and Conquer – Microservices with Node.js
Divide and Conquer – Microservices with Node.jsDivide and Conquer – Microservices with Node.js
Divide and Conquer – Microservices with Node.jsSebastian Springer
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaSpark Summit
 

Similar to Logisland "Event Mining at scale" (20)

From Kafka to BigQuery - Strata Singapore
From Kafka to BigQuery - Strata SingaporeFrom Kafka to BigQuery - Strata Singapore
From Kafka to BigQuery - Strata Singapore
 
Building a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless frameworkBuilding a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless framework
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdf
 
NoSQL Endgame DevoxxUA Conference 2020
NoSQL Endgame DevoxxUA Conference 2020NoSQL Endgame DevoxxUA Conference 2020
NoSQL Endgame DevoxxUA Conference 2020
 
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWARE
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
 
Digital Forensics and Incident Response in The Cloud
Digital Forensics and Incident Response in The CloudDigital Forensics and Incident Response in The Cloud
Digital Forensics and Incident Response in The Cloud
 
Siddhi - cloud-native stream processor
Siddhi - cloud-native stream processorSiddhi - cloud-native stream processor
Siddhi - cloud-native stream processor
 
How to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinHow to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita Galkin
 
Aplicaciones distribuidas con Dapr
Aplicaciones distribuidas con DaprAplicaciones distribuidas con Dapr
Aplicaciones distribuidas con Dapr
 
Stream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data MicroservicesStream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data Microservices
 
Vert.x devoxx london 2013
Vert.x devoxx london 2013Vert.x devoxx london 2013
Vert.x devoxx london 2013
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.
 
ELK - What's new and showcases
ELK - What's new and showcasesELK - What's new and showcases
ELK - What's new and showcases
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flink
 
Elk presentation 2#3
Elk presentation 2#3Elk presentation 2#3
Elk presentation 2#3
 
Architecting for Microservices Part 2
Architecting for Microservices Part 2Architecting for Microservices Part 2
Architecting for Microservices Part 2
 
Divide and Conquer – Microservices with Node.js
Divide and Conquer – Microservices with Node.jsDivide and Conquer – Microservices with Node.js
Divide and Conquer – Microservices with Node.js
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 

Recently uploaded

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 

Recently uploaded (20)

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 

Logisland "Event Mining at scale"

  • 2. Schedule • Introduc*on • Core concepts • Knowledge Paradigm • API Design • Quick start
  • 4. Logisland Big picture • Mul$-Purpose real'me analy'cs framework • High scalability and Fault-tolerant. • High throughput (billions messages / day). • Easy to operate on Hadoop or on standalone containers • Easily Extensible to build high level apps • Open source, ini'ated by Hurence
  • 5. Use cases • Log aggrega(on : low latency log processing • Stream processing : mul4ple stages of processing (enriching, ...) • Complex Event processing : write custom business Rules to generate alerts, for fraud detec4on • click stream tracking : capture user click stream data • SIEM : security manager for intrusion detec4on • IoT : generate alerts based on outliers and forecas4ng.
  • 6. Challengers • ELK is great to start with, but hard to centralize processing and lacks of real offline ML • Splunk is fantas;c but clients are not rich enough to afford it ;) • NIFI is a great tool but doesn't play well with distributed processing • Metron, Eagle are security centric
  • 7. Features • out-of-the-box components (no code required) • high level extensible framework • raw data to structured records automa9c conversion • alert percola9on or query matching • event governance with Avro schema management • online predic9on with offline trained ML models
  • 8. Features 2 • I/O to Elas,csearch, HBase, HDFS, RocksDB, ... • telemetry sources (bro, pcap, neAlow) • live enrichement (geoip, custom lookups) • SQL aggrega,ons • Time series sampling • Outliers detec,on • Network footprint clustering
  • 11. log = centralized registry of chronologically ordered events
  • 12. Log centric architecture • async event produc-on and consump-on. • uncorrelated publishers and subscribers. • acts as a Messaging system. • replay the log from any point in -me. • real2me event availability.
  • 13. How to handle distributed logs ?
  • 14. Logisland = high level stream analy/cs solu/on to handle massive scale event processing
  • 16. Logisland con,nuously transforms data into informa+on & informa(on into knowledge by using asynchronous processing on increasingly abstract and meaningful records.
  • 17. (credits : David McCandless, Informa7on is Beau7ful)
  • 20.
  • 21. Record The basic unit of processing is the Record. A Record is a collec7on of Field, while a Field has a name, a type and a value. String id = "firewall_record1"; String type = "cisco"; Record record = new Record(type).setId(id); assertTrue(record.isEmpty()); assertEquals(record.size(), 0);
  • 22. Field A record holds a collec,on of fields. record.setStringField("url_host", "origin-www.20minutes.fr") .setField("method", FieldType.STRING, "GET") .setField("response_size", FieldType.INT, 452) .setField("is_outside_office_hours", FieldType.BOOLEAN, false) .setField("tags", FieldType.ARRAY, Arrays.asList("spam", "filter", "mail")); assertEquals( record.getField("method").asString(), "GET"); assertTrue( record.getField("response_size").asInteger() - 452 == 0); record.removeField("is_outside_office_hours"); assertFalse( record.hasField("is_outside_office_hours"));
  • 23. Special Field A Record also has some special fields (type, 5me and id). // shortcut for id assertEquals(record.getId(), id); assertEquals(record.getField(FieldDictionary.RECORD_ID).asString(), id); // shortcut for time assertEquals(record.getTime(), record.getField(FieldDictionary.RECORD_TIME).asLong()); // shortcut for type assertEquals(record.getType(), type);
  • 24. Field typing and valida.on Fields are strongly typed, you can validate them Record record = new StandardRecord(); record.setField("request_size", FieldType.INT, 1399); assertTrue(record.isValid()); record.setField("request_size", FieldType.INT, "tom"); assertFalse(record.isValid()); record.setField("request_size", FieldType.DOUBLE, 45.5d); assertTrue(record.isValid()); record.setField("request_size", FieldType.STRING, 45L); assertFalse(record.isValid());
  • 25. Processor Logisland is a component centric framework, It's built over an abstrac1on layer to build configurable components. A component can be Configurable and Configured. The most common component you'll use is the Processor which takes a collec4on of Record and publish another collec4on of records
  • 26. A configurable component that process Records public interface Processor extends ConfigurableComponent { void init(final ProcessContext context); Collection<Record> process(ProcessContext context, Collection<Record> records); }
  • 27. SplitText implementa-on Define PropertyDescriptor to handle components config. @Tags({"parser", "regex", "log", "record"}) @CapabilityDescription("This is a processor that is used ...") @DynamicProperty(name = "alternative regex & mapping", ...) public class SplitText extends AbstractProcessor { public static final PropertyDescriptor VALUE_REGEX = new PropertyDescriptor.Builder() .name("value.regex") .description("the regex to match for the message value") .required(true) .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) .build(); ... }
  • 28. SplitText config Use the components with simple yaml blocs. - processor: apache_parser component: com.hurence.logisland.processor.SplitText type: parser documentation: a parser for apache log REGEX configuration: record.type: apache_log value.regex: (S+)s+(S+)s+(S+)s+[([w:/] ... value.fields: src_ip,identd,user,record_time,http_method, ...
  • 29. Stream a record Stream basically : • is a configurable Component • reads a distributed collec5on of Record from Ka8a input topics • transmits them to a chain of Processor • write the output collec5on of Record to some Ka8a output topics
  • 30. Streaming paradigm You can handle par.onned data in 2 ways : • fully in parrallel, eg. a thread by par//on, like with KafkaRecordStreamParallelProcessing, when records have no link with each other • by joining par//ons like with KafkaRecordStreamSQLAggregator or KafkaRecordStreamHDFSBurner when you need to join related records (costly join and shuffling opera/ons)
  • 31. Sample Stream configura0on Define a processing pipeline - stream: parsing_stream component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing type: stream documentation: a processor that links configuration: kafka.input.topics: logisland_raw kafka.output.topics: logisland_events kafka.error.topics: logisland_errors kafka.input.topics.serializer: none kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer ... processorConfigurations:
  • 32. Engine • The Engine manage a collec-on of Stream • this is the abstrac-on of the execu-on model, mainly in Spark actually but plans are to integrate Beam to move on Storm and Ka?a Streams • you configure here your Spark job parameters
  • 33. Sample engine configura0on Define a processing job engine: component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine type: engine documentation: Index some apache logs with logisland configuration: spark.app.name: IndexApacheLogsDemo spark.master: yarn-cluster spark.driver.memory: 1G spark.driver.cores: 1 spark.executor.memory: 2G spark.executor.instances: 4 spark.executor.cores: 2 spark.yarn.queue: default ... streamConfigurations:
  • 34. Transverse service injec,on : ControllerService we o%en need to share access to external Services across the Processors, for example • bulk buffers or client connec0ons to external data • a cache service that could cache K/V tuple across the worker node.
  • 35. Sample ControllerService component We need to provide an interface API for this service : public interface CacheService<K,V> extends ControllerService { PropertyDescriptor CACHE_SIZE = new PropertyDescriptor.Builder() .name("cache.size") .description("The maximum number of element in the cache.") .required(false) .defaultValue("16384") .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR) .build(); public V get(K k); public void set(K k, V v); }
  • 36. Inject service in Processor You can then use this service in a custom processor : public class TestProcessor extends AbstractProcessor { static final PropertyDescriptor CACHE_SERVICE = new PropertyDescriptor.Builder() .name("cache.service") .description("CacheService") .identifiesControllerService(CacheService.class) .required(true) .build(); @Override public boolean hasControllerService() { return true; } }
  • 37. Define service in config The injec)on is done through yaml config files by injec)ng the instance of lru_cache Service. controllerServiceConfigurations: - controllerService: lru_cache component: com.hurence.logisland.service.elasticsearch.LRUKeyValueCacheService configuration: cache.size: 5000 streamConfigurations: - stream: parsing_stream processorConfigurations: - processor: mock_processor component: com.hurence.logisland.processor.TestProcessorhing configuration: cache.service: lru_cache
  • 39. Ge#ng started (Hadoop cluster) Download the latest release from github tar -xzf logisland-0.10.1-bin.tar.gz Create a job configura/on vim conf/index-apache-logs.yml Run the job export SPARK_HOME=/usr/hdp/current/spark-client bin/logisland.sh --conf conf/index-apache-logs.yml
  • 40. Ge#ng started (lightweight container) Pull & run the image from Docker Repository docker pull hurence/logisland docker run -it --name logisland -p 8080:8080 -p 5601:5601 -p 9200:9200 -h sandbox hurence/logisland bash Run the job bin/logisland.sh --conf conf/index-apache-logs.yml
  • 42. Roadmap • Ambari Agent for job dynamic interac3on (REST Api) • visual Stream configura3on / dashboards through Ambari views • Auto-scaling to op3mize cluster resources • Density based automa3c Usage profiling • PaHern discovery through Deep Learning • App store, per use-case knowledge bundles (cybersecurity, fraud, ...)
  • 43. Resources • source : h%ps://github.com/Hurence/logisland/releases • Docker : h%ps://hub.docker.com/r/hurence/logisland/tags/ • Documenta-on : h%p://logisland.readthedocs.io/en/latest/ concepts.html • support : h%ps://gi%er.im/logisland/logisland • contact : bailet.thomas@gmail.com
  • 44. Q&A