SlideShare a Scribd company logo
1 of 25
Roma – 7 Febbraio 2017
presenta Alberto Paro, Seacom
Elastic & Spark.
Building A Search Geo Locator
Alberto Paro
 Laureato in Ingegneria Informatica (POLIMI)
 Autore di 3 libri su ElasticSearch da 1 a 5.x + 6 Tech
review
 Lavoro principalmente in Scala e su tecnologie BD
(Akka, Spray.io, Playframework, Apache Spark) e NoSQL
(Accumulo, Cassandra, ElasticSearch e MongoDB)
 Evangelist linguaggio Scala e Scala.JS
Elasticseach 5.x - Cookbook
 Choose the best ElasticSearch cloud topology to deploy and power it up
with external plugins
 Develop tailored mapping to take full control of index steps
 Build complex queries through managing indices and documents
 Optimize search results through executing analytics aggregations
 Monitor the performance of the cluster and nodes
 Install Kibana to monitor cluster and extend Kibana for plugins.
 Integrate ElasticSearch in Java, Scala, Python and Big Data applications
Discount code for Ebook: ALPOEB50
Discount code for Print Book: ALPOPR15
Expiration Date: 21st Feb 2017
Obiettivi
 Architetture Big Data con ES
 Apache Spark
 GeoIngester
 Data Collection
 Ottimizzazione Indici
 Ingestion via Apache Spark
 Ricerca per un luogo
 Cenni di Big Data Tools
Architettura
Hadoop / Spark
Input
Iter 1
HDFS
Iter 2
HDFS
HDFS
Read
HDFS
Read
HDFS
Write
HDFS
Write
Input
Iter 1 Iter 2
Hadoop MapReduce
Apache Spark
Evoluzione del modello Map Reduce
Apache Spark
 Scritto in Scala con API in Java, Python e R
 Evoluzione del modello Map/Reduce
 Potenti moduli a corredo:
 Spark SQL
 Spark Streaming
 MLLib (Machine Learning)
 GraphX (graph)
Geoname
GeoNames è un database geografico, scaricabile gratuitamente sotto
licenza creative commons.
Contiene circa 10 millioni di nomi geografici e consiste di circa 9
milioni di feature uniqche di cui 2.8 milioni di posti popolati e 5.5
millioni di nomi alternativi.
Può essere facilmente scaricato da
http://download.geonames.org/export/dump come file CSV.
Il codice è disponibile all’indirizzo:
https://github.com/aparo/elasticsearch-geonames-locator
Geoname - Struttura
No. Attribute name Explanation
1 geonameid Unique ID for this geoname
2 name The name of the geoname
3 asciiname ASCII representation of the name
4 alternatenames Other forms of this name. Generally in several languages
5 latitude Latitude in decimal degrees of the Geoname
6 longitude Longitude in decimal degrees of the Geoname
7 fclass Feature class see http://www.geonames.org/export/codes.html
8 fcode Feature code see http://www.geonames.org/export/codes.html
9 country ISO-3166 2-letter country code
10 cc2 Alternate country codes, comma separated, ISO-3166 2-letter country code
11 admin1 Fipscode (subject to change to iso code
12 admin2 Code for the second administrative division, a county in the US
13 admin3 Code for third level administrative division
14 admin4 Code for fourth level administrative division
14 population The Population of Geoname
14 elevation The elevation in meters of Geoname
14 gtopo30 Digital elevation model
14 timezone The timezone of Geoname
14 moddate The date of last change of this Geoname
Ottimizzazione indici – 1/2
Necessario per:
 Rimuove campi non richiesti.
 Gestire campi Geo Point.
 Ottimizzare i campi stringa (text, keyword)
 Numeri shard corretto (11M records => 2 shards)
Vantaggi => performances/spazio/CPU
Ottimizzazione indici – 2/2
{
"mappings": {
"geoname": {
"properties": {
"admin1": {
"type": "keyword",
"ignore_above": 256
},
…
"alternatenames": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
…
…
"location": {
"type": "geo_point"
},
…
"longitude": {
"type": "float"
},
"moddate": {
"type": "date"
},
Ingestion via Spark – GeonameIngester – 1/7
Il nostro ingester eseguirà i seguenti steps:
 Inizializzazione Job Spark
 Parse del CSV
 Definizione della struttura di indicizzazione
 Popolamento delle classi
 Scrittura dati in Elasticsearch
 Esecuzione del Job Spark
Ingestion via Spark – GeonameIngester – 2/7
Inizializzazione di un Job Spark
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.elasticsearch.spark.rdd.EsSpark
import scala.util.Try
object GeonameIngester {
def main(args: Array[String]) {
val sparkSession = SparkSession.builder
.master("local")
.appName("GeonameIngester")
.getOrCreate()
Ingestion via Spark – GeonameIngester – 3/7
Parse del CSV
val geonameSchema = StructType(Array(
StructField("geonameid", IntegerType, false),
StructField("name", StringType, false),
StructField("asciiname", StringType, true),
StructField("alternatenames", StringType, true),
StructField("latitude", FloatType, true), ….
val GEONAME_PATH = "downloads/allCountries.txt"
val geonames = sparkSession.sqlContext.read
.option("header", false)
.option("quote", "")
.option("delimiter", "t").option("maxColumns", 22)
.schema(geonameSchema)
.csv(GEONAME_PATH)
.cache()
Ingestion via Spark – GeonameIngester – 4/7
Definizione delle nostre classi per l’Inidicizzazione
case class GeoPoint(lat: Double, lon: Double)
case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String],
latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String,
cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4:
Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate:
String)
implicit def emptyToOption(value: String): Option[String] = {
if (value == null) return None
val clean = value.trim
if (clean.isEmpty) { None } else { Some(clean)}
}
Ingestion via Spark – GeonameIngester – 5/7
Definizione delle nostre classi per l’Inidicizzazione
case class GeoPoint(lat: Double, lon: Double)
case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String],
latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String,
cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4:
Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate:
String)
implicit def emptyToOption(value: String): Option[String] = {
if (value == null) return None
val clean = value.trim
if (clean.isEmpty) { None } else { Some(clean)}
}
Ingestion via Spark – GeonameIngester – 6/7
Popolazione delle nostre classi
val records = geonames.map {
row =>
val id = row.getInt(0)
val lat = row.getFloat(4)
val lon = row.getFloat(5)
Geoname(id, row.getString(1), row.getString(2),
Option(row.getString(3)).map(_.split(",").map(_.trim).filterNot(_.isEmpty).toList).getOrElse(Nil),
lat, lon, GeoPoint(lat, lon),
row.getString(6), row.getString(7), row.getString(8), row.getString(9),
row.getString(10), row.getString(11), row.getString(12), row.getString(13),
row.getDouble(14), fixNullInt(row.get(15)), row.getInt(16), row.getString(17),
row.getDate(18).toString
)
}
Ingestion via Spark – GeonameIngester – 7/7
Scrittura in Elasticsearch
EsSpark.saveToEs(records.toJavaRDD, "geonames/geoname", Map("es.mapping.id" ->
"geonameid"))
Esecuzione di uno Spark Job
spark-submit --class GeonameIngester target/scala-2.11/elasticsearch-geonames-locator-
assembly-1.0.jar
(~20 minuti su singola macchina)
Ricerca di un luogo
curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{ "term": { "name": "moscow"}},
{ "term": { "alternatenames": "moscow"}},
{ "term": { "asciiname": "moscow" }}
],
"filter": [
{ "term": { "fclass": "P" }},
{ "range": { "population": {"gt": 0}}}
]
}
},
"sort": [ { "population": { "order": "desc"}}]
}'
NoSQL
Key-Value
 Redis
 Voldemort
 Dynomite
 Tokio*
BigTable Clones
 Accumulo
 Hbase
 Cassandra
Document
 CouchDB
 MongoDB
 ElasticSearch
GraphDB
 Neo4j
 OrientDB
 …Graph
Message Queue
 Kafka
 RabbitMQ
 ...MQ
NoSQL - Evolution
MicroServices
Linguaggio – Scala vs Java
public class User {
private String firstName;
private String lastName;
private String email;
private Password password;
public User(String firstName, String lastName,
String email, Password password) {
this.firstName = firstName;
this.lastName = lastName;
this.email = email;
this.password = password;
}
public String getFirstName() {return firstName; }
public void setFirstName(String firstName) { this.firstName = firstName; }
public String getLastName() { return lastName; }
public void setLastName(String lastName) { this.lastName = lastName; }
public String getEmail() { return email; }
public void setEmail(String email) { this.email = email; }
public Password getPassword() { return password; }
public void setPassword(Password password) { this.password = password; }
@Override public String toString() {
return "User [email=" + email + ", firstName=" + firstName + ", lastName=" + lastName + "]"; }
@Override public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((email == null) ? 0 : email.hashCode());
result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
result = prime * result + ((lastName == null) ? 0 : firstName.hashCode());
result = prime * result + ((password == null) ? 0 : password.hashCode());
return result; }
@Override public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
User other = (User) obj;
if (email == null) {
if (other.email != null)
return false;
} else if (!email.equals(other.email))
return false;
if (password == null) {
if (other.password != null)
return false;
} else if (!password.equals(other.password))
return false;
if (firstName == null) {
if (other.firstName != null)
return false;
} else if (!firstName.equals(other.firstName))
return false;
if (lastName == null) {
case class User(
var firstName:String,
var lastName:String,
var email:String,
var password:Password)
JAVASCALA
Grazie per
l’attenzione
Alberto Paro
Q&A

More Related Content

What's hot

Spark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkSpark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkDatabricks
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Spark Summit
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrDatabricks
 
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...Spark Summit
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonDatabricks
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbJen Aman
 
Presto Talk @ Hadoop Summit'15
Presto Talk @ Hadoop Summit'15Presto Talk @ Hadoop Summit'15
Presto Talk @ Hadoop Summit'15Nezih Yigitbasi
 
Lessons from Running Large Scale Spark Workloads
Lessons from Running Large Scale Spark WorkloadsLessons from Running Large Scale Spark Workloads
Lessons from Running Large Scale Spark WorkloadsDatabricks
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Communityjeykottalam
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Databricks
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Databricks
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustSpark Summit
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying SparkDatabricks
 
A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)
A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)
A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)Spark Summit
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Amazon Web Services
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWes McKinney
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Databricks
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0 Databricks
 

What's hot (20)

Spark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkSpark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with Spark
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkr
 
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
 
Presto Talk @ Hadoop Summit'15
Presto Talk @ Hadoop Summit'15Presto Talk @ Hadoop Summit'15
Presto Talk @ Hadoop Summit'15
 
Lessons from Running Large Scale Spark Workloads
Lessons from Running Large Scale Spark WorkloadsLessons from Running Large Scale Spark Workloads
Lessons from Running Large Scale Spark Workloads
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Community
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
 
A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)
A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)
A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial users
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0
 

Viewers also liked

ElasticSearch on AWS
ElasticSearch on AWSElasticSearch on AWS
ElasticSearch on AWSPhilipp Garbe
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibJen Aman
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache sparkRahul Kumar
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 
Imperative Induced Innovation - Patrick W. Dowd, Ph. D
Imperative Induced Innovation - Patrick W. Dowd, Ph. DImperative Induced Innovation - Patrick W. Dowd, Ph. D
Imperative Induced Innovation - Patrick W. Dowd, Ph. Dscoopnewsgroup
 
Apache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 OverviewApache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 OverviewJosh Elser
 
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionApache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionJoão Gabriel Lima
 
[Strata] Sparkta
[Strata] Sparkta[Strata] Sparkta
[Strata] SparktaStratio
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Neil Andrassy
 
Webinar: MongoDB Connector for Spark
Webinar: MongoDB Connector for SparkWebinar: MongoDB Connector for Spark
Webinar: MongoDB Connector for SparkMongoDB
 
Big data advance topics - part 2.pptx
Big data   advance topics - part 2.pptxBig data   advance topics - part 2.pptx
Big data advance topics - part 2.pptxMoldovan Radu Adrian
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architectureJoseph D'Antoni
 
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudDataWorks Summit
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageSandeep Patil
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...DB Tsai
 

Viewers also liked (20)

ElasticSearch on AWS
ElasticSearch on AWSElasticSearch on AWS
ElasticSearch on AWS
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
Imperative Induced Innovation - Patrick W. Dowd, Ph. D
Imperative Induced Innovation - Patrick W. Dowd, Ph. DImperative Induced Innovation - Patrick W. Dowd, Ph. D
Imperative Induced Innovation - Patrick W. Dowd, Ph. D
 
Apache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 OverviewApache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 Overview
 
SQRRL threat hunting platform
SQRRL threat hunting platformSQRRL threat hunting platform
SQRRL threat hunting platform
 
Near Real-Time Outlier Detection and Interpretation
Near Real-Time Outlier Detection and InterpretationNear Real-Time Outlier Detection and Interpretation
Near Real-Time Outlier Detection and Interpretation
 
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionApache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
 
[Strata] Sparkta
[Strata] Sparkta[Strata] Sparkta
[Strata] Sparkta
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
 
Webinar: MongoDB Connector for Spark
Webinar: MongoDB Connector for SparkWebinar: MongoDB Connector for Spark
Webinar: MongoDB Connector for Spark
 
963
963963
963
 
Introduction to Accumulo
Introduction to AccumuloIntroduction to Accumulo
Introduction to Accumulo
 
Big data advance topics - part 2.pptx
Big data   advance topics - part 2.pptxBig data   advance topics - part 2.pptx
Big data advance topics - part 2.pptx
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architecture
 
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better Storage
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
 

Similar to 2017 02-07 - elastic & spark. building a search geo locator

Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?Jesper Kamstrup Linnet
 
Best of build 2021 - C# 10 & .NET 6
Best of build 2021 -  C# 10 & .NET 6Best of build 2021 -  C# 10 & .NET 6
Best of build 2021 - C# 10 & .NET 6Moaid Hathot
 
Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Sunghyouk Bae
 
Functional programming using underscorejs
Functional programming using underscorejsFunctional programming using underscorejs
Functional programming using underscorejs偉格 高
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with ClojureDmitry Buzdin
 
Kotlin, smarter development for the jvm
Kotlin, smarter development for the jvmKotlin, smarter development for the jvm
Kotlin, smarter development for the jvmArnaud Giuliani
 
Scala @ TechMeetup Edinburgh
Scala @ TechMeetup EdinburghScala @ TechMeetup Edinburgh
Scala @ TechMeetup EdinburghStuart Roebuck
 
Kotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projectsKotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projectsBartosz Kosarzycki
 
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016STX Next
 
Scala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecScala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecLoïc Descotte
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopSages
 
Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Guillaume Laforge
 
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6Dmitry Soshnikov
 

Similar to 2017 02-07 - elastic & spark. building a search geo locator (20)

Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?
 
Scala - en bedre Java?
Scala - en bedre Java?Scala - en bedre Java?
Scala - en bedre Java?
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
Best of build 2021 - C# 10 & .NET 6
Best of build 2021 -  C# 10 & .NET 6Best of build 2021 -  C# 10 & .NET 6
Best of build 2021 - C# 10 & .NET 6
 
Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017
 
3 database-jdbc(1)
3 database-jdbc(1)3 database-jdbc(1)
3 database-jdbc(1)
 
Functional programming using underscorejs
Functional programming using underscorejsFunctional programming using underscorejs
Functional programming using underscorejs
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Scala in Places API
Scala in Places APIScala in Places API
Scala in Places API
 
Kotlin, smarter development for the jvm
Kotlin, smarter development for the jvmKotlin, smarter development for the jvm
Kotlin, smarter development for the jvm
 
Scala @ TechMeetup Edinburgh
Scala @ TechMeetup EdinburghScala @ TechMeetup Edinburgh
Scala @ TechMeetup Edinburgh
 
Kotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projectsKotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projects
 
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
 
Scala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecScala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar Prokopec
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache Hadoop
 
Java
JavaJava
Java
 
Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008
 
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
 
TechTalk - Dotnet
TechTalk - DotnetTechTalk - Dotnet
TechTalk - Dotnet
 

More from Alberto Paro

LUISS - Deep Learning and data analyses - 09/01/19
LUISS - Deep Learning and data analyses - 09/01/19LUISS - Deep Learning and data analyses - 09/01/19
LUISS - Deep Learning and data analyses - 09/01/19Alberto Paro
 
2018 07-11 - kafka integration patterns
2018 07-11 - kafka integration patterns2018 07-11 - kafka integration patterns
2018 07-11 - kafka integration patternsAlberto Paro
 
Elasticsearch in architetture Big Data - EsInADay-2017
Elasticsearch in architetture Big Data - EsInADay-2017Elasticsearch in architetture Big Data - EsInADay-2017
Elasticsearch in architetture Big Data - EsInADay-2017Alberto Paro
 
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup Alberto Paro
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locatorAlberto Paro
 
2016 02-24 - Piattaforme per i Big Data
2016 02-24 - Piattaforme per i Big Data2016 02-24 - Piattaforme per i Big Data
2016 02-24 - Piattaforme per i Big DataAlberto Paro
 
What's Big Data? - Big Data Tech - 2015 - Firenze
What's Big Data? - Big Data Tech - 2015 - FirenzeWhat's Big Data? - Big Data Tech - 2015 - Firenze
What's Big Data? - Big Data Tech - 2015 - FirenzeAlberto Paro
 
ElasticSearch Meetup 30 - 10 - 2014
ElasticSearch Meetup 30 - 10 - 2014ElasticSearch Meetup 30 - 10 - 2014
ElasticSearch Meetup 30 - 10 - 2014Alberto Paro
 
Scala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJSScala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJSAlberto Paro
 

More from Alberto Paro (10)

Data streaming
Data streamingData streaming
Data streaming
 
LUISS - Deep Learning and data analyses - 09/01/19
LUISS - Deep Learning and data analyses - 09/01/19LUISS - Deep Learning and data analyses - 09/01/19
LUISS - Deep Learning and data analyses - 09/01/19
 
2018 07-11 - kafka integration patterns
2018 07-11 - kafka integration patterns2018 07-11 - kafka integration patterns
2018 07-11 - kafka integration patterns
 
Elasticsearch in architetture Big Data - EsInADay-2017
Elasticsearch in architetture Big Data - EsInADay-2017Elasticsearch in architetture Big Data - EsInADay-2017
Elasticsearch in architetture Big Data - EsInADay-2017
 
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator
 
2016 02-24 - Piattaforme per i Big Data
2016 02-24 - Piattaforme per i Big Data2016 02-24 - Piattaforme per i Big Data
2016 02-24 - Piattaforme per i Big Data
 
What's Big Data? - Big Data Tech - 2015 - Firenze
What's Big Data? - Big Data Tech - 2015 - FirenzeWhat's Big Data? - Big Data Tech - 2015 - Firenze
What's Big Data? - Big Data Tech - 2015 - Firenze
 
ElasticSearch Meetup 30 - 10 - 2014
ElasticSearch Meetup 30 - 10 - 2014ElasticSearch Meetup 30 - 10 - 2014
ElasticSearch Meetup 30 - 10 - 2014
 
Scala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJSScala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJS
 

Recently uploaded

Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 

Recently uploaded (20)

Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 

2017 02-07 - elastic & spark. building a search geo locator

  • 1. Roma – 7 Febbraio 2017 presenta Alberto Paro, Seacom Elastic & Spark. Building A Search Geo Locator
  • 2. Alberto Paro  Laureato in Ingegneria Informatica (POLIMI)  Autore di 3 libri su ElasticSearch da 1 a 5.x + 6 Tech review  Lavoro principalmente in Scala e su tecnologie BD (Akka, Spray.io, Playframework, Apache Spark) e NoSQL (Accumulo, Cassandra, ElasticSearch e MongoDB)  Evangelist linguaggio Scala e Scala.JS
  • 3. Elasticseach 5.x - Cookbook  Choose the best ElasticSearch cloud topology to deploy and power it up with external plugins  Develop tailored mapping to take full control of index steps  Build complex queries through managing indices and documents  Optimize search results through executing analytics aggregations  Monitor the performance of the cluster and nodes  Install Kibana to monitor cluster and extend Kibana for plugins.  Integrate ElasticSearch in Java, Scala, Python and Big Data applications Discount code for Ebook: ALPOEB50 Discount code for Print Book: ALPOPR15 Expiration Date: 21st Feb 2017
  • 4. Obiettivi  Architetture Big Data con ES  Apache Spark  GeoIngester  Data Collection  Ottimizzazione Indici  Ingestion via Apache Spark  Ricerca per un luogo  Cenni di Big Data Tools
  • 6. Hadoop / Spark Input Iter 1 HDFS Iter 2 HDFS HDFS Read HDFS Read HDFS Write HDFS Write Input Iter 1 Iter 2 Hadoop MapReduce Apache Spark Evoluzione del modello Map Reduce
  • 7. Apache Spark  Scritto in Scala con API in Java, Python e R  Evoluzione del modello Map/Reduce  Potenti moduli a corredo:  Spark SQL  Spark Streaming  MLLib (Machine Learning)  GraphX (graph)
  • 8. Geoname GeoNames è un database geografico, scaricabile gratuitamente sotto licenza creative commons. Contiene circa 10 millioni di nomi geografici e consiste di circa 9 milioni di feature uniqche di cui 2.8 milioni di posti popolati e 5.5 millioni di nomi alternativi. Può essere facilmente scaricato da http://download.geonames.org/export/dump come file CSV. Il codice è disponibile all’indirizzo: https://github.com/aparo/elasticsearch-geonames-locator
  • 9. Geoname - Struttura No. Attribute name Explanation 1 geonameid Unique ID for this geoname 2 name The name of the geoname 3 asciiname ASCII representation of the name 4 alternatenames Other forms of this name. Generally in several languages 5 latitude Latitude in decimal degrees of the Geoname 6 longitude Longitude in decimal degrees of the Geoname 7 fclass Feature class see http://www.geonames.org/export/codes.html 8 fcode Feature code see http://www.geonames.org/export/codes.html 9 country ISO-3166 2-letter country code 10 cc2 Alternate country codes, comma separated, ISO-3166 2-letter country code 11 admin1 Fipscode (subject to change to iso code 12 admin2 Code for the second administrative division, a county in the US 13 admin3 Code for third level administrative division 14 admin4 Code for fourth level administrative division 14 population The Population of Geoname 14 elevation The elevation in meters of Geoname 14 gtopo30 Digital elevation model 14 timezone The timezone of Geoname 14 moddate The date of last change of this Geoname
  • 10. Ottimizzazione indici – 1/2 Necessario per:  Rimuove campi non richiesti.  Gestire campi Geo Point.  Ottimizzare i campi stringa (text, keyword)  Numeri shard corretto (11M records => 2 shards) Vantaggi => performances/spazio/CPU
  • 11. Ottimizzazione indici – 2/2 { "mappings": { "geoname": { "properties": { "admin1": { "type": "keyword", "ignore_above": 256 }, … "alternatenames": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, … … "location": { "type": "geo_point" }, … "longitude": { "type": "float" }, "moddate": { "type": "date" },
  • 12. Ingestion via Spark – GeonameIngester – 1/7 Il nostro ingester eseguirà i seguenti steps:  Inizializzazione Job Spark  Parse del CSV  Definizione della struttura di indicizzazione  Popolamento delle classi  Scrittura dati in Elasticsearch  Esecuzione del Job Spark
  • 13. Ingestion via Spark – GeonameIngester – 2/7 Inizializzazione di un Job Spark import org.apache.spark.sql.SparkSession import org.apache.spark.sql.types._ import org.elasticsearch.spark.rdd.EsSpark import scala.util.Try object GeonameIngester { def main(args: Array[String]) { val sparkSession = SparkSession.builder .master("local") .appName("GeonameIngester") .getOrCreate()
  • 14. Ingestion via Spark – GeonameIngester – 3/7 Parse del CSV val geonameSchema = StructType(Array( StructField("geonameid", IntegerType, false), StructField("name", StringType, false), StructField("asciiname", StringType, true), StructField("alternatenames", StringType, true), StructField("latitude", FloatType, true), …. val GEONAME_PATH = "downloads/allCountries.txt" val geonames = sparkSession.sqlContext.read .option("header", false) .option("quote", "") .option("delimiter", "t").option("maxColumns", 22) .schema(geonameSchema) .csv(GEONAME_PATH) .cache()
  • 15. Ingestion via Spark – GeonameIngester – 4/7 Definizione delle nostre classi per l’Inidicizzazione case class GeoPoint(lat: Double, lon: Double) case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String], latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String, cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4: Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate: String) implicit def emptyToOption(value: String): Option[String] = { if (value == null) return None val clean = value.trim if (clean.isEmpty) { None } else { Some(clean)} }
  • 16. Ingestion via Spark – GeonameIngester – 5/7 Definizione delle nostre classi per l’Inidicizzazione case class GeoPoint(lat: Double, lon: Double) case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String], latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String, cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4: Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate: String) implicit def emptyToOption(value: String): Option[String] = { if (value == null) return None val clean = value.trim if (clean.isEmpty) { None } else { Some(clean)} }
  • 17. Ingestion via Spark – GeonameIngester – 6/7 Popolazione delle nostre classi val records = geonames.map { row => val id = row.getInt(0) val lat = row.getFloat(4) val lon = row.getFloat(5) Geoname(id, row.getString(1), row.getString(2), Option(row.getString(3)).map(_.split(",").map(_.trim).filterNot(_.isEmpty).toList).getOrElse(Nil), lat, lon, GeoPoint(lat, lon), row.getString(6), row.getString(7), row.getString(8), row.getString(9), row.getString(10), row.getString(11), row.getString(12), row.getString(13), row.getDouble(14), fixNullInt(row.get(15)), row.getInt(16), row.getString(17), row.getDate(18).toString ) }
  • 18. Ingestion via Spark – GeonameIngester – 7/7 Scrittura in Elasticsearch EsSpark.saveToEs(records.toJavaRDD, "geonames/geoname", Map("es.mapping.id" -> "geonameid")) Esecuzione di uno Spark Job spark-submit --class GeonameIngester target/scala-2.11/elasticsearch-geonames-locator- assembly-1.0.jar (~20 minuti su singola macchina)
  • 19. Ricerca di un luogo curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{ "query": { "bool": { "minimum_should_match": 1, "should": [ { "term": { "name": "moscow"}}, { "term": { "alternatenames": "moscow"}}, { "term": { "asciiname": "moscow" }} ], "filter": [ { "term": { "fclass": "P" }}, { "range": { "population": {"gt": 0}}} ] } }, "sort": [ { "population": { "order": "desc"}}] }'
  • 20. NoSQL Key-Value  Redis  Voldemort  Dynomite  Tokio* BigTable Clones  Accumulo  Hbase  Cassandra Document  CouchDB  MongoDB  ElasticSearch GraphDB  Neo4j  OrientDB  …Graph Message Queue  Kafka  RabbitMQ  ...MQ
  • 23. Linguaggio – Scala vs Java public class User { private String firstName; private String lastName; private String email; private Password password; public User(String firstName, String lastName, String email, Password password) { this.firstName = firstName; this.lastName = lastName; this.email = email; this.password = password; } public String getFirstName() {return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } public String getEmail() { return email; } public void setEmail(String email) { this.email = email; } public Password getPassword() { return password; } public void setPassword(Password password) { this.password = password; } @Override public String toString() { return "User [email=" + email + ", firstName=" + firstName + ", lastName=" + lastName + "]"; } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + ((email == null) ? 0 : email.hashCode()); result = prime * result + ((firstName == null) ? 0 : firstName.hashCode()); result = prime * result + ((lastName == null) ? 0 : firstName.hashCode()); result = prime * result + ((password == null) ? 0 : password.hashCode()); return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; User other = (User) obj; if (email == null) { if (other.email != null) return false; } else if (!email.equals(other.email)) return false; if (password == null) { if (other.password != null) return false; } else if (!password.equals(other.password)) return false; if (firstName == null) { if (other.firstName != null) return false; } else if (!firstName.equals(other.firstName)) return false; if (lastName == null) { case class User( var firstName:String, var lastName:String, var email:String, var password:Password) JAVASCALA
  • 25. Q&A

Editor's Notes

  1. Spark SQL (DB, Json, case class)
  2. Vantaggi: 100% opensource Svantaggi: Breaking modulli
  3. Spark SQL (DB, Json, case class)
  4. Spark SQL (DB, Json, case class)
  5. Spark SQL (DB, Json, case class)
  6. Spark SQL (DB, Json, case class)
  7. Spark SQL (DB, Json, case class)
  8. Spark SQL (DB, Json, case class)
  9. Spark SQL (DB, Json, case class)
  10. Spark SQL (DB, Json, case class)
  11. Spark SQL (DB, Json, case class)
  12. Spark SQL (DB, Json, case class)
  13. Spark SQL (DB, Json, case class)
  14. Spark SQL (DB, Json, case class)
  15. Spark SQL (DB, Json, case class)
  16. Key Value: Focus on scaling to huge amounts of data Designed to handle massive load Based on Amazon’s Dynamo paper Data model: (global) collection of Key-Value pairs Dynamo ring partitioning and replication Big Table Clones Like column oriented Relational Databases, but with a twist Tables similarly to RDBMS, but handles semi-structured ๏Based on Google’s BigTable paper Data model: ‣Columns → column families → ACL ‣Datums keyed by: row, column, time, index ‣Row-range → tablet → distribution Document Similar to Key-Value stores,but the DB knows what theValue is Inspired by Lotus Notes Data model: Collections of Key-Value collections Documents are often versioned GraphDB Focus on modeling the structure of data – interconnectivity Scales to the complexity of the data Inspired by mathematical Graph Theory ( G=(E,V) ) Data model: “Property Graph” ‣Nodes ‣Relationships/Edges between Nodes (first class) ‣Key-Value pairs on both ‣Possibly Edge Labels and/or Node/Edge Types