SlideShare a Scribd company logo
1 of 72
Download to read offline
val inputStream = s.readStream
.format("kinesis")
.option("streamName", "stream-name") // Kinesis Stream Name
.option("region", "ap-northeast-1") // Kinesis Region
.option("initialPosition", "TRIM_HORIZON") // Kinesis initial position
.load()
case class SampleSchema(column1: String, column2: Int)
val testSchema = ScalaReflection.schemaFor[SampleSchema].dataType.asInstanceOf[StructType]
val structuredStream = inputStream
.select(from_json(col("data").cast("string"), testSchema).as("data")) // Parse JSON and convert to testSchema
.select("data.*") // select JSON columns
val kclSource = KCLSource("scalamatsuri")
val producer = Flow[UserLog]
.map(_.toJson.compactPrint)
.map(ByteString.fromString(_))
.map(KinesisStreamObject(UUID.randomUUID().toString, _))
.toMat(KinesisProducer("scalamatsuri2", KinesisProducerSetting.fromShardCount(1)))(Keep.right)
val extract = Flow.fromFunction[ByteString, PurchaseLog](_.utf8String.parseJson.convertTo[PurchaseLog])
val convert = Flow[PurchaseLog].mapConcat { log =>
log.items.map { item =>
UserLog(
log.userId,
// properties...
)
}
}
val result =
kclSource
.via(extract)
.via(convert)
.toMat(producer)(Keep.right)
.run()
val stream = inputStream
.select(from_json(col("data").cast("string"), purchaseLogSchema).as("data"))
.select("data.*")
.select(
$"user_id",
$"time",
expr("explode(items)").as("item")
)
.select(
$"user_id",
$"time",
lit("PURCHASE").as("log_type"),
$"item.item_id".as("item_id"),
$"item.price".as("price"),
$"item.quantity".as("quantity"),
unix_timestamp().as("created_time")
)
.as[UserLog] // case class UserLog(
stream
.writeStream
.foreach(sink)
.outputMode("update")
.start()
val http = Http()
val separator = ByteString.fromString(Properties.lineSeparator)
val separatorBytes = separator.length
val MAX_FILE_SIZE = 10 * 1024 * 1024
val MAX_BYTES_PER_HOUR = 100 * 1024 * 1024
val kclSource = KCLSource("scalamatsuri")
val buffering = Flow[ByteString]
.batchWeighted[ByteString](MAX_FILE_SIZE, _.length, seed => seed)(_ ++ separator ++ _)
.throttle(MAX_BYTES_PER_HOUR, Duration(1, HOURS), _.length)
val send: Flow[ByteString, ByteString] = ??? //
val logging = Sink.foreach[ByteString] { body =>
logger.info(body.utf8String)
}
val request = kclSource
.via(buffering)
.via(send)
.toMat(logging)(Keep.right)
.run()
val inputStream = spark.readStream
.format("kinesis")
.option("streamName", scalamatsuriStraem)
.option("region", "ap-northeast-1")
.option("initialPosition", "TRIM_HORIZON")
.option("awsAccessKey", awsAccessKeyId)
.option("awsSecretKey", awsSecretKey)
.option("maxRecordsPerFetch", 100000)
.option("fetchBufferSize", 100000)
.option("maxFetchDuration", 1.0)
.load()
val strteamWithWM = stream1.as("in1")
.withColumn("dt", to_timestamp(from_unixtime(col("time"))))
.withWatermark("dt", "3 hours")
val stream2WithWM = stream2.as("in2")
.withColumn("dt", to_timestamp(from_unixtime(col("time"))))
.withWatermark("dt", "1 hours")
strteamWithWM.join(
stream2WithWM,
expr("""
in1.user_id = in2.user_id
AND
in1.time < in2.time
AND
in1.time >= in2.time + interval 3 hour
"""),
"leftOuter"
)
Akka streams vs spark structured streaming
Akka streams vs spark structured streaming
Akka streams vs spark structured streaming

More Related Content

What's hot

メタプログラミングって何だろう
メタプログラミングって何だろうメタプログラミングって何だろう
メタプログラミングって何だろうKota Mizushima
 
遠隔デバイスとの信頼を築くための技術とその標準(TEEP RATS)
遠隔デバイスとの信頼を築くための技術とその標準(TEEP RATS)遠隔デバイスとの信頼を築くための技術とその標準(TEEP RATS)
遠隔デバイスとの信頼を築くための技術とその標準(TEEP RATS)Kuniyasu Suzaki
 
JP1/AJS2オペレータ勉強会
JP1/AJS2オペレータ勉強会JP1/AJS2オペレータ勉強会
JP1/AJS2オペレータ勉強会mizuky fujitani
 
ネットワークエンジニア的Ansibleの始め方
ネットワークエンジニア的Ansibleの始め方ネットワークエンジニア的Ansibleの始め方
ネットワークエンジニア的Ansibleの始め方akira6592
 
SAP Extractorのソースエンドポイントとしての利用
SAP Extractorのソースエンドポイントとしての利用SAP Extractorのソースエンドポイントとしての利用
SAP Extractorのソースエンドポイントとしての利用QlikPresalesJapan
 
Scalaで型クラス入門
Scalaで型クラス入門Scalaで型クラス入門
Scalaで型クラス入門Makoto Fukuhara
 
MLOps に基づく AI/ML 実運用最前線 ~画像、動画データにおける MLOps 事例のご紹介~(映像情報メディア学会2021年冬季大会企画セッショ...
MLOps に基づく AI/ML 実運用最前線 ~画像、動画データにおける MLOps 事例のご紹介~(映像情報メディア学会2021年冬季大会企画セッショ...MLOps に基づく AI/ML 実運用最前線 ~画像、動画データにおける MLOps 事例のご紹介~(映像情報メディア学会2021年冬季大会企画セッショ...
MLOps に基づく AI/ML 実運用最前線 ~画像、動画データにおける MLOps 事例のご紹介~(映像情報メディア学会2021年冬季大会企画セッショ...NTT DATA Technology & Innovation
 
Pythonが動く仕組み(の概要)
Pythonが動く仕組み(の概要)Pythonが動く仕組み(の概要)
Pythonが動く仕組み(の概要)Yoshiaki Shibutani
 
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Mineaki Motohashi
 
C#メタプログラミング概略 in 2021
C#メタプログラミング概略 in 2021C#メタプログラミング概略 in 2021
C#メタプログラミング概略 in 2021Atsushi Nakamura
 
Ansible とネットワーク自動化の概要(SmartCS と Ansible の連携による自動化の可能性を体験!)
Ansible とネットワーク自動化の概要(SmartCS と Ansible の連携による自動化の可能性を体験!)Ansible とネットワーク自動化の概要(SmartCS と Ansible の連携による自動化の可能性を体験!)
Ansible とネットワーク自動化の概要(SmartCS と Ansible の連携による自動化の可能性を体験!)akira6592
 
Drupalサイトの管理を楽にしてくれる? Multi-site機能について
Drupalサイトの管理を楽にしてくれる? Multi-site機能についてDrupalサイトの管理を楽にしてくれる? Multi-site機能について
Drupalサイトの管理を楽にしてくれる? Multi-site機能についてShumpei Kishi
 
Nervesが開拓する「ElixirでIoT」の新世界
Nervesが開拓する「ElixirでIoT」の新世界Nervesが開拓する「ElixirでIoT」の新世界
Nervesが開拓する「ElixirでIoT」の新世界Hideki Takase
 
Ansible勉強会資料
Ansible勉強会資料Ansible勉強会資料
Ansible勉強会資料Makoto Oya
 
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3オラクルエンジニア通信
 
IT系エンジニアのためのプレゼンテーション入門
IT系エンジニアのためのプレゼンテーション入門IT系エンジニアのためのプレゼンテーション入門
IT系エンジニアのためのプレゼンテーション入門Masahito Zembutsu
 
モジュールの凝集度・結合度・インタフェース
モジュールの凝集度・結合度・インタフェースモジュールの凝集度・結合度・インタフェース
モジュールの凝集度・結合度・インタフェースHajime Yanagawa
 

What's hot (20)

メタプログラミングって何だろう
メタプログラミングって何だろうメタプログラミングって何だろう
メタプログラミングって何だろう
 
Oracle Database Applianceのご紹介(詳細)
Oracle Database Applianceのご紹介(詳細)Oracle Database Applianceのご紹介(詳細)
Oracle Database Applianceのご紹介(詳細)
 
分散トレーシング技術について(Open tracingやjaeger)
分散トレーシング技術について(Open tracingやjaeger)分散トレーシング技術について(Open tracingやjaeger)
分散トレーシング技術について(Open tracingやjaeger)
 
遠隔デバイスとの信頼を築くための技術とその標準(TEEP RATS)
遠隔デバイスとの信頼を築くための技術とその標準(TEEP RATS)遠隔デバイスとの信頼を築くための技術とその標準(TEEP RATS)
遠隔デバイスとの信頼を築くための技術とその標準(TEEP RATS)
 
GitLab から GitLab に移行したときの思い出
GitLab から GitLab に移行したときの思い出GitLab から GitLab に移行したときの思い出
GitLab から GitLab に移行したときの思い出
 
JP1/AJS2オペレータ勉強会
JP1/AJS2オペレータ勉強会JP1/AJS2オペレータ勉強会
JP1/AJS2オペレータ勉強会
 
ネットワークエンジニア的Ansibleの始め方
ネットワークエンジニア的Ansibleの始め方ネットワークエンジニア的Ansibleの始め方
ネットワークエンジニア的Ansibleの始め方
 
SAP Extractorのソースエンドポイントとしての利用
SAP Extractorのソースエンドポイントとしての利用SAP Extractorのソースエンドポイントとしての利用
SAP Extractorのソースエンドポイントとしての利用
 
Scalaで型クラス入門
Scalaで型クラス入門Scalaで型クラス入門
Scalaで型クラス入門
 
MLOps に基づく AI/ML 実運用最前線 ~画像、動画データにおける MLOps 事例のご紹介~(映像情報メディア学会2021年冬季大会企画セッショ...
MLOps に基づく AI/ML 実運用最前線 ~画像、動画データにおける MLOps 事例のご紹介~(映像情報メディア学会2021年冬季大会企画セッショ...MLOps に基づく AI/ML 実運用最前線 ~画像、動画データにおける MLOps 事例のご紹介~(映像情報メディア学会2021年冬季大会企画セッショ...
MLOps に基づく AI/ML 実運用最前線 ~画像、動画データにおける MLOps 事例のご紹介~(映像情報メディア学会2021年冬季大会企画セッショ...
 
Pythonが動く仕組み(の概要)
Pythonが動く仕組み(の概要)Pythonが動く仕組み(の概要)
Pythonが動く仕組み(の概要)
 
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
 
C#メタプログラミング概略 in 2021
C#メタプログラミング概略 in 2021C#メタプログラミング概略 in 2021
C#メタプログラミング概略 in 2021
 
Ansible とネットワーク自動化の概要(SmartCS と Ansible の連携による自動化の可能性を体験!)
Ansible とネットワーク自動化の概要(SmartCS と Ansible の連携による自動化の可能性を体験!)Ansible とネットワーク自動化の概要(SmartCS と Ansible の連携による自動化の可能性を体験!)
Ansible とネットワーク自動化の概要(SmartCS と Ansible の連携による自動化の可能性を体験!)
 
Drupalサイトの管理を楽にしてくれる? Multi-site機能について
Drupalサイトの管理を楽にしてくれる? Multi-site機能についてDrupalサイトの管理を楽にしてくれる? Multi-site機能について
Drupalサイトの管理を楽にしてくれる? Multi-site機能について
 
Nervesが開拓する「ElixirでIoT」の新世界
Nervesが開拓する「ElixirでIoT」の新世界Nervesが開拓する「ElixirでIoT」の新世界
Nervesが開拓する「ElixirでIoT」の新世界
 
Ansible勉強会資料
Ansible勉強会資料Ansible勉強会資料
Ansible勉強会資料
 
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
 
IT系エンジニアのためのプレゼンテーション入門
IT系エンジニアのためのプレゼンテーション入門IT系エンジニアのためのプレゼンテーション入門
IT系エンジニアのためのプレゼンテーション入門
 
モジュールの凝集度・結合度・インタフェース
モジュールの凝集度・結合度・インタフェースモジュールの凝集度・結合度・インタフェース
モジュールの凝集度・結合度・インタフェース
 

Similar to Akka streams vs spark structured streaming

Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Databricks
 
Using spark data frame for sql
Using spark data frame for sqlUsing spark data frame for sql
Using spark data frame for sqlDaeMyung Kang
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Databricks
 
Java and XML Schema
Java and XML SchemaJava and XML Schema
Java and XML SchemaRaji Ghawi
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionDatabricks
 
import java-util--- import java-io--- class Vertex { -- Constructo.docx
import java-util--- import java-io---   class Vertex {   -- Constructo.docximport java-util--- import java-io---   class Vertex {   -- Constructo.docx
import java-util--- import java-io--- class Vertex { -- Constructo.docxBlake0FxCampbelld
 
05 Geographic scripting in uDig - halfway between user and developer
05 Geographic scripting in uDig - halfway between user and developer05 Geographic scripting in uDig - halfway between user and developer
05 Geographic scripting in uDig - halfway between user and developerAndrea Antonello
 
AWS CloudFormation Session
AWS CloudFormation SessionAWS CloudFormation Session
AWS CloudFormation SessionKamal Maiti
 
[2019-07] GraphQL in depth (serverside)
[2019-07] GraphQL in depth (serverside)[2019-07] GraphQL in depth (serverside)
[2019-07] GraphQL in depth (serverside)croquiscom
 
Stream analysis with kafka native way and considerations about monitoring as ...
Stream analysis with kafka native way and considerations about monitoring as ...Stream analysis with kafka native way and considerations about monitoring as ...
Stream analysis with kafka native way and considerations about monitoring as ...Andrew Yongjoon Kong
 
Jackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSVJackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSVTatu Saloranta
 
So various polymorphism in Scala
So various polymorphism in ScalaSo various polymorphism in Scala
So various polymorphism in Scalab0ris_1
 
Java7 New Features and Code Examples
Java7 New Features and Code ExamplesJava7 New Features and Code Examples
Java7 New Features and Code ExamplesNaresh Chintalcheru
 
Rule Your Geometry with the Terraformer Toolkit
Rule Your Geometry with the Terraformer ToolkitRule Your Geometry with the Terraformer Toolkit
Rule Your Geometry with the Terraformer ToolkitAaron Parecki
 
(BDT401) Big Data Orchestra - Harmony within Data Analysis Tools | AWS re:Inv...
(BDT401) Big Data Orchestra - Harmony within Data Analysis Tools | AWS re:Inv...(BDT401) Big Data Orchestra - Harmony within Data Analysis Tools | AWS re:Inv...
(BDT401) Big Data Orchestra - Harmony within Data Analysis Tools | AWS re:Inv...Amazon Web Services
 

Similar to Akka streams vs spark structured streaming (20)

Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Using spark data frame for sql
Using spark data frame for sqlUsing spark data frame for sql
Using spark data frame for sql
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Java and XML Schema
Java and XML SchemaJava and XML Schema
Java and XML Schema
 
Scala in Places API
Scala in Places APIScala in Places API
Scala in Places API
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
 
import java-util--- import java-io--- class Vertex { -- Constructo.docx
import java-util--- import java-io---   class Vertex {   -- Constructo.docximport java-util--- import java-io---   class Vertex {   -- Constructo.docx
import java-util--- import java-io--- class Vertex { -- Constructo.docx
 
05 Geographic scripting in uDig - halfway between user and developer
05 Geographic scripting in uDig - halfway between user and developer05 Geographic scripting in uDig - halfway between user and developer
05 Geographic scripting in uDig - halfway between user and developer
 
Aimaf
AimafAimaf
Aimaf
 
AWS CloudFormation Session
AWS CloudFormation SessionAWS CloudFormation Session
AWS CloudFormation Session
 
[2019-07] GraphQL in depth (serverside)
[2019-07] GraphQL in depth (serverside)[2019-07] GraphQL in depth (serverside)
[2019-07] GraphQL in depth (serverside)
 
Stream analysis with kafka native way and considerations about monitoring as ...
Stream analysis with kafka native way and considerations about monitoring as ...Stream analysis with kafka native way and considerations about monitoring as ...
Stream analysis with kafka native way and considerations about monitoring as ...
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
Jackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSVJackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSV
 
3 database-jdbc(1)
3 database-jdbc(1)3 database-jdbc(1)
3 database-jdbc(1)
 
So various polymorphism in Scala
So various polymorphism in ScalaSo various polymorphism in Scala
So various polymorphism in Scala
 
What is new in Java 8
What is new in Java 8What is new in Java 8
What is new in Java 8
 
Java7 New Features and Code Examples
Java7 New Features and Code ExamplesJava7 New Features and Code Examples
Java7 New Features and Code Examples
 
Rule Your Geometry with the Terraformer Toolkit
Rule Your Geometry with the Terraformer ToolkitRule Your Geometry with the Terraformer Toolkit
Rule Your Geometry with the Terraformer Toolkit
 
(BDT401) Big Data Orchestra - Harmony within Data Analysis Tools | AWS re:Inv...
(BDT401) Big Data Orchestra - Harmony within Data Analysis Tools | AWS re:Inv...(BDT401) Big Data Orchestra - Harmony within Data Analysis Tools | AWS re:Inv...
(BDT401) Big Data Orchestra - Harmony within Data Analysis Tools | AWS re:Inv...
 

Recently uploaded

6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 

Recently uploaded (20)

6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 

Akka streams vs spark structured streaming

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26. val inputStream = s.readStream .format("kinesis") .option("streamName", "stream-name") // Kinesis Stream Name .option("region", "ap-northeast-1") // Kinesis Region .option("initialPosition", "TRIM_HORIZON") // Kinesis initial position .load() case class SampleSchema(column1: String, column2: Int) val testSchema = ScalaReflection.schemaFor[SampleSchema].dataType.asInstanceOf[StructType] val structuredStream = inputStream .select(from_json(col("data").cast("string"), testSchema).as("data")) // Parse JSON and convert to testSchema .select("data.*") // select JSON columns
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58. val kclSource = KCLSource("scalamatsuri") val producer = Flow[UserLog] .map(_.toJson.compactPrint) .map(ByteString.fromString(_)) .map(KinesisStreamObject(UUID.randomUUID().toString, _)) .toMat(KinesisProducer("scalamatsuri2", KinesisProducerSetting.fromShardCount(1)))(Keep.right) val extract = Flow.fromFunction[ByteString, PurchaseLog](_.utf8String.parseJson.convertTo[PurchaseLog]) val convert = Flow[PurchaseLog].mapConcat { log => log.items.map { item => UserLog( log.userId, // properties... ) } } val result = kclSource .via(extract) .via(convert) .toMat(producer)(Keep.right) .run()
  • 59.
  • 60. val stream = inputStream .select(from_json(col("data").cast("string"), purchaseLogSchema).as("data")) .select("data.*") .select( $"user_id", $"time", expr("explode(items)").as("item") ) .select( $"user_id", $"time", lit("PURCHASE").as("log_type"), $"item.item_id".as("item_id"), $"item.price".as("price"), $"item.quantity".as("quantity"), unix_timestamp().as("created_time") ) .as[UserLog] // case class UserLog( stream .writeStream .foreach(sink) .outputMode("update") .start()
  • 61.
  • 62.
  • 63. val http = Http() val separator = ByteString.fromString(Properties.lineSeparator) val separatorBytes = separator.length val MAX_FILE_SIZE = 10 * 1024 * 1024 val MAX_BYTES_PER_HOUR = 100 * 1024 * 1024 val kclSource = KCLSource("scalamatsuri") val buffering = Flow[ByteString] .batchWeighted[ByteString](MAX_FILE_SIZE, _.length, seed => seed)(_ ++ separator ++ _) .throttle(MAX_BYTES_PER_HOUR, Duration(1, HOURS), _.length) val send: Flow[ByteString, ByteString] = ??? // val logging = Sink.foreach[ByteString] { body => logger.info(body.utf8String) } val request = kclSource .via(buffering) .via(send) .toMat(logging)(Keep.right) .run()
  • 64. val inputStream = spark.readStream .format("kinesis") .option("streamName", scalamatsuriStraem) .option("region", "ap-northeast-1") .option("initialPosition", "TRIM_HORIZON") .option("awsAccessKey", awsAccessKeyId) .option("awsSecretKey", awsSecretKey) .option("maxRecordsPerFetch", 100000) .option("fetchBufferSize", 100000) .option("maxFetchDuration", 1.0) .load()
  • 65.
  • 66.
  • 67.
  • 68.
  • 69. val strteamWithWM = stream1.as("in1") .withColumn("dt", to_timestamp(from_unixtime(col("time")))) .withWatermark("dt", "3 hours") val stream2WithWM = stream2.as("in2") .withColumn("dt", to_timestamp(from_unixtime(col("time")))) .withWatermark("dt", "1 hours") strteamWithWM.join( stream2WithWM, expr(""" in1.user_id = in2.user_id AND in1.time < in2.time AND in1.time >= in2.time + interval 3 hour """), "leftOuter" )