SlideShare a Scribd company logo
1 of 36
Download to read offline
© 2023 Cloudera, Inc. All rights reserved.
Adding Generative AI to Real-Time
Streaming Pipelines
Tim Spann
Principal Developer Advocate
26-October-2023
© 2023 Cloudera, Inc. All rights reserved. 2
© 2023 Cloudera, Inc. All rights reserved. 3
© 2023 Cloudera, Inc. All rights reserved. 4
https://events.dzone.com/dzone/Data-Pipelines-Investigating-the-Modern-Day-Stack
© 2023 Cloudera, Inc. All rights reserved. 6
Tim Spann
@PaasDev www.datainmotion.dev
github.com/tspannhw medium.com/@tspann
Principal Developer Advocate
Princeton/NYC/Philly Future of Data Meetup
ex-Pivotal, ex-Hortonworks, ex-StreamNative,
ex-PwC, ex-EY, ex-HPE.
Apache NiFi x Apache Kafka x Apache Flink x AI
Future of Data - NYC + NJ + Philly + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
FLaNK Stack Weekly
This week in Apache NiFi, Apache Flink, Apache
Kafka, Apache Spark, Apache Iceberg, Python, Java,
AI, ML, LLM and Open Source friends.
https://bit.ly/32dAJft
© 2023 Cloudera, Inc. All rights reserved. 9
REAL-TIME LLM REQUIRES A PLATFORM
SQL
Stream
Builder
LLM USE CASE
Vector DB
AI Model
Unstructured file types
Data in Motion
on Cloudera Data
Platform (CDP)
Capture, process &
distribute any data,
anywhere
Other enterprise data Open Data Lakehouse
Materialized Views
Structured Sources
Applications/API’s
Streams
meta-llama/llama-2-70b-chat
© 2023 Cloudera, Inc. All rights reserved.
© 2019 Cloudera, Inc. All rights reserved. 11
Cloudera + LLMs
Knowledge Repository
Data Storage / Management
Data Preparation
Data Engineering
LLM Fine Tuning Process
Training Framework
LLM Serving
Serving Framework
Key:
CPU Task
GPU Task
CML
CDE
CDP
Vector DB
CDF
Streaming Classification
Real-Time Model Deployment
© 2023 Cloudera, Inc. All rights reserved.
© 2019 Cloudera, Inc. All rights reserved. 12
Streaming LLM Going Forward
RAG (retrieval augmented)
FLARE pattern
https://github.com/mit-han-lab/streaming-llm
Current Research
Kafka Integration with Vector
Stores
NiFi Integration with
HuggingFace Transformers
© 2023 Cloudera, Inc. All rights reserved.
© 2019 Cloudera, Inc. All rights reserved. 13
Streaming LLM With Java
(NiFi, Kafka,Flink, Spark)
https://github.com/deepjavalibrary/djl-demo/tree/master/huggingface/nlp/src/main/java/com/examples
https://pub.towardsai.net/deploy-huggingface-nlp-models-in-java-with-deep-java-library-e36c635b2053
https://milvus.io/docs/create_collection.md
https://github.com/langchain4j/langchain4j
https://github.com/mariofusco/droolsGPT
© 2023 Cloudera, Inc. All rights reserved.
MAGIC!
© 2023 Cloudera, Inc. All rights reserved. 15
Greg Phillips - GETTING THE BEST RESULTS FROM LLMs
Prompt Engineering ($)
getting better responses through better questions & context
Retrieval Augmented Generation ($$)
adding search results as context to the prompt
Parameter-Efficient Fine Tuning ($$$)
training a smaller “adapter” to combine with a LLM
Re-Training a Model ($$$$$)
creating a new model from a base LLM & enterprise-specific data
Or use a combination of multiple approaches…
© 2023 Cloudera, Inc. All rights reserved.
Run collection and streaming on any cloud, server, container, bare metal, device or VM
Data Sources Cloudera Data
Flow
Cloudera Data
Platform
Kafka
Lake House
INGEST
© 2023 Cloudera, Inc. All rights reserved.
ENRICH
© 2023 Cloudera, Inc. All rights reserved.
FUNNEL
https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis
https://github.com/tspannhw/FLaNK-LLM
watsonx.ai
© 2023 Cloudera, Inc. All rights reserved.
DISTRIBUTE
DEPLOY
https://github.com/tspannhw/FLaNK-Edge-Models
© 2023 Cloudera, Inc. All rights reserved.
STORE
© 2023 Cloudera, Inc. All rights reserved.
Generative AI
https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis
https://github.com/tspannhw/FLaNK-LLM
watsonx.ai
© 2023 Cloudera, Inc. All rights reserved. 22
© 2023 Cloudera, Inc. All rights reserved.
NEW APPLIED ML
PROTOTYPE
RELEASE
LLM Chatbot Augmented with
Enterprise Data
• Hosts an LLM from
HuggingFace
• Creates and populates Milvus
vector database with Cloudera
Docs
• Demonstration of Retrieval
Augmented Generation (RAG)
Architecture
• Deploys ChatBot-like web
application to interact with base
and RAG LLMs
https://github.com/cloudera/CML_AMP_LLM_Chatbot_Augmented_with_Enterprise_Data
© 2023 Cloudera, Inc. All rights reserved. 23
IN THE CURRENT CML PRODUCT, YOU CAN …
• Host and serve an open source LLM
• Create and host enterprise ready
applications as front ends to these LLMs
• Instantiate a vector database to do
semantic search on your enterprise
knowledge base
• Provide enterprise specific context to an
LLM to generate factual responses
All this without making any external calls to openAPI or any other SAAS AI service
© 2023 Cloudera, Inc. All rights reserved.
DEMO AND Q&A
© 2023 Cloudera, Inc. All rights reserved.
© 2023 Cloudera, Inc. All rights reserved.
© 2023 Cloudera, Inc. All rights reserved.
FREE LEARNING ENVIRONMENT
© 2023 Cloudera, Inc. All rights reserved. 28
Cloudera Streams
Processing -
Community Edition
• Kafka, KConnect, SMM, SR,
Flink, and SSB in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker compose file of CSP to run from command line w/o any
dependencies, including Flink, SQL Stream Builder, Kafka, Kafka
Connect, Streams Messaging Manager and Schema Registry
○ $> docker compose up
● Licensed under the Cloudera Community License
● Unsupported
● Community Group Hub for CSP
● Find it on docs.cloudera.com under Applications
© 2023 Cloudera, Inc. All rights reserved.
Open Source Edition
• Apache NiFi in Docker
• Runs in Docker
• Try new features
quickly
• Develop applications
locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d -e
SINGLE_USER_CREDENTIALS_USERNAME=admin -e
SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUgh
vvgEvjnaLjFEB apache/nifi:latest
● Licensed under the ASF License
● Unsupported
https://hub.docker.com/r/apache/nifi
© 2023 Cloudera, Inc. All rights reserved.
RESOURCES AND WRAP-UP
© 2023 Cloudera, Inc. All rights reserved. 31
Apache NiFi in a few numbers
A very active project with a dynamic community & comparison with ACEU 2019
2800+ members on the Slack channel (535+ - 4 years ago)
475+ contributors on Github across the repositories (260+ - 4
years ago)
65 committers in the Apache NiFi community (45 - 4 years ago)
Apache NiFi 1.23.2 is the latest release, NiFi 2.0 coming
soon (NiFi 1.10 - 4 years ago)
14M+ docker pulls of the Apache NiFi image (1M+ - 4 years
ago)
© 2023 Cloudera, Inc. All rights reserved.
https://medium.com/@tspann/cdc-not-cat-data-capture-e43713879c03
© 2023 Cloudera, Inc. All rights reserved.
https://www.youtube.com/watch?v=Y1JeOrJIoKI
https://www.youtube.com/watch?v=he7bcQDbAIQ&t=6s
https://www.youtube.com/watch?v=RPz7Xm4fLF4
© 2023 Cloudera, Inc. All rights reserved. 34
Streaming Resources
• https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an
d-streamnative
• https://flipstackweekly.com/
• https://www.datainmotion.dev/
• https://www.flankstack.dev/
• https://github.com/tspannhw
• https://medium.com/@tspann
• https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739
5d714
• https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str
eaming_Engineer.pdf
© 2023 Cloudera, Inc. All rights reserved. 35
Java - LLM Resources
• https://github.com/TheoKanning/openai-java
• https://github.com/Knowly-ai/langtorch
• https://onnxruntime.ai/docs/get-started/with-java.html
• https://github.com/tjake/Jlama
• https://github.com/HamaWhiteGG/langchain-java
• https://nightlies.apache.org/flink/flink-ml-docs-master/docs/operators/fe
ature/sqltransformer/
• https://research.ibm.com/blog/retrieval-augmented-generation-RAG
© 2023 Cloudera, Inc. All rights reserved. 36
TH N Y U

More Related Content

What's hot

Meet up roadmap cloudera 2020 - janeiro
Meet up   roadmap cloudera 2020 - janeiroMeet up   roadmap cloudera 2020 - janeiro
Meet up roadmap cloudera 2020 - janeiroThiago Santiago
 
Monitoring, Logging and Tracing on Kubernetes
Monitoring, Logging and Tracing on KubernetesMonitoring, Logging and Tracing on Kubernetes
Monitoring, Logging and Tracing on KubernetesMartin Etmajer
 
A Deepdive into Azure Networking
A Deepdive into Azure NetworkingA Deepdive into Azure Networking
A Deepdive into Azure NetworkingKarim Vaes
 
Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with BackstageOpsta
 
Azure key vault
Azure key vaultAzure key vault
Azure key vaultRahul Nath
 
Red Hat OpenShift Container Storage
Red Hat OpenShift Container StorageRed Hat OpenShift Container Storage
Red Hat OpenShift Container StorageTakuya Utsunomiya
 
Deploying Azure DevOps using Terraform
Deploying Azure DevOps using TerraformDeploying Azure DevOps using Terraform
Deploying Azure DevOps using TerraformAdin Ermie
 
Cloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-PremiseCloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-PremiseAraf Karsh Hamid
 
Openshift Container Platform
Openshift Container PlatformOpenshift Container Platform
Openshift Container PlatformDLT Solutions
 
Creating AWS infrastructure using Terraform
Creating AWS infrastructure using TerraformCreating AWS infrastructure using Terraform
Creating AWS infrastructure using TerraformKnoldus Inc.
 
Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouseAltinity Ltd
 
Microsoft Azure Traffic Manager
Microsoft Azure Traffic ManagerMicrosoft Azure Traffic Manager
Microsoft Azure Traffic ManagerIdo Katz
 
The Complete Guide to Service Mesh
The Complete Guide to Service MeshThe Complete Guide to Service Mesh
The Complete Guide to Service MeshAspen Mesh
 
Azure Migrate
Azure MigrateAzure Migrate
Azure MigrateMustafa
 

What's hot (20)

Meet up roadmap cloudera 2020 - janeiro
Meet up   roadmap cloudera 2020 - janeiroMeet up   roadmap cloudera 2020 - janeiro
Meet up roadmap cloudera 2020 - janeiro
 
Introduce to Terraform
Introduce to TerraformIntroduce to Terraform
Introduce to Terraform
 
Monitoring, Logging and Tracing on Kubernetes
Monitoring, Logging and Tracing on KubernetesMonitoring, Logging and Tracing on Kubernetes
Monitoring, Logging and Tracing on Kubernetes
 
A Deepdive into Azure Networking
A Deepdive into Azure NetworkingA Deepdive into Azure Networking
A Deepdive into Azure Networking
 
Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with Backstage
 
Azure key vault
Azure key vaultAzure key vault
Azure key vault
 
Terraform
TerraformTerraform
Terraform
 
Red Hat OpenShift Container Storage
Red Hat OpenShift Container StorageRed Hat OpenShift Container Storage
Red Hat OpenShift Container Storage
 
(ARC307) Infrastructure as Code
(ARC307) Infrastructure as Code(ARC307) Infrastructure as Code
(ARC307) Infrastructure as Code
 
Deploying Azure DevOps using Terraform
Deploying Azure DevOps using TerraformDeploying Azure DevOps using Terraform
Deploying Azure DevOps using Terraform
 
Cloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-PremiseCloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-Premise
 
Openshift Container Platform
Openshift Container PlatformOpenshift Container Platform
Openshift Container Platform
 
Intro to Terraform
Intro to TerraformIntro to Terraform
Intro to Terraform
 
Creating AWS infrastructure using Terraform
Creating AWS infrastructure using TerraformCreating AWS infrastructure using Terraform
Creating AWS infrastructure using Terraform
 
Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouse
 
Microsoft Azure Traffic Manager
Microsoft Azure Traffic ManagerMicrosoft Azure Traffic Manager
Microsoft Azure Traffic Manager
 
The Complete Guide to Service Mesh
The Complete Guide to Service MeshThe Complete Guide to Service Mesh
The Complete Guide to Service Mesh
 
Azure Migrate
Azure MigrateAzure Migrate
Azure Migrate
 
Terraform Basics
Terraform BasicsTerraform Basics
Terraform Basics
 
Terraform on Azure
Terraform on AzureTerraform on Azure
Terraform on Azure
 

Similar to 26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup

AIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AIAIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AITimothy Spann
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19ssuser73434e
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Timothy Spann
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Timothy Spann
 
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfTimothy Spann
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023Timothy Spann
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataTimothy Spann
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline DevelopmentTimothy Spann
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table NotesTimothy Spann
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsTimothy Spann
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionTimothy Spann
 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsTimothy Spann
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsTimothy Spann
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Timothy Spann
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentTimothy Spann
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023ssuser73434e
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-PipelinesTimothy Spann
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 

Similar to 26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup (20)

AIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AIAIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AI
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python Processors
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming Apps
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline Development
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 

More from Timothy Spann

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...Timothy Spann
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-ProfitsTimothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...Timothy Spann
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101Timothy Spann
 
CoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
 
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationTimothy Spann
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
 

More from Timothy Spann (18)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
 
CoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
 
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
 

Recently uploaded

DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 

Recently uploaded (20)

DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 

26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup

  • 1. © 2023 Cloudera, Inc. All rights reserved. Adding Generative AI to Real-Time Streaming Pipelines Tim Spann Principal Developer Advocate 26-October-2023
  • 2. © 2023 Cloudera, Inc. All rights reserved. 2
  • 3. © 2023 Cloudera, Inc. All rights reserved. 3
  • 4. © 2023 Cloudera, Inc. All rights reserved. 4 https://events.dzone.com/dzone/Data-Pipelines-Investigating-the-Modern-Day-Stack
  • 5.
  • 6. © 2023 Cloudera, Inc. All rights reserved. 6 Tim Spann @PaasDev www.datainmotion.dev github.com/tspannhw medium.com/@tspann Principal Developer Advocate Princeton/NYC/Philly Future of Data Meetup ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC, ex-EY, ex-HPE. Apache NiFi x Apache Kafka x Apache Flink x AI
  • 7. Future of Data - NYC + NJ + Philly + Virtual @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  • 8. FLaNK Stack Weekly This week in Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Python, Java, AI, ML, LLM and Open Source friends. https://bit.ly/32dAJft
  • 9. © 2023 Cloudera, Inc. All rights reserved. 9 REAL-TIME LLM REQUIRES A PLATFORM SQL Stream Builder
  • 10. LLM USE CASE Vector DB AI Model Unstructured file types Data in Motion on Cloudera Data Platform (CDP) Capture, process & distribute any data, anywhere Other enterprise data Open Data Lakehouse Materialized Views Structured Sources Applications/API’s Streams meta-llama/llama-2-70b-chat
  • 11. © 2023 Cloudera, Inc. All rights reserved. © 2019 Cloudera, Inc. All rights reserved. 11 Cloudera + LLMs Knowledge Repository Data Storage / Management Data Preparation Data Engineering LLM Fine Tuning Process Training Framework LLM Serving Serving Framework Key: CPU Task GPU Task CML CDE CDP Vector DB CDF Streaming Classification Real-Time Model Deployment
  • 12. © 2023 Cloudera, Inc. All rights reserved. © 2019 Cloudera, Inc. All rights reserved. 12 Streaming LLM Going Forward RAG (retrieval augmented) FLARE pattern https://github.com/mit-han-lab/streaming-llm Current Research Kafka Integration with Vector Stores NiFi Integration with HuggingFace Transformers
  • 13. © 2023 Cloudera, Inc. All rights reserved. © 2019 Cloudera, Inc. All rights reserved. 13 Streaming LLM With Java (NiFi, Kafka,Flink, Spark) https://github.com/deepjavalibrary/djl-demo/tree/master/huggingface/nlp/src/main/java/com/examples https://pub.towardsai.net/deploy-huggingface-nlp-models-in-java-with-deep-java-library-e36c635b2053 https://milvus.io/docs/create_collection.md https://github.com/langchain4j/langchain4j https://github.com/mariofusco/droolsGPT
  • 14. © 2023 Cloudera, Inc. All rights reserved. MAGIC!
  • 15. © 2023 Cloudera, Inc. All rights reserved. 15 Greg Phillips - GETTING THE BEST RESULTS FROM LLMs Prompt Engineering ($) getting better responses through better questions & context Retrieval Augmented Generation ($$) adding search results as context to the prompt Parameter-Efficient Fine Tuning ($$$) training a smaller “adapter” to combine with a LLM Re-Training a Model ($$$$$) creating a new model from a base LLM & enterprise-specific data Or use a combination of multiple approaches…
  • 16. © 2023 Cloudera, Inc. All rights reserved. Run collection and streaming on any cloud, server, container, bare metal, device or VM Data Sources Cloudera Data Flow Cloudera Data Platform Kafka Lake House INGEST
  • 17. © 2023 Cloudera, Inc. All rights reserved. ENRICH
  • 18. © 2023 Cloudera, Inc. All rights reserved. FUNNEL https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis https://github.com/tspannhw/FLaNK-LLM watsonx.ai
  • 19. © 2023 Cloudera, Inc. All rights reserved. DISTRIBUTE DEPLOY https://github.com/tspannhw/FLaNK-Edge-Models
  • 20. © 2023 Cloudera, Inc. All rights reserved. STORE
  • 21. © 2023 Cloudera, Inc. All rights reserved. Generative AI https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis https://github.com/tspannhw/FLaNK-LLM watsonx.ai
  • 22. © 2023 Cloudera, Inc. All rights reserved. 22 © 2023 Cloudera, Inc. All rights reserved. NEW APPLIED ML PROTOTYPE RELEASE LLM Chatbot Augmented with Enterprise Data • Hosts an LLM from HuggingFace • Creates and populates Milvus vector database with Cloudera Docs • Demonstration of Retrieval Augmented Generation (RAG) Architecture • Deploys ChatBot-like web application to interact with base and RAG LLMs https://github.com/cloudera/CML_AMP_LLM_Chatbot_Augmented_with_Enterprise_Data
  • 23. © 2023 Cloudera, Inc. All rights reserved. 23 IN THE CURRENT CML PRODUCT, YOU CAN … • Host and serve an open source LLM • Create and host enterprise ready applications as front ends to these LLMs • Instantiate a vector database to do semantic search on your enterprise knowledge base • Provide enterprise specific context to an LLM to generate factual responses All this without making any external calls to openAPI or any other SAAS AI service
  • 24. © 2023 Cloudera, Inc. All rights reserved. DEMO AND Q&A
  • 25. © 2023 Cloudera, Inc. All rights reserved.
  • 26. © 2023 Cloudera, Inc. All rights reserved.
  • 27. © 2023 Cloudera, Inc. All rights reserved. FREE LEARNING ENVIRONMENT
  • 28. © 2023 Cloudera, Inc. All rights reserved. 28 Cloudera Streams Processing - Community Edition • Kafka, KConnect, SMM, SR, Flink, and SSB in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker compose file of CSP to run from command line w/o any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry ○ $> docker compose up ● Licensed under the Cloudera Community License ● Unsupported ● Community Group Hub for CSP ● Find it on docs.cloudera.com under Applications
  • 29. © 2023 Cloudera, Inc. All rights reserved. Open Source Edition • Apache NiFi in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker NiFi ○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUgh vvgEvjnaLjFEB apache/nifi:latest ● Licensed under the ASF License ● Unsupported https://hub.docker.com/r/apache/nifi
  • 30. © 2023 Cloudera, Inc. All rights reserved. RESOURCES AND WRAP-UP
  • 31. © 2023 Cloudera, Inc. All rights reserved. 31 Apache NiFi in a few numbers A very active project with a dynamic community & comparison with ACEU 2019 2800+ members on the Slack channel (535+ - 4 years ago) 475+ contributors on Github across the repositories (260+ - 4 years ago) 65 committers in the Apache NiFi community (45 - 4 years ago) Apache NiFi 1.23.2 is the latest release, NiFi 2.0 coming soon (NiFi 1.10 - 4 years ago) 14M+ docker pulls of the Apache NiFi image (1M+ - 4 years ago)
  • 32. © 2023 Cloudera, Inc. All rights reserved. https://medium.com/@tspann/cdc-not-cat-data-capture-e43713879c03
  • 33. © 2023 Cloudera, Inc. All rights reserved. https://www.youtube.com/watch?v=Y1JeOrJIoKI https://www.youtube.com/watch?v=he7bcQDbAIQ&t=6s https://www.youtube.com/watch?v=RPz7Xm4fLF4
  • 34. © 2023 Cloudera, Inc. All rights reserved. 34 Streaming Resources • https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an d-streamnative • https://flipstackweekly.com/ • https://www.datainmotion.dev/ • https://www.flankstack.dev/ • https://github.com/tspannhw • https://medium.com/@tspann • https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739 5d714 • https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str eaming_Engineer.pdf
  • 35. © 2023 Cloudera, Inc. All rights reserved. 35 Java - LLM Resources • https://github.com/TheoKanning/openai-java • https://github.com/Knowly-ai/langtorch • https://onnxruntime.ai/docs/get-started/with-java.html • https://github.com/tjake/Jlama • https://github.com/HamaWhiteGG/langchain-java • https://nightlies.apache.org/flink/flink-ml-docs-master/docs/operators/fe ature/sqltransformer/ • https://research.ibm.com/blog/retrieval-augmented-generation-RAG
  • 36. © 2023 Cloudera, Inc. All rights reserved. 36 TH N Y U