This document summarizes a serverless Clojure and machine learning application for prototyping. It describes the team building the application, the serverless AWS infrastructure using Terraform, CI/CD pipelines with GitHub Actions, the Clojure applications including a dashboard SPA and event processor, and natural language processing using Haystack and SageMaker batch transforms. The application allows users to upload documents and questionnaires, select questions, and trigger inferences to find answers from the documents using ML models.
4. The Team
Toni: Fullstacker with an ML angle
● The main dev in the project
Clojurians, Koodiklinikka: @tvaisanen
Kimmo: Long-time Clojure enthusiast, likes to dabble in data projects
● The mentor in the project
https://twitter.com/KimmoKoskinen & https://github.com/viesti
6. Serverless Infrastructure
● AWS Organizations
○ AWS Root account and AWS account per client
■ Separate infra for each client
○ Terraform modules for logical parts of the infra
● DynamoDB used as the database; ML runs on demand with SageMaker
○ Serverless infra requires a serverless database
○ GPUs on demand
● API Gateway
○ Frontend uses AWS services directly via API Gateway
● Terraform defining everything
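As a sketch, the per-client layout described above might look like the following Terraform fragment. The module and variable names are invented for illustration, not taken from the project; the DynamoDB on-demand billing mode is what makes the database "serverless":

```hcl
# Hypothetical sketch: one Terraform module per logical part of the infra,
# instantiated in each client's AWS account (names are illustrative).
module "dashboard" {
  source = "./modules/dashboard"
  client = var.client_name
}

module "event_processor" {
  source = "./modules/event-processor"
  client = var.client_name
}

# Serverless database: a DynamoDB table with on-demand capacity.
resource "aws_dynamodb_table" "app" {
  name         = "${var.client_name}-app"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "pk"

  attribute {
    name = "pk"
    type = "S"
  }
}
```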
8. CI / CD
● GitHub Actions
○ Build & Test
○ Deploys the development environment
■ GitHub Actions assumes an AWS IAM Role
● Production deploy
○ Publish build
■ Triggered by new tag push
■ Publish versioned release artifacts to S3
○ Deployment
■ Manually triggered workflow
■ Artifacts are downloaded from S3 and deployed with Terraform
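The tag-triggered publish step could be sketched as a GitHub Actions workflow along these lines. The workflow name, role ARN, bucket, and build commands are placeholders; the `id-token: write` permission is what lets the job assume an AWS IAM role via OIDC:

```yaml
# Illustrative sketch of the publish workflow (all names are placeholders).
name: publish
on:
  push:
    tags: ['v*']
permissions:
  id-token: write   # lets the job assume an AWS IAM role via OIDC
  contents: read
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/deploy-role
          aws-region: eu-west-1
      - run: bb build
      - run: aws s3 cp target/app.jar "s3://release-bucket/app-${GITHUB_REF_NAME}.jar"
```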
9. Clojure Applications
● Dashboard
○ ClojureScript Reagent Single Page Application
● Dashboard Backend
○ Node/ClojureScript Lambda
■ Presigns S3 URLs for upload and download
■ Node.js for faster cold starts
● Event Processor
○ JVM/Clojure Lambda
○ Processes events from services such as:
■ SES, SQS, S3, SageMaker, etc.
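The real Event Processor is a JVM/Clojure Lambda; as a rough illustration of the dispatch idea, here is a toy router in Python keyed on the `eventSource` field that AWS Lambda event records carry. The handler bodies are made up:

```python
# Toy sketch of the Event Processor's dispatch idea (the real Lambda is
# JVM/Clojure and handles SES, SQS, S3 and SageMaker events).

def event_source(record):
    # Lambda event records identify their origin in "eventSource", e.g. "aws:s3".
    return record.get("eventSource", "unknown")

def handle(event):
    """Route each record in a Lambda event to a handler keyed by its source."""
    handlers = {
        "aws:s3": lambda r: f"s3 object: {r['s3']['object']['key']}",
        "aws:sqs": lambda r: f"sqs message: {r['body']}",
    }
    results = []
    for record in event.get("Records", []):
        handler = handlers.get(event_source(record))
        if handler:
            results.append(handler(record))
    return results
```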
11. Clojure Tooling
● ClojureScript
○ Shadow-CLJS
■ Builds the Dashboard SPA and the Lambda that runs on Node.js
● JVM/Clojure
○ deps.edn for project configuration and
○ depstar for building the uberjar
● Babashka
○ Build, test and release tasks
○ bb.edn files stay small; task code is required into them and shared
● Kaocha for testing
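An illustrative bb.edn along these lines, where the file stays small because the task code is required from a shared namespace (all names here are invented):

```clojure
;; Illustrative bb.edn sketch (namespace and task names are made up).
{:paths ["bb"]
 :tasks
 {:requires ([tasks.build :as build])
  build   {:doc "Build the uberjar"  :task (build/uberjar)}
  test    {:doc "Run Kaocha tests"   :task (build/kaocha)}
  release {:doc "Publish artifacts"  :task (build/release)}}}
```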
16. Natural Language Processing
● NLP tasks are based on having source material against which natural language queries are run
○ Natural language texts (policy files)
○ Question-answer pairs (FAQ items)
● The source material collection is called the document store, which keeps the data in an SQLite database
● In addition to the DB, the document store has another component (a FAISS index) that stores the vectorized representations (embeddings) of the text passages
17. Natural Language Processing
“FAISS (Facebook AI Similarity Search) is a library that allows developers to quickly search for
embeddings of multimedia documents that are similar to each other. It solves limitations of
traditional query search engines that are optimized for hash-based searches, and provides more
scalable similarity search functions.”
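A toy sketch of the document-store idea described above, assuming SQLite holds the passages while a vector index holds their embeddings. A plain Python list with cosine similarity stands in for FAISS here, and the embeddings in the test are made up:

```python
import math
import sqlite3

# Toy document store: texts live in SQLite; a vector index (FAISS in the real
# system, a plain list here) holds their embeddings for similarity search.

class DocumentStore:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE passages (id INTEGER PRIMARY KEY, text TEXT)")
        self.index = []  # (passage_id, embedding) pairs; FAISS stand-in

    def add(self, text, embedding):
        cur = self.db.execute("INSERT INTO passages (text) VALUES (?)", (text,))
        self.index.append((cur.lastrowid, embedding))

    def query(self, embedding):
        """Return the text of the passage whose embedding is most similar."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))
        best_id, _ = max(self.index, key=lambda e: cosine(embedding, e[1]))
        row = self.db.execute(
            "SELECT text FROM passages WHERE id = ?", (best_id,)).fetchone()
        return row[0]
```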
22. User Workflow
● User uploads a questionnaire file (Excel)
○ This triggers the Event Processor Lambda
○ The file is transformed to JSON and stored for later use
● A UI tool enables the user to
○ select the question rows and
○ pick the property columns
● The selection is saved and stored for later use
● User can also upload pre-answered questions
○ to be added to the document store where the answers are searched for
● User can trigger inference
○ An event is sent to SQS, which fires a Lambda that triggers a SageMaker Batch Transform Job 🧠
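The row/column selection step can be sketched as a small transformation over the JSON form of the uploaded questionnaire. Field names and shapes here are invented for illustration:

```python
# Toy sketch of the UI selection step: the Excel sheet has already been
# converted to a JSON list of rows; the user picks which rows are questions
# and which columns carry properties.

def apply_selection(rows, question_rows, question_col, property_cols):
    """Build question records from the selected rows and columns."""
    selection = []
    for i in question_rows:
        row = rows[i]
        selection.append({
            "question": row[question_col],
            "properties": {col: row[col] for col in property_cols},
        })
    return selection
```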
24. Inference Workflow
- On startup the Batch Transform Job fetches
- Policy files from S3
- FAQ items from DynamoDB
- Initializes the document store
- Pre-processes the policy files
- Creates the embeddings
- Starts a web server (Flask)
- SageMaker reads the questions from S3
- SageMaker writes the answers to S3
- A Put notification is triggered on each new object
- The Event Processor listens to these events and writes the results to DynamoDB
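The answer-writing step can be sketched as pairing each question with its best-matching passage. In the real job an ML model served by SageMaker does the answering and S3 carries the input and output; here a pluggable similarity function and plain JSON stand in:

```python
import json

# Toy sketch of the batch-transform answer step: look up the best passage for
# each question and emit the answers as JSON. The similarity function is a
# stand-in for the real ML model.

def answer_questions(questions, passages, similarity):
    """Pair each question with its best-matching passage, as a JSON string."""
    answers = []
    for q in questions:
        best = max(passages, key=lambda p: similarity(q, p))
        answers.append({"question": q, "answer": best})
    return json.dumps(answers)
```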
27. Closing Thoughts
● The project continues; the next phase reveals how this actually works :grimacing:
○ But there are other angles, too
● Pros
○ Interesting technology
○ Exploratory coding
○ Full stack: Infra, Backend, Frontend, ML, Design, UX, you name it!
● Cons
○ Complexity creeping in; how to maintain it all…
○ See last pros bullet :D
● Learnings
○ Using tools that fit the job is good
○ ML & serverless are not too difficult with Clojure