Make Application Devs More Productive with Real-time Analytics APIs
1. Making Application Devs More Productive with Real-time Analytics as an API
Nadine Farah (Senior Developer Advocate at Rockset)
nadine@rockset.com
2. Rockset: Built for Real-time Analytics in the Cloud
● Self-service on semi-structured data
● Enable high query performance (Converged Index)
● Eliminate the need to manage clusters
● Turn powerful SQL queries into real-time analytical APIs
3. Agenda
● Overview: building real-time analytics in 3 steps
● Object Relational Mapping (ORM)
● Elasticsearch
● Rockset Query Lambdas
● Building a Developer API Platform for Real-time Analytics
● Demo
5. Build Real-time Analytics in 3 Major Steps
Examples of databases and data stores:
● PostgreSQL, MongoDB, DynamoDB, etc.
● Elasticsearch, PostgreSQL, data warehouses, etc.
8. Construct an Analytical Query with ORMs
ORM (Django):
u2 = Users.objects.select_related('city').get(id=2)  # follows the city foreign key in the same query
SQL (roughly what the ORM runs):
SELECT u.id, u.title, y.id, y.city_name
FROM users AS u
INNER JOIN city AS y ON y.id = u.city_id
WHERE u.id = 2
11. ORMs for Analytical Queries: Considerations
● Great for getting started on an app that needs structured & relational data
● ORMs hide how many queries run behind the scenes (e.g., one extra query per related row, the N+1 problem), which can put unexpected load on the database
● ORMs can cache, trading off memory for speed
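To make the "hidden reads" point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of a real ORM. It emulates lazy loading (one query for the users, then one query per user for its city, the N+1 pattern) versus a single JOIN, which is the idea behind select_related, and counts the statements actually sent with Connection.set_trace_callback. The schema and rows are hypothetical.

```python
import sqlite3

# Hypothetical schema and data mirroring the slide's users/city example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city  (id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, title TEXT,
                        city_id INTEGER REFERENCES city(id));
    INSERT INTO city  VALUES (1, 'Paris'), (2, 'Austin');
    INSERT INTO users VALUES (1, 'Eng', 1), (2, 'PM', 2), (3, 'Eng', 2);
""")

# Record every SQL statement the connection runs from here on.
statements = []
conn.set_trace_callback(statements.append)

# Lazy loading: one query for the users, then one more query per user for its city.
users = conn.execute("SELECT id, title, city_id FROM users ORDER BY id").fetchall()
lazy = [
    (title, conn.execute("SELECT city_name FROM city WHERE id = ?", (city_id,)).fetchone()[0])
    for _user_id, title, city_id in users
]
lazy_queries = len(statements)

# Eager loading (the idea behind select_related): a single JOIN.
statements.clear()
eager = conn.execute(
    "SELECT u.title, c.city_name FROM users AS u "
    "INNER JOIN city AS c ON c.id = u.city_id ORDER BY u.id"
).fetchall()
eager_queries = len(statements)

print(f"lazy: {lazy_queries} queries, eager: {eager_queries} query")
```

Both approaches return the same rows; the lazy version simply issues one query per user on top of the initial one, and that count grows with the number of rows.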
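12. Using Raw SQL to Write Analytical Queries
SQL (via a Python DB-API cursor):
# product_id holds the requested ID; the driver binds it as a parameter (%s placeholder),
# so the value is never spliced into the SQL text.
# Filter on products.id: filtering on reviews.product_id would drop products with no reviews.
cursor.execute("""
    SELECT products.*, purchases.number_purchases,
           reviews.average_rating
    FROM commons.products
    LEFT JOIN (
        SELECT product_id, COUNT(*) AS number_purchases
        FROM commons.purchases
        GROUP BY 1
    ) purchases ON products.id = purchases.product_id
    LEFT JOIN (
        SELECT product_id,
               AVG(CAST(rating AS int)) AS average_rating
        FROM commons.reviews
        GROUP BY 1
    ) reviews ON products.id = reviews.product_id
    WHERE products.id = %s
""", (product_id,))
● Be vigilant about not using string interpolation to build queries: pass values as bound parameters, as above, so user input cannot inject SQL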
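Why string interpolation is dangerous can be shown in a few lines. This sketch uses Python's built-in sqlite3 (whose placeholder is `?`, where psycopg2 uses `%s`) with a hypothetical products table and a hypothetical malicious input. The interpolated query lets the input rewrite the WHERE clause; the parameterized one treats it as a plain value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [("p1", "Lamp"), ("p2", "Desk")])

# Hypothetical untrusted input, crafted to break out of the quoted literal.
product_id = "x' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so it changes the query itself.
unsafe_sql = "SELECT name FROM products WHERE id = '%s'" % product_id
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is bound as a parameter and compared literally against id.
safe_rows = conn.execute("SELECT name FROM products WHERE id = ?", (product_id,)).fetchall()

print("interpolated:", unsafe_rows)   # every product leaks
print("parameterized:", safe_rows)    # no product has that literal id
```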
16. Joining Data in Elasticsearch vs. SQL
SQL:
SELECT products.*,
       purchases.number_purchases,
       reviews.average_rating
FROM commons.products
LEFT JOIN (
    SELECT product_id, COUNT(*) AS number_purchases
    FROM commons.purchases
    GROUP BY 1
) purchases ON products.id = purchases.product_id
LEFT JOIN (
    SELECT product_id, AVG(CAST(rating AS int)) AS average_rating
    FROM commons.reviews
    GROUP BY 1
) reviews ON products.id = reviews.product_id
WHERE <whereClause>  -- filter supplied by the application
Elasticsearch:
(see next slide)
18. Application-side Joins in Elasticsearch: Considerations
● Great for text search and log search on semi-structured or less-structured data
● Implementing joins in your application code adds implementation complexity
● You can denormalize data to combine data from different models, but at a cost: duplicated data takes more storage, and every copy must be updated when the source changes
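The Elasticsearch version of the join is not reproduced in this deck. As a hedged sketch of what an application-side join involves, the Python below assumes the three result sets have already been fetched (for example, one search per index) into lists of dicts; the documents are hypothetical. It rebuilds the SQL query's two GROUP BYs as dictionaries keyed by product_id and then performs the LEFT JOINs by lookup.

```python
from collections import defaultdict

# Hypothetical documents standing in for hits from three separate searches.
products = [{"id": "p1", "name": "Lamp"}, {"id": "p2", "name": "Desk"}]
purchases = [{"product_id": "p1"}, {"product_id": "p1"}, {"product_id": "p2"}]
reviews = [{"product_id": "p1", "rating": "4"}, {"product_id": "p1", "rating": "5"}]


def app_side_join(products, purchases, reviews):
    """Emulate the SQL query's GROUP BYs and LEFT JOINs in application code."""
    # Equivalent of: SELECT product_id, COUNT(*) ... GROUP BY 1
    purchase_counts = defaultdict(int)
    for doc in purchases:
        purchase_counts[doc["product_id"]] += 1

    # Equivalent of: SELECT product_id, AVG(CAST(rating AS int)) ... GROUP BY 1
    rating_totals = defaultdict(lambda: [0, 0])  # product_id -> [sum, count]
    for doc in reviews:
        totals = rating_totals[doc["product_id"]]
        totals[0] += int(doc["rating"])
        totals[1] += 1

    joined = []
    for product in products:
        pid = product["id"]
        # LEFT JOIN semantics: a missing match yields None, mirroring SQL NULL.
        # .get() does not insert defaults into a defaultdict.
        totals = rating_totals.get(pid)
        joined.append({
            **product,
            "number_purchases": purchase_counts.get(pid),
            "average_rating": totals[0] / totals[1] if totals else None,
        })
    return joined


for row in app_side_join(products, purchases, reviews):
    print(row)
```

Every aggregation and join the database would otherwise perform now lives in application code, and this sketch still ignores pagination and result-size limits on the fetched hits.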
20. Build Real-time Analytics with Rockset
$ curl --request POST \
  --url https://api.rs2.usw2.rockset.com/v1/orgs/self/ws/commons/lambdas/MyReco/versions/832f29efdad4e57b/...
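The same call can be made from application code. The curl command above is truncated, so this is only a sketch under stated assumptions: it assumes the version URL itself is the execute endpoint, that the API key goes in an `Authorization: ApiKey <key>` header, and that parameters are sent as `{"name", "type", "value"}` objects. The host, workspace, lambda name, and version come from the slide; the API key and the `productId` parameter are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

# Host taken from the slide's curl command.
API_SERVER = "https://api.rs2.usw2.rockset.com"


def build_lambda_request(workspace, lambda_name, version, api_key, params):
    """Build (but do not send) a POST that executes one Query Lambda version.

    Assumptions: the version URL is the execute endpoint, the key uses the
    "ApiKey" auth scheme, and every parameter is passed as a string.
    """
    url = f"{API_SERVER}/v1/orgs/self/ws/{workspace}/lambdas/{lambda_name}/versions/{version}"
    body = {"parameters": [{"name": n, "type": "string", "value": v} for n, v in params.items()]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"ApiKey {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical key and parameter, for illustration only.
req = build_lambda_request("commons", "MyReco", "832f29efdad4e57b",
                           "HYPOTHETICAL_KEY", {"productId": "p1"})
print(req.get_method(), req.full_url)

# Sending it would need network access and a real key, e.g.:
# with urllib.request.urlopen(req) as resp:
#     payload = json.load(resp)
```

Because the SQL lives in the versioned Query Lambda, the application only knows a URL and a few named parameters.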
21. Real-time Analytical Queries in Rockset
SQL in Rockset: TLDR, no change! The same query written for a traditional SQL database runs as-is:
SELECT products.*,
       purchases.number_purchases,
       reviews.average_rating
FROM commons.products
LEFT JOIN (
    SELECT product_id, COUNT(*) AS number_purchases
    FROM commons.purchases
    GROUP BY 1
) purchases ON products.id = purchases.product_id
LEFT JOIN (
    SELECT product_id, AVG(CAST(rating AS int)) AS average_rating
    FROM commons.reviews
    GROUP BY 1
) reviews ON products.id = reviews.product_id
WHERE <whereClause>
23. Real-time Analytics with Query Lambdas: Considerations
● Used for search and analytics on semi-structured & structured
data
● Rockset is cost-efficient for tens of terabytes of data
● Rockset is optimized for cloud-only analytics
24. Building a Developer API Platform for Real-time Analytics
● Teams can collaborate on and version Query Lambdas
● Millisecond query performance out of the box
● Less server-side code to create and maintain real-time analytical APIs