Re-architecting the Government with Lua and Hadoop

Hear the tale of how we revamped an open source API management platform (API Umbrella) to meet the growing needs of the U.S. federal government. Over the past year, we've transitioned from a stack running on Node.js and Elasticsearch to a more efficient stack built on Lua/OpenResty and Hadoop/Apache Kylin. We'll dive into why we undertook this migration, and why you may or may not want to consider these technologies in other cases. We'll detail how we rolled out these significant architecture changes without anybody noticing, and also touch on the U.S. government's increasing adoption of open source.

Re-architecting
the Government
with Lua and Hadoop
Nick Muerdter @nickblah
National Renewable Energy Laboratory
Gluecon
2016-05-26
★ ★ ★
★ ★ ★

3. Chapter 1 ★ ★ ★ Origins

5. 2010: APIs!

6. API Management

7. 2013: 💬 with other agencies

8. Open Sourcing API Umbrella

9. api.data.gov

10. Chapter 2 ★ ★ ★ A Journey from NodeJS to Lua

11. Custom sauce (was NodeJS)

12. [Diagram of the original stack. Components shown: Internet, two nginx layers, Redis, three NodeJS processes, and API backends …]

13. Why Change?
•  Operational simplicity
•  Efficiency & speed
•  Benefits of hindsight

14. OpenResty (nginx + LuaJIT)

15. Fun & useful things about OpenResty

16. 1. Async/event-driven (but without callbacks/promises)

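17. NodeJS Callbacks
var pg = require('pg');

// node-postgres before v7: pg.connect() checks a client out of the pool.
// Error handling is omitted; note how each sequential query nests deeper.
pg.connect({ database: 'mydb' }, function(err, client, done) {
  client.query("INSERT INTO people(name) VALUES('Jane')", function(err, res) {
    client.query('SELECT name FROM people', function(err, res) {
      done(); // release the client back to the pool
    });
  });
});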

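18. Lua
local pgmoon = require("pgmoon")

local pg = pgmoon.new({ database = "mydb" })
assert(pg:connect())

-- Inside OpenResty, pgmoon talks over cosockets: each query yields to
-- nginx's event loop while it waits, so no callbacks are needed.
local res, err = pg:query("INSERT INTO people(name) VALUES('Jane')")
res, err = pg:query("SELECT name FROM people")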
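The Lua version reads top to bottom because OpenResty runs request code inside Lua coroutines. When a cosocket-based library such as pgmoon waits on the network, the coroutine yields to nginx's event loop and resumes once data arrives. JavaScript generators can express a similar pattern, and the minimal sketch below is offered only as a rough analogy. The names `run`, `insertAndSelect`, and `fakeDb` are hypothetical, and the in-memory "database" is a stand-in rather than node-postgres or pgmoon.

```javascript
// JavaScript analogy (hypothetical, not OpenResty's implementation) of
// coroutine-style I/O: the handler reads sequentially, and each `yield`
// hands an I/O operation to a scheduler that later resumes the handler.

// Runs a generator-based handler. `perform(op, resume)` carries out one
// yielded operation and calls `resume(result)` when it completes, either
// right away or later from an event-loop callback.
function run(handler, perform, onDone) {
  const gen = handler();
  function step(input) {
    const { value, done } = gen.next(input);
    if (done) {
      onDone(value); // the handler's return value
      return;
    }
    perform(value, step); // resume the handler once the operation finishes
  }
  step(undefined);
}

// Hypothetical handler mirroring the pgmoon example: no nested callbacks.
function* insertAndSelect() {
  yield { sql: "INSERT INTO people(name) VALUES('Jane')" };
  const rows = yield { sql: "SELECT name FROM people" };
  return rows;
}

// Hypothetical in-memory stand-in for a database driver. It completes each
// operation on a later tick, the way real network I/O would.
const people = [];
function fakeDb(op, resume) {
  setTimeout(() => {
    if (op.sql.startsWith("INSERT")) {
      people.push({ name: "Jane" }); // toy: ignores the SQL's actual values
      resume(null);
    } else {
      resume(people.slice());
    }
  }, 0);
}

run(insertAndSelect, fakeDb, (rows) => console.log(rows));
```

In this design the scheduler handles all the waiting, so the handler never has to. That separation is what lets OpenResty code avoid callbacks. In OpenResty, however, the resuming is done by nginx's event loop when a socket becomes ready, not by timers.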

19. 2. Shared memory dictionaries

20. 3. The nginx we know and love

21. 4. It’s fast!

22. How to pull this off?

23. API Compatibility

24. 💖 Integration Tests

25. [Diagram of the original stack. Components shown: Internet, two nginx layers, Redis, three NodeJS processes, and API backends …]

26. [Diagram of the new stack. Components shown: Internet, a single nginx + Lua layer, and API backends …]

27. Proxy Overhead
•  NodeJS: ~10ms
•  OpenResty: ~1ms
•  50% less CPU use
*Benchmarking caveats…

28. It may not all be 🍑s

29. Chapter 3 ★ ★ ★ Growing with Hadoop, Kylin, and SQL

30. Log Data & Analytics

31.
request_at: 2016-05-25T20:02:53.752Z
request_method: GET
request_scheme: https
request_host: developer.nrel.gov
request_path: /alt-fuel-stations/v1.json
request_query: fuel_type=ELEC&limit=10
user_id: ad2d94b6-e0f8-4e26-b1a6-1bc6b12f3d76
request_user_agent: curl/7.33.0
request_ip_country: US
request_ip_region: CO
request_ip_city: Golden
response_status: 200
response_time: 40
…
…

32. Elasticsearch

33. Apache Kylin

34. Data Cubes
•  Pick the dimensions (columns) you want to aggregate on.
•  Define measurements (counts, sums, etc.).
•  For every possible combination of dimensions, measurements are pre-aggregated.
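The cube-building rule on this slide is to pre-aggregate the measurements for every combination of dimensions, and it can be sketched in a few lines. This is a toy in-memory illustration, not how Kylin builds cubes (Kylin runs cube builds as Hadoop jobs). The names `subsets` and `buildCube` and the sample rows are hypothetical, and COUNT is assumed to be the only measurement.

```javascript
// Toy data cube: for every subset ("cuboid") of the chosen dimensions,
// pre-aggregate a COUNT(*) measure over the raw rows.

// Returns every subset of `dims`, from [] up to all of them, keeping order.
function subsets(dims) {
  const result = [[]];
  for (const d of dims) {
    // Extend each subset found so far with `d`.
    const withD = result.map((s) => [...s, d]);
    result.push(...withD);
  }
  return result;
}

// Builds Map<cuboidName, Map<key, count>>. Keys are JSON-encoded tuples of
// dimension values, so values containing commas stay unambiguous.
function buildCube(rows, dims) {
  const cube = new Map();
  for (const cuboid of subsets(dims)) {
    const counts = new Map();
    for (const row of rows) {
      const key = JSON.stringify(cuboid.map((d) => row[d]));
      counts.set(key, (counts.get(key) || 0) + 1);
    }
    cube.set(cuboid.join(","), counts); // "" names the grand-total cuboid
  }
  return cube;
}

// Hypothetical sample rows using the slides' simplified column names.
const logs = [
  { host: "developer.nrel.gov", path: "/alt-fuel" },
  { host: "developer.nrel.gov", path: "/solar" },
  { host: "api.nasa.gov", path: "/photos" },
  { host: "api.nasa.gov", path: "/photos" },
];

const cube = buildCube(logs, ["host", "path"]);
console.log(cube.get("host").get(JSON.stringify(["api.nasa.gov"]))); // → 2
```

With n dimensions there are 2^n cuboids: 4 here, and 8 for three dimensions. The cube therefore grows quickly as dimensions are added, and cube engines generally give you ways to limit which combinations are built.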

35. Dimensions: host
Measurements: count
Raw Rows: 150

host                 count
developer.nrel.gov   50
api.nasa.gov         100

36.
SELECT host, COUNT(*)
FROM logs
GROUP BY host;
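The payoff shows up in query cost. Assuming the 150 raw rows from the slide, the hypothetical sketch below answers the same GROUP BY twice. The first pass scans every raw row, and the second reads the host cuboid, which holds only two rows. The names `rawRows`, `scanCountByHost`, and `cubeCountByHost` are illustrative.

```javascript
// Contrast for `SELECT host, COUNT(*) FROM logs GROUP BY host`: scanning
// raw rows vs. reading a pre-aggregated host cuboid.

// Hypothetical raw rows matching the slide's totals (50 + 100 = 150).
const rawRows = [
  ...Array.from({ length: 50 }, () => ({ host: "developer.nrel.gov" })),
  ...Array.from({ length: 100 }, () => ({ host: "api.nasa.gov" })),
];

// Query-time aggregation: every raw row is read.
function scanCountByHost(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.host, (counts.get(row.host) || 0) + 1);
  }
  return { counts, rowsRead: rows.length };
}

// The host cuboid from the slide, built once ahead of time.
const hostCuboid = new Map([
  ["developer.nrel.gov", 50],
  ["api.nasa.gov", 100],
]);

// Query-time lookup: only the cuboid's stored rows are read.
function cubeCountByHost(cuboid) {
  return { counts: new Map(cuboid), rowsRead: cuboid.size };
}

const scanned = scanCountByHost(rawRows);
const fromCube = cubeCountByHost(hostCuboid);
console.log(scanned.rowsRead, fromCube.rowsRead); // → 150 2
```

Both paths return the same counts. The cube simply moves the aggregation work from query time to build time.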

37. Dimensions: host, path
Measurements: count
Raw Rows: 150

host                 path         count
developer.nrel.gov   /alt-fuel    30
developer.nrel.gov   /solar       20
api.nasa.gov         /photos      40
api.nasa.gov         /asteroids   50
api.nasa.gov         /mars        10

38.
SELECT path, COUNT(*)
FROM logs
WHERE host = 'developer.nrel.gov'
GROUP BY path;
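COUNT is additive, so the host,path cuboid on slide 37 can answer this filtered query without touching raw logs. Summing that cuboid over path also reproduces the host-only cuboid on slide 35 (50 and 100). The sketch below uses the slide's numbers; the function names are hypothetical.

```javascript
// The host,path cuboid from the slide, as [host, path, count] rows.
const hostPathCuboid = [
  ["developer.nrel.gov", "/alt-fuel", 30],
  ["developer.nrel.gov", "/solar", 20],
  ["api.nasa.gov", "/photos", 40],
  ["api.nasa.gov", "/asteroids", 50],
  ["api.nasa.gov", "/mars", 10],
];

// SELECT path, COUNT(*) FROM logs WHERE host = <host> GROUP BY path,
// answered by filtering the stored cuboid rows instead of raw logs.
function countByPathForHost(cuboid, host) {
  const counts = new Map();
  for (const [h, path, count] of cuboid) {
    if (h === host) counts.set(path, (counts.get(path) || 0) + count);
  }
  return counts;
}

// Roll-up: summing counts over path yields the host-only cuboid.
function rollUpToHost(cuboid) {
  const counts = new Map();
  for (const [host, , count] of cuboid) {
    counts.set(host, (counts.get(host) || 0) + count);
  }
  return counts;
}

console.log(countByPathForHost(hostPathCuboid, "developer.nrel.gov"));
console.log(rollUpToHost(hostPathCuboid));
```

This roll-up works only for additive measures. A distinct count, such as unique users, cannot be rolled up by adding the per-path values, because the same user may appear under several paths.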

39.
SELECT path, COUNT(*)
FROM logs
WHERE user_agent = 'curl'
GROUP BY path;

40. ANSI SQL is a lovely thing
•  If Kylin can’t answer a query from its pre-aggregations, we can fall back to slower Hadoop queries against the raw dataset.
•  Presto gives an ANSI SQL interface against the raw dataset.
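The fallback rule above reduces to a coverage check. A query can be served from pre-aggregations only if every column it filters or groups on is a cube dimension. The slide 39 query filters on user_agent, which a host/path cube lacks. Below is a hypothetical router that sketches this decision. Kylin matches SQL to cuboids on its own, so this only illustrates the idea. A real implementation would work from parsed SQL and would also check measures; here, queries are hand-written objects and COUNT is assumed to be the only measure.

```javascript
// Hypothetical router: serve a query from the cube only when every column it
// filters or groups on is a cube dimension; otherwise fall back to a slower
// engine over the raw dataset (Presto, per the slide).

const CUBE_DIMENSIONS = new Set(["host", "path"]);

// `query` is a hand-written stand-in for parsed SQL:
// { groupBy: [columns], where: { column: value } }. COUNT(*) is assumed
// to be the only measure, so measures are not checked.
function chooseEngine(query, dimensions = CUBE_DIMENSIONS) {
  const columns = [...query.groupBy, ...Object.keys(query.where)];
  return columns.every((c) => dimensions.has(c)) ? "cube" : "raw";
}

// The two filtered queries from the slides.
const byHost = { groupBy: ["path"], where: { host: "developer.nrel.gov" } };
const byAgent = { groupBy: ["path"], where: { user_agent: "curl" } };

console.log(chooseEngine(byHost)); // → cube
console.log(chooseEngine(byAgent)); // → raw
```

Both engines speak ANSI SQL, so the same query text can go to either one; only its destination changes.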

41. But I could pre-aggregate myself…

42. Query Performance
•  Elasticsearch: 5s, 10s, 60s, 2m…
•  Kylin: ~1s
*Benchmarking caveats…

43. It’s not all 🍑s

44. ★ ★ ★ Epilogue ★ ★ ★

45. Simplifying your stack

46. Scaling efficiently

47. Find the right tools

48. Perhaps some other things for your toolbelt:
•  OpenResty/Lua
•  Kylin
•  API Umbrella

49. Sleeping Material
•  Lua
– https://github.com/NREL/api-umbrella/issues/86
– https://github.com/NREL/api-umbrella/pull/183
•  Kylin
– https://github.com/18F/api.data.gov/issues/235

50. ★ ★ ★ Questions? ★ ★ ★
Nick Muerdter
nick.muerdter@nrel.gov
@nickblah