Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1OKo5FN.
Danny Yuan discusses how stream processing is used in Uber's real-time system to solve a wide range of problems, including but not limited to real-time aggregation and prediction on geospatial time series, data migration, monitoring and alerting, and extracting patterns from data streams. Yuan also presents the architecture of the stream processing pipeline. Filmed at qconsf.com.
Danny Yuan is a software engineer in Uber. He's currently working on streaming systems for Uber's logistics platform. Prior to joining Uber, he worked on building Netflix's cloud platform. His work includes predictive autoscaling, distributed tracing service, real-time data pipeline that scaled to process hundreds of billions of events every day, and Netflix's low-latency crypto services.
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/uber-stream-processing
3. Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
40. Data Type
• Dimensional Temporal Spatial Data
Dimension Value
state driver_arrived
vehicle type uber X
timestamp 13244323342
lattitude 12.23
longitude 30.00
41. Data Query
• OLAP on single-table temporal-spatial data
SELECT
<agg
functions>,
<dimensions>
FROM
<data_source>
WHERE
<boolean
filter>
GROUP
BY
<dimensions>
HAVING
<boolean
filter>
ORDER
BY
<sorting
criterial>
LIMIT
<n>
DO
<post
aggregation>
103. Complex Event Processing
FROM
driver_canceled#window.time(10
min)
SELECT
clientUUID,
count(clientUUID)
as
cancelCount
GROUP
BY
clientUUID
HAVING
cancelCount
>
10
INSERT
INTO
hipchat(room);