2016.08
From Java Stream to Java DataFrame
Popcorny (陸振恩)
Outline
• Motivation
• The journey from Java Stream to DataFrame
• Introduction to Poppy
• Demo
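Motivation
• TenMax is an advertising platform
• Advertising is all about reports
• We call every event that occurs a raw log
• Every hour, the raw logs are rolled up into Aggregated Data
• When viewing a report, you pick a time range and some dimensions, and read off certain metrics
• This is a common OLAP technique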
[Diagram: Ingest → Raw Log → Batch aggregate → Aggregated Data (Cube) → Interactive Query]
If it were a plain RDBMS
[Diagram: Ingest → RDBMS (RawLog) → Batch aggregate → RDBMS (Cube) → Interactive Query]
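The RDBMS dilemma
• A traditional RDBMS is not well suited to very high-volume log ingestion
• Better-suited options include
– DFS: but it fits append-only workloads best
– Cassandra/HBase: besides insert, they support row-based update, delete, and partition scan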
[Diagram: Ingest → DFS or Cassandra → Batch aggregate → RDBMS → Interactive Query]
But then you have to do the aggregation yourself
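What solutions are there for aggregation?
• Computation engines
– Hadoop MapReduce
– Hive
– Spark SQL
– Impala
• But they all share the following drawbacks
– Originally designed for cluster environments
– Heavyweight
– Too many dependencies (if you want to bundle the driver into your own program)
– Friendly only to HDFS-compatible data sources
– Job startup latency
– Defining your own UDF / UDAF is complicated
– Learning curve
– Operational burden
– …
These are all great solutions for big data,
but what about medium data?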
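Medium data
• Data volume
– 1 GB to 1 TB of new uncompressed data per day
• Assumptions
– One record = 1 KB, so 1 TB of data = 1 billion records
– One CPU core can process 10,000 records per second
– Four cores can process 3.456 billion records per day
• A single machine is actually more than enough
• Besides, cloud machines can scale up; 16 cores is no problem
• I/O and network throughput are gradually ceasing to be the bottleneck
• A single-machine solution removes a lot of overhead
• A single-process solution is also easy to write and debug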
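The capacity estimate on this slide can be checked with a few lines of arithmetic. This is only a sketch of the slide's own assumptions (1 KB per record, 10,000 records per second per core); the class and method names are made up for illustration.

```java
class MediumDataEstimate {
    // Assumption from the slide: one core handles 10,000 records per second.
    static final long RECORDS_PER_SECOND_PER_CORE = 10_000L;
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Records a machine with the given number of cores can process in one day. */
    static long recordsPerDay(int cores) {
        return cores * RECORDS_PER_SECOND_PER_CORE * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long dailyRecords = 1_000_000_000L; // 1 TB per day at 1 KB per record
        System.out.println(recordsPerDay(4));                 // 3456000000
        System.out.println(recordsPerDay(4) >= dailyRecords); // true
    }
}
```

Under these assumptions a four-core machine has roughly 3.5 times the capacity needed for the 1 TB/day upper bound.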
So let's write the aggregation ourselves
Java 8
• Language feature: Lambda
• The three great tools
– Stream
– Optional
– CompletableFuture
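Java Stream
• Functional Reactive Programming (FRP)
• Pipeline style: input passes through one transformation stage after another and finally reaches the output
• Streaming in nature: a very small memory footprint, so it can process very large amounts of data
forEach() map() filter() flatMap() peek()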
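As a rough illustration of the pipeline style, the sketch below chains the five operations named on the slide. The input data and the class name are invented for the example; only the java.util.stream calls are standard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StreamPipelineDemo {
    /** Splits lines into words, keeps words longer than 3 characters, upper-cases them. */
    static List<String> longWords(List<String> lines) {
        List<String> out = new ArrayList<>();
        lines.stream()
             .flatMap(line -> Arrays.stream(line.split(" "))) // one line -> many words
             .filter(word -> word.length() > 3)               // drop short words
             .map(String::toUpperCase)                        // transform each element
             .peek(word -> { /* hook for logging or debugging */ })
             .forEach(out::add);                              // terminal operation
        return out;
    }

    public static void main(String[] args) {
        System.out.println(longWords(Arrays.asList("from java stream", "to data frame")));
        // [FROM, JAVA, STREAM, DATA, FRAME]
    }
}
```

Each element flows through all stages before the next one is read, which is why the memory footprint stays small.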
What about aggregation?
Let's first understand SQL
[Diagram: From RawLog → Where (DayRange) → Group By → one row per group keyed by hour, dim1, dim2, each applying sum(), sum(), sum() to val1, val2, val3]
Java Stream Aggregation
From
Where
GroupBy Aggregation
count(), sum()
Then the Mapper is
Then the Reducer is
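The code from this slide did not survive in the transcript. The following is one possible way to write a from / where / group by / aggregate pipeline with plain Java Stream, not the author's original code. RawLog, Metrics, and all field names are hypothetical; the "mapper" here is the key-extracting lambda and the "reducer" is the Collector built from Metrics.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class StreamAggregationDemo {
    /** Hypothetical raw log event; field names are illustrative only. */
    static class RawLog {
        final int hour; final String dim1; final long val1; final long val2;
        RawLog(int hour, String dim1, long val1, long val2) {
            this.hour = hour; this.dim1 = dim1; this.val1 = val1; this.val2 = val2;
        }
    }

    /** Mutable accumulator holding count() and two sum() metrics for one group. */
    static class Metrics {
        long count, sum1, sum2;
        void add(RawLog r) { count++; sum1 += r.val1; sum2 += r.val2; }
        Metrics merge(Metrics o) { count += o.count; sum1 += o.sum1; sum2 += o.sum2; return this; }
    }

    static Map<List<Object>, Metrics> aggregate(List<RawLog> logs, int fromHour, int toHour) {
        // Reducer: folds the rows of one group into a single Metrics object.
        Collector<RawLog, Metrics, Metrics> reducer =
            Collector.of(Metrics::new, Metrics::add, Metrics::merge);
        return logs.stream()                                          // FROM
            .filter(r -> r.hour >= fromHour && r.hour < toHour)       // WHERE
            .collect(Collectors.groupingBy(
                r -> Arrays.<Object>asList(r.hour, r.dim1),           // GROUP BY hour, dim1
                reducer));                                            // count(), sum(), sum()
    }

    public static void main(String[] args) {
        List<RawLog> logs = Arrays.asList(
            new RawLog(1, "a", 10, 1), new RawLog(1, "a", 5, 2), new RawLog(2, "b", 7, 3));
        Metrics m = aggregate(logs, 0, 3).get(Arrays.<Object>asList(1, "a"));
        System.out.println(m.count + " " + m.sum1 + " " + m.sum2); // 2 15 3
    }
}
```

Even in this small case, a dedicated accumulator class is needed just to carry more than one metric per group.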
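Java Stream
• Feels somewhat complicated for this kind of application
• Parallel processing is not very convenient
• java.util.stream.Collector is cumbersome for multi-metric aggregation
• Sometimes we want column-based operations rather than operations on a single type
So we developed Poppy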
http://tenmax.github.io/poppy/
Introduction to Poppy
• Poppy is a DataFrame library for Java
• What is a DataFrame?
– Column based (Schema)
– Supports RDBMS-like operations: select, from, where, group by, aggregation, order by
• Poppy also has the following features
– Stream based (suited to larger data)
– Supports partitions and parallel computation
– User Defined Function, User Defined Aggregation Function
– Lightweight
• In essence, it is a Java Stream with a schema
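To make "column based" concrete, here is a minimal sketch of rows addressed by column name instead of by a fixed Java type. It does not use Poppy and does not reflect Poppy's API; SCHEMA, project(), and the sample columns are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SchemaRowsDemo {
    /** Hypothetical schema: column names in the order they appear in each row. */
    static final List<String> SCHEMA = Arrays.asList("hour", "dim1", "val1");

    /** Looks up a column's position, rejecting names that are not in the schema. */
    static int indexOf(String column) {
        int i = SCHEMA.indexOf(column);
        if (i < 0) throw new IllegalArgumentException("unknown column: " + column);
        return i;
    }

    /** Keeps only the named columns of every row, similar to SQL SELECT. */
    static List<List<Object>> project(List<List<Object>> rows, String... columns) {
        return rows.stream()
            .map(row -> Arrays.stream(columns)
                .map(name -> row.get(indexOf(name)))
                .collect(Collectors.toList()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(1, "a", 10L),
            Arrays.<Object>asList(2, "b", 7L));
        System.out.println(project(rows, "dim1", "val1")); // [[a, 10], [b, 7]]
    }
}
```

Because operations refer to columns by name, the same code works for any schema without a new class per row shape.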
What Poppy roughly looks like
from
where
group by
aggregation
That’s All!!
Poppy
• The pipeline has three parts
– Input
– Operations
– Output
[Diagram: Input → Operation → Operation → Operation → Operation → Output]
Input
• By Iterable
DataFrame.from(Class<T> clazz, java.util.Iterable... iterables)
• By DataSource
DataFrame.from(io.tenmax.DataSource dataSource)
• where DataSource is defined as: [code shown on slide]
Output
• iterator(), forEach()
• toList(), toMap(), print()
• DataFrame.to(DataSink dataSink)
• where DataSink is defined as: [code shown on slide]
Operations
• project()
• filter()
• Aggregation()
• groupby()
• Sort()
• distinct()
• peek()
• cache()
Projection (Select)
Filter (Where, Having)
Aggregation (Count, Sum, Avg, …)
Sort (Order by)
Distinct
Demo
User-Defined Function
• Uses java.util.function.Function<T,R>
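Since a UDF is an ordinary Function<T,R>, writing one needs nothing beyond the JDK. The sketch below shows two such functions; the names and the cents-to-dollars conversion are invented for illustration, and how Poppy applies them to a column is not shown here.

```java
import java.util.function.Function;

class UdfDemo {
    /** Hypothetical UDF: converts an amount in cents to dollars. */
    static final Function<Long, Double> CENTS_TO_DOLLARS = cents -> cents / 100.0;

    /** UDFs compose like any other Function; this one formats the dollar amount. */
    static final Function<Long, String> CENTS_TO_LABEL =
        CENTS_TO_DOLLARS.andThen(dollars -> "$" + dollars);

    public static void main(String[] args) {
        System.out.println(CENTS_TO_DOLLARS.apply(250L)); // 2.5
        System.out.println(CENTS_TO_LABEL.apply(250L));   // $2.5
    }
}
```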
User-Defined Aggregation Function
• Uses java.util.stream.Collector<T,A,R>
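The three type parameters of Collector<T,A,R> map onto input value, intermediate accumulator, and final result. As a sketch, an average over long values could be written as below; the class and method names are hypothetical, and only Collector.of is standard API.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

class UdafDemo {
    /** Averages long values; the accumulator A is a long[]{sum, count}. */
    static Collector<Long, long[], Double> average() {
        return Collector.of(
            () -> new long[2],                                    // supplier: new accumulator
            (acc, v) -> { acc[0] += v; acc[1]++; },               // accumulator: fold in one value
            (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },  // combiner: merge partial results
            acc -> acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]); // finisher: A -> R
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1L, 2L, 3L, 6L).collect(average())); // 3.0
    }
}
```

The combiner is what allows partial results from different partitions to be merged when running in parallel.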
Parallel computation
• A partition is the basic unit of parallelism
• A DataSource can provide multiple partitions
• dataFrame.parallel(n) sets the number of parallel threads
Execution Context
• An execution context represents a thread pool.
• It may contain n threads and m partitions
• Usually m >= n; after finishing one partition, each thread pulls the next unprocessed partition
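The "n threads pulling from m partitions" behavior can be approximated with a fixed-size thread pool from java.util.concurrent. This is a sketch of the idea only, not Poppy's implementation; the class name, the method name, and the use of lists of Long as partitions are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionPoolDemo {
    /** Sums every partition on a pool of n threads and returns the grand total. */
    static long sumPartitions(List<List<Long>> partitions, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n); // n worker threads
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Long> partition : partitions) {           // m queued tasks, one per partition
                Callable<Long> task = () -> partition.stream().mapToLong(Long::longValue).sum();
                partials.add(pool.submit(task));                // idle threads take the next task
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                try {
                    total += partial.get();                     // merge partial results
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Long>> partitions = Arrays.asList(
            Arrays.asList(1L, 2L), Arrays.asList(3L), Arrays.asList(4L, 5L, 6L));
        System.out.println(sumPartitions(partitions, 2)); // 21
    }
}
```

With m = 3 partitions and n = 2 threads, whichever thread finishes first picks up the third partition from the pool's queue.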
Execution Context
• Each call to aggregation, sort, or distinct creates a new execution context.
Demo
Conclusion
• Java Stream does not handle column-based requirements easily.
• Our DataFrame library, Poppy, offers a simpler way to process column-based data.
• It can easily be parallelized to process large amounts of data.
• Yet it is very lightweight
Reference
• Project Site - http://tenmax.github.io/poppy/
• Poppy User Manual - http://tenmax.github.io/poppy/
• Poppy Javadoc - http://tenmax.github.io/poppy/docs/javadoc/index.html
• Java multithreading fundamentals - https://www.gitbook.com/book/popcornylu/java_multithread/details
• pq - https://github.com/tenmax/pq
If you like it, please give the project a star
Thank you! Questions?
