Zhiyong Bai
HBase, a high-performance and scalable key-value database, powers Zhihu's online data storage alongside MySQL and Redis. Zhihu's platform team had accumulated experience with container technology, and on top of Kubernetes we built a flexible platform for our online HBase system: multiple logically isolated HBase clusters can be created rapidly on a shared physical cluster, providing customized service for different business needs. Combined with Consul and a DNS server, we implement highly available access to HBase from clients written mainly in Python. This presentation covers the architecture of the online HBase platform at Zhihu and some practical experience from the production environment.
Presented at HBaseCon Asia 2017.
4. • Offline
• Physical machine, hundreds of nodes.
• Work with Spark/Hadoop.
• Online
• Based on Kubernetes, more than 300 containers.
HBase at Zhihu
5. Our online storage
• MySQL
• used in most businesses
• some need scaling, some need transformation
• all SSD, expensive
• Redis
• cache and partial storage
• no sharding
• expensive
• HBase / Cassandra / RocksDB, etc.?
6. Challenges at the beginning
• All businesses on one big cluster
• Also runs NodeManager and ImpalaServer
• Only basic operational tooling
• Monitoring only at the physical-node level
7. What we want
• From the business perspective
• environment isolation
• SLA definition
• business-level monitoring
• From the operations perspective
• balanced resources (CPU, I/O, RAM)
• friendly API
• controllable costs
11. HBase online cluster
• The platform controls clusters
• Kubernetes schedules resources
• Shared HDFS and ZK
• Expose the ZK address or ThriftServer to users
12. Kubernetes
Cluster resource manager and scheduler
Uses containers to isolate resources
Application management
Rich API and active community
13. Component Design
• Pod
• infrastructure component
• one Pod per component
• ReplicationController -> HA
• Defining a cluster
• 1 HMaster RC (replicas = 2)
• 1 RegionServer RC (replicas = n, n >= 1)
• 1 ThriftServer RC (replicas = m, m >= 0)
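The per-component ReplicationController layout above can be sketched as plain Kubernetes manifests built in Python. The image name, labels, and naming scheme here are illustrative assumptions, not Zhihu's actual configuration.

```python
# Sketch: build ReplicationController manifests for one logical HBase
# cluster, following the slide's layout (1 HMaster RC with 2 replicas,
# 1 RegionServer RC with n replicas, 1 ThriftServer RC with m replicas).
# Image name and label keys are hypothetical.

def hbase_rc(cluster, component, replicas, image="hbase:cdh5.5.0"):
    """Return a ReplicationController manifest as a plain dict."""
    labels = {"app": "hbase", "cluster": cluster, "component": component}
    return {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": f"{cluster}-{component}", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": labels,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # one Pod per component, per the slide
                    "containers": [{"name": component, "image": image}],
                },
            },
        },
    }

def define_cluster(cluster, regionservers, thriftservers=0):
    """One HMaster RC (replicas=2), one RegionServer RC, optionally one ThriftServer RC."""
    rcs = [
        hbase_rc(cluster, "hmaster", 2),
        hbase_rc(cluster, "regionserver", regionservers),
    ]
    if thriftservers > 0:
        rcs.append(hbase_rc(cluster, "thriftserver", thriftservers))
    return rcs
```

The dicts can be serialized to YAML/JSON and submitted through the Kubernetes API; using labels as the RC selector is what lets Kubernetes replace a crashed component Pod automatically.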
17. Cluster Level
• What if a cluster Pod goes down?
• Kubernetes ReplicationController
• What if Kubernetes goes down?
• Mixed deployment
• A few physical nodes with high CPU && RAM
18. Data Replication
• Replication within a cluster
• HDFS built-in (3 replicas)
• periodic hdfs fsck
• Replication between clusters
• snapshot + bulk load
• offline cluster runs the MR / Spark jobs
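The snapshot + bulk load flow between clusters boils down to three standard HBase commands; a minimal sketch that assembles them is below. Table names, snapshot names, and HDFS paths are placeholders.

```python
# Sketch of the cross-cluster replication flow from the slide:
# take a snapshot on the source cluster, export it to the other
# cluster's HDFS (an MR job), then bulk-load the HFiles into the
# target table.  All names and paths are illustrative placeholders.

def snapshot_cmd(table, snapshot):
    # runs inside `hbase shell` on the source cluster
    return f"snapshot '{table}', '{snapshot}'"

def export_snapshot_cmd(snapshot, dest_hdfs):
    # MapReduce job that copies snapshot files between clusters
    return (
        "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot "
        f"-snapshot {snapshot} -copy-to {dest_hdfs}"
    )

def bulkload_cmd(hfile_dir, table):
    # load the generated HFiles directly into the target table
    return (
        "hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles "
        f"{hfile_dir} {table}"
    )
```

Running the ExportSnapshot MR job on the offline cluster (as the slide notes) keeps the copy workload off the online RegionServers.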
21. Resource Definition (1)
• Minimal resource per container
• Businesses scale by the number of containers
• Pros
• maximizes resource usage on nodes
• simplified debugging
• easy scaling
• Cons
• minimum resource is hard for a business to define
• hard to tune RAM and GC parameters
22. Resource Definition (2)
• Container resources customized per business
• Businesses scale by the number of containers
• Pros
• flexible RAM configuration and tuning
• used in production
23. Container Configuration
• JAVA_HOME HBASE_HOME
• injected into the container via ENV
• hdfs-site.xml core-site.xml
• XML configs added to the container
• hbase-site.xml hbase-env.sh
• use start-env.sh to initialize configuration
• Parameters may be modified while the cluster is running
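The ENV-driven initialization step could look like the following sketch, which renders injected environment variables into an hbase-site.xml fragment. The property names are standard HBase settings, but the ENV variable names and defaults are assumptions for illustration.

```python
# Sketch: start-env.sh-style initialization done in Python -- read
# tunables injected into the container via ENV and render them into
# hbase-site.xml.  ENV variable names and defaults are hypothetical.
import os
import xml.etree.ElementTree as ET

def render_hbase_site(env=None):
    env = os.environ if env is None else env
    props = {
        "hbase.zookeeper.quorum": env.get("HBASE_ZK_QUORUM", "localhost"),
        "hbase.regionserver.handler.count": env.get("HBASE_HANDLER_COUNT", "30"),
    }
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")
```

Keeping the template in the image and the values in ENV is what lets one container image serve many differently tuned clusters.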
25. Network
• Dedicated IP per Pod
• DNS registration/deregistration is automatic
• /etc/hosts modified for each Pod
26. Client Design
• For Java/Scala
• native HBase client
• only the ZK address is given to the business
• For Python
• happybase
• client proxy
• service discovery
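The Python path above (happybase behind service discovery) might be sketched as follows. In production the talk uses Consul + DNS to find ThriftServers; the discovery function here just picks from a supplied list, which is an assumption for illustration.

```python
# Sketch of the Python client path from the slide: happybase over a
# ThriftServer, with a trivial service-discovery step in front.
# Real discovery goes through Consul + DNS; here it is a plain list.
import random

def pick_thrift_server(servers):
    """Pick one ThriftServer endpoint (host, port) from discovery results."""
    if not servers:
        raise RuntimeError("no ThriftServer instances discovered")
    return random.choice(servers)

def read_row(servers, table_name, row_key):
    """Connect via a discovered ThriftServer and read one row with happybase."""
    import happybase  # third-party: pip install happybase
    host, port = pick_thrift_server(servers)
    conn = happybase.Connection(host, port=port)
    try:
        return conn.table(table_name).row(row_key)
    finally:
        conn.close()
```

Re-resolving on every connection (rather than caching one endpoint) is what makes ThriftServer Pods disposable from the client's point of view.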
27. API Server
• A bridge between Kubernetes and users
• Encapsulates the components of an HBase cluster
• RESTful API
• Friendly interface
28. Painful Points
• Cons:
• full table scans still impact the whole cluster
• speed-limiting coprocessor
• locality && short-circuit reads
• SSD disks
29. Monitor Cluster
• Physical-node level
• node CPU load && usage (via IT)
• Cluster level
• Pod CPU load (via cAdvisor)
• read && write rate, P95, cache hit (via JMX)
• Table level
• client write speed && read latency (via tracing)
• ThriftServer (via JMX)
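For the JMX-sourced cluster metrics above, HBase 1.x daemons expose their JMX beans as JSON over HTTP (e.g. `http://<regionserver>:16030/jmx`). A minimal sketch of pulling and parsing that feed, with the fetch kept as a thin wrapper so the parsing is testable offline:

```python
# Sketch: read cluster-level metrics (read/write counts, cache hit)
# from a RegionServer's JMX JSON servlet.  Host and port are
# deployment-specific; 16030 is the default HBase 1.x RS info port.
import json
from urllib.request import urlopen

def fetch_jmx(host, port=16030):
    """Fetch the full JMX dump from a RegionServer as a dict."""
    with urlopen(f"http://{host}:{port}/jmx") as resp:
        return json.load(resp)

def extract_metrics(jmx, bean_name, keys):
    """Pull selected attributes from one JMX bean by name."""
    for bean in jmx.get("beans", []):
        if bean.get("name") == bean_name:
            return {k: bean.get(k) for k in keys}
    return {}
```

A collector can poll this per Pod and ship the numbers to whatever time-series store backs the dashboards.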
30. Current Situation
• 10+ online businesses, 300+ Pods
• P95 averages 20-30 ms
• 99.99% SLA over 9 months
32. • Almost no code needed
• HBase containers are published independently
• Deployment and orchestration are straightforward
• Decoupled from physical nodes
Easy
33. • Resource isolation
• CPU
• memory
• Business isolation
• data
• proxy
• monitoring
Isolate
34. • Multiple versions
• mostly cdh5.5.0-hbase1.0.0
• one upgraded to 1.2 (HBASE-14283)
• easy to customize versions
• Configuration driven by business needs
• low latency -> read replica
• etc.
Flexible
35. • Enhance performance
• Netty on ThriftServer
• Python HBase Client
• SSD for DataNodes
• Auto scale
• by RegionServer number
• by JVM heap
• etc.
Next