
GPUs in Big Data - StampedeCon 2014


At StampedeCon 2014, John Tran of NVIDIA presented "GPUs in Big Data." Modern graphics processing units (GPUs) are massively parallel general-purpose processors that are taking Big Data by storm. In terms of power efficiency, compute density, and scalability, it is clear now that commodity GPUs are the future of parallel computing. In this talk, we will cover diverse examples of how GPUs are revolutionizing Big Data in fields such as machine learning, databases, genomics, and other computational sciences.


  1. BIG DATA IN GPUS | John Tran | StampedeCon 2014, May 29, 2014, St. Louis, MO
  2. “If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?” —Seymour Cray
  3. Example CPU: Xeon E5-2687W: 2.27 B transistors; 8 cores, 16 threads @ 3.1 GHz; 0.35 SP TFLOPS; 0.17 DP TFLOPS; up to 256 GB DDR3 @ 1600 MHz; 51.2 GB/s memory BW; 150 W; 20 MB L3 cache; strong single-thread performance (branch prediction, out-of-order execution)
  4. Example GPU: Tesla K40: 7.1 B transistors; 2880 cores, 30,720 threads @ 745 MHz; 4.29 SP TFLOPS; 1.43 DP TFLOPS; 12 GB GDDR5 @ 3 GHz; 288 GB/s memory BW; 235 W; PCIe Gen3 x16 (12 GB/s)
  5. Math and memory peak throughput (chart): SP TFLOPS: Xeon E5-2687W 0.35 vs. Tesla K40 4.29; DP TFLOPS: 0.17 vs. 1.43; memory BW: 51.2 GB/s vs. 288 GB/s
  6. The Chickens Are Winning: parallel computing is no longer “the future”; if you are not parallel, you are already behind. GPUs win in performance, power, and cost, and each of those translates to $$.
  7. Where did these GPUs come from?
  8. OK, but what about computing?
  9. All Computing is Parallel Computing
  10. Parallel Computing: CPU vs. GPU
  11. The Basic Idea – Accelerated Computing: the compute-intensive functions of the application code run on the GPU via CUDA, while the rest of the sequential code stays on the CPU.
  12. Quick CUDA C example

      Standard C code:

      void saxpy(int n, float a, float *x, float *y)
      {
          for (int i = 0; i < n; ++i)
              y[i] = a*x[i] + y[i];
      }

      int N = 1<<20;
      // Perform SAXPY on 1M elements
      saxpy(N, 2.0, x, y);

      Parallel CUDA C code:

      __global__ void saxpy(int n, float a, float *x, float *y)
      {
          int i = blockIdx.x*blockDim.x + threadIdx.x;
          if (i < n)
              y[i] = a*x[i] + y[i];
      }

      int N = 1<<20;
      cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
      cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);
      // Perform SAXPY on 1M elements
      saxpy<<<4096,256>>>(N, 2.0, d_x, d_y);
      cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);
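The slide's snippet omits the setup code. A complete, compilable version might look like the following; this is a sketch rather than part of the talk, and the host/device pointer names and initial values are illustrative. The key points are that device buffers must be allocated with cudaMalloc, copy sizes are given in bytes, and the kernel is launched on the device pointers:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                        // guard: more threads may be launched than elements
        y[i] = a * x[i] + y[i];
}

int main() {
    int N = 1 << 20;                  // 1M elements
    size_t bytes = N * sizeof(float);

    // Host buffers with known initial values
    float *x = (float *)malloc(bytes);
    float *y = (float *)malloc(bytes);
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Device buffers
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    // 4096 blocks of 256 threads cover all 2^20 elements exactly
    saxpy<<<4096, 256>>>(N, 2.0f, d_x, d_y);

    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);      // 2*1 + 2 = 4 for these inputs

    cudaFree(d_x); cudaFree(d_y);
    free(x); free(y);
    return 0;
}
```

For an N that is not a multiple of the block size, the usual idiom is to launch (N + 255) / 256 blocks and rely on the in-kernel bounds check, which is why the guard is there.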
  13. How else can you program it? Libraries: Thrust, cuBLAS, cuSPARSE, cuFFT, NPP, cuRAND. Directives: OpenACC. Languages: CUDA C, CUDA C++, Thrust, Python, Fortran, a C++ standard proposal, MATLAB. Learn: search “get cuda”; Udacity, Coursera.
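As a concrete taste of the library route, the same SAXPY can be written with Thrust in a few lines, with no explicit kernel or memory copies. This is a sketch, not from the slides; it uses Thrust's placeholder expressions, and the initial values are arbitrary:

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>

int main() {
    using namespace thrust::placeholders;
    const int N = 1 << 20;
    thrust::device_vector<float> x(N, 1.0f);  // storage lives on the GPU
    thrust::device_vector<float> y(N, 2.0f);

    // y = 2.0f * x + y; Thrust generates and launches the kernel
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(),
                      2.0f * _1 + _2);
    return 0;
}
```

The device_vector constructor and destructor handle allocation, initialization, and cleanup, which is most of the boilerplate in the hand-written CUDA version.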
  14. How does this matter to Big Data?
  15. Shazam: 90 M monthly active users; 17 M tracks tagged per day; 27 M tracks in the database. “GPUs enable us to handle our tremendous processing needs at a substantial cost savings, delivering twice the performance per dollar compared to a CPU-based system.” –Jason Titus, CTO, Shazam
  16. Deep Neural Networks for image classification
  17. Google Datacenter: 1,000 CPU servers, 600 kW, $5,000,000. Stanford AI Lab: 3 GPU-accelerated servers, 3.6 kW, $21,000. (A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro, “Deep learning with COTS HPC systems,” ICML 2013)
  18. Speech Recognition
  19. The Data-Scope at JHU: 5 PB of science data (in 2010). “The Data-Scope will allow us to mine out relationships among data that already exist but that we can’t yet handle and to sift discoveries from what seems like an overwhelming flow of information. New discoveries will definitely emerge this way. There are relationships and patterns that we just cannot fathom buried in that onslaught of data. Data-Scope will tease these out.” –Alex Szalay, JHU
  20. HIV Capsid
  21. Beating-Heart Surgery: a patient stands to lose one point of IQ for every 10 minutes the heart is stopped, yet only ~2% of heart surgeons will operate on a beating heart. The GPU enables real-time motion compensation that virtually stops the beating heart for the surgeon. (Courtesy Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier)
  22. NVBIO
  23. Final Thoughts: Parallel computing is here; re-think parallel or get left behind. Scale up before scaling out: a single GPU increases parallelism by several orders of magnitude, so ask whether you really need a cluster. GPUs are the most efficient solution for parallel problems, in both perf/$ and perf/Watt.
  24. All Computing is Parallel Computing