SlideShare a Scribd company logo
1 of 54
Download to read offline
Infrastructure for
the work of Data Scientists
Dmitry Spodarets
AI Ukraine / 9.10.2016
Who am I
Dmitry Spodarets
• CEO and Founder at FlyElephant
• Lecturer at Odessa Polytechnic
University
• Organizer of technical
conferences about AI, BigData,
HPC, JS, FOSS …
Agenda
• In production
• Database
• Notebook / IDE
• Software
• Deep Learning Tools
• Programming Languages
• Visualization
• Computing power
• Architecture
• Docker
• Cloud Services
• FlyElephant
Data Science
Data Science: people
EngineerDevOps
BusinessData Scientist
In production
Trained Model Deployed Model
Live Data
Historical Data Result
In production
Trained Model Deployed Model
Live Data
Historical Data Result
Training phase
In production
Trained Model Deployed Model
Live Data
Historical Data Result
ProductionTraining phase
Database
Notebook / IDE
Jupyter Notebook
Jupyter Lab
Jupyter Lab
JupyterLab
• JupyterLab is the natural evolution of the Jupyter Notebook user
interface
• JupyterLab is an IDE: Interactive Development Environment
• Flexible user interface for assembling the fundamental building
blocks of interactive computing
• Modernized JavaScript architecture based on npm/webpack,
plugin system, model/view separation
• Built using PhosphorJS (http://phosphorjs.github.io/)
• Design-driven development process
https://github.com/jupyter/jupyterlab
http://blog.jupyter.org/2016/07/14/jupyter-lab-alpha/
Software
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
Software
New (in this poll) tools that received at least 1% share votes in
2016 were:
• Anaconda, 16%
• Microsoft other ML/Data Science tools, 1.6%
• SAP HANA, 1.2%
• XLMiner, 1.2%
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
Software
• Tools with the highest growth (among tools with at least 15
users in 2015) were
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
(turi)
Deep Learning Tools
• Tensorflow, 6.8%
• Theano ecosystem (including Pylearn2), 5.1%
• Caffe, 2.3%
• MATLAB Deep Learning Toolbox, 2.0%
• Deeplearning4j, 1.7%
• Torch, 1.0%
• Microsoft CNTK, 0.9%
• Cuda-convnet, 0.8%
• mxnet, 0.6%
• Convnet.js, 0.3%
• darch, 0.1%
• Nervana, 0.1%
• Veles, 0.1%
• Other Deep Learning Tools, 3.7%
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
Programming Languages
• R, 49.0 % share (was 46.9), 4.5% increase
• Python, 45.8% share (was 30.3%), 51% increase
• Java, 16.8% share (was 14.1%), 19% increase
• Unix shell/awk/gawk 10.4% share (was 8.0%), 30% increase
• C/C++, 7.3% share (was 9.4%), 23% decrease
• Other programming/data languages, 6.8% share (was 5.1%), 34.1%
increase
• Scala, 6.2% share (was 3.5%), 79% increase
• Perl, 2.3% share (was 2.9%), 19% decrease
• Julia, 1.1% share (was 1.1%), 1.6% decrease
• F#, 0.4% share (was 0.7%), 41.8% decrease
• Clojure, 0.4% share (was 0.5%), 19.4% decrease
• Lisp, 0.2% share (was 0.4%), 33.3% decrease
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
Visualization
• Tableau
• Zoomdata
• Qlik
• Plotly
• Matplotlib 2.0 (for Python)
• Ggplot2 (for R)
• D3.js
Computing power
Computing power
NVIDIA	DGX-1	Deep Learning Supercomputer
170/3	TFLOPS	(GPU	FP16	/	CPU	FP32)	
intel xeon phi processor
nvidia tesla p100
~5 TeraFLOPS
~3	TeraFLOPS
Computing power (NVIDIA)
18
ПРОДУКТЫ TESLA ДЛЯ ЛЮБЫХ ЗАДАЧ
СМЕШАННЫЕ
ВЫЧИСЛЕНИЯ
Tesla P100 PCIE
МАСШТАБИРУЕМЫЕ
ВЫЧИСЛЕНИЯ
Tesla P100 SXM2
СУПЕРКОМПЬЮТЕР ДЛЯ
DEEP LEARNING
DGX-1
Полностью интегрированное
решение для DL
Вычислительные центры для
приложений которые хорошо
масштабируются на GPU
Вычислительные центры
имеющие CPU и GPU
приложения
HYPERSCALE
ВЫЧИЛСЕНИЯ
Tesla P4, P40
Создание DL-сервисов,
обработка видео и
изображений, обучение
нейросетей
9
40x Efficient vs CPU, 8x Efficient vs FPGA
0
50
100
150
200
AlexNet
CPU FPGA 1x M4 (FP32) 1x P4 (INT8)
Images/Sec/Watt
Максимальная эффективность для
масштабируемых серверов
P4
# of CUDA Cores 2560
Peak Single Precision 5.5 TeraFLOPS
Peak INT8 22 TOPS
Low Precision
4x 8-bit vector dot product
with 32-bit accumulate
Video Engines 1x decode engine, 2x encode engine
GDDR5 Memory 8 GB @ 192 GB/s
Power 50W & 75 W
AlexNet, batch size = 128, CPU: Intel E5-2690v4 using Intel MKL 2017, FPGA is Arria10-115
1x M4/P4 in node, P4 board power at 56W, P4 GPU power at 36W, M4 board power at 57W, M4 GPU power at 39W, Perf/W chart using GPU power
TESLA P4
Computing power (NVIDIA)
10
TESLA P40
P40
# of CUDA Cores 3840
Peak Single Precision 12 TeraFLOPS
Peak INT8 47 TOPS
Low Precision
4x 8-bit vector dot product
with 32-bit accumulate
Video Engines 1x decode engine, 2x encode engines
GDDR5 Memory 24 GB @ 346 GB/s
Power 250W
0
20 000
40 000
60 000
80 000
100 000
GoogLeNet AlexNet
8x M40 (FP32) 8x P40 (INT8)
Images/Sec
4x Boost in Less than One Year
GoogLeNet, AlexNet, batch size = 128, CPU: Dual Socket Intel E5-2697v4
Максимальная пропускная способность для
масштабируемых серверов
Computing power (NVIDIA)
Computing power (NVIDIA)
Computing power (NVIDIA)
20
P100 ДЛЯ САМОГО БЫСТРОГО ОБУЧЕНИЯ
0,0x
0,5x
1,0x
1,5x
2,0x
2,5x
AlexnetOWT GoogLenet VGG-D Incep v3 ResNet-50
8x K80 8x M40 8x P40 8x P100 PCIE DGX-1
Deepmark test with NVCaffe. AlexnetOWT/GoogLenet use batch 128, VGG-D uses batch 64, Incep-v3/ResNet-50 use batch 32, weak scaling
K80/M40/P100/DGX-1 are measured, P40 is projected, software optimization in progress, CUDA8/cuDNN5.1, Ubuntu 14.04
Speedup
Img/sec
7172 2194 578 526 661
FP32 Training
Computing power (NVIDIA)
21
NVLINK: ЛИНЕЙНОЕ МАСШТАБИРОВАНИЕ
1,0x
2,0x
3,0x
4,0x
5,0x
6,0x
7,0x
8,0x
1GPU 2GPU 4GPU 8GPU
AlexnetOWT
DGX-1
P100 PCIE
Deepmark test with NVCaffe. AlexnetOWT use batch 128, Incep-v3/ResNet-50 use batch 32, weak scaling,
P100 and DGX-1 are measured, FP32 training, software optimization in progress, CUDA8/cuDNN5.1, Ubuntu 14.04
1,0x
2,0x
3,0x
4,0x
5,0x
6,0x
7,0x
8,0x
1GPU 2GPU 4GPU 8GPU
Incep-v3
DGX-1
P100 PCIE
1,0x
2,0x
3,0x
4,0x
5,0x
6,0x
7,0x
8,0x
1GPU 2GPU 4GPU 8GPU
ResNet-50
DGX-1
P100 PCIE
Коэффициент
ускорения
2.3x
1.3x
1.5x
Computing power (NVIDIA)
NVIDIA Deep Learning SDK
Computing power (NVIDIA)
25
Jetson TX1
JETSON TX1
GPU 1 TFLOP/s 256-core Maxwell
CPU 64-bit ARM A57 CPUs
Memory 4 GB LPDDR4 | 25.6 GB/s
Video decode 4K 60Hz
Video encode 4K 30Hz
CSI Up to 6 cameras | 1400 Mpix/s
Display 2x DSI, 1x eDP 1.4, 1x DP 1.2/HDMI
Wifi 802.11 2x2 ac
Networking 1 Gigabit Ethernet
PCIE Gen 2 1x1 + 1x4
Storage 16 GB eMMC, SDIO, SATA
Other 3x UART, 3x SPI, 4x I2C, 4x I2S, GPIOs
Computing power (Intel)
Computing power (Intel)
Computing power (Intel)
Computing power (Intel)
• Intel Math Kernel Library (Intel MKL)
Natively supports C, C++ and Fortran Development.
Cross-language compatible with Java, C#, Python and other languages.
• Intel Data Analytics Acceleration Library (Intel DAAL)
Includes Python, C++, and Java APIs and connectors to popular data
sources including Spark and Hadoop.
• Intel MPI Library
Natively supports C,C++ and Fortran development
Architecture
Docker
Cloud Services
Amazon Machine Learning
Azure Machine Learning
Google Machine Learning
Your Home for High Performance Computing
Compute ● Collaborate ● Manage
Solutions for
Data Science Engineering
Simulation
Rendering Academia
FlyElephant Platform
• Computing resources. You can get quick access to
different Clouds or HPC clusters from one place.
• Ready-computing infrastructure. With one click you get
your software calculation infrastructure and the
computing resources that you need.
• Collaboration & Sharing. Create projects, invite
colleagues to join them and share the calculation results
within the projects.
• Fast Deployment. Deploy your computing tasks as APIs
without engineering or DevOps.
• Expert Community. Our partner companies and individual
experts will help solve any of your issues very quickly.
Cloud computing
Tools
Computing on a HPC cluster
Support of Docker
Jupyter Notebooks
Storage
Community
Projects
Roadmap
(November)
• FlyElephant Box (Closed Beta Preview)
• AWS
• OpenStack
• Your Docker images
• MPI cluster
• Hadoop/Spark cluster
www.flyelephant.net
slack.flyelephant.net
We are looking for partners
Dmitry Spodarets
d.spodarets@flyelephant.net
www.flyelephant.net

More Related Content

Viewers also liked

JupyterHub - A "Thing Explainer" Overview
JupyterHub - A "Thing Explainer" OverviewJupyterHub - A "Thing Explainer" Overview
JupyterHub - A "Thing Explainer" OverviewCarol Willing
 
Вебинар: Основы распараллеливания С++ программ при помощи OpenMP
Вебинар: Основы распараллеливания С++ программ при помощи OpenMPВебинар: Основы распараллеливания С++ программ при помощи OpenMP
Вебинар: Основы распараллеливания С++ программ при помощи OpenMPFlyElephant
 
Вебинар: Введение в машинное обучение
Вебинар: Введение в машинное обучениеВебинар: Введение в машинное обучение
Вебинар: Введение в машинное обучениеFlyElephant
 
Dmitry Spodarets_Infrastructure for the work of data scientists
Dmitry Spodarets_Infrastructure for the work of data scientistsDmitry Spodarets_Infrastructure for the work of data scientists
Dmitry Spodarets_Infrastructure for the work of data scientistsFlyElephant
 
GRID-технологии в физическом эксперименте (Введение)
GRID-технологии в физическом эксперименте (Введение)GRID-технологии в физическом эксперименте (Введение)
GRID-технологии в физическом эксперименте (Введение)Dmitry Spodarets
 
Environment for training models
Environment for training modelsEnvironment for training models
Environment for training modelsFlyElephant
 
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...Plotly
 
[Impact Lab] IT инструменты для проекта
[Impact Lab] IT инструменты для проекта[Impact Lab] IT инструменты для проекта
[Impact Lab] IT инструменты для проектаDmitry Spodarets
 
Containers for Science and High-Performance Computing
Containers for Science and High-Performance ComputingContainers for Science and High-Performance Computing
Containers for Science and High-Performance ComputingDmitry Spodarets
 

Viewers also liked (10)

JupyterHub - A "Thing Explainer" Overview
JupyterHub - A "Thing Explainer" OverviewJupyterHub - A "Thing Explainer" Overview
JupyterHub - A "Thing Explainer" Overview
 
Вебинар: Основы распараллеливания С++ программ при помощи OpenMP
Вебинар: Основы распараллеливания С++ программ при помощи OpenMPВебинар: Основы распараллеливания С++ программ при помощи OpenMP
Вебинар: Основы распараллеливания С++ программ при помощи OpenMP
 
Вебинар: Введение в машинное обучение
Вебинар: Введение в машинное обучениеВебинар: Введение в машинное обучение
Вебинар: Введение в машинное обучение
 
Dmitry Spodarets_Infrastructure for the work of data scientists
Dmitry Spodarets_Infrastructure for the work of data scientistsDmitry Spodarets_Infrastructure for the work of data scientists
Dmitry Spodarets_Infrastructure for the work of data scientists
 
GRID-технологии в физическом эксперименте (Введение)
GRID-технологии в физическом эксперименте (Введение)GRID-технологии в физическом эксперименте (Введение)
GRID-технологии в физическом эксперименте (Введение)
 
Spodarets Pereslavl 2009
Spodarets Pereslavl 2009Spodarets Pereslavl 2009
Spodarets Pereslavl 2009
 
Environment for training models
Environment for training modelsEnvironment for training models
Environment for training models
 
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
 
[Impact Lab] IT инструменты для проекта
[Impact Lab] IT инструменты для проекта[Impact Lab] IT инструменты для проекта
[Impact Lab] IT инструменты для проекта
 
Containers for Science and High-Performance Computing
Containers for Science and High-Performance ComputingContainers for Science and High-Performance Computing
Containers for Science and High-Performance Computing
 

Similar to Infrastructure for the work of Data Scientists

Intel Parallel Studio XE 2016 網路開發工具包新版本功能介紹(現已上市,歡迎詢價)
Intel Parallel Studio XE 2016 網路開發工具包新版本功能介紹(現已上市,歡迎詢價)Intel Parallel Studio XE 2016 網路開發工具包新版本功能介紹(現已上市,歡迎詢價)
Intel Parallel Studio XE 2016 網路開發工具包新版本功能介紹(現已上市,歡迎詢價)Cheer Chain Enterprise Co., Ltd.
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Haidee McMahon
 
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneUnleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneDatabricks
 
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPCHPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPCHPC DAY
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceLEGATO project
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltreMarco Parenzan
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platforminside-BigData.com
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020John Zedlewski
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...E-Commerce Brasil
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302Timothy Spann
 
Introduce: IBM Power Linux with PowerKVM
Introduce: IBM Power Linux with PowerKVMIntroduce: IBM Power Linux with PowerKVM
Introduce: IBM Power Linux with PowerKVMZainal Abidin
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production EnvironmentsIntel® Software
 
Introduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIntroduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIndrajit Poddar
 
Labview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRLLabview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRLMohammad Sabouri
 
Architecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureArchitecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureTalbott Crowell
 
Overview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis ToolsOverview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis ToolsKeiichiro Ono
 
From leading IoT Protocols to Python Dashboarding_final
From leading IoT Protocols to Python Dashboarding_finalFrom leading IoT Protocols to Python Dashboarding_final
From leading IoT Protocols to Python Dashboarding_finalLukas Ott
 

Similar to Infrastructure for the work of Data Scientists (20)

Intel Parallel Studio XE 2016 網路開發工具包新版本功能介紹(現已上市,歡迎詢價)
Intel Parallel Studio XE 2016 網路開發工具包新版本功能介紹(現已上市,歡迎詢價)Intel Parallel Studio XE 2016 網路開發工具包新版本功能介紹(現已上市,歡迎詢價)
Intel Parallel Studio XE 2016 網路開發工具包新版本功能介紹(現已上市,歡迎詢價)
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
 
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneUnleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
 
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPCHPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
 
Introduce: IBM Power Linux with PowerKVM
Introduce: IBM Power Linux with PowerKVMIntroduce: IBM Power Linux with PowerKVM
Introduce: IBM Power Linux with PowerKVM
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production Environments
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 
Introduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIntroduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI Platform
 
Labview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRLLabview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRL
 
Architecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureArchitecting Solutions for the Manycore Future
Architecting Solutions for the Manycore Future
 
Overview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis ToolsOverview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis Tools
 
From leading IoT Protocols to Python Dashboarding_final
From leading IoT Protocols to Python Dashboarding_finalFrom leading IoT Protocols to Python Dashboarding_final
From leading IoT Protocols to Python Dashboarding_final
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Infrastructure for the work of Data Scientists

  • 1. Infrastructure for the work of Data Scientists Dmitry Spodarets AI Ukraine / 9.10.2016
  • 2. Who am I Dmitry Spodarets • CEO and Founder at FlyElephant • Lecturer at Odessa Polytechnic University • Organizer of technical conferences about AI, BigData, HPC, JS, FOSS …
  • 3. Agenda • In production • Database • Notebook / IDE • Software • Deep Learning Tools • Programming Languages • Visualization • Computing power • Architecture • Docker • Cloud Services • FlyElephant
  • 6. In production Trained Model Deployed Model Live Data Historical Data Result
  • 7. In production Trained Model Deployed Model Live Data Historical Data Result Training phase
  • 8. In production Trained Model Deployed Model Live Data Historical Data Result ProductionTraining phase
  • 13. Jupyter Lab JupyterLab • JupyterLab is the natural evolution of the Jupyter Notebook user interface • JupyterLab is an IDE: Interactive Development Environment • Flexible user interface for assembling the fundamental building blocks of interactive computing • Modernized JavaScript architecture based on npm/webpack, plugin system, model/view separation • Built using PhosphorJS (http://phosphorjs.github.io/) • Design-driven development process https://github.com/jupyter/jupyterlab http://blog.jupyter.org/2016/07/14/jupyter-lab-alpha/
  • 15. Software New (in this poll) tools that received at least 1% share votes in 2016 were: • Anaconda, 16% • Microsoft other ML/Data Science tools, 1.6% • SAP HANA, 1.2% • XLMiner, 1.2% http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
  • 16. Software • Tools with the highest growth (among tools with at least 15 users in 2015) were http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html (turi)
  • 17. Deep Learning Tools • Tensorflow, 6.8% • Theano ecosystem (including Pylearn2), 5.1% • Caffe, 2.3% • MATLAB Deep Learning Toolbox, 2.0% • Deeplearning4j, 1.7% • Torch, 1.0% • Microsoft CNTK, 0.9% • Cuda-convnet, 0.8% • mxnet, 0.6% • Convnet.js, 0.3% • darch, 0.1% • Nervana, 0.1% • Veles, 0.1% • Other Deep Learning Tools, 3.7% http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
  • 18. Programming Languages • R, 49.0 % share (was 46.9), 4.5% increase • Python, 45.8% share (was 30.3%), 51% increase • Java, 16.8% share (was 14.1%), 19% increase • Unix shell/awk/gawk 10.4% share (was 8.0%), 30% increase • C/C++, 7.3% share (was 9.4%), 23% decrease • Other programming/data languages, 6.8% share (was 5.1%), 34.1% increase • Scala, 6.2% share (was 3.5%), 79% increase • Perl, 2.3% share (was 2.9%), 19% decrease • Julia, 1.1% share (was 1.1%), 1.6% decrease • F#, 0.4% share (was 0.7%), 41.8% decrease • Clojure, 0.4% share (was 0.5%), 19.4% decrease • Lisp, 0.2% share (was 0.4%), 33.3% decrease http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
  • 19. Visualization • Tableau • Zoomdata • Qlik • Plotly • Matplotlib 2.0 (for Python) • Ggplot2 (for R) • D3.js
  • 21. Computing power NVIDIA DGX-1 Deep Learning Supercomputer 170/3 TFLOPS (GPU FP16 / CPU FP32) intel xeon phi processor nvidia tesla p100 ~5 TeraFLOPS ~3 TeraFLOPS
  • 22. Computing power (NVIDIA) 18 ПРОДУКТЫ TESLA ДЛЯ ЛЮБЫХ ЗАДАЧ СМЕШАННЫЕ ВЫЧИСЛЕНИЯ Tesla P100 PCIE МАСШТАБИРУЕМЫЕ ВЫЧИСЛЕНИЯ Tesla P100 SXM2 СУПЕРКОМПЬЮТЕР ДЛЯ DEEP LEARNING DGX-1 Полностью интегрированное решение для DL Вычислительные центры для приложений которые хорошо масштабируются на GPU Вычислительные центры имеющие CPU и GPU приложения HYPERSCALE ВЫЧИЛСЕНИЯ Tesla P4, P40 Создание DL-сервисов, обработка видео и изображений, обучение нейросетей
  • 23. 9 40x Efficient vs CPU, 8x Efficient vs FPGA 0 50 100 150 200 AlexNet CPU FPGA 1x M4 (FP32) 1x P4 (INT8) Images/Sec/Watt Максимальная эффективность для масштабируемых серверов P4 # of CUDA Cores 2560 Peak Single Precision 5.5 TeraFLOPS Peak INT8 22 TOPS Low Precision 4x 8-bit vector dot product with 32-bit accumulate Video Engines 1x decode engine, 2x encode engine GDDR5 Memory 8 GB @ 192 GB/s Power 50W & 75 W AlexNet, batch size = 128, CPU: Intel E5-2690v4 using Intel MKL 2017, FPGA is Arria10-115 1x M4/P4 in node, P4 board power at 56W, P4 GPU power at 36W, M4 board power at 57W, M4 GPU power at 39W, Perf/W chart using GPU power TESLA P4 Computing power (NVIDIA)
  • 24. 10 TESLA P40 P40 # of CUDA Cores 3840 Peak Single Precision 12 TeraFLOPS Peak INT8 47 TOPS Low Precision 4x 8-bit vector dot product with 32-bit accumulate Video Engines 1x decode engine, 2x encode engines GDDR5 Memory 24 GB @ 346 GB/s Power 250W 0 20 000 40 000 60 000 80 000 100 000 GoogLeNet AlexNet 8x M40 (FP32) 8x P40 (INT8) Images/Sec 4x Boost in Less than One Year GoogLeNet, AlexNet, batch size = 128, CPU: Dual Socket Intel E5-2697v4 Максимальная пропускная способность для масштабируемых серверов Computing power (NVIDIA)
  • 27. 20 P100 ДЛЯ САМОГО БЫСТРОГО ОБУЧЕНИЯ 0,0x 0,5x 1,0x 1,5x 2,0x 2,5x AlexnetOWT GoogLenet VGG-D Incep v3 ResNet-50 8x K80 8x M40 8x P40 8x P100 PCIE DGX-1 Deepmark test with NVCaffe. AlexnetOWT/GoogLenet use batch 128, VGG-D uses batch 64, Incep-v3/ResNet-50 use batch 32, weak scaling K80/M40/P100/DGX-1 are measured, P40 is projected, software optimization in progress, CUDA8/cuDNN5.1, Ubuntu 14.04 Speedup Img/sec 7172 2194 578 526 661 FP32 Training Computing power (NVIDIA)
  • 28. 21 NVLINK: ЛИНЕЙНОЕ МАСШТАБИРОВАНИЕ 1,0x 2,0x 3,0x 4,0x 5,0x 6,0x 7,0x 8,0x 1GPU 2GPU 4GPU 8GPU AlexnetOWT DGX-1 P100 PCIE Deepmark test with NVCaffe. AlexnetOWT use batch 128, Incep-v3/ResNet-50 use batch 32, weak scaling, P100 and DGX-1 are measured, FP32 training, software optimization in progress, CUDA8/cuDNN5.1, Ubuntu 14.04 1,0x 2,0x 3,0x 4,0x 5,0x 6,0x 7,0x 8,0x 1GPU 2GPU 4GPU 8GPU Incep-v3 DGX-1 P100 PCIE 1,0x 2,0x 3,0x 4,0x 5,0x 6,0x 7,0x 8,0x 1GPU 2GPU 4GPU 8GPU ResNet-50 DGX-1 P100 PCIE Коэффициент ускорения 2.3x 1.3x 1.5x Computing power (NVIDIA)
  • 30. Computing power (NVIDIA) 25 Jetson TX1 JETSON TX1 GPU 1 TFLOP/s 256-core Maxwell CPU 64-bit ARM A57 CPUs Memory 4 GB LPDDR4 | 25.6 GB/s Video decode 4K 60Hz Video encode 4K 30Hz CSI Up to 6 cameras | 1400 Mpix/s Display 2x DSI, 1x eDP 1.4, 1x DP 1.2/HDMI Wifi 802.11 2x2 ac Networking 1 Gigabit Ethernet PCIE Gen 2 1x1 + 1x4 Storage 16 GB eMMC, SDIO, SATA Other 3x UART, 3x SPI, 4x I2C, 4x I2S, GPIOs
  • 34. Computing power (Intel) • Intel Math Kernel Library (Intel MKL) Natively supports C, C++ and Fortran Development. Cross-language compatible with Java, C#, Python and other languages. • Intel Data Analytics Acceleration Library (Intel DAAL) Includes Python, C++, and Java APIs and connectors to popular data sources including Spark and Hadoop. • Intel MPI Library Natively supports C,C++ and Fortran development
  • 37. Cloud Services Amazon Machine Learning Azure Machine Learning Google Machine Learning
  • 38. Your Home for High Performance Computing Compute ● Collaborate ● Manage
  • 39. Solutions for Data Science Engineering Simulation Rendering Academia
  • 40. FlyElephant Platform • Computing resources. You can get quick access to different Clouds or HPC clusters from one place. • Ready-computing infrastructure. With one click you get your software calculation infrastructure and the computing resources that you need. • Collaboration & Sharing. Create projects, invite colleagues to join them and share the calculation results within the projects. • Fast Deployment. Deploy your computing tasks as APIs without engineering or DevOps. • Expert Community. Our partner companies and individual experts will help solve any of your issues very quickly.
  • 42. Tools
  • 43. Computing on a HPC cluster
  • 49.
  • 50. Roadmap (November) • FlyElephant Box (Closed Beta Preview) • AWS • OpenStack • Your Docker images • MPI cluster • Hadoop/Spark cluster
  • 53. We are looking for partners