AIDC India - AI on IA

Intelarchitecturefor
ArtificialIntelligence
Austin Cherian
Head - High Performance Computing Business, India
austin.cherian@intel.com

hardware
Multi-purpose to purpose-built
AI compute from device to cloud
solutions
Partner ecosystem to facilitate AI in
finance, health, retail, industrial & more
Intel analytics
ecosystem to
get your data
ready
Data
Driving AI forward
through R&D,
investments &
policy
Future
tools
Software to accelerate development &
deployment of real solutions
Bring Your AI Vision to Life Using Intel’s Comprehensive Portfolio
#IntelAIDC2019 | #AIonIntel | #IntelAI

Data-centricinfrastructure
Move Faster Process EverythingStore More
INTEL® SILICON PHOTONICS CPU
AI ACCELERATORSINTEL® ETHERNET
INTEL® OMNI-PATH FABRIC
GPU
(Integrated &
Discrete)
FPGA, GPU
Powering the Future of Compute & Communications

HARDWARE Multi-purpose to purpose-built
AI compute from cloud to device
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
Deep
Learning
Training
Inference
AI
Mainstream intensive
Most
other

Large-scale data centers such as public
cloud or comms service providers, gov’t
& academia, large enterprise IT
User-touch end point devices with lower power
requirements such as laptops, tablets, smart home
devices, drones
Small-scale data centers, small business
IT infrastructure, to few on-premise
server racks & workstations
Varies to <1ms <5ms <10-40ms ~100ms
DatacenterEdgeEndpoint

IoT SENSORS
(Security, home, retail, industrial…)
Display, Video, AR/VR, Gestures, Speech
DESKTOP & MOBILITY
Vision &
Inference Speech
SELF-DRIVING VEHICLES
Autonomous
Driving
SERVERS, APPLIANCES & GATEWAYS
Latency-
Bound
Inference
Basic Inference,
Media & Vision
Most Use Cases
SERVERS & APPLIANCES
DatacenterEdgeEndpoint
Flexible & Memory
Bandwidth-Bound
Use Cases
Varies to <1ms <5ms <10-40ms ~100ms
Dedicated
Media & Vision
Inference
Most Use Cases
Most Intensive
Use Cases
NNP-L
M.2 CardSOC
Special Purpose Special Purpose
1GNA=Gaussian Neural Accelerator
All products, computer systems, dates, and figures are preliminary based on current expectations, and are
subject to change without notice. Images are examples of intended applications but not an exhaustive list.
Onesizedoesnotfitall

Intel®Xeon®ScalableProcessorFamily
Now build the AI you want on the CPU you know
yourFOUNDATION
forAI
Getmaximumutilization
running data center & AI workloads side-by-side
Breakmemorybarriers
in order to apply AI to large data sets & models
Trainmodelsatscale
through efficient scaling to many nodes
Accessoptimizedtools
including continuous performance gains for TensorFlow, MXNet,
& more
Runinthecloud
including AWS, Microsoft, Alibaba, TenCent, Google, Baidu, & more
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may
cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance Source: Intel
measured as of November 2016. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not
guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804

Intel®Xeon®ScalableProcessorforAI
ArtificialIntelligenceWithIntel®Xeon®ScalableProcessors
Deep Learning INFERENCE & Deep Learning TRAINING
Generational
performance
improvements
Continuous
software
optimizations
Lower
precision
integerops
Scaling
efficiency

Upto65%PerformanceBoostwithIntel®AVX-512
onIntel®Xeon®Platinum8180processor
1
1.37
1
1.65
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
Intel® AVX-512 OFF
Caffe GoogLeNet v1
Intel® AVX-512 ON
Caffe GoogLeNet v1
Intel® AVX-512 OFF
Caffe AlexNet
Intel® AVX-512 ON
Caffe AlexNet
Convolution layer performance on Intel® Xeon® Platinum 8180 Processor
Test results above quantify the value add of Intel® AVX-512 to Convolution layer performance. All results shown above are measured
on Intel® Xeon® Platinum 8180 Processor running AI topologies on Caffe framework with and without Intel® AVX-512 enabled
Convolutionlayerperformance
(Measuredinmilliseconds)
representedrelativetoabaseline1.0
Higherisbetter
Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as
"Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system.
Batch Sizes AlexNet:256 GoogleNet-V1: 96 Configuration Details on Slide: 24
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components,
software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the
performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance Source: Intel measured as of June 2017
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations
include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured
by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Generational
performance
improvements
Enhanced compute performance with Intel® AVX-512 on Intel® Xeon® Scalable Processor

INTRODUCING2nd GENERATION
INTEL® XEON® SCALABLEPROCESSORS
LEADERSHIP WORKLOAD
PERFORMANCE
GROUNDBREAKING
MEMORY INNOVATION
EMBEDDED
ARTIFICIAL INTELLIGENCE
ACCELERATION
ENHANCED
AGILITY & UTILIZATION
HARDWARE ENHANCED
SECURITY
BUILT-IN
VALUE
UNINTERRUPTED

Intel®DeepLearningBoost(DLBoost)featuring Vector Neural Network Instructions (VNNI)
INT8 07 06 05 04 03 02 01 00
Sign Mantissa
NEW
vpdpbusd OUTPUT
INT32
CONSTANT
INT32
INPUT
INT8
INPUT
INT8
AVX-512 (VNNI) instruction to accelerate INT8 convolutions: vpdpbusd
INPUT
INT8
INPUT
INT8
vpmaddubsw
vpmaddwd
vpaddd
OUTPUT
INT16 OUTPUT
INT32
CONSTANT
INT16 CONSTANT
INT32
OUTPUT
INT32
Current AVX-512 instructions to perform INT8 convolutions: vpmaddubsw, vpmaddwd, vpaddd

IncreasingAIperformanceonIntel®Xeon®PROCESSORS
Intel® Optimizations for Caffe ResNet-50
Inference Throughput Performance
Intel® DL Boost Theoretical Throughput per core over
1st Generation Intel® Xeon® Scalable Processors
BASE
SKX launch
July 2017
1st Generation Intel® Xeon®
Scalable Processor
2S Intel® Xeon®
Platinum 8280
processor
(28 cores/S)
2S Intel®Xeon®
Platinum
9282 processor
(56 cores/S)
vs. BASE vs. BASE
2S Intel® Xeon®
Platinum 8180
processor
(28 cores/S)
14x1 30x15.7x1
1 Based on Intel internal testing: 1X,5.7x,14x and 30x performance improvement based on Intel® Optimization for Café ResNet-50 inference throughput performance on Intel® Xeon® Scalable Processor. See Configuration Details slide 22
Performance results are based on testing as of 7/11/2017(1x) ,11/8/2018 (5.7x), 2/20/2019 (14x) and 2/26/2019 (30x) and may not reflect all publically available security updates. No product can be absolutely secure. See configuration slide 22
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction
sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel
microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific
instruction sets covered by this notice. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using
specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance
2nd Generation Intel® Xeon®
Scalable Processor
1st Gen
Xeon-SP
Int8
1s Gen
Xeon-SP
FP32
Upto
1.3x
2nd Gen
Xeon-SP
Int8 w/ Intel®
DL Boost
3 Instructions
VPMADDUBSW
VPMADDWD
VPADDD
1st Gen
Xeon-SP
Int8
Upto
3x
1 Instruction
VPDPBUSD
Faster throughput, but inefficient
Uses 3 instructions per operation
DL Boost fixes this, combines 3 instructions into 1
vs. BASE

Intel® Nervana™neuralnetworkprocessors(NNP)¥
NNP-L
NNP-I Highly-efficient multi-model
inferencing for cloud, data
center & intense appliances
Fastest time-to-train with
high bandwidth AI server
connections for the most
persistent, intense usage
DEDICATED
DL TRAINING
DEDICATED
DL INFERENCE
‡The Intel® Nervana™ Neural Network Processor is a future product that is not broadly available today

INTEL® FPGAPRODUCTPORTFOLIO

Intel® Movidius™ Visionprocessingunit(vPU)
Power-Efficient Image Processing, Computer Vision & Deep Learning for Devices
SURVEILLANCe
Detection & Classification •
Identification •
Multi-Nodal Systems •
Multi-Modal Sensing •
Video, Image Capture •
SERVICEROBOTS
Navigation •
3D Vol. Mapping •
Multi-Modal Sensing •
WEARABLES
Detection, Tracking •
Recognition •
Video, Image, Session Capture •
DRONES
• Sense & Avoid
• GPS Denied Hovering
• Pixel Labeling
• Video, Image Capture
SMARTHOME
• Detection, Tracking
• Perimeter, Presence Monitoring
• Recognition, Classification
• Multi-Nodal Systems
• Multi-Modal Sensing
• Video, Image Capture
AR-VRHMD
• 6DOF Pose, Position, Mapping
• Gaze, Eye Tracking
• Gesture Tracking, Recognition
• See-Through Camera

Intelintegratedprocessorgraphics
Built-in Deep Learning Inference Acceleration
• Shipped in > 1billion Intel SOCs
• Broad choice of performance/power offering across
Intel® Atom™ , Intel®
Core™ and Intel® Xeon™ processors
Ubiquity/Scalability
MediaLeadership
• Intel® Quick Synch Video – fixed function media
blocks to improve power and performance
• Intel® Media SDK - API that provides access to
hardware-accelerated codecs
Powerful&FlexibleArchitecture
• Rich data type support for 32bitFP, 16bitFP,
32bitInteger, 16bitInteger with
SIMD multiply-accumulate instructions
MemoryArchitecture
• Shared memory architecture on die between CPU
and GPU to enable lower latency and power
Hardwareintegration Softwaresupport
MacOS (CoreML and MPS1)
Windows O/S (WinML)
OpenVINO™ Toolkit (Win, Linux)
clDNN

Intel®Gaussianneuralaccelerator(GNA)
https://software.intel.com/en-
us/iot/speech-enabling-dev-kit
TryitTODAY!
Intel® Speech Enabling
Developer Kit
Learn more: https://sigport.org/sites/default/files/docs/PosterFinal.pdf
Amplethroughput
For speech, language, and
other sensing inference
Lowpower
<100 mW power consumption
for always-on applications
Flexibility
Gaussian mixture model (GMM) and
neural network inference support
Intel®
GNA
(IP)
DSP
Streaming Co-Processor for Low-Power Audio Inference & More

less often →
AccessDistribution
Data Access Frequency
Cooler data
 more often
Hot data
DRAM
HOT TIER
SSD
WARM TIER
Intel®3DNandSSD
Optimize performancegiven
cost and power budget
LLC
Core
CPU
L2
L1
pico-secs
nano-secs
Memory Sub-System 10s
GB
<100nanosecs
Network
Storage
SSD
10s
TB
<100millisecs
10s
TB
<100microsecs
Compute
100s GB
Move Data Closer to
<1microsec
1s TB
Maintain Persistenc<
y10microsecs
Goal:EfficientData-CentricArchitecture
HDD / TAPE
COLD TIER

Thebestofbothworldswith
Intel®Optane™DCPersistentMemory
Performance comparable
to DRAM at low latencies1
Data persistence with
higher capacity than DRAM2
Memoryattributes
1“Performance comparable to DRAM” - Intel persistent memory is expected to perform at latencies near DDR4 DRAM. “low latencies” - Data transferred across the memory bus causes latencies to be orders of magnitude lower when compared to
transferring data across PCIe or I/O bus’ to NAND/Hard Disk. 2Intel persistent memory offers 3 different capacities – 128GB, 256GB, 512GB. Individual DIMMs of DDR4 DRAM max out at 256GB. Performance results are based on testing as of February
22, 2019 and may not reflect all publicly available security updates. See slide 24 for details. No product or component can be absolutely secure.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that
product when combined with other products. For more information go to www.intel.com/benchmarks.
Storageattributes

Connectivity
High-speed connectivity for massively parallel & distributed AI
Intel®SiliconPhotonics
Connects memory and compute,
integrating connectivity
technologies onto a single die for
affordable, scalable solutions
Comingsoon
SmartNIC(CascadeGlacier)
Enables optimized performance for
Intel® Xeon® processor-based
systems
Intel®Omni-PathArchitecture
Provides low-latency interconnect to
scale to hundreds of thousands of
nodes without losing performance
or reliability

Intel®Omni-PathArchitectureEvolutionaryApproach,RevolutionaryFeatures,End-to-EndSolution
HFI Adapters
Single port
x8 and x16
x8 Adapter
(58 Gb/s)
x16
Adapter
(100 Gb/s)
Edge Switches
1U Form Factor
24 and 48 port
24-port
Edge Switch
48-port
Edge Switch
Director Switches
QSFP-based
288 and 1,152 port
288-port
Director Switch
(7U chassis)
48-port Leaves
1,152-port
Director
Switch
(20U chassis)
48-port
Leaves
Cables
Third Party Vendors
Passive Copper
Active Optical
Silicon
OEM custom designs
HFI and Switch ASICs
Switch silicon
up to 48 ports
(1200 GB/s
total b/w
HFI silicon
Up to 2 ports
(50 GB/s
total b/w)
“-F”
Processors
w/integrated
HFI
Software
Open Source
Host Software and
Fabric Manager

Edge
Device
ARTIFICIALINTELLIGENCE
Platforms Finance Healthcare Energy Industrial Transport Retail Home More…
Data Center
TOOLKITSApp
Developers
librariesData
Scientists
foundationLibrary
Developers
*
*
*
*
FOR
* * * *
HardwareIT System
Architects
SolutionsSolution
Architects
AI Solutions Catalog
(Public & Internal)
DEEPLEARNINGACCELERATORS
DEEPLEARNINGDEPLOYMENT
OpenVINO™ † Intel® Movidius™ SDK
Open Visual Inference & Neural Network Optimization
toolkit for inference deployment on CPU, processor
graphics, FPGA & VPU using TF, Caffe* & MXNet*
Optimized inference deployment
for all Intel® Movidius™ VPUs using
TensorFlow* & Caffe*
DEEPLEARNINGFRAMEWORKS
Now optimized for CPU Optimizations in progress
TensorFlow* MXNet* Caffe* BigDL/Spark* Caffe2* PyTorch* PaddlePaddle*
DEEPLEARNING
Intel® Deep
Learning Studio‡
Open-source tool to compress
deep learning development cycle
MACHINELEARNINGLIBRARIES
Python R Distributed
•Scikit-
learn
•Pandas
•NumPy
•Cart
•Random
Forest
•e1071
•MlLib (on Spark)
•Mahout
ANALYTICS,MACHINE&DEEPLEARNINGPRIMITIVES
Python DAAL MKL-DNN clDNN
Intel distribution
optimized for
machine learning
Intel® Data Analytics
Acceleration Library
(for machine learning)
Open-source deep neural
network functions for
CPU, processor graphics
DEEPLEARNINGGRAPHCOMPILER
Intel® nGraph™ Compiler (Alpha)
Open-sourced compiler for deep learning model
computations optimized for multiple devices (CPU, GPU,
NNP) using multiple frameworks (TF, MXNet, ONNX)
AIFOUNDATION
A
R
T
I
F
I
C
I
A
l
I
N
T
E
L
L
I
G
E
n
C
e
NNP L-1000
* * * *
† Formerly the Intel® Computer Vision SDK
*Other names and brands may be claimed as the property of others.
Ai.intel.com
Inference

TransformaIwithSoftware
Akanksha Balani
Country Lead - Intel Software Tools – IAGS, Intel

Intersectionofdata&computegrowth
4TBAutonomousvehicle
5TBCONNECTEDAIRPLANE
1PBSmartFactory
1.5GBAverage internetuser
750pBCloudvideoProvider
DailyBy2020
Source: Amalgamation of analyst data and Intel analysis.
Business
Insights
Operational
Insights
Security
Insights

Consumer Health Finance Retail Government Energy Transport Industrial Other
Smart Assistants
Chatbots
Search
Personalization
Augmented
Reality
Robots
Enhanced
Diagnostics
Drug
Discovery
Patient Care
Research
Sensory
Aids
Algorithmic
Trading
Fraud Detection
Research
Personal Finance
Risk Mitigation
Support
Experience
Marketing
Merchandising
Loyalty
Supply Chain
Security
Defense
Data
Insights
Safety & Security
Resident
Engagement
Smarter
Cities
Oil & Gas
Exploration
Smart
Grid
Operational
Improvement
Conservation
Autonomous
Cars
Automated
Trucking
Aerospace
Shipping
Search & Rescue
Factory
Automation
Predictive
Maintenance
Precision
Agriculture
Field Automation
Advertising
Education
Gaming
Professional & IT
Services
Telco/Media
Sports
Source: Intel forecast
Aiwilltransform

Intel®AITools
DELIVERING ROBUST TOOLSETS & POWERFUL RESOURCES.
ACCELERATING INNOVATIVE AI SOLUTIONS.

Edge
Device
ARTIFICIALINTELLIGENCE
Platforms Finance Healthcare Energy Industrial Transport Retail Home More…
Data Center
TOOLKITSApp
Developers
librariesData
Scientists
foundationLibrary
Developers
*
*
*
*
FOR
* * * *
HardwareIT System
Architects
SolutionsSolution
Architects
AI Solutions Catalog
(Public & Internal)
DEEPLEARNINGACCELERATORS
DEEPLEARNINGDEPLOYMENT
OpenVINO™ † Intel® Movidius™ SDK
Open Visual Inference & Neural Network Optimization
toolkit for inference deployment on CPU, processor
graphics, FPGA & VPU using TF, Caffe* & MXNet*
Optimized inference deployment
for all Intel® Movidius™ VPUs using
DEEPLEARNINGFRAMEWORKS
Now optimized for CPU Optimizations in progress
TensorFlow* MXNet* Caffe* BigDL/Spark* Caffe2* PyTorch* PaddlePaddle*
DEEPLEARNING
Intel® Deep
Learning Studio‡
Open-source tool to compress
deep learning development cycle
MACHINELEARNINGLIBRARIES
Python R Distributed
•Scikit-
learn
•Pandas
•NumPy
•Cart
•Random
Forest
•e1071
•MlLib (on Spark)
•Mahout
ANALYTICS,MACHINE&DEEPLEARNINGPRIMITIVES
Python DAAL MKL-DNN clDNN
Intel distribution
optimized for
machine learning
Intel® Data Analytics
Acceleration Library
(for machine learning)
Open-source deep neural
network functions for
CPU, processor graphics
DEEPLEARNINGGRAPHCOMPILER
Intel® nGraph™ Compiler (Alpha)
Open-sourced compiler for deep learning model
computations optimized for multiple devices (CPU, GPU,
NNP) using multiple frameworks (TF, MXNet, ONNX)
AIFOUNDATION
AI
NNP L-1000
* * * *
† Formerly the Intel® Computer Vision SDK
*Other names and brands may be claimed as the property of others.
Ai.intel.com
Inference

Intel®Xeon®processorsNow Optimized For Deep Learning
INFERENCE THROUGHPUT
Intel® Xeon® Platinum 8180 Processor
higher Intel optimized Caffe GoogleNet v1 with Intel®
MKL inference throughput compared to
Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe
TRAINING THROUGHPUT
Intel® Xeon® Platinum 8180 Processor
higher Intel Optimized Caffe AlexNet with Intel® MKL
training throughput compared to
Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe
UP TO
241x
1
UP TO
277x
1 Optimized
Frameworks
Optimized Intel®
MKL Libraries
Inference and training throughput uses FP32 instructions
Deliver significant AI performance with hardware & software optimizations on Intel® Xeon® Scalable family
1 The benchmark results may need to be revised as additional testing is conducted. The results depend on the specific platform configurations and workloads utilized in the testing, and may not be applicable to any particular user's components, computer system or workloads. The results are not necessarily representative of other
benchmarks and other benchmark results may show greater or lesser impact from mitigations. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
complete information visit: http://www.intel.com/performance.Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and
functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit:
http://www.intel.com/performance Source: Intel measured as of June 2018. Configurations: See slide 4.

Intelsoftware–ExtractPerformance
Build highly
optimized
media
infrastructure,
solutions, &
applications
Fast, Dense, High
Quality Transcoding
Improve performance,
scalability, & reliability
for applications and
frameworks -Computing
and ML/DL
Technical & Enterprise compute, HPC, AI
Take advantage of
deep system-wide
insight & analysis
for system &
embedded apps
Manuf., Retail,
Drones, Robots…
Smart Cities, Auto. Driving, Gaming…
Create solutions using
Computer Vision –
OpenVino Toolkit,
Deep Learning,
Graphics, Libraries,
Media, OpenCL™,
& more
Optimization Tools , SDKs
Edge to Data Center to Cloud
Intel® Distribution of Python
Intel® DAAL
AI&IoT AI,HPC,Enterprise

Intel®ParallelStudioXE
POWER THROUGH PERFORMANCE BOTTLENECKS.
CODE NEW BREAKTHROUGHS FOR AI.

34
AISoftwareOptimization
Intel® Parallel Studio XE
Up to 35X faster application
performance
**Intel® Xeon Phi™ Processor Software Ecosystem Momentum Guide
Performance results are based tests from 2016-2017 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations & functions. Any change to any of those factors may cause the results to vary. You should consult other information & performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/performance. See configurations in individual case study links.
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3
instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product
User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804. For more complete information about compiler optimizations, see
our Optimization Notice.
NERSC (National Energy Research
Scientific Computing Center)
Read case study
Science&research
For more success stories, review Intel® Parallel Studio XE Case Studies
Artificialintelligence
Performance speedup of
up to 23X faster with Intel
optimized scikit-learn vs. stock
scikit-learn
Google Cloud Platform
Read blog
LifeScience
Simulations ran up to 7.6X
faster with 9X
energy efficiency**
LAMMPS code - Sandia National
Laboratories
Read technology brief

Artificial Intelligence
Energy
EDA
Science & Research
Manufacturing
Government
Computer Software
IT
Healthcare
Digital Media
Telecommunications
35
Intel®ParallelStudioXEforAI:HighPerformance,
ScalableSoftwareacrossMultipleIndustries
4X 8X 1.35X
Kyoto University
the Walker Molecular
Dynamics lab
3X
1.4X 4X
10X
11X
25X
2.5X 1.25X 1.3X
5X 2X
20X
2.5X
Performance results are based tests from ~2015-2017 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. For more
complete information about performance and benchmark results, visit www.intel.com/benchmark. See configurations in Intel® Parallel Studio XE Case Studies deck, & individual case studies links at this site
More Success Stories
▪ Intel® Parallel Studio XE
Case Studies deck
▪ Case studies site
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets & other optimizations. Intel does not guarantee the availability, functionality,
or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer
to the applicable product User & Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804
Google Cloud Platform 23X

Intel®DistributionforPython*
SUPERCHARGE PYTHON APPS.
RETHINK HIGH PERFORMANCE FOR AI.

3
Python*Landscape
Challenge#1
Domain experts are not
professional software programmers
Adoption of Python
continues to grow among
domain experts &
developers for its
productivity benefits
MostPopularCodingLanguagesof2018
Challenge#2
Python performance limits migration
to production systems
Intel’s Python Tools
› Accelerate Python performance
› Enable easy access
› Empower the community

38
1Available only in Intel® Parallel Studio Composer Edition.
EcosystemcompatibilityGreaterProductivityFasterPerformance
Supports Python 2.7 & 3.6, Conda & PIP
Operating System: Windows*, Linux*, MacOS1*
Intel® Architecture Platforms
Performance Libraries, Parallelism,
Multithreading, Language Extensions
› Accelerated NumPy/SciPy/scikit-learn
with Intel® MKL1 & Intel® DAAL2
› Data analytics, machine learning & deep
learning with scikit-learn, pyDAAL,
› Scale with Numba* & Cython*
› Includes optimized mpi4py, works with
Dask* & PySpark*
› Optimized for latest Intel® architecture
› Prebuilt & optimized packages for
numerical computing, machine/deep
learning, HPC, & data analytics
› Drop in replacement for existing Python-
No code changes required
› Jupyter* notebooks, Matplotlib included
› Free download & free for all uses
including commercial deployment
› Supports Python 2.7 & 3.6, optimizations
integrated in Anaconda* Distribution
› Distribution & optimized packages available
via Conda, PIP, APT GET, YUM, & DockerHub,
numerical performance optimizations
integrated in Anaconda Distribution
› Optimizations upstreamed to main Python
trunk
› Priority Support with Intel® Parallel Studio XE
1Intel® Math Kernel Library
2Intel® Data Analytics Acceleration Library
Prebuilt & Accelerated Packages
AcceleratePython*withIntel®DistributionforPython*High Performance Python* for Scientific Computing, Data Analytics, Machine & Deep Learning
Learn More: software.intel.com/distribution-for-python
Operating System: Windows*, Linux*, MacOS1*
Intel® Architecture Platforms

Intel®PerformanceLibraries
POWERFUL & AWARD-WINNING PERFORMANCE LIBRARIES
TO OPTIMIZE CODE & ACCELERATE DEVELOPMENT.
1Data from Evans Data Software Developer surveys, 2011-2016

Fast,ScalableCodewithIntel®MathKernelLibrary
(Intel® MKL)
40
› Speeds computations for scientific, engineering, financial and
machine learning applications by providing highly optimized,
threaded, and vectorized math functions
› Provides key functionality for dense and sparse linear algebra
(BLAS, LAPACK, PARDISO), FFTs, vector math, summary
statistics, deep learning, splines and more
› Dispatches optimized code for each processor automatically
without the need to branch code
› Optimized for single core vectorization and cache utilization
› Automatic parallelism for multi-core and many-core
› Scales from core to clusters
› Available at no cost & royalty free
› Great performance with minimal effort!
1 Available only in Intel® Parallel Studio Composer Edition.
Dense&SPARSELinearAlgebra
FastFourierTransforms
VectorMath
VectorRNGs
FastPoissonSolver
&More!
Intel®MKLLibraryOffers…

SpeedupAnalytics&MachineLearningwith
Intel®DataAnalyticsAccelerationLibrary(Intel®DAAL)
› Highly tuned functions for classical machine learning & analytics
performance from datacenter to edge running on Intel®
processor-based devices
› Simultaneously ingests data & computes results for highest
throughput performance
› Supports batch, streaming & distributed usage models to meet a
range of application needs
› Includes Python*, C++, Java* APIs, & connectors to popular data
sources including Spark* & Hadoop*
Pre-processing Transformation Analysis Modeling DecisionMaking
Decompression,
Filtering,
Normalization
Aggregation,
Dimension Reduction
Summary Statistics
Clustering, etc.
Machine Learning (Training)
Parameter Estimation
Simulation
Forecasting
Decision
Trees, etc.
Validation
Hypothesis Testing
Model Errors
What’sNewinthe2019Release
New Algorithms
› Logistic Regression, most widely-used classification algorithm
› Extended Gradient Boosting Functionality for inexact split calculations &
user-defined callback canceling for greater flexibility
› User-defined Data Modification Procedure supports a wide range of
feature extraction & transformation techniques
Learn More: software.intel.com/daal

Spark
Core
Feature
Parity
Lower TCO, improved
ease of use
Efficient
Scale-Out
HighPerformanceDeepLearningforApacheSpark*onCPUInfrastructure
No need to deploy costly accelerators, duplicate data,
or suffer through scaling headaches!
Designed&OptimizedforIntel®Xeon®Processor
Powered by Intel® MKL-DNN
DataFrame
ML Pipelines
SQL SparkR Streaming MLlib GraphX BigDL

4
Consumer
Sentiment Analysis
Image Similarity
* Search
Image Transfer
Learning
Image Generation
3D Image
Support
Fraud Detection
Anomaly Detection
Recommendation
NCF
Wide n Deep
Object
Detection
Tensorflow
support
Low latency
serving
Health Finance Retail Manufacturing Infrastructure

Result
Client
JD.Com, 2nd largest
online retailer in China,
~ 25 M users.
ChallengE
Building deep learning applications
such as image similarity search
without moving data.
Solution
Switched from GPU to CPU cluster.
Using Apache Spark* with BigDL,
running on Intel® Xeon® processors
Intel® Xeon® CPU
Processing ~380M images
4XGaiN
CaseStudy: ImageRecognition

The integrated surveillance system connected to cameras at stadiums, which transmitted video
data to operational HQ in each city.
Intel® Distribution of OpenVINO™ toolkit allowed Axxonsoft to distribute the neural network
video analytics of the video across all available Intel hardware, for zone entry detection,
abandoned objects detection, and facial recognition.
SecurityforStadiums
atWorldCup2018
Result
used to protect
Surveillance
Cameras9000+
fans2Million+
See case study for details.

Result
60%increaseIn coverage rate and 20% accuracy rate, better
than traditional rule-based approach
Intel does not control or audit third-party benchmark data or the web sites referenced in this document.
You should visit the referenced web site and confirm whether referenced data are accurate. *Other names and brands may be claimed as the property of others.
“Performance of Intel® Xeon® processors and the sustained
optimization of Apache Spark were key [to deploy] a single-
platform that consolidates and analyzes all types of data, from
any channel, within a highly secure environment.”
https://ai.intel.com/nervana/wp-content/uploads/sites/53/2018/06/Intel-White-Paper-Union-Pay_2_hir-res_Keep-the-Size-of-Figure-6.pdf
https://www.intel.com/content/www/us/en/financial-services-it/union-pay-case-study.html
Client
China UnionPay*, which
specializes in banking
services and payment
systems. It is the 3rd largest
payment network in the world.
ChallengE
Detect fraudulent credit card
transactions with more coverage and
accuracy.
Solution
Using Cloudera Enterprise (Hadoop Cluster),
Apache Spark* with BigDL, running on Intel®
Xeon® and 5th Gen Intel® Core™ Processors for
credit card fraud detection. Historical data is
stored on Apache Hive*. Data preprocessing
done with Apache Spark SQL*.

Result
Working closely with Intel’s Analytics Zoo team,
Midea built a highly-optimized defect detection
solution, and chose Intel® Xeon® Scalable
6130/6148 over GPU-based servers as it met
their latency requirements and more easily
integrated into their existing infrastructure
https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
“Analytics Zoo from Intel provides a great tool for developing
the end-to-end AI solutions, building pipelines across cloud
and edge computing, and optimizing the hardware resources.”
Zheng Hu, Director of Computer
Vision Research Institute, Midea
Public
Client
Midea Group is a Chinese
electrical appliance
manufacturer with 21
manufacturing plants and 260
logistics centers across 200
countries
ChallengE
Midea needed to eliminate defects
caused by scratched surfaces, missing
bolts, misaligned labeling on surfaces
(glass, polished metal, painted), and
human inspection was not able to meet
target quality metrics or detection rate
requirements.
Solution
An advanced defect inspection system built on
top of Analytics Zoo, which provides a unified
analytics + AI platform that seamlessly unites
Spark, BigDL and TensorFlow* programs into
an integrated pipeline. The system was based
on Intel® Xeon Scalable 6130/6148 servers
and Core i7 edge devices.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document.
You should visit the referenced web site and confirm whether referenced data are accurate. *Other names and brands may be claimed as the property of others.

Result
The platform provides multiple functions from onboard Wi-Fi to computer vision applications
such as human/vehicle detection at crossroads, onboard empty seat detection and intruder
detection.
OpenVINO™ provides a scalable, high performance common platform across a variety of
hardware for greater efficiencies.
In-train
visionplatform
Enables pedestrian & vehicle
identification at crossroads +
on-train empty seat detection

Intel®DistributionofOpenVINO™Toolkit
COMPUTER VISION & DEEP LEARNING APPS... NOW FASTER.

OpenVINO™SoftwaretoolkitVisual Inferencing & Neural Network Optimization
DEPLOY COMPUTER
VISION & DEEP LEARNING
CAPABILITIES TO THE
EDGE
HighPerformance,highEfficiencyfortheedge
Writeonce+scaletoDiverseAccelerators
BroadFrameworksupport
Other names and brands may be claimed as the property of others
VPU = Vision Processing Unit (Movidius)

51
What’sInsidetheOpenVINO™toolkit
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
Intel®Architecture-Based
PlatformsSupport
OS Support CentOS* 7.4 (64 bit) Ubuntu* 16.04.3 LTS (64 bit) Microsoft Windows* 10 (64 bit) Yocto Project* version Poky Jethro v2.0.3 (64 bit)
Intel® Deep Learning Deployment Toolkit Traditional Computer Vision Tools & Libraries
Model Optimizer
Convert & Optimize
Inference Engine
Optimized InferenceIR OpenCV* OpenVX*
Photography
Vision
Optimized Libraries
IR = Intermediate
Representation file
For Intel® CPU & CPU with integrated graphics
Increase Media/Video/Graphics Performance
Intel® Media SDK
Open Source version
OpenCL™
Drivers & Runtimes
For CPU with integrated graphics
Optimize Intel® FPGA
FPGA RunTime Environment
(from Intel® FPGA SDK for OpenCL™)
Bitstreams
FPGA – Linux* only
20+ Pre-trained
Models
Code SamplesComputer Vision
Algorithms
Samples

0
2
4
6
8
10
12
14
16
18
20
GoogLeNet v1 Vgg16* Squeezenet* 1.1 GoogLeNet v1 (32) Vgg16* (32) Squeezenet* 1.1 (32)
Std. Caffe on CPU OpenCV on CPU OpenVINO on CPU OpenVINO on GPU OpenVINO on FPGA
52
Get an even Bigger Performance Boost with Intel® FPGA
1Depending on workload, quality/resolution for FP16 may be marginally impacted. A performance/quality tradeoff from FP32 to FP16 can affect accuracy; customers are encouraged to experiment to find what works best
for their situation. Performance results are based on testing as of June 13, 2018 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. For
more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Configuration: Testing by Intel as of June 13, 2018. Intel® Core™ i7-6700K CPU @ 2.90GHz fixed, GPU GT2 @
1.00GHz fixed Internal ONLY testing, Test v3.15.21 – Ubuntu* 16.04, OpenVINO 2018 RC4, Intel® Arria® 10 FPGA 1150GX. Tests were based on various parameters such as model used (these are public), batch size, and
other factors. Different models can be accelerated with different Intel hardware solutions, yet use the same Intel software tools.
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and
other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended
for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice. Notice revision #20110804
Public Models (Batch Size)
RelativePerformance
Improvement
Standard
Caffe*
Baseline
19.9x1
OpenVINO on CPU+Intel® FPGAOpenVINOon CPU+ Intel® Processor Graphics(GPU)/ (FP16)
Comparison of Frames per Second (FPS)
IncreaseDeepLearningWorkloadPerformance
onPublicModelsusingOpenVINO™toolkit&Intel®Architecture

oneAPI
Single Programming Model
to Deliver Cross-Architecture Performance
All information provided in this deck is subject to change without notice.
Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

54
ProgrammingChallenge
Diverse set of data-centric hardware
No common programming language or APIs
Inconsistent tool support across platforms
Each platform requires unique
software investment
Spatial
FPGA
Matrix
AI
Vector
GPU
Scalar
CPU
SVMS
Optimization Notice

The future is a diverse mix of scalar,
vector, matrix, & spatial architectures
deployed in CPU, GPU, AI, FPGA & other
accelerators
DiverseWorkloadsrequireDIVERSEARCHITECTURES
55
Spatial
FPGA
Matrix
AI
Vector
GPU
Scalar
CPU
SVMS
Optimization Notice

56
Project oneAPI delivers a unified
programming model to simplify
development across diverse
architectures
Common developer experience across
Scalar, Vector, Matrix & Spatial
architectures (CPU, GPU, AI and FPGA)
Uncompromised native high-level
language performance
Based on industry standards & open
specifications
Optimized Applications
Optimized
Middleware / Frameworks
oneAPI Language & Libraries
Intel’soneAPICoreConcept
FPGAAIGPUCPU
Scalar Vector Matrix Spatial
oneAPI
Tools
Optimization Notice

Some capabilities may differ per architecture.
57
oneAPIforcross-architectureperformance
Optimized Applications
Optimized Middleware & Frameworks
oneAPI Product
Direct Programming
Data Parallel C++
API-Based Programming
Libraries
Analysis &
Debug Tools
Scalar Vector Matrix Spatial
FPGAAIGPUCPU
Optimization Notice

Language to deliver uncompromised parallel programming productivity and performance across CPUs
and accelerators
Based on C++ with language enhancements being driven through community project
Open, cross-industry alternative to single architecture proprietary language
DataparallelC++Standards-based, Cross-architecture Language
There will still be a need to tune for each architecture.
Optimization Notice

Visit TechDecoded.intel.io — a video series where developers learn to put into
practice key optimization strategies with Intel Development tools.
Focused conversations where
tech. visionaries share key
concepts on front-line topics,
what you need to know and why
it matters.
Put into practice — short videos
and articles that deliver the how-
to’s of specific programming
tasks using Intel tools.
Watch big picture videos Dig deeper with Essential Get started with Quick Hits
Webinars covering strategies,
practices and tools that help
you optimize applications and
solutions performance.
Visual Code Systems & IoT Data Science Data Center &
Computing Modernization Cloud Computing
GetTheMostFromYourCodeTodaywithIntelTech.Decoded
Optimization Notice

60
Tools for C/C++/Python/Fortran
developers – HPC, AI, IOT, Cloud
Partner programs focused
on Developer enablement
Developers, Customers,
Partners Trained
129Customers
Engaged
150K 67Programs
62Partners
software.intel.com Techdecoded.intel.io
Intel®Software Intel®DeveloperWorkshops

LET’SACCELERATETHEFUTURETOGETHER

SecuritybarrierRecognitionmodelUsing
intel®DeepLearningDeploymentToolkit
65

Run Inference 1:
Model
vehicle-license-plate-
detection-barrier-0007
Detects Vehicles
Run Inference 2:
Model
vehicle-attributes-
recognition-barrier-0010
Classifies vehicle attributes
Run Inference 3:
Model
license-plate-recognition-
barrier-0001
Detects License Plates
Load Input Image(s)
Display Results
66

67
End-to-EndVisionWorkflow
Decode
Pre-
Processing Inference
Post-
Processing Encode
GPUCPU GPUCPU FPGA VPU GPUCPU
Intel®
Media SDK
OpenCV*
Intel® Deep
Learning
Deployment
Toolkit
Intel
Media SDK
Video input Video output
with results
annotated
OpenCV

68
KeyVisionSolutionsOptimizedbyIntel®
DistributionofOpenVINO™toolkit
Intel teamed with Philips to show that servers powered by Intel® Xeon®
Scalable processors & Intel® Distribution of OpenVINO™ toolkit can efficiently
perform deep learning inference on patients’ X-rays & computed tomography
(CT) scans, without the need for accelerators. Achieved breakthrough
performance for AI inferencing:
▪ 188x increase in throughput (images/sec) on Bone-age prediction model.1
▪ 38x increase in throughput (images/sec) on Lung segmentation model. 1
“Intel® Xeon® Scalable processors and OpenVINO toolkit appears to be the right solution for medical imaging AI
workloads. Our customers can use their existing hardware to its maximum potential, without having to complicate their
infrastructure, while still aiming to achieve quality output resolution at exceptional speeds."
— Vijayananda J., chief architect and fellow, Data Science and AI, Philips HealthSuite Insights, India
White Paper
1See white paper for performance details.
Philips

69
The Intel® Distribution of OpenVINO™ toolkit helped GE deliver
optimized inferencing to its deep learning image-classification solution.
By bringing AI to its clinical diagnostic scanning, GE no longer needed
an expensive 3rd party accelerator board, achieving:
▪ 5.9x inferencing performance above the target1
▪ 14x inferencing speed over the baseline solution1
▪ Improved image quality, diagnostic capabilities, and clinical workflows
With the OpenVINO™ toolkit , we are now able to optimize inferencing across Intel® silicon, exceeding our throughput goals by almost 6x,”
said David Chevalier, Principal Engineer for GE Healthcare.
“We want to not only keep deployment costs down for our customers, but also offer a flexible, high-performance solution for a new era of
smarter medical imaging. Our partnership with Intel allows us to bring the power of AI to clinical diagnostic scanning and other healthcare
workflows in a cost-effective manner.”
GE Healthcare*
Intel-GE Healthcare, Intel® Distribution of OpenVINO™ Optimizes Deep Learning Performance for Healthcare Imaging
KeyVisionSolutionsOptimizedbyIntel®Distributionof
OpenVINO™toolkit
1See white paper for performance details.

70
DemonstratedIndustrySuccessAccess Developer Success Stories for details & more examples

AIDC India - AI on IA

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to AIDC India - AI on IA

Similar to AIDC India - AI on IA (20)

More from Intel® Software

More from Intel® Software (14)

Recently uploaded

Recently uploaded (20)

AIDC India - AI on IA