© 2002 IBM Corporation
August 11, 2005
Cell Technology for Graphics and Visualization
Graphics Hardware Workshop 2005
Bruce D’Amora
Emerging Systems Software
IBM T.J. Watson Research Center
http://www.research.ibm.com
IBM T.J. Watson Research Center
Cell Technology for Graphics and Visualization © 2005 IBM Corporation
Agenda
Introduction
Architecture
Programming
Graphics & Visualization
Demos
Questions
Introduction
What is Cell?
Sony, Toshiba, IBM alliance established in March, 2001
− “STI” Design Center is located in Austin, Texas
Goals
− Outstanding performance, especially for game/multimedia
− Real time responsiveness to the user and the network
− Applicable to a wide range of platforms and applications
Design Concepts
− Compatibility with 64-bit Power Architecture™
− Increased efficiency and performance
− Non-homogeneous architecture
Architecture
Cell Processor Architecture
Figure: Cell processor block diagram. Eight SPEs (SPE0 to SPE7), each with an SPU, Local Store (LS), MFC and AUC, and the PPE (64-bit Power Architecture w/VMX: PPU, L1, L2) share the EIB (up to 96B/cycle). Link labels in the diagram: 16B/cycle, 32B/cycle, 16B/cycle (2x). The MIC connects to Dual XDR™ memory and the BIC connects to I/O.
Element Interconnect Bus
EIB data ring for internal communication
− Four 16 byte data rings, supporting multiple transfers
− 96B/cycle peak bandwidth
− Over 100 outstanding requests
Power Processor Element
PPE handles operating system and control tasks
− 64-bit Power Architecture™ with VMX
− In-order, 2-way hardware simultaneous multi-threading (SMT)
− Load/Store with 32KB I & D L1 and 512KB L2
Synergistic Processor Element
SPE provides computational performance
− Dual issue, up to 16-way 128-bit SIMD
− Dedicated resources: 128-entry x 128-bit register file, 256KB Local Store
− Each can be dynamically configured to protect resources
− Dedicated DMA engine: Up to 16 outstanding requests per SPE
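The "up to 16-way" figure follows from the 128-bit register width: one register holds sixteen 8-bit elements, eight 16-bit elements, or four 32-bit elements. A minimal sketch of that arithmetic follows; the function and constant names are illustrative only and are not part of any Cell toolchain.

```python
REGISTER_BITS = 128      # SPE register width, from the slide
REGISTER_COUNT = 128     # entries in the register file, from the slide

def simd_lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one 128-bit register."""
    if element_bits <= 0 or REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return REGISTER_BITS // element_bits

# Lanes per register for common element widths.
lanes = {bits: simd_lanes(bits) for bits in (8, 16, 32, 64)}

# Total register file capacity in bytes (128 entries x 16 bytes each).
register_file_bytes = REGISTER_COUNT * REGISTER_BITS // 8
```

The 16-way case corresponds to byte-sized elements; wider elements trade lane count for range and precision.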
SPE Highlights
User-mode architecture
− No translation/protection within SPU
− DMA uses full Power Architecture protection/translation
RISC-like structures
− 32-bit fixed-length instructions
− Clean design – unified Register file
VMX-like SIMD dataflow
− Broad set of operations (8 / 16 / 32 bit)
− Graphics SP-Float
− IEEE DP-Float
Unified register file
− 128 entry x 128 bit
256KB Local Store
− Combined I & D
− 16B/cycle L/S bandwidth
− 128B/cycle DMA bandwidth
Figure: SPE floorplan, 14.5 mm² (90nm SOI). Labeled units: LS (four blocks), GPR, FXU ODD, FXU EVN, SFP, DP, CONTROL, CHANNEL, DMA, SMM, ATO, SBI, RTB, BEB, FWD.
I/O and Memory Interfaces
I/O Provides wide bandwidth
− Two configurable interfaces
− Up to 25.6 GB/s memory B/W
– Up to 70+ GB/s I/O B/W
– Practical ~ 50GB/s
Configurability
Direct Attach XDR
Two I/O interfaces
– Configurable number of Bytes
– Coherent or I/O Protection
Figure: example configurations. (1) One Cell processor with two XDR™ memories and two I/O interfaces (IOIF0, IOIF1). (2) Two Cell processors, each with XDR™ memory and an IOIF, joined by the BIF. (3) Four Cell processors, each with XDR™ memory and an IOIF, joined through their BIF ports and a switch (SW).
Programming
BE Features Exploited by Software
Keeping Intermediate/Control Data on-Chip
− MMU – SLBs, TLBs
− DMA from L2 cache-> LS
− LS to LS DMA
− Cache <-> Cache transfers (atomic update)
− SPE Signalling Registers
− SPE <-> PPE Mailboxes
Resource Reservation and Allocation
− PPE can be shared across logical partitions
− SPEs can be assigned to logical partitions
− SPEs independently or Group Allocated
Figure: BE chip. The PowerPC (PPE) with its L2 cache and six SPEs (each an SPU with Local Store, MFC and AUC, the Atomic Update Cache) connect to system memory and to I/O through BIF/IOIF. Labeled paths: DMA with Intervention, Hardware Managed Cache Coherency, Local Store to Local Store DMA.
Models for Programming the Cell
Function Offload
Application Specific Accelerators
Computation-Acceleration
Streaming
Shared Memory Multi-processor
Heterogeneous Thread Runtime Model
Single Source Compiler
Function Offload
Easiest way to port applications to Cell
PPE can call a procedure located on the SPE as if it were a local PPE function
PPE and SPE use “stubs” as placeholders for local and remote procedures
Figure: the PPE holds the Main() image, the f1() PPE stub and the f2() image; the SPE holds the f1() SPE stub and the f1() image.
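The stub arrangement can be sketched in a few lines. This is a plain-Python simulation under stated assumptions, not Cell code: `FakeSPE`, `make_ppe_stub` and `f1_image` are hypothetical names, and the "message" is just a tuple standing in for whatever the real run-time would marshal across to the SPE.

```python
from typing import Callable, Dict, Tuple

class FakeSPE:
    """Stand-in for an SPE: holds loaded procedure images and serves requests."""
    def __init__(self) -> None:
        self.images: Dict[str, Callable[..., object]] = {}

    def load(self, name: str, image: Callable[..., object]) -> None:
        self.images[name] = image

    def spe_stub(self, message: Tuple[str, tuple]) -> object:
        # SPE-side stub: unpack the request and call the local image.
        name, args = message
        return self.images[name](*args)

def make_ppe_stub(spe: FakeSPE, name: str) -> Callable[..., object]:
    """PPE-side stub: looks like a local function, forwards the call to the SPE."""
    def stub(*args: object) -> object:
        return spe.spe_stub((name, args))
    return stub

def f1_image(x: int, y: int) -> int:
    # The real body of f1(), which lives on the SPE side.
    return x * y + 1

spe = FakeSPE()
spe.load("f1", f1_image)
f1 = make_ppe_stub(spe, "f1")   # PPE code calls f1() as if it were local
result = f1(6, 7)
```

The calling code never sees the message passing, which is why this model is described as the easiest porting path.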
Function Offload via IDL
Figure: build flow. The IDL compiler turns the .idl file into PPE_stub.c, Stub.h and SPE_stub.c. The PPE compiler and linker produce the PPE binary from Program.c and the PPE stub; the SPE compiler and linker produce the SPE binary from Function.c and the SPE stub. An RPC run-time library is part of the flow.
IDL compiler facilitates the RPC function offload
– Defines interfaces between PPE main program and SPE RPC program
Function loaded onto an SPE only once
− PPE can make multiple calls to that procedure without having to reload it.
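The load-once behaviour can be illustrated with a small simulation. `OffloadRuntime` and `scale` are hypothetical names invented for this sketch; the real RPC run-time library's interface is not shown on the slide, so nothing here should be read as its API.

```python
class OffloadRuntime:
    """Tracks which functions are resident so each one is loaded only once."""
    def __init__(self):
        self.resident = {}
        self.load_count = 0

    def call(self, func, *args):
        name = func.__name__
        if name not in self.resident:
            # First call: pay the (simulated) cost of loading the SPE binary.
            self.resident[name] = func
            self.load_count += 1
        # Later calls reuse the resident copy without reloading.
        return self.resident[name](*args)

def scale(values, factor):
    return [v * factor for v in values]

runtime = OffloadRuntime()
outputs = [runtime.call(scale, [1, 2, 3], k) for k in (1, 2, 3)]
```

Three calls are served by a single load, which is the saving the slide points to.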
Special Case: Application Specific Accelerators
OS allocates and initializes SPE resources
OS services invoke SPE functions
Figure: an application on the PPE calls OS services such as PPE_mpeg_encode(), which invoke SPE functions such as SPE_mpeg_encode() held in system memory. SPE functions shown: graphics (vertex shaders), SSL (encryption, decryption), MPEG (realtime MPEG encoding, MPEG decode).
Programming Models
Computational Acceleration
Figure: two variants, each with a PowerPC (PPE) and an SPU with Local Store and MFC. In one, the PPE passes text, static data and parameters, the SPU executes, and the PPE gets the results. In the other, the PPE puts initial text, static data and parameters in system memory, the SPC independently stages text and intermediate data transfers while executing, and the SPU puts the results.
User created RPC libraries
− User acceleration routines
− User compiles SPE code
Local Data
− Data and Parameters passed in call
Global Data
− Data and Parameters passed in call
− Code manages global data
Streaming Model
SPE initiated DMA
Input/output/kernel streams
–DMA’d from system memory->LS or LS->LS
Figure: three streaming arrangements, each with the PPE, SPE0 to SPE7 and XDR memory.
Data-parallel: every SPE runs the same Kernel() on its share of the input stream I0..In and writes its share of the output stream O0..On.
Task-parallel: SPE0 to SPE7 run Kernel0() to Kernel7(); inputs I0..In pass from SPE to SPE by DMA and leave as outputs O0..On.
Data & kernel streams: each SPE runs a kernel Ki drawn from a kernel stream K0..Kn, consuming inputs I0..In and producing outputs O0..On.
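The difference between the data-parallel and task-parallel arrangements can be sketched as follows. This is a sequential simulation in plain Python: the workers stand in for SPEs, list indexing stands in for DMA, and all function names are hypothetical.

```python
from functools import reduce

NUM_SPES = 8

def data_parallel(kernel, inputs, num_spes=NUM_SPES):
    """Every worker runs the same kernel on its own slice of the input stream."""
    outputs = [None] * len(inputs)
    for spe in range(num_spes):
        # Worker `spe` takes items spe, spe + num_spes, spe + 2 * num_spes, ...
        for i in range(spe, len(inputs), num_spes):
            outputs[i] = kernel(inputs[i])
    return outputs

def task_parallel(kernels, inputs):
    """Each worker runs a different kernel; every item flows through all of them in order."""
    return [reduce(lambda value, k: k(value), kernels, item) for item in inputs]

inputs = list(range(10))

# Data-parallel: one kernel, many workers.
dp_out = data_parallel(lambda x: x * x, inputs)

# Task-parallel: a three-stage pipeline, one stage per worker.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
tp_out = task_parallel(stages, inputs)
```

In the data-parallel case the kernel must fit in one Local Store alongside its data; in the task-parallel case the code is split across SPEs at the cost of passing intermediate results between them.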
Shared-memory Multiprocessor
The Cell Processor can be programmed as a shared-memory
multiprocessor, using two different instruction sets
The SPEs and the PPE fully inter-operate in a cache-coherent
Shared-Memory Multiprocessor Model
– All DMA operations in the SPEs are cache-coherent
– The DMA operations use an effective address that is common to all
PPE and SPEs
– Shared-memory store instructions are replaced by a store from the
register file to the LS, followed by a DMA operation from LS to shared
memory.
A compiler or interpreter could manage part of the LS as a local
cache for instructions and data obtained from shared memory.
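A software-managed cache of this kind can be sketched as a small direct-mapped, write-back cache. Everything below is a simulation under assumptions: the 128-byte line size, the four-line capacity and the class name `SoftwareCache` are choices made for the sketch, and the slice copies merely stand in for DMA gets and puts.

```python
LINE_BYTES = 128   # assumed transfer granularity for this sketch
NUM_LINES = 4      # tiny cache so the behaviour is easy to follow

class SoftwareCache:
    """Direct-mapped, write-back cache held in a simulated Local Store."""
    def __init__(self, shared_memory: bytearray):
        self.mem = shared_memory
        self.tags = [None] * NUM_LINES
        self.dirty = [False] * NUM_LINES
        self.lines = [bytearray(LINE_BYTES) for _ in range(NUM_LINES)]
        self.dma_gets = 0
        self.dma_puts = 0

    def _evict(self, slot: int) -> None:
        if self.tags[slot] is not None and self.dirty[slot]:
            base = self.tags[slot] * LINE_BYTES
            # Simulated DMA put: Local Store -> shared memory
            self.mem[base:base + LINE_BYTES] = self.lines[slot]
            self.dma_puts += 1
        self.dirty[slot] = False

    def _slot(self, addr: int) -> int:
        tag = addr // LINE_BYTES
        slot = tag % NUM_LINES
        if self.tags[slot] != tag:
            self._evict(slot)
            base = tag * LINE_BYTES
            # Simulated DMA get: shared memory -> Local Store
            self.lines[slot][:] = self.mem[base:base + LINE_BYTES]
            self.tags[slot] = tag
            self.dma_gets += 1
        return slot

    def load(self, addr: int) -> int:
        return self.lines[self._slot(addr)][addr % LINE_BYTES]

    def store(self, addr: int, value: int) -> None:
        slot = self._slot(addr)
        self.lines[slot][addr % LINE_BYTES] = value
        self.dirty[slot] = True

    def flush(self) -> None:
        for slot in range(NUM_LINES):
            self._evict(slot)

shared = bytearray(LINE_BYTES * 16)
cache = SoftwareCache(shared)
cache.store(5, 42)         # the store lands in the Local Store copy first
before_flush = shared[5]   # shared memory has not seen the store yet
cache.flush()              # the DMA put makes it visible in shared memory
after_flush = shared[5]
```

The two-step store (register file to LS, then LS to shared memory) from the bullet above shows up here as `store()` followed by `flush()`.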
Heterogeneous Multi-Threading Model
PPE Threads, SPE Threads
Shared effective address space
SPE Virtualization in debug mode only
Figure: application source and libraries compile to PPE object files and SPE object files. Under a Cell-aware OS (Linux), existing PPE tasks/threads run on the physical PPE (hardware threads MT1, MT2), and new SPE tasks/threads pass through an SPE virtualization / scheduling layer (m->n SPC threads) onto the physical SPUs.
Single source approach to programming Cell
Single Source Compiler
− Auto parallelization ( treat target Cell as an SMP )
− Auto SIMD-ization ( SIMD-vectorization )
− Compiler management of Local Store as 2nd level register file / SW managed cache (I&D)
− The most Cell-specific piece
Optimization
− OpenMP pragmas
− Vector.org SIMD intrinsics
− Data/Code partitioning
− Streaming / pre-specifying code/data use
Prototype Single Source Compiler Developed in IBM Research
Graphics & Visualization
Graphics and Visualization on Cell
Terrain Rendering Engine
Multi-resolution Subdivision Surfaces
Geometry Processing
Courtesy of Alias Systems
Cloth Simulation
The Cell and GPUs
Extension to shader pipeline on GPUs
− Vertex shaders for geometric modeling
− NURBS
− Subdivision surfaces
− Continuous level of detail
− Vertex shaders for additional lighting models
Physically Based Modeling
− Soft-body dynamics
− Rigid-body dynamics
Image Processing
− Background Subtraction for Video Surveillance
Global Illumination
– Raytracing
− Raycasting Terrains
− Rendering of DEMs
− Surface shaders for lighting, shadows
The Cell and GPUs
Figure: data/code flow between an application, Cell and a GPU. Cell comprises the PPE (PPU, L1, L2), SPE 0 to SPE 7 (each with SPU, LS, MFC), the I/O and memory interfaces, and system memory. Workloads shown on the SPEs: physically-based modeling (soft/rigid body, fluid dynamics), vertex shaders (lighting models, subdivision), terrain generation (incl. depth maps) and AI. Geometry, textures, maps, etc. and runtime generated textures, maps, vertices, etc. (control meshes, tessellated geometry, maps, textures) travel over a local or network connection to the GPU (vertex processing, pixel processing, frame buffer).
Image Processing on Cell
Video Surveillance
Background subtraction compares the current image (left) with a
reference image (middle) to find the changed regions (right)
corresponding to objects of interest.
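The core comparison can be sketched with a simple per-pixel difference and threshold. This is only the most basic form of background subtraction: the threshold value of 30 and the function name are assumptions made for the sketch, and a production pipeline would add further stages around this step.

```python
def background_subtract(current, reference, threshold=30):
    """Return a mask with 1 where a pixel differs from the reference by more than threshold."""
    if len(current) != len(reference):
        raise ValueError("images must have the same number of rows")
    mask = []
    for cur_row, ref_row in zip(current, reference):
        mask.append([1 if abs(c - r) > threshold else 0
                     for c, r in zip(cur_row, ref_row)])
    return mask

# Grayscale images as lists of rows; values are 0..255.
reference = [[10, 10, 10, 10],
             [10, 10, 10, 10],
             [10, 10, 10, 10]]
current = [[12, 10, 200, 10],
           [10, 180, 190, 10],
           [9, 10, 11, 10]]

mask = background_subtract(current, reference)
```

Small differences (such as 12 against 10) fall under the threshold and are ignored, while the three bright pixels are marked as changed.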
BGS Application Structure
Figure (per Jonathan Connell): block diagram of the background subtraction application. Data labels: Image, Background, Salience, Mask, SIGNAL. Processing blocks: AGC & AWB compensation, Kalman smoothing, motion energy, shadow & highlights, pixel differences, texture differences, statistical thresholding, contour smoothing, component pruning, quiescent area addition, slow update blending, stationary healing, determine add / remove.
Alternative 1 – Single SPE for Video Stream
Each SPE dedicated to one or more video streams
Code overlaying may be needed to overcome the local memory
limitation
Seems most compatible with existing application structure
No code or data partitioning across SPEs
Figure: SPE 0 handles Video Stream 1, SPE 1 handles Video Stream 2, and so on up to SPE 7 handling Video Stream N.
Alternative 2 – Image Pipeline for Task-level Parallelism
Functions are divided into groups
Groups set up in pipelined fashion
Each SPE dedicated to only one group
Outputs from one group passed SPE-to-SPE as inputs to the next
Code is partitioned – Data is not partitioned
Efficient utilization of SPEs
Figure: video streams enter SPE 0 and pass through SPE 1 up to SPE 7 in sequence.
Alternative 3 – Data-level Parallelism
For a given function, each SPE processes part of a frame
Not easily applied to all BGS functions
Figure: a block of pixel values divided into regions, with one region each assigned to SPE 0, SPE 1, SPE 2 and SPE 3.
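Splitting a frame across SPEs can be sketched as dividing its rows into contiguous bands. The band layout, the four-way split and the function names are assumptions for illustration; the slide does not say how the frame is partitioned. Functions that only look at one pixel at a time fit this pattern directly, whereas functions that need neighbouring pixels would also need data from adjacent bands, which is presumably one reason the approach does not apply easily to every BGS function.

```python
def split_rows(num_rows, num_spes):
    """Return (start, stop) row ranges, one per SPE, covering the whole frame."""
    base, extra = divmod(num_rows, num_spes)
    ranges, start = [], 0
    for spe in range(num_spes):
        # The first `extra` SPEs take one additional row each.
        stop = start + base + (1 if spe < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def process_frame(frame, kernel, num_spes=4):
    """Apply kernel to each band of rows, as if each band ran on its own SPE."""
    out = []
    for start, stop in split_rows(len(frame), num_spes):
        out.extend(kernel(frame[start:stop]))
    return out

def invert(band):
    # A per-pixel kernel: needs no data from outside its own band.
    return [[255 - p for p in row] for row in band]

frame = [[r * 10 + c for c in range(4)] for r in range(6)]
bands = split_rows(len(frame), 4)
result = process_frame(frame, invert)
```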
Multi-resolution Subdivision
Figure: the base mesh and its detail vectors are decomposed into 1-rings and stored as a ring stream in system memory (N 1-rings R0..Rn with details D0..Dm). The PPE feeds the stream data-parallel to SPE0 to SPE7, each running Kernel(). The resulting triangles T0..Tj go to the GPU (vertex processing, pixel processing, frame buffer) over a local or network connection.
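Decomposing a mesh into 1-rings can be sketched as follows, assuming the usual mesh definition in which a vertex's 1-ring is the set of vertices that share an edge with it. The slide does not specify the layout actually used for the ring stream, so the dictionary form and the function name here are illustrative only.

```python
from collections import defaultdict

def one_rings(triangles):
    """Map each vertex index to the sorted list of vertices sharing an edge with it."""
    rings = defaultdict(set)
    for a, b, c in triangles:
        rings[a].update((b, c))
        rings[b].update((a, c))
        rings[c].update((a, b))
    return {v: sorted(neighbors) for v, neighbors in rings.items()}

# Two triangles sharing the edge (1, 2): a square split along one diagonal.
triangles = [(0, 1, 2), (1, 3, 2)]
rings = one_rings(triangles)
```

Each ring is self-contained, so rings can be handed to different SPEs without any of them needing the whole mesh in its Local Store.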
Demonstrations
Raytracing
Terrain Rendering (TRE)
Cloth Simulation
Demo Platform: Cell Blade Prototype
Figure labels: Chassis, Blade, Deskside Unit
Blade Server
– Dual Cell Processors (SMP), Support Logic,
Memory, Storage
– PCI Express 4X option port
– BladeCenter Interface ( Based on IBM JS20)
– Infiniband 4x (10Gbps) interconnect
Chassis
– Standard IBM BladeCenter form factor with:
– 7 Blades (1 blade/2 slots)
– 2 internal switches (1Gb Ethernet) with 4 external ports each
– Separate, external Infiniband Switch with optional FC port
Software
– Linux OS
– Tool chain and compilation support
– Additional IBM Software Development Tools and Cluster
Management software
Client Systems
– IBM ThinkPad T41 (1.7 GHz Pentium M)
– Apple Mac, dual G5 @ 2 GHz
Figure: blade block diagram. Two Cell processors, each with memory, a south bridge and an IB 4X port. PCI-Express, HDD and GbE are also shown, and the blade attaches to the BladeCenter interface. A second image is labeled Rack.
Example: Raytracing Architecture
Figure: OpenRT-based ray tracing pipeline. Stages: Generate Tiles, Camera Shader (generate rays, view transforms), Spatial Subdivision (k-d tree traversal), Intersection Testing (with true/false branches on "if intersect"), Material Shading, plus recursive/adaptive secondary rays and sorting. Data passed between stages: tile pixels, rays, triangle-ray pairs, secondary rays, object data. The spatial subdivision structure is a k-d tree, per object and top-level. System memory holds the framebuffer, texture maps and run-time shaders. The legend distinguishes PU code, SPU code and code with potential execution on SPE.
Cell Optimized Ray Caster: Terrain Rendering Engine
30+ frames per second with only one Cell processor
– No graphics adapter assist
– 1280x720 (HD 720P) resolution
HD 1080P at 30+ frames per second via 2 way SMP Cell
Advanced SPE shader function
– Ray/Terrain intersection computation
– Texture Filtering
– Normal computation
– Bump map computation
– Diffuse + Ambient lighting model
– Perlin Noise based clouds
– Atmosphere computation (haze, sun, halo)
– Dynamic multi-sampling (4 – 16 samples per pixel)
– Image based input (16 bit height + 16 bit texture)
– 29 KB of SPE object code
– 224 KB of SPE local store data
M-JPEG compression via SPE
Performance scales linearly with number of available SPEs
Written completely in C with intrinsics
Currently up and running on bring up systems!
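The figures above imply a sizeable sample rate and a tight Local Store budget. The arithmetic below uses only numbers quoted in this deck (720P resolution, 30 frames per second as the lower bound, 4 to 16 samples per pixel, 29 KB code, 224 KB data, 256KB Local Store); treating code plus data as the whole Local Store footprint is an assumption of the sketch.

```python
WIDTH, HEIGHT, FPS = 1280, 720, 30            # HD 720P at the 30 fps lower bound
MIN_SAMPLES, MAX_SAMPLES = 4, 16              # dynamic multi-sampling range
CODE_KB, DATA_KB, LOCAL_STORE_KB = 29, 224, 256

# Pixels shaded per second at 30 fps.
pixels_per_second = WIDTH * HEIGHT * FPS

# Ray samples per second at the two ends of the multi-sampling range.
min_samples_per_second = pixels_per_second * MIN_SAMPLES
max_samples_per_second = pixels_per_second * MAX_SAMPLES

# Local Store left over once code and data are resident.
local_store_headroom_kb = LOCAL_STORE_KB - (CODE_KB + DATA_KB)
```

That is roughly 27.6 million pixels and between about 110 and 442 million samples per second, with code and data together filling all but a few KB of each SPE's Local Store.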
Vertical Ray Coherence
Algorithm Mapped to Cell
Physically Based Modeling: Alias research prototype
Cell implementation of Alias cloth solver
– Compute the displacement of 3-D cloth mesh vertices in response to gravity, collisions and constraints
– Each simulation can occur on a separate SPE
Render results on commodity graphics adapter – locally
or over network
Courtesy of Alias Systems™
Figure: on the PPE, Main creates the SPE threads and a SocketMgr receives position and orientation input and returns triangle output. System memory serves SPE0 to SPEN, each with an input and an output buffer: force vectors are transferred in (DMAIN) and vertex displacements are transferred out (DMAOUT).
Cell beyond the PS3
IBM support for custom systems and applications based on Cell
− Through IBM Engineering and Technology Services
Planned release (next several months) of architecture, simulator, and compiler to
enable Cell software development and evaluation in a large community
− Linux OS, compilers, debugger, full system simulator
− Open source and IBM alphaworks
Prototype system implementations and evaluation of markets beyond console
− Cell Processor Based Blades (acceleration and other apps.)
− CE devices (HDTV, home media server)
− Standard products for embedded applications
− Prototype reference designs for smaller systems
Summary
Cell ushers in a new era of leading edge processors optimized for digital
media and entertainment
New levels of performance and power efficiency beyond what is achieved by
PC processors
Step towards HPC / game processor convergence
Acknowledgements
Sony Computer Entertainment
Toshiba Corporation
IBM STI Design Center:
Peter Hofstee
Barry Minor
Mark Nutter
Gerhard Stenzel, IBM Boeblingen Development Labs
Philipp Slusallek, Saarland University
Alias Systems:
Joyce Janczyn
Francesco Iorio
Jos Stam
Thank you

More Related Content

What's hot

Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Slide_N
 
Intel Knights Landing Slides
Intel Knights Landing SlidesIntel Knights Landing Slides
Intel Knights Landing SlidesRonen Mendezitsky
 
COST/BENEFIT CASE FOR IBM SYSTEM STORAGE DS8700: COMPARISONS WITH EMC SYMMETR...
COST/BENEFIT CASE FOR IBM SYSTEM STORAGE DS8700: COMPARISONS WITH EMC SYMMETR...COST/BENEFIT CASE FOR IBM SYSTEM STORAGE DS8700: COMPARISONS WITH EMC SYMMETR...
COST/BENEFIT CASE FOR IBM SYSTEM STORAGE DS8700: COMPARISONS WITH EMC SYMMETR...IBM India Smarter Computing
 
Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)Eric Van Hensbergen
 
System on chip architectures
System on chip architecturesSystem on chip architectures
System on chip architecturesA B Shinde
 
AMulti-coreSoftwareHardwareCo-DebugPlatform_Final
AMulti-coreSoftwareHardwareCo-DebugPlatform_FinalAMulti-coreSoftwareHardwareCo-DebugPlatform_Final
AMulti-coreSoftwareHardwareCo-DebugPlatform_FinalAlan Su
 
Z4R: Intro to Storage and DFSMS for z/OS
Z4R: Intro to Storage and DFSMS for z/OSZ4R: Intro to Storage and DFSMS for z/OS
Z4R: Intro to Storage and DFSMS for z/OSTony Pearson
 
Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)AllineaSoftware
 
Methods and practices to analyze the performance of your application with Int...
Methods and practices to analyze the performance of your application with Int...Methods and practices to analyze the performance of your application with Int...
Methods and practices to analyze the performance of your application with Int...Intel Software Brasil
 
Abstract chameleon chip
Abstract chameleon chipAbstract chameleon chip
Abstract chameleon chipAnugrah James
 
CCNxCon2012: Session 4: Caesar: a Content Router for High Speed Forwarding
CCNxCon2012: Session 4: Caesar:  a Content Router for High Speed ForwardingCCNxCon2012: Session 4: Caesar:  a Content Router for High Speed Forwarding
CCNxCon2012: Session 4: Caesar: a Content Router for High Speed ForwardingPARC, a Xerox company
 
Lakefield: Hybrid Cores in 3D Package
Lakefield: Hybrid Cores in 3D PackageLakefield: Hybrid Cores in 3D Package
Lakefield: Hybrid Cores in 3D Packageinside-BigData.com
 
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing WorldOMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing WorldAllan Cantle
 

What's hot (20)

SoC: System On Chip
SoC: System On ChipSoC: System On Chip
SoC: System On Chip
 
Technology (1)
Technology (1)Technology (1)
Technology (1)
 
Sakar jain
Sakar jainSakar jain
Sakar jain
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 
Intel Knights Landing Slides
Intel Knights Landing SlidesIntel Knights Landing Slides
Intel Knights Landing Slides
 
COST/BENEFIT CASE FOR IBM SYSTEM STORAGE DS8700: COMPARISONS WITH EMC SYMMETR...
COST/BENEFIT CASE FOR IBM SYSTEM STORAGE DS8700: COMPARISONS WITH EMC SYMMETR...COST/BENEFIT CASE FOR IBM SYSTEM STORAGE DS8700: COMPARISONS WITH EMC SYMMETR...
COST/BENEFIT CASE FOR IBM SYSTEM STORAGE DS8700: COMPARISONS WITH EMC SYMMETR...
 
Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)
 
Intel tools to optimize HPC systems
Intel tools to optimize HPC systemsIntel tools to optimize HPC systems
Intel tools to optimize HPC systems
 
System on chip architectures
System on chip architecturesSystem on chip architectures
System on chip architectures
 
AMulti-coreSoftwareHardwareCo-DebugPlatform_Final
AMulti-coreSoftwareHardwareCo-DebugPlatform_FinalAMulti-coreSoftwareHardwareCo-DebugPlatform_Final
AMulti-coreSoftwareHardwareCo-DebugPlatform_Final
 
Z4R: Intro to Storage and DFSMS for z/OS
Z4R: Intro to Storage and DFSMS for z/OSZ4R: Intro to Storage and DFSMS for z/OS
Z4R: Intro to Storage and DFSMS for z/OS
 
Ibm cell
Ibm cell Ibm cell
Ibm cell
 
Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)
 
Methods and practices to analyze the performance of your application with Int...
Methods and practices to analyze the performance of your application with Int...Methods and practices to analyze the performance of your application with Int...
Methods and practices to analyze the performance of your application with Int...
 
Introduction to BeagleBoard-xM
Introduction to BeagleBoard-xMIntroduction to BeagleBoard-xM
Introduction to BeagleBoard-xM
 
Abstract chameleon chip
Abstract chameleon chipAbstract chameleon chip
Abstract chameleon chip
 
Ph.D. Thesis presentation
Ph.D. Thesis presentationPh.D. Thesis presentation
Ph.D. Thesis presentation
 
CCNxCon2012: Session 4: Caesar: a Content Router for High Speed Forwarding
CCNxCon2012: Session 4: Caesar:  a Content Router for High Speed ForwardingCCNxCon2012: Session 4: Caesar:  a Content Router for High Speed Forwarding
CCNxCon2012: Session 4: Caesar: a Content Router for High Speed Forwarding
 
Lakefield: Hybrid Cores in 3D Package
Lakefield: Hybrid Cores in 3D PackageLakefield: Hybrid Cores in 3D Package
Lakefield: Hybrid Cores in 3D Package
 
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing WorldOMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World
 

Similar to Cell Technology for Graphics and Visualization

Hardware and Software Architectures for the CELL BROADBAND ENGINE processor
Hardware and Software Architectures for the CELL BROADBAND ENGINE processorHardware and Software Architectures for the CELL BROADBAND ENGINE processor
Hardware and Software Architectures for the CELL BROADBAND ENGINE processorSlide_N
 
Enterprise power systems transition to power7 technology
Enterprise power systems transition to power7 technologyEnterprise power systems transition to power7 technology
Enterprise power systems transition to power7 technologysolarisyougood
 
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...Michael Gschwind
 
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and Visualization
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and VisualizationCell/B.E. Servers: A Platform for Real Time Scalable Computing and Visualization
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and VisualizationSlide_N
 
Visão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
Visão geral do hardware do servidor System z e Linux on z - Concurso MainframeVisão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
Visão geral do hardware do servidor System z e Linux on z - Concurso MainframeAnderson Bassani
 
Multi Processor Architecture for image processing
Multi Processor Architecture for image processingMulti Processor Architecture for image processing
Multi Processor Architecture for image processingideas2ignite
 
Features of modern intel microprocessors
Features of modern intel microprocessorsFeatures of modern intel microprocessors
Features of modern intel microprocessorsKrunal Siddhapathak
 
Track A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMTrack A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMchiportal
 
Вадим Сухомлинов _Платформы Intel(r) Atom(tm) – новые возможности для социаль...
Вадим Сухомлинов _Платформы Intel(r) Atom(tm) – новые возможности для социаль...Вадим Сухомлинов _Платформы Intel(r) Atom(tm) – новые возможности для социаль...
Вадим Сухомлинов _Платформы Intel(r) Atom(tm) – новые возможности для социаль...Ingria. Technopark St. Petersburg
 
Track F- Designing the kiler soc - sonics
Track F- Designing the kiler soc - sonicsTrack F- Designing the kiler soc - sonics
Track F- Designing the kiler soc - sonicschiportal
 
iPhone Architecture - Review
iPhone Architecture - ReviewiPhone Architecture - Review
iPhone Architecture - ReviewAbdelrahman Hosny
 
System_on_Chip_SOC.ppt
System_on_Chip_SOC.pptSystem_on_Chip_SOC.ppt
System_on_Chip_SOC.pptzahixdd
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsRed_Hat_Storage
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsColleen Corrice
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overviewlambertt
 

Cell Technology for Graphics and Visualization

  • 1. Cell Technology for Graphics and Visualization
       Graphics Hardware Workshop 2005, August 11, 2005
       Bruce D’Amora, Emerging Systems Software
       IBM T.J. Watson Research Center, http://www.research.ibm.com
       © 2002 IBM Corporation
  • 2. Agenda
       IBM T.J. Watson Research Center, © 2005 IBM Corporation
       − Introduction
       − Architecture
       − Programming
       − Graphics & Visualization
       − Demos
       − Questions
  • 3. Introduction
  • 4. What is Cell?
       Sony, Toshiba, IBM alliance established in March 2001
       − “STI” Design Center is located in Austin, Texas
       Goals
       − Outstanding performance, especially for game/multimedia
       − Real-time responsiveness to the user and the network
       − Applicable to a wide range of platforms and applications
       Design Concepts
       − Compatibility with 64-bit Power Architecture™
       − Increased efficiency and performance
       − Non-homogeneous architecture
  • 5. Architecture
  • 6. Cell Processor Architecture
       [Block diagram: eight SPEs (SPE0 to SPE7), each with an SPU, Local Store (LS), MFC and AUC, attached to the EIB (up to 96B/cycle); the PPE (PPU, L1, L2; 64-bit Power Architecture w/VMX); the MIC connecting to Dual XDR™ memory and the BIC connecting to I/O. Link labels: 16B/cycle, 32B/cycle, 16B/cycle (2x).]
  • 7. Element Interconnect Bus
       EIB data ring for internal communication
       − Four 16-byte data rings, supporting multiple transfers
       − 96B/cycle peak bandwidth
       − Over 100 outstanding requests
  • 8. Power Processor Element
       PPE handles operating system and control tasks
       − 64-bit Power Architecture™ with VMX
       − In-order, 2-way hardware simultaneous multi-threading (SMT)
       − Load/Store with 32KB I & D L1 and 512KB L2
  • 9. Synergistic Processor Element
       SPE provides computational performance
       − Dual issue, up to 16-way 128-bit SIMD
       − Dedicated resources: 128-entry 128-bit register file, 256KB Local Store
       − Each can be dynamically configured to protect resources
       − Dedicated DMA engine: up to 16 outstanding requests per SPE
  • 10. SPE Highlights
       User-mode architecture
       − No translation/protection within SPU
       − DMA is full Power Architecture protect/x-late
       RISC-like structures
       − 32-bit fixed instructions
       − Clean design – unified register file
       VMX-like SIMD dataflow
       − Broad set of operations (8 / 16 / 32 Byte)
       − Graphics SP-Float
       − IEEE DP-Float
       Unified register file
       − 128 entry x 128 bit
       256KB Local Store
       − Combined I & D
       − 16B/cycle L/S bandwidth
       − 128B/cycle DMA bandwidth
       [Die floorplan labels: LS (x4), GPR, FXU ODD, FXU EVN, SFP, DP, CONTROL, CHANNEL, DMA, SMM, ATO, SBI, RTB, BEB, FWD; 14.5 mm² (90nm SOI)]
  • 11. I/O and Memory Interfaces
       I/O provides wide bandwidth
       − Two configurable interfaces
       − Up to 25.6 GB/s memory B/W
         – Up to 70+ GB/s I/O B/W
         – Practical ~50 GB/s
  • 12. Configurability
       Direct Attach XDR
       Two I/O interfaces
       – Configurable number of Bytes
       – Coherent or I/O Protection
       [Diagrams: one CELL Processor with two XDR™ channels and IOIF0/IOIF1; two CELL Processors, each with two XDR™ channels and an IOIF, joined by a BIF; four CELL Processors, each with two XDR™ channels and an IOIF, joined by BIF links and a switch (SW).]
  • 13. Programming
  • 14. BE Features Exploited by Software
       Keeping Intermediate/Control Data on-Chip
       − MMU – SLBs, TLBs
       − DMA from L2 cache -> LS
       − LS to LS DMA
       − Cache <-> Cache transfers (atomic update)
       − SPE Signalling Registers
       − SPE <-> PPE Mailboxes
       Resource Reservation and Allocation
       − PPE can be shared across logical partitions
       − SPEs can be assigned to logical partitions
       − SPEs independently or Group Allocated
       [Diagram labels: BE Chip; PowerPC (PPE) with L2 Cache; SPEs, each with SPU, Local Store, MFC and AUC; System Memory; I/O; BIF/IOIF; DMA with Intervention; Hardware Managed Cache Coherency; Local Store to Local Store DMA; Atomic Update Cache]
  • 15. Models for Programming the Cell
       − Function Offload
       − Application Specific Accelerators
       − Computation-Acceleration
       − Streaming
       − Shared Memory Multi-processor
       − Heterogeneous Thread Runtime Model
       − Single Source Compiler
  • 16. Function Offload
       − Easiest way to port applications to Cell
       − PPE can call a procedure located on the SPE as if it were a local PPE function
       − PPE and SPE use “stubs” as placeholders for local and remote procedures
       [Diagram labels: PPE side: Main() image, f1() PPE stub, f2() image; SPE side: f1() SPE stub, f1() image]
  • 17. Function Offload via IDL
       IDL compiler facilitates the RPC function offload
       – Defines interfaces between PPE main program and SPE RPC program
       Function loaded onto an SPE only once
       − PPE can make multiple calls to that procedure without having to reload it
       [Build-flow diagram labels: .idl file; IDL compiler; SPE_stub.c, Stub.h, PPE_stub.c; Program.c; Function.c; SPE compiler & linker; SPE binary; PPE compiler & linker; PPE binary; RPC run-time library]
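The stub mechanism on slides 16 and 17 can be sketched in plain C. This is a single-process simulation only, not the output of the IDL compiler: the message layout `rpc_msg`, the function `scale_add`, and the helpers `rpc_call` and `offload_load_count` are all hypothetical names invented for illustration, and the "SPE" side is just an ordinary function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message format; a real IDL compiler would generate its own. */
typedef struct {
    int   func_id;   /* which offloaded procedure to run */
    float args[2];   /* marshalled arguments */
    float result;    /* filled in by the SPE side */
} rpc_msg;

enum { FUNC_SCALE_ADD = 0, FUNC_COUNT };

static int spe_loaded = 0;  /* has the SPE program been "loaded" yet? */
static int load_count = 0;  /* number of simulated loads */

/* The real procedure; on Cell it would live in the SPE image. */
static float spe_scale_add(float x, float y) { return 2.0f * x + y; }

/* SPE stub: unpacks the message and calls the local procedure. */
static void spe_stub_dispatch(rpc_msg *m) {
    assert(m != NULL && m->func_id >= 0 && m->func_id < FUNC_COUNT);
    switch (m->func_id) {
    case FUNC_SCALE_ADD:
        m->result = spe_scale_add(m->args[0], m->args[1]);
        break;
    }
}

/* Stand-in for the RPC run-time: load once, then forward every call. */
static void rpc_call(rpc_msg *m) {
    if (!spe_loaded) {
        spe_loaded = 1;
        load_count++;
    }
    spe_stub_dispatch(m);
}

/* PPE stub: same signature as the remote procedure, so the caller
   cannot tell it apart from a local function. */
float scale_add(float x, float y) {
    rpc_msg m = { FUNC_SCALE_ADD, { x, y }, 0.0f };
    rpc_call(&m);
    return m.result;
}

/* Reports how many times the simulated SPE program was loaded. */
int offload_load_count(void) { return load_count; }
```

Calling `scale_add` repeatedly from the PPE side leaves `offload_load_count()` at 1, which mirrors the "loaded onto an SPE only once" point.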
  • 18. Special Case: Application Specific Accelerators
       − OS allocates and initializes SPE resources
       − OS service invokes SPE function
       [Diagram labels: PPE; OS; Application; Services; System Memory; SPE Function; Graphics, MPEG, SSL; Vertex shaders; Encryption; Decryption; Realtime MPEG Encoding; MPEG Decode; PPE_mpeg_encode(); SPE_mpeg_encode()]
  • 19. Programming Models: Computational Acceleration
       User created RPC libraries
       − User acceleration routines
       − User compiles SPE code
       Local Data
       − Data and Parameters passed in call
       Global Data
       − Data and Parameters passed in call
       − Code manages global data
       [Diagram labels: PowerPC (PPE); System Memory; SPU, Local Store, MFC; PPE puts initial text, static data, parameters; SPU puts results; SPC independently stages text & intermediate data transfers while executing; PPE passes text, static data, parameters; SPU executes; PPE gets results]
  • 20. Streaming Model
       − SPE initiated DMA
       − Input/output/kernel streams
         – DMA’d from system memory -> LS or LS -> LS
       [Diagram, data-parallel: PPE; SPE0 to SPE7 each run Kernel() on the input stream I0 ... In from XDR and produce the output stream O0 ... On]
       [Diagram, task-parallel: PPE; SPE0 runs Kernel0(), SPE1 runs Kernel1(), ... SPE7 runs Kernel7(), linked by DMA; input stream I0 ... In from XDR, output stream O0 ... On]
       [Diagram, data & kernel streams: PPE; SPE0 to SPE7 each run a kernel Ki drawn from the kernel stream K0 ... Kn; input stream I0 ... In from XDR, output stream O0 ... On]
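The data-parallel variant of the streaming model can be sketched as follows. This is a sequential simulation under stated assumptions: `memcpy` stands in for SPE-initiated DMA, the buffers `ls_in` and `ls_out` stand in for Local Store, blocks are handed to SPEs round-robin, and the names `stream_data_parallel`, `kernel_square`, `NUM_SPES`, and `BLOCK_LEN` are hypothetical. The slide does not say how blocks are actually distributed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_SPES  8   /* number of simulated SPEs */
#define BLOCK_LEN 4   /* hypothetical stream block size, in floats */

typedef void (*kernel_fn)(const float *in, float *out, size_t n);

/* Example kernel: squares every element of a block. */
void kernel_square(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * in[i];
    }
}

/* Runs the same kernel over every block of the input stream.
   Block b is assigned to simulated SPE (b % NUM_SPES); blocks_per_spe
   records how many blocks each SPE handled. */
void stream_data_parallel(const float *in, float *out, size_t n_blocks,
                          kernel_fn kernel, size_t blocks_per_spe[NUM_SPES]) {
    assert(in != NULL && out != NULL && kernel != NULL);
    float ls_in[BLOCK_LEN];   /* stand-in for the SPE's input buffer in LS */
    float ls_out[BLOCK_LEN];  /* stand-in for the SPE's output buffer in LS */

    memset(blocks_per_spe, 0, NUM_SPES * sizeof(size_t));
    for (size_t b = 0; b < n_blocks; b++) {
        size_t spe = b % NUM_SPES;
        /* "DMA in": system memory -> LS */
        memcpy(ls_in, in + b * BLOCK_LEN, sizeof ls_in);
        kernel(ls_in, ls_out, BLOCK_LEN);
        /* "DMA out": LS -> system memory */
        memcpy(out + b * BLOCK_LEN, ls_out, sizeof ls_out);
        blocks_per_spe[spe]++;
    }
}
```

A task-parallel version would instead give each SPE a different kernel and pass each block from one SPE's buffer to the next, corresponding to the LS -> LS transfers on the slide.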
  • 21. Shared-memory Multiprocessor
       The Cell Processor can be programmed as a shared-memory multiprocessor, using two different instruction sets.
       The SPEs and the PPE fully inter-operate in a cache-coherent Shared-Memory Multiprocessor Model
       – All DMA operations in the SPEs are cache-coherent
       – The DMA operations use an effective address that is common to all PPE and SPEs
       – Shared-memory store instructions are replaced by a store from the register file to the LS, followed by a DMA operation from LS to shared memory
       A compiler or interpreter could manage part of the LS as a local cache for instructions and data obtained from shared memory.
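The two-step store and the software-managed cache described on this slide can be illustrated with a small simulation. Everything here is an assumption made for the sketch: `shared_mem` stands in for system memory, `ls_cache` for the part of the Local Store used as a cache, `dma_get` and `dma_put` are `memcpy` wrappers rather than real MFC commands, and the direct-mapped, write-through policy is only one possible design. Real DMA transfers also have size and alignment rules that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS  64  /* simulated shared memory, in 32-bit words */
#define LINE_WORDS 4   /* hypothetical cache line: 16 bytes */
#define NUM_LINES  4   /* LS lines reserved for the software cache */

static uint32_t shared_mem[MEM_WORDS];            /* stand-in for system memory */
static uint32_t ls_cache[NUM_LINES][LINE_WORDS];  /* part of the Local Store */
static long     ls_tag[NUM_LINES] = { -1, -1, -1, -1 }; /* cached line, -1 = empty */
static int      dma_gets = 0;

/* Simulated DMA: shared memory -> LS. */
static void dma_get(uint32_t *ls, size_t ea_word, size_t words) {
    memcpy(ls, &shared_mem[ea_word], words * sizeof(uint32_t));
    dma_gets++;
}

/* Simulated DMA: LS -> shared memory. */
static void dma_put(const uint32_t *ls, size_t ea_word, size_t words) {
    memcpy(&shared_mem[ea_word], ls, words * sizeof(uint32_t));
}

/* Returns the LS location caching word ea_word, fetching its line on a miss. */
static uint32_t *lookup(size_t ea_word) {
    long line = (long)(ea_word / LINE_WORDS);
    size_t slot = (size_t)line % NUM_LINES;
    if (ls_tag[slot] != line) {
        dma_get(ls_cache[slot], (size_t)line * LINE_WORDS, LINE_WORDS);
        ls_tag[slot] = line;
    }
    return &ls_cache[slot][ea_word % LINE_WORDS];
}

/* Shared-memory load: served from the LS cache. */
uint32_t spe_load(size_t ea_word) {
    assert(ea_word < MEM_WORDS);
    return *lookup(ea_word);
}

/* Shared-memory store: register -> LS, then DMA from LS -> shared memory. */
void spe_store(size_t ea_word, uint32_t value) {
    assert(ea_word < MEM_WORDS);
    uint32_t *p = lookup(ea_word);
    *p = value;               /* step 1: store from register file to LS */
    dma_put(p, ea_word, 1);   /* step 2: DMA from LS to shared memory */
}

/* Test helpers: inspect simulated memory and the number of line fetches. */
uint32_t shared_mem_peek(size_t ea_word) { return shared_mem[ea_word]; }
int dma_get_count(void) { return dma_gets; }
```

Because stores are written through immediately, an evicted line never needs to be written back; a write-back policy would reduce DMA traffic at the cost of tracking dirty lines.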
  • 22. Heterogeneous Multi-Threading Model
       − PPE Threads, SPE Threads
       − Shared effective address space
       − SPE Virtualization in debug mode only
       [Diagram labels: Application Source & Libraries; PPE object files; SPE object files; Cell Aware OS (Linux); SPE Virtualization / Scheduling Layer (m->n spc threads); New SPE tasks/threads; Existing PPE tasks/threads; Physical PPE (PPE, MT1, MT2); Physical SPUs (SPE x8)]
  • 23. Single source approach to programming Cell
       Single Source Compiler
       − Auto parallelization (treat target Cell as an SMP)
       − Auto SIMD-ization (SIMD-vectorization)
       − Compiler management of Local Store as 2nd level register file / SW managed cache (I&D)
       − Most Cell unique piece
       Optimization
       − OpenMP pragmas
       − Vector.org SIMD intrinsics
       − Data/Code partitioning
       − Streaming / pre-specifying code/data use
       Prototype Single Source Compiler developed in IBM Research
  • 24. Graphics & Visualization
  • 25. Graphics and Visualization on Cell: Terrain Rendering Engine; Multi-resolution Subdivision Surfaces; Geometry Processing; Cloth Simulation. Courtesy of Alias Systems.
  • 26. The Cell and GPUs: Extension to shader pipeline on GPUs − Vertex shaders for geometric modeling − NURBS − Subdivision surfaces − Continuous level of detail − Vertex shaders for additional lighting models. Physically Based Modeling − Soft-body dynamics − Rigid-body dynamics. Image Processing − Background subtraction for video surveillance. Global Illumination – Raytracing − Raycasting Terrains − Rendering of DEMs − Surface shaders for lighting, shadows
  • 27. The Cell and GPUs (diagram). Labels: App running on Cell (PPE with PPU, L1, L2; SPE 0 to SPE 7, each with SPU, LS, MFC); workloads shown on the SPEs: Physically-based modeling (soft/rigid body, fluid dynamics), Vertex Shaders (lighting models, subdivision), AI, Terrain Generation (incl. depth maps); Cell I/O & Memory Interfaces; System Memory (geometry, textures, maps, etc.); GPU (Vertex Processing, Pixel Processing, Frame Buffer) reached over a local or network connection; data/code flow and control: meshes, tessellated geometry, maps, textures, and runtime generated textures, maps, vertices, etc.
  • 28. Image Processing on Cell: Video Surveillance. Background subtraction compares the current image (left) with a reference image (middle) to find the changed regions (right) corresponding to objects of interest.
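The comparison this slide describes can be reduced to a per-pixel difference against the reference image. The sketch below is only the simplest possible form of that idea, assuming grayscale images stored as nested lists; the function name `background_mask` and the fixed `threshold` are illustrative assumptions, and the application on the following slide layers many more stages on top of this step.

```python
def background_mask(current, reference, threshold=30):
    """Return a binary mask: 1 where the current pixel differs from the reference.

    current, reference: equal-sized lists of rows of grayscale values (0..255).
    threshold: minimum absolute difference counted as a change (assumed value).
    """
    return [
        [1 if abs(c - r) > threshold else 0 for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]
```

Pixels whose mask value is 1 form the changed regions that would correspond to objects of interest.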
  • 29. BGS Application Structure (diagram, per Jonathan Connell). Labels: Image; Background; Mask; Salience; SIGNAL; AGC & AWB compensation; Kalman smoothing; motion energy; shadow & highlights; pixel differences; texture differences; quiescent area addition; slow update blending; stationary healing; determine add / remove; statistical thresholding; contour smoothing; component pruning.
  • 30. Alternative 1 – Single SPE for Video Stream: Each SPE dedicated to one or more video streams. Code overlaying may be needed to overcome the local memory limitation. Seems most compatible with existing application structure. No code or data partitioning across SPEs. Diagram: Video Stream 1 on SPE 0, Video Stream 2 on SPE 1, ..., Video Stream N on SPE 7.
  • 31. Alternative 2 – Image Pipeline for Task-level Parallelism: Functions are divided into groups. Groups set up in pipelined fashion. Each SPE dedicated to only one group. Outputs from one group passed SPE-to-SPE as inputs to the next. Code is partitioned – Data is not partitioned. Efficient utilization of SPEs. Diagram: video streams enter SPE 0 and flow through SPE 1 ... SPE 7.
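The pipelined arrangement can be mimicked on any machine with one worker per function group and a queue between neighbours. In this sketch Python threads stand in for SPEs and `queue.Queue` stands in for SPE-to-SPE transfers; `run_pipeline` and the stage functions are hypothetical names, and nothing here reflects how the actual Cell implementation moved data.

```python
import queue
import threading


def run_pipeline(stages, frames):
    """Push each frame through every stage in order, one worker thread per stage."""
    done = object()  # sentinel marking the end of the stream
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn, inbox, outbox):
        while True:
            item = inbox.get()
            if item is done:
                outbox.put(done)   # propagate shutdown to the next stage
                return
            outbox.put(fn(item))   # hand the result to the next stage

    threads = [
        threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(done)

    results = []
    while True:
        item = queues[-1].get()
        if item is done:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Each stage holds only its own code, and whole frames travel from stage to stage, matching "code is partitioned, data is not partitioned".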
  • 32. Alternative 3 – Data-level Parallelism: For a given function, each SPE processes part of a frame. Not easily applied to all BGS functions. Diagram: one row of pixel values (ramping from 00 up to FF and back down to 00) divided among SPE 0, SPE 1, SPE 2 and SPE 3.
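Splitting a frame among workers comes down to giving each one a contiguous strip. The sketch below assumes a row-wise split and a purely per-pixel function; `split_rows` and `threshold_frame` are hypothetical names, and the loop over strips runs sequentially here where each strip would go to a different SPE.

```python
def split_rows(height, workers):
    """Divide `height` rows into `workers` contiguous (start, end) ranges."""
    base, extra = divmod(height, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)  # spread the remainder over the first strips
        ranges.append((start, start + size))
        start += size
    return ranges


def threshold_frame(frame, workers=4, level=128):
    """Apply a per-pixel threshold strip by strip and reassemble the frame."""
    out = []
    for lo, hi in split_rows(len(frame), workers):
        strip = frame[lo:hi]  # the part of the frame one worker would receive
        out.extend([[1 if p >= level else 0 for p in row] for row in strip])
    return out
```

A per-pixel function like this one partitions cleanly. A function that needs pixels from neighbouring strips would require extra data exchange at the strip boundaries, which is one plausible reading of why the slide says the approach does not fit every BGS function.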
  • 33. Multi-resolution Subdivision (diagram). Labels: Base Mesh; Detail vector; PPE: decompose into 1-rings, store as ring stream; System Memory: N 1-rings (R0, R1, R2, ..., Rn), detail vectors (D0 ... Dm), triangles (T0 ... Tj); SPE0 Kernel(), SPE1 Kernel(), ..., SPE7 Kernel(), data-parallel; triangles to GPU (Vertex Processing, Pixel Processing, Frame Buffer) over a local or network connection.
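The "decompose into 1-rings" step can be illustrated by collecting, for every vertex, the vertices it shares a triangle with. This is a minimal sketch under the assumption that the mesh is an indexed triangle list; `one_rings` is a hypothetical name, and the real ring-stream layout used in the demo is not described in the slides.

```python
def one_rings(triangles):
    """Map each vertex index to the sorted list of its 1-ring neighbours.

    triangles: iterable of (a, b, c) vertex-index triples.
    """
    rings = {}
    for a, b, c in triangles:
        rings.setdefault(a, set()).update((b, c))
        rings.setdefault(b, set()).update((a, c))
        rings.setdefault(c, set()).update((a, b))
    return {vertex: sorted(neighbours) for vertex, neighbours in rings.items()}
```

Because each ring is self-contained, a list of rings can be handed out to independent kernels, which is what makes the data-parallel layout in the diagram possible.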
  • 34. Demonstrations: Raytracing; Terrain Rendering (TRE); Cloth Simulation
  • 35. Demo Platform: Cell Blade Prototype. Blade Server – Dual Cell Processors (SMP), Support Logic, Memory, Storage – PCI Express 4X option port – BladeCenter Interface (based on IBM JS20) – InfiniBand 4x (10 Gbps) interconnect. Chassis – Standard IBM BladeCenter form factor with: – 7 Blades (1 blade/2 slots) – 2 internal switches (1 Gb Ethernet) with 4 external ports each – Separate, external InfiniBand switch with optional FC port. Software – Linux OS – Tool chain and compilation support – Additional IBM Software Development Tools and Cluster Management software. Client Systems – IBM T41 ThinkPad (1.7 GHz Pentium-M) – Apple Mac, dual G5 @ 2 GHz. Diagram labels: Rack; Chassis; Deskside Unit; Blade with BladeCenter Interface, PCI-Exp, two Cell Processors each with South Bridge and Memory, IB 4X (x2), GbE (x2), HDD.
  • 36. Example: Raytracing Architecture (diagram). Labels: OpenRT; Camera; Shader; Generate Tiles; Generate Rays; View transforms; Spatial Subdivision; K-d tree Traversal; Intersection Testing (if intersect: true / false); Material Shading; Recursive/Adaptive; Sorting; Spatial Subdivision Structure: K-d tree, per object & top-level; System Memory (framebuffer); System Memory (texture maps, run-time shaders); data passed between stages: tile pixels, rays, triangle-ray pairs, secondary rays; legend: PU code, SPU code, potential execution on SPE, object data; PPE and SPE i.
  • 37. Cell Optimized Ray Caster: Terrain Rendering Engine. 30+ frames per second with only one Cell processor – No graphics adapter assist – 1280x720 (HD 720P) resolution. HD 1080P at 30+ frames per second via 2 way SMP Cell. Advanced SPE shader function – Ray/Terrain intersection computation – Texture filtering – Normal computation – Bump map computation – Diffuse + Ambient lighting model – Perlin Noise based clouds – Atmosphere computation (haze, sun, halo) – Dynamic multi-sampling (4 to 16 samples per pixel) – Image based input (16 bit height + 16 bit texture) – 29 KB of SPE object code – 224 KB of SPE local store data. M-JPEG compression via SPE. Performance scales linearly with number of available SPEs. Written completely in C with intrinsics. Currently up and running on bring up systems!
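The ray/terrain intersection listed here can be illustrated with the most basic heightfield technique: step along the ray until it drops to or below the terrain. This sketch uses fixed-step marching as a stand-in and is not the TRE algorithm, whose details the slides do not give; `march_ray`, the `height` callback, and the `step` and `max_t` parameters are all assumptions.

```python
def march_ray(height, origin, direction, step=0.25, max_t=100.0):
    """Return the first sampled distance t at which the ray is at or below the terrain.

    height: function (x, y) -> terrain elevation z (hypothetical callback).
    origin, direction: (x, y, z) tuples; direction need not be normalized.
    Returns None if no hit is found within max_t.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    samples = int(max_t / step) + 1
    for i in range(samples):
        t = i * step
        x, y, z = ox + t * dx, oy + t * dy, oz + t * dz
        if z <= height(x, y):
            return t
    return None
```

A fixed step can miss thin features and only locates the hit to within one step, so a production ray caster would refine the result. On the memory figures: 29 KB of code plus 224 KB of data totals 253 KB, which fits within the 256 KB local store of an SPE.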
  • 38. Vertical Ray Coherence
  • 39. Algorithm Mapped to Cell
  • 40. Physically Based Modeling: Alias research prototype. Cell implementation of Alias cloth solver – Compute the displacement of 3-D cloth mesh vertices in response to gravity, collisions and constraints – Each simulation can occur on a separate SPE. Render results on commodity graphics adapter – locally or over network. Courtesy of Alias Systems™. Diagram labels: PPE (Main, SocketMgr, Create SPE Threads); System Memory; SPE0, SPE1, SPE2, SPE3, ..., SPEN, each with input and output; DMA in: force vectors; DMA out: vertex displacements; position, orientation input; triangle output.
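To make "displacement of cloth mesh vertices in response to gravity, collisions and constraints" concrete, here is one time step of a generic particle cloth model. It is not the Alias solver, which the slides do not describe: it swaps in position-based Verlet integration with distance constraints and a floor plane, and `step_cloth`, its parameters, and the default values are all assumptions.

```python
import math


def step_cloth(pos, prev, constraints, pinned, dt=0.01, gravity=-9.8,
               floor=-10.0, iterations=5):
    """Advance cloth particles by one step; returns (new_positions, new_previous).

    pos, prev: lists of [x, y, z] current and previous positions.
    constraints: list of (i, j, rest_length) distance constraints.
    pinned: set of particle indices that never move.
    """
    new = []
    for i, (p, q) in enumerate(zip(pos, prev)):
        if i in pinned:
            new.append(list(p))
            continue
        # Verlet update: inertia from the previous position plus gravity along y.
        new.append([2 * p[0] - q[0],
                    2 * p[1] - q[1] + gravity * dt * dt,
                    2 * p[2] - q[2]])

    for _ in range(iterations):
        for i, j, rest in constraints:
            d = [new[j][k] - new[i][k] for k in range(3)]
            length = math.sqrt(sum(c * c for c in d))
            wi = 0.0 if i in pinned else 1.0
            wj = 0.0 if j in pinned else 1.0
            if length == 0.0 or wi + wj == 0.0:
                continue
            # Move the free endpoints so the pair returns to its rest length.
            corr = (length - rest) / length / (wi + wj)
            for k in range(3):
                new[i][k] += wi * corr * d[k]
                new[j][k] -= wj * corr * d[k]
        for i, p in enumerate(new):
            if i not in pinned and p[1] < floor:
                p[1] = floor  # simple collision response against a floor plane
    return new, [list(p) for p in pos]
```

Calling `pos, prev = step_cloth(pos, prev, constraints, pinned)` in a loop advances one cloth. Since each cloth carries its own state, separate simulations can run independently, in line with the slide's note that each simulation can occupy a separate SPE.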
  • 41. Cell beyond the PS3: IBM support for custom systems and applications based on Cell − Through IBM Engineering and Technology Services. Planned release (next several months) of architecture, simulator, and compiler to enable Cell software development and evaluation in a large community − Linux OS, compilers, debugger, full system simulator − Open source and IBM alphaWorks. Prototype system implementations and evaluation of markets beyond console − Cell Processor Based Blades (acceleration and other apps.) − CE devices (HDTV, home media server) − Standard products for embedded applications − Prototype reference designs for smaller systems
  • 42. Summary: Cell ushers in a new era of leading edge processors optimized for digital media and entertainment. New levels of performance and power efficiency beyond what is achieved by PC processors. Step towards HPC / game processor convergence.
  • 43. Acknowledgements: Sony Computer Entertainment; Toshiba Corporation; IBM STI Design Center: Peter Hofstee, Barry Minor, Mark Nutter; Gerhard Stenzel, IBM Boeblingen Development Labs; Philipp Slusallek, Saarland University; Alias Systems: Joyce Janczyn, Francesco Iorio, Jos Stam
  • 44. Thank you