Symposium on HPC Applications – IIT Kanpur
1. A review of power & energy consumption optimization in HPC
Rishi Pathak
riship@cdac.in
National PARAM Supercomputing Facility, C-DAC, Pune
March 12 - 14, 2012
5. Exascale system
• Likely to be feasible by 2017±2
• 10-100 million processing elements (cores or mini-cores)
• Chips perhaps as dense as 1,000 cores per socket
• Clock rates will grow more slowly
• Large-scale optics based interconnects
• 10-100 PB of aggregate memory
• Sustained performance per watt ~ 100 GF/watt
• 10 – 100 MW Exascale system
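The power range above can be checked by dividing the target performance by the sustained efficiency. A minimal sketch, assuming an exascale system means 10^18 FLOP/s; the 10 GF/watt case is an illustrative lower-efficiency point, not a figure from the slides:

```python
EXAFLOP = 1e18  # FLOP/s, assumed definition of an exascale system

def system_power_mw(gflops_per_watt: float, flops: float = EXAFLOP) -> float:
    """Power in MW needed to sustain `flops` at the given efficiency."""
    watts = flops / (gflops_per_watt * 1e9)
    return watts / 1e6

for efficiency in (100.0, 10.0):
    print(f"{efficiency:5.0f} GF/watt -> {system_power_mw(efficiency):6.1f} MW")
```

At 100 GF/watt this gives the 10 MW end of the range; the 100 MW end corresponds to 10 GF/watt.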
6. Power & Energy
E=P*T
Energy (E) consumed in time (T) at average power (P)
Minimizing the time interval limits energy
The minimum value of T for an application depends on:
Mapping of application to cluster system
Scalability & system bottlenecks
Beyond that – Power management approaches
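The two levers in E = P*T can be illustrated with a small calculation. This is only a sketch; the job figures (40 kW, 10 h and so on) are hypothetical and chosen for round numbers:

```python
def energy_kwh(avg_power_kw: float, runtime_h: float) -> float:
    """E = P * T, with P in kW and T in hours, giving kWh."""
    return avg_power_kw * runtime_h

# Hypothetical job drawing 40 kW on average.
baseline = energy_kwh(40.0, 10.0)     # unoptimized run
shorter = energy_kwh(40.0, 8.0)       # better mapping shortens T
lower_power = energy_kwh(34.0, 8.0)   # power management then lowers P
print(baseline, shorter, lower_power)
```

Once T has reached its minimum for the application, only the P term is left to reduce, which is where power management comes in.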
7. Power management techniques
Static Power Management(SPM)
Low power CPUs
Local flash storage
Suitable for data centric applications
Dynamic Power Management(DPM)
Software & power scalable components
Dynamically adjust power consumption
Frequency & Voltage scaling for CPU & memory
8. DVFS
Dynamic Voltage & Frequency Scaling
P = C * V^2 * f
Throttle when the workload
Is not CPU bound
Is not very CPU intensive
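Why throttling helps a workload that is not CPU bound can be sketched with the formula above. The capacitance, voltages, frequencies and workload below are hypothetical, and the model covers dynamic power only (static power is ignored):

```python
def dynamic_power(c_farads, volts, freq_hz):
    """Dynamic CPU power from P = C * V^2 * f, in watts."""
    return c_farads * volts ** 2 * freq_hz

def phase_energy(c_farads, volts, freq_hz, cpu_cycles, stall_s):
    """Energy (J) and time (s) for a phase made of `cpu_cycles` of compute
    plus `stall_s` seconds of waiting that does not shrink with frequency."""
    t = cpu_cycles / freq_hz + stall_s
    return dynamic_power(c_farads, volts, freq_hz) * t, t

# Hypothetical operating points; workload is 2.4e9 cycles plus 4 s of stalls.
for volts, freq_hz in ((1.2, 2.4e9), (1.0, 1.6e9)):
    energy, t = phase_energy(1e-8, volts, freq_hz, 2.4e9, 4.0)
    print(f"{freq_hz / 1e9:.1f} GHz: {t:.2f} s, {energy:.1f} J")
```

With these made-up numbers the lower operating point runs about 10% longer (5.5 s against 5.0 s) while using roughly half the dynamic energy, because most of the time is spent stalled rather than computing.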
9. DVFS Scheduling
Off-line, trace-based scheduling
Source code instrumentation for performance profiling
Execution with profiling
Determination of appropriate processor frequencies for
each phase
Source code instrumentation for DVFS scheduling
S. Huang & W. Feng – Proc. Cluster computing[IEEE/ACM](2009)
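The step "determination of appropriate processor frequencies for each phase" could look like the sketch below. The selection rule (lowest frequency within a fixed slowdown of the fastest run) and the profile data are assumptions for illustration, not the method of the cited paper:

```python
def pick_frequencies(profile, max_slowdown=0.05):
    """profile: {phase: {freq_mhz: runtime_s}} gathered from profiled runs.
    Returns {phase: freq_mhz}: the lowest frequency whose runtime stays
    within `max_slowdown` of the runtime at the highest frequency."""
    schedule = {}
    for phase, runs in profile.items():
        f_max = max(runs)
        limit = runs[f_max] * (1.0 + max_slowdown)
        schedule[phase] = min(f for f, t in runs.items() if t <= limit)
    return schedule

# Hypothetical trace: a compute phase and a communication phase.
profile = {
    "compute": {2400: 10.0, 1800: 13.1, 1200: 19.6},
    "exchange": {2400: 4.0, 1800: 4.1, 1200: 4.15},
}
print(pick_frequencies(profile))
```

The resulting per-phase table is what the final instrumentation pass would encode as frequency changes at phase boundaries.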
10. DVFS Scheduling
Run-time, profiling-based scheduling
Time-window based performance prediction model
No a priori information of application phases
A wrong prediction costs either performance or energy efficiency
Metrics
MIPS & CPU utilization
Interception of MPI communication calls
File I/O calls
MPI receive wait cycles
Shown to reduce energy within a pre-specified performance loss constraint
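A time-window predictor of the kind described above can be sketched as follows. The class name, the moving-average predictor and the frequency rule are all hypothetical; real implementations use richer metrics such as MIPS or MPI wait cycles:

```python
from collections import deque

class WindowGovernor:
    """Hypothetical governor: predicts the next window's CPU utilization as
    the mean of the last `history` windows, then picks a frequency."""

    def __init__(self, freqs_mhz, history=3):
        self.freqs = sorted(freqs_mhz)
        self.samples = deque(maxlen=history)

    def next_frequency(self, utilization):
        """utilization: fraction (0..1) of the last window spent computing,
        i.e. excluding MPI wait and file I/O time."""
        self.samples.append(utilization)
        predicted = sum(self.samples) / len(self.samples)
        target = predicted * self.freqs[-1]
        # Lowest available frequency that covers the predicted demand.
        for f in self.freqs:
            if f >= target:
                return f
        return self.freqs[-1]

gov = WindowGovernor([1200, 1800, 2400])
for u in (0.9, 0.2, 0.1):
    print(u, gov.next_frequency(u))
```

Because the prediction lags behind the application, a sudden compute-heavy window still runs at the lower frequency chosen from history, which is the misprediction cost noted above.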
11. DVFS Implementations
Memory MISER (Management Infra-Structure for Energy Reduction)
CPU MISER
Linux CPUSPEED
Ecod
Beta-Algorithm
M. E. Tolentino, J. Turner & K. W. Cameron – Proc. of the 4th international conference on Computing frontiers(2007)
S. Huang & W. Feng – Proc. Cluster computing[IEEE/ACM](2009)
C. Hsu & W. Feng - Proc. of the 2005 ACM/IEEE conference on Supercomputing
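On Linux, the current frequency-scaling state can be inspected through the kernel's cpufreq sysfs files. The sketch below only reads that state; it is not how any of the tools listed above are implemented, and the `root` parameter is a convenience added here so the function can be pointed at a different directory:

```python
from pathlib import Path

def read_cpufreq(cpu: int = 0, root: str = "/sys/devices/system/cpu"):
    """Return (governor, current_khz) for one CPU, or None if the
    cpufreq interface is not exposed on this machine."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    try:
        governor = (base / "scaling_governor").read_text().strip()
        cur_khz = int((base / "scaling_cur_freq").read_text())
    except (OSError, ValueError):
        return None
    return governor, cur_khz

print(read_cpufreq())
```

Virtual machines and some containers do not expose these files, in which case the function returns None.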
12. Enhancements in DVFS
Dynamic Frequency Scaling per Core
Each core runs at its own clock
With voltage unchanged, power is linear in frequency
So power savings are relatively small
Separate power planes for the core and "uncore" parts of the CPU
Cores can go to sleep (C-state)
Memory controller stays operational for external devices (e.g. via DMA)
13. Enhancements in DVFS
Clock gating
Clock-disabled sleep states (AMD-C1,E1, Intel-C[0,1,3,6])
At the CPU block level
At the core level
Reduces dynamic power
Power Gating
Power to CPU/core cut off (~0V)
Reduces both dynamic and static(leakage) power
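The difference between the two gating techniques can be captured in an idealized model. The function and the wattages are hypothetical, and the model assumes clock gating removes all dynamic power, which real hardware only approximates:

```python
def core_power(dynamic_w, static_w, state):
    """Idealized per-core power in watts for three states:
    'active'      : dynamic + static (leakage) power
    'clock_gated' : clock stopped, so no dynamic power; leakage remains
    'power_gated' : supply cut, so neither dynamic nor leakage power"""
    if state == "active":
        return dynamic_w + static_w
    if state == "clock_gated":
        return static_w
    if state == "power_gated":
        return 0.0
    raise ValueError(f"unknown state: {state}")

# Hypothetical core: 12 W dynamic, 3 W leakage.
for state in ("active", "clock_gated", "power_gated"):
    print(state, core_power(12.0, 3.0, state))
```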
16. Power optimization at NPSF
Scheduler capable of:
Powering off a node after a pre-specified period of idleness (no job)
Power optimization with QoS (turnaround time)
Node power-on time (2-3 min) adds to turnaround time
Targeted power policies
Aggressive optimization w/o regard to QOS
Power capping
Power budget
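The idle power-off rule can be sketched as a simple selection over node state. The function, its `keep_spare` parameter and the node names are hypothetical and do not describe the scheduler used at NPSF:

```python
def nodes_to_power_off(idle_minutes, threshold_min, keep_spare=0):
    """idle_minutes: {node: minutes idle, or None if the node runs a job}.
    Returns nodes idle for at least `threshold_min`, longest-idle first,
    leaving `keep_spare` of them powered on to absorb newly arriving jobs."""
    idle = sorted(
        (n for n, m in idle_minutes.items()
         if m is not None and m >= threshold_min),
        key=lambda n: idle_minutes[n],
        reverse=True,
    )
    return idle[: max(0, len(idle) - keep_spare)]

state = {"n1": None, "n2": 10, "n3": 5, "n4": 7}
print(nodes_to_power_off(state, threshold_min=6))
print(nodes_to_power_off(state, threshold_min=6, keep_spare=1))
```

Keeping a few idle nodes powered on trades some savings for turnaround time, since a job that lands on a powered-off node waits for the 2-3 minute power-on.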
17. Power optimization at NPSF
Node packing via checkpointing, migration & restart
MPI with BLCR – one approach
Use of virtualization – another approach
Considerations –
Remaining walltime of job being migrated
Remaining walltime of jobs on node in consideration
Cost of migration weighed against the expected power savings
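The considerations above can be combined into a rough go/no-go check. This cost model is an assumption made for illustration: it treats the saving window as the shorter of the two remaining walltimes, on the reasoning that the target node would be running anyway only for that long:

```python
def migration_pays_off(job_remaining_h, target_remaining_h,
                       node_power_kw, migration_cost_kwh):
    """Hypothetical check for packing one job onto another node.
    The freed source node can be switched off for `window_h` hours;
    migrate only if the energy saved exceeds the energy spent on
    checkpoint, transfer and restart."""
    window_h = min(job_remaining_h, target_remaining_h)
    saving_kwh = node_power_kw * window_h
    return saving_kwh > migration_cost_kwh

# Hypothetical node drawing 0.3 kW; migration costs 0.5 kWh.
print(migration_pays_off(5.0, 3.0, 0.3, 0.5))
print(migration_pays_off(0.2, 3.0, 0.3, 0.5))
```

A job close to finishing is therefore left in place, since its node will become idle soon without paying the migration cost.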
20. Simulation Results - Table
Parameter                           Case I   Case II   Case III
Power saving (%)                    4.05     4.22      9.29
NODEIDLEPOWERTHRESHOLD (minutes)    8        6         4
21. Power optimization at NPSF
Feedback driven policy engine
Speculative power on/off of nodes at any given time
Metrics/deciding factors
Function of job arrival times & resource requirements
How many nodes at what time
Current and probable cluster utilization at given time – another metric
Expected starttime of jobs in queue
Minimize impact on turnaround time of job
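Speculative power-on driven by the expected start times of queued jobs could be sketched as below. The function and queue layout are hypothetical; the 3-minute default lead time is taken from the node power-on time quoted earlier in the talk:

```python
def nodes_to_power_on(queue, free_powered_on, now_min, lead_min=3):
    """queue: list of (expected_start_min, nodes_needed) for queued jobs.
    Returns how many extra nodes to start now so that jobs expected to
    start within the power-on lead time find their nodes ready."""
    demand = sum(n for start, n in queue if start <= now_min + lead_min)
    return max(0, demand - free_powered_on)

# Hypothetical queue: two jobs due shortly, one large job due in 30 minutes.
queue = [(1, 4), (2, 2), (30, 16)]
print(nodes_to_power_on(queue, free_powered_on=3, now_min=0))
print(nodes_to_power_on(queue, free_powered_on=10, now_min=0))
```

A feedback-driven engine would compare these predictions with what actually happened and adjust the lead time or the start-time estimates accordingly.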