2. OUTLINE
• NAS Benchmark Suite
• Experiments
• Paraver Visualization
• Code View
• Communication
• Disk I/O
• Load Balancing
• L1D Cache Miss
• Cycles per Instruction (CPI)
• Execution Time
• Benchmarking Time
• Conclusion
3. NAS Benchmark Suite
• NAS
... is a set of benchmarks.
... evaluates the performance of highly parallel supercomputers.
... is developed and maintained by the NASA Advanced Supercomputing (NAS) Division.
4. NAS Benchmark Suite
• NAS Kernel Applications
  • IS - Integer Sort
  • EP - Embarrassingly Parallel
  • CG - Conjugate Gradient
  • MG - Multi-Grid
  • FT - discrete 3D Fast Fourier Transform
• Problem Sizes
  • S : small size
  • W : workstation size
  • A, B, C : standard test sizes; each class ~4x larger than the previous one
  • D, E, F : large test sizes; each class ~16x larger than the previous one
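
A quick arithmetic check of this scaling, using the key counts of the IS kernel (exponents recalled from the NPB problem-size tables; treat the exact values as an assumption here):

    N_S = 2^{16},\quad N_W = 2^{20},\quad N_A = 2^{23}
    N_B = 2^{25} = 4\,N_A,\qquad N_C = 2^{27} = 4\,N_B
    N_D = 2^{31} = 16\,N_C

So classes A through C each grow by a factor of ~4, while class D is ~16x class C.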
6. Experiments
• NAS Parallel Benchmark version 3.2.1
• IS Kernel Application:
... sorts N keys in parallel.
... tests
  • integer computation speed
  • communication performance
• S Problem Size:
... small, for quick test purposes
... has 2^16 keys
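
The heart of IS is a ranking (counting) sort. A minimal sequential sketch of that step, assuming the class-S parameters (2^16 keys; the key range of 2^11 is an assumption from the NPB parameter tables, and the real benchmark distributes keys over MPI processes and uses its own random-number generator):

    #include <stdio.h>
    #include <stdlib.h>

    #define TOTAL_KEYS (1 << 16)   /* class S: 2^16 keys               */
    #define MAX_KEY    (1 << 11)   /* class S key range (assumed 2^11) */

    int main(void) {
        int *key   = malloc(TOTAL_KEYS * sizeof *key);
        int *count = calloc(MAX_KEY, sizeof *count);

        /* Generate a pseudo-random key sequence
           (rand() stands in for the benchmark's own generator). */
        srand(314159);
        for (int i = 0; i < TOTAL_KEYS; i++)
            key[i] = rand() % MAX_KEY;

        /* Ranking by counting sort: build a histogram of key values... */
        for (int i = 0; i < TOTAL_KEYS; i++)
            count[key[i]]++;

        /* ...then a prefix sum turns counts into key ranks. */
        for (int k = 1; k < MAX_KEY; k++)
            count[k] += count[k - 1];

        printf("keys ranked: %d\n", count[MAX_KEY - 1]);  /* = TOTAL_KEYS */
        free(key);
        free(count);
        return 0;
    }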
7. Experiments
• IS Benchmarking Procedure (general outline; see the code sketch after this list)
1. Generate a sequence of N keys.
2. Load the N keys into the memory system.
3. Timing begins.
4. Loop: sorting & partial verification.
5. Timing ends.
6. Full verification.
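
Mapped onto code, the timed region looks roughly like this (a structural sketch with hypothetical stub routines, not the benchmark's actual source; NPB IS repeats the timed loop 10 times):

    #include <stdio.h>
    #include <time.h>

    #define MAX_ITERATIONS 10          /* NPB IS repeats the sort 10 times */

    /* Hypothetical stand-ins for the benchmark's internal routines. */
    static void rank_keys(void)          { /* counting sort + ranking here */ }
    static int  partial_verify(int iter) { (void)iter; return 1; /* spot-check a few ranks */ }
    static int  full_verify(void)        { return 1; /* check the complete ordering */ }

    int main(void) {
        /* Steps 1-2: key generation and loading happen before timing starts. */

        clock_t t0 = clock();                          /* step 3: time begins */
        for (int iter = 1; iter <= MAX_ITERATIONS; iter++) {
            rank_keys();                               /* step 4: sort ...    */
            if (!partial_verify(iter))
                printf("partial verification failed at iteration %d\n", iter);
        }
        clock_t t1 = clock();                          /* step 5: time ends   */

        printf("benchmarking time: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
        return full_verify() ? 0 : 1;                  /* step 6: untimed     */
    }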
9. Experiments
Procedure:
• The benchmark is not instrumented manually.
• Paraver traces are generated automatically:
  • LD_PRELOAD is exported, injecting the tracing library into the benchmark at load time.
• The benchmark is executed with 2, 4, 8, 16, 32, and 64 processors.
• The benchmark results are analyzed.
• The generated traces are examined in the Paraver tools.
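
The LD_PRELOAD trick works because the preloaded library's symbols shadow the real MPI ones, so every MPI call can be recorded before being forwarded through the PMPI layer. A toy interposer in that style (a sketch of the mechanism only; the actual traces come from the Paraver tool chain's tracing library, and the file names below are illustrative):

    /* trace.c - toy MPI interposer, injected with LD_PRELOAD.
     * Build (illustrative): mpicc -shared -fPIC trace.c -o libtrace.so
     * Run   (illustrative): LD_PRELOAD=./libtrace.so mpirun -np 4 ./is.S.4
     */
    #include <mpi.h>
    #include <stdio.h>

    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm) {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* query rank via the PMPI layer */
        fprintf(stderr, "[rank %d] MPI_Send: %d element(s) to rank %d\n",
                rank, count, dest);             /* record the event */
        return PMPI_Send(buf, count, type, dest, tag, comm);  /* forward the real call */
    }

A real tracer wraps every MPI entry point this way and time-stamps each event, producing the records that Paraver later displays.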
24. Benchmarking Time - reminder
• IS Benchmarking Procedure (generally)
1. Generate a sequence of N keys.
2. Load the N keys into the memory system.
3. Timing begins.
4. Loop: sorting & partial verification.
5. Timing ends.
6. Full verification.
• Benchmarking Time = execution time of the parallel algorithm (only steps 3-5 are timed)
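
The speedup plotted on the next two slides is the usual ratio of benchmarking times (assuming T(1) denotes the single-processor run):

    S(p) = \frac{T(1)}{T(p)}

A value S(p) < 1 means the run on p processors is actually slower than the serial run.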
27. Benchmarking Time
• SpeedUp of My Computer
[Figure: speedup vs. number of processors (1, 2, 4, 8, 16, 32, 64) on MyComputer; y-axis "SpeedUp" from 0 to 1.2]
28. Benchmarking Time
• SpeedUp of Boada
[Figure: speedup vs. number of processors (1, 2, 4, 8, 16, 32, 64) on Boada; y-axis "SpeedUp" from 0 to 7]
30. Conclusion
• The IS application
  • ... does not involve much communication.
  • ... is dominated by computation and memory loading.
  • ... shows low cache-miss and high CPI values in the computation phase.
• NAS is designed for highly parallel supercomputers.
• MyComputer is inadequate to meet the requirements of NAS.
• MyComputer cannot achieve any speedup in this application.
• Boada speeds up until the process count reaches the number of processors it has.
• MyComputer saves less time on disk I/O operations.
• The CPI values in Boada's computation phase are lower.