Getting Back Memory and Performance
Jen Costillo
While you wait, download:
http://tinyurl.com/nha7853
V1.2 release
Why Optimize?
Lower memory -> cheaper BOM
Lower Memory footprint
Low RAM footprint
Performance
Bottlenecks
Power considerations
Maintenance
Sometimes your compiler can’t do everything.
Sometimes you want a challenge.
11/15/2015 2Costillo Rebelbot
Things Covered
Basics of:
Profiling tools
Code Optimization
RAM optimization
Intro to tools:
Keil
IDA
Map files
Compiler/Linker documentation secrets
Things Not Covered (but are pretty cool)
Virtual memory
Caching for speed
Branch optimization
Processor pipeline considerations
Specifics of a particular processor family
Instead, focus on where to find the info to accomplish the goal
Performance
Speed of execution
Resolve bottlenecks
Meeting design guidelines
Meet power consumption model
[Figure: plot with axes Space and Time]
Maintenance
Refactoring
Code Hygiene
Primarily an avoidance mechanism
Lab 0
Install toolchain
Keil
Download project
Github
Install ST Link SW and Driver
STM32L476 Discovery board
Hook Up Scope (Optional - /Utilities/PC SW/Saleae)
PB2
PE8
PA0
Make sure it compiles with “LAB0” defined in project Options -> C/C++ -> Preprocessor Symbols -> Define
WARNING: Keil may decide additional packages are needed
Build and Load in Keil
Open Workspace: Project menu -> Open Project. Select Optimizer.uvprojx
Change Options -> C/C++ -> Preprocessor Symbols -> Define
Rebuild / F7
Debug / Ctrl + F5
Run / F5
Program Structure
[Diagram: thread and signal flow]
MainThread (500ms timer) -> Signal Set -> thrLED (Toggle LED4)
Joystick Select Press -> EXTI GPIO ISR -> Signal Set -> thrIsrLED (Sample trigger)
Before You Dive In
Don’t optimize too early
Keep a baseline
Profiling
Keep track of memory utilization
Leverage tools with compiler tool chain/IDE
Create your own profiling systems
Performance Measurements
1. Model – what do you expect the measurements to be?
a. Tasks
b. RTOS – task switching
c. ISRs
2. Measure
a. Review compiler listings and map files
b. Home-grown profiling tools
c. Leverage simulators
d. IDE/RTOS toolchain tools
3. Modify – basic tools
a. Leverage toolchain intrinsics
b. Utilize compiler optimizer
c. Use Big-O to improve code structure
d. Count and shrink instruction count
e. Assembly based on processor pipeline knowledge
Measurements Interface Tradeoffs
Type | Pro | Cons
Logging-based: | Human readable |
Serial or other live stream | Happens in “real” time | Serial port is slow; overhead is high; can disrupt execution order
Circular buffer | Faster than serial | Need extraction tool; not reading in real time; limited data
File system with extraction tool | Stays until you extract it; size is limited by allocation | Requires extraction tool; not reading in real time
HW-based: | Execution disruption is low | Need to decode for readability
GPIOs | Setup is low | Potentially high pin count
PWMs | Low pin count | Overhead can be high; can be painful without spectrum analyzer
DAC | Low pin count; 2^n bits of event levels | Need oscilloscope
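The circular-buffer option in the table can be sketched as below. This is a minimal sketch, not code from the lab project: the names event_t, event_log_t, event_log_put, event_log_count and event_log_get are hypothetical, and the timestamp is assumed to come from whatever tick source the target offers. It shows the trade-off from the table: writes are cheap, but the data is limited to the last EVENT_LOG_SIZE events and has to be extracted later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: none of this comes from the lab project. */
#define EVENT_LOG_SIZE 8u               /* power of two so the wrap is a mask */

typedef struct {
    uint32_t timestamp;                 /* e.g. a hardware tick count */
    uint16_t event_id;                  /* which probe point fired */
} event_t;

typedef struct {
    event_t  slots[EVENT_LOG_SIZE];
    uint32_t head;                      /* total events written so far;
                                           overflow after 2^32 events is
                                           ignored in this sketch */
} event_log_t;

/* Record one event; the oldest entry is overwritten once the buffer is full. */
static void event_log_put(event_log_t *buf, uint32_t timestamp, uint16_t event_id)
{
    assert(buf != NULL);
    event_t *slot = &buf->slots[buf->head & (EVENT_LOG_SIZE - 1u)];
    slot->timestamp = timestamp;
    slot->event_id  = event_id;
    buf->head++;
}

/* Number of valid entries currently held (at most EVENT_LOG_SIZE). */
static uint32_t event_log_count(const event_log_t *buf)
{
    assert(buf != NULL);
    return (buf->head < EVENT_LOG_SIZE) ? buf->head : EVENT_LOG_SIZE;
}

/* Fetch the i-th oldest surviving entry, with i in [0, event_log_count). */
static event_t event_log_get(const event_log_t *buf, uint32_t i)
{
    assert(buf != NULL && i < event_log_count(buf));
    uint32_t oldest = buf->head - event_log_count(buf);
    return buf->slots[(oldest + i) & (EVENT_LOG_SIZE - 1u)];
}
```

An extraction tool would walk the entries with event_log_get() after the run, for example from a debugger or over serial once timing no longer matters.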
Modification Improvement Strategies
Type | Method | Impact
Algorithm efficiency | Review your Big O(n); leverage preprocessor intrinsics; count instructions and write new code | Function ROM, scale, speed
Code size | Utilize optimizer flags; C/Assembly based on processor knowledge | Function ROM, processor pipeline, target size or call frequency
Memory usage | Leverage compiler intrinsics; utilize optimizer or linker flags | RAM (stack, heap), memory location
Profiling LED Sample Task
[Diagram: MainThread (500ms timer) -> Signal Set -> thrLED (Toggle LED4); EXTI GPIO ISR -> Signal Set -> thrIsrLED (Sample trigger)]
Lab1 - Make Profiler Module
Go to project Options and in C/C++, change
LAB0 -> LAB1
Load Saleae logic settings in /Saleae (Optional)
Exercise: How to improve
Observe:
Measurement accuracy
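A GPIO-based profiler module of the kind this lab builds might look like the sketch below. It is an assumption-laden illustration, not the lab's module: fake_gpio_odr stands in for a real output data register so the code runs on a host, and PROFILE_START, PROFILE_STOP, PROFILE_PIN_MASK and work_under_test are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a GPIO output data register so the sketch runs on a host.
 * On a target this would be the port's memory-mapped register instead. */
static volatile uint32_t fake_gpio_odr;

#define PROFILE_PIN_MASK   (1u << 2)   /* hypothetical pin choice */

/* Raise the pin on entry and drop it on exit; the pulse width seen on a
 * scope or logic analyzer is the execution time of the bracketed code. */
#define PROFILE_START()    (fake_gpio_odr |= PROFILE_PIN_MASK)
#define PROFILE_STOP()     (fake_gpio_odr &= ~PROFILE_PIN_MASK)

/* Example workload wrapped in the profiling pulse. */
static uint32_t work_under_test(uint32_t n)
{
    /* Nested use of the same pin would hide the inner pulse. */
    assert((fake_gpio_odr & PROFILE_PIN_MASK) == 0u);

    uint32_t sum = 0u;
    PROFILE_START();
    for (uint32_t i = 0u; i < n; i++) {
        sum += i;
    }
    PROFILE_STOP();
    return sum;
}
```

The read-modify-write on the register is itself a few instructions, so the measured pulse includes that overhead, which is one source of the measurement accuracy question this lab raises.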
Lab2 - Improve Profiler Module
Go to project Options and in C/C++, change
LAB1 -> LAB2
Exercise: Count Instruction Cycles
Observe:
8MHz processor speed
~56us Interrupt Delay (448 cycles)
~12ms blip (~96k cycles)
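The cycle figures above follow directly from the clock rate: cycles = time * clock frequency. A small helper makes the conversion explicit; us_to_cycles is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a measured duration in microseconds to core clock cycles:
 * cycles = time_us * clock_hz / 1,000,000.
 * The product is formed in 64 bits so it cannot overflow. */
static uint64_t us_to_cycles(uint32_t time_us, uint32_t clock_hz)
{
    assert(clock_hz > 0u);
    return ((uint64_t)time_us * clock_hz) / 1000000u;
}
```

At 8MHz, us_to_cycles(56, 8000000) gives 448 and us_to_cycles(12000, 8000000) gives 96000, matching the numbers observed in this lab.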
Estimate Instruction Count
Exercise: Estimate the number of
instructions, if you dare
Intro to reading a map file
Cross reference – location of data/functions
Symbol Table – size in section
Memory Map – memory section
Image Component Sizes – by module
Callgraph – depth of stack usage *
Summary
*Keil uses a separate .htm file
Image Breakdown
Segment | Access | Data type | Contains
.text/.code | READ ONLY | Code | Functions, const, string literals and pre-defined values
.bss/.zinit | READ WRITE | Zero-init | Uninitialized global static variables
.data | READ WRITE | Initialized | Global variables, static variables
STACK | READ WRITE | Call stack | Local function vars
HEAP | READ WRITE | Malloc() |
https://en.wikipedia.org/wiki/Data_segment
Estimate Instruction Count
Exercise: Estimate the number of instructions executed via profiling and via counting instructions. Are they close?
Estimate Clocks Per Instruction (CPI)
Method 1:
CPI = (Execution Time * Clock Frequency) / Instruction Count
(12ms * 8MHz) / x = ???
Hard to count x; a reasonable CPI is ~1-2
Method 2:
CPI = weighted average of instruction types
Instruction | Cycle Count
HAL_GPIO_TogglePin:
LDR R2, [R0,#0x14] | 2
EORS R2, R1 | 1
STR R2, [R0,#0x14] | 2
BX LR | 1 + (Pipeline refill 1-3)
TOTAL instructions: 4 | Total cycles: 6-8
CPI 1.5 - 2
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/CHDDIGAC.html
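Both methods reduce to cycles divided by instructions. The sketch below is an illustration with hypothetical function names (cpi_from_time, cpi_from_cycles); it takes the slide's totals as given rather than re-deriving them from the per-instruction counts.

```c
#include <assert.h>
#include <stdint.h>

/* Method 1: CPI from a profiled duration.
 * Cycles elapsed = exec_time_s * clock_hz; divide by instructions run. */
static double cpi_from_time(double exec_time_s, double clock_hz,
                            uint32_t instruction_count)
{
    assert(instruction_count > 0u);
    return (exec_time_s * clock_hz) / (double)instruction_count;
}

/* Method 2: CPI from a hand count of cycles over a listing. */
static double cpi_from_cycles(uint32_t total_cycles,
                              uint32_t instruction_count)
{
    assert(instruction_count > 0u);
    return (double)total_cycles / (double)instruction_count;
}
```

With the slide's totals, cpi_from_cycles(6, 4) is 1.5 and cpi_from_cycles(8, 4) is 2.0. Method 1 still needs the instruction count x, which is the part that is hard to obtain for a 12ms span.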
Instruction Counting versus Time
Counting/CPI:
Tedious counting
Non-representative result
Not required in most cases
Works best in small encapsulated functions() with limited call stack depth
Use when concerned with nanoseconds.
Time:
Look at average execution times, not instructions
Breakdown only to the level of detail required
Best for large procedures and subsystems
Use when at microsecond and millisecond scope
Lab2B – More Granularity
Go to project Options and in C/C++, change
LAB2 -> LAB2B
Exercise:
Determine where processing time is spent
Toggle on each f() call in task
Observe:
LCD calls are long
LCD Clear
LCD Display
Now for something interesting
thrIsrLED() includes:
Creates sliding averaging window structure
Collects Gyro data sample with each button press.
Calculates magnitude of the 3 axes – just for fun.
Adds it to a sliding averaging window
Prints out average of the window on LCD screen.
[Diagram: EXTI GPIO ISR -> Signal Set -> thrIsrLED (Sample trigger)]
Lab 3 – Measure Data Processing Time
Go to project Options and in C/C++, change LAB2 -> LAB3
Exercise:
Measure code in terms of time and size
Utilize compiler listings under IDA
Observations:
Profiler time goes up (8ms-14ms to 11ms-15ms)
Algorithm choices are poor
Bytes By the Numbers
Code
Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
Main.o | 868+124+88 | 778+62+88 | 822+72+88 | 1008+136+88 | - | -
thrIsrLED() | 18 | 24 | 54 | 140 | - | -
Slidingwindow.o | - | - | - | 154 | - | -
SWInsert&Ave() | - | - | - | 92 | - | -
Total | 27572 | 27576 | 27648 | 27988 | - | -
RW Data
Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
Main.o | 21 | 21 | 28 | 21+224 | - | -
thrIsrLED() | - | - | - | 224 | - | -
Slidingwindow.o | - | - | - | 0 | - | -
Total | 240 | 240 | 244 | 240 | - | -
Using IDA
Select “New” to disassemble a new file.
Open “Optimizer.axf”
Change Processor Type to ARM Little Endian
Select “Ok” and “yes” for everything else
LAB 3 - thrIsrLED
ER_IROM1:08005438 thrIsrLED
……
ER_IROM1:08005446 loc_8005446 ; CODE XREF: thrIsrLED+8Aj
ER_IROM1:08005446 MOV.W R2, #0xFFFFFFFF ; millisec
ER_IROM1:0800544A MOVS R1, #1 ; signals
ER_IROM1:0800544C MOV R0, SP ; retstr
ER_IROM1:0800544E BL osSignalWait
ER_IROM1:08005452 ADD R0, SP, #0x30+GyroBuffer ; pfData
ER_IROM1:08005454 BL BSP_GYRO_GetXYZ
ER_IROM1:08005458 VLDR S0, [SP,#0x30+GyroBuffer]
ER_IROM1:0800545C VMUL.F32 S0, S0, S0
ER_IROM1:08005460 VCVTR.S32.F32 S0, S0
ER_IROM1:08005464 VMOV R1, S0
ER_IROM1:08005468 VLDR S0, [SP,#0x30+GyroBuffer+4]
ER_IROM1:0800546C VMUL.F32 S0, S0, S0
ER_IROM1:08005470 VCVTR.S32.F32 S0, S0
ER_IROM1:08005474 VMOV R2, S0
ER_IROM1:08005478 ADD R2, R1
ER_IROM1:0800547A VLDR S0, [SP,#0x30+GyroBuffer+8]
ER_IROM1:0800547E VMUL.F32 S0, S0, S0
ER_IROM1:08005482 VCVTR.S32.F32 S0, S0
ER_IROM1:08005486 VMOV R1, S0
ER_IROM1:0800548A ADDS R0, R2, R1 ; val
ER_IROM1:0800548C BL sq_rt
ER_IROM1:08005490 SXTH R4, R0
ER_IROM1:08005492 MOV R1, R4 ; sample
ER_IROM1:08005494 LDR R0, =average_gyro ; pwindow
ER_IROM1:08005496 BL SlidingWindowInsertSampleAndUpdateAverage16
ER_IROM1:0800549A MOVS R0, #0
……
ER_IROM1:080054C2 B loc_8005446

void thrIsrLED(void const *argument) {
    uint8_t strbuff[20];
    SlidingWindow16Init(&average_gyro, AVERAGE_WINDOW_SIZE, windowbuffer);
    float GyroBuffer[3];
    for (;;) {
        osSignalWait(0x0001, osWaitForever);
        //1. Get sample and calculate the magnitude
        BSP_GYRO_GetXYZ(GyroBuffer);
        int16_t sample = MAGNITUDE_3AXIS(GyroBuffer[AXIS_DIR__X], GyroBuffer[AXIS_DIR__Y], GyroBuffer[AXIS_DIR__Z]);
        //2. Process sample
        SlidingWindowInsertSampleAndUpdateAverage16(&average_gyro, sample);
        int16_t ave = 0;
        SlidingWindowGetAverage16(&average_gyro, &ave);
        //3. Print Average result
        BSP_LCD_GLASS_Clear();
        /* Get the current menu */
        sprintf((char *)strbuff, "%d", ave);
        BSP_LCD_GLASS_DisplayString((uint8_t *)strbuff);
        LED5_PROFILE__STOP;
    }
}
Revisit Estimate Instruction Count
Exercise: Is utilizing IDA to count instructions
effective?
Quick Tips
Big O notation matters:
Iteration improvements pay off big
Most compilers are smart enough to optimize if you tell them.
Math Consideration:
Data type matters
Division becomes >> operations.
Use powers of 2 for buffer sizes on averaging windows.
Skip % operations. They usually become some version of / and are expensive.
While/subtraction loops can be faster in some cases.
Pow(), sqrt(), and math.h are expensive. Focus on “good enough”
NOTE: some of these will appear in the next labs
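The power-of-2 and % tips can be illustrated with an averaging window. This is a minimal sketch and not the lab's SlidingWindow module: window_t, window_insert and the WIN_* constants are hypothetical names. The index wraps with a mask instead of %, and a running sum means each insert touches one slot instead of re-adding the whole buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical window, not the lab's SlidingWindow module. */
#define WIN_SHIFT 3u                       /* 2^3 = 8 samples */
#define WIN_SIZE  (1u << WIN_SHIFT)
#define WIN_MASK  (WIN_SIZE - 1u)

typedef struct {
    int16_t  samples[WIN_SIZE];            /* zero-initialize before use */
    int32_t  sum;                          /* running sum of the window */
    uint32_t index;                        /* next slot to overwrite */
} window_t;

/* Insert a sample and return the updated average.
 * Until the window has filled once, the untouched slots count as zeros. */
static int16_t window_insert(window_t *w, int16_t sample)
{
    assert(w != 0);
    w->sum -= w->samples[w->index];        /* drop the oldest sample */
    w->samples[w->index] = sample;
    w->sum += sample;
    w->index = (w->index + 1u) & WIN_MASK; /* wrap without % */
    /* Dividing by a constant power of two lets the compiler use a
     * shift-based sequence instead of a general divide. */
    return (int16_t)(w->sum / (int32_t)WIN_SIZE);
}
```

The cost of this choice is flexibility: the window length can only be 2, 4, 8, 16 and so on, and the running sum adds 4 bytes of state per window.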
RAM Optimization
Symptoms:
You are out of space and can’t link
Keep blowing your stack/heap (i.e. things are suddenly in the weeds or weird values)
malloc() keeps failing.
Solutions:
Look at your local variables inside functions.
Remove debug helper variables.
Reduce stack size if possible
Alter your memory map
Inputs on stack versus send pointer to struct
Referencing globals and static variables
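The "inputs on stack versus send pointer to struct" item can be illustrated as below. This is a sketch under assumptions: sample_block_t, sum_by_value and sum_by_pointer are hypothetical names, and exactly where a by-value argument is placed depends on the compiler and ABI, though for a struct this large it is typically copied to the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload: 32 samples of 2 bytes each, 64 bytes in total. */
typedef struct {
    int16_t samples[32];
} sample_block_t;

/* By value: each call hands the callee its own copy of all 64 bytes,
 * which on most ABIs lands on the stack. */
static int32_t sum_by_value(sample_block_t block)
{
    int32_t sum = 0;
    for (size_t i = 0; i < 32u; i++) {
        sum += block.samples[i];
    }
    return sum;
}

/* By pointer: only an address is passed; const documents read-only use. */
static int32_t sum_by_pointer(const sample_block_t *block)
{
    assert(block != NULL);
    int32_t sum = 0;
    for (size_t i = 0; i < 32u; i++) {
        sum += block->samples[i];
    }
    return sum;
}
```

Both functions return the same result; the difference shows up in the call graph's stack depth, which matters most when the call sits inside a task with a small stack.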
Lab 4 – Lower RAM footprint
Go to project Options and in C/C++, change
LAB3 -> LAB4
Exercise:
Role of global, static, and local variables in RAM footprint - what happens as they shift attributions?
Tradeoffs in hiding variables in the call stacks
Select the right stack size for your task
Observation:
Smaller .DATA segment
Decreased algorithm size
Bytes By the Numbers
Code
Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
Main.o | 868+124+88 | 874+124+88 | 880+124+88 | 1008+136+88 | 1000+128+88 | -
thrIsrLED() | 18 | 24 | 54 | 140 | 140 | -
Slidingwindow.o | - | - | - | 154 | 146 | -
SWInsert&Ave() | - | - | - | 92 | 84 | -
Total | 27572 | 27576 | 27648 | 27988 | 27972 | -
RW Data
Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
Main.o | 21 | 21 | 28 | 21+224 | 21 | -
thrIsrLED() | - | - | - | 224 | Stack! | -
Slidingwindow.o | - | - | - | 0 | 0 | -
Total | 240 | 240 | 244 | 240 | 240 | -
Deeper Code Space Optimization
Reduce number of instructions
Use listing file
Check stack and heap usage *
Use compiler flags
Use processor intrinsics
Lab 5 – More Code Space Optimizer Through Toolchain
Go to project Options and in C/C++, change
LAB4 -> LAB5
Exercise:
Optimize only SlidingWindowAverage() with intrinsics.
Use smarter math operation selections. Is there a tradeoff?
Observation:
Check current size on listing
Bytes By the Numbers
Code
Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
Main.o | 868+124+88 | 874+124+88 | 880+124+88 | 1008+136+88 | 1000+128+88 | 1000+130+88
thrIsrLED() | 18 | 24 | 54 | 140 | 140 | 138
Slidingwindow.o | - | - | - | 154 | 146 | 136
SWInsert&Ave() | - | - | - | 92 | 84 | 72
Total | 27572 | 27576 | 27648 | 27988 | 27972 | 27960
RW Data
Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
Main.o | 21 | 21 | 28 | 21+224 | 21 | 21
thrIsrLED() | - | - | - | 224 | Stack! | Stack!
Slidingwindow.o | - | - | - | 0 | 0 | 0
Total | 240 | 240 | 244 | 240 | 240 | 240
Open Lab - Things to try
Unroll loop (need to standardize buffer size)
Write a better square root function using a lookup table.
Actually turn on Optimizer for space or speed
Check call graph for deepest stack usage
Turn on RTOS run time stat feature in FreeRTOSConfig.h file
Customize map file
Who can get the MOST EFFICIENT CODE?
Hints:
• #pragma unroll [(n)]
• Hint: Look up table
• Type –Os3
• Optimizer.htm
• configGENERATE_RUN_TIME_STATS
• HINT: remove PADDING
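One possible starting point for the lookup-table square root is sketched below. It is an illustration only, not the lab's sq_rt function: squares, squares_init and isqrt16 are hypothetical names, the input is assumed to fit in 16 bits, and the table is filled at startup so the sketch stays short. On a target it would more likely be a const table placed in ROM.

```c
#include <assert.h>
#include <stdint.h>

/* Table of i*i for i in 0..255; 255*255 = 65025 still fits in 16 bits.
 * Filled at startup here; a const ROM table would avoid the RAM cost. */
static uint16_t squares[256];
static int squares_ready;

static void squares_init(void)
{
    for (uint32_t i = 0u; i < 256u; i++) {
        squares[i] = (uint16_t)(i * i);
    }
    squares_ready = 1;
}

/* Integer square root, rounded down, by binary search over the table:
 * at most 8 table reads and compares, with no multiply or divide. */
static uint8_t isqrt16(uint16_t v)
{
    assert(squares_ready);
    uint32_t lo = 0u;
    uint32_t hi = 255u;
    while (lo < hi) {
        uint32_t mid = (lo + hi + 1u) / 2u;  /* round up so lo advances */
        if (squares[mid] <= v) {
            lo = mid;                        /* mid is still a candidate */
        } else {
            hi = mid - 1u;                   /* mid is too large */
        }
    }
    return (uint8_t)lo;
}
```

The trade is 512 bytes of table against the cycles of a library sqrt(), which is the kind of space versus time decision the byte tables in this deck are meant to track.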
@rebelbotJen
@rebelbots
www.rebelbot.com

Squeezing Blood From a Stone V1.2

  • 1. Getting Back Memory and Performance Jen Costillo While you wait, download: http://tinyurl.com/nha7853 V1.2 release
  • 2. Why Optimize? Lower memory -> cheaper BOM Lower Memory footprint Low RAM footprint PerformancePerformance Bottlenecks Power considerations Maintenance Sometime your compiler can’t do everything. Sometime you want a challenge 11/15/2015 2Costillo Rebelbot
  • 3. Things Covered. Basics of: profiling tools, code optimization, RAM optimization. Intro to tools: Keil, IDA, map files, compiler/linker documentation secrets.
  • 4. Things Not Covered (but are pretty cool): virtual memory; caching for speed; branch optimization; processor pipeline considerations; specifics of a particular processor family. Instead, focus on where to find the info to accomplish the goal.
  • 5. Performance: speed of execution; resolve bottlenecks; meeting design guidelines; meet power consumption model. [Figure: space versus time trade-off]
  • 6. Maintenance: refactoring; code hygiene. Primarily an avoidance mechanism.
  • 7. Lab 0. Install toolchain (Keil). Download project (GitHub). Install ST-Link software and driver. STM32L476 Discovery board. Hook up scope (optional; /Utilities/PC SW/Saleae): PB2, PE8, PA0. Make sure it compiles with "LAB0" defined in project Options -> C/C++ -> Preprocessor Symbols -> Define. WARNING: Keil may decide additional packages are needed.
  • 8. Build and Load in Keil. Open workspace: Project menu -> Open Project; select Optimizer.uvprojx. Change Options -> C/C++ -> Preprocessor Symbols -> Define. Rebuild: F7. Debug: Ctrl+F5. Run: F5.
  • 9. Program Structure. [Diagram: MainThread (500 ms timer) -> Signal Set -> thrLED (toggle LED4); Joystick Select press -> EXTI GPIO ISR -> Signal Set -> thrIsrLED (sample trigger)]
  • 10. Before You Dive In. Don't optimize too early. Keep a baseline. Profiling: keep track of memory utilization; leverage tools with the compiler toolchain/IDE; create your own profiling systems.
  • 12. Performance Measurements
      1. Model: what are you expecting them to be? (a) tasks, (b) RTOS task switching, (c) ISRs.
      2. Measure: (1) review compiler listings and map files, (2) home-grown profiling tools, (3) leverage simulators, (4) IDE/RTOS toolchain tools.
      3. Modify (basic tools): (1) leverage toolchain intrinsics, (2) utilize the compiler optimizer, (3) use Big-O to improve code structure, (4) count and shrink instruction count, (5) assembly based on processor pipeline knowledge.
  • 13. Measurements Interface Tradeoffs
      Type | Pro | Cons
      Logging-based (all) | Human readable | -
      Serial or other live stream | Happens in "real" time | Serial port is slow; overhead is high; can disrupt execution order
      Circular buffer | Faster than serial | Need extraction tool; not reading in real time; limited data
      File system with extraction tool | Stays until you extract it | Size is limited by allocation; requires extraction tool; not reading in real time
      HW-based (all) | Execution disruption is low | Need to decode for readability
      GPIOs | Setup is low | Potentially high pin count
      PWMs | Low pin count | Overhead can be high; can be painful without spectrum analyzer
      DAC | Low pin count; 2^n bits of event levels | Need oscilloscope
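The circular-buffer row of the table above can be sketched roughly as follows. This is a minimal illustration, not the workshop's profiler module: the names `profile_log_t`, `profile_log_record`, and `PROFILE_LOG_SIZE` are hypothetical, and the timestamp is assumed to come from whatever tick or cycle counter the target provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log depth; a power of two so the index wraps with a mask. */
#define PROFILE_LOG_SIZE 8u

typedef struct {
    uint32_t timestamp;  /* caller-supplied tick or cycle count */
    uint16_t event_id;   /* caller-defined event code */
} profile_event_t;

typedef struct {
    profile_event_t events[PROFILE_LOG_SIZE];
    uint32_t head;       /* total number of events ever recorded */
} profile_log_t;

void profile_log_init(profile_log_t *log)
{
    log->head = 0;
}

/* Record one event; the oldest entry is overwritten once the log is full. */
void profile_log_record(profile_log_t *log, uint32_t timestamp, uint16_t event_id)
{
    profile_event_t *slot = &log->events[log->head & (PROFILE_LOG_SIZE - 1u)];
    slot->timestamp = timestamp;
    slot->event_id = event_id;
    log->head++;
}

/* Number of entries that are still available for extraction. */
uint32_t profile_log_count(const profile_log_t *log)
{
    return (log->head < PROFILE_LOG_SIZE) ? log->head : PROFILE_LOG_SIZE;
}

/* Return the i-th oldest surviving event, or NULL if i is out of range. */
const profile_event_t *profile_log_get(const profile_log_t *log, uint32_t i)
{
    uint32_t count = profile_log_count(log);
    if (i >= count) {
        return NULL;
    }
    uint32_t oldest = log->head - count;
    return &log->events[(oldest + i) & (PROFILE_LOG_SIZE - 1u)];
}
```

Recording is only a couple of stores and an increment, which is why this tends to disturb timing less than a serial print. The cost is the one listed in the table: data is limited to the last `PROFILE_LOG_SIZE` events and has to be extracted later with a debugger or a dump routine.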
  • 14. Modification Improvement Strategies
      Type | Method | Impact
      Algorithm efficiency (function) | Review your Big O(n); leverage preprocessor intrinsics; count instructions and write new code | ROM, scale, speed
      Code size (function) | Utilize optimizer flags; C/assembly based on processor knowledge | ROM, processor pipeline; target size or call frequency
      Memory usage | Leverage compiler intrinsics; utilize optimizer or linker flags | RAM: stack, heap
      Memory location | - | RAM
  • 15. Profiling LED Sample Task. [Diagram: MainThread (500 ms timer) -> Signal Set -> thrLED (toggle LED4); EXTI GPIO ISR -> Signal Set -> thrIsrLED (sample trigger)]
  • 16. Lab1 - Make Profiler Module. Go to project Options and in C/C++, change LAB0 -> LAB1. Load Saleae logic settings in /Saleae (optional). Exercise: how to improve. Observe: measurement accuracy.
  • 17. Lab2 - Improve Profiler Module. Go to project Options and in C/C++, change LAB1 -> LAB2. Exercise: count instruction cycles. Observe: 8 MHz processor speed; ~56 us interrupt delay (448 cycles); ~12 ms blip (~96k cycles).
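The cycle figures on this slide follow from cycles = time x clock frequency. A small sketch of that arithmetic, assuming times measured in microseconds on the scope; the helper name `cycles_from_us` is made up for illustration:

```c
#include <stdint.h>

/* Convert a measured duration into CPU cycles: cycles = time * frequency.
 * time_us is in microseconds, clock_hz in Hz. The 64-bit intermediate
 * avoids overflow for long durations or fast clocks. */
uint64_t cycles_from_us(uint32_t time_us, uint32_t clock_hz)
{
    return ((uint64_t)time_us * clock_hz) / 1000000u;
}
```

With the 8 MHz clock from the lab, `cycles_from_us(56, 8000000)` gives 448 and `cycles_from_us(12000, 8000000)` gives 96000, matching the observations above.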
  • 18. Estimate Instruction Count. Exercise: estimate the number of instructions, if you dare.
  • 19. Intro to reading a map file. Cross reference: location of data/functions. Symbol table: size in section. Memory map: memory section. Image component sizes: by module. Callgraph: depth of stack usage*. Summary. (*Keil uses a separate .htm file.)
  • 20. Image Breakdown
      Segment | Access | Data type | Contains
      .text/.code | READ ONLY | Code | Functions, const, string literals and pre-defined values
      .bss/.zinit | READ WRITE | Zero-init | Uninitialized global and static variables
      .data | READ WRITE | Initialized | Global variables, static variables
      STACK | READ WRITE | Call stack | Local function vars
      HEAP | READ WRITE | - | malloc()
      Source: https://en.wikipedia.org/wiki/Data_segment
  • 21. Estimate Instruction Count. Exercise: estimate the number of instructions executed, once via profiling and once via counting instructions. Are they close?
  • 22. Estimate Clocks Per Instruction (CPI)
      Method 1: CPI = (Execution Time * Clock Frequency) / Instruction Count. Example: (12 ms * 8 MHz) / x = ??? The instruction count x is hard to obtain by counting; a reasonable CPI is ~1-2.
      Method 2: CPI = weighted average of instruction types. Example, HAL_GPIO_TogglePin:
        Instruction | Cycle count
        LDR R2, [R0,#0x14] | 2
        EORS R2, R1 | 1
        STR R2, [R0,#0x14] | 2
        BX LR | 1 + (pipeline refill 1-3)
        Total instructions: 4. Total cycles: 6-8. CPI: 1.5-2.
      Source: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/CHDDIGAC.html
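Both CPI methods reduce to "total cycles divided by instructions executed". The sketch below works through them under stated assumptions: the function names are hypothetical, the instruction count of 64000 in the usage note is an invented number (the slide leaves it as x), and the per-instruction cycle counts are the ones listed for HAL_GPIO_TogglePin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Method 1: CPI from a measured execution time.
 * cycles = time * frequency, then CPI = cycles / instructions. */
double cpi_from_time(uint32_t time_us, uint32_t clock_hz, uint32_t instruction_count)
{
    assert(instruction_count > 0);
    uint64_t cycles = ((uint64_t)time_us * clock_hz) / 1000000u;
    return (double)cycles / (double)instruction_count;
}

/* Method 2: CPI as the average of per-instruction cycle counts.
 * extra_cycles models penalties such as a pipeline refill after a branch. */
double cpi_from_cycle_table(const uint8_t *cycles_per_instruction, size_t count,
                            uint32_t extra_cycles)
{
    assert(count > 0);
    uint32_t total = extra_cycles;
    for (size_t i = 0; i < count; i++) {
        total += cycles_per_instruction[i];
    }
    return (double)total / (double)count;
}
```

For example, with the table `{2, 1, 2, 1}` (LDR, EORS, STR, BX) and no refill penalty, `cpi_from_cycle_table` returns 1.5; with two refill cycles it returns 2.0. For Method 1, a 12 ms run at 8 MHz that executed a hypothetical 64000 instructions would also come out at 1.5.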
  • 23. Instruction Counting versus Time
      Counting/CPI: tedious; non-representative result; not required in most cases; works best in small encapsulated functions() with limited call stack depth; use when concerned with nanoseconds.
      Time: look at average execution times, not instructions; break down only to the level of detail required; best for large procedures and subsystems; use when at microsecond and millisecond scope.
  • 24. Lab2B – More Granularity. Go to project Options and in C/C++, change LAB2 -> LAB2B. Exercise: determine where processing time is spent; toggle on each f() call in the task. Observe: LCD calls are long (LCD Clear, LCD Display).
  • 26. Now for something interesting. thrIsrLED() includes: creating a sliding averaging window structure; collecting a gyro data sample with each button press; calculating the magnitude of the 3 axes, just for fun; adding it to a sliding averaging window; printing the average of the window on the LCD screen. [Diagram: EXTI GPIO ISR -> Signal Set -> thrIsrLED (sample trigger)]
  • 27. Lab 3 – Measure Data Processing Time. Go to project Options and in C/C++, change LAB2 -> LAB3. Exercise: measure code in terms of time and size; utilize compiler listings under IDA. Observations: profiler time goes up (8 ms-14 ms -> 11 ms-15 ms); algorithm choices are poor.
  • 28. Bytes By the Numbers
      Code:
      Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
      Main.o | 868 + 124 + 88 | 778 + 62 + 88 | 822 + 72 + 88 | 1008 + 136 + 88 | - | -
      thrIsrLED() | 18 | 24 | 54 | 140 | - | -
      Slidingwindow.o | - | - | - | 154 | - | -
      SWInsert&Ave() | - | - | - | 92 | - | -
      Total | 27572 | 27576 | 27648 | 27988 | - | -
      RW Data:
      Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
      Main.o | 21 | 21 | 28 | 21 + 224 | - | -
      thrIsrLED() | - | - | - | 224 | - | -
      Slidingwindow.o | - | - | - | 0 | - | -
      Total | 240 | 240 | 244 | 240 | - | -
  • 29. Using IDA. Select "New" to disassemble a new file. Open "Optimizer.axf". Change Processor Type to ARM Little Endian. Select "Ok" and "yes" for everything else.
  • 30. LAB 3 - thrIsrLED
      Disassembly (IDA):
        ER_IROM1:08005438 thrIsrLED
        ......
        ER_IROM1:08005446 loc_8005446 ; CODE XREF: thrIsrLED+8Aj
        ER_IROM1:08005446 MOV.W R2, #0xFFFFFFFF ; millisec
        ER_IROM1:0800544A MOVS R1, #1 ; signals
        ER_IROM1:0800544C MOV R0, SP ; retstr
        ER_IROM1:0800544E BL osSignalWait
        ER_IROM1:08005452 ADD R0, SP, #0x30+GyroBuffer ; pfData
        ER_IROM1:08005454 BL BSP_GYRO_GetXYZ
        ER_IROM1:08005458 VLDR S0, [SP,#0x30+GyroBuffer]
        ER_IROM1:0800545C VMUL.F32 S0, S0, S0
        ER_IROM1:08005460 VCVTR.S32.F32 S0, S0
        ER_IROM1:08005464 VMOV R1, S0
        ER_IROM1:08005468 VLDR S0, [SP,#0x30+GyroBuffer+4]
        ER_IROM1:0800546C VMUL.F32 S0, S0, S0
        ER_IROM1:08005470 VCVTR.S32.F32 S0, S0
        ER_IROM1:08005474 VMOV R2, S0
        ER_IROM1:08005478 ADD R2, R1
        ER_IROM1:0800547A VLDR S0, [SP,#0x30+GyroBuffer+8]
        ER_IROM1:0800547E VMUL.F32 S0, S0, S0
        ER_IROM1:08005482 VCVTR.S32.F32 S0, S0
        ER_IROM1:08005486 VMOV R1, S0
        ER_IROM1:0800548A ADDS R0, R2, R1 ; val
        ER_IROM1:0800548C BL sq_rt
        ER_IROM1:08005490 SXTH R4, R0
        ER_IROM1:08005492 MOV R1, R4 ; sample
        ER_IROM1:08005494 LDR R0, =average_gyro ; pwindow
        ER_IROM1:08005496 BL SlidingWindowInsertSampleAndUpdateAverage16
        ER_IROM1:0800549A MOVS R0, #0
        ......
        ER_IROM1:080054C2 B loc_8005446
      C source:
        void thrIsrLED(void const *argument) {
            uint8_t strbuff[20];
            SlidingWindow16Init(&average_gyro, AVERAGE_WINDOW_SIZE, windowbuffer);
            float GyroBuffer[3];
            for (;;) {
                osSignalWait(0x0001, osWaitForever);
                //1. Get sample and calculate the magnitude
                BSP_GYRO_GetXYZ(GyroBuffer);
                int16_t sample = MAGNITUDE_3AXIS(GyroBuffer[AXIS_DIR__X], GyroBuffer[AXIS_DIR__Y], GyroBuffer[AXIS_DIR__Z]);
                //2. Process sample
                SlidingWindowInsertSampleAndUpdateAverage16(&average_gyro, sample);
                int16_t ave = 0;
                SlidingWindowGetAverage16(&average_gyro, &ave);
                //3. Print Average result
                BSP_LCD_GLASS_Clear();
                /* Get the current menu */
                sprintf((char *)strbuff, "%d", ave);
                BSP_LCD_GLASS_DisplayString((uint8_t *)strbuff);
                LED5_PROFILE__STOP;
            }
        }
  • 31. Revisit Estimate Instruction Count. Exercise: is utilizing IDA to count instructions effective?
  • 32. Quick Tips
      Big O notation matters: iteration improvements pay off big.
      Most compilers are smart enough to optimize if you tell them.
      Skip % operations. They usually become some version of / and are expensive. While/subtraction loops can be faster in some cases.
      Math considerations: data type matters. Division becomes >> operations: use powers of 2 for buffer sizes on averaging windows. pow(), sqrt(), and math.h are expensive. Focus on "good enough".
      NOTE: some of these will appear in the next labs.
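The power-of-2 and "skip %" tips can be illustrated with a small averaging window. This is a sketch under assumptions, not the workshop's Slidingwindow module: `pow2_window_t`, `pow2_window_insert`, and the window size of 8 are hypothetical names and values chosen for the example.

```c
#include <stdint.h>

/* Hypothetical window size: 2^3 = 8 samples. */
#define WINDOW_SHIFT 3u
#define WINDOW_SIZE  (1u << WINDOW_SHIFT)
#define WINDOW_MASK  (WINDOW_SIZE - 1u)

typedef struct {
    int16_t  samples[WINDOW_SIZE];
    int32_t  sum;    /* running sum, so no loop is needed per insert */
    uint32_t index;  /* next slot to overwrite */
} pow2_window_t;

void pow2_window_init(pow2_window_t *w)
{
    for (uint32_t i = 0; i < WINDOW_SIZE; i++) {
        w->samples[i] = 0;
    }
    w->sum = 0;
    w->index = 0;
}

/* Insert one sample and return the updated average. */
int16_t pow2_window_insert(pow2_window_t *w, int16_t sample)
{
    /* Replace the oldest sample in the running sum: O(1) instead of O(n). */
    w->sum -= w->samples[w->index];
    w->samples[w->index] = sample;
    w->sum += sample;

    /* Wrap with a mask instead of a % operation. */
    w->index = (w->index + 1u) & WINDOW_MASK;

    /* Dividing by a constant power of two lets the compiler use a shift
     * (plus a small sign fix-up) rather than a divide instruction. */
    return (int16_t)(w->sum / (int32_t)WINDOW_SIZE);
}
```

The design choice is the running sum plus a masked index: each insert costs a fixed handful of instructions regardless of window size, at the price of fixing the window length to a power of two at compile time.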
  • 34. RAM Optimization
      Symptoms: you are out of space and can't link; you keep blowing your stack/heap (i.e. things are suddenly in the weeds or show weird values); malloc() keeps failing.
      Solutions: look at your local variables inside functions; remove debug helper variables; reduce stack size if possible; alter your memory map; inputs on stack versus sending a pointer to a struct; referencing globals and static variables.
  • 35. Lab 4 – Lower RAM footprint. Go to project Options and in C/C++, change LAB3 -> LAB4. Exercise: role of global, static, and local variables in RAM footprint: what happens as they shift attributions? Tradeoffs in hiding variables in the call stacks. Select the right stack size for your task. Observation: smaller .DATA segment; decreased algorithm size.
  • 36. Bytes By the Numbers
      Code:
      Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
      Main.o | 868 + 124 + 88 | 874 + 124 + 88 | 880 + 124 + 88 | 1008 + 136 + 88 | 1000 + 128 + 88 | -
      thrIsrLED() | 18 | 24 | 54 | 140 | 140 | -
      Slidingwindow.o | - | - | - | 154 | 146 | -
      SWInsert&Ave() | - | - | - | 92 | 84 | -
      Total | 27572 | 27576 | 27648 | 27988 | 27972 | -
      RW Data:
      Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
      Main.o | 21 | 21 | 28 | 21 + 224 | 21 | -
      thrIsrLED() | - | - | - | 224 | Stack! | -
      Slidingwindow.o | - | - | - | 0 | 0 | -
      Total | 240 | 240 | 244 | 240 | 240 | -
  • 37. Deeper Code Space Optimization. Reduce number of instructions. Use listing file. Check stack and heap usage*. Use compiler flags. Use processor intrinsics.
  • 38. Lab 5 – More Code Space Optimizer Through Toolchain. Go to project Options and in C/C++, change LAB4 -> LAB5. Exercise: optimize only SlidingWindowAverage() with intrinsics; use smarter math operation selections. Is there a trade-off? Observation: check current size on listing.
  • 39. Bytes By the Numbers
      Code:
      Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
      Main.o | 868 + 124 + 88 | 874 + 124 + 88 | 880 + 124 + 88 | 1008 + 136 + 88 | 1000 + 128 + 88 | 1000 + 130 + 88
      thrIsrLED() | 18 | 24 | 54 | 140 | 140 | 138
      Slidingwindow.o | - | - | - | 154 | 146 | 136
      SWInsert&Ave() | - | - | - | 92 | 84 | 72
      Total | 27572 | 27576 | 27648 | 27988 | 27972 | 27960
      RW Data:
      Component | Lab0 | Lab1 | Lab2 | Lab3 | Lab4 | Lab5
      Main.o | 21 | 21 | 28 | 21 + 224 | 21 | 21
      thrIsrLED() | - | - | - | 224 | Stack! | Stack!
      Slidingwindow.o | - | - | - | 0 | 0 | 0
      Total | 240 | 240 | 244 | 240 | 240 | 240
  • 40. Open Lab - Things to try
      Unroll loop (need to standardize buffer size). Hint: #pragma unroll [(n)]
      Write a better square root function using a lookup table. Hint: look-up table
      Actually turn on the Optimizer for space or speed. Hint: type –Os3
      Check call graph for deepest stack usage. Hint: Optimizer.htm
      Turn on the RTOS run time stat feature in the FreeRTOSConfig.h file. Hint: configGENERATE_RUN_TIME_STATS
      Customize map file.
      Who can get the MOST EFFICIENT CODE? HINT: remove PADDING
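As one possible starting point for the square root item, here is an integer square root that avoids math.h entirely. Note that this is the classic bit-by-bit (digit-by-digit) method, not the lookup-table approach the slide hints at, and `isqrt_u32` is a hypothetical name rather than the project's `sq_rt`. It returns the floor of the square root, which fits the "good enough" advice for a magnitude that ends up in an int16_t.

```c
#include <stdint.h>

/* Integer square root, floor(sqrt(value)), using only shifts, adds,
 * and compares. No floating point and no table in ROM or RAM. */
uint32_t isqrt_u32(uint32_t value)
{
    uint32_t result = 0;
    uint32_t bit = 1u << 30;  /* highest power of four that fits in 32 bits */

    /* Start at the highest power of four not exceeding the input. */
    while (bit > value) {
        bit >>= 2;
    }

    /* Decide one result bit per iteration, from most to least significant. */
    while (bit != 0) {
        if (value >= result + bit) {
            value -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}
```

The loop runs at most 16 times for a 32-bit input, so the worst-case cost is easy to bound when counting cycles. Whether it beats a lookup table depends on the input range and how much ROM the table may use, which is exactly the trade-off the open lab asks you to measure.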