Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
1. Can I use Neural Engine to run my neural networks on A11 devices?
Koan-Sin Tan
freedom@computer.org
Hsinchu Coding Serfs Meeting, Nov. 2018
2. https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-unveiling-the-silicon-secrets/5
• AnandTech is one of my favorite tech sites. Usually, they provide
good technical analysis
• E.g., Apple’s CPUs
• cache sizes
• execution units
• various instruction latency
• Not good enough for NN accelerators on mobile phones
• floating-point VGG16, Inception V3, and ResNet34?
• come on, are you still in the Neolithic era?
ANE on A12, how about A11?
3. Why I said VGG16 is from the Neolithic Era
• Lightweight models are there
• MobileNet V1 could have roughly
the same top-1 accuracy even
with quantized uint8
• MobileNet V2 could have better
top-1 accuracy
• MnasNet could be better than
MobileNet V1
• Classification, object detection,
segmentation, etc.
• 8-bit quantization is good enough
for many cases
https://github.com/tensorflow/models/raw/master/research/slim/nets/mobilenet/madds_top1_accuracy.png
4. How to use Neural Engine
• According to Apple:
• A11: 600 G ops per second, A12: 5 T ops per second
• Yes, by default, it's enabled on A12 devices. If you have pre-iOS 12 apps built on top of Core ML, they
should be able to use it automatically. But not on A11 devices.
• How to verify it?
• MLConfiguration [1]: instance variable
@property(readwrite) MLComputeUnits computeUnits;
• there is usesCPUOnly for VNRequest in iOS 11, but nothing like MLComputeUnits
• See my example [2]
[1] https://developer.apple.com/documentation/coreml/mlmodelconfiguration?language=objc
[2] https://github.com/freedomtan/coremlbenchmark/
5. Why not VNRequest?
• Since I mentioned VNRequest in Vision.framework, why not VNCoreMLRequest?
• Yes, I wrote simple VNCoreMLRequest-based apps before, in both Swift and Objective-C
[1][2].
• It gives you a simplified interface and crops and scales images for you.
• Yes, those image operations take time.
• This actually reminds us of an important system software issue.
• Modern cellphone SoCs use DVFS and all kinds of energy-saving techniques
extensively. How can we get good performance?
• Inference with camera on is usually faster than with camera off!!!
[1] https://github.com/freedomtan/SimpleInceptionV3/
[2] https://github.com/freedomtan/SimpleInceptionV3-ObjC
6. Neural Engine in Action
• H11ANEServicesThread
• A12 is for iPhone11,x
• No H10ANEServicesThread
• So, who started H11ANEServicesThread? There is nothing named H11 in
/System/Library/Frameworks/CoreML.framework/CoreML
• It seems it’s in
/System/Library/PrivateFrameworks/ANEServices.framework/ANEServices
• A12 devices only
14. Mach-O Headers
• Mac OS X ABI Mach-O File Format Reference: no longer
available on Apple's web site, so search for a copy.
• headers: /usr/include/mach-o/loader.h
• objc runtime
• https://opensource.apple.com/source/objc4/objc4-723/, https://opensource.apple.com/tarballs/objc4/objc4-723.tar.gz
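To make the loader.h reference concrete, here is a minimal Python sketch that decodes the fixed-size mach_header_64 at the start of a thin, little-endian 64-bit Mach-O file. The field order and the two constants follow <mach-o/loader.h>; the names MachHeader64 and parse_mach_header_64 are made up for this illustration, and fat binaries are not handled.

```python
import struct
from typing import NamedTuple

MH_MAGIC_64 = 0xFEEDFACF     # magic of a 64-bit Mach-O, from <mach-o/loader.h>
CPU_TYPE_ARM64 = 0x0100000C  # CPU_TYPE_ARM | CPU_ARCH_ABI64


class MachHeader64(NamedTuple):
    """Fields of struct mach_header_64, in declaration order."""
    magic: int
    cputype: int
    cpusubtype: int
    filetype: int
    ncmds: int
    sizeofcmds: int
    flags: int
    reserved: int


def parse_mach_header_64(data: bytes) -> MachHeader64:
    """Decode a little-endian mach_header_64 from the first 32 bytes of data."""
    if len(data) < 32:
        raise ValueError("need at least 32 bytes for mach_header_64")
    # The header is eight consecutive 32-bit fields.
    header = MachHeader64(*struct.unpack_from("<8I", data, 0))
    if header.magic != MH_MAGIC_64:
        raise ValueError(f"not a thin 64-bit Mach-O: magic={header.magic:#x}")
    return header
```

Reading the first 32 bytes of a binary and passing them to parse_mach_header_64 should be enough to see the CPU type, the file type, and how many load commands follow.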
15. Dive a bit deeper into Core ML
• Frameworks and some binaries used to be shipped unstripped as part of the iPhoneOS
SDK in Xcode. Not anymore; most framework binaries are now in dyld_shared_cache.
• Fortunately, it’s quite easy to inspect the iOS file system nowadays. Apple stopped encrypting
.ipsw files with the iOS 10 beta (more than 2 years ago). So, get an .ipsw, unzip it (remember it's
a .zip file), then mount the largest .dmg (this needs extra steps on Windows and Linux
though). E.g.,
1. Get the iOS 12.0 .ipsw for iPhone Xs Max [1]. See [2] for other firmwares.
2. Unzip it.
3. Mount 048-10782-224.dmg. That's it: you can see the whole filesystem used by
iPhone Xs Max.
• Thus, we can get the /System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm*
we want
[1] http://updates-http.cdn-apple.com/2018FallFCS/fullrestores/091-65188/11BE19F6-AC8E-11E8-A312-F5CEDE149863/iPhone11,4,iPhone11,6_12.0_16A366_Restore.ipsw
[2] https://www.theiphonewiki.com/wiki/Firmware/iPhone/12.x
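The "unzip it, then mount the largest .dmg" step can be sketched in Python with the standard zipfile module. This only picks out the largest .dmg entry of the archive; mounting is still left to the OS. The function name largest_dmg is made up for this illustration.

```python
import zipfile


def largest_dmg(ipsw_path):
    """Return (name, uncompressed size) of the largest .dmg entry in an .ipsw."""
    # An .ipsw is an ordinary zip archive, so zipfile can read it directly.
    with zipfile.ZipFile(ipsw_path) as archive:
        dmgs = [info for info in archive.infolist()
                if info.filename.lower().endswith(".dmg")]
        if not dmgs:
            raise ValueError("no .dmg entries found in archive")
        biggest = max(dmgs, key=lambda info: info.file_size)
        return biggest.filename, biggest.file_size
```

Calling largest_dmg on a downloaded .ipsw should point at the root filesystem image, which can then be extracted with ZipFile.extract and mounted.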
16. Dive a bit deeper into Core ML
• If you are on macOS and have Xcode installed, there are some binaries
with symbols in
~/Library/Developer/Xcode/iOS DeviceSupport/12.1 (16B92) arm64e/
• What do I mean by “some”? E.g., there is
/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/XPCServices/ANECompilerService.xpc/ANECompilerService
on A12 devices, but not in Xcode’s support library
• Yes, we can find /System/Library/Frameworks/CoreML.framework/CoreML
• Even /System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm* is there
17. Extract binaries from dyld_shared_cache
• jtool can do it for you. E.g.,
• list
~/work/ios-hacking/tools/jtool -l /Volumes/Peace16A366.D331OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64e
• extract
~/work/ios-hacking/tools/jtool -e /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine /Volumes/Peace16A366.D331OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64e
Extracting /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine at 0x2be22000 into dyld_shared_cache_arm64e.AppleNeuralEngine
• dyld source code
• https://opensource.apple.com/source/dyld/dyld-551.4/, https://opensource.apple.com/tarballs/dyld/dyld-551.4.tar.gz
• Read dyld source and [1] for more about dyld_shared_cache
[1] https://iphonedevwiki.net/index.php/Dyld_shared_cache
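As a rough idea of what a listing such as jtool -l has to read, here is a Python sketch that walks the image table of a shared cache. It assumes the header layout of the dyld-551.4 era (a 16-byte magic, then mappingOffset, mappingCount, imagesOffset, imagesCount, with 32-byte dyld_cache_image_info entries); later cache formats moved these fields, so treat it as an illustration only. The name list_cache_images is made up.

```python
import struct


def list_cache_images(cache):
    """Return [(address, path), ...] for the images recorded in a shared cache.

    `cache` can be a bytes object or an mmap of the cache file.
    """
    magic = bytes(cache[:16]).rstrip(b"\0").decode("ascii")
    if not magic.startswith("dyld_v1"):
        raise ValueError(f"unexpected magic: {magic!r}")
    # mappingOffset, mappingCount, imagesOffset, imagesCount follow the magic.
    _, _, images_offset, images_count = struct.unpack_from("<4I", cache, 16)
    images = []
    for i in range(images_count):
        # dyld_cache_image_info: address, modTime, inode (uint64 each),
        # then pathFileOffset and pad (uint32 each) = 32 bytes per entry.
        address, _, _, path_offset, _ = struct.unpack_from(
            "<3Q2I", cache, images_offset + 32 * i)
        end = cache.find(b"\0", path_offset)
        if end == -1:
            raise ValueError("unterminated image path")
        images.append((address, bytes(cache[path_offset:end]).decode("utf-8")))
    return images
```

Since a real cache is large, mapping the file with mmap and passing the mapping in is probably more practical than reading it all into memory.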
19. kernel side
• So, how about extracting the ANE-related pieces and just putting them onto A11
devices?
• Well, look into the kernelcache of A11 and A12 devices:
• As expected, we can see lots of H11ANE information in the
A12 kernelcache
• The A11 kernelcache does mention H11ANE several
times, but it seems important modules are not there.
• So, I guess that unless we jailbreak and get root, we are out of luck!
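One simple way to check how often a name like H11ANE shows up is a `strings | grep` style scan, sketched below in Python. It assumes the kernelcache has already been decompressed into a plain binary blob (shipped kernelcaches are compressed, so a raw scan of the file from the .ipsw will likely find little). The helper name find_strings is made up for this illustration.

```python
import re


def find_strings(blob: bytes, needle: bytes, min_len: int = 4):
    """Return printable ASCII runs in blob that contain needle."""
    # Match runs of at least min_len printable ASCII characters,
    # which is roughly what the `strings` tool reports.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii")
            for m in pattern.finditer(blob)
            if needle in m.group()]
```

Running find_strings(data, b"H11ANE") on the A11 and A12 blobs and comparing the two lists is one way to see which names appear on each side.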
21. Isn’t XNU (Darwin) source code open?
• Well, there are more than 200 kernel modules, and only some of them
are open source
$ ~/work/ios-hacking/tools/jtool2 -k ../../iphonex/ipsw/kernelcache.release.iphone10b
0xfffffff00583c000:com.apple.kpi.mach
0xfffffff00583c080:com.apple.kpi.private
0xfffffff00583c100:com.apple.kpi.unsupported
0xfffffff00583c180:com.apple.kpi.iokit
0xfffffff00583c200:com.apple.kpi.libkern
0xfffffff00583c280:com.apple.kpi.bsd
0xfffffff00583c300:com.apple.iokit.IONetworkingFamily
0xfffffff00583de00:com.apple.iokit.IOTimeSyncFamily
0xfffffff0058416c0:com.apple.iokit.IOSlowAdaptiveClockingFamily
0xfffffff005841c40:com.apple.iokit.IOStorageFamily
0xfffffff005842e80:com.apple.iokit.IOReportFamily
0xfffffff005843680:com.apple.driver.AppleARMPlatform
0xfffffff00584cd80:com.apple.driver.AppleSamsungSPI
0xfffffff00584dd00:com.apple.kpi.dsep
0xfffffff00584dd80:com.apple.kec.corecrypto
…
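To search a long kext listing like the one above, a small Python sketch can parse the "address:bundle id" lines and filter by keyword. It assumes the output format shown here (one hex address, a colon, then the bundle identifier per line); the function names are made up for this illustration.

```python
def parse_kext_list(text):
    """Parse 'address:bundle.id' lines into (address, bundle_id) tuples."""
    kexts = []
    for line in text.splitlines():
        address, sep, bundle_id = line.strip().partition(":")
        # Skip the command line, blank lines, and anything without an address.
        if sep and address.startswith("0x"):
            kexts.append((int(address, 16), bundle_id))
    return kexts


def kexts_matching(kexts, keyword):
    """Return bundle ids containing keyword (case-sensitive)."""
    return [bundle_id for _, bundle_id in kexts if keyword in bundle_id]
```

Feeding the saved listings of an A11 and an A12 kernelcache through parse_kext_list and then kexts_matching(kexts, "ANE") gives a quick side-by-side of the ANE-related modules each one carries.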