Can I use Neural Engine
to run my neural networks
on A11 devices?
Koan-Sin Tan

freedom@computer.org

Hsinchu Coding Serfs Meeting, Nov 2018
https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-unveiling-the-silicon-secrets/5
• AnandTech is one of my favorite tech sites. Usually, they provide good technical analysis

• E.g., Apple’s CPUs

• cache sizes

• execution units

• various instruction latencies

• Not good enough for NN accelerators on mobile phones

• floating-point VGG16, Inception V3, and ResNet34?

• come on, are you still in the Neolithic era?
ANE on A12, how about A11?
Why I said VGG16 is
Neolithic Era
• Lightweight models are there

• MobileNet V1 could have roughly the same top-1 accuracy, even with quantized uint8

• MobileNet V2 could have better
top-1 accuracy

• MnasNet could be better than MobileNet V1

• Classification, object detection,
segmentation, etc.

• 8-bit quantization is good enough for many cases
https://github.com/tensorflow/models/raw/master/research/slim/nets/mobilenet/madds_top1_accuracy.png
How to use Neural Engine
• According to Apple:

• A11: 600 G ops per second, A12: 5 T ops per second

• Yes, by default, it's enabled on A12 devices. If you have pre-iOS 12 apps built on top of Core ML, they should be able to use it automatically. But not on A11 devices.

• How to verify it?

• MLModelConfiguration [1]: instance property

@property(readwrite) MLComputeUnits computeUnits;

• there is usesCPUOnly for VNRequest in iOS 11, but nothing like MLComputeUnits

• See my example [2]

[1] https://developer.apple.com/documentation/coreml/mlmodelconfiguration?language=objc

[2] https://github.com/freedomtan/coremlbenchmark/
Why not VNRequest?
• Since I mentioned VNRequest in Vision.framework, why not VNCoreMLRequest?

• Yes, I wrote simple VNCoreMLRequest-based apps before, in both Swift and Objective-C [1][2].

• It offers a simplified interface and crops and scales images for you.

• Yes, those image operations take time.

• This actually reminds us of an important system software issue.

• Modern cellphone SoCs use DVFS and all kinds of energy-saving techniques extensively. How can we get good performance?

• Inference with camera on is usually faster than with camera off!!!
[1] https://github.com/freedomtan/SimpleInceptionV3/

[2] https://github.com/freedomtan/SimpleInceptionV3-ObjC
Neural Engine in Action
• H11ANEServicesThread

• A12 is for iPhone11,x

• No H10ANEServicesThread

• So, who started H11ANEServicesThread? There is nothing named H11 in /System/Library/Frameworks/CoreML.framework/CoreML

• It seems it’s in /System/Library/PrivateFrameworks/ANEServices.framework/ANEServices
• A12 devices only
iPhone Xs Max
default 17:17:14.002705 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) :
H11ANEIn::ANE_ProgramDestroy_gated WARN: Freeing intermediate buffer inside ProcessDestroy
default 17:17:14.004821 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) :
H11ANE:ANE_ProcessDestroy_gated Removed client aned from programHandle=0x8a03aa2e112. Num clients for program=0
default 17:17:14.004938 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) :
H11ANEIn::ANE_ProgramDestroy_gated WARN: Freeing intermediate buffer inside ProcessDestroy
default 17:17:14.011142 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) :
H11ANE:ANE_ProcessDestroy_gated Removed client aned from programHandle=0x8a02e50c71e. Num clients for program=0
default 17:17:14.011358 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) :
H11ANEIn::ANE_ProgramDestroy_gated WARN: Freeing intermediate buffer inside ProcessDestroy
default 17:17:14.024969 +0800 kernel IOReturn H11ANEInUserClient::ANE_PowerOff() - client aned requesting Power Off
default 17:17:14.025291 +0800 kernel IOReturn H11ANEIn::setPowerStateGated(unsigned long, IOService *) : H11ANEIn::setPowerStateGated: 0
default 17:17:14.026850 +0800 kernel IOReturn H11ANEIn::ANE_deInit() : H11ANEIn::ANE_deInit - CSNE_CMD_POWER_DOWN command completed:
res=0x00000000
default 17:17:14.026880 +0800 kernel IOReturn H11ANEIn::ANE_deInit() : H11ANEIn::ANE_deInit - ANECPU in WFI after CSNE_CMD_SUSPEND/
CISP_CMD_POWER_DOWN. retries=0 ASCWRAP_IDLE_STATUS = 0x2d
default 17:17:14.039520 +0800 kernel IOReturn H11ANEIn::ANE_HandlePowerStateChecksForClient() : INFO: H11ANEIn: ANE power status:
isPowered: 0, fDeInitInProgress: 0, fFirmwareTimeout: 0
default 17:17:14.039563 +0800 kernel IOReturn H11ANEIn::ANE_UserClientCleanup_gated(void *) : Info: H11ANEIn: Skipping user client
cleanup for client (<private>) as power is already off
default 17:17:14.039723 +0800 kernel virtual IOReturn H11ANEInUserClient::clientClose() - aned
default 17:17:14.039749 +0800 kernel virtual void H11ANEInUserClient::free() - Freeing UserClient for process: aned (pid 191)
iPhone 8 Plus
default 17:08:51.256253 +0800 kernel ISPCPU: CmdTurnOffDevicePower: TS: 2.901495 Disable CAM0_SHUTDOWN=0
default 17:08:51.256277 +0800 kernel ISPCPU: Addr: 0x00000002122a8000
default 17:08:51.258444 +0800 kernel ISPCPU: TurnOffPower:DONE TS: 2.903766 rail: 0x5, ch: 0, cameraPowerBitEnable:
0x7e
default 17:08:51.258684 +0800 kernel AppleH10CamIn::ISP_PPMAdmissionCheck_gated: subClientID=1; budgetReq=0;
budgetAlloc=0; result=0x00000000
default 17:08:51.258726 +0800 kernel AppleH10CamIn::ISP_StopCamera_gated: subClientID=1; channel=0; budgetReq=0;
budgetAlloc=0; result=0x00000000, numPreviewFrames=72, numStillCaptureFrames:0
default 17:08:51.258813 +0800 kernel ISPCPU: [ISP: 2.904275] CH = 0 CMD = 0x0104 [CISP_CMD_CH_BUFFER_RETURN]
default 17:08:51.266156 +0800 kernel AppleH10CamIn::ISP_FlushInactiveDARTMappings: 0x00000000
default 17:08:51.266234 +0800 mediaserverd H10ISPServicesRemote: SetProperty 2 (sent)
default 17:08:51.267404 +0800 mediaserverd H10ISPServicesRemote: SetProperty 2 (reply=0x00000000)
default 17:08:51.272115 +0800 kernel ISPCPU: [ISP: 2.917542] CH = 0 CMD = 0x820b
[CISP_CMD_APPLE_CH_AE_TILES_MATRIX_METADATA_ENABLE]
default 17:08:51.273311 +0800 kernel ISPCPU: [ISP: 2.918641] CH = 0 CMD = 0x0130 [CISP_CMD_CH_GENERAL_PROCESS_STOP]
default 17:08:51.273747 +0800 kernel AppleH10CamIn::ISP_FlushInactiveDARTMappings: 0x00000000
default 17:08:51.276237 +0800 kernel AppleH10CamIn::ISP_ReleaseChannel_gated - channel: 0 (process: mediaserverd)
iPhone 6s
default 17:18:52.814006 +0800 kernel AppleH6CamIn::setPowerStateGated: 1
default 17:18:52.814054 +0800 kernel AppleH6CamIn::power_on_hardware
default 17:18:52.910762 +0800 kernel AppleH6CamIn::MotionDataEnable: Enabling for Endpoint 0
default 17:18:52.924652 +0800 mediaserverd FigSignalError: -12785, invalidated
default 17:18:52.954154 +0800 mediaserverd FigSignalError: -12785, invalidated
default 17:18:52.954361 +0800 kernel AppleH6CamIn::ISP_SelectBestMIPIFrequencyIndex_gated - channel: 0, currentRawBitDepth: 1, index: 2
default 17:18:53.118463 +0800 kernel AppleH6CamIn::ISP_CopySetfile_gated (camChan=0)
default 17:19:07.301879 +0800 kernel AppleH6CamIn::ISP_FlushInactiveDARTMappings: 0x00000000
default 17:19:12.307839 +0800 kernel AppleH6CamInUserClient::free - Freeing UserClient for process: mediaserverd (pid 2465)
default 17:19:12.308025 +0800 kernel AppleH6CamIn::setPowerStateGated: 0
default 17:19:12.308185 +0800 kernel AppleH6CamIn::power_off_hardware
default 17:19:12.321409 +0800 kernel AppleH6CamIn::ISP_FlushInactiveDARTMappings: 0x00000000
default 17:19:12.321478 +0800 kernel AppleH6CamIn::MotionDataDisable: Enabling for Endpoint 0
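One rough way to compare the three captures above is to save each device's log as plain text and list the distinct driver class prefixes that appear in it. The following is only a sketch under that assumption; the function names `driver_prefixes` and `mentions_ane` are made up for illustration and are not part of any Apple tool.

```shell
#!/bin/sh
# List the distinct kernel driver class names (e.g. H11ANEIn, AppleH10CamIn)
# found in a saved log capture read from stdin.
driver_prefixes() {
    grep -oE '(Apple)?H[0-9]+(ANE|Cam)[A-Za-z]*' | LC_ALL=C sort -u
}

# Exit status 0 if the capture on stdin mentions any H<n>ANE driver class.
mentions_ane() {
    grep -qE 'H[0-9]+ANE'
}

# Example (file name is a placeholder):
#   driver_prefixes < iphone_xs_max.log
#   mentions_ane < iphone_8_plus.log || echo "no ANE driver activity logged"
```

Run against the excerpts above, only the iPhone Xs Max capture contains `H11ANE…` names; the iPhone 8 Plus and iPhone 6s captures show only camera classes such as `AppleH10CamIn` and `AppleH6CamIn`.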
iPhone Xs Max iPhone 8 Plus
https://github.com/freedomtan/TestANE/
/* Generated by RuntimeBrowser
Image: /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
*/
@interface _ANEDeviceInfo : NSObject
+ (id)bootArgs;
+ (id)buildVersion;
+ (bool)hasANE;
+ (bool)isInternalBuild;
+ (bool)precompiledModelChecksDisabled;
@end
https://github.com/nst/iOS-Runtime-Headers/blob/master/PrivateFrameworks/AppleNeuralEngine.framework/_ANEDeviceInfo.h
size -l -x -m /tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
Segment __TEXT: 0x11000 (vmaddr 0x1abe22000 fileoff 0)
Section __text: 0xb728 (addr 0x1abe23d18 offset 7448)
Section __auth_stubs: 0x3d0 (addr 0x1abe2f440 offset 54336)
Section __cstring: 0xb87 (addr 0x1abe2f810 offset 55312)
Section __objc_methname: 0x10a5 (addr 0x1abe30397 offset 58263)
Section __objc_classname: 0x140 (addr 0x1abe3143c offset 62524)
Section __objc_methtype: 0x498 (addr 0x1abe3157c offset 62844)
Section __gcc_except_tab: 0x8cc (addr 0x1abe31a14 offset 64020)
Section __const: 0xd0 (addr 0x1abe322e0 offset 66272)
Section __oslogstring: 0x8d0 (addr 0x1abe323b0 offset 66480)
Section __unwind_info: 0x330 (addr 0x1abe32c80 offset 68736)
Section __eh_frame: 0x50 (addr 0x1abe32fb0 offset 69552)
total 0xf2e8
Segment __DATA: 0xe00 (vmaddr 0x1ba4ef3b8 fileoff 69632)
Section __objc_selrefs: 0x3e0 (addr 0x1ba4ef3b8 offset 69632)
Section __objc_protorefs: 0x10 (addr 0x1ba4ef798 offset 70624)
Section __objc_classrefs: 0x1b8 (addr 0x1ba4ef7a8 offset 70640)
Section __objc_superrefs: 0x38 (addr 0x1ba4ef960 offset 71080)
Section __objc_ivar: 0x60 (addr 0x1ba4ef998 offset 71136)
Section __objc_data: 0x4b0 (addr 0x1ba4ef9f8 offset 71232)
Section __data: 0x228 (addr 0x1ba4efea8 offset 72432)
Section __auth_ptr: 0x8 (addr 0x1ba4f00d0 offset 72984)
Section __bss: 0xe0 (addr 0x1ba4f00d8 offset 0)
total 0xe00
…
otool -o /tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
/tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine:
Contents of (__DATA_CONST,__objc_classlist) section
00000001b7a76a78 0x80001ba4efa20
00000001b7a76a80 0x80001ba4efa70
00000001b7a76a88 0x80001ba4efa98
00000001b7a76a90 0x80001ba4efae8
…
~/work/ios-hacking/tools/jtool -d objc /tmp/arm64/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
Fat binary, big-endian, 1 architectures: will auto-process this architecture
arm64
_ANEDeviceInfo
_ANEDataReporter
_ANEProgramForEvaluation
_ANEModel
_ANEHashEncodin
_ANERequest
_ANELog
_ANEQoSMapper
_ANEStrings
_ANEDaemonConnection
_ANEIOSurfaceObject
_ANEDeviceController
_ANEClient
_ANEErrors
_ANECloneHelper
http://www.newosxbook.com/tools/jtool.html
Mach-O Headers
• Mac OS X ABI Mach-O File Format Reference: no longer available on Apple's website, but easy to find with a web search.

• headers: /usr/include/mach-o/loader.h

• objc runtime

• https://opensource.apple.com/source/objc4/objc4-723/, https://opensource.apple.com/tarballs/objc4/objc4-723.tar.gz
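As a small illustration of what the Mach-O headers describe, the sketch below classifies a file by its first four bytes, using the magic values from `<mach-o/loader.h>` (`MH_MAGIC`, `MH_MAGIC_64`) and `<mach-o/fat.h>` (`FAT_MAGIC`). The function name `macho_kind` is hypothetical, and only the common little-endian thin layouts and the big-endian fat header are handled. Note that Java class files share the `cafebabe` magic, so this is a quick check rather than a full parser.

```shell
#!/bin/sh
# Classify a file by its first four bytes as stored on disk.
macho_kind() {
    # od prints the bytes as hex pairs; strip the whitespace between them.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
    case "$magic" in
        cffaedfe) echo "64-bit Mach-O (little-endian)" ;;   # MH_MAGIC_64 0xfeedfacf
        cefaedfe) echo "32-bit Mach-O (little-endian)" ;;   # MH_MAGIC    0xfeedface
        cafebabe) echo "fat (universal) binary" ;;          # FAT_MAGIC   0xcafebabe
        *)        echo "not a Mach-O file"; return 1 ;;
    esac
}

# Example (path is a placeholder):
#   macho_kind /tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
```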
Dive a bit deeper into Core ML
• Frameworks and some binaries used to ship unstripped as part of the iPhoneOS SDK in Xcode. Not anymore; most framework binaries are now in dyld_shared_cache.

• Fortunately, it’s quite easy to inspect the iOS file system nowadays. Apple stopped encrypting .ipsw files with the iOS 10 beta (more than 2 years ago). So, get an .ipsw, unzip it (remember it's a .zip file), then mount the largest .dmg (this needs extra steps on Windows and Linux, though). E.g.,

1. get iOS 12.0 ipsw for iPhone Xs Max [1]. See [2] for other firmwares.

2. unzip it.

3. mount 048-10782-224.dmg, that's it. You can see the whole filesystem used by
iPhone Xs Max.

• Thus, we can get the /System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm* we want

[1] http://updates-http.cdn-apple.com/2018FallFCS/fullrestores/091-65188/11BE19F6-AC8E-11E8-A312-F5CEDE149863/iPhone11,4,iPhone11,6_12.0_16A366_Restore.ipsw
[2] https://www.theiphonewiki.com/wiki/Firmware/iPhone/12.x
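The "mount the largest .dmg" step can be scripted. The sketch below is one possible way to pick that file after unzipping; the helper name `largest_dmg` and the directory names are assumptions for illustration, and the `hdiutil` line in the comment applies to macOS only.

```shell
#!/bin/sh
# Print the path of the largest .dmg directly inside directory $1.
largest_dmg() {
    best=""
    best_size=-1
    for f in "$1"/*.dmg; do
        [ -f "$f" ] || continue                  # no match: the glob stays literal
        size=$(wc -c < "$f" | tr -d ' ')         # byte count, padding removed
        if [ "$size" -gt "$best_size" ]; then
            best=$f
            best_size=$size
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Typical use on macOS (file and directory names are placeholders):
#   unzip -q firmware.ipsw -d ipsw_contents
#   hdiutil attach "$(largest_dmg ipsw_contents)"
```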
Dive a bit deeper into Core ML
• If you are on macOS and have Xcode installed, there are some binaries with symbols in ~/Library/Developer/Xcode/iOS DeviceSupport/12.1 (16B92) arm64e/

• What do I mean by “some”? E.g., /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/XPCServices/ANECompilerService.xpc/ANECompilerService exists on A12 devices, but not in Xcode’s device support directory

• Yes, we can find /System/Library/Frameworks/CoreML.framework/CoreML
• Even /System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm* is there
extract binaries from
dyld_shared_cache
• jtool can do it for you. E.g.,

• list

~/work/ios-hacking/tools/jtool -l /Volumes/Peace16A366.D331OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64e
• extract

~/work/ios-hacking/tools/jtool -e /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine /Volumes/Peace16A366.D331OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64e
Extracting /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine at 0x2be22000 into dyld_shared_cache_arm64e.AppleNeuralEngine
• dyld source code

• https://opensource.apple.com/source/dyld/dyld-551.4/, https://opensource.apple.com/tarballs/dyld/dyld-551.4.tar.gz

• Read dyld source and [1] for more about dyld_shared_cache

[1] https://iphonedevwiki.net/index.php/Dyld_shared_cache
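Before pointing a tool at a cache file, it can help to confirm which architecture it was built for. As far as I can tell from dyld's `dyld_cache_format.h`, the file starts with a 16-byte magic string such as `dyld_v1  arm64e`, padded with spaces and ending in a NUL byte. The sketch below reads that field; the function name `dyld_cache_arch` is hypothetical, and the header layout should be treated as an assumption to check against the dyld source for your iOS version.

```shell
#!/bin/sh
# Print the architecture recorded in the 16-byte magic at the start of a
# dyld shared cache file, e.g. "arm64e" for "dyld_v1  arm64e".
dyld_cache_arch() {
    # Read the first 16 bytes and drop the trailing NUL.
    magic=$(dd if="$1" bs=16 count=1 2>/dev/null | tr -d '\000')
    case "$magic" in
        dyld_v[0-9]*) echo "$magic" | sed 's/^dyld_v[0-9]* *//' ;;
        *) echo "not a dyld shared cache" >&2; return 1 ;;
    esac
}

# Example, using the mounted volume from the jtool commands above:
#   dyld_cache_arch /Volumes/Peace16A366.D331OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64e
```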
What to read beyond Apple’s docs
• https://www.theiphonewiki.com, e.g., https://www.theiphonewiki.com/wiki/Firmware/iPhone/12.x

• http://iphonedevwiki.net/index.php/Main_Page, e.g., http://iphonedevwiki.net/index.php/Reverse_Engineering_Tools

• http://newosxbook.com/index.php, e.g., http://newosxbook.com/index.php?page=notes

• https://papers.put.as
kernel side
• So, how about extracting the ANE-related stuff, or just copying it onto A11 devices?

• Well, look into the kernelcache of A11 and A12 devices:

• As expected, we can see lots of H11ANE information in the A12 kernelcache

• The A11 kernelcache does mention H11ANE several times, but it seems the important modules are not there.

• So, I guess if we don’t jailbreak and root, we are out of luck!
That’s it
Isn’t XNU (Darwin) source code open?
• Well, there are more than 200 kernel modules, and only some of them are open source

$ ~/work/ios-hacking/tools/jtool2 -k ../../iphonex/ipsw/kernelcache.release.iphone10b
0xfffffff00583c000:com.apple.kpi.mach
0xfffffff00583c080:com.apple.kpi.private
0xfffffff00583c100:com.apple.kpi.unsupported
0xfffffff00583c180:com.apple.kpi.iokit
0xfffffff00583c200:com.apple.kpi.libkern
0xfffffff00583c280:com.apple.kpi.bsd
0xfffffff00583c300:com.apple.iokit.IONetworkingFamily
0xfffffff00583de00:com.apple.iokit.IOTimeSyncFamily
0xfffffff0058416c0:com.apple.iokit.IOSlowAdaptiveClockingFamily
0xfffffff005841c40:com.apple.iokit.IOStorageFamily
0xfffffff005842e80:com.apple.iokit.IOReportFamily
0xfffffff005843680:com.apple.driver.AppleARMPlatform
0xfffffff00584cd80:com.apple.driver.AppleSamsungSPI
0xfffffff00584dd00:com.apple.kpi.dsep
0xfffffff00584dd80:com.apple.kec.corecrypto
…
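A listing in this `address:bundle-id` form is easy to post-process, for example to count the modules or to look for names of interest when comparing an A11 kernelcache with an A12 one. A minimal sketch, assuming the listing has been piped or saved as text; the function names `kext_count` and `kexts_matching` are made up here, and any search pattern is only a guess at how a module might be named.

```shell
#!/bin/sh
# Count the modules in an "address:bundle-id" listing read from stdin.
kext_count() {
    grep -c '^0x[0-9a-fA-F]*:'
}

# Print the bundle IDs from stdin that match the pattern in $1
# (case-insensitive).
kexts_matching() {
    cut -d: -f2 | grep -i -- "$1"
}

# Example (kernelcache path is a placeholder):
#   jtool2 -k kernelcache.release.iphone10b | kext_count
#   jtool2 -k kernelcache.release.iphone10b | kexts_matching iokit
```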

Understanding Android Benchmarks
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?

  • 1. Can I use Neural Engine to run my neural networks on A11 devices? Koan-Sin Tan, freedom@computer.org, Hsinchu Coding Serfs Meeting, Nov. 2018
  • 2. ANE on A12, how about A11?
      • AnandTech is one of my favorite tech sites. Usually, they provide good technical analysis.
          • E.g., for Apple’s CPUs: cache sizes, execution units, and the latencies of various instructions
      • But their analysis is not good enough for NN accelerators on mobile phones.
          • Floating-point VGG16, Inception V3, and ResNet34?
          • Come on, are you still in the Neolithic era?
      • https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-unveiling-the-silicon-secrets/5
  • 3. Why I said VGG16 is Neolithic Era
      • Lightweight models are there
          • MobileNet V1 can reach roughly the same top-1 accuracy as VGG16, even when quantized to uint8
          • MobileNet V2 can reach better top-1 accuracy
          • MnasNet can be better than MobileNet V1
      • They cover classification, object detection, segmentation, etc.
      • 8-bit quantization is good enough for many cases
      • https://github.com/tensorflow/models/raw/master/research/slim/nets/mobilenet/madds_top1_accuracy.png
  • 4. How to use Neural Engine
      • According to Apple:
          • A11: 600 G ops per second; A12: 5 T ops per second
          • Yes, it's enabled by default on A12 devices. If you have pre-iOS 12 apps built on top of Core ML, they should be able to use it automatically. But not on A11 devices.
      • How to verify it?
          • MLModelConfiguration [1] has the property @property(readwrite) MLComputeUnits computeUnits; (MLComputeUnitsCPUOnly, MLComputeUnitsCPUAndGPU, or MLComputeUnitsAll)
          • There is usesCPUOnly for VNRequest in iOS 11, but nothing like MLComputeUnits
          • See my example [2]
      [1] https://developer.apple.com/documentation/coreml/mlmodelconfiguration?language=objc
      [2] https://github.com/freedomtan/coremlbenchmark/
  • 5. Why not VNRequest?
      • Since I mentioned VNRequest in Vision.framework, why not VNCoreMLRequest?
      • Yes, I wrote simple VNCoreMLRequest-based apps before, in both Swift and Objective-C [1][2].
          • It gives you a simplified interface and crops and scales images for you.
          • But yes, that adds image-operation time.
      • This actually reminds us of an important system software issue.
          • Modern cellphone SoCs use DVFS and all kinds of energy-saving techniques extensively. How can we get good performance?
          • Inference with the camera on is usually faster than with the camera off!!!
      [1] https://github.com/freedomtan/SimpleInceptionV3/
      [2] https://github.com/freedomtan/SimpleInceptionV3-ObjC
  • 6. Neural Engine in Action
      • H11ANEServicesThread
          • A12 is for iPhone11,x
          • No H10ANEServicesThread
      • So, who started H11ANEServicesThread? There is nothing named H11 in /System/Library/Frameworks/CoreML.framework/CoreML
      • It seems it’s in /System/Library/PrivateFrameworks/ANEServices.framework/ANEServices
          • A12 devices only
  • 7. iPhone Xs Max
      default 17:17:14.002705 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) : H11ANEIn::ANE_ProgramDestroy_gated WARN: Freeing intermediate buffer inside ProcessDestroy
      default 17:17:14.004821 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) : H11ANE:ANE_ProcessDestroy_gated Removed client aned from programHandle=0x8a03aa2e112. Num clients for program=0
      default 17:17:14.004938 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) : H11ANEIn::ANE_ProgramDestroy_gated WARN: Freeing intermediate buffer inside ProcessDestroy
      default 17:17:14.011142 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) : H11ANE:ANE_ProcessDestroy_gated Removed client aned from programHandle=0x8a02e50c71e. Num clients for program=0
      default 17:17:14.011358 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) : H11ANEIn::ANE_ProgramDestroy_gated WARN: Freeing intermediate buffer inside ProcessDestroy
      default 17:17:14.024969 +0800 kernel IOReturn H11ANEInUserClient::ANE_PowerOff() - client aned requesting Power Off
      default 17:17:14.025291 +0800 kernel IOReturn H11ANEIn::setPowerStateGated(unsigned long, IOService *) : H11ANEIn::setPowerStateGated: 0
      default 17:17:14.026850 +0800 kernel IOReturn H11ANEIn::ANE_deInit() : H11ANEIn::ANE_deInit - CSNE_CMD_POWER_DOWN command completed: res=0x00000000
      default 17:17:14.026880 +0800 kernel IOReturn H11ANEIn::ANE_deInit() : H11ANEIn::ANE_deInit - ANECPU in WFI after CSNE_CMD_SUSPEND/CISP_CMD_POWER_DOWN. retries=0 ASCWRAP_IDLE_STATUS = 0x2d
      default 17:17:14.039520 +0800 kernel IOReturn H11ANEIn::ANE_HandlePowerStateChecksForClient() : INFO: H11ANEIn: ANE power status: isPowered: 0, fDeInitInProgress: 0, fFirmwareTimeout: 0
      default 17:17:14.039563 +0800 kernel IOReturn H11ANEIn::ANE_UserClientCleanup_gated(void *) : Info: H11ANEIn: Skipping user client cleanup for client (<private>) as power is already off
      default 17:17:14.039723 +0800 kernel virtual IOReturn H11ANEInUserClient::clientClose() - aned
      default 17:17:14.039749 +0800 kernel virtual void H11ANEInUserClient::free() - Freeing UserClient for process: aned (pid 191)
  • 8. iPhone 8 Plus
      default 17:08:51.256253 +0800 kernel ISPCPU: CmdTurnOffDevicePower: TS: 2.901495 Disable CAM0_SHUTDOWN=0
      default 17:08:51.256277 +0800 kernel ISPCPU: Addr: 0x00000002122a8000
      default 17:08:51.258444 +0800 kernel ISPCPU: TurnOffPower:DONE TS: 2.903766 rail: 0x5, ch: 0, cameraPowerBitEnable: 0x7e
      default 17:08:51.258684 +0800 kernel AppleH10CamIn::ISP_PPMAdmissionCheck_gated: subClientID=1; budgetReq=0; budgetAlloc=0; result=0x00000000
      default 17:08:51.258726 +0800 kernel AppleH10CamIn::ISP_StopCamera_gated: subClientID=1; channel=0; budgetReq=0; budgetAlloc=0; result=0x00000000, numPreviewFrames=72, numStillCaptureFrames:0
      default 17:08:51.258813 +0800 kernel ISPCPU: [ISP: 2.904275] CH = 0 CMD = 0x0104 [CISP_CMD_CH_BUFFER_RETURN]
      default 17:08:51.266156 +0800 kernel AppleH10CamIn::ISP_FlushInactiveDARTMappings: 0x00000000
      default 17:08:51.266234 +0800 mediaserverd H10ISPServicesRemote: SetProperty 2 (sent)
      default 17:08:51.267404 +0800 mediaserverd H10ISPServicesRemote: SetProperty 2 (reply=0x00000000)
      default 17:08:51.272115 +0800 kernel ISPCPU: [ISP: 2.917542] CH = 0 CMD = 0x820b [CISP_CMD_APPLE_CH_AE_TILES_MATRIX_METADATA_ENABLE]
      default 17:08:51.273311 +0800 kernel ISPCPU: [ISP: 2.918641] CH = 0 CMD = 0x0130 [CISP_CMD_CH_GENERAL_PROCESS_STOP]
      default 17:08:51.273747 +0800 kernel AppleH10CamIn::ISP_FlushInactiveDARTMappings: 0x00000000
      default 17:08:51.276237 +0800 kernel AppleH10CamIn::ISP_ReleaseChannel_gated - channel: 0 (process: mediaserverd)
  • 9. iPhone 6s
      default 17:18:52.814006 +0800 kernel AppleH6CamIn::setPowerStateGated: 1
      default 17:18:52.814054 +0800 kernel AppleH6CamIn::power_on_hardware
      default 17:18:52.910762 +0800 kernel AppleH6CamIn::MotionDataEnable: Enabling for Endpoint 0
      default 17:18:52.924652 +0800 mediaserverd FigSignalError: -12785, invalidated
      default 17:18:52.954154 +0800 mediaserverd FigSignalError: -12785, invalidated
      default 17:18:52.954361 +0800 kernel AppleH6CamIn::ISP_SelectBestMIPIFrequencyIndex_gated - channel: 0, currentRawBitDepth: 1, index: 2
      default 17:18:53.118463 +0800 kernel AppleH6CamIn::ISP_CopySetfile_gated (camChan=0)
      default 17:19:07.301879 +0800 kernel AppleH6CamIn::ISP_FlushInactiveDARTMappings: 0x00000000
      default 17:19:12.307839 +0800 kernel AppleH6CamInUserClient::free - Freeing UserClient for process: mediaserverd (pid 2465)
      default 17:19:12.308025 +0800 kernel AppleH6CamIn::setPowerStateGated: 0
      default 17:19:12.308185 +0800 kernel AppleH6CamIn::power_off_hardware
      default 17:19:12.321409 +0800 kernel AppleH6CamIn::ISP_FlushInactiveDARTMappings: 0x00000000
      default 17:19:12.321478 +0800 kernel AppleH6CamIn::MotionDataDisable: Enabling for Endpoint 0
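The three log excerpts differ mainly in which driver classes show up: H11ANEIn on the iPhone Xs Max, but only camera/ISP classes (AppleH10CamIn, AppleH6CamIn) on the two older phones. A minimal Python sketch for tallying such class names from saved log text could look like this. The function name `driver_classes` and the regular expression are illustrative assumptions derived from the excerpts, not part of any Apple tool.

```python
import re
from collections import Counter

# Matches C++ class names such as H11ANEIn or AppleH10CamIn when they are
# followed by "::". The pattern is an assumption based on the log excerpts.
_CLASS_RE = re.compile(r"\b((?:Apple)?H\d+[A-Za-z]+)::")


def driver_classes(log_text: str) -> Counter:
    """Count how often each H<n>-prefixed driver class appears in log text."""
    return Counter(_CLASS_RE.findall(log_text))
```

Running it over a log captured during inference and checking whether any key contains "ANE" gives a quick, if crude, hint about whether the Neural Engine driver was involved.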
  • 10. iPhone Xs Max vs. iPhone 8 Plus (screenshots). Code: https://github.com/freedomtan/TestANE/
  • 11. /* Generated by RuntimeBrowser
         Image: /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
       */
      @interface _ANEDeviceInfo : NSObject
      + (id)bootArgs;
      + (id)buildVersion;
      + (bool)hasANE;
      + (bool)isInternalBuild;
      + (bool)precompiledModelChecksDisabled;
      @end
      https://github.com/nst/iOS-Runtime-Headers/blob/master/PrivateFrameworks/AppleNeuralEngine.framework/_ANEDeviceInfo.h
  • 12. size -l -x -m /tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
      Segment __TEXT: 0x11000 (vmaddr 0x1abe22000 fileoff 0)
          Section __text: 0xb728 (addr 0x1abe23d18 offset 7448)
          Section __auth_stubs: 0x3d0 (addr 0x1abe2f440 offset 54336)
          Section __cstring: 0xb87 (addr 0x1abe2f810 offset 55312)
          Section __objc_methname: 0x10a5 (addr 0x1abe30397 offset 58263)
          Section __objc_classname: 0x140 (addr 0x1abe3143c offset 62524)
          Section __objc_methtype: 0x498 (addr 0x1abe3157c offset 62844)
          Section __gcc_except_tab: 0x8cc (addr 0x1abe31a14 offset 64020)
          Section __const: 0xd0 (addr 0x1abe322e0 offset 66272)
          Section __oslogstring: 0x8d0 (addr 0x1abe323b0 offset 66480)
          Section __unwind_info: 0x330 (addr 0x1abe32c80 offset 68736)
          Section __eh_frame: 0x50 (addr 0x1abe32fb0 offset 69552)
          total 0xf2e8
      Segment __DATA: 0xe00 (vmaddr 0x1ba4ef3b8 fileoff 69632)
          Section __objc_selrefs: 0x3e0 (addr 0x1ba4ef3b8 offset 69632)
          Section __objc_protorefs: 0x10 (addr 0x1ba4ef798 offset 70624)
          Section __objc_classrefs: 0x1b8 (addr 0x1ba4ef7a8 offset 70640)
          Section __objc_superrefs: 0x38 (addr 0x1ba4ef960 offset 71080)
          Section __objc_ivar: 0x60 (addr 0x1ba4ef998 offset 71136)
          Section __objc_data: 0x4b0 (addr 0x1ba4ef9f8 offset 71232)
          Section __data: 0x228 (addr 0x1ba4efea8 offset 72432)
          Section __auth_ptr: 0x8 (addr 0x1ba4f00d0 offset 72984)
          Section __bss: 0xe0 (addr 0x1ba4f00d8 offset 0)
          total 0xe00
      …
  • 13. otool -o /tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
      /tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine:
      Contents of (__DATA_CONST,__objc_classlist) section
      00000001b7a76a78 0x80001ba4efa20
      00000001b7a76a80 0x80001ba4efa70
      00000001b7a76a88 0x80001ba4efa98
      00000001b7a76a90 0x80001ba4efae8
      …
      ~/work/ios-hacking/tools/jtool -d objc /tmp/arm64/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
      Fat binary, big-endian, 1 architectures: will auto-process this architecture arm64
      _ANEDeviceInfo
      _ANEDataReporter
      _ANEProgramForEvaluation
      _ANEModel
      _ANEHashEncodin
      _ANERequest
      _ANELog
      _ANEQoSMapper
      _ANEStrings
      _ANEDaemonConnection
      _ANEIOSurfaceObject
      _ANEDeviceController
      _ANEClient
      _ANEErrors
      _ANECloneHelper
      http://www.newosxbook.com/tools/jtool.html
  • 14. Mach-O Headers
      • Mac OS X ABI Mach-O File Format Reference: no longer available on Apple's web site, so google it.
      • headers: /usr/include/mach-o/loader.h
      • objc runtime
          • https://opensource.apple.com/source/objc4/objc4-723/, https://opensource.apple.com/tarballs/objc4/objc4-723.tar.gz
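To make the loader.h reference concrete, here is a minimal Python sketch that decodes the fixed-size `mach_header_64` found at the start of a thin 64-bit Mach-O file, such as a binary extracted from dyld_shared_cache. The field layout and constants follow mach-o/loader.h; the function name `parse_mach_header_64` is an illustrative assumption, and the sketch deliberately handles only little-endian 64-bit files (fat binaries and 32-bit files are rejected).

```python
import struct

# Constants from <mach-o/loader.h> and <mach/machine.h>.
MH_MAGIC_64 = 0xFEEDFACF      # 64-bit Mach-O, host (little-endian) byte order
CPU_TYPE_ARM64 = 0x0100000C   # CPU_TYPE_ARM | CPU_ARCH_ABI64
MH_DYLIB = 0x6                # dynamically bound shared library

# struct mach_header_64: eight 32-bit fields, 32 bytes in total.
_HEADER = struct.Struct("<IIIIIIII")


def parse_mach_header_64(data: bytes) -> dict:
    """Decode the first 32 bytes of a thin little-endian 64-bit Mach-O file."""
    if len(data) < _HEADER.size:
        raise ValueError("buffer too small for mach_header_64")
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = _HEADER.unpack_from(data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError(f"not a little-endian 64-bit Mach-O (magic {magic:#x})")
    return {
        "cputype": cputype,
        "cpusubtype": cpusubtype,
        "filetype": filetype,
        "ncmds": ncmds,            # number of load commands that follow
        "sizeofcmds": sizeofcmds,  # total size of those load commands
        "flags": flags,
    }
```

The load commands (segments, sections, symbol tables) start right after this header, which is where tools such as size, otool, and jtool get the listings shown on the previous slides.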
  • 15. Dive a bit deeper into Core ML
      • Frameworks and some binaries used to be shipped unstripped as part of the iPhoneOS SDK in Xcode. Not anymore: most framework binaries are now in dyld_shared_cache.
      • Fortunately, it’s quite easy to check the iOS file system nowadays. Apple stopped encrypting .ipsw files with iOS 10 beta (more than 2 years ago). So, get a .ipsw, unzip it (remember it's a .zip file), then mount the largest .dmg (this needs extra steps on Windows and Linux, though). E.g.,
          1. Get the iOS 12.0 ipsw for iPhone Xs Max [1]. See [2] for other firmware.
          2. Unzip it.
          3. Mount 048-10782-224.dmg. That's it: you can see the whole filesystem used by iPhone Xs Max.
      • Thus, we can get the /System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm* we want
      [1] http://updates-http.cdn-apple.com/2018FallFCS/fullrestores/091-65188/11BE19F6-AC8E-11E8-A312-F5CEDE149863/iPhone11,4,iPhone11,6_12.0_16A366_Restore.ipsw
      [2] https://www.theiphonewiki.com/wiki/Firmware/iPhone/12.x
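The "unzip it, then take the largest .dmg" steps can be sketched in Python with the standard zipfile module. The function names `largest_dmg` and `extract_root_dmg` are illustrative assumptions; mounting the extracted image is left to the platform (for example by double-clicking it on macOS).

```python
import zipfile


def largest_dmg(ipsw):
    """Return the name of the largest .dmg inside an .ipsw (a plain zip file).

    `ipsw` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(ipsw) as archive:
        dmgs = [info for info in archive.infolist()
                if info.filename.lower().endswith(".dmg")]
        if not dmgs:
            raise ValueError("no .dmg found in archive")
        # The root filesystem image is by far the largest entry.
        return max(dmgs, key=lambda info: info.file_size).filename


def extract_root_dmg(ipsw, out_dir):
    """Extract only the largest .dmg and return the path it was written to."""
    name = largest_dmg(ipsw)
    with zipfile.ZipFile(ipsw) as archive:
        return archive.extract(name, out_dir)
```

Extracting only the one image avoids unpacking the whole multi-gigabyte archive when all you want is the root filesystem.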
  • 16. Dive a bit deeper into Core ML
      • If you are on macOS and have Xcode installed, there are some binaries with symbols in ~/Library/Developer/Xcode/iOS DeviceSupport/12.1 (16B92) arm64e/
      • What do I mean by “some”? E.g., /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/XPCServices/ANECompilerService.xpc/ANECompilerService exists on A12 devices, but not in Xcode’s support library
      • Yes, we can find /System/Library/Frameworks/CoreML.framework/CoreML
      • Even /System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm* is there
  • 17. extract binaries from dyld_shared_cache
      • jtool can do it for you. E.g.,
          • list
            ~/work/ios-hacking/tools/jtool -l /Volumes/Peace16A366.D331OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64e
          • extract
            ~/work/ios-hacking/tools/jtool -e /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine /Volumes/Peace16A366.D331OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64e
            Extracting /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine at 0x2be22000 into dyld_shared_cache_arm64e.AppleNeuralEngine
      • dyld source code
          • https://opensource.apple.com/source/dyld/dyld-551.4/, https://opensource.apple.com/tarballs/dyld/dyld-551.4.tar.gz
      • Read the dyld source and [1] for more about dyld_shared_cache
      [1] https://iphonedevwiki.net/index.php/Dyld_shared_cache
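For a feel of what a listing tool has to read, here is a minimal Python sketch that walks the image table of a dyld_shared_cache. It assumes the header layout of the dyld-551 era (`dyld_cache_header` beginning with a 16-byte magic, then mappingOffset, mappingCount, imagesOffset, imagesCount, dyldBaseAddress, and 32-byte `dyld_cache_image_info` records); later cache formats moved these fields, so treat it as an illustration for iOS 12-era caches only. The function name `list_cache_images` is hypothetical, and this is not how jtool itself is implemented.

```python
import struct

# Leading fields of struct dyld_cache_header (dyld-551 era, assumed layout):
# char magic[16]; uint32 mappingOffset, mappingCount, imagesOffset,
# imagesCount; uint64 dyldBaseAddress.
_HEADER = struct.Struct("<16sIIIIQ")
# struct dyld_cache_image_info: address, modTime, inode, pathFileOffset, pad.
_IMAGE = struct.Struct("<QQQII")


def list_cache_images(data):
    """Return (arch, [(load_address, path), ...]) for a cache held in memory.

    `data` can be bytes or an mmap of the cache file.
    """
    magic, _map_off, _map_cnt, img_off, img_cnt, _base = _HEADER.unpack_from(data, 0)
    if not magic.startswith(b"dyld_v1"):
        raise ValueError("not a dyld shared cache")
    # The magic looks like b"dyld_v1  arm64e\0": prefix, padding, architecture.
    arch = magic.rstrip(b"\0")[len(b"dyld_v1"):].strip().decode("ascii")
    images = []
    for i in range(img_cnt):
        address, _mtime, _inode, path_off, _pad = _IMAGE.unpack_from(
            data, img_off + i * _IMAGE.size)
        end = data.find(b"\0", path_off)  # paths are NUL-terminated C strings
        images.append((address, bytes(data[path_off:end]).decode("utf-8")))
    return arch, images
```

Filtering the returned paths for "AppleNeuralEngine" or "ANEServices" is then a one-line list comprehension; actually extracting a usable Mach-O from the cache is considerably more work, which is why a tool like jtool is the practical choice.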
  • 18. What to read beyond Apple’s docs
      • https://www.theiphonewiki.com, e.g., https://www.theiphonewiki.com/wiki/Firmware/iPhone/12.x
      • http://iphonedevwiki.net/index.php/Main_Page, e.g., http://iphonedevwiki.net/index.php/Reverse_Engineering_Tools
      • http://newosxbook.com/index.php, e.g., http://newosxbook.com/index.php?page=notes
      • https://papers.put.as
  • 19. kernel side
      • So, how about extracting the ANE-related stuff and just putting it onto A11 devices?
      • Well, look into the kernelcache of A11 and A12 devices:
          • As expected, we can see lots of H11ANE information in the A12 kernelcache
          • The A11 kernelcache does mention H11ANE several times, but it seems the important modules are not there.
      • So, I guess if we don’t jailbreak and root, we are out of luck!
  • 21. Isn’t XNU (Darwin) source code open?
      • Well, there are more than 200 kernel modules, and only some of them are open
      $ ~/work/ios-hacking/tools/jtool2 -k ../../iphonex/ipsw/kernelcache.release.iphone10b
      0xfffffff00583c000:com.apple.kpi.mach
      0xfffffff00583c080:com.apple.kpi.private
      0xfffffff00583c100:com.apple.kpi.unsupported
      0xfffffff00583c180:com.apple.kpi.iokit
      0xfffffff00583c200:com.apple.kpi.libkern
      0xfffffff00583c280:com.apple.kpi.bsd
      0xfffffff00583c300:com.apple.iokit.IONetworkingFamily
      0xfffffff00583de00:com.apple.iokit.IOTimeSyncFamily
      0xfffffff0058416c0:com.apple.iokit.IOSlowAdaptiveClockingFamily
      0xfffffff005841c40:com.apple.iokit.IOStorageFamily
      0xfffffff005842e80:com.apple.iokit.IOReportFamily
      0xfffffff005843680:com.apple.driver.AppleARMPlatform
      0xfffffff00584cd80:com.apple.driver.AppleSamsungSPI
      0xfffffff00584dd00:com.apple.kpi.dsep
      0xfffffff00584dd80:com.apple.kec.corecrypto
      …
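With more than 200 entries, it is easier to search the module listing than to read it. The following Python sketch parses saved `address:bundle.id` lines like the ones above and filters them by substring. Both function names are illustrative assumptions, and since the slides do not give the bundle identifier of the ANE driver, searching for "ANE" is only a guess at what to look for.

```python
def parse_kext_listing(text):
    """Parse 'address:bundle.id' lines into (address, bundle_id) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("0x") or ":" not in line:
            continue  # skip the shell prompt, blank lines, and the ellipsis
        address, bundle_id = line.split(":", 1)
        entries.append((int(address, 16), bundle_id))
    return entries


def find_kexts(text, needle):
    """Return bundle IDs containing `needle`, compared case-insensitively."""
    needle = needle.lower()
    return [bundle_id for _, bundle_id in parse_kext_listing(text)
            if needle in bundle_id.lower()]
```

Running `find_kexts` with the same needle on the listings from an A11 and an A12 kernelcache and comparing the two results is one way to check which modules exist only on the newer device.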