Blog


STM32 in Academic Publications (Nov 2020)

posted Nov 27, 2020, 11:32 PM by MUHAMMAD MUN`IM AHMAD ZABIDI

Problem categories that benefit from machine learning

posted Nov 15, 2020, 2:09 AM by MUHAMMAD MUN`IM AHMAD ZABIDI

Fig. 2 from R. Boutaba et al., “A comprehensive survey on machine learning for networking: evolution, applications and research opportunities,” J. Internet Serv. Appl., vol. 9, no. 1, p. 99, 2018.

Problem categories that benefit from machine learning.
a Clustering.
b Classification.
c Regression.
d Rule extraction.



The constituents of ML-based solutions

posted Nov 15, 2020, 2:06 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Nov 15, 2020, 2:06 AM ]

Fig. 3 from R. Boutaba et al., “A comprehensive survey on machine learning for networking: evolution, applications and research opportunities,” J. Internet Serv. Appl., vol. 9, no. 1, p. 99, 2018.

The evolution of machine learning techniques with key milestones

posted Nov 15, 2020, 1:59 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Nov 15, 2020, 2:03 AM ]

Fig. 5 from R. Boutaba et al., “A comprehensive survey on machine learning for networking: evolution, applications and research opportunities,” J. Internet Serv. Appl., vol. 9, no. 1, p. 99, 2018.


Computing devices that have been used for machine learning at the edge

posted Nov 15, 2020, 1:56 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Nov 15, 2020, 1:57 AM ]

Table 3 from M. G. S. Murshed, C. Murphy, D. Hou, N. Khan, G. Ananthanarayanan, and F. Hussain, “Machine Learning at the Network Edge: A Survey,” arXiv preprint arXiv:1908.00080, pp. 1–28, 2019.

| Device | GPU | CPU | RAM | Flash memory | Power consumption | Example applications |
|---|---|---|---|---|---|---|
| Raspberry Pi | 400 MHz VideoCore IV | Quad Cortex-A53 @ 1.2 GHz | 1 GB SDRAM | 32 GB | 2.5 A | video analysis [78, 114] |
| Coral Dev Board (Edge TPU) | GC7000 Lite Graphics + Edge TPU coprocessor | Quad Cortex-A53, Cortex-M4F | 1 GB LPDDR4 | 8 GB LPDDR4 | 5 V DC | image processing [18] |
| SparkFun Edge | - | 32-bit ARM Cortex-M4F 48 MHz (with 96 MHz burst mode) processor | 384 KB | 1 MB | 6 µA/MHz | speech recognition [31] |
| Jetson TX1 | Nvidia Maxwell, 256 CUDA cores | Quad ARM A57/2 MB L2 | 4 GB 64-bit LPDDR4, 25.6 GB/s | 16 GB eMMC, SDIO, SATA | 10 W | video, image analysis [68], [61]; robotics [33] |
| Jetson TX2 | Nvidia Pascal, 256 CUDA cores | HMP Dual Denver 2/2 MB L2 + Quad ARM A57/2 MB L2 | 8 GB 128-bit LPDDR4, 59.7 GB/s | 32 GB eMMC, SDIO, SATA | 7.5 W | video, image analysis [68], [92]; robotics [24] |
| Intel Movidius Neural Compute Stick | High Performance VPU | Myriad 2 Vision Processing Unit | 1 GB | 4 GB | 2 trillion 16-bit operations per second within 500 mW | classification [73]; computer vision [14, 43] |
| ARM ML | - | ARM ML processor | 1 GB | - | 4 TOPS/W (tera operations per second per watt) | image, voice recognition [107] |
| RISC-V GAP8 | - | nona-core 32-bit RISC-V microprocessor @ 250 MHz | 16 MiB SDRAM | - | 1 GOPS/mW | image, audio processing [35] |
| OpenMV Cam | - | ARM 32-bit Cortex-M7 | 512 KB | 2 MB | 200 mA @ 3.3 V | image processing [6] |
| BeagleBone AI | - | Cortex-A15, Sitara AM5729 SoC with 4 EVEs | 1 GB | 16 GB | - | computer vision [25] |
| EMC3531 | - | ARM Cortex-M3, NXP Coolflux DSP | - | - | - | audio, video analysis [38] |
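
The power consumption column mixes units that cannot be compared directly: a current, a supply voltage, watts, and operations per watt. Below is a minimal Python sketch that normalizes only the entries for which the table itself gives enough information. The variable names are illustrative, and the inputs are the nominal figures quoted in the table, not measurements.

```python
# Nominal figures copied from the table above; all names are illustrative.

# Intel Movidius NCS: 2 trillion 16-bit operations per second within 500 mW.
movidius_ops_per_s = 2e12
movidius_power_w = 0.5
movidius_tops_per_w = movidius_ops_per_s / movidius_power_w / 1e12

# RISC-V GAP8: 1 GOPS/mW. Multiply by 1000 mW per W, then express in tera-ops.
gap8_tops_per_w = 1e9 * 1000 / 1e12

# OpenMV Cam: 200 mA at 3.3 V, so power follows from P = I * V.
openmv_power_w = 0.200 * 3.3

# SparkFun Edge: 6 uA/MHz at the 48 MHz base clock. This yields a current only,
# because the table does not state the supply voltage.
sparkfun_current_ua = 6 * 48

print(f"Movidius NCS : {movidius_tops_per_w:.1f} TOPS/W")
print(f"GAP8         : {gap8_tops_per_w:.1f} TOPS/W")
print(f"OpenMV Cam   : {openmv_power_w:.2f} W")
print(f"SparkFun Edge: {sparkfun_current_ua} uA at 48 MHz")
```

Rows such as "2.5 A" or "5 V DC" give only half of the current-voltage pair, so they are left out rather than guessed.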

Machine learning frameworks that have been used on edge devices

posted Nov 15, 2020, 1:53 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Nov 15, 2020, 1:55 AM ]

Table 3 from M. G. S. Murshed, C. Murphy, D. Hou, N. Khan, G. Ananthanarayanan, and F. Hussain, “Machine Learning at the Network Edge: A Survey,” arXiv preprint arXiv:1908.00080, pp. 1–28, 2019.

| Framework | Core language | Interface | Part running on the edge | Example applications |
|---|---|---|---|---|
| TensorFlow Lite (Google) | C++, Java | C/C++, Java | TensorFlow Lite NN API | computer vision [109], speech recognition [42, 1] |
| Caffe2, Caffe2Go (Facebook) | C++ | Android, iOS | NNPack | image analysis, video analysis [53] |
| Apache MXNet | C++, Python, R | Linux, macOS, Windows | Full model | object detection, recognition [78] |
| Core ML 2 (Apple) | Python | iOS | CoreML | image analysis [16], NLP [105] |
| ML Kit (Google) | C++, Java | Android, iOS | Full model | image recognition, text recognition, bar-code scanning [26] |
| AI2GO | C, Python, Java, Swift | Linux, macOS | Full model | object detection, classification [5] |
| DeepThings | C/C++ | Linux | Full model | object detection [119] |
| DeepIoT | Python | Ubilinux | Full model | human activity recognition, user identification [116] |
| DeepCham | C++, Java | Linux, Android | Full model | object recognition [62] |
| SparseSep | - | Linux, Android | Full model | mobile object recognition, audio classification [15] |
| Edgent | - | Ubuntu | Major part of the DNN | image recognition [63] |



Projects that "Kill" Microcontrollers (Nov 2020)

posted Nov 11, 2020, 7:02 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Nov 11, 2020, 7:15 AM ]

Some embedded/IoT projects that demand heavy processing and low power at the same time: the kind that would benefit from the DSP, FPU and SIMD features of the STM32F4/F7, plus the CMSIS-NN and CMSIS-DSP libraries.
The papers listed here are those that can be downloaded directly. I skipped any that require authentication.

  1. Amoh, J. and Odame, K. (2016). Deep neural networks for identifying cough sounds. IEEE Transactions on Biomedical Circuits and Systems, 10(5):1003–1011.
    Available https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7570164.

  2. Amoh, J. and Odame, K. M. (2019). An optimized recurrent unit for ultra-low-power keyword spotting. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 3(2):1–17.
    Available https://dl.acm.org/doi/pdf/10.1145/3328907.

  3. Lai, L., Suda, N., and Chandra, V. (2018). CMSIS-NN: Efficient neural network kernels for ARM Cortex-M CPUs. arXiv preprint arXiv:1801.06601.
    Available https://arxiv.org/pdf/1801.06601.pdf.

  4. Liu, P.-X., Chen, Y.-J., Jiang, B.-H., and Zhang, X. (2016). Design of the data acquisition system based on STM32. In 2016 International Conference on Computational Science and Engineering (ICCSE2016). Atlantis Press.
    Available https://download.atlantis-press.com/article/25862099.pdf.

  5. Xie, Y., Su, X., He, Y., Chen, X., Cai, G., Xu, B., and Ye, W. (2017). STM32-based vehicle data acquisition system for internet-of-vehicles. In 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), pages 895–898. IEEE.
    Available https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7960119.

  6. Zhang, Y. and Li, L. (2014). Smart home system based on Zigbee network and STM32F407 microprocessor. The Open Cybernetics & Systemics Journal, 8(1).
    Available https://benthamopen.com/contents/pdf/TOCSJ/TOCSJ-8-651.pdf.

  7. Zhang, Y., Suda, N., Lai, L., and Chandra, V. (2017). Hello Edge: Keyword spotting on microcontrollers. arXiv preprint arXiv:1711.07128.
    Available https://arxiv.org/pdf/1711.07128.pdf.

  8. Fezari, M., Bousbia-Salah, M., and Bedda, M. (2008). Microcontroller based heart rate monitor. International Arab Journal of Information Technology (IAJIT), 5(4).
    Available http://www.academia.edu/download/49169562/11-198.pdf.

Free Books by Experts in Machine Learning

posted Oct 5, 2020, 8:15 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Oct 17, 2020, 9:52 PM ]

Spectrograms in R

posted Sep 27, 2020, 4:03 PM by MUHAMMAD MUN`IM AHMAD ZABIDI

http://viz.smultron.org/r/spectrograms/#more-94

How good is CMSIS-NN?

posted Sep 22, 2020, 4:43 PM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Sep 22, 2020, 5:12 PM ]

Original paper:


Results using CIFAR-10 below.

CIFAR-10 CNN as implemented by CMSIS-NN.

Performance of CMSIS-NN over baseline CNN written using arm_conv in CMSIS-DSP.
Platform: NUCLEO-F746ZG mbed board with an Arm Cortex-M7 core running at 216 MHz.
Note: using state-of-the-art DNNs, researchers have achieved ≥93% accuracy on CIFAR-10.

https://medium.com/@aiotalabs/the-sad-story-of-an-edge-computing-device-why-cant-i-impress-olive-dnn-ddf055922712
Performance on STM32 compared with other SoCs, as reported by AiOTA Labs.

The players:
  • STM32F7 from STMicroelectronics. Up to 216 MHz.
  • GAP8 from GreenWaves, with CNN benchmarks. RISC-V cores. Max frequency 175 MHz for the "Cluster", 250 MHz for the "Fabric Controller".
  • i.MX 6ULL from NXP. This one goes up to 900 MHz.

It looks like the GAP8 is roughly 16 times more power-efficient than the Cortex-M7: at 10 FPS, the GAP8 needs 3.7 mW versus 60 mW on the STM32. But that comes at a price, and a very heavy one. An Arduino-compatible GAP8 board costs €100.00. It is not running CMSIS-NN either, since CMSIS-NN is for Arm processors only.
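
To put the efficiency gap in numbers, here is a minimal Python sketch that uses only the two power figures and the 10 FPS frame rate quoted above. The function name is hypothetical. Dividing average power by frame rate gives the energy spent on each frame.

```python
def energy_per_frame_mj(power_mw, fps):
    """Energy per frame in millijoules: mW divided by frames per second."""
    return power_mw / fps


FPS = 10  # frame rate quoted for both measurements

gap8_mj = energy_per_frame_mj(3.7, FPS)    # GAP8 at 3.7 mW
stm32_mj = energy_per_frame_mj(60.0, FPS)  # STM32 at 60 mW
ratio = stm32_mj / gap8_mj

print(f"GAP8 : {gap8_mj:.2f} mJ per frame")
print(f"STM32: {stm32_mj:.2f} mJ per frame")
print(f"STM32 uses about {ratio:.1f} times more energy per frame")
```

At these figures the STM32 spends about 6 mJ per frame against roughly 0.37 mJ on the GAP8, which is a ratio of about 16.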

Back to CMSIS-NN.

