Blog


Intel Penang Students Working on STM32 ARM Cortex-M4

posted Jun 12, 2018, 7:01 PM by MUHAMMAD MUN`IM AHMAD ZABIDI

Choosing the Right Algorithm

posted Jun 12, 2018, 5:41 PM by MUHAMMAD MUN`IM AHMAD ZABIDI

Deep Learning for Facial Expression Analysis

posted Jun 12, 2018, 5:22 PM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Jun 12, 2018, 5:33 PM ]

Machines can be taught to recognize expressions using deep learning.

Example convolutional neural network (CNN) for facial expression analysis.

Deep learning consists of the training phase and inference/deployment phase.

During the training phase, CPUs and GPUs are used to find the best network architecture. This is a lengthy process of finding the best number of neural network layers, the number of nodes per layer, and the weights for each node. Typically, packages such as TensorFlow, Caffe or Theano are used; Matlab is much less common.

CNN training phase.

After the architecture is defined, the coefficients (graphs and weights) derived from the training can be used for inference on an embedded system. This part has been successfully implemented by my Master's student on a Raspberry Pi.

CNN Inference phase.
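The heavy lifting in CNN inference is convolution followed by a nonlinearity. A minimal numpy sketch of one convolutional layer (illustrative only; not the Raspberry Pi implementation mentioned above, and the image and kernel values are made up):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution (really cross-correlation, as CNNs use)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # multiply-accumulate over one kh x kw window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit, the most common CNN activation."""
    return np.maximum(x, 0.0)

# Toy 5x5 "image" and a 3x3 vertical edge-detecting kernel
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
feature_map = relu(conv2d(image, kernel))
print(feature_map.shape)  # (3, 3)
```

A real facial-expression CNN stacks many such layers (with many kernels each) and ends in a classifier over expression labels.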

Images: Synopsis


Overview of Speech Recognition Technology

posted May 25, 2018, 9:20 PM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated May 26, 2018, 1:33 AM ]

Speech recognition is a technology that converts speech to text. This technology has evolved over several decades and has become a very promising application in the AI field. So what are the basic principles behind this seemingly mysterious technology? For reasons of space, we can only briefly explain the basic principles of speech recognition here.



Traditional speech recognition systems are created using statistical machine learning methods. A typical speech recognition system consists of the following modules:
  1. Voice Acquisition Module: In this module, the microphone audio input is acquired and expressed as a digitized voice signal. For example, a 16-bit digital voice signal with a 16k sampling rate means that each second of speech is represented as 16,000 16-bit integers.
  2. Feature Extraction Module: This module is primarily tasked with converting the digital voice signal into a feature vector and supplying it to the acoustic model for processing. The most common acoustic features extracted from speech input signals are Mel-Frequency Cepstral Coefficients, or MFCCs, although there are other methods, including Linear Predictive Coding (LPC) and Perceptual Linear Prediction (PLP).
  3. Acoustic Model: The acoustic model represents the relationship between the audio signal and basic units of the language (feature -> phoneme) to reconstruct what was actually uttered. Traditionally, speech recognition used Hidden Markov Model-Gaussian Mixture Model (HMM-GMM), but in recent years, many systems have used Hidden Markov Model-Deep Neural Network (HMM-DNN) or other improved models.
  4. Lexicon: The lexicon or pronunciation dictionary contains all the words and pronunciations that can be handled by a speech recognition system. It provides data about words and their pronunciations, and links acoustic model and language model.
  5. Language Model: The language model constrains the decoder's choice of words to those that are most likely to make sense. It predicts the next words based on the last words. Statistical language models (typically n-grams) are compiled from large corpora of text, usually from specific "domains" or subject areas.
  6. Decoder: The decoder is tasked with receiving the input feature vector and searching for the word string that most probably outputs this feature vector based on the acoustic and language models. Generally, this search process is completed using a beam search-based Viterbi algorithm.
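The search performed by the decoder (module 6) can be illustrated with a toy Viterbi example. The 2-state HMM below, its probabilities, and the observation symbols are all invented for illustration; a production decoder additionally prunes the search with a beam:

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for an observation sequence."""
    n_states = trans_p.shape[0]
    T = len(obs)
    # log-probabilities avoid numerical underflow on long utterances
    logp = np.log(start_p) + np.log(emit_p[:, obs[0]])
    back = np.zeros((T, n_states), dtype=int)
    for t in range(1, T):
        scores = logp[:, None] + np.log(trans_p)  # scores[i, j]: arrive in j from i
        back[t] = np.argmax(scores, axis=0)       # best predecessor of each state
        logp = scores[back[t], np.arange(n_states)] + np.log(emit_p[:, obs[t]])
    # trace the best path backwards through the stored predecessors
    path = [int(np.argmax(logp))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy 2-state HMM with three possible acoustic symbols
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], start, trans, emit))  # [0, 0, 1]
```

In a real recognizer the states come from the acoustic model, the transitions are shaped by the lexicon and language model, and the observations are the MFCC feature vectors from module 2.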
Current speech recognition systems used on mobiles (Google Now, Apple Siri, Amazon Alexa, Microsoft Cortana) use deep learning methods instead. In the statistical approach above, deep learning was used for the acoustic model only. Fully deep learning approaches are also known as end-to-end speech recognition.


Power Efficiency of Device Implementations

posted May 19, 2018, 5:46 PM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated May 19, 2018, 5:49 PM ]

The common microprocessor is actually not very power-efficient.
A GPU is slightly more power-efficient.
An FPGA is approximately 10 times more power-efficient.
And the winner is the ASIC, which is about 100 times better.
All these figures depend on the specific algorithm, of course.
Some algorithms are more amenable to parallel processing, thus achieving even higher speedups.



Abbrev.   Long Name                                 Examples
GPP       General Purpose Processor                 Intel Architecture (IA-32), ARM, microcontrollers
GPU       Graphics Processing Unit                  Nvidia GeForce, ATI Radeon
FPGA      Field-Programmable Gate Array             Altera/Intel, Xilinx
ASIC      Application-Specific Integrated Circuit   Named by the foundry's (e.g. TSMC, Samsung) customers



Deep Learning vs Everything Else

posted May 17, 2018, 12:36 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated May 17, 2018, 12:38 AM ]




Trends in Deep Neural Networks

posted May 16, 2018, 6:32 PM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated May 16, 2018, 11:59 PM ]

1: Deeper Networks

Deeper networks lead to improved inference accuracies.



Deeper networks mean more parameters, increasing the model size.
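The growth in parameters is easy to see by counting them per layer: a convolutional layer holds kh x kw x c_in x c_out weights plus one bias per output channel, and a fully connected layer holds n_in x n_out weights plus biases. A small sketch with made-up layer widths:

```python
def conv_params(kh, kw, c_in, c_out):
    """Convolutional layer: weights plus one bias per output channel."""
    return kh * kw * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Fully connected layer: a full weight matrix plus biases."""
    return n_in * n_out + n_out

# Toy stack: two 3x3 conv layers then one fully connected classifier.
# The channel widths and 8x8 feature-map size are invented for illustration.
total = (conv_params(3, 3, 3, 64)       # 3*3*3*64 + 64     = 1,792
         + conv_params(3, 3, 64, 128)   # 3*3*64*128 + 128  = 73,856
         + fc_params(128 * 8 * 8, 10))  # 8192*10 + 10      = 81,930
print(total)  # 157578
```

Note how the fully connected layer dominates even this tiny stack; at VGG scale the counts run into the hundreds of millions.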

2: Mathematical Transforms

Mathematical transforms lead to optimizations. For example, Winograd transformations can be applied to 3x3 filters. Fast Fourier Transforms (FFT) have been shown to be amenable to larger filters (5x5 and above).
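The idea behind FFT-based convolution is that sliding-window multiply-accumulates become elementwise products in the frequency domain. A 1-D numpy sketch of the equivalence (real CNN implementations do this in 2-D, per filter):

```python
import numpy as np

signal = np.random.default_rng(0).standard_normal(64)
kernel = np.array([1.0, 2.0, 3.0, 2.0, 1.0])

# Direct (time-domain) convolution
direct = np.convolve(signal, kernel)

# FFT-based: zero-pad both to the full output length, multiply spectra, invert
n = len(signal) + len(kernel) - 1
via_fft = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)

print(np.allclose(direct, via_fft))  # True
```

The FFT route wins only when the kernel is large enough to amortize the transforms, which is why it suits 5x5-and-above filters while Winograd targets 3x3.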

3: Compact Data Types

Many researchers have shown that representing data in fewer than 32 bits (FP32) leads to only a small reduction in accuracy.
The latest GPUs provide support for FP16 and INT8. Research in binarized neural networks (BNNs) uses 1-bit data types, restricted to +1 or -1.
There is also work on ternary neural networks (TNNs), with weights constrained to +1, 0 and -1.
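A simple symmetric INT8 quantization of FP32 weights illustrates the idea. This is a sketch only; production toolchains use calibrated, often per-channel, scales:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization: map [-max|w|, +max|w|] onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
weights = rng.standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(weights)

# Dequantize and measure the worst-case representation error
restored = q.astype(np.float32) * scale
max_err = np.abs(weights - restored).max()
print(q.dtype, float(max_err))
```

The worst-case error is half a quantization step, which for well-behaved weight distributions translates into only a small accuracy loss, while storage drops 4x and INT8 arithmetic units are far cheaper than FP32 ones.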

4: Exploiting Sparsity

Sparsity, or the presence of zeros, can improve efficiency. About half the neurons in AlexNet and VGG are zero. Computations on zero-valued neurons are unnecessary.

Zeroing out ("pruning") weights that are deemed unimportant makes the weights sparse. Pruning AlexNet and VGG16 has resulted in 95% and 96% sparsity for certain layers without reduction in accuracy.
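Magnitude pruning can be sketched in a few lines: zero every weight whose magnitude falls below a threshold chosen to hit a target sparsity. The matrix size and sparsity target below are illustrative, and real pruning pipelines retrain after each pruning round:

```python
import numpy as np

def prune(weights, sparsity):
    """Zero out the smallest-magnitude weights to reach the target sparsity."""
    k = int(sparsity * weights.size)
    # threshold = magnitude of the k-th smallest weight
    threshold = np.sort(np.abs(weights).ravel())[k]
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(2)
w = rng.standard_normal((64, 64))
pw = prune(w, 0.95)
print(np.mean(pw == 0))  # close to 0.95
```

Sparse weights only pay off on hardware that can skip the zeros; dense GPU kernels compute them anyway, which is one argument for FPGA/ASIC accelerators.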



Ternarization in TNNs leads to many zero weights. A ternarized ResNet has 50% weight sparsity while delivering comparable accuracy.

DNNs are rapidly evolving. Nevertheless, compact data types and sparsity exploitation are likely to be the norm in next-generation DNNs.

5: Compression

Weight sharing, hashing and Huffman coding reduce the resources required to set up DNNs.
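After weight sharing or quantization, the remaining discrete weight indices are highly skewed (zeros dominate after pruning), which is exactly when Huffman coding pays off. A small sketch using the standard library; the index distribution below is made up for illustration:

```python
import heapq
from collections import Counter

def huffman_lengths(symbols):
    """Code length per symbol, from a Huffman tree built on symbol frequencies."""
    freq = Counter(symbols)
    # heap entries: (frequency, tie-breaking id, {symbol: depth-so-far})
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # merging two subtrees deepens every symbol in them by one bit
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

# 100 quantized weight indices drawn from 4 shared values; index 0 dominates
indices = [0] * 90 + [1] * 5 + [2] * 3 + [3] * 2
lengths = huffman_lengths(indices)
freq = Counter(indices)
bits = sum(freq[s] * lengths[s] for s in freq)
print(bits, "bits vs", 2 * len(indices), "bits fixed-width")
```

Here the dominant index gets a 1-bit code, so the stream needs 115 bits instead of the 200 a fixed 2-bit encoding would take; Deep Compression-style pipelines report similar wins on real pruned networks.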


References
  1. E. Nurvitadhi et al., “Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?,” in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 5–14.
  2. A. Lavin and S. Gray, “Fast Algorithms for Convolutional Neural Networks,” 2016 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1–20, 2016.

FSM Controllers

posted May 6, 2018, 2:31 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated May 6, 2018, 2:32 AM ]

The difference between basic FSM, FSM controllers and FSM with datapath (CU+DU).


Evolution of FSM

posted May 4, 2018, 8:42 PM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated May 4, 2018, 11:34 PM ]


Saturation Arithmetic

posted Apr 29, 2018, 10:56 PM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Apr 29, 2018, 11:06 PM ]

Take a sinewave normalized to 1 in 16 bits. By the way, this is what a sinewave looks like in Q15 format.



Assume the signal is passed through an amplifier with a gain of 2. The resulting numbers would overflow and alias.


That would be totally wrong in terms of signal amplification, since the result no longer bears any resemblance to the original wave. That is where saturation arithmetic comes in.

Saturation arithmetic simply clips positive values to 32767 (the largest value in 16 bits) and negative values to -32768 (the smallest value in 16 bits). Multiplying each value in the original sinewave by 2 gives the following result:


The last waveform was generated by Python. The others may be generated by commenting out and modifying the code as appropriate.

import numpy as np
import matplotlib.pyplot as plot

time      = np.arange(0, 10, 0.01)
amplitude = 65536*np.sin(time)                 # full-scale Q15 sinewave after a gain of 2
clipped   = np.clip(amplitude, -32768, 32767)  # saturation: clamp to the 16-bit range
#aliased  = np.where(amplitude > 32767, amplitude - 65536, amplitude)
#aliased2 = np.where(aliased < -32768, aliased + 65536, aliased)
print(clipped)

# Plot the saturated sinewave against time
plot.plot(time, clipped)
plot.grid(True, which='both')
plot.axhline(y=0, color='k')
plot.show()
