posted Jun 12, 2018, 7:01 PM by MUHAMMAD MUN`IM AHMAD ZABIDI
posted Jun 12, 2018, 5:41 PM by MUHAMMAD MUN`IM AHMAD ZABIDI
posted Jun 12, 2018, 5:22 PM by MUHAMMAD MUN`IM AHMAD ZABIDI
[
updated Jun 12, 2018, 5:33 PM
]
Machines can be taught to recognize facial expressions using deep learning.
An example convolutional neural network (CNN) for facial expression analysis.
Deep learning consists of a training phase and an inference (deployment) phase.
During the training phase, CPUs and GPUs are used to find the best network architecture. This is a lengthy process of finding the best number of neural network layers, the number of nodes per layer, and the weights for each node. Typically, packages such as TensorFlow, Caffe or Theano are used. MATLAB? Not so much.
After the architecture is defined, the coefficients (graphs and weights) derived from the training can be used for inference on an embedded system. This part has been successfully implemented by my Master's student on a Raspberry Pi.
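The training/inference split can be illustrated with a minimal NumPy sketch. The weights below are made-up stand-ins for coefficients that would be exported from a training framework; the point is that the embedded target only has to run the cheap forward pass, with no gradients and no training.

```python
import numpy as np

# Illustrative stand-ins for coefficients exported from a training
# framework (e.g. TensorFlow or Caffe) and loaded on the embedded target
W1 = np.array([[0.5, -0.2], [0.1, 0.8]])   # layer 1 weights
b1 = np.array([0.0, 0.1])                  # layer 1 biases
W2 = np.array([[1.0], [-1.0]])             # layer 2 weights
b2 = np.array([0.2])

def relu(x):
    return np.maximum(x, 0.0)

def infer(x):
    # Forward pass only: this is the part that fits on a device
    # like a Raspberry Pi
    h = relu(x @ W1 + b1)
    return h @ W2 + b2

print(infer(np.array([1.0, 2.0])))
```

A real deployment would load the trained graph and weights from a file rather than hard-coding them, but the computation at inference time is exactly this kind of fixed forward pass.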

posted May 25, 2018, 9:20 PM by MUHAMMAD MUN`IM AHMAD ZABIDI
[
updated May 26, 2018, 1:33 AM
]
Speech recognition is a technology that converts speech to text. This technology has evolved over several decades and has become a very promising application in the AI field. Well, what are the basic principles behind this seemingly mysterious technology? For reasons of space, we can only briefly explain the basic principles of speech recognition here.
Traditional speech recognition systems are created using statistical machine learning methods. A typical speech recognition system consists of the following modules:
 Voice Acquisition Module: In this module, the microphone audio input is acquired and expressed as a digitized voice signal. For example, a 16-bit digital voice signal with a 16 kHz sampling rate means that each second of speech is represented as 16,000 16-bit integers.
 Feature Extraction Module: This module is primarily tasked with converting the digital voice signal into a feature vector and supplying it to the acoustic model for processing. The most common acoustic features extracted from speech input signals are Mel-Frequency Cepstral Coefficients, or MFCCs, although there are other methods, including Linear Predictive Coding (LPC) and Perceptual Linear Prediction (PLP).
 Acoustic Model: The acoustic model represents the relationship between the audio signal and the basic units of the language (feature → phoneme) to reconstruct what was actually uttered. Traditionally, speech recognition used the Hidden Markov Model–Gaussian Mixture Model (HMM-GMM), but in recent years, many systems have used the Hidden Markov Model–Deep Neural Network (HMM-DNN) or other improved models.
 Lexicon: The lexicon or pronunciation dictionary contains all the words and pronunciations that can be handled by a speech recognition system. It provides data about words and their pronunciations, and links the acoustic model and the language model.
 Language Model: The language model constrains the decoder's choice of words to those that are most likely to make sense. It predicts the next word based on the preceding words. Statistical language models (typically n-grams) are compiled from large corpora of text, usually from specific "domains" or subject areas.
 Decoder: The decoder is tasked with receiving the input feature vector and searching for the word string that most probably produced this feature vector, based on the acoustic and language models. Generally, this search process is completed using a beam-search-based Viterbi algorithm.
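The numbers in the voice acquisition step above can be checked with a short NumPy sketch; the 440 Hz test tone is a made-up stand-in for real microphone input.

```python
import numpy as np

RATE = 16000      # 16 kHz sampling rate
DURATION = 1.0    # one second of "speech"

# Simulate one second of audio as 16-bit signed integers, the format
# described in the acquisition step (a real system would read these
# samples from a microphone/ADC instead)
t = np.arange(int(RATE * DURATION)) / RATE
signal = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

print(len(signal))     # 16,000 samples per second
print(signal.dtype)    # int16: 16 bits each
print(signal.nbytes)   # 32,000 bytes = 16,000 samples x 2 bytes
```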
Current speech recognition systems used on mobiles (Google Now, Apple Siri, Amazon Alexa, Microsoft Cortana) use deep learning methods instead. In the statistical approach above, deep learning is used for the acoustic model only. Fully deep-learning-based approaches are also known as end-to-end speech recognition.

posted May 19, 2018, 5:46 PM by MUHAMMAD MUN`IM AHMAD ZABIDI
[
updated May 19, 2018, 5:49 PM
]
The common microprocessor is actually not that power-efficient.
A GPU is slightly more power-efficient.
An FPGA is approximately 10 times more power-efficient.
And the winner is the ASIC, which is about 100 times better.
All these depend on the specific algorithm, of course.
Some algorithms may be more amenable to parallel processing, thus achieving even higher speedups.
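One standard way to reason about such parallel speedups is Amdahl's law: the serial fraction of an algorithm limits the overall speedup no matter how many parallel units are available. The 95% parallelizable fraction below is purely illustrative.

```python
def amdahl_speedup(parallel_fraction, n_units):
    # Amdahl's law: the serial part limits overall speedup no matter
    # how many parallel units (GPU cores, FPGA pipelines) are available
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_units)

# Illustrative numbers only: an algorithm that is 95% parallelizable
# saturates near 20x speedup, however much hardware is thrown at it
for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(0.95, n), 2))
```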
Abbrev.  Long Name                                  Examples
GPP      General Purpose Processor                  Intel Architecture (IA-32), ARM, microcontrollers
GPU      Graphics Processing Unit                   Nvidia GeForce, ATI Radeon
FPGA     Field-Programmable Gate Array              Altera/Intel, Xilinx
ASIC     Application-Specific Integrated Circuit    Names chosen by the foundry's (e.g. TSMC, Samsung) customer

posted May 17, 2018, 12:36 AM by MUHAMMAD MUN`IM AHMAD ZABIDI
[
updated May 17, 2018, 12:38 AM
]
posted May 16, 2018, 6:32 PM by MUHAMMAD MUN`IM AHMAD ZABIDI
[
updated May 16, 2018, 11:59 PM
]
1: Deeper Networks
Deeper networks lead to improved inference accuracy. However, deeper networks mean more parameters, increasing the model size.
2: Mathematical Transforms
Mathematical transforms lead to optimizations. For example, Winograd transformations can be applied to 3x3 filters, while Fast Fourier Transforms (FFTs) have been shown to be better suited to larger filters (5x5 and above).
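The idea behind FFT-based convolution can be verified in a few lines of NumPy. The 1-D case is shown for brevity (CNN layers use the 2-D analogue), and the signal and filter sizes are arbitrary.

```python
import numpy as np

# Multiplication in the frequency domain equals convolution in the
# time domain; Winograd is a related trick for small fixed filter
# sizes such as 3x3
signal = np.random.rand(64)
kernel = np.random.rand(5)          # a "large" 5-tap filter

direct = np.convolve(signal, kernel)

n = len(signal) + len(kernel) - 1   # full linear-convolution length
fft_based = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)

print(np.allclose(direct, fft_based))   # the two methods agree
```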
3: Compact Data Types
Many researchers have shown that representing data in fewer than 32 bits (FP32) leads to only a small reduction in accuracy.
The latest GPUs provide support for FP16 and INT8. Research in binarized neural networks (BNNs) uses 1-bit data types, restricted to +1 or -1.
There is also work on ternary neural networks (TNNs) with weights constrained to +1, 0 and -1.
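A minimal sketch of both ideas, with made-up weights: the INT8 scheme below is a simple symmetric quantizer, and the ternarization threshold is an arbitrary illustrative choice, not one from any particular paper.

```python
import numpy as np

# The same weights stored in FP32, quantized to INT8, and
# ternarized to {-1, 0, +1}
weights = np.array([0.91, -0.42, 0.07, -0.88, 0.33], dtype=np.float32)

# Symmetric INT8 quantization: map [-max, +max] onto [-127, 127]
scale = np.abs(weights).max() / 127.0
int8_weights = np.round(weights / scale).astype(np.int8)

# Ternarization: small weights become 0, the rest keep only their sign
threshold = 0.5 * np.abs(weights).mean()
ternary = np.where(np.abs(weights) < threshold, 0, np.sign(weights)).astype(np.int8)

print(int8_weights)   # 8 bits per weight instead of 32
print(ternary)        # 2 bits (or fewer) per weight suffice
```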
4: Exploiting Sparsity
Sparsity, or the presence of zeros, can improve efficiency. About half the neurons in AlexNet and VGG are zero, and computations on zero-valued neurons are unnecessary.
Zeroing out ("pruning") weights that are deemed unimportant makes the weights sparse. Pruning AlexNet and VGG-16 has resulted in 95% and 96% sparsity for certain layers without reduction in accuracy.
Ternarization in TNNs leads to many zero weights. A ternarized ResNet has 50% weight sparsity while delivering comparable accuracy.
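Magnitude-based pruning can be sketched as follows; the random weights and the threshold are illustrative, not taken from the cited experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)   # stand-in for a trained layer's weights

# Magnitude pruning: weights below a threshold are "deemed not
# important" and zeroed out; the threshold here is illustrative
threshold = 1.5
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

sparsity = np.mean(pruned == 0)
print(f"{sparsity:.0%} of the weights are now zero")
```

Hardware that skips zero operands can then avoid most of the multiply-accumulate work for such a layer.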
DNNs are rapidly evolving. Nevertheless, compact data types and sparsity exploitation are likely to be the norm in next-generation DNNs.
5: Compression
Weight sharing, hashing and Huffman coding reduce the memory required to store and deploy DNNs.
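A minimal sketch of weight sharing, with a made-up codebook: in practice the shared values are learned (e.g. by k-means clustering), and the resulting indices can be compressed further with Huffman coding.

```python
import numpy as np

# Weight sharing: replace each weight by the nearest of a few shared
# values, then store only small indices into that codebook
weights = np.array([0.12, 0.48, -0.51, 0.09, 0.52, -0.49])
codebook = np.array([-0.5, 0.1, 0.5])   # illustrative shared values

# Index of the nearest shared value for each weight
indices = np.abs(weights[:, None] - codebook[None, :]).argmin(axis=1)
shared = codebook[indices]

print(indices)   # 2-bit indices instead of 32-bit floats
print(shared)    # reconstructed (approximate) weights
```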
References
 E. Nurvitadhi et al., "Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 5–14.
 A. Lavin and S. Gray, "Fast Algorithms for Convolutional Neural Networks," 2016 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1–20, 2016.

posted May 6, 2018, 2:31 AM by MUHAMMAD MUN`IM AHMAD ZABIDI
[
updated May 6, 2018, 2:32 AM
]
The difference between a basic FSM, an FSM controller, and an FSM with a datapath (control unit + datapath, CU+DU).

posted May 4, 2018, 8:42 PM by MUHAMMAD MUN`IM AHMAD ZABIDI
[
updated May 4, 2018, 11:34 PM
]
posted Apr 29, 2018, 10:56 PM by MUHAMMAD MUN`IM AHMAD ZABIDI
[
updated Apr 29, 2018, 11:06 PM
]
Take a sinewave normalized to 1 in 16 bits. By the way, this is how a sinewave looks in Q15 format.
Assume the signal is passed through an amplifier with a gain of 2. The resulting numbers would overflow and alias.
That would be totally wrong in terms of signal amplification, since the result no longer bears any resemblance to the original wave. That is where saturation arithmetic comes in.
Saturation arithmetic simply clips positive values to 32767 (the largest value in 16 bits) and negative values to -32768 (the smallest value in 16 bits). Multiplying each value in the original sinewave by 2 gives the following result:
The last waveform was generated by Python. The other waveforms may be generated by commenting and modifying the code as appropriate.
import numpy as np
import matplotlib.pyplot as plot

time = np.arange(0, 10, 0.01)
# Full-scale Q15 sinewave amplified by 2: values reach about +/-65534,
# far outside the 16-bit signed range [-32768, 32767]
amplitude = 2 * 32767 * np.sin(time)
# Saturation arithmetic: clip to the 16-bit signed range
clipped = np.clip(amplitude, -32768, 32767)
# Wraparound (aliasing) instead of saturation: uncomment to compare
#aliased = np.where(amplitude > 32767, amplitude - 65536, amplitude)
#aliased2 = np.where(aliased < -32768, aliased + 65536, aliased)
print(clipped)
# Plot the saturated sinewave against time
plot.plot(time, clipped)
plot.grid(True, which='both')
plot.axhline(y=0, color='k')
plot.show()

