Blog
Digital Design Entry Methods
The most popular system-level development tools are Vivado HLS and the Altera SDK for OpenCL. Vivado HLS requires more hardware knowledge, while Altera OpenCL is relatively easier for software programmers but uses more FPGA resources. Ref: Qin, S., & Berekovic, M. (2015). A Comparison of High-Level Design Tools for SoC-FPGA on Disparity Map Calculation Example. In 2nd International Workshop on FPGAs for Software Programmers (FSP 2015). Retrieved from http://arxiv.org/abs/1509.00036
Array Multiplier
Digital multiplication is the most extensively used operation (especially in signal processing); people who design digital signal processors sacrifice a lot of chip area in order to make the multiply as fast as possible.
Parallel Multiplication
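As a software sketch of the idea: each row of an array multiplier ANDs the multiplicand with one multiplier bit to form a shifted partial product, and the rows are summed. This minimal Python model (my own illustration, not taken from a specific design) mirrors that structure:

```python
def array_multiply(a, b, n=4):
    """Multiply two n-bit unsigned ints by summing partial products,
    mirroring the AND-gate rows of a hardware array multiplier."""
    product = 0
    for i in range(n):          # one row of the array per multiplier bit
        if (b >> i) & 1:        # the AND-gate row: a AND b_i
            product += a << i   # shifted partial product for this row
    return product

print(array_multiply(5, 3))    # 15
```

In hardware all rows are generated at once and reduced by an adder array, which is where the speed (and the chip area) comes from; the loop here only models the arithmetic.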

AI Accelerators
AI chip design projects:
AI implementations and the main weakness of each:
When to Use FPGAs
When to Use GPGPUs
Bibliography

Raspberry Pi GPU
The Raspberry Pi is a great platform for embedded computer vision, but its CPU is slow. Using the onboard GPU can accelerate video operations. Broadcom calls its cores Quad Processing Units (QPUs). The QPU is a vector processor developed by Broadcom, with instructions that operate on 16-element vectors of 32-bit integer or floating-point values.
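To make the 16-lane model concrete, here is a purely conceptual Python sketch of what one QPU vector instruction does: apply an operation across all 16 lanes in a single step. This is an illustration of the SIMD model only, not QPU assembly:

```python
LANES = 16  # the QPU operates on 16-element vectors

def vadd(a, b):
    """Elementwise add of two 16-element vectors, as a single QPU
    'add' instruction would apply to all lanes at once."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]

print(vadd(list(range(16)), [1] * 16))  # [1, 2, ..., 16]
```

The speedup on real hardware comes from all 16 lanes executing in parallel per instruction, rather than the element-at-a-time loop a slow CPU would run.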

Spiking Neural Networks
Fig 1. Machine learning encompasses a range of algorithms.
Spiking neural networks (SNNs) are a different form of neural network that more closely matches biological neurons. SNNs, which use feed-forward training, have low computational and power requirements compared to CNNs (Fig. 2).
Fig 2. Leveraging feed-forward training, spiking neural networks have low computational and power requirements compared to CNNs.
SNN models also work differently from CNNs because of their spiking nature. Information flows through CNN models in a wave-like fashion; information is modified by the weights associated with the nodes in each network layer. SNNs emit spikes in a somewhat similar fashion, but spikes are not always generated at each point, depending on the data. SNN training and hardware requirements are significantly different from those of CNNs. There are applications where one is much better than the other, and areas where they overlap.
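To illustrate the spiking behavior described above, here is a minimal leaky integrate-and-fire neuron in Python, the standard textbook spiking-neuron model (the threshold and leak values here are arbitrary choices for the example):

```python
def lif_spikes(input_current, threshold=1.0, leak=0.9):
    """Simulate a leaky integrate-and-fire neuron over discrete time steps.
    The membrane potential decays by `leak` each step, integrates the
    input, and emits a spike (resetting to 0) on crossing the threshold."""
    v = 0.0
    spikes = []
    for i_t in input_current:
        v = leak * v + i_t          # leak, then integrate this step's input
        if v >= threshold:
            spikes.append(1)        # fire a spike
            v = 0.0                 # reset the membrane potential
        else:
            spikes.append(0)        # no spike this step
    return spikes

print(lif_spikes([0.5, 0.5, 0.5]))  # [0, 0, 1]
```

Note how the output is event-driven: a spike appears only when enough input has accumulated, which is the property that lets SNN hardware stay idle (and low-power) between events.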
Posit Number System
Posit numbers are a new way to represent real numbers for computers, an alternative to the standard IEEE floating-point formats. The primary advantage of posits is the ability to get more precision or dynamic range out of a given number of bits. A conventional floating-point number (IEEE 754) has a sign bit, a set of bits to represent the exponent, and a set of bits called the significand (formerly called the mantissa). For a given size number, the lengths of the various parts are fixed. A 64-bit floating-point number, for example, has 1 sign bit, 11 exponent bits, and 52 bits for the significand. A posit adds an additional category of bits, known as the regime. A posit has four parts: sign, regime, exponent, and fraction.
Unlike IEEE numbers, the exponent and fraction parts of a posit do not have fixed lengths. The sign and regime bits have first priority. Next, the remaining bits, if any, go into the exponent. If there are still bits left after the exponent, the rest go into the fraction. Advantages of posit over IEEE 754:
Who will be the first to produce a chip with posit arithmetic?
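To make the variable-length fields concrete, here is a hedged Python sketch that decodes a small posit (8-bit, es = 1 by default) into a float, following the sign/regime/exponent/fraction layout described above. The conventions (two's-complement negation, useed = 2^(2^es), NaR encoding) follow the posit definition, but this is an illustration, not production code:

```python
def decode_posit(bits, nbits=8, es=1):
    """Decode an nbits-wide posit (given as an unsigned int) to a float."""
    if bits == 0:
        return 0.0
    if bits == 1 << (nbits - 1):
        return float("nan")                    # NaR ("not a real")
    sign = -1.0 if (bits >> (nbits - 1)) & 1 else 1.0
    if sign < 0:
        bits = (-bits) & ((1 << nbits) - 1)    # negate via two's complement
    rest = bits & ((1 << (nbits - 1)) - 1)     # drop the sign bit
    # Regime: a run of identical bits terminated by the opposite bit
    first = (rest >> (nbits - 2)) & 1
    run, i = 0, nbits - 2
    while i >= 0 and ((rest >> i) & 1) == first:
        run += 1
        i -= 1
    i -= 1                                     # skip the terminating bit
    regime = run - 1 if first else -run
    # Exponent: up to es bits; bits that ran off the end count as zero
    e, taken = 0, 0
    while taken < es and i >= 0:
        e = (e << 1) | ((rest >> i) & 1)
        i, taken = i - 1, taken + 1
    e <<= es - taken
    # Fraction: whatever bits remain, with an implicit leading 1
    frac = (rest & ((1 << (i + 1)) - 1)) / (1 << (i + 1)) if i >= 0 else 0.0
    useed = 2 ** (2 ** es)                     # regime scales by useed**regime
    return sign * (useed ** regime) * (2 ** e) * (1 + frac)

print(decode_posit(0b01000000))  # 1.0
print(decode_posit(0b01010000))  # 2.0
```

Note how a small regime (values near 1.0) leaves many bits for the fraction, while a long regime trades fraction bits for dynamic range; that tapering is exactly the precision/range advantage claimed above.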

Embedded Deep Learning
Market trends:
Energy-efficient deep learning sits at the intersection of machine learning and computer architecture. New architectures can potentially revolutionize deep learning and deploy it at scale. State-of-the-art algorithms for applications like face recognition, object identification, and tracking use deep-learning-based models for inference. Edge-based systems like security cameras and self-driving cars need deep learning in order to go beyond the minimum viable product. However, the core deciding factors for such edge-based systems are power, performance, and cost, as these devices possess limited bandwidth, have zero latency tolerance, and are constrained by intense privacy issues. The situation is further exacerbated by the fact that deep learning algorithms require computation on the order of teraops for a single inference at test time, translating to a few seconds per inference for some of the more complex networks. Such high latencies are not practical for edge devices, which typically need real-time response. Additionally, deep learning solutions are extremely compute-intensive, so edge devices often cannot afford deep learning inference at all. Deep learning is necessary to bring intelligence and autonomy to the edge. The first wave of embedded AI is marked by Apple's Siri; it is not really embedded, because Siri relies on the cloud to perform the full speech-recognition process. The second wave is marked by Apple's Face ID: the intelligence happens on the device, independent of the cloud.

IoT Scrapbook
The term "Internet of Things" was first used by Kevin Ashton in 1999. [S. Madakam, R. Ramaswamy, and S. Tripathi, "Internet of Things (IoT): A literature review," Journal of Computer and Communications, vol. 3, p. 164, 2015.] IoT was first introduced at the Massachusetts Institute of Technology, where a vision was described in which all our personal and working devices, including inanimate objects, have not only a digital identity but also potential processing ability, allowing a central computer system to organize and manage them. [F. Wortmann and K. Flüchter, "Internet of things," Business & Information Systems Engineering, vol. 57, pp. 221-224, 2015.]
IoT architecture
LoRa vs NB-IoT

Practical guide to text classification
Example of text classification. Another example is sentiment analysis. 
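As a toy illustration of sentiment analysis, here is a self-contained Python sketch of a multinomial Naive Bayes classifier over a bag-of-words model, one of the standard baselines for text classification (the training sentences below are invented for the example):

```python
import math
from collections import Counter

def train(docs):
    """Train multinomial Naive Bayes. docs: list of (text, label) pairs."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def classify(text, word_counts, label_counts, vocab):
    """Pick the label with the highest log-probability (Laplace smoothing)."""
    best, best_score = None, float("-inf")
    total_docs = sum(label_counts.values())
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # class prior
        total = sum(counts.values())
        for word in text.lower().split():
            # add-one smoothing so unseen words don't zero out the class
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("great movie loved it", "pos"),
        ("terrible boring film", "neg"),
        ("loved the acting great fun", "pos"),
        ("boring waste of time", "neg")]
model = train(docs)
print(classify("great fun movie", *model))  # pos
```

Real systems replace the hand-rolled counts with a library pipeline (e.g. TF-IDF features plus a linear model), but the probabilistic skeleton is the same.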