posted May 16, 2018, 6:32 PM by MUHAMMAD MUN`IM AHMAD ZABIDI [updated May 16, 2018, 11:59 PM]
1: Deeper Networks
Deeper networks lead to improved inference accuracy. However, depth also increases the number of parameters, and hence the model size.
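To make the parameter growth concrete, here is a minimal sketch (with hypothetical layer shapes) that counts the weights in a small stack of convolutional layers and converts the count to FP32 storage:

```python
# Illustrative sketch: parameter count and FP32 model size for a small,
# assumed stack of 3x3 convolutional layers (shapes are hypothetical).

def conv_params(in_ch, out_ch, k):
    """Weights (k*k*in_ch*out_ch) plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch

# Three 3x3 conv layers, doubling the channel count each time.
layers = [(3, 64, 3), (64, 128, 3), (128, 256, 3)]
total = sum(conv_params(i, o, k) for (i, o, k) in layers)

print(total)                  # total parameter count: 370816
print(total * 4 / 1e6, "MB")  # FP32 = 4 bytes per parameter
```

Adding more layers (or widening them) multiplies this count, which is why model size becomes a concern for deeper networks.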
2: Mathematical Transforms
Mathematical transforms lead to optimizations. For example, Winograd transformations can be applied to 3x3 filters, while Fast Fourier Transforms (FFTs) are better suited to larger filters (5x5 and above).
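The idea behind Winograd convolution can be sketched in one dimension. The minimal algorithm F(2,3) below produces two outputs of a 3-tap convolution with 4 multiplications instead of 6; the 2-D version of the same idea, F(2x2, 3x3), is what gets applied to 3x3 filters:

```python
# Sketch of the 1-D Winograd minimal filtering algorithm F(2,3):
# two outputs of a 3-tap convolution using 4 multiplies instead of 6.

def winograd_f23(d, g):
    """d: 4 input values, g: 3 filter taps -> 2 outputs."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_f23(d, g):
    """Reference: direct sliding-window convolution (6 multiplies)."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
print(winograd_f23(d, g))  # matches direct_f23(d, g)
```

The filter-side factors (g[0] + g[1] + g[2]) / 2 and (g[0] - g[1] + g[2]) / 2 depend only on the weights, so they can be precomputed once per filter, which is where the savings come from in practice.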
3: Compact Data Types
Many researchers have shown that representing data in fewer than 32 bits (FP32) leads to only a small reduction in accuracy.
The latest GPUs provide support for FP16 and INT8. Research on binarized neural networks (BNNs) uses 1-bit data types, restricted to +1 or -1.
There is also work on ternary neural networks (TNNs), with weights constrained to +1, 0, and -1.
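These three representations can be sketched as simple mappings on a weight tensor. The code below is illustrative only (the per-tensor scale and the ternary threshold are assumptions, not a specific published scheme):

```python
import numpy as np

# Illustrative sketches of compact weight representations:
# symmetric INT8 quantization, binarization, and ternarization.

def quantize_int8(w):
    """Map FP32 weights to int8 with a per-tensor scale (assumed scheme)."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale  # approximate reconstruction: q * scale

def binarize(w):
    """BNN-style weights: +1 or -1."""
    return np.where(w >= 0, 1, -1)

def ternarize(w, t=0.05):
    """TNN-style weights: +1, 0, or -1 (t is an assumed threshold)."""
    return np.sign(w) * (np.abs(w) > t)

w = np.array([0.31, -0.02, 0.18, -0.44])
q, s = quantize_int8(w)
print(q, binarize(w), ternarize(w))
```

Note that ternarization maps small-magnitude weights to exactly zero, which is what connects compact data types to the sparsity discussion below.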
4: Exploiting Sparsity
Sparsity, or the presence of zeros, can improve efficiency. About half the neurons in AlexNet and VGG are zero, and computations on zero-valued neurons are unnecessary.
Zeroing out ("pruning") weights that are deemed unimportant makes the weights sparse. Pruning AlexNet and VGG16 has resulted in 95% and 96% sparsity in certain layers without a reduction in accuracy.
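A common way to decide which weights are unimportant is magnitude-based pruning: zero out the smallest-magnitude weights until a target sparsity is reached. A minimal sketch (random tensor and 95% target are illustrative):

```python
import numpy as np

# Sketch of magnitude-based pruning: zero the fraction `sparsity` of
# weights with the smallest absolute value (tensor shape is illustrative).

def prune(w, sparsity=0.95):
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))
pruned = prune(w, 0.95)
print((pruned == 0).mean())  # fraction of zero weights, ~0.95
```

In practice pruning is usually followed by fine-tuning the surviving weights to recover any lost accuracy.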
Ternarization in TNNs also leads to many zero weights. A ternarized ResNet has 50% weight sparsity while delivering comparable accuracy.
DNNs are rapidly evolving. Nevertheless, compact data types and sparsity exploitation are likely to be the norm in next-generation DNNs.
5: Compression
Weight sharing, hashing, and Huffman coding reduce the resources required to store and deploy DNNs.
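Weight sharing can be sketched with a simple 1-D k-means clustering: each weight is replaced by the nearest of k shared centroids, so a layer is stored as a tiny codebook plus per-weight indices, and the low-entropy index stream is what Huffman coding then compresses. The clustering below is a minimal illustrative version, not a specific published method:

```python
import numpy as np

# Sketch of weight sharing via simple 1-D k-means (k and sizes assumed):
# store a k-entry codebook plus a small integer index per weight.

def share_weights(w, k=4, iters=20):
    centroids = np.linspace(w.min(), w.max(), k)
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = w[idx == c].mean()
    idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, idx  # reconstruct the layer as centroids[idx]

rng = np.random.default_rng(1)
w = rng.normal(size=256).astype(np.float32)
codebook, idx = share_weights(w, k=4)

# Storage drops from 256 x 32-bit floats to 4 floats + 256 x 2-bit indices.
print(codebook.size, idx.max())
```

With k=4 each index needs only 2 bits, and because the cluster populations are uneven, Huffman coding the indices shrinks them further.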
References
 E. Nurvitadhi et al., “Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?,” in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 5–14.
 A. Lavin and S. Gray, “Fast Algorithms for Convolutional Neural Networks,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–20, 2016.

