
Why Small Deep Neural Nets

posted Aug 26, 2018, 12:40 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Aug 26, 2018, 2:19 AM ]

Published by Microsoft Research Sep 18, 2017

How neural networks (NNs) on small devices (gadgets) differ from NNs in datacenters:
  • Safety: usually safety-critical on gadgets (smartphones excepted) vs. rarely in datacenters
  • Low power: required vs. nice-to-have
  • Real-time operation: required vs. preferable
Desirable properties of NNs on gadgets:
  • sufficiently high accuracy
  • low computational complexity
  • low energy usage
  • small model size

Advantages of small models:
  1. Fewer parameters mean less data to exchange between workers, so distributed training scales better: 145x speedup on 256 GPUs for FireNet (CVPR 2016), 47x speedup for GoogLeNet
  2. Enables complete on-chip integration of the CNN model with its weights: no off-chip memory is needed, which dramatically reduces the energy used for inference and allows up-close, personal data gathering and integration with the sensor
  3. Makes continuous wireless updates of models practical when retraining is required
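
To make the on-chip argument in advantage 2 concrete, here is a minimal sketch of the arithmetic: weight storage is roughly the parameter count times the bits per parameter. The function names, the parameter counts, and the 8 MiB on-chip budget are all hypothetical values chosen for illustration; none of them come from the talk.

```python
def model_size_bytes(num_params: int, bits_per_param: int = 32) -> int:
    """Bytes needed to store the weights alone (rounded up to whole bytes)."""
    return (num_params * bits_per_param + 7) // 8


def fits_on_chip(num_params: int, on_chip_bytes: int, bits_per_param: int = 32) -> bool:
    """True if all weights fit in the given on-chip memory budget."""
    return model_size_bytes(num_params, bits_per_param) <= on_chip_bytes


# Hypothetical numbers, for illustration only.
ON_CHIP_BUDGET = 8 * 1024 * 1024   # assumed 8 MiB of on-chip memory
small_model = 1_250_000            # assumed parameter count of a small model
large_model = 60_000_000           # assumed parameter count of a large model

print(model_size_bytes(small_model))               # 5000000 bytes at 32 bits
print(fits_on_chip(small_model, ON_CHIP_BUDGET))   # True
print(fits_on_chip(large_model, ON_CHIP_BUDGET))   # False
```

Lowering `bits_per_param` (for example through quantization) shrinks the footprint further, which is one reason compression appears among the ways to squeeze a model.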
Seven ways to squeeze:
  1. Replace fully connected (FC) layers with convolutions
  2. Kernel reduction: reduce the height x width of filters, e.g. 3x3 -> 1x1
  3. Channel reduction: reduce the number of filters and channels
  4. Evenly spaced downsampling: choose between early, late and evenly spaced (gradual) downsampling
  5. Depthwise separable convolutions: apply each filter to only some of the input channels instead of all of them
  6. Shuffle layer: after ideas 2 and 5 are applied, channels in different groups no longer interact; a shuffle layer is the first point where they get to talk to each other
  7. Distillation & compression: see the paper on Deep Compression. There are many ways to do it.
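
The effect of ideas 2, 3, 5 and 6 can be sketched with simple weight counting. This is an illustrative calculation in plain Python, not code from the talk: the layer sizes (128 channels, 3x3 kernels, 4 groups) are assumptions, biases are ignored, and the helper names are hypothetical. The shuffle function follows the usual "reshape to (groups, n), transpose, flatten" pattern, applied here to a list of channel indices.

```python
def conv_params(in_ch: int, out_ch: int, k_h: int, k_w: int, groups: int = 1) -> int:
    """Weight count of a 2D convolution (biases ignored).

    With groups > 1, each filter only sees in_ch / groups input channels.
    """
    assert in_ch % groups == 0 and out_ch % groups == 0
    return (in_ch // groups) * out_ch * k_h * k_w


def depthwise_separable_params(in_ch: int, out_ch: int, k_h: int, k_w: int) -> int:
    """Depthwise k_h x k_w conv (one filter per channel) followed by a 1x1 conv."""
    depthwise = conv_params(in_ch, in_ch, k_h, k_w, groups=in_ch)
    pointwise = conv_params(in_ch, out_ch, 1, 1)
    return depthwise + pointwise


def channel_shuffle(channels: list, groups: int) -> list:
    """Interleave channels so each output group mixes channels from every input group."""
    assert len(channels) % groups == 0
    n = len(channels) // groups  # channels per group
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


# Assumed baseline layer: 128 -> 128 channels with 3x3 filters.
print(conv_params(128, 128, 3, 3))                 # 147456  baseline
print(conv_params(128, 128, 1, 1))                 # 16384   idea 2: 3x3 -> 1x1
print(conv_params(64, 64, 3, 3))                   # 36864   idea 3: half the channels
print(depthwise_separable_params(128, 128, 3, 3))  # 17536   idea 5
print(conv_params(128, 128, 1, 1, groups=4))       # 4096    ideas 2 + 5: grouped 1x1
print(channel_shuffle(list(range(8)), groups=2))   # [0, 4, 1, 5, 2, 6, 3, 7]
```

In the last line, channels 0-3 and 4-7 start in separate groups; after the shuffle each consecutive block of four holds channels from both groups, so the next grouped convolution sees information from both.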