Post date: Aug 26, 2018 7:40:55 AM
Summary of https://www.youtube.com/watch?v=AgpmDOsdTIA
Published by Microsoft Research on Sep 18, 2017
How NNs in small devices (gadgets) differ from NNs in datacenters:
Usually safety-critical (except smartphones), whereas datacenter NNs rarely are
Low power is required, rather than merely nice-to-have
Real-time operation is required, rather than merely preferable
Desirable properties of NNs on gadgets:
sufficiently high accuracy
low computational complexity
low energy usage
small model size
Advantages of small models:
Fewer parameters mean bigger opportunities for scaling training, e.g. a 145x speedup on 256 GPUs with FireCaffe (CVPR 2016), and a 47x speedup for GoogLeNet
Enables complete on-chip integration of the CNN model together with its weights, with no need for off-chip memory. This dramatically reduces inference energy, and allows up-close/personal data gathering and integration with the sensor
Enables continuous wireless updates of models if retraining is required
Seven ways to squeeze:
1. Replace FC layers with convolutional layers (first sketch below)
2. Kernel reduction: shrink the height x width of filters, e.g. 3x3 -> 1x1
3. Channel reduction: reduce the number of filters and channels
4. Evenly spaced downsampling: downsample gradually (evenly spaced) rather than all early or all late (second sketch below)
5. Depthwise separable convolutions: apply each spatial convolution to only some of the channels (one channel per filter in the fully depthwise case), followed by a 1x1 pointwise convolution (third sketch below)
6. Shuffle layer: lets the channel groups from ideas 2 and 5 talk to each other for the first time
7. Distillation & compression: refer to the paper on Deep Compression; there are many ways to do this (last sketch below)