"Further data and more early GPUs can gain immediate performance improvement" [Krizhevsky + 12]
• Large-scale network → Higher potential recognition ability + Higher risk of overfitting
• Large amount of data → Lower risk of overfitting + Longer computation time
CNN vs. Fully-connected NN
• No pre-training required - local receptive fields (RF), tied weights - "prewired"
• Architecture design is difficult - filter size and stride, number of maps, pooling size and stride
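The design parameters listed above interact through a simple size formula, and tied weights are what keep the parameter count small. The following is a minimal sketch, assuming square inputs and square filters; the helper names and the 32×32×3 example are illustrative assumptions, not taken from the slides.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output size along one axis for a convolution or pooling layer."""
    span = input_size + 2 * padding - filter_size
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    # Floor division: positions that do not fit a full filter are dropped.
    return span // stride + 1


def conv_param_count(in_maps, out_maps, filter_size):
    """Weights plus biases of a conv layer; one shared filter per output map."""
    return out_maps * (in_maps * filter_size * filter_size + 1)


def dense_param_count(in_units, out_units):
    """Weights plus biases of a fully-connected layer."""
    return out_units * (in_units + 1)


# Hypothetical example: 32x32 input with 3 maps, 5x5 filters, 16 output maps.
conv_size = conv_output_size(32, 5, stride=1)         # 28
pool_size = conv_output_size(conv_size, 2, stride=2)  # 14

conv_params = conv_param_count(3, 16, 5)              # 1216
# A fully-connected layer mapping the same input to the same number of outputs:
dense_params = dense_param_count(32 * 32 * 3, conv_size * conv_size * 16)

print(conv_size, pool_size, conv_params, dense_params)
```

Under these assumptions the convolutional layer needs 1,216 parameters, while a fully-connected layer with the same input and output sizes would need about 38.5 million.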
Fully-connected NN
• Pre-training is available - it was once thought to be essential, but that no longer appears to be the case
• Alternative methods - Dropout: a training method that prevents overfitting in fully-connected NNs [Hinton 12] - discriminative pre-training
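Dropout can be sketched in a few lines. The version below is the commonly used "inverted" variant, in which surviving activations are rescaled during training; it may differ in detail from the formulation in [Hinton 12]. The function name and the default drop probability of 0.5 are assumptions for illustration.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=random):
    """Inverted dropout applied to a list of activations.

    During training each unit is zeroed with probability drop_prob and the
    survivors are divided by the keep probability, so the expected value of
    each activation is unchanged. At test time the input is returned as-is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


# Usage: a seeded generator makes the random mask reproducible.
hidden = [1.0] * 8
train_out = dropout(hidden, drop_prob=0.5, rng=random.Random(0))
test_out = dropout(hidden, training=False)
print(train_out)
print(test_out)
```

Because a different random subset of units is dropped at every training step, units cannot rely on specific co-active partners, which is the mechanism usually credited for the reduced overfitting.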