X_val: (1000, 3, 32, 32)
X_train: (49000, 3, 32, 32)
X_test: (1000, 3, 32, 32)
y_val: (1000,)
y_train: (49000,)
y_test: (1000,)
Testing affine_forward function:
difference: 9.76984772881e-10
Testing affine_backward function:
dx error: 7.32235116677e-10
dw error: 1.66805890976e-11
db error: 3.27557274386e-12
Testing relu_forward function:
difference: 4.99999979802e-08
Testing affine_relu_forward:
dx error: 2.64960985774e-10
dw error: 5.18903301389e-10
db error: 3.27558364316e-12
Testing svm_loss:
loss: 9.00018301403
dx error: 8.18289447289e-10
Testing softmax_loss:
loss: 2.30260379447
dx error: 7.63394898765e-09
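The two loss values above are close to what one would expect from near-zero scores over 10 classes: about C - 1 = 9 for the SVM loss and about ln(10) ≈ 2.3026 for the softmax loss. The sketch below reproduces that sanity check. It assumes 10 classes and small random scores; the function bodies, the batch size, and the score scale are illustrative choices, not the code that produced the log.

```python
import numpy as np

def softmax_loss(scores, y):
    """Mean cross-entropy loss for raw class scores of shape (N, C)."""
    # Shift by the row maximum so the exponentials cannot overflow.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def svm_loss(scores, y, margin=1.0):
    """Mean multiclass hinge loss for raw class scores of shape (N, C)."""
    correct = scores[np.arange(len(y)), y][:, None]
    margins = np.maximum(0.0, scores - correct + margin)
    margins[np.arange(len(y)), y] = 0.0  # the correct class adds no loss
    return margins.sum(axis=1).mean()

rng = np.random.default_rng(0)
num_classes, num_inputs = 10, 50  # assumed sizes, for illustration only
scores = 0.001 * rng.standard_normal((num_inputs, num_classes))
y = rng.integers(num_classes, size=num_inputs)

print("softmax:", softmax_loss(scores, y), "expected about", np.log(num_classes))
print("svm:    ", svm_loss(scores, y), "expected about", num_classes - 1)
```

If either loss is far from these values at initialization, the loss function or the weight initialization is the first place to look.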
Testing initialization …
Testing test-time forward pass …
Testing training loss (no regularization)
Running numeric gradient check with reg = 0.0
W1 relative error: 1.22e-08
W2 relative error: 3.48e-10
b1 relative error: 6.55e-09
b2 relative error: 4.33e-10
Running numeric gradient check with reg = 0.7
W1 relative error: 8.18e-07
W2 relative error: 2.85e-08
b1 relative error: 1.09e-09
b2 relative error: 9.09e-10
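The relative errors above compare an analytic gradient with a numeric one. A minimal sketch of such a check follows, assuming a centered finite difference and a max-relative-error metric. The names `rel_error` and `numeric_gradient` and the toy loss are hypothetical choices for illustration, not necessarily the helpers used to produce this log.

```python
import numpy as np

def rel_error(a, b):
    """Largest elementwise relative error between two arrays."""
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

def numeric_gradient(f, x, h=1e-5):
    """Centered-difference gradient of scalar function f at x (x is restored)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

# Toy example: loss = 0.5 * ||X W||^2, whose gradient w.r.t. W is X^T (X W).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
W = rng.standard_normal((4, 3))
loss = lambda W_: 0.5 * np.sum((X @ W_) ** 2)

analytic = X.T @ (X @ W)
numeric = numeric_gradient(loss, W)
print("W relative error:", rel_error(analytic, numeric))
```

Errors around 1e-7 or smaller, as in the log above, usually indicate that the analytic gradient is correct.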
(Iteration 1 / 4900) loss: 2.305062
(Epoch 0 / 10) train acc: 0.105000; val_acc: 0.111000
(Epoch 1 / 10) train acc: 0.479000; val_acc: 0.441000
(Epoch 2 / 10) train acc: 0.479000; val_acc: 0.467000
(Epoch 3 / 10) train acc: 0.489000; val_acc: 0.475000
(Epoch 4 / 10) train acc: 0.506000; val_acc: 0.509000
(Epoch 5 / 10) train acc: 0.543000; val_acc: 0.521000
(Epoch 6 / 10) train acc: 0.546000; val_acc: 0.487000
(Epoch 7 / 10) train acc: 0.589000; val_acc: 0.490000
(Epoch 8 / 10) train acc: 0.625000; val_acc: 0.519000
(Epoch 9 / 10) train acc: 0.618000; val_acc: 0.504000
(Epoch 10 / 10) train acc: 0.627000; val_acc: 0.502000
Running check with reg = 0
Initial loss: 2.30788726404
W1 relative error: 1.22e-07
W2 relative error: 5.05e-07
W3 relative error: 3.72e-08
b1 relative error: 8.78e-08
b2 relative error: 4.42e-09
b3 relative error: 9.04e-11
Running check with reg = 3.14
Initial loss: 7.19123990981
W1 relative error: 8.28e-09
W2 relative error: 4.52e-08
W3 relative error: 2.29e-08
b1 relative error: 1.30e-08
b2 relative error: 7.08e-09
b3 relative error: 2.50e-10
(Iteration 1 / 40) loss: 34.194027
(Epoch 0 / 20) train acc: 0.160000; val_acc: 0.106000
(Epoch 1 / 20) train acc: 0.340000; val_acc: 0.112000
(Epoch 2 / 20) train acc: 0.420000; val_acc: 0.157000
(Epoch 3 / 20) train acc: 0.640000; val_acc: 0.152000
(Epoch 4 / 20) train acc: 0.740000; val_acc: 0.153000
(Epoch 5 / 20) train acc: 0.700000; val_acc: 0.154000
(Iteration 11 / 40) loss: 2.735561
(Epoch 6 / 20) train acc: 0.840000; val_acc: 0.155000
(Epoch 7 / 20) train acc: 0.880000; val_acc: 0.147000
(Epoch 8 / 20) train acc: 0.900000; val_acc: 0.142000
(Epoch 9 / 20) train acc: 0.940000; val_acc: 0.138000
(Epoch 10 / 20) train acc: 0.960000; val_acc: 0.148000
(Iteration 21 / 40) loss: 0.182818
(Epoch 11 / 20) train acc: 0.980000; val_acc: 0.152000
(Epoch 12 / 20) train acc: 0.980000; val_acc: 0.153000
(Epoch 13 / 20) train acc: 0.980000; val_acc: 0.146000
(Epoch 14 / 20) train acc: 0.980000; val_acc: 0.150000
(Epoch 15 / 20) train acc: 1.000000; val_acc: 0.143000
(Iteration 31 / 40) loss: 0.010268
(Epoch 16 / 20) train acc: 1.000000; val_acc: 0.151000
(Epoch 17 / 20) train acc: 1.000000; val_acc: 0.152000
(Epoch 18 / 20) train acc: 1.000000; val_acc: 0.152000
(Epoch 19 / 20) train acc: 1.000000; val_acc: 0.152000
(Epoch 20 / 20) train acc: 1.000000; val_acc: 0.153000
(Iteration 1 / 40) loss: 131.712558
(Epoch 0 / 20) train acc: 0.180000; val_acc: 0.107000
(Epoch 1 / 20) train acc: 0.200000; val_acc: 0.126000
(Epoch 2 / 20) train acc: 0.440000; val_acc: 0.112000
(Epoch 3 / 20) train acc: 0.600000; val_acc: 0.133000
(Epoch 4 / 20) train acc: 0.740000; val_acc: 0.136000
(Epoch 5 / 20) train acc: 0.820000; val_acc: 0.120000
(Iteration 11 / 40) loss: 1.823548
(Epoch 6 / 20) train acc: 0.900000; val_acc: 0.127000
(Epoch 7 / 20) train acc: 0.940000; val_acc: 0.123000
(Epoch 8 / 20) train acc: 1.000000; val_acc: 0.128000
(Epoch 9 / 20) train acc: 1.000000; val_acc: 0.129000
(Epoch 10 / 20) train acc: 1.000000; val_acc: 0.132000
(Iteration 21 / 40) loss: 0.004533
(Epoch 11 / 20) train acc: 1.000000; val_acc: 0.131000
(Epoch 12 / 20) train acc: 1.000000; val_acc: 0.131000
(Epoch 13 / 20) train acc: 1.000000; val_acc: 0.131000
(Epoch 14 / 20) train acc: 1.000000; val_acc: 0.131000
(Epoch 15 / 20) train acc: 1.000000; val_acc: 0.131000
(Iteration 31 / 40) loss: 0.000145
(Epoch 16 / 20) train acc: 1.000000; val_acc: 0.131000
(Epoch 17 / 20) train acc: 1.000000; val_acc: 0.131000
(Epoch 18 / 20) train acc: 1.000000; val_acc: 0.131000
(Epoch 19 / 20) train acc: 1.000000; val_acc: 0.131000
(Epoch 20 / 20) train acc: 1.000000; val_acc: 0.131000
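Did you notice anything about the comparative difficulty of training the three-layer net vs. training the five-layer net?
The five-layer network is harder to train. It has more parameters and is more sensitive to the initial weight scale: with a small weight scale, the activations (and therefore the gradients) shrink toward zero as they pass through the additional layers, so it needed a larger weight scale than the three-layer network to converge.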
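The effect of weight scale on a deeper net can be illustrated with a small experiment: push random inputs through a stack of ReLU layers and watch the activation scale at the last hidden layer. This is only a sketch. The layer width of 100, the batch size, and the two weight scales are assumptions chosen for illustration, not the settings used in the runs above.

```python
import numpy as np

def last_hidden_activation_std(num_layers, weight_scale, hidden_dim=100, seed=0):
    """Std of activations after the hidden layers of a num_layers-deep ReLU net.

    A net with num_layers affine layers has num_layers - 1 hidden ReLU layers.
    Weights are drawn as weight_scale * N(0, 1); biases are omitted for brevity.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((200, hidden_dim))
    for _ in range(num_layers - 1):
        W = weight_scale * rng.standard_normal((hidden_dim, hidden_dim))
        h = np.maximum(0.0, h @ W)
    return h.std()

for depth in (3, 5):
    for scale in (1e-2, 1e-1):
        std = last_hidden_activation_std(depth, scale)
        print(f"layers={depth} weight_scale={scale:g} activation std={std:.3e}")
```

With the smaller scale, the activations of the deeper net are orders of magnitude smaller than those of the shallower one, which is consistent with the answer above.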
next_w error: 8.88234703351e-09
velocity error: 4.26928774328e-09
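The check above concerns an SGD with momentum update. One common formulation is sketched below, assuming the classic velocity form v = mu * v - lr * dw. The function signature and the config keys are illustrative assumptions and may differ from the implementation that was tested.

```python
import numpy as np

def sgd_momentum(w, dw, config=None):
    """One SGD+momentum step; returns the updated weights and config."""
    config = dict(config or {})
    lr = config.setdefault("learning_rate", 1e-2)
    mu = config.setdefault("momentum", 0.9)
    v = config.get("velocity", np.zeros_like(w))

    v = mu * v - lr * dw   # decay the old velocity, then add the new gradient step
    next_w = w + v         # move along the velocity
    config["velocity"] = v
    return next_w, config

w = np.ones(3)
dw = np.ones(3)
w, cfg = sgd_momentum(w, dw)
w, cfg = sgd_momentum(w, dw, cfg)
print(w, cfg["velocity"])
```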
running with sgd
(Iteration 1 / 200) loss: 2.550809
(Epoch 0 / 5) train acc: 0.121000; val_acc: 0.131000
(Iteration 11 / 200) loss: 2.197244
(Iteration 21 / 200) loss: 2.163398
(Iteration 31 / 200) loss: 2.041294
(Epoch 1 / 5) train acc: 0.278000; val_acc: 0.261000
(Iteration 41 / 200) loss: 2.111198
(Iteration 51 / 200) loss: 2.089894
(Iteration 61 / 200) loss: 1.931313
(Iteration 71 / 200) loss: 1.801073
(Epoch 2 / 5) train acc: 0.333000; val_acc: 0.284000
(Iteration 81 / 200) loss: 1.955364
(Iteration 91 / 200) loss: 1.749871
(Iteration 101 / 200) loss: 1.744985
(Iteration 111 / 200) loss: 1.724543
(Epoch 3 / 5) train acc: 0.379000; val_acc: 0.286000
(Iteration 121 / 200) loss: 1.628817
(Iteration 131 / 200) loss: 1.916363
(Iteration 141 / 200) loss: 1.711046
(Iteration 151 / 200) loss: 1.612908
(Epoch 4 / 5) train acc: 0.434000; val_acc: 0.298000
(Iteration 161 / 200) loss: 1.849841
(Iteration 171 / 200) loss: 1.689487
(Iteration 181 / 200) loss: 1.529969
(Iteration 191 / 200) loss: 1.559835
(Epoch 5 / 5) train acc: 0.412000; val_acc: 0.315000
running with sgd_momentum
(Iteration 1 / 200) loss: 2.587138
(Epoch 0 / 5) train acc: 0.115000; val_acc: 0.117000
(Iteration 11 / 200) loss: 2.230563
(Iteration 21 / 200) loss: 2.030346
(Iteration 31 / 200) loss: 1.813014
(Epoch 1 / 5) train acc: 0.312000; val_acc: 0.263000
(Iteration 41 / 200) loss: 1.959023
(Iteration 51 / 200) loss: 1.769849
(Iteration 61 / 200) loss: 1.863756
(Iteration 71 / 200) loss: 1.726658
(Epoch 2 / 5) train acc: 0.386000; val_acc: 0.282000
(Iteration 81 / 200) loss: 1.704861
(Iteration 91 / 200) loss: 1.720907
(Iteration 101 / 200) loss: 1.836701
(Iteration 111 / 200) loss: 1.547128
(Epoch 3 / 5) train acc: 0.438000; val_acc: 0.339000
(Iteration 121 / 200) loss: 1.664274
(Iteration 131 / 200) loss: 1.465100
(Iteration 141 / 200) loss: 1.621419
(Iteration 151 / 200) loss: 1.494137
(Epoch 4 / 5) train acc: 0.444000; val_acc: 0.320000
(Iteration 161 / 200) loss: 1.463689
(Iteration 171 / 200) loss: 1.422357
(Iteration 181 / 200) loss: 1.344324
(Iteration 191 / 200) loss: 1.499745
(Epoch 5 / 5) train acc: 0.505000; val_acc: 0.342000
next_w error: 9.52468751104e-08
cache error: 2.64779558072e-09
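The `cache` checked above is the running average of squared gradients that RMSprop keeps per parameter. A minimal sketch of one update step follows; the signature and the default hyperparameters are assumptions for illustration, not the tested implementation.

```python
import numpy as np

def rmsprop(w, dw, cache, lr=1e-2, decay_rate=0.99, eps=1e-8):
    """One RMSprop step; returns the updated weights and cache."""
    # Exponential moving average of the squared gradient.
    cache = decay_rate * cache + (1 - decay_rate) * dw ** 2
    # Scale each coordinate's step by the root of its running average.
    next_w = w - lr * dw / (np.sqrt(cache) + eps)
    return next_w, cache

w = np.zeros(1)
dw = np.array([2.0])
next_w, cache = rmsprop(w, dw, np.zeros(1))
print(next_w, cache)
```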
next_w error: 0.207207036686
v error: 4.20831403811e-09
m error: 4.21496319311e-09
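In the check above, the `m` and `v` errors are tiny while the `next_w` error is about 0.21, many orders of magnitude larger. That pattern suggests the moment estimates match the reference while the parameter step does not. One plausible cause, offered here as a guess from the log rather than a diagnosis, is a difference in bias correction or in when the timestep `t` is incremented. For comparison, here is a sketch of the standard Adam update with bias correction; the signature and defaults are assumptions for illustration.

```python
import numpy as np

def adam(w, dw, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step with bias correction; returns (next_w, m, v, t)."""
    t = t + 1                                  # increment before bias correction
    m = beta1 * m + (1 - beta1) * dw           # first-moment estimate
    v = beta2 * v + (1 - beta2) * dw ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # correct the bias toward zero
    v_hat = v / (1 - beta2 ** t)
    next_w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return next_w, m, v, t

w = np.array([1.0, 1.0])
dw = np.array([0.5, -2.0])
next_w, m, v, t = adam(w, dw, np.zeros(2), np.zeros(2), t=0)
print(next_w, m, v, t)
```

On the first step from zero moments, the bias-corrected update moves each weight by roughly `lr` in the direction opposite the gradient sign, regardless of the gradient's magnitude.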
running with adam
(Iteration 1 / 200) loss: 2.900338
(Epoch 0 / 5) train acc: 0.084000; val_acc: 0.112000
(Iteration 11 / 200) loss: 2.230153
(Iteration 21 / 200) loss: 2.099443
(Iteration 31 / 200) loss: 2.024011
(Epoch 1 / 5) train acc: 0.275000; val_acc: 0.246000
(Iteration 41 / 200) loss: 2.000415
(Iteration 51 / 200) loss: 1.900182
(Iteration 61 / 200) loss: 1.907408
(Iteration 71 / 200) loss: 1.845571
(Epoch 2 / 5) train acc: 0.291000; val_acc: 0.297000
(Iteration 81 / 200) loss: 2.102456
(Iteration 91 / 200) loss: 1.783676
(Iteration 101 / 200) loss: 1.758608
(Iteration 111 / 200) loss: 1.684519
(Epoch 3 / 5) train acc: 0.353000; val_acc: 0.304000
(Iteration 121 / 200) loss: 1.713870
(Iteration 131 / 200) loss: 1.746584
(Iteration 141 / 200) loss: 1.638944
(Iteration 151 / 200) loss: 1.506298
(Epoch 4 / 5) train acc: 0.425000; val_acc: 0.346000
(Iteration 161 / 200) loss: 1.455979
(Iteration 171 / 200) loss: 1.666277
(Iteration 181 / 200) loss: 1.682877
(Iteration 191 / 200) loss: 1.730752
(Epoch 5 / 5) train acc: 0.473000; val_acc: 0.361000
running with rmsprop
(Iteration 1 / 200) loss: 2.552075
(Epoch 0 / 5) train acc: 0.153000; val_acc: 0.142000
(Iteration 11 / 200) loss: 2.099180
(Iteration 21 / 200) loss: 1.913110
(Iteration 31 / 200) loss: 1.803509
(Epoch 1 / 5) train acc: 0.367000; val_acc: 0.294000
(Iteration 41 / 200) loss: 1.652624
(Iteration 51 / 200) loss: 1.687771
(Iteration 61 / 200) loss: 1.727509
(Iteration 71 / 200) loss: 1.661583
(Epoch 2 / 5) train acc: 0.446000; val_acc: 0.330000
(Iteration 81 / 200) loss: 1.568144
(Iteration 91 / 200) loss: 1.906969
(Iteration 101 / 200) loss: 1.696090
(Iteration 111 / 200) loss: 1.604414
(Epoch 3 / 5) train acc: 0.504000; val_acc: 0.321000
(Iteration 121 / 200) loss: 1.428074
(Iteration 131 / 200) loss: 1.653949
(Iteration 141 / 200) loss: 1.616327
(Iteration 151 / 200) loss: 1.395913
(Epoch 4 / 5) train acc: 0.493000; val_acc: 0.349000
(Iteration 161 / 200) loss: 1.261602
(Iteration 171 / 200) loss: 1.309488
(Iteration 181 / 200) loss: 1.408998
(Iteration 191 / 200) loss: 1.404335
(Epoch 5 / 5) train acc: 0.564000; val_acc: 0.356000
(Iteration 1 / 9800) loss: 14.657677
(Epoch 0 / 20) train acc: 0.126000; val_acc: 0.144000
(Epoch 1 / 20) train acc: 0.322000; val_acc: 0.315000
(Epoch 2 / 20) train acc: 0.416000; val_acc: 0.414000
(Epoch 3 / 20) train acc: 0.428000; val_acc: 0.447000
(Epoch 4 / 20) train acc: 0.457000; val_acc: 0.449000
(Epoch 5 / 20) train acc: 0.455000; val_acc: 0.430000
(Epoch 6 / 20) train acc: 0.457000; val_acc: 0.459000
(Epoch 7 / 20) train acc: 0.497000; val_acc: 0.459000
(Epoch 8 / 20) train acc: 0.487000; val_acc: 0.471000
(Epoch 9 / 20) train acc: 0.497000; val_acc: 0.480000
(Epoch 10 / 20) train acc: 0.516000; val_acc: 0.475000
(Epoch 11 / 20) train acc: 0.513000; val_acc: 0.483000
(Epoch 12 / 20) train acc: 0.520000; val_acc: 0.501000
(Epoch 13 / 20) train acc: 0.520000; val_acc: 0.503000
(Epoch 14 / 20) train acc: 0.533000; val_acc: 0.504000
(Epoch 15 / 20) train acc: 0.557000; val_acc: 0.509000
(Epoch 16 / 20) train acc: 0.565000; val_acc: 0.506000
(Epoch 17 / 20) train acc: 0.562000; val_acc: 0.534000
(Epoch 18 / 20) train acc: 0.565000; val_acc: 0.520000
(Epoch 19 / 20) train acc: 0.549000; val_acc: 0.524000
(Epoch 20 / 20) train acc: 0.560000; val_acc: 0.506000
Validation set accuracy: 0.534
Test set accuracy: 0.52