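The previous two sections covered the classic AlexNet and VGG networks. Building on AlexNet, VGG tried many ways of deepening the network and arrived at VGG16 and VGG19, both with good results. Later, however, people found that once a network reaches a certain depth, making it deeper still runs into two problems. The first is vanishing gradients: the path back through the layers is too long, and activations drift into the region where units are no longer active, so the gradient becomes 0. This can be addressed with normalized initialization and intermediate normalization layers. The second is that the model's accuracy degrades rapidly, and this is not caused by overfitting. The ResNet authors proposed a network structure that reuses each block's input to make training easier, and the results were surprisingly good.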
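In general, the network is divided into 6 parts. The first is a convolution layer plus a pooling layer, the last is a fully connected (fc) layer, and each of the four middle stages consists of multiple residual blocks. Each residual block in turn consists of 2 or 3 convolution layers plus a shortcut connection (that is, the block's input x is added back to its output). Stacking these blocks gives a deep convolutional neural network.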
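The weight layers inside a block are usually 2 or 3 convolutions, with batch normalization and ReLU between them (Kaiming He discusses the best ordering in a follow-up paper; see the blog post by Binbin Xu). The output of the last layer is added to x and then passed through ReLU to give the block's output. To be clear, there are 4 middle stages, each made of multiple residual blocks, and each block is several convolution layers plus x. An identity shortcut works well; as Binbin Xu notes, identity has several advantages: it performs well, and it adds no parameters.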
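The block described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: dense matrices stand in for the convolutions, batch normalization is left out, and the names `relu`, `linear`, and `residual_block` are hypothetical rather than taken from any of the repositories quoted in this post.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def linear(v, weights):
    """Matrix-vector product; `weights` is a list of rows (stand-in for a conv)."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x) with F(x) = W2 * relu(W1 * x).

    The shortcut is the identity, so it contributes no extra parameters.
    """
    hidden = relu(linear(x, w1))
    residual = linear(hidden, w2)
    return relu([r + a for r, a in zip(residual, x)])


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all weights at zero, F(x) = 0 and the block reduces to relu(x).
print(residual_block(x, zeros, zeros))  # [1.0, 0.0, 3.0]
```

The zero-weight case shows why the identity shortcut helps: a block only has to push its weights toward zero to pass its input through, so extra blocks need not make the network worse.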
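The rightmost network is a 34-layer ResNet. At the top is one convolution followed by pooling. The four middle stages have 3, 4, 6, and 3 blocks respectively, and each block has two convolutions, giving (3+4+6+3) × 2 = 16 × 2 = 32 convolution layers. Adding the first convolution and the final fc layer gives the 34-layer structure.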
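The layer count can be checked with a small helper. This is a sketch; the function name `resnet_depth` is made up for illustration, and it counts only the weighted layers (convolutions and the fc layer), not pooling.

```python
def resnet_depth(blocks_per_stage, convs_per_block):
    """Count weighted layers: the convs in all blocks, plus first conv and final fc."""
    conv_layers = sum(blocks_per_stage) * convs_per_block
    return conv_layers + 2  # first convolution + final fc layer


# Four stages with 3, 4, 6, 3 blocks of two convolutions each.
print(resnet_depth([3, 4, 6, 3], convs_per_block=2))  # 34
```

The same helper covers the other variants discussed in this post: three convolutions per block with the same stage layout gives 50, and three stages of n blocks with two convolutions each gives 6n+2.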
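Since a more complex model may not actually perform better, the authors also built a simpler model for CIFAR-10. The input is a 32×32 image, padded by 4 pixels on each side, from which a 32×32 crop is taken at random. After the first convolution come three stages of convolutions with feature maps of 32×32×16, 16×16×32, and 8×8×64. Each stage has n blocks and each block has two convolution layers, which is 6n layers; adding the first convolution and the final fc layer gives 6n+2 layers in total. The 110-layer network reportedly performed best, while the 1000+ layer network was actually worse, possibly because of overfitting.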
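To make the augmentation step concrete, here is a minimal pure-Python sketch of pad-then-random-crop. The function name `pad_and_random_crop`, the list-of-rows image representation, and the choice of zero-valued padding are assumptions for illustration; the text only specifies 4 pixels of padding and a random 32×32 crop.

```python
import random


def pad_and_random_crop(image, pad=4, crop=32, rng=random):
    """Pad a 2D image (list of rows) with zeros on every side, then crop at random.

    Zero padding is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    padded_width = width + 2 * pad

    # Build the padded image: `pad` zero rows, the padded original rows, `pad` zero rows.
    padded = [[0] * padded_width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * padded_width for _ in range(pad))

    # Pick the top-left corner of the crop uniformly (randint is inclusive).
    top = rng.randint(0, height + 2 * pad - crop)
    left = rng.randint(0, padded_width - crop)
    return [r[left:left + crop] for r in padded[top:top + crop]]


image = [[1] * 32 for _ in range(32)]
cropped = pad_and_random_crop(image)
print(len(cropped), len(cropped[0]))  # 32 32
```

With 4 pixels of padding the crop can shift by at most 4 pixels in any direction, so every crop still contains most of the original image.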
Below is a partial implementation, taken from ry/tensorflow-resnet:
# Excerpt: Config, conv, bn, activation, stack and fc are helpers
# defined elsewhere in the repository (TensorFlow 1.x graph API).
import tensorflow as tf


# This is what they use for CIFAR-10 and 100.
# See Section 4.2 in http://arxiv.org/abs/1512.03385
def inference_small(x,
                    is_training,
                    num_blocks=3,  # 6n+2 total weight layers will be used.
                    use_bias=False,  # defaults to using batch norm
                    num_classes=10):
    c = Config()
    c['is_training'] = tf.convert_to_tensor(is_training,
                                            dtype='bool',
                                            name='is_training')
    c['use_bias'] = use_bias
    c['fc_units_out'] = num_classes
    c['num_blocks'] = num_blocks
    c['num_classes'] = num_classes
    # Return the network output; without `return` the caller would get None.
    return inference_small_config(x, c)


def inference_small_config(x, c):
    c['bottleneck'] = False
    c['ksize'] = 3
    c['stride'] = 1

    # Stage 1: first 3x3 convolution, then n blocks with 16 filters.
    with tf.variable_scope('scale1'):
        c['conv_filters_out'] = 16
        c['block_filters_internal'] = 16
        c['stack_stride'] = 1
        x = conv(x, c)
        x = bn(x, c)
        x = activation(x)
        x = stack(x, c)

    # Stage 2: n blocks with 32 filters, downsampled by stride 2.
    with tf.variable_scope('scale2'):
        c['block_filters_internal'] = 32
        c['stack_stride'] = 2
        x = stack(x, c)

    # Stage 3: n blocks with 64 filters, downsampled by stride 2.
    with tf.variable_scope('scale3'):
        c['block_filters_internal'] = 64
        c['stack_stride'] = 2
        x = stack(x, c)

    # post-net: global average pooling over the spatial dimensions
    x = tf.reduce_mean(x, reduction_indices=[1, 2], name="avg_pool")

    if c['num_classes'] is not None:
        with tf.variable_scope('fc'):
            x = fc(x, c)

    return x

Another partial implementation, adapted from wenxinxu/resnet-in-tensorflow:
# Excerpt: conv_bn_relu_layer, residual_block, activation_summary,
# batch_normalization_layer and output_layer are defined elsewhere in the repository.
def inference(input_tensor_batch, n, reuse):
    '''
    The main function that defines the ResNet. total layers = 1 + 2n + 2n + 2n +1 = 6n + 2
    :param input_tensor_batch: 4D tensor
    :param n: num_residual_blocks
    :param reuse: To build train graph, reuse=False. To build validation graph and share weights
    with train graph, reuse=True
    :return: last layer in the network. Not softmax-ed
    '''
    layers = []
    with tf.variable_scope('conv0', reuse=reuse):
        conv0 = conv_bn_relu_layer(input_tensor_batch, [3, 3, 3, 16], 1)
        activation_summary(conv0)
        layers.append(conv0)

    for i in range(n):
        with tf.variable_scope('conv1_%d' % i, reuse=reuse):
            if i == 0:
                conv1 = residual_block(layers[-1], 16, first_block=True)
            else:
                conv1 = residual_block(layers[-1], 16)
            activation_summary(conv1)
            layers.append(conv1)

    for i in range(n):
        with tf.variable_scope('conv2_%d' % i, reuse=reuse):
            conv2 = residual_block(layers[-1], 32)
            activation_summary(conv2)
            layers.append(conv2)

    for i in range(n):
        with tf.variable_scope('conv3_%d' % i, reuse=reuse):
            conv3 = residual_block(layers[-1], 64)
            layers.append(conv3)
        assert conv3.get_shape().as_list()[1:] == [8, 8, 64]

    with tf.variable_scope('fc', reuse=reuse):
        in_channel = layers[-1].get_shape().as_list()[-1]
        bn_layer = batch_normalization_layer(layers[-1], in_channel)
        relu_layer = tf.nn.relu(bn_layer)
        global_pool = tf.reduce_mean(relu_layer, [1, 2])

        assert global_pool.get_shape().as_list()[-1:] == [64]
        output = output_layer(global_pool, 10)
        layers.append(output)

    return layers[-1]

For the layer counts of the deeper variants, see Table 1 of the original paper. For example, in the 50-layer network the four middle stages contain 3+4+6+3 = 16 blocks, and each block has 3 convolutions (1×1, 3×3, 1×1), for 16 × 3 = 48 layers. Adding the first convolution and the final fc layer brings the total to 50.

Below is a partial implementation, taken from ry/tensorflow-resnet:
# Excerpt: Config, conv, bn, activation, stack, fc and _max_pool are
# helpers defined elsewhere in the repository.
def inference(x, is_training,
              num_classes=1000,
              num_blocks=[3, 4, 6, 3],  # defaults to 50-layer network
              use_bias=False,  # defaults to using batch norm
              bottleneck=True):
    c = Config()
    c['bottleneck'] = bottleneck
    c['is_training'] = tf.convert_to_tensor(is_training,
                                            dtype='bool',
                                            name='is_training')
    c['ksize'] = 3
    c['stride'] = 1
    c['use_bias'] = use_bias
    c['fc_units_out'] = num_classes
    c['num_blocks'] = num_blocks
    c['stack_stride'] = 2

    with tf.variable_scope('scale1'):
        c['conv_filters_out'] = 64
        c['ksize'] = 7
        c['stride'] = 2
        x = conv(x, c)
        x = bn(x, c)
        x = activation(x)

    with tf.variable_scope('scale2'):
        x = _max_pool(x, ksize=3, stride=2)
        c['num_blocks'] = num_blocks[0]
        c['stack_stride'] = 1
        c['block_filters_internal'] = 64
        x = stack(x, c)

    with tf.variable_scope('scale3'):
        c['num_blocks'] = num_blocks[1]
        c['block_filters_internal'] = 128
        assert c['stack_stride'] == 2
        x = stack(x, c)

    with tf.variable_scope('scale4'):
        c['num_blocks'] = num_blocks[2]
        c['block_filters_internal'] = 256
        x = stack(x, c)

    with tf.variable_scope('scale5'):
        c['num_blocks'] = num_blocks[3]
        c['block_filters_internal'] = 512
        x = stack(x, c)

    # post-net
    x = tf.reduce_mean(x, reduction_indices=[1, 2], name="avg_pool")

    if num_classes is not None:
        with tf.variable_scope('fc'):
            x = fc(x, c)

    return x
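To see how the strides in the code above shrink the feature maps, the following sketch traces the spatial size through the five scales. It rests on two assumptions that the excerpt does not state: a 224×224 input, and 'SAME'-style padding so that each stride-2 operation maps a size s to ceil(s / 2). The helper names are hypothetical.

```python
import math


def same_padding_output(size, stride):
    """Spatial size after a strided op, assuming 'SAME'-style padding."""
    return math.ceil(size / stride)


def trace_spatial_sizes(input_size=224):
    """Follow the strides used in each variable scope of the excerpt above."""
    sizes = {}
    size = same_padding_output(input_size, 2)  # scale1: 7x7 conv, stride 2
    sizes['scale1'] = size
    size = same_padding_output(size, 2)        # scale2: 3x3 max pool, stride 2
    sizes['scale2'] = size                     # stack_stride = 1 keeps the size
    for name in ('scale3', 'scale4', 'scale5'):
        size = same_padding_output(size, 2)    # stack_stride = 2 halves the size
        sizes[name] = size
    return sizes


print(trace_spatial_sizes())
# {'scale1': 112, 'scale2': 56, 'scale3': 28, 'scale4': 14, 'scale5': 7}
```

Under these assumptions, the final average pooling reduces a 7×7 feature map to a single vector per image before the fc layer.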