Convolutional Neural Nets

Instructions:

  1. Implement a convolutional layer for your neural network. To make this task simple, a Tensor class with a convolution method was included in the starter kit. The constructor for a convolutional layer might look like this:

    C++:
    // Returns the total number of elements in a tensor with the given dimensions
    // (the product of the dimension sizes).
    size_t countTensorElements(const std::initializer_list<size_t>& dims)
    {
    	size_t n = 1;
    	for(size_t d : dims)
    		n *= d;
    	return n;
    }
    
    LayerConv::LayerConv(const std::initializer_list<size_t>& inputDims,
                         const std::initializer_list<size_t>& filterDims,
                         const std::initializer_list<size_t>& outputDims)
    : Layer(countTensorElements(inputDims), countTensorElements(outputDims))
    {
    	...
    


    Java:
    	LayerConv(int[] inputDims, int[] filterDims, int[] outputDims)
    	{
    		...
    


    Such layers could be instantiated like this:

    C++:
    		nn.add(new LayerConv({28, 28}, {5, 5, 32}, {28, 28, 32}));
    


    Java:
    		nn.add(new LayerConv(new int[]{28, 28},
    		                     new int[]{5, 5, 32},
    		                     new int[]{28, 28, 32}));
    


  2. Implement a 2D max pooling layer.
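
    A rough sketch of the forward pass for non-overlapping 2x2 max pooling follows; the flat, channel-by-channel storage layout is an assumption, so adapt it to however your Tensor class stores values. During backprop, the blame for each pooled value flows only to the input position that held the maximum, and the other positions in the block receive 0 (the debug spew in the FAQ below illustrates this).

    C++:
    	#include <algorithm>
    	#include <cstddef>
    	#include <vector>
    	
    	// Forward pass of non-overlapping 2x2 max pooling over a flat buffer that
    	// stores each channel contiguously in row-major order (an assumed layout).
    	std::vector<double> maxPool2x2(const std::vector<double>& in,
    	                               size_t width, size_t height, size_t channels)
    	{
    		size_t ow = width / 2, oh = height / 2;
    		std::vector<double> out(ow * oh * channels);
    		for(size_t c = 0; c < channels; c++)
    			for(size_t y = 0; y < oh; y++)
    				for(size_t x = 0; x < ow; x++)
    				{
    					size_t base = c * width * height + 2 * y * width + 2 * x;
    					out[c * ow * oh + y * ow + x] =
    						std::max(std::max(in[base], in[base + 1]),
    						         std::max(in[base + width], in[base + width + 1]));
    				}
    		return out;
    	}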


  3. Add a unit test that uses finite differencing to validate that your implementation computes correct gradients for the following topology:
    nn.add(new LayerConv({8, 8}, {5, 5, 4}, {8, 8, 4}));
    nn.add(new LayerLeakyRectifier(8 * 8 * 4));
    nn.add(new LayerMaxPooling2D(8, 8, 4));
    nn.add(new LayerConv({4, 4, 4}, {3, 3, 4, 6}, {4, 4, 1, 6}));
    nn.add(new LayerLeakyRectifier(4 * 4 * 6));
    nn.add(new LayerMaxPooling2D(4, 4, 1 * 6));
    nn.add(new LayerLinear(2 * 2 * 6, 3));
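
    One way to structure the check is with central differences: nudge each weight by ±h, measure the change in the sum-squared error, and compare the estimate (errPlus - errMinus) / (2h) against the gradient that your backprop code computes. A rough C++ sketch follows; "NeuralNet" and every method called on "nn" below (weights, computeGradient, measureSSE) are placeholders for whatever your own classes actually expose, and you may need to negate one side if your gradient follows the blame convention used in the FAQ below.

    C++:
    	#include <cmath>
    	#include <stdexcept>
    	#include <vector>
    	
    	void checkGradient(NeuralNet& nn, const std::vector<double>& x, const std::vector<double>& y)
    	{
    		double h = 1e-6;
    		std::vector<double>& w = nn.weights();                // placeholder: flat vector of all weights
    		std::vector<double> grad = nn.computeGradient(x, y);  // placeholder: gradient from backprop
    		for(size_t i = 0; i < w.size(); i++)
    		{
    			double orig = w[i];
    			w[i] = orig + h;
    			double errPlus = nn.measureSSE(x, y);             // placeholder: sum-squared error
    			w[i] = orig - h;
    			double errMinus = nn.measureSSE(x, y);
    			w[i] = orig;                                      // restore the weight
    			double estimate = (errPlus - errMinus) / (2.0 * h);
    			// Backprop and the finite-difference estimate should agree to several
    			// significant digits for every weight in every layer.
    			if(std::fabs(estimate - grad[i]) > 1e-4)
    				throw std::runtime_error("gradient check failed");
    		}
    	}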
    


  4. (Optional) The following topology will yield fewer than 60 errors on the MNIST dataset after about 2 or 3 days of training. If you have access to a machine that you can leave running for that long, it might be fun to test your code.
    nn.add(new LayerConv({28, 28}, {5, 5, 32}, {28, 28, 32}));
    nn.add(new LayerLeakyRectifier(28 * 28 * 32));
    nn.add(new LayerMaxPooling2D(28, 28, 32));
    nn.add(new LayerConv({14, 14, 32}, {5, 5, 32, 64}, {14, 14, 1, 64}));
    nn.add(new LayerLeakyRectifier(14 * 14 * 64));
    nn.add(new LayerMaxPooling2D(14, 14, 64));
    nn.add(new LayerLinear(7 * 7 * 64, 1000));
    nn.add(new LayerLeakyRectifier(1000));
    nn.add(new LayerLinear(1000, 10));
    nn.add(new LayerLeakyRectifier(10));
    


FAQ:

  1. The Tensor.convolve method requires the filters to have the same number of dimensions as the input. How do we implement a convolutional layer whose filters have more dimensions than its input?
    You will need to use a loop. For example, if the input has a size of {28, 28} and the filter has a size of {5, 5, 32}, then you will need to call Tensor.convolve 32 times (each time using a {5,5} filter) to complete the convolution.
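
    A rough sketch of that loop is shown below. The Tensor constructor and the convolve call are assumptions about the starter kit's interface, and weights, bias, activation, and input are placeholder flat arrays; check the actual signatures before copying any of this.

    C++:
    		// Convolve a {28, 28} input with a {5, 5, 32} filter bank by looping over
    		// the 32 filters; each call convolves one {5, 5} filter into one output channel.
    		Tensor in(input, {28, 28});
    		for(size_t i = 0; i < 32; i++)
    		{
    			Tensor filter(weights + i * 5 * 5, {5, 5});      // i-th {5, 5} slice of the weights
    			Tensor out(activation + i * 28 * 28, {28, 28});  // i-th output channel
    			Tensor::convolve(in, filter, out);               // assumed call form
    			for(size_t j = 0; j < 28 * 28; j++)              // one bias per filter (see FAQ 2),
    				activation[i * 28 * 28 + j] += bias[i];      // added to every value in that channel
    		}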


  2. Do convolutional layers have bias terms?
    Yes. There is one bias value for each filter, and it is added to every value in that filter's output.


  3. How does one initialize the weights of a convolutional layer?
    I initialize each weight to "r/c", where "r" is a random value drawn from a normal distribution, and "c" is the total number of elements in the filter.
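
    For example, a minimal sketch (the random engine and the flat per-filter weight layout are placeholders):

    C++:
    		#include <random>
    		#include <vector>
    		
    		// Set every weight of one filter to normal() / c, where c is the total
    		// number of elements in that filter.
    		void initFilter(std::vector<double>& filterWeights, std::mt19937& rng)
    		{
    			std::normal_distribution<double> normal(0.0, 1.0);
    			double c = (double)filterWeights.size();
    			for(double& w : filterWeights)
    				w = normal(rng) / c;
    		}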


  4. Q: Can you provide example debug spew to help me debug it?
    A: Okay.
    
    I trained a 3-layer neural network with this topology:
    	Layer 0: Convolution({4, 4}, {3, 3, 2}, {4, 4, 2})
    	Layer 1: LeakyRectifier(4 * 4 * 2)
    	Layer 2: MaxPooling2D(4, 4, 2)
    
    Total number of weights: 20
    	(3*3+1)*2 = 20
    	The "+1" is for the bias term.
    	There is one bias value for each of the 2 filters.
    
    initial weights:
    	0,              (bias #1)
    	0.01,0.02,0.03, (filter #1)
    	0.04,0.05,0.06,
    	0.07,0.08,0.09,
    
    	0.1,            (bias #2)
    	0.11,0.12,0.13, (filter #2)
    	0.14,0.15,0.16,
    	0.17,0.18,0.19
    
    input vector:
    	0,0.1,0.2,0.3,
    	0.4,0.5,0.6,0.7,
    	0.8,0.9,1,1.1,
    	1.2,1.3,1.4,1.5
    
    Layer 0 activation:
    	0.083,0.139,0.178,0.121,
    	0.198,0.303,0.348,0.225,
    	0.33,0.483,0.528,0.333,
    	0.181,0.253,0.274,0.163,
    
    	0.283,0.419,0.518,0.401,
    	0.568,0.853,0.988,0.715,
    	0.94,1.393,1.528,1.063,
    	0.701,1.013,1.094,0.763
    
    Layer 1 activation:
    	0.083,0.139,0.178,0.121,
    	0.198,0.303,0.348,0.225,
    	0.33,0.483,0.528,0.333,
    	0.181,0.253,0.274,0.163,
    
    	0.283,0.419,0.518,0.401,
    	0.568,0.853,0.988,0.715,
    	0.94,1.393,1.528,1.063,
    	0.701,1.013,1.094,0.763
    
    	(Leaky rectifier did nothing because all values were positive.)
    
    Layer 2 activation:
    	0.303,0.348,
    	0.483,0.528,
    
    	0.853,0.988,
    	1.393,1.528
    
    target:
    	0.7,0.6,
    	0.5,0.4,
    
    	0.3,0.2,
    	0.1,0
    
    Layer 2 blame:
    	0.397,0.252,
    	0.017,-0.128,
    
    	-0.553,-0.788,
    	-1.293,-1.528
    
    Layer 1 blame:
    	0,0,0,0,
    	0,0.397,0.252,0,
    	0,0.017,-0.128,0,
    	0,0,0,0,
    
    	0,0,0,0,
    	0,-0.553,-0.788,0,
    	0,-1.293,-1.528,0,
    	0,0,0,0
    
    Layer 0 blame:
    	0,0,0,0,
    	0,0.397,0.252,0,
    	0,0.017,-0.128,0,
    	0,0,0,0,
    
    	0,0,0,0,
    	0,-0.553,-0.788,0,
    	0,-1.293,-1.528,0,
    	0,0,0,0
    
    gradient:
    	0.538,
    	-0.032,0.0218,0.0756,
    	0.1832,0.237,0.2908,
    	0.3984,0.4522,0.506,
    
    	-4.162,
    	-1.36,-1.7762,-2.1924,
    	-3.0248,-3.441,-3.8572,
    	-4.6896,-5.1058,-5.522
    
    updated weights:
    	0.00538,
    	0.00968,0.020218,0.030756,
    	0.041832,0.05237,0.062908,
    	0.073984,0.084522,0.09506,
    
    	0.05838,
    	0.0964,0.102238,0.108076,
    	0.109752,0.11559,0.121428,
    	0.123104,0.128942,0.13478
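
    If your Layer 0 activation disagrees with the numbers above, the following stand-alone program may help you isolate the problem. It recomputes the first output channel with plain loops, zero padding, and the filter applied without flipping it (the convention that matches the numbers above), and should reproduce 0.083, 0.139, ..., 0.163 up to floating-point rounding.

    C++:
    	#include <cstdio>
    	
    	int main()
    	{
    		double in[4][4] = {{0.0, 0.1, 0.2, 0.3},
    		                   {0.4, 0.5, 0.6, 0.7},
    		                   {0.8, 0.9, 1.0, 1.1},
    		                   {1.2, 1.3, 1.4, 1.5}};
    		double f[3][3] = {{0.01, 0.02, 0.03},   // filter #1
    		                  {0.04, 0.05, 0.06},
    		                  {0.07, 0.08, 0.09}};
    		double bias = 0.0;                      // bias #1
    		for(int y = 0; y < 4; y++)
    		{
    			for(int x = 0; x < 4; x++)
    			{
    				double sum = bias;
    				for(int dy = -1; dy <= 1; dy++)
    					for(int dx = -1; dx <= 1; dx++)
    					{
    						int yy = y + dy, xx = x + dx;
    						if(yy >= 0 && yy < 4 && xx >= 0 && xx < 4) // zero padding
    							sum += in[yy][xx] * f[dy + 1][dx + 1];
    					}
    				printf("%g,", sum);
    			}
    			printf("\n");
    		}
    		return 0;
    	}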
    


  5. Q: Can you please add another convolutional layer to the spew to help us debug the backprop step?
    A: Yes.
    I trained a 4-layer neural network with this topology:
    	Layer 0: Convolution({4, 4}, {3, 3}, {4, 4})
    	Layer 1: Convolution({4, 4}, {3, 3, 2}, {4, 4, 2})
    	Layer 2: LeakyRectifier(4 * 4 * 2)
    	Layer 3: MaxPooling2D(4, 4, 2)
    
    Total number of weights: 30
    	3*3+1 + (3*3+1)*2 = 30
    	The "+1" is for the bias term.
    	There is one bias value for the filter in layer 0,
    	and one bias value for each of the 2 filters in layer 1.
    
    initial weights:
    	0,              (bias in layer 0)
    	0.01,0.02,0.03, (filter in layer 0)
    	0.04,0.05,0.06,
    	0.07,0.08,0.09,
    
    	0.1,            (bias for first filter in layer 1)
    	0.11,0.12,0.13, (first filter in layer 1)
    	0.14,0.15,0.16,
    	0.17,0.18,0.19
    
    	0.20,           (bias for second filter in layer 1)
    	0.21,0.22,0.23, (second filter in layer 1)
    	0.24,0.25,0.26,
    	0.27,0.28,0.29,
    
    
    input vector:
    	0,0.1,0.2,0.3,
    	0.4,0.5,0.6,0.7,
    	0.8,0.9,1,1.1,
    	1.2,1.3,1.4,1.5
    
    Layer 0 activation: 
    	0.083,0.139,0.178,0.121,
    	0.198,0.303,0.348,0.225,
    	0.33,0.483,0.528,0.333,
    	0.181,0.253,0.274,0.163
    
    Layer 1 activation: 
    	0.2279,0.31527,0.32242,0.24273,
    	0.35738,0.52116,0.52342,0.36627,
    	0.37058,0.53488,0.52774,0.36507,
    	0.27002,0.37003,0.36238,0.26085,
    
    	0.4002,0.54017,0.55382,0.42993,
    	0.61098,0.88016,0.88922,0.63957,
    	0.64538,0.92468,0.91874,0.65217,
    	0.49472,0.67493,0.66578,0.49065
    
    Layer 2 activation:
    	0.2279,0.31527,0.32242,0.24273,
    	0.35738,0.52116,0.52342,0.36627,
    	0.37058,0.53488,0.52774,0.36507,
    	0.27002,0.37003,0.36238,0.26085,
    
    	0.4002,0.54017,0.55382,0.42993,
    	0.61098,0.88016,0.88922,0.63957,
    	0.64538,0.92468,0.91874,0.65217,
    	0.49472,0.67493,0.66578,0.49065
    
    Layer 3 activation:
    	0.52116,0.52342,
    	0.53488,0.52774,
    	
    	0.88016,0.88922,
    	0.92468,0.91874
    
    target:
    	0.7,0.6,
    	0.5,0.4,
    
    	0.3,0.2,
    	0.1,0
    
    Layer 3 blame:
    	0.17884,0.07658,
    	-0.03488,-0.12774,
    
    	-0.58016,-0.68922,
    	-0.82468,-0.91874
    
    Layer 2 blame:
    	0,0,0,0,
    	0,0.17884,0.07658,0,
    	0,-0.03488,-0.12774,0,
    	0,0,0,0,
    
    	0,0,0,0,
    	0,-0.58016,-0.68922,0,
    	0,-0.82468,-0.91874,0,
    	0,0,0,0
    
    Layer 1 blame: 
    	0,0,0,0,
    	0,0.17884,0.07658,0,
    	0,-0.03488,-0.12774,0,
    	0,0,0,0,
    
    	0,0,0,0,
    	0,-0.58016,-0.68922,0,
    	0,-0.82468,-0.91874,0,
    	0,0,0,0
    
    Layer 0 blame: 
    	-0.1021612,-0.2424868,-0.2526264,-0.1485652,
    	-0.2912204,-0.6655076,-0.6947076,-0.3948608,
    	-0.3290468,-0.7531076,-0.7823076,-0.4446344,
    	-0.2285932,-0.5069644,-0.5260248,-0.2907052
    
    gradient: 
    	-6.65352,
    	-2.27731944,-3.09769472,-2.82596468,
    	-4.35582312,-5.5801104,-4.87145808,
    	-4.40816792,-5.51474528,-4.72076836,
    	
    	0.0928,
    	-0.02012312,-0.01653216,0.00021996,
    	-0.01459476,-0.0034554,0.01851276,
    	0.05737384,0.08298856,0.08954992,
    	
    	-3.0128,
    	-0.58561972,-0.77292296,-0.68036924,
    	-1.03960116,-1.2990522,-1.09834164,
    	-0.90605436,-1.10450424,-0.91155168
    
    updated weights: 
    	-0.0665352,
    	-0.0127731944,-0.0109769472,0.0017403532,
    	-0.0035582312,-0.005801104,0.0112854192,
    	0.0259183208,0.0248525472,0.0427923164,
    
    	0.100928,
    	0.1097987688,0.1198346784,0.1300021996,
    	0.1398540524,0.149965446,0.1601851276,
    	0.1705737384,0.1808298856,0.1908954992,
    
    	0.169872,
    	0.2041438028,0.2122707704,0.2231963076,
    	0.2296039884,0.237009478,0.2490165836,
    	0.2609394564,0.2689549576,0.2808844832
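

    If your Layer 0 blame disagrees with the numbers above, the following stand-alone program may help. It back-propagates the Layer 1 blame through Layer 1's two {3, 3} filters with plain loops, using the same zero-padded, un-flipped convolution convention as the activations above, and should reproduce -0.1021612, -0.2424868, ..., -0.2907052 up to floating-point rounding.

    C++:
    	#include <cstdio>
    	
    	int main()
    	{
    		// Layer 1 blame (both channels) and Layer 1's filter weights, copied from above.
    		double blame1[2][4][4] = {
    			{{0, 0, 0, 0},
    			 {0, 0.17884, 0.07658, 0},
    			 {0, -0.03488, -0.12774, 0},
    			 {0, 0, 0, 0}},
    			{{0, 0, 0, 0},
    			 {0, -0.58016, -0.68922, 0},
    			 {0, -0.82468, -0.91874, 0},
    			 {0, 0, 0, 0}}};
    		double f[2][3][3] = {
    			{{0.11, 0.12, 0.13}, {0.14, 0.15, 0.16}, {0.17, 0.18, 0.19}},
    			{{0.21, 0.22, 0.23}, {0.24, 0.25, 0.26}, {0.27, 0.28, 0.29}}};
    		for(int y = 0; y < 4; y++)
    		{
    			for(int x = 0; x < 4; x++)
    			{
    				// Input element (y, x) fed every output element within one step of it,
    				// so gather the blame from those positions, each weighted by the filter
    				// weight that connected the two elements.
    				double sum = 0.0;
    				for(int k = 0; k < 2; k++)
    					for(int dy = -1; dy <= 1; dy++)
    						for(int dx = -1; dx <= 1; dx++)
    						{
    							int yy = y + dy, xx = x + dx;
    							if(yy >= 0 && yy < 4 && xx >= 0 && xx < 4)
    								sum += blame1[k][yy][xx] * f[k][1 - dy][1 - dx];
    						}
    				printf("%.8g,", sum);
    			}
    			printf("\n");
    		}
    		return 0;
    	}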