Neural network examples

This document gives examples for writing code to use the neural network class in Waffles. Let's begin with an overview of the most important classes used in these examples:

- GNeuralNet: the network itself, a sequence of blocks (layers).
- GBlockLinear, GBlockLogistic, GBlockTanh, GBlockLeakyRectifier, GBlockConv, GBlockMaxPooling2D: the blocks that a network is composed of.
- GNeuralNetOptimizer and its subclasses, such as GSGDOptimizer: objects that train a network.
- GNeuralNetLearner and GAutoFilter: wrappers that let a network be used as a general supervised learner (for example, as a classifier).
Logistic regression

Logistic regression fits your data with a single layer of logistic units. Here are the #includes that we are going to need for this example:

#include <GClasses/GActivation.h>
#include <GClasses/GMatrix.h>
#include <GClasses/GNeuralNet.h>
#include <GClasses/GRand.h>
#include <GClasses/GTransform.h>

We are going to need some data, so let's load a dataset from an ARFF file. Let's use Iris, a well-known dataset for machine learning examples:

GMatrix data;
data.loadArff("iris.arff");

"data" is a 150x5 matrix. Next, we need to divide this data into a feature matrix (the inputs) and a label matrix (the outputs):

GDataColSplitter cs(data, 1); // the "iris" dataset has only 1 column of "labels"
const GMatrix& inputs = cs.features();
const GMatrix& outputs = cs.labels();

"inputs" is a 150x4 matrix of real values, and "outputs" is a 150x1 matrix of categorical values. Neural networks typically only support continuous values, but the labels in the iris dataset are categorical, so we will convert them to a real-valued representation (also known as a categorical distribution, a one-hot representation, or a binarized representation):

GNominalToCat nc;
nc.train(outputs);
GMatrix* pRealOutputs = nc.transformBatch(outputs);

pRealOutputs points to a 150x3 matrix of real values. Now, let's further divide our data into a training portion and a testing portion:

GRand r(0);
GDataRowSplitter rs(inputs, *pRealOutputs, r, 75);
const GMatrix& trainingFeatures = rs.features1();
const GMatrix& trainingLabels = rs.labels1();
const GMatrix& testingFeatures = rs.features2();
const GMatrix& testingLabels = rs.labels2();

Now we are ready to train a layer of logistic units that takes 4 inputs and gives 3 outputs. The activation function is specified as a separate layer:

GNeuralNet nn;
nn.add(new GBlockLinear(4, 3));
nn.add(new GBlockLogistic(3));

To train our model, we will create an optimizer. We will use stochastic gradient descent (SGD). We also set the learning rate here:

GRand rand(0);
GSGDOptimizer optimizer(nn, rand);
optimizer.setLearningRate(0.05);
optimizer.optimizeWithValidation(trainingFeatures, trainingLabels);

Let's test our model to see how well it performs:

double sse = nn.measureLoss(optimizer.weights(), testingFeatures, testingLabels);
double mse = sse / testingLabels.rows();
double rmse = sqrt(mse);
std::cout << "The root-mean-squared test error is " << to_str(rmse) << "\n";

Finally, don't forget to delete pRealOutputs:

delete(pRealOutputs);

Or, preferably:

std::unique_ptr<GMatrix> hRealOutputs(pRealOutputs);
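For convenience, here is one way the snippets above might be assembled into a single small program. This is only a sketch: it assumes iris.arff is in the working directory, adds GError.h (as the MNIST example below does) so that to_str is available, and hands pRealOutputs to a std::unique_ptr rather than deleting it manually.

#include <GClasses/GActivation.h>
#include <GClasses/GError.h>
#include <GClasses/GMatrix.h>
#include <GClasses/GNeuralNet.h>
#include <GClasses/GRand.h>
#include <GClasses/GTransform.h>
#include <cmath>
#include <iostream>
#include <memory>

using namespace GClasses;

int main()
{
	// Load the iris dataset and separate the features from the labels
	GMatrix data;
	data.loadArff("iris.arff");
	GDataColSplitter cs(data, 1);
	const GMatrix& inputs = cs.features();
	const GMatrix& outputs = cs.labels();

	// Convert the categorical labels to a one-hot representation
	GNominalToCat nc;
	nc.train(outputs);
	GMatrix* pRealOutputs = nc.transformBatch(outputs);
	std::unique_ptr<GMatrix> hRealOutputs(pRealOutputs);

	// Divide the data into a training portion and a testing portion
	GRand rand(0);
	GDataRowSplitter rs(inputs, *pRealOutputs, rand, 75);

	// A single layer of logistic units: 4 inputs, 3 outputs
	GNeuralNet nn;
	nn.add(new GBlockLinear(4, 3));
	nn.add(new GBlockLogistic(3));

	// Train with stochastic gradient descent
	GSGDOptimizer optimizer(nn, rand);
	optimizer.setLearningRate(0.05);
	optimizer.optimizeWithValidation(rs.features1(), rs.labels1());

	// Report the root-mean-squared error on the testing portion
	double sse = nn.measureLoss(optimizer.weights(), rs.features2(), rs.labels2());
	double rmse = std::sqrt(sse / rs.labels2().rows());
	std::cout << "The root-mean-squared test error is " << to_str(rmse) << "\n";
	return 0;
}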
Classification

The previous example was not actually very useful, because the root-mean-squared error only tells us how poorly the neural network fit our continuous representation of the data. It does not really tell us how accurate the neural network is at classifying this data. So, instead of transforming the data to meet the model, we can transform the model to meet the data. Specifically, we can use the GAutoFilter class to turn the neural network into a classifier:

GNeuralNetLearner learner;
learner.nn().add(new GBlockLinear(4, 3));
learner.nn().add(new GBlockLogistic(3));
GAutoFilter af(&learner, false); // false means the auto-filter will not delete the learner when it is destroyed

Now, we can train the auto-filter using the original data (with nominal labels). We no longer need to explicitly train the neural network, because when we train the auto-filter, it trains the inner model.

af.train(inputs, outputs);

The auto-filter automatically filters the data as needed for its inner model, but ultimately behaves as a model that can handle whatever types of data you have available. In this case, it turns a neural network into a classifier, since "outputs" contains 1 column of categorical values. So, now we can obtain the misclassification rate as follows:

double mis = af.sumSquaredError(inputs, outputs);
std::cout << "The model misclassified " << to_str(mis) << " out of " << to_str(outputs.rows()) << " instances.\n";

Why does a method named "sumSquaredError" return the total number of misclassifications? Because it uses Hamming distance for categorical labels, which reports a squared error of 1 for each misclassification. (Note that if you are training a big network with big data, then efficiency may be critical. In such cases, it is generally better to use the approach of transforming the data to meet the model, so that it does not waste a lot of time transforming data during training.)

Stopping criteria

The GNeuralNetOptimizer::optimizeWithValidation method divides the training data into a training portion and a validation portion. The default uses 65% for training and 35% for validation. Suppose you wanted to change this ratio to 60/40. This would be done as follows:

optimizer.optimizeWithValidation(features, labels, 0.4);

By default, training continues until validation accuracy does not improve by 0.2% over a window of 100 epochs. If you wanted to change this to 0.1% over a window of 10 epochs, then you could do this prior to calling optimizeWithValidation:

optimizer.setImprovementThresh(0.001);
optimizer.setWindowSize(10);

You can also train for a set number of epochs instead of using validation. For example, to optimize for 1000 epochs:

optimizer.setEpochs(1000);
optimizer.optimize(features, labels);

By default, optimize will run stochastically through the entire set of training samples each epoch. To use a mini-batch instead, set the batch size and (optionally) the number of batches per epoch before calling optimize:

optimizer.setBatchSize(25);
optimizer.setBatchesPerEpoch(4);

Serialization

You can write your neural network to a file:

GDom doc;
doc.setRoot(nn.serialize(&doc));
doc.saveJson("my_neural_net.json");

Then, you can load it from the file again:

GDom doc;
doc.loadJson("my_neural_net.json");
GNeuralNet* pNN = new GNeuralNet(doc.root());
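Note that the save snippet and the load snippet each declare a GDom named "doc", so if you do both in the same scope you will need distinct names. Here is a sketch of a full round trip; handing the loaded network to a std::unique_ptr is just one way to manage its lifetime:

// Save the trained network to a JSON file
GDom docOut;
docOut.setRoot(nn.serialize(&docOut));
docOut.saveJson("my_neural_net.json");

// ...later, perhaps in a different program, load it back
GDom docIn;
docIn.loadJson("my_neural_net.json");
std::unique_ptr<GNeuralNet> pLoaded(new GNeuralNet(docIn.root()));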
MNIST

A popular test for a neural network is the MNIST dataset. (Click here to download the data.) And, here is some code that trains a neural network with this data:

#include <iostream>
#include <cmath>
#include <GClasses/GApp.h>
#include <GClasses/GError.h>
#include <GClasses/GNeuralNet.h>
#include <GClasses/GActivation.h>
#include <GClasses/GTransform.h>
#include <GClasses/GTime.h>
#include <GClasses/GVec.h>

using namespace GClasses;
using std::cerr;
using std::cout;

int main(int argc, char *argv[])
{
	// Load the data
	GMatrix train;
	train.loadArff("/somepath/data/mnist/train.arff");
	GMatrix test;
	test.loadArff("/somepath/data/mnist/test.arff");
	GMatrix rawTestLabels(test, 0, test.cols() - 1, test.rows(), 1);

	// Preprocess the data
	GDataPreprocessor dpFeat(train,
		0, 0,                           // rowStart, colStart
		train.rows(), train.cols() - 1, // rowCount, colCount
		false, false, true,             // allowMissing, allowNominal, allowContinuous
		-1.0, 1.0);                     // minVal, maxVal
	dpFeat.add(test, 0, 0, test.rows(), test.cols() - 1);
	GDataPreprocessor dpLab(train,
		0, train.cols() - 1,            // rowStart, colStart
		train.rows(), 1,                // rowCount, colCount
		false, false, true,             // allowMissing, allowNominal, allowContinuous
		-1.0, 1.0);                     // minVal, maxVal
	dpLab.add(test, 0, test.cols() - 1, test.rows(), 1);
	GMatrix& trainFeatures = dpFeat.get(0);
	GMatrix& trainLabels = dpLab.get(0);
	GMatrix& testFeatures = dpFeat.get(1);

	// Make a neural network
	GNeuralNet nn;
	nn.add(new GBlockLinear(28 * 28, 80), new GBlockTanh(80),
		new GBlockLinear(80, 30), new GBlockTanh(30),
		new GBlockLinear(30, 10), new GBlockTanh(10));

	// Print some info
	cout << "% Training patterns: " << to_str(trainFeatures.rows()) << "\n";
	cout << "% Testing patterns: " << to_str(testFeatures.rows()) << "\n";
	cout << "% Topology:\n";
	cout << nn.to_str("% ") << "\n";
	cout << "@RELATION neural_net_training\n";
	cout << "@ATTRIBUTE misclassification_rate real\n";
	cout << "@ATTRIBUTE elapsed_time real\n";
	cout << "@DATA\n";

	// Train
	GRand rand(0);
	GSGDOptimizer optimizer(nn, rand, &trainFeatures, &trainLabels);
	optimizer.setLearningRate(0.01);
	double starttime = GTime::seconds();
	for(size_t epoch = 0; epoch < 10; epoch++)
	{
		double mis = nn.measureLoss(optimizer.weights(), testFeatures, rawTestLabels);
		cout << to_str((double)mis / testFeatures.rows()) << "," << to_str(GTime::seconds() - starttime) << "\n";
		cout.flush();
		optimizer.optimizeEpoch();
	}
	return 0;
}

Here are the results that I get:

% Training patterns: 60000
% Testing patterns: 10000
% Topology:
% [GNeuralNet: 784->10, Weights=65540
%   0) [GBlockLinear: 784->80, Weights=62800]
%   1) [GBlockTanh: 80->80, Weights=0]
%   2) [GBlockLinear: 80->30, Weights=2430]
%   3) [GBlockTanh: 30->30, Weights=0]
%   4) [GBlockLinear: 30->10, Weights=310]
%   5) [GBlockTanh: 10->10, Weights=0]
% ]
@RELATION neural_net_training
@ATTRIBUTE misclassification_rate real
@ATTRIBUTE elapsed_time real
@DATA
0.9243,0.4613778591156
0.0622,10.968685865402
0.0509,21.50560092926
0.0422,32.133005857468
0.0406,42.690910816193
0.043,53.197825908661
0.0337,63.745694875717
0.037,74.263633966446
0.0367,85.197949886322
0.0338,95.829666852951

The left column is the misclassification rate and the right column is the elapsed time in seconds, so the final row shows that we get down to 338/10000 misclassifications in just 96 seconds of training on a modest computer. You can get much better accuracy using bigger layers, but then training will take longer too.
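If you do not need the per-epoch progress report, the manual epoch loop above could be replaced with the fixed-epoch interface described in the "Stopping criteria" section. The following is only a sketch of how those calls might be combined for this example:

// Train for 10 epochs without per-epoch reporting (sketch)
GRand rand(0);
GSGDOptimizer optimizer(nn, rand);
optimizer.setLearningRate(0.01);
optimizer.setEpochs(10);
optimizer.optimize(trainFeatures, trainLabels);

// Measure the final misclassification rate on the test set
double mis = nn.measureLoss(optimizer.weights(), testFeatures, rawTestLabels);
cout << "Misclassification rate: " << to_str(mis / testFeatures.rows()) << "\n";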
If you want better results, and you are willing to wait for days to get them, you can use a neural network with a bigger topology:

// Make a neural network
GNeuralNet nn;
nn.add(new GBlockConv({28, 28}, {5, 5, 32}, {28, 28, 32}));
nn.add(new GBlockLeakyRectifier(28 * 28 * 32));
nn.add(new GBlockMaxPooling2D(28, 28, 32));
nn.add(new GBlockConv({14, 14, 32}, {5, 5, 32, 64}, {14, 14, 1, 64}));
nn.add(new GBlockLeakyRectifier(14 * 14 * 64));
nn.add(new GBlockMaxPooling2D(14, 14, 64));
nn.add(new GBlockLinear(7 * 7 * 64, 1000));
nn.add(new GBlockLeakyRectifier(1000));
nn.add(new GBlockLinear(1000, 10));
nn.add(new GBlockLeakyRectifier(10));

Results:

% Training patterns: 60000
% Testing patterns: 10000
% Topology:
% [GNeuralNet: 784->10, Weights=3199106
%   0) [GBlockConv: 784->25088, Weights=832]
%   1) [GBlockLeakyRectifier: 25088->25088, Weights=0]
%   2) [GBlockMaxPooling2D: 25088->6272, Weights=0]
%   3) [GBlockConv: 6272->12544, Weights=51264]
%   4) [GBlockLeakyRectifier: 12544->12544, Weights=0]
%   5) [GBlockMaxPooling2D: 12544->3136, Weights=0]
%   6) [GBlockLinear: 3136->1000, Weights=3137000]
%   7) [GBlockLeakyRectifier: 1000->1000, Weights=0]
%   8) [GBlockLinear: 1000->10, Weights=10010]
%   9) [GBlockLeakyRectifier: 10->10, Weights=0]
% ]
@RELATION neural_net_training
@ATTRIBUTE misclassification_rate real
@ATTRIBUTE elapsed_time real
@DATA
0.902,406.85959005356
0.0083,8818.7493638992
0.0069,17199.837296009
0.0074,25583.630297899
0.0052,33968.527590036
0.0052,42350.959198952
0.004,50734.125929832
0.0045,59116.222265959
0.0046,67543.822458982
0.0047,75979.265183926
0.0041,84432.53459096
0.0035,92888.616713047

Training more directly

If you want more fine-grained control, you can train your neural network manually instead of using one of the pre-built optimizers. Here are some changes to the training section of the previous example that train the neural network with more direct calls:

// Train
GRand rand(0);
GVec weights(nn.weightCount());
GVec grad(nn.gradCount());
grad.fill(0.0);
nn.initWeights(rand, weights);
double learningRate = 0.01;
double momentum = 0.0;
GRandomIndexIterator ii(trainFeatures.rows(), rand);
double starttime = GTime::seconds();
for(size_t epoch = 0; epoch < 10; epoch++)
{
	// Measure progress on the test set at the start of each epoch
	double mis = nn.measureLoss(weights, testFeatures, rawTestLabels);
	cout << to_str((double)mis / testFeatures.rows()) << "," << to_str(GTime::seconds() - starttime) << "\n";
	cout.flush();

	// Visit the training rows in random order
	ii.reset();
	size_t index;
	while(ii.next(index))
	{
		// Forward-propagate the pattern, compute the blame at the outputs, and backpropagate it
		nn.forwardProp(weights, trainFeatures[index]);
		nn.computeBlame(trainLabels[index]);
		nn.backpropagate(weights);

		// Apply momentum, accumulate the gradient, and take a descent step
		grad *= momentum;
		nn.updateGradient(weights, grad);
		nn.step(grad, weights, learningRate);
	}
}
return 0;

If you want to get more direct than that, you will probably have to start digging into the code itself. I have worked hard to keep the code clean and easy to read, but I would welcome suggestions for improving it.