Deciding When to Stop
=====================

In the previous tutorial, we set up a general optimization task which was
trained iteratively. This approach has two notable downsides compared
to the convenience of the one-step LDA or SVM trainer:

#. We have to decide when the accuracy achieved by the optimization steps
   is high enough.

#. We need to write more code to set up all parts.

While the second point is just a nuisance, the first point is a "real"
structural problem. In the LDA example, we did not have to worry about
whether the solution was sufficiently exact, as the LDA problem can be
solved analytically.

Motivation
++++++++++

In general, choosing a good number of iterations
for an iterative optimizer touches on two issues:

* First, there is a purely computational concern: we do not want to perform
  more iterations than necessary to reach a "good" solution, but neither do
  we want to stop before we have found one.

* Second, stopping early also constitutes a way of regularizing the
  adaptation of a model to a training set. Hence, in machine learning
  practice, stopping even earlier than the training dataset alone would
  indicate may be desirable anyway.

One means of early stopping that goes beyond picking an arbitrary
number of iterations is monitoring the performance on a validation
split, which needs to be created from the dataset in addition to the
training and test splits.
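
The general recipe can be sketched in a few lines of plain C++ (a toy
illustration with a made-up validation error curve and hypothetical function
names, not Shark code)::

	#include <iostream>
	#include <limits>

	//toy validation error curve: it decreases at first, reaches its
	//minimum at step 20, and then rises again as the model overfits
	double validationError(int t) { return 0.5 + 0.001 * (t - 20) * (t - 20); }

	//stop once the validation error has not improved for `patience`
	//consecutive steps; return the step with the best validation error
	int earlyStopping(int maxSteps, int patience) {
		double best = std::numeric_limits<double>::max();
		int bestStep = 0;
		for (int t = 0; t < maxSteps; ++t) {
			//(a real optimizer would update the model parameters here)
			double v = validationError(t);
			if (v < best) { best = v; bestStep = t; }
			else if (t - bestStep >= patience) break; //no progress: stop
		}
		return bestStep;
	}

	int main() {
		//prints "best validation error at step 20"
		std::cout << "best validation error at step " << earlyStopping(1000, 5) << "\n";
	}

Shark packages this kind of decision behind a common interface, so that the
stopping rule can be exchanged without touching the training loop.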

Neural-network training example
+++++++++++++++++++++++++++++++

Overview
&&&&&&&&


This tutorial will introduce different stopping criteria. As an example,
we consider a slightly more complex learning task than in the previous
tutorials, namely classification with a simple feed-forward neural
network. You can learn more about neural networks in Shark in
the :doc:`../algorithms/ffnet` tutorial. The code for this example can be found in
:doxy:`StoppingCriteria.cpp <StoppingCriteria.cpp>`.


We show how to create a trainer for this task which generalizes
important concepts and saves us manual work. Then, we construct and compare
three different stopping criteria for that trainer. To this end, we introduce
the ``AbstractStoppingCriterion``, another interface of Shark. In addition to
this tutorial, there is a concept tutorial on :doc:`../concepts/library_design/stopping_criteria`,
which gives a more detailed explanation of how stopping criteria are implemented in Shark.

Building blocks & includes
&&&&&&&&&&&&&&&&&&&&&&&&&&

We first list all includes needed for this tutorial and then motivate
the usage of each::


	#include <shark/Data/Csv.h>
	#include <shark/Models/FFNet.h> //Feed forward neural network class
	#include <shark/Algorithms/GradientDescent/Rprop.h> //Optimization algorithm
	#include <shark/ObjectiveFunctions/Loss/CrossEntropy.h> //Loss used for training
	#include <shark/ObjectiveFunctions/Loss/ZeroOneLoss.h> //The real loss for testing.
	#include <shark/Algorithms/Trainers/OptimizationTrainer.h> // Trainer wrapping iterative optimization
	#include <shark/Algorithms/StoppingCriteria/MaxIterations.h> //A simple stopping criterion that stops after a fixed number of iterations
	#include <shark/Algorithms/StoppingCriteria/TrainingError.h> //Stops when the algorithm seems to converge
	#include <shark/Algorithms/StoppingCriteria/GeneralizationQuotient.h> //Uses the validation error to track the progress
	#include <shark/Algorithms/StoppingCriteria/ValidatedStoppingCriterion.h> //Adds the validation error to the value of the point
	

As before, ``Csv.h`` is included for reading in data. The header ``FFNet.h`` is needed
because we want to train a neural network to distinguish between two classes.
``Rprop`` is a fast and stable algorithm for gradient-based optimization of
a differentiable objective function. Since the 0-1 loss is not differentiable,
and would thus not be compatible with any gradient descent method, including
Rprop, we instead use the ``CrossEntropy`` as a surrogate loss. For testing,
however, we still want to use the ``ZeroOneLoss`` and hence include it as well.
As in the last tutorial, the ``ErrorFunction`` binds together the model, the
dataset, and the loss function: for a given set of parameters, it returns the
error of the model with these parameters, as measured by the loss function on
the dataset. The remaining includes are needed for the different stopping
criteria we will examine.
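
To make the role of such an error function concrete, here is a minimal
stand-alone sketch of the idea (toy types and names, not Shark's actual
classes): it binds a model, a loss and a dataset, and maps a parameter
vector to the mean loss over the dataset::

	#include <cassert>
	#include <cmath>
	#include <vector>

	struct Sample { double input; double label; };

	//a one-parameter toy model: prediction = w * input
	double model(double w, double x) { return w * x; }

	//squared-error loss between prediction and label
	double loss(double prediction, double label) {
		double d = prediction - label;
		return d * d;
	}

	//the "error function": mean loss of the model with parameter w on the data
	double errorFunction(double w, std::vector<Sample> const& data) {
		double sum = 0.0;
		for (Sample const& s : data) sum += loss(model(w, s.input), s.label);
		return sum / data.size();
	}

	int main() {
		std::vector<Sample> data = {{1.0, 2.0}, {2.0, 4.0}};
		assert(errorFunction(2.0, data) == 0.0);                  //w = 2 fits exactly
		assert(std::abs(errorFunction(1.0, data) - 2.5) < 1e-12); //(1 + 4) / 2 = 2.5
	}

An optimizer can then minimize this function over ``w``; Shark's
``ErrorFunction`` plays the same role for arbitrary models, losses and
datasets.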

Using an AbstractStoppingCriterion
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

We want to use a feed-forward neural network with one hidden layer and two output
neurons for classification, and train it under three different stopping criteria:
a fixed number of iterations, progress on the training error, and progress on a
validation set. To facilitate our experiments, we write a single local auxiliary
function that takes an ``AbstractStoppingCriterion`` -- the base class of all
stopping criteria -- as an argument. This auxiliary function creates and
trains a neural network using the given stopping criterion. In
addition, instead of manually and explicitly coding an optimization loop as in
the previous examples, we use a so-called ``OptimizationTrainer``, which encapsulates
the entire training process given an objective function, an optimizer, and a stopping criterion.
Overall, we use the following function to create, train and evaluate our neural
network under a given stopping criterion::


	template<class T>
	double experiment(AbstractStoppingCriterion<T> & stoppingCriterion, ClassificationDataset const& trainingset, ClassificationDataset const& testset){
		//create a feed forward neural network with one layer of 10 hidden neurons and one output for every class
		FFNet<LogisticNeuron,LinearNeuron> network;
		network.setStructure(inputDimension(trainingset),10,numberOfClasses(trainingset));
		initRandomUniform(network,-0.1,0.1);
	
		//The Cross Entropy maximises the activation of the cth output neuron 
		// compared to all other outputs for a sample with class c.
		CrossEntropy loss;
	
		//we use IRpropPlus for network optimization
		IRpropPlus optimizer;
		
		//create an optimization trainer and train the model
		OptimizationTrainer<FFNet<LogisticNeuron,LinearNeuron>,unsigned int > trainer(&loss, &optimizer, &stoppingCriterion);
		trainer.train(network, trainingset);
		
		//evaluate the performance on the test set using the classification (0-1) loss;
		//we choose 0.5 as threshold since logistic neurons take values between 0 and 1
		ZeroOneLoss<unsigned int, RealVector> loss01(0.5);
		Data<RealVector> predictions = network(testset.inputs()); 
		return loss01(testset.labels(),predictions);
	}
	

To run the experiment, we need to load a dataset and split it into training, validation and test sets::


		ClassificationDataset data;
		importCSV(data, "data/diabetes.csv",LAST_COLUMN, ',');
		data.shuffle();
		//split off the last 25% of the shuffled data as test set; data keeps the first 75%
		ClassificationDataset test = splitAtElement(data,static_cast<std::size_t>(0.75*data.numberOfElements()));
		//data now holds only the remaining 75%, so this splits off its last third as validation set
		ClassificationDataset validation = splitAtElement(data,static_cast<std::size_t>(0.66*data.numberOfElements()));
		

Evaluation
++++++++++

Now it is time to try out the different stopping criteria.



Fixed number of iterations
&&&&&&&&&&&&&&&&&&&&&&&&&&


The simplest stopping heuristic is to halt after a fixed number of iterations.
``MaxIterations`` is the subclass of choice here: it simply provides this
trivial functionality within the framework of an ``AbstractStoppingCriterion``.
We try out several different numbers of iterations::


		MaxIterations<> maxIterations(10);
		double resultMaxIterations1 = experiment(maxIterations,data,test);
		maxIterations.setMaxIterations(100);
		double resultMaxIterations2 = experiment(maxIterations,data,test);
		maxIterations.setMaxIterations(500);
		double resultMaxIterations3 = experiment(maxIterations,data,test);
		

Progress on training error
&&&&&&&&&&&&&&&&&&&&&&&&&&

Next we employ a stopping criterion that monitors progress on the
training error :math:`E`. The stopping criterion ``TrainingError``
takes in its constructor a window size (or number of time steps)
:math:`T` together with a threshold value :math:`\epsilon`. If the
improvement over the last :math:`T` timesteps does not exceed
:math:`\epsilon`, that is, :math:`E(t-T)-E(t) < \epsilon`, the
stopping criterion becomes active and tells the optimizer to stop
(because it assumes that progress over subsequent optimization steps
will be negligible as well). Note that a danger of this
stopping criterion is that it may stop the optimization even when the
algorithm only traverses a plateau or a saddle
point. However, the optimizer used here, ``IRpropPlus``, dynamically
adapts its step size and is hence somewhat less vulnerable to these
problems. After all the groundwork has been done, we can test this
stopping criterion with only two lines of code::


		TrainingError<> trainingError(10,1.e-5);
		double resultTrainingError = experiment(trainingError,data,test);
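
The window logic behind such a criterion can be illustrated with a short
stand-alone sketch (illustrative names, not Shark's implementation)::

	#include <cstddef>
	#include <deque>
	#include <iostream>

	//stop when the improvement over the last T steps, E(t-T) - E(t),
	//falls below epsilon
	class TrainingErrorWindow {
	public:
		TrainingErrorWindow(std::size_t T, double epsilon)
		: m_T(T), m_epsilon(epsilon) {}

		//feed the newest training error; returns true once we should stop
		bool shouldStop(double error) {
			m_history.push_back(error);
			if (m_history.size() <= m_T) return false; //window not filled yet
			double improvement = m_history.front() - m_history.back();
			m_history.pop_front();
			return improvement < m_epsilon;
		}
	private:
		std::size_t m_T;
		double m_epsilon;
		std::deque<double> m_history;
	};

	int main() {
		TrainingErrorWindow criterion(3, 1e-3);
		//errors improve quickly at first, then stall at 0.2
		double errors[] = {1.0, 0.5, 0.25, 0.2, 0.2, 0.2, 0.2};
		for (int t = 0; t < 7; ++t) {
			if (criterion.shouldStop(errors[t])) {
				std::cout << "stopped at step " << t << "\n"; //prints "stopped at step 6"
				break;
			}
		}
	}
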
		


Progress on a validation set
&&&&&&&&&&&&&&&&&&&&&&&&&&&&


To use validation error information, we need to define an additional validation error
function. In the simplest case, this is just an error function using the same objects
as the one on the training set, but a different dataset. To keep the tutorial simple,
we will instead just create it from scratch. The class that takes the current point
of the search space from the optimizer and passes it on to the validation error function
is the so-called ``ValidatedStoppingCriterion``. Its constructor takes as arguments not
only the validation error function, but also another stopping criterion, to which the
result of the validation run is passed and which is prepared to make its decision based
on both training and validation information. In this example, we will use the
``GeneralizationQuotient`` as such a stopping criterion. It computes the ratio of
two other quantities to reach its decision; for an exact description we refer to
the class documentation and the scientific publication cited therein.
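
To get an intuition for such a quotient criterion, here is one common
formulation along the lines of Prechelt's early-stopping study (a sketch for
orientation only -- whether ``GeneralizationQuotient`` uses exactly these
formulas is an assumption on our part; the class documentation and the
publication cited therein are authoritative). The generalization loss on the
validation set,

.. math::

   \mathrm{GL}(t) = \frac{E_{va}(t)}{\min_{t' \leq t} E_{va}(t')} - 1,

is related to the training progress over a window of the last :math:`T` steps,

.. math::

   P(t) = \frac{\sum_{t'=t-T+1}^{t} E_{tr}(t')}{T \cdot \min_{t-T < t' \leq t} E_{tr}(t')} - 1,

and the criterion stops once the quotient :math:`\mathrm{GL}(t)/P(t)` exceeds a
threshold. The intuition is that a rising validation error only indicates
overfitting if training itself is no longer making fast progress.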

In summary, this code uses the progress on a validation set to decide when to stop::


		FFNet<LogisticNeuron,LogisticNeuron> network;
		network.setStructure(inputDimension(data),10,numberOfClasses(data));
		CrossEntropy loss;
		ErrorFunction validationFunction(validation,&network,&loss);
		
		GeneralizationQuotient<> generalizationQuotient(10,0.1);
		ValidatedStoppingCriterion validatedLoss(&validationFunction,&generalizationQuotient);
		double resultGeneralizationQuotient = experiment(validatedLoss,data,test);
		



Printing the results
++++++++++++++++++++

Printing all variables of type ``double`` defined in the snippets above, we get::


		cout << "RESULTS: " << endl;
		cout << "======== \n" << endl;
		cout << "10 iterations  : " << resultMaxIterations1 << endl;
		cout << "100 iterations : " << resultMaxIterations2 << endl;
		cout << "500 iterations : " << resultMaxIterations3 << endl;
		cout << "training error : " << resultTrainingError << endl;
		cout << "generalization quotient : " << resultGeneralizationQuotient << endl;
		



So stopping after around 100 iterations yielded the lowest error on the test
set. The ``TrainingError`` criterion, as predicted, waits a lot longer. The
``GeneralizationQuotient`` does in fact stop too early in this case, which is very
likely due to the small size of the dataset used in the example code.



What you learned
++++++++++++++++


You should have learned the following aspects in this tutorial:

* How to train a feed-forward neural network
* How to create a trainer from a general optimization task
* That the choice of stopping criterion matters



What next?
++++++++++


Now you should be ready to leave the "first steps" section of the tutorials
and read through its other sections, which will tell you about various
aspects of the library in more detail.
