Sparse Autoencoder
============================

This tutorial builds on the previous :doc:`./autoencoders` tutorial.
In the previous tutorials in the series on autoencoders, we have discussed regularizing autoencoders
by limiting the number of hidden units, tying their weights, adding noise to the inputs, or dropping
hidden units by randomly setting them to 0.
There are also more direct regularisation schemes which regularize the representation of the hidden units themselves.
This can be done by using a regularization term summing the KL-divergence

.. math ::
   KL(\rho \| \rho_j) = \rho \log(\frac{\rho}{\rho_j}) +
   (1 - \rho) \log(\frac{1-\rho}{1-\rho_j})

over all hidden neurons indexed by :math:`j`.  The regularization
parameter :math:`\rho` defines a target mean activation and the mean
activation :math:`\rho_j` of hidden neuron :math:`j` over the whole
training dataset is interpreted as an activation probability. With this
constraint, the optimisation problem can be stated as:

.. math ::
	\min_{\theta} \frac 1 N \sum_{i=1}^N \|\vec x_i - f_{\theta}(\vec x_i)\|^2 + \sum_{j=1}^m KL(\rho \| \rho_j)

The idea of the constraint is to make the hidden representation sparser by giving the units a very small
expected activation, which ideally is close to 0 most of the time. These autoencoders are therefore called
sparse autoencoders.
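To get a feeling for the size of this penalty: with the target activation
:math:`\rho = 0.01` used later in this tutorial, a hidden neuron with mean
activation :math:`\rho_j = 0.2` already contributes

.. math ::
   KL(0.01 \| 0.2) = 0.01 \log\frac{0.01}{0.2} + 0.99 \log\frac{0.99}{0.8} \approx 0.18

to the objective, while the penalty is exactly zero for :math:`\rho_j = \rho`.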

For a detailed introduction to sparse autoencoders, we refer to the `Stanford UFLDL Tutorial
<http://ufldl.stanford.edu/wiki/index.php/Exercise:Sparse_Autoencoder>`_. 
In the following, we show how to solve the autoencoder exercise from
this recommended tutorial using Shark.


Sparse Autoencoder in Shark
---------------------------

For this tutorial, the following (lengthy) list of includes is needed::


	#include <shark/Data/Pgm.h> // for exporting the learned filters
	#include <shark/Data/Csv.h> // for reading in the images as csv
	#include <shark/Data/Statistics.h> // for normalization
	#include <shark/ObjectiveFunctions/SparseAutoencoderError.h> // the error function performing the regularisation of the hidden neurons
	#include <shark/Algorithms/GradientDescent/LBFGS.h> // the L-BFGS optimization algorithm
	#include <shark/ObjectiveFunctions/Loss/SquaredLoss.h> // squared loss used for regression
	#include <shark/ObjectiveFunctions/Regularizer.h> // L2 regularization
	

Edges in natural images
^^^^^^^^^^^^^^^^^^^^^^^

As an example of finding features with a sparse autoencoder, we will
use natural images as input. We will then see how the autoencoder
discovers edges as a good representation of natural images.

The images we will use are the same as in the `Stanford Tutorial
<http://ufldl.stanford.edu/wiki/index.php/Exercise:Sparse_Autoencoder>`_.


Generating a training set
^^^^^^^^^^^^^^^^^^^^^^^^^

In order to generate a training set, we will randomly select image patches
of equal size (we use *8x8*) and use each patch as a data point::


	const unsigned int numsamples = 10000; //number of generated patches
	const unsigned int w = 512;//width of loaded image
	const unsigned int h = 512;//height of loaded image
	const unsigned int psize = 8;//size of a patch
	
		// Read images
		UnlabeledData<RealVector> images;
		importCSV(images, "data/images.csv");
		unsigned int n = images.numberOfElements(); // number of images
		
		// Sample equal amount of patches per image
		size_t patchesPerImg = numsamples / n;
		typedef UnlabeledData<RealVector>::element_range::iterator ElRef;
		
		// Create patches
		vector<RealVector> patches;
		for (ElRef it = images.elements().begin(); it != images.elements().end(); ++it) {
			for (size_t i = 0; i < patchesPerImg; ++i) {
				// Upper left corner of the patch within the image
				unsigned int ulx = Rng::discrete(0,w-psize-1);
				unsigned int uly = Rng::discrete(0,w-psize-1);
				// Transform 2d coordinate into 1d coordinate and get the sample
				unsigned int ul = ulx * h + uly;
				RealVector sample(psize * psize);
				const RealVector& img = *it;
				for (size_t j = 0; j < psize; ++j)
					for (size_t k = 0; k < psize; ++k)
						sample(j * psize + k) = img(ul + k + j * h);
				patches.push_back(sample);
			}
		}
		
		UnlabeledData<RealVector> samples = createDataFromRange(patches);
		

This creates a data set containing *10000* patches of size *8x8* transformed
into a vector with *64* elements. Before we can use this data set we need
to normalize the data points. We first truncate outliers to *+/- 3* standard
deviations, and then normalize to the range :math:`[0.1, 0.9]`. Later on, we
will also turn it into a regression data set with the same input and target
values::


		// zero mean
		RealVector meanvec = mean(samples);
		samples = transform(samples, Shift(-meanvec));
	
		// Remove outliers outside of +/- 3 standard deviations
		// and normalize to [0.1, 0.9]
		RealVector pstd = 3 * sqrt(variance(samples));
		samples = transform(samples, TruncateAndRescale(-pstd, pstd, 0.1, 0.9));
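
The ``Shift`` and ``TruncateAndRescale`` helpers used above are functors defined
in the full example program. The following is only a sketch of how such functors
could look, assuming ``transform()`` expects a ``result_type`` typedef and a call
operator taking a single sample; the actual definitions in the example file may
differ in detail::

	// shifts every component of a sample by a fixed offset; used above with
	// the negative mean to center the data
	class Shift{
	public:
		Shift(RealVector const& offset): m_offset(offset){}
		typedef RealVector result_type; // assumed requirement of transform()
		RealVector operator()(RealVector input) const{
			return input + m_offset;
		}
	private:
		RealVector m_offset;
	};

	// clamps every component to [lower(i), upper(i)] and rescales the result
	// linearly to [newMin, newMax]; needs <algorithm> for std::min/std::max
	class TruncateAndRescale{
	public:
		TruncateAndRescale(RealVector const& lower, RealVector const& upper, double newMin, double newMax)
		: m_lower(lower), m_upper(upper), m_newMin(newMin), m_newMax(newMax){}
		typedef RealVector result_type; // assumed requirement of transform()
		RealVector operator()(RealVector input) const{
			for(std::size_t i = 0; i != input.size(); ++i){
				double v = std::min(std::max(input(i), m_lower(i)), m_upper(i));
				input(i) = (v - m_lower(i)) / (m_upper(i) - m_lower(i)) * (m_newMax - m_newMin) + m_newMin;
			}
			return input;
		}
	private:
		RealVector m_lower;
		RealVector m_upper;
		double m_newMin;
		double m_newMax;
	};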
		

We encapsulate the above code in a function for brevity::


	UnlabeledData<RealVector> getSamples()
	


Sparse autoencoder objective function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We first load our data and generate a dataset with labels equal to the data points::


		UnlabeledData<RealVector> samples = getSamples();
		RegressionDataset data(samples, samples);
		

This does not create an additional copy of the samples but merely generates 
another reference to every single point.
In order to create the objective function for the sparse autoencoder,
we first need to create an instance of the :doxy:`Autoencoder` class::


	const unsigned int numhidden = 25;
	const double rho = 0.01; // Sparsity parameter (target mean activation)
	const double beta = 6.0; // Regularization parameter weighting the sparsity term
	const double lambda = 0.0002; // Weight decay parameter
	
		Autoencoder<LogisticNeuron, LogisticNeuron> model;
		model.setStructure(psize * psize, numhidden);
		initializeFFNet(model);
		

We can also use the :doxy:`TiedAutoencoder` here instead!
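A minimal sketch of that swap, assuming :doxy:`TiedAutoencoder` mirrors the
:doxy:`Autoencoder` interface used above, could look like this::

		// the decoder reuses the transposed encoder weights, so only one
		// weight matrix is learned
		TiedAutoencoder<LogisticNeuron, LogisticNeuron> model;
		model.setStructure(psize * psize, numhidden);
		initializeFFNet(model);
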
We then need to add the sparsity constraint to the regression loss
using the :doxy:`SparseAutoencoderError` as well as a two-norm regularisation::


		SquaredLoss<RealVector> loss;
		SparseAutoencoderError error(data, &model, &loss, rho, beta);
		// Add weight regularization
		TwoNormRegularizer regularizer(error.numberOfVariables());
		error.setRegularizer(lambda,&regularizer);
		

This creates the entire objective function for the sparse autoencoder,
with sparsity constraint and weight regularization.


Training the autoencoder
^^^^^^^^^^^^^^^^^^^^^^^^

In order to train the autoencoder we use the limited memory BFGS (L-BFGS)
algorithm with a line search satisfying the Wolfe conditions. We also need
to choose a starting point for the optimization. For this we use values
uniformly taken from :math:`[-r, r]` for the weights and :math:`0` for the
biases, with

.. math ::
    r = \sqrt{\frac{6}{n_{in} + n_{out} + 1}}

where :math:`n_{in}` and :math:`n_{out}` are the number of input and output
values per neuron.
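If we take :math:`n_{in}` and :math:`n_{out}` to be the sizes of the two layers a
weight connects, the network used here (*64* inputs and *25* hidden units) gives

.. math ::
    r = \sqrt{\frac{6}{64 + 25 + 1}} \approx 0.26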

The training is then done as follows::


	const unsigned int maxIter = 400;
	
		LBFGS optimizer;
		optimizer.lineSearch().lineSearchType() = LineSearch::WolfeCubic;
		optimizer.init(error);
		
		for (unsigned int i = 0; i < maxIter; ++i) {
			optimizer.step(error);
			cout << "Error: " << optimizer.solution().value << endl;
		}
		

In our trials we got final error values around 0.8 to 0.9.
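
Before the learned features can be inspected, the parameters of the best
solution found should be written back into the model. A minimal sketch,
assuming the usual Shark parameter interface::

		// copy the best parameters found by the optimizer into the model
		model.setParameterVector(optimizer.solution().point);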


Visualizing the autoencoder
^^^^^^^^^^^^^^^^^^^^^^^^^^^

After training, each row of the first weight matrix, :math:`W1`, will
correspond to a feature learned by the autoencoder. To visualize these
features, we export the rows as a grid of *8x8* images in a single PGM file
using Shark's PGM support::


		exportFiltersToPGMGrid("features",model.encoderMatrix(),psize,psize);
		

We got the following result:

.. figure:: ../images/features.png
  :width: 400
  :height: 400
  :alt: Plot of features learned by the autoencoder


Full example program
--------------------

A complete program performing the above steps is :doxy:`SparseAETutorial.cpp`.
