GClasses
GClasses::GSelfOrganizingMap Class Reference

Detailed Description

An implementation of a Kohonen self-organizing map.

Note: does not support more than 2^52 nodes – I don't believe this will be a problem within the next 20 years.

See: T. Kohonen "Self Organizing Maps" Third Edition, 2001, published by Springer

#include <GSelfOrganizingMap.h>

Inheritance diagram for GClasses::GSelfOrganizingMap:
GClasses::GIncrementalTransform GClasses::GTransform

Public Member Functions

 GSelfOrganizingMap (int nMapDims, int nMapWidth, GRand &rand, SOM::Reporter *r=new SOM::NoReporting())
 Creates a map whose nodes are on a grid that uses euclidean distance to find the nearest point and is trained with the batch SOM algorithm with neighborhood decreasing exponentially from 2*nMapWidth to 1 in 100 super-epochs with each super-epoch involving waiting up to 100 iterations for convergence at that neighborhood width. The neighbors affect one another through a unit height Gaussian with sigma=neighborhood size. Every epoch and super-epoch, the reporter is notified. This map will be responsible for deleting the dynamically allocated reporter. More...
 
 GSelfOrganizingMap (const std::vector< double > &outputAxes, std::size_t numNodes, SOM::NodeLocationInitialization *topology, SOM::TrainingAlgorithm *trainer, GDistanceMetric *weightDistance, GDistanceMetric *nodeDistance)
 Create a self-organizing map. More...
 
 GSelfOrganizingMap (const GDomNode *pNode)
 Reconstruct this self-organizing map from its serialized form in a dom document. More...
 
virtual ~GSelfOrganizingMap ()
 
std::vector< std::size_t > bestData (const GMatrix *pData) const
 Given a matrix containing input data of the correct dimensions, returns a vector v of nodes.size() indices into that matrix. v[i] is the index of the data in pData that gives the strongest response for node i. That is, pData->row(v[i]) is the data element that best matches node i. More...
 
std::size_t bestMatch (const GVec &pIn) const
 Return the index of the node whose weight vector best matches in under the distance metric for this SOM. Assumes there is at least one node. More...
 
unsigned inputDimensions () const
 Return the number of dimensions this takes as input. More...
 
std::vector< std::size_t > nearestNeighbors (unsigned nodeIdx, unsigned numNeighbors, double epsilon=1e-8) const
 Inspector giving the indices of the nearest neighbors in output space of the node at index nodeIdx. If any neighbor with a given distance is returned all neighbors at that distance will be returned. Any distances within epsilon of one another are considered equivalent. More...
 
std::vector< SOM::NodeAndDistanceneighborsInCircle (unsigned nodeIdx, double radius) const
 Return all neighbors of the node at that have a distance from nodeIdx of less than radius. More...
 
const GDistanceMetricnodeDistance () const
 Inspector for the distance metric used in output space, that is, between two nodes for their relative influence. More...
 
const std::vector< SOM::Node > & nodes () const
 Inspector for the nodes making up this map. More...
 
std::vector< SOM::Node > & nodes ()
 Accessor for the nodes making up this map. More...
 
std::vector< double > outputAxes () const
 Return the maximum for each output axis. More...
 
size_t outputDimensions () const
 Return the number of dimensions this returns as output. More...
 
virtual GMatrixreduce (const GMatrix &in) override
 Transforms pIn after training on it. More...
 
virtual GDomNodeserialize (GDom *) const override
 Add this map to a dom document and return the pointer to the tree added.

WARNING: the current imlementation DOES NOT SERIALIZE THE TRAINING ALGORITHM - do not train a GSelfOrganizingMap obtained from a dom file. More...
 
virtual GRelationtrainInner (const GMatrix &in) override
 see comment on GIncrementalTransform::train(GMatrix&) More...
 
virtual GRelationtrainInner (const GRelation &in) override
 Throws an exception (because this transform cannot be trained without data) More...
 
void transform (const GVec &pIn, GVec &pOut) override
 Transform the given vector from input coordinates to map coordinates by finding the best match among the nodes. More...
 
virtual void untransform (const GVec &pIn, GVec &pOut) override
 Throws an exception (because this transform cannot be reversed). More...
 
virtual void untransformToDistribution (const GVec &pIn, GPrediction *pOut) override
 Throws an exception (because this transform cannot be reversed). More...
 
const GDistanceMetricweightDistance () const
 Inspector for the distance metric used in input space, that is, between an input point and a weight for determining the winner. More...
 
- Public Member Functions inherited from GClasses::GIncrementalTransform
 GIncrementalTransform ()
 
 GIncrementalTransform (const GDomNode *pNode)
 
virtual ~GIncrementalTransform ()
 
const GRelationafter () const
 Returns a relation object describing the data after it is transformed. More...
 
const GRelationbefore () const
 Returns a relation object describing the data before it is transformed. More...
 
GVecinnerBuf ()
 Returns a buffer of sufficient size to store an inner (transformed) vector. The caller should not to delete the buffer. The same buffer will be returned each time. More...
 
void setAfter (GRelation *pRel)
 Sets the after relation. Takes ownership of pRel. More...
 
void setBefore (GRelation *pRel)
 Sets the before relation. Takes ownership of pRel. More...
 
void train (const GMatrix &data)
 Trains the transform on the data in pData. (This method may be a no-op for transformations that always behave in the same manner.) More...
 
void train (const GRelation &pRelation)
 "Trains" the transform without any data. More...
 
virtual GMatrixtransformBatch (const GMatrix &in)
 This assumes that train has already been called, and transforms all the rows in in returning the resulting matrix. The caller is responsible for deleting the new matrix. More...
 
virtual std::unique_ptr< GMatrixuntransformBatch (const GMatrix &in)
 This assumes train was previously called, and untransforms all the rows in pIn and returns the results. More...
 
- Public Member Functions inherited from GClasses::GTransform
 GTransform ()
 
 GTransform (const GDomNode *pNode)
 
virtual ~GTransform ()
 

Static Public Member Functions

static void test ()
 Performs unit tests for this class. Throws an exception if there is a failure. More...
 
- Static Public Member Functions inherited from GClasses::GIncrementalTransform
static void test ()
 Performs unit tests for this class. Throws an exception if there is a failure. More...
 

Protected Member Functions

void invalidateSortedNeighbors ()
 Marks sortedNeighbors as invalid and need of regeneration. More...
 
void regenerateSortedNeighbors () const
 Eliminates the current contents of m_sortedNeighbors and regenerates it from m_pNodeDistance and m_nodes. sets m_sortedNeighborsIsValid on completion. Note that the members this method changes are mutable, thus it can be marked as a const method and called from const methods. More...
 
const std::vector< std::vector< SOM::NodeAndDistance > > & sortedNeighbors () const
 Return a sorted neighbor structure in which entry i is a list of the node indices sorted by their distance from node i. More...
 
- Protected Member Functions inherited from GClasses::GIncrementalTransform
virtual GDomNodebaseDomNode (GDom *pDoc, const char *szClassName) const
 Child classes should use this in their implementation of serialize. More...
 

Protected Attributes

unsigned m_nInputDims
 Number of input dimensions. More...
 
std::vector< SOM::Nodem_nodes
 The nodes in the map. Do not change the order of nodes once sortedNeighbors has been created unless you also invalidate sortedNeighbors. Remember to call m_pWeightDistance->init if you change the number of weights. More...
 
std::vector< double > m_outputAxes
 Vector containing the maximum number in each output axis - so axis #0 would have a range 0..m_vOutputAxes[0]. Remember to call m_pNodeDistance->init whenever you change the number of output axes. More...
 
GDistanceMetricm_pNodeDistance
 The distance function to use when calculating the distance between nodes in the map. If you change this, remember to invalidate the neighbor structure. - owned by this object. Remember to call init whenever you change the number of output axes. More...
 
SOM::TrainingAlgorithmm_pTrainer
 The algorithm to be used to train this map on the next call to train. This object owns the training algorithm and is responsable for deleting it. More...
 
GDistanceMetricm_pWeightDistance
 The distance function to use when finding the closest weight to a given input point - owned by this object. Remember to call init on this whenever you change the number of weights. More...
 
std::vector< std::vector< SOM::NodeAndDistance > > m_sortedNeighbors
 Each entry sortedNeighbors[i] contains a vector of all the other nodes sorted by their distance from i in outputSpace. More...
 
bool m_sortedNeighborsIsValid
 

Friends

class SOM::TrainingAlgorithm
 

Constructor & Destructor Documentation

GClasses::GSelfOrganizingMap::GSelfOrganizingMap ( int  nMapDims,
int  nMapWidth,
GRand rand,
SOM::Reporter r = new SOM::NoReporting() 
)

Creates a map whose nodes are on a grid that uses euclidean distance to find the nearest point and is trained with the batch SOM algorithm with neighborhood decreasing exponentially from 2*nMapWidth to 1 in 100 super-epochs with each super-epoch involving waiting up to 100 iterations for convergence at that neighborhood width. The neighbors affect one another through a unit height Gaussian with sigma=neighborhood size. Every epoch and super-epoch, the reporter is notified. This map will be responsible for deleting the dynamically allocated reporter.

nMapDims specifies the number of dimensions for the map.

nMapWidth specifies the size in one dimension of the map.

(so if nMapDims is 3 and nMapWidth is 10, the map will contain 1000 (10^3) nodes.)

GClasses::GSelfOrganizingMap::GSelfOrganizingMap ( const std::vector< double > &  outputAxes,
std::size_t  numNodes,
SOM::NodeLocationInitialization topology,
SOM::TrainingAlgorithm trainer,
GDistanceMetric weightDistance,
GDistanceMetric nodeDistance 
)

Create a self-organizing map.

Parameters
outputAxeseach node in the network will have a vector location and component i of that vector will range from 0..outputAxes[i]
numNodesthe number of nodes in the network
topologydetermines the locations assigned to each node
traineralgorithm to train the network on data
weightDistancethe distance used in determining which node's weight-set is closest to an input point
nodeDistancethe distance used to determine the influence of two nodes with different coordinates on one another.
GClasses::GSelfOrganizingMap::GSelfOrganizingMap ( const GDomNode pNode)

Reconstruct this self-organizing map from its serialized form in a dom document.

virtual GClasses::GSelfOrganizingMap::~GSelfOrganizingMap ( )
virtual

Member Function Documentation

std::vector<std::size_t> GClasses::GSelfOrganizingMap::bestData ( const GMatrix pData) const

Given a matrix containing input data of the correct dimensions, returns a vector v of nodes.size() indices into that matrix. v[i] is the index of the data in pData that gives the strongest response for node i. That is, pData->row(v[i]) is the data element that best matches node i.

Assumes there is at least one row in pData

std::size_t GClasses::GSelfOrganizingMap::bestMatch ( const GVec pIn) const

Return the index of the node whose weight vector best matches in under the distance metric for this SOM. Assumes there is at least one node.

unsigned GClasses::GSelfOrganizingMap::inputDimensions ( ) const
inline

Return the number of dimensions this takes as input.

void GClasses::GSelfOrganizingMap::invalidateSortedNeighbors ( )
inlineprotected

Marks sortedNeighbors as invalid and need of regeneration.

std::vector<std::size_t> GClasses::GSelfOrganizingMap::nearestNeighbors ( unsigned  nodeIdx,
unsigned  numNeighbors,
double  epsilon = 1e-8 
) const

Inspector giving the indices of the nearest neighbors in output space of the node at index nodeIdx. If any neighbor with a given distance is returned all neighbors at that distance will be returned. Any distances within epsilon of one another are considered equivalent.

If there are enough nodes in the network, at least numNeigbors neighbors will be returned. If there are fewer nodes in the network than the number of neighbors, a list of all indices will be returned.

std::vector<SOM::NodeAndDistance> GClasses::GSelfOrganizingMap::neighborsInCircle ( unsigned  nodeIdx,
double  radius 
) const

Return all neighbors of the node at that have a distance from nodeIdx of less than radius.

const GDistanceMetric* GClasses::GSelfOrganizingMap::nodeDistance ( ) const
inline

Inspector for the distance metric used in output space, that is, between two nodes for their relative influence.

const std::vector<SOM::Node>& GClasses::GSelfOrganizingMap::nodes ( ) const
inline

Inspector for the nodes making up this map.

std::vector<SOM::Node>& GClasses::GSelfOrganizingMap::nodes ( )
inline

Accessor for the nodes making up this map.

std::vector<double> GClasses::GSelfOrganizingMap::outputAxes ( ) const
inline

Return the maximum for each output axis.

size_t GClasses::GSelfOrganizingMap::outputDimensions ( ) const
inline

Return the number of dimensions this returns as output.

virtual GMatrix* GClasses::GSelfOrganizingMap::reduce ( const GMatrix in)
overridevirtual

Transforms pIn after training on it.

Reimplemented from GClasses::GIncrementalTransform.

void GClasses::GSelfOrganizingMap::regenerateSortedNeighbors ( ) const
protected

Eliminates the current contents of m_sortedNeighbors and regenerates it from m_pNodeDistance and m_nodes. sets m_sortedNeighborsIsValid on completion. Note that the members this method changes are mutable, thus it can be marked as a const method and called from const methods.

virtual GDomNode* GClasses::GSelfOrganizingMap::serialize ( GDom ) const
overridevirtual

Add this map to a dom document and return the pointer to the tree added.

WARNING: the current imlementation DOES NOT SERIALIZE THE TRAINING ALGORITHM - do not train a GSelfOrganizingMap obtained from a dom file.

TODO: make all serialize's const

Implements GClasses::GIncrementalTransform.

const std::vector<std::vector<SOM::NodeAndDistance> >& GClasses::GSelfOrganizingMap::sortedNeighbors ( ) const
inlineprotected

Return a sorted neighbor structure in which entry i is a list of the node indices sorted by their distance from node i.

static void GClasses::GSelfOrganizingMap::test ( )
static

Performs unit tests for this class. Throws an exception if there is a failure.

virtual GRelation* GClasses::GSelfOrganizingMap::trainInner ( const GMatrix in)
inlineoverridevirtual

see comment on GIncrementalTransform::train(GMatrix&)

Implements GClasses::GIncrementalTransform.

virtual GRelation* GClasses::GSelfOrganizingMap::trainInner ( const GRelation in)
inlineoverridevirtual

Throws an exception (because this transform cannot be trained without data)

Implements GClasses::GIncrementalTransform.

void GClasses::GSelfOrganizingMap::transform ( const GVec pIn,
GVec pOut 
)
overridevirtual

Transform the given vector from input coordinates to map coordinates by finding the best match among the nodes.

see comment on GIncrementalTransform::(const GVec& pIn, GVec& pOut)

Implements GClasses::GIncrementalTransform.

virtual void GClasses::GSelfOrganizingMap::untransform ( const GVec pIn,
GVec pOut 
)
inlineoverridevirtual

Throws an exception (because this transform cannot be reversed).

Implements GClasses::GIncrementalTransform.

virtual void GClasses::GSelfOrganizingMap::untransformToDistribution ( const GVec pIn,
GPrediction pOut 
)
inlineoverridevirtual

Throws an exception (because this transform cannot be reversed).

Implements GClasses::GIncrementalTransform.

const GDistanceMetric* GClasses::GSelfOrganizingMap::weightDistance ( ) const
inline

Inspector for the distance metric used in input space, that is, between an input point and a weight for determining the winner.

Friends And Related Function Documentation

friend class SOM::TrainingAlgorithm
friend

Member Data Documentation

unsigned GClasses::GSelfOrganizingMap::m_nInputDims
protected

Number of input dimensions.

std::vector<SOM::Node> GClasses::GSelfOrganizingMap::m_nodes
protected

The nodes in the map. Do not change the order of nodes once sortedNeighbors has been created unless you also invalidate sortedNeighbors. Remember to call m_pWeightDistance->init if you change the number of weights.

std::vector<double> GClasses::GSelfOrganizingMap::m_outputAxes
protected

Vector containing the maximum number in each output axis - so axis #0 would have a range 0..m_vOutputAxes[0]. Remember to call m_pNodeDistance->init whenever you change the number of output axes.

GDistanceMetric* GClasses::GSelfOrganizingMap::m_pNodeDistance
protected

The distance function to use when calculating the distance between nodes in the map. If you change this, remember to invalidate the neighbor structure. - owned by this object. Remember to call init whenever you change the number of output axes.

SOM::TrainingAlgorithm* GClasses::GSelfOrganizingMap::m_pTrainer
protected

The algorithm to be used to train this map on the next call to train. This object owns the training algorithm and is responsable for deleting it.

GDistanceMetric* GClasses::GSelfOrganizingMap::m_pWeightDistance
protected

The distance function to use when finding the closest weight to a given input point - owned by this object. Remember to call init on this whenever you change the number of weights.

std::vector<std::vector<SOM::NodeAndDistance> > GClasses::GSelfOrganizingMap::m_sortedNeighbors
mutableprotected

Each entry sortedNeighbors[i] contains a vector of all the other nodes sorted by their distance from i in outputSpace.

bool GClasses::GSelfOrganizingMap::m_sortedNeighborsIsValid
mutableprotected