GClasses
|
An implementation of a Kohonen self-organizing map.
Note: does not support more than 2^52 nodes – I don't believe this will be a problem within the next 20 years.
See: T. Kohonen "Self Organizing Maps" Third Edition, 2001, published by Springer
#include <GSelfOrganizingMap.h>
Public Member Functions | |
GSelfOrganizingMap (int nMapDims, int nMapWidth, GRand &rand, SOM::Reporter *r=new SOM::NoReporting()) | |
Creates a map whose nodes are on a grid that uses euclidean distance to find the nearest point and is trained with the batch SOM algorithm with neighborhood decreasing exponentially from 2*nMapWidth to 1 in 100 super-epochs with each super-epoch involving waiting up to 100 iterations for convergence at that neighborhood width. The neighbors affect one another through a unit height Gaussian with sigma=neighborhood size. Every epoch and super-epoch, the reporter is notified. This map will be responsible for deleting the dynamically allocated reporter. More... | |
GSelfOrganizingMap (const std::vector< double > &outputAxes, std::size_t numNodes, SOM::NodeLocationInitialization *topology, SOM::TrainingAlgorithm *trainer, GDistanceMetric *weightDistance, GDistanceMetric *nodeDistance) | |
Create a self-organizing map. More... | |
GSelfOrganizingMap (const GDomNode *pNode) | |
Reconstruct this self-organizing map from its serialized form in a dom document. More... | |
virtual | ~GSelfOrganizingMap () |
std::vector< std::size_t > | bestData (const GMatrix *pData) const |
Given a matrix containing input data of the correct dimensions, returns a vector v of nodes.size() indices into that matrix. v[i] is the index of the data in pData that gives the strongest response for node i. That is, pData->row(v[i]) is the data element that best matches node i. More... | |
std::size_t | bestMatch (const GVec &pIn) const |
Return the index of the node whose weight vector best matches in under the distance metric for this SOM. Assumes there is at least one node. More... | |
unsigned | inputDimensions () const |
Return the number of dimensions this takes as input. More... | |
std::vector< std::size_t > | nearestNeighbors (unsigned nodeIdx, unsigned numNeighbors, double epsilon=1e-8) const |
Inspector giving the indices of the nearest neighbors in output space of the node at index nodeIdx. If any neighbor with a given distance is returned all neighbors at that distance will be returned. Any distances within epsilon of one another are considered equivalent. More... | |
std::vector< SOM::NodeAndDistance > | neighborsInCircle (unsigned nodeIdx, double radius) const |
Return all neighbors of the node at that have a distance from nodeIdx of less than radius. More... | |
const GDistanceMetric * | nodeDistance () const |
Inspector for the distance metric used in output space, that is, between two nodes for their relative influence. More... | |
const std::vector< SOM::Node > & | nodes () const |
Inspector for the nodes making up this map. More... | |
std::vector< SOM::Node > & | nodes () |
Accessor for the nodes making up this map. More... | |
std::vector< double > | outputAxes () const |
Return the maximum for each output axis. More... | |
size_t | outputDimensions () const |
Return the number of dimensions this returns as output. More... | |
virtual GMatrix * | reduce (const GMatrix &in) override |
Transforms pIn after training on it. More... | |
virtual GDomNode * | serialize (GDom *) const override |
Add this map to a dom document and return the pointer to the tree added. WARNING: the current imlementation DOES NOT SERIALIZE THE TRAINING ALGORITHM - do not train a GSelfOrganizingMap obtained from a dom file. More... | |
virtual GRelation * | trainInner (const GMatrix &in) override |
see comment on GIncrementalTransform::train(GMatrix&) More... | |
virtual GRelation * | trainInner (const GRelation &in) override |
Throws an exception (because this transform cannot be trained without data) More... | |
void | transform (const GVec &pIn, GVec &pOut) override |
Transform the given vector from input coordinates to map coordinates by finding the best match among the nodes. More... | |
virtual void | untransform (const GVec &pIn, GVec &pOut) override |
Throws an exception (because this transform cannot be reversed). More... | |
virtual void | untransformToDistribution (const GVec &pIn, GPrediction *pOut) override |
Throws an exception (because this transform cannot be reversed). More... | |
const GDistanceMetric * | weightDistance () const |
Inspector for the distance metric used in input space, that is, between an input point and a weight for determining the winner. More... | |
Public Member Functions inherited from GClasses::GIncrementalTransform | |
GIncrementalTransform () | |
GIncrementalTransform (const GDomNode *pNode) | |
virtual | ~GIncrementalTransform () |
const GRelation & | after () const |
Returns a relation object describing the data after it is transformed. More... | |
const GRelation & | before () const |
Returns a relation object describing the data before it is transformed. More... | |
GVec & | innerBuf () |
Returns a buffer of sufficient size to store an inner (transformed) vector. The caller should not to delete the buffer. The same buffer will be returned each time. More... | |
void | setAfter (GRelation *pRel) |
Sets the after relation. Takes ownership of pRel. More... | |
void | setBefore (GRelation *pRel) |
Sets the before relation. Takes ownership of pRel. More... | |
void | train (const GMatrix &data) |
Trains the transform on the data in pData. (This method may be a no-op for transformations that always behave in the same manner.) More... | |
void | train (const GRelation &pRelation) |
"Trains" the transform without any data. More... | |
virtual GMatrix * | transformBatch (const GMatrix &in) |
This assumes that train has already been called, and transforms all the rows in in returning the resulting matrix. The caller is responsible for deleting the new matrix. More... | |
virtual std::unique_ptr< GMatrix > | untransformBatch (const GMatrix &in) |
This assumes train was previously called, and untransforms all the rows in pIn and returns the results. More... | |
Public Member Functions inherited from GClasses::GTransform | |
GTransform () | |
GTransform (const GDomNode *pNode) | |
virtual | ~GTransform () |
Static Public Member Functions | |
static void | test () |
Performs unit tests for this class. Throws an exception if there is a failure. More... | |
Static Public Member Functions inherited from GClasses::GIncrementalTransform | |
static void | test () |
Performs unit tests for this class. Throws an exception if there is a failure. More... | |
Protected Member Functions | |
void | invalidateSortedNeighbors () |
Marks sortedNeighbors as invalid and need of regeneration. More... | |
void | regenerateSortedNeighbors () const |
Eliminates the current contents of m_sortedNeighbors and regenerates it from m_pNodeDistance and m_nodes. sets m_sortedNeighborsIsValid on completion. Note that the members this method changes are mutable, thus it can be marked as a const method and called from const methods. More... | |
const std::vector< std::vector< SOM::NodeAndDistance > > & | sortedNeighbors () const |
Return a sorted neighbor structure in which entry i is a list of the node indices sorted by their distance from node i. More... | |
Protected Member Functions inherited from GClasses::GIncrementalTransform | |
virtual GDomNode * | baseDomNode (GDom *pDoc, const char *szClassName) const |
Child classes should use this in their implementation of serialize. More... | |
Protected Attributes | |
unsigned | m_nInputDims |
Number of input dimensions. More... | |
std::vector< SOM::Node > | m_nodes |
The nodes in the map. Do not change the order of nodes once sortedNeighbors has been created unless you also invalidate sortedNeighbors. Remember to call m_pWeightDistance->init if you change the number of weights. More... | |
std::vector< double > | m_outputAxes |
Vector containing the maximum number in each output axis - so axis #0 would have a range 0..m_vOutputAxes[0]. Remember to call m_pNodeDistance->init whenever you change the number of output axes. More... | |
GDistanceMetric * | m_pNodeDistance |
The distance function to use when calculating the distance between nodes in the map. If you change this, remember to invalidate the neighbor structure. - owned by this object. Remember to call init whenever you change the number of output axes. More... | |
SOM::TrainingAlgorithm * | m_pTrainer |
The algorithm to be used to train this map on the next call to train. This object owns the training algorithm and is responsable for deleting it. More... | |
GDistanceMetric * | m_pWeightDistance |
The distance function to use when finding the closest weight to a given input point - owned by this object. Remember to call init on this whenever you change the number of weights. More... | |
std::vector< std::vector< SOM::NodeAndDistance > > | m_sortedNeighbors |
Each entry sortedNeighbors[i] contains a vector of all the other nodes sorted by their distance from i in outputSpace. More... | |
bool | m_sortedNeighborsIsValid |
Friends | |
class | SOM::TrainingAlgorithm |
GClasses::GSelfOrganizingMap::GSelfOrganizingMap | ( | int | nMapDims, |
int | nMapWidth, | ||
GRand & | rand, | ||
SOM::Reporter * | r = new SOM::NoReporting() |
||
) |
Creates a map whose nodes are on a grid that uses euclidean distance to find the nearest point and is trained with the batch SOM algorithm with neighborhood decreasing exponentially from 2*nMapWidth to 1 in 100 super-epochs with each super-epoch involving waiting up to 100 iterations for convergence at that neighborhood width. The neighbors affect one another through a unit height Gaussian with sigma=neighborhood size. Every epoch and super-epoch, the reporter is notified. This map will be responsible for deleting the dynamically allocated reporter.
nMapDims specifies the number of dimensions for the map.
nMapWidth specifies the size in one dimension of the map.
(so if nMapDims is 3 and nMapWidth is 10, the map will contain 1000 (10^3) nodes.)
GClasses::GSelfOrganizingMap::GSelfOrganizingMap | ( | const std::vector< double > & | outputAxes, |
std::size_t | numNodes, | ||
SOM::NodeLocationInitialization * | topology, | ||
SOM::TrainingAlgorithm * | trainer, | ||
GDistanceMetric * | weightDistance, | ||
GDistanceMetric * | nodeDistance | ||
) |
Create a self-organizing map.
outputAxes | each node in the network will have a vector location and component i of that vector will range from 0..outputAxes[i] |
numNodes | the number of nodes in the network |
topology | determines the locations assigned to each node |
trainer | algorithm to train the network on data |
weightDistance | the distance used in determining which node's weight-set is closest to an input point |
nodeDistance | the distance used to determine the influence of two nodes with different coordinates on one another. |
GClasses::GSelfOrganizingMap::GSelfOrganizingMap | ( | const GDomNode * | pNode | ) |
Reconstruct this self-organizing map from its serialized form in a dom document.
|
virtual |
std::vector<std::size_t> GClasses::GSelfOrganizingMap::bestData | ( | const GMatrix * | pData | ) | const |
Given a matrix containing input data of the correct dimensions, returns a vector v of nodes.size() indices into that matrix. v[i] is the index of the data in pData that gives the strongest response for node i. That is, pData->row(v[i]) is the data element that best matches node i.
Assumes there is at least one row in pData
std::size_t GClasses::GSelfOrganizingMap::bestMatch | ( | const GVec & | pIn | ) | const |
Return the index of the node whose weight vector best matches in under the distance metric for this SOM. Assumes there is at least one node.
|
inline |
Return the number of dimensions this takes as input.
|
inlineprotected |
Marks sortedNeighbors as invalid and need of regeneration.
std::vector<std::size_t> GClasses::GSelfOrganizingMap::nearestNeighbors | ( | unsigned | nodeIdx, |
unsigned | numNeighbors, | ||
double | epsilon = 1e-8 |
||
) | const |
Inspector giving the indices of the nearest neighbors in output space of the node at index nodeIdx. If any neighbor with a given distance is returned all neighbors at that distance will be returned. Any distances within epsilon of one another are considered equivalent.
If there are enough nodes in the network, at least numNeigbors neighbors will be returned. If there are fewer nodes in the network than the number of neighbors, a list of all indices will be returned.
std::vector<SOM::NodeAndDistance> GClasses::GSelfOrganizingMap::neighborsInCircle | ( | unsigned | nodeIdx, |
double | radius | ||
) | const |
Return all neighbors of the node at that have a distance from nodeIdx of less than radius.
|
inline |
Inspector for the distance metric used in output space, that is, between two nodes for their relative influence.
|
inline |
Inspector for the nodes making up this map.
|
inline |
Accessor for the nodes making up this map.
|
inline |
Return the maximum for each output axis.
|
inline |
Return the number of dimensions this returns as output.
Transforms pIn after training on it.
Reimplemented from GClasses::GIncrementalTransform.
|
protected |
Eliminates the current contents of m_sortedNeighbors and regenerates it from m_pNodeDistance and m_nodes. sets m_sortedNeighborsIsValid on completion. Note that the members this method changes are mutable, thus it can be marked as a const method and called from const methods.
Add this map to a dom document and return the pointer to the tree added.
WARNING: the current imlementation DOES NOT SERIALIZE THE TRAINING ALGORITHM - do not train a GSelfOrganizingMap obtained from a dom file.
TODO: make all serialize's const
Implements GClasses::GIncrementalTransform.
|
inlineprotected |
Return a sorted neighbor structure in which entry i is a list of the node indices sorted by their distance from node i.
|
static |
Performs unit tests for this class. Throws an exception if there is a failure.
|
inlineoverridevirtual |
see comment on GIncrementalTransform::train(GMatrix&)
Implements GClasses::GIncrementalTransform.
|
inlineoverridevirtual |
Throws an exception (because this transform cannot be trained without data)
Implements GClasses::GIncrementalTransform.
Transform the given vector from input coordinates to map coordinates by finding the best match among the nodes.
see comment on GIncrementalTransform::(const GVec& pIn, GVec& pOut)
Implements GClasses::GIncrementalTransform.
|
inlineoverridevirtual |
Throws an exception (because this transform cannot be reversed).
Implements GClasses::GIncrementalTransform.
|
inlineoverridevirtual |
Throws an exception (because this transform cannot be reversed).
Implements GClasses::GIncrementalTransform.
|
inline |
Inspector for the distance metric used in input space, that is, between an input point and a weight for determining the winner.
|
friend |
|
protected |
Number of input dimensions.
|
protected |
The nodes in the map. Do not change the order of nodes once sortedNeighbors has been created unless you also invalidate sortedNeighbors. Remember to call m_pWeightDistance->init if you change the number of weights.
|
protected |
Vector containing the maximum number in each output axis - so axis #0 would have a range 0..m_vOutputAxes[0]. Remember to call m_pNodeDistance->init whenever you change the number of output axes.
|
protected |
The distance function to use when calculating the distance between nodes in the map. If you change this, remember to invalidate the neighbor structure. - owned by this object. Remember to call init whenever you change the number of output axes.
|
protected |
The algorithm to be used to train this map on the next call to train. This object owns the training algorithm and is responsable for deleting it.
|
protected |
The distance function to use when finding the closest weight to a given input point - owned by this object. Remember to call init on this whenever you change the number of weights.
|
mutableprotected |
Each entry sortedNeighbors[i] contains a vector of all the other nodes sorted by their distance from i in outputSpace.
|
mutableprotected |