GClasses::GKNN Class Reference

Detailed Description

The k-Nearest Neighbor learning algorithm.

#include <GKNN.h>

Inheritance diagram for GClasses::GKNN:
GClasses::GKNN inherits GClasses::GIncrementalLearner, which inherits GClasses::GSupervisedLearner, which inherits GClasses::GTransducer.

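Public Types

enum InterpolationMethod { Linear, Mean, Learner }

enum TrainMethod { StoreAll, ValidationPrune, DrawRandom }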

Public Member Functions

 GKNN ()
 General-purpose constructor.

 GKNN (const GDomNode *pNode)
 Load from a DOM.

virtual ~GKNN ()

size_t addVector (const GVec &in, const GVec &out)
 Adds a copy of the feature vector in and the label vector out to the internal set.

void autoTune (GMatrix &features, GMatrix &labels)
 Uses cross-validation to find a set of parameters that works well with the provided data.

virtual void clear ()
 Discard any training (but not any settings) so it can be trained again.

void drawRandom (size_t n)
 Specify to train by drawing 'n' random patterns from the training set.

GMatrix * features ()
 Returns the internal feature set.

GMatrix * labels ()
 Returns the internal label set.

GDistanceMetric * metric ()
 Returns the dissimilarity metric.

size_t neighborCount ()
 Returns the number of neighbors.

virtual void predict (const GVec &in, GVec &out)
 See the comment for GSupervisedLearner::predict.

virtual void predictDistribution (const GVec &in, GPrediction *pOut)
 See the comment for GSupervisedLearner::predictDistribution.

virtual GDomNode * serialize (GDom *pDoc) const
 Marshal this object into a DOM, which can then be converted to a variety of serial formats.

void setInterpolationLearner (GSupervisedLearner *pLearner, bool bTakeOwnership)
 Sets the interpolation method to "Learner" and sets the learner to use. If bTakeOwnership is true, it will delete the learner when this object is deleted.

void setInterpolationMethod (InterpolationMethod eMethod)
 Sets the technique for interpolation. (If you want to use the "Learner" method, you should call setInterpolationLearner instead of this method.)

void setMetric (GDistanceMetric *pMetric, bool own)
 Sets the distance metric to use for finding neighbors. If own is true, then this object will delete pMetric when it is done with it.

void setMetric (GSparseSimilarity *pMetric, bool own)
 Sets the sparse similarity metric to use for finding neighbors. If own is true, then this object will delete pMetric when it is done with it.

void setNeighborCount (size_t k)
 Specify the number of neighbors to use. (The default is 1.)

void setNormalizeScaleFactors (bool b)
 Specify whether to normalize the scaling of each attribute. (The default is to normalize.)

void setOptimizeScaleFactors (bool b)
 If you set this to true, it will use a hill-climber to optimize the attribute scaling factors. If you set it to false (the default), it won't.

GSparseMatrix * sparseFeatures ()
 Returns the internal set of sparse features.

virtual void trainSparse (GSparseMatrix &features, GMatrix &labels)
 See the comment for GIncrementalLearner::trainSparse.
 
- Public Member Functions inherited from GClasses::GIncrementalLearner
 GIncrementalLearner ()
 General-purpose constructor.

 GIncrementalLearner (const GDomNode *pNode)
 Deserialization constructor.

virtual ~GIncrementalLearner ()
 Destructor.

void beginIncrementalLearning (const GRelation &featureRel, const GRelation &labelRel)
 You must call this method before you call trainIncremental.

void beginIncrementalLearning (const GMatrix &features, const GMatrix &labels)
 A version of beginIncrementalLearning that supports data-dependent filters.

virtual bool canTrainIncrementally ()
 Returns true.

virtual bool isFilter ()
 Only the GFilter class should return true from this method.
 
- Public Member Functions inherited from GClasses::GSupervisedLearner
 GSupervisedLearner ()
 General-purpose constructor.

 GSupervisedLearner (const GDomNode *pNode)
 Deserialization constructor.

virtual ~GSupervisedLearner ()
 Destructor.

void basicTest (double minAccuracy1, double minAccuracy2, double deviation=1e-6, bool printAccuracy=false, double warnRange=0.035)
 This is a helper method used by the unit tests of several model learners.

virtual bool canGeneralize ()
 Returns true because fully supervised learners have an internal model that allows them to generalize to previously unseen rows.

void confusion (GMatrix &features, GMatrix &labels, std::vector< GMatrix * > &stats)
 Generates a confusion matrix containing the total counts of the number of times each value was expected and predicted. (Rows represent target values, and columns represent predicted values.) stats should be an empty vector. This method will resize stats to the number of dimensions in the label vector. The caller is responsible for deleting all of the matrices that this method puts in the vector. For continuous labels, the value will be NULL.

void precisionRecall (double *pOutPrecision, size_t nPrecisionSize, GMatrix &features, GMatrix &labels, size_t label, size_t nReps)
 label specifies which output to measure. (It should be 0 if there is only one label dimension.) The measurement is performed nReps times and the results are averaged together. nPrecisionSize specifies the number of points at which the function is sampled. pOutPrecision should be an array big enough to hold nPrecisionSize elements for every possible label value. (If the attribute is continuous, it should just be big enough to hold nPrecisionSize elements.) If bLocal is true, it computes the local precision instead of the global precision.

const GRelation & relFeatures ()
 Returns a reference to the feature relation (meta-data about the input attributes).

const GRelation & relLabels ()
 Returns a reference to the label relation (meta-data about the output attributes).

double sumSquaredError (const GMatrix &features, const GMatrix &labels, double *pOutSAE=NULL)
 Computes the sum-squared-error for predicting the labels from the features. For categorical labels, Hamming distance is used.

void train (const GMatrix &features, const GMatrix &labels)
 Call this method to train the model.

virtual double trainAndTest (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels, double *pOutSAE=NULL)
 Trains and tests this learner. Returns sum-squared-error.
 
- Public Member Functions inherited from GClasses::GTransducer
 GTransducer ()
 General-purpose constructor.

 GTransducer (const GTransducer &that)
 Copy-constructor. Throws an exception to prevent models from being copied by value.

virtual ~GTransducer ()

virtual bool canImplicitlyHandleContinuousFeatures ()
 Returns true iff this algorithm can implicitly handle continuous features. If it cannot, then the GDiscretize transform will be used to convert continuous features to nominal values before passing them to it.

virtual bool canImplicitlyHandleContinuousLabels ()
 Returns true iff this algorithm can implicitly handle continuous labels (a.k.a. regression). If it cannot, then the GDiscretize transform will be used during training to convert continuous labels to nominal values, and to convert nominal predictions back to continuous labels.

virtual bool canImplicitlyHandleNominalFeatures ()
 Returns true iff this algorithm can implicitly handle nominal features. If it cannot, then the GNominalToCat transform will be used to convert nominal features to continuous values before passing them to it.

virtual bool canImplicitlyHandleNominalLabels ()
 Returns true iff this algorithm can implicitly handle nominal labels (a.k.a. classification). If it cannot, then the GNominalToCat transform will be used during training to convert nominal labels to continuous values, and to convert categorical predictions back to nominal labels.

double crossValidate (const GMatrix &features, const GMatrix &labels, size_t nFolds, double *pOutSAE=NULL, RepValidateCallback pCB=NULL, size_t nRep=0, void *pThis=NULL)
 Performs n-fold cross-validation on the provided features and labels. Returns sum-squared error. Uses trainAndTest for each fold. pCB is an optional callback method for reporting intermediate stats. It can be NULL if you don't want intermediate reporting. nRep is just the rep number that will be passed to the callback. pThis is just a pointer that will be passed to the callback for you to use however you want. It doesn't affect this method. If pOutSAE is not NULL, the sum absolute error will be placed there.

GTransducer & operator= (const GTransducer &other)
 Throws an exception to prevent models from being copied by value.

GRand & rand ()
 Returns a reference to the random number generator associated with this object. For example, you could use it to change the random seed, to make this algorithm behave differently. This might be important, for example, in an ensemble of learners.

double repValidate (const GMatrix &features, const GMatrix &labels, size_t reps, size_t nFolds, double *pOutSAE=NULL, RepValidateCallback pCB=NULL, void *pThis=NULL)
 Performs cross-validation "reps" times and returns the average score. pCB is an optional callback method for reporting intermediate stats. It can be NULL if you don't want intermediate reporting. pThis is just a pointer that will be passed to the callback for you to use however you want. It doesn't affect this method. If pOutSAE is not NULL, the sum absolute error will be placed there.

virtual bool supportedFeatureRange (double *pOutMin, double *pOutMax)
 Returns true if this algorithm supports any feature value, or if it does not implicitly handle continuous features. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.

virtual bool supportedLabelRange (double *pOutMin, double *pOutMax)
 Returns true if this algorithm supports any label value, or if it does not implicitly handle continuous labels. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.

std::unique_ptr< GMatrix > transduce (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2)
 Predicts a set of labels to correspond with features2, such that these labels will be consistent with the patterns exhibited by features1 and labels1.

void transductiveConfusionMatrix (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels, std::vector< GMatrix * > &stats)
 Makes a confusion matrix for a transduction algorithm.
 

Static Public Member Functions

static void test ()
 Performs unit tests for this class. Throws an exception if there is a failure.

- Static Public Member Functions inherited from GClasses::GSupervisedLearner
static void test ()
 Runs some unit tests related to supervised learning. Throws an exception if any problems are found.
 

Protected Member Functions

virtual void beginIncrementalLearningInner (const GRelation &featureRel, const GRelation &labelRel)
 See the comment for GIncrementalLearner::beginIncrementalLearningInner.

virtual bool canImplicitlyHandleMissingFeatures ()
 See the comment for GTransducer::canImplicitlyHandleMissingFeatures.

size_t findNeighbors (const GVec &vector)
 Finds the nearest neighbors of vector. Returns the number of neighbors found.

void interpolateLearner (size_t nc, const GVec &in, GPrediction *pOut, GVec *pOut2)
 Interpolates with the provided supervised learning algorithm.

void interpolateLinear (size_t nc, const GVec &in, GPrediction *pOut, GVec *pOut2)
 Interpolate with each neighbor having a linear vote. (Actually it's linear with respect to the squared distance instead of the distance, because this is faster to compute.)

void interpolateMean (size_t nc, const GVec &in, GPrediction *pOut, GVec *pOut2)
 Interpolate with each neighbor having equal vote.

virtual void trainIncremental (const GVec &in, const GVec &out)
 Adds a vector to the internal set. Also, if the (k+1)th nearest neighbor of that vector is less than "elbow room" from it, then the closest neighbor is deleted from the internal set. (You might be wondering why the decision to delete the closest neighbor is determined by the distance of the (k+1)th neighbor. This enables a clump of k points to form in the most frequently sampled locations. Also, if you make this decision based on a closer neighbor, then big holes may form in the model if points are sampled in a poor order.) Call SetElbowRoom to specify the elbow room distance.

virtual void trainInner (const GMatrix &features, const GMatrix &labels)
 See the comment for GSupervisedLearner::trainInner.
 
- Protected Member Functions inherited from GClasses::GIncrementalLearner
virtual void beginIncrementalLearningInner (const GMatrix &features, const GMatrix &labels)
 
- Protected Member Functions inherited from GClasses::GSupervisedLearner
GDomNode * baseDomNode (GDom *pDoc, const char *szClassName) const
 Child classes should use this in their implementation of serialize.

size_t precisionRecallContinuous (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label)
 This is a helper method used by precisionRecall.

size_t precisionRecallNominal (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label, int value)
 This is a helper method used by precisionRecall.

void setupFilters (const GMatrix &features, const GMatrix &labels)
 This method determines which data filters (normalize, discretize, and/or nominal-to-cat) are needed and trains them.

virtual std::unique_ptr< GMatrix > transduceInner (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2)
 See GTransducer::transduce.
 

Protected Attributes

bool m_bOwnLearner
 
InterpolationMethod m_eInterpolationMethod
 
TrainMethod m_eTrainMethod
 
size_t m_nNeighbors
 
bool m_normalizeScaleFactors
 
bool m_optimizeScaleFactors
 
bool m_ownMetric
 
GKnnScaleFactorCritic * m_pCritic
 
GDistanceMetric * m_pDistanceMetric

GMatrix * m_pFeatures

GMatrix * m_pLabels

GSupervisedLearner * m_pLearner

GNeighborFinderGeneralizing * m_pNeighborFinder

GOptimizer * m_pScaleFactorOptimizer

GSparseMatrix * m_pSparseFeatures

GSparseSimilarity * m_pSparseMetric
 
double m_trainParam
 
GVec m_valueCounts
 
- Protected Attributes inherited from GClasses::GSupervisedLearner
GRelation * m_pRelFeatures

GRelation * m_pRelLabels
 
- Protected Attributes inherited from GClasses::GTransducer
GRand m_rand
 

Additional Inherited Members

- Static Protected Member Functions inherited from GClasses::GSupervisedLearner
static void addInterpolatedFunction (double *pOut, size_t nOutVals, double *pIn, size_t nInVals)
 Adds the function pIn to pOut after interpolating pIn to be the same size as pOut. (This is a helper-function used by precisionRecall.)
 

Member Enumeration Documentation

enum GClasses::GKNN::InterpolationMethod

Enumerator
Linear
Mean
Learner

enum GClasses::GKNN::TrainMethod

Enumerator
StoreAll
ValidationPrune
DrawRandom

Constructor & Destructor Documentation

GClasses::GKNN::GKNN ( )

General-purpose constructor.

GClasses::GKNN::GKNN ( const GDomNode * pNode)

Load from a DOM.

virtual GClasses::GKNN::~GKNN ( )
virtual

Member Function Documentation

size_t GClasses::GKNN::addVector ( const GVec & in,
const GVec & out 
)

Adds a copy of the feature vector in and the label vector out to the internal set.

void GClasses::GKNN::autoTune ( GMatrix & features,
GMatrix & labels 
)

Uses cross-validation to find a set of parameters that works well with the provided data.
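
As a rough illustration of tuning by cross-validation, the sketch below picks the neighbor count k with the fewest leave-one-out errors on a small 1-D, two-class data set. It is a standalone example and not the GKNN implementation: the Sample struct and the functions looError and chooseK are hypothetical names, and autoTune may well search other parameters and use a different validation scheme.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical 1-D sample with a binary class label (0 or 1).
struct Sample { double x; int label; };

// Counts leave-one-out errors of a k-NN majority vote for the given k.
std::size_t looError(const std::vector<Sample>& data, std::size_t k) {
    assert(k >= 1 && k < data.size());
    std::size_t errors = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Distances from the held-out sample i to every other sample.
        std::vector<std::pair<double, int>> d;
        for (std::size_t j = 0; j < data.size(); ++j)
            if (j != i) d.push_back({std::fabs(data[j].x - data[i].x), data[j].label});
        std::partial_sort(d.begin(), d.begin() + k, d.end());
        int votes = 0;  // number of the k nearest samples labeled 1
        for (std::size_t n = 0; n < k; ++n) votes += d[n].second;
        int predicted = (2 * votes > static_cast<int>(k)) ? 1 : 0;
        if (predicted != data[i].label) ++errors;
    }
    return errors;
}

// Returns the candidate k with the fewest leave-one-out errors (first wins ties).
std::size_t chooseK(const std::vector<Sample>& data,
                    const std::vector<std::size_t>& candidates) {
    assert(!candidates.empty());
    std::size_t bestK = candidates.front();
    std::size_t bestErr = looError(data, bestK);
    for (std::size_t k : candidates) {
        std::size_t e = looError(data, k);
        if (e < bestErr) { bestErr = e; bestK = k; }
    }
    return bestK;
}
```

With one mislabeled point sitting inside an otherwise clean cluster, a larger k tends to win because the single noisy neighbor is outvoted.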

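virtual void GClasses::GKNN::beginIncrementalLearningInner ( const GRelation & featureRel,
const GRelation & labelRel 
)
protected virtual

See the comment for GIncrementalLearner::beginIncrementalLearningInner.

virtual bool GClasses::GKNN::canImplicitlyHandleMissingFeatures ( )
inline protected virtual

See the comment for GTransducer::canImplicitlyHandleMissingFeatures.

virtual void GClasses::GKNN::clear ( )
virtual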

Discard any training (but not any settings) so it can be trained again.

Implements GClasses::GSupervisedLearner.

void GClasses::GKNN::drawRandom ( size_t  n)
inline

Specify to train by drawing 'n' random patterns from the training set.
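
The idea of training on 'n' randomly drawn patterns can be sketched as picking n row indices at random. The helper drawRandomRows below is hypothetical, and sampling without replacement is an assumption of this sketch; the page does not say how GKNN draws its patterns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns n distinct row indices chosen uniformly at random from [0, rowCount).
// Assumption: rows are drawn without replacement.
std::vector<std::size_t> drawRandomRows(std::size_t rowCount, std::size_t n,
                                        std::mt19937& rng) {
    assert(n <= rowCount);
    std::vector<std::size_t> indices(rowCount);
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    std::shuffle(indices.begin(), indices.end(), rng);
    indices.resize(n);  // keep the first n entries of the shuffled order
    return indices;
}
```

Storing only a subset trades some accuracy for a smaller model and faster neighbor searches.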

GMatrix* GClasses::GKNN::features ( )
inline

Returns the internal feature set.

size_t GClasses::GKNN::findNeighbors ( const GVec & vector)
protected

Finds the nearest neighbors of vector. Returns the number of neighbors found.

void GClasses::GKNN::interpolateLearner ( size_t  nc,
const GVec & in,
GPrediction * pOut,
GVec * pOut2 
)
protected

Interpolates with the provided supervised learning algorithm.

void GClasses::GKNN::interpolateLinear ( size_t  nc,
const GVec & in,
GPrediction * pOut,
GVec * pOut2 
)
protected

Interpolate with each neighbor having a linear vote. (Actually it's linear with respect to the squared distance instead of the distance, because this is faster to compute.)
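
To show why working with squared distances is cheaper, here is a minimal sketch of a distance-weighted prediction for a continuous label. It uses inverse-squared-distance weights, which is one common choice and an assumption here; the exact weighting inside interpolateLinear is not given on this page. The Neighbor struct and weightedPrediction function are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical neighbor record: squared distance to the query and a continuous label.
struct Neighbor { double squaredDistance; double label; };

// Weighted average of the neighbor labels with weight 1 / max(squaredDistance, epsilon).
// The squared distance is used directly, so no square root is needed per neighbor.
// Assumption: inverse-squared-distance weighting, not necessarily the library's formula.
double weightedPrediction(const std::vector<Neighbor>& neighbors, double epsilon = 1e-9) {
    assert(!neighbors.empty());
    double weightedSum = 0.0;
    double weightTotal = 0.0;
    for (const Neighbor& n : neighbors) {
        double w = 1.0 / std::max(n.squaredDistance, epsilon);
        weightedSum += w * n.label;
        weightTotal += w;
    }
    return weightedSum / weightTotal;
}
```

The epsilon floor keeps the weight finite when the query coincides with a stored point.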

void GClasses::GKNN::interpolateMean ( size_t  nc,
const GVec & in,
GPrediction * pOut,
GVec * pOut2 
)
protected

Interpolate with each neighbor having equal vote.
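
For a nominal label, an equal vote amounts to a plain majority count over the neighbors. The sketch below is a standalone illustration with a hypothetical function name, equalVote; the tie-breaking rule (lowest class index wins) is an assumption and may differ from GKNN.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the class (0 .. classCount-1) with the most votes when each neighbor
// casts one equal vote. Assumption: ties go to the lowest class index.
std::size_t equalVote(const std::vector<std::size_t>& neighborClasses,
                      std::size_t classCount) {
    assert(!neighborClasses.empty());
    std::vector<std::size_t> counts(classCount, 0);
    for (std::size_t c : neighborClasses) {
        assert(c < classCount);
        ++counts[c];
    }
    // max_element returns the first maximum, which gives the tie-breaking rule above.
    return static_cast<std::size_t>(
        std::max_element(counts.begin(), counts.end()) - counts.begin());
}
```

For a continuous label the equivalent operation is the unweighted mean of the neighbor labels.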

GMatrix* GClasses::GKNN::labels ( )
inline

Returns the internal label set.

GDistanceMetric* GClasses::GKNN::metric ( )
inline

Returns the dissimilarity metric.

size_t GClasses::GKNN::neighborCount ( )
inline

Returns the number of neighbors.

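virtual void GClasses::GKNN::predict ( const GVec & in,
GVec & out 
)
virtual

See the comment for GSupervisedLearner::predict.

virtual void GClasses::GKNN::predictDistribution ( const GVec & in,
GPrediction * pOut 
)
virtual

See the comment for GSupervisedLearner::predictDistribution.

virtual GDomNode* GClasses::GKNN::serialize ( GDom * pDoc) const
virtual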

Marshal this object into a DOM, which can then be converted to a variety of serial formats.

Implements GClasses::GSupervisedLearner.

void GClasses::GKNN::setInterpolationLearner ( GSupervisedLearner * pLearner,
bool  bTakeOwnership 
)

Sets the interpolation method to "Learner" and sets the learner to use. If bTakeOwnership is true, it will delete the learner when this object is deleted.

void GClasses::GKNN::setInterpolationMethod ( InterpolationMethod  eMethod)

Sets the technique for interpolation. (If you want to use the "Learner" method, you should call setInterpolationLearner instead of this method.)

void GClasses::GKNN::setMetric ( GDistanceMetric * pMetric,
bool  own 
)

Sets the distance metric to use for finding neighbors. If own is true, then this object will delete pMetric when it is done with it.
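
The own flag follows a common raw-pointer ownership pattern: the holder deletes the object only if it was told it owns it. The sketch below illustrates that pattern with hypothetical Metric and MetricHolder types; it is not GKNN's code, and the point at which GKNN actually releases a replaced metric is an assumption here.

```cpp
#include <cassert>

// Hypothetical metric type that counts how many instances are alive.
struct Metric {
    static int alive;
    Metric() { ++alive; }
    ~Metric() { --alive; }
};
int Metric::alive = 0;

// Hypothetical holder showing the "own" flag pattern.
class MetricHolder {
public:
    MetricHolder() : m_pMetric(nullptr), m_own(false) {}
    ~MetricHolder() { release(); }
    MetricHolder(const MetricHolder&) = delete;
    MetricHolder& operator=(const MetricHolder&) = delete;

    // Replaces the current metric. The previous one is deleted only if it was owned.
    void setMetric(Metric* pMetric, bool own) {
        release();
        m_pMetric = pMetric;
        m_own = own;
    }

private:
    void release() {
        if (m_own) delete m_pMetric;
        m_pMetric = nullptr;
        m_own = false;
    }
    Metric* m_pMetric;
    bool m_own;
};
```

Passing own as false suits a metric that is shared with other objects or lives on the stack; passing true hands the lifetime over to the holder.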

void GClasses::GKNN::setMetric ( GSparseSimilarity * pMetric,
bool  own 
)

Sets the sparse similarity metric to use for finding neighbors. If own is true, then this object will delete pMetric when it is done with it.

void GClasses::GKNN::setNeighborCount ( size_t  k)

Specify the number of neighbors to use. (The default is 1.)
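
What the neighbor count controls can be seen in a brute-force search for the k closest rows. The sketch below is a standalone illustration using squared Euclidean distance; the names squaredDistance and nearestNeighbors are hypothetical, and GKNN itself uses its configured metric and neighbor finder rather than this code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Squared Euclidean distance between two vectors of equal length.
double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the indices of the k rows closest to query, nearest first.
// If fewer than k rows exist, all rows are returned.
std::vector<std::size_t> nearestNeighbors(const std::vector<std::vector<double>>& rows,
                                          const std::vector<double>& query,
                                          std::size_t k) {
    std::vector<std::size_t> order(rows.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    k = std::min(k, order.size());
    std::partial_sort(order.begin(), order.begin() + k, order.end(),
        [&](std::size_t a, std::size_t b) {
            return squaredDistance(rows[a], query) < squaredDistance(rows[b], query);
        });
    order.resize(k);
    return order;
}
```

With k = 1 the prediction copies the single closest row; larger values smooth the prediction over more rows at the cost of blurring fine detail.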

void GClasses::GKNN::setNormalizeScaleFactors ( bool  b)

Specify whether to normalize the scaling of each attribute. (The default is to normalize.)
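
Normalizing the scale matters because an attribute with a large numeric range would otherwise dominate the distance. The sketch below computes one scale factor per attribute so that each spans a unit range. Range-based scaling and the function name rangeScaleFactors are assumptions made for illustration; the normalization GKNN applies may be defined differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes one scale factor per attribute so that every attribute spans a unit
// range after scaling. Constant attributes get a factor of 1.
// Assumption: range-based normalization, chosen for illustration only.
std::vector<double> rangeScaleFactors(const std::vector<std::vector<double>>& rows) {
    assert(!rows.empty());
    std::size_t dims = rows.front().size();
    std::vector<double> lo(rows.front()), hi(rows.front());
    for (const auto& row : rows) {
        assert(row.size() == dims);
        for (std::size_t i = 0; i < dims; ++i) {
            lo[i] = std::min(lo[i], row[i]);
            hi[i] = std::max(hi[i], row[i]);
        }
    }
    std::vector<double> factors(dims, 1.0);
    for (std::size_t i = 0; i < dims; ++i)
        if (hi[i] > lo[i]) factors[i] = 1.0 / (hi[i] - lo[i]);
    return factors;
}
```

Multiplying each attribute by its factor before measuring distance gives every attribute a comparable influence on which neighbors are chosen.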

void GClasses::GKNN::setOptimizeScaleFactors ( bool  b)

If you set this to true, it will use a hill-climber to optimize the attribute scaling factors. If you set it to false (the default), it won't.
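
A hill-climber of the kind mentioned here repeatedly tries small changes to the factors and keeps those that lower an error measure. The sketch below is a generic coordinate-wise version with a hypothetical function, hillClimb, and a caller-supplied error function; it is not the optimizer GKNN uses. In practice the error would be something like a validation error of the k-NN model under the scaled distance.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <vector>

// Minimizes error(factors) by greedy coordinate-wise hill climbing. Each pass
// tries growing and shrinking every factor by `step` and keeps any change that
// lowers the error. The step shrinks toward 1 when a full pass finds no improvement.
std::vector<double> hillClimb(std::vector<double> factors,
                              const std::function<double(const std::vector<double>&)>& error,
                              std::size_t maxPasses = 200) {
    double step = 2.0;  // multiplicative step, always greater than 1
    double best = error(factors);
    for (std::size_t pass = 0; pass < maxPasses && step > 1.0001; ++pass) {
        bool improved = false;
        for (std::size_t i = 0; i < factors.size(); ++i) {
            for (double m : {step, 1.0 / step}) {
                double saved = factors[i];
                factors[i] = saved * m;
                double e = error(factors);
                if (e < best) { best = e; improved = true; }
                else factors[i] = saved;  // revert a move that did not help
            }
        }
        if (!improved) step = 1.0 + (step - 1.0) / 2.0;
    }
    return factors;
}
```

Multiplicative steps keep the factors positive, which suits scale factors. Because each trial requires a full error evaluation, this kind of search can be slow, which is consistent with the option being off by default.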

GSparseMatrix* GClasses::GKNN::sparseFeatures ( )
inline

Returns the internal set of sparse features.

static void GClasses::GKNN::test ( )
static

Performs unit tests for this class. Throws an exception if there is a failure.

virtual void GClasses::GKNN::trainIncremental ( const GVec & in,
const GVec & out 
)
protected virtual

Adds a vector to the internal set. Also, if the (k+1)th nearest neighbor of that vector is less than "elbow room" from it, then the closest neighbor is deleted from the internal set. (You might be wondering why the decision to delete the closest neighbor is determined by the distance of the (k+1)th neighbor. This enables a clump of k points to form in the most frequently sampled locations. Also, if you make this decision based on a closer neighbor, then big holes may form in the model if points are sampled in a poor order.) Call SetElbowRoom to specify the elbow room distance.

Implements GClasses::GIncrementalLearner.
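
The elbow-room rule described above can be sketched on 1-D points as follows. This is a standalone illustration with a hypothetical function, addWithElbowRoom, and plain absolute difference as the distance; it follows the rule as stated in the description and is not taken from the GKNN source.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Adds x to the stored points. If the (k+1)th nearest stored point is closer
// than elbowRoom, the single closest stored point is removed before x is added.
// Returns true if a point was removed.
bool addWithElbowRoom(std::vector<double>& points, double x, std::size_t k,
                      double elbowRoom) {
    bool removed = false;
    if (points.size() > k) {
        // Pair each stored point's distance to x with its index, nearest first.
        std::vector<std::pair<double, std::size_t>> byDistance;
        for (std::size_t i = 0; i < points.size(); ++i)
            byDistance.push_back({std::fabs(points[i] - x), i});
        std::partial_sort(byDistance.begin(), byDistance.begin() + (k + 1),
                          byDistance.end());
        if (byDistance[k].first < elbowRoom) {
            points.erase(points.begin() + byDistance[0].second);
            removed = true;
        }
    }
    points.push_back(x);
    return removed;
}
```

Because the test looks at the (k+1)th neighbor, up to k points can accumulate within elbow room of each other before any pruning starts, which matches the "clump of k points" behavior described above.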

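virtual void GClasses::GKNN::trainInner ( const GMatrix & features,
const GMatrix & labels 
)
protected virtual

See the comment for GSupervisedLearner::trainInner.

virtual void GClasses::GKNN::trainSparse ( GSparseMatrix & features,
GMatrix & labels 
)
virtual

See the comment for GIncrementalLearner::trainSparse.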

Member Data Documentation

bool GClasses::GKNN::m_bOwnLearner
protected

InterpolationMethod GClasses::GKNN::m_eInterpolationMethod
protected

TrainMethod GClasses::GKNN::m_eTrainMethod
protected

size_t GClasses::GKNN::m_nNeighbors
protected

bool GClasses::GKNN::m_normalizeScaleFactors
protected

bool GClasses::GKNN::m_optimizeScaleFactors
protected

bool GClasses::GKNN::m_ownMetric
protected

GKnnScaleFactorCritic* GClasses::GKNN::m_pCritic
protected

GDistanceMetric* GClasses::GKNN::m_pDistanceMetric
protected

GMatrix* GClasses::GKNN::m_pFeatures
protected

GMatrix* GClasses::GKNN::m_pLabels
protected

GSupervisedLearner* GClasses::GKNN::m_pLearner
protected

GNeighborFinderGeneralizing* GClasses::GKNN::m_pNeighborFinder
protected

GOptimizer* GClasses::GKNN::m_pScaleFactorOptimizer
protected

GSparseMatrix* GClasses::GKNN::m_pSparseFeatures
protected

GSparseSimilarity* GClasses::GKNN::m_pSparseMetric
protected

double GClasses::GKNN::m_trainParam
protected

GVec GClasses::GKNN::m_valueCounts
protected