GClasses
GClasses::GKdTree Class Reference

Detailed Description

An efficient algorithm for finding neighbors.

#include <GNeighborFinder.h>

Inheritance diagram for GClasses::GKdTree:
GClasses::GNeighborFinderGeneralizing GClasses::GNeighborFinder

Public Member Functions

 GKdTree (const GMatrix *pData, GDistanceMetric *pMetric=NULL, bool ownMetric=false)
 
virtual ~GKdTree ()
 
GKdNode * buildTree (size_t count, size_t *pIndexes)
 Build the tree. More...
 
virtual size_t findNearest (size_t k, size_t index)
 See the comment for GNeighborFinder::findNearest. More...
 
virtual size_t findNearest (size_t k, const GVec &vector)
 See the comment for GNeighborFinderGeneralizing::neighbors. More...
 
size_t findWithinRadius (double squaredRadius, size_t index)
 See the comment for GNeighborFinder::findWithinRadius. More...
 
size_t findWithinRadius (double squaredRadius, const GVec &vector)
 See the comment for GNeighborFinderGeneralizing::findWithinRadius. More...
 
bool isGreaterOrEqual (const double *pPat, size_t attr, double pivot)
 Returns true iff the specified point-vector is on the >= side of the specified pivot. More...
 
virtual void reoptimize ()
 Rebuilds the tree to improve subsequent performance. This should be called after a significant number of point-vectors are added to or released from the internal set. More...
 
GKdNode * root ()
 Returns the root node of the kd-tree. More...
 
void setMaxLeafSize (size_t n)
 Specify the max number of point-vectors to store in each leaf node. More...
 
- Public Member Functions inherited from GClasses::GNeighborFinderGeneralizing
 GNeighborFinderGeneralizing (const GMatrix *pData, GDistanceMetric *pMetric=NULL, bool ownMetric=false)
 Create a neighborfinder for finding the neighborCount nearest neighbors under the given metric. If ownMetric is true, then the neighborFinder takes responsibility for deleting the metric, otherwise it is the caller's responsibility. More...
 
virtual ~GNeighborFinderGeneralizing ()
 
virtual bool canGeneralize ()
 Returns true. See the comment for GNeighborFinder::canGeneralize. More...
 
virtual double distance (size_t i)
 See the comment for GNeighborFinder::distance. More...
 
GDistanceMetricmetric ()
 Returns the metric. More...
 
virtual size_t neighbor (size_t i)
 See the comment for GNeighborFinder::neighbor. More...
 
void sortNeighbors (size_t start=0, size_t end=INVALID_INDEX)
 Uses Quick Sort to sort the neighbors from least to most distant. More...
 
- Public Member Functions inherited from GClasses::GNeighborFinder
 GNeighborFinder (const GMatrix *pData)
 
virtual ~GNeighborFinder ()
 
const GMatrixdata ()
 Returns the data passed to the constructor of this object. More...
 
virtual bool isCached ()
 Returns true iff the neighbors and distances are pre-computed. More...
 

Static Public Member Functions

static double medianDistanceToNeighbor (GMatrix &data, size_t n)
 Computes the median distance to the n^th closest neighbor of each row in data. More...
 
static void test ()
 Performs unit tests for this class. Throws an exception if there is a failure. More...
 

Protected Member Functions

void computePivotAndGoodness (size_t count, size_t *pIndexes, size_t attr, double *pOutPivot, double *pOutGoodness)
 Computes a good pivot for the specified attribute, and the goodness of splitting on that attribute. For continuous attributes, the pivot is the (not scaled) mean and the goodness is the scaled variance. For nominal attributes, the pivot is the most common value and the goodness is scaled entropy. More...
 
size_t findNearest (size_t k, const GVec &vec, size_t nExclude)
 This is a helper method that finds the nearest neighbors. More...
 
size_t findWithinRadius (double squaredRadius, const GVec &vec, size_t nExclude)
 This is a helper method that finds neighbors within a radius. More...
 
size_t splitIndexes (size_t count, size_t *pIndexes, size_t attr, double pivot)
 Moves all the indexes that refer to rows that have a value less than pivot in the specified attribute to the beginning of the list, and the rest to the end. Returns the number of rows with a value less than the pivot. For nominal values, not-equal values are moved to the beginning, and equal values are moved to the end. More...
 
- Protected Member Functions inherited from GClasses::GNeighborFinderGeneralizing
void insertionSortNeighbors (size_t start, size_t end)
 A helper method used by sortNeighbors when the remaining portion to sort is small. More...
 

Protected Attributes

size_t m_maxLeafSize
 
GKdNode * m_pRoot
 
size_t m_size
 
- Protected Attributes inherited from GClasses::GNeighborFinderGeneralizing
std::vector< double > m_dists
 
std::vector< size_t > m_neighs
 
bool m_ownMetric
 
GDistanceMetricm_pMetric
 
- Protected Attributes inherited from GClasses::GNeighborFinder
const GMatrixm_pData
 

Constructor & Destructor Documentation

GClasses::GKdTree::GKdTree ( const GMatrix pData,
GDistanceMetric pMetric = NULL,
bool  ownMetric = false 
)
virtual GClasses::GKdTree::~GKdTree ( )
virtual

Member Function Documentation

GKdNode* GClasses::GKdTree::buildTree ( size_t  count,
size_t *  pIndexes 
)

Build the tree.

void GClasses::GKdTree::computePivotAndGoodness ( size_t  count,
size_t *  pIndexes,
size_t  attr,
double *  pOutPivot,
double *  pOutGoodness 
)
protected

Computes a good pivot for the specified attribute, and the goodness of splitting on that attribute. For continuous attributes, the pivot is the (not scaled) mean and the goodness is the scaled variance. For nominal attributes, the pivot is the most common value and the goodness is scaled entropy.

virtual size_t GClasses::GKdTree::findNearest ( size_t  k,
size_t  index 
)
virtual

See the comment for GNeighborFinder::findNearest.

Implements GClasses::GNeighborFinder.

virtual size_t GClasses::GKdTree::findNearest ( size_t  k,
const GVec vector 
)
virtual

See the comment for GNeighborFinderGeneralizing::neighbors.

Implements GClasses::GNeighborFinderGeneralizing.

size_t GClasses::GKdTree::findNearest ( size_t  k,
const GVec vec,
size_t  nExclude 
)
protected

This is a helper method that finds the nearest neighbors.

size_t GClasses::GKdTree::findWithinRadius ( double  squaredRadius,
size_t  index 
)
virtual
size_t GClasses::GKdTree::findWithinRadius ( double  squaredRadius,
const GVec vector 
)
virtual
size_t GClasses::GKdTree::findWithinRadius ( double  squaredRadius,
const GVec vec,
size_t  nExclude 
)
protected

This is a helper method that finds neighbors within a radius.

bool GClasses::GKdTree::isGreaterOrEqual ( const double *  pPat,
size_t  attr,
double  pivot 
)

Returns true iff the specified point-vector is on the >= side of the specified pivot.

static double GClasses::GKdTree::medianDistanceToNeighbor ( GMatrix data,
size_t  n 
)
static

Computes the median distance to the n^th closest neighbor of each row in data.

virtual void GClasses::GKdTree::reoptimize ( )
virtual

Rebuilds the tree to improve subsequent performance. This should be called after a significant number of point-vectors are added to or released from the internal set.

Implements GClasses::GNeighborFinderGeneralizing.

GKdNode* GClasses::GKdTree::root ( )
inline

Returns the root node of the kd-tree.

void GClasses::GKdTree::setMaxLeafSize ( size_t  n)
inline

Specify the max number of point-vectors to store in each leaf node.

size_t GClasses::GKdTree::splitIndexes ( size_t  count,
size_t *  pIndexes,
size_t  attr,
double  pivot 
)
protected

Moves all the indexes that refer to rows that have a value less than pivot in the specified attribute to the beginning of the list, and the rest to the end. Returns the number of rows with a value less than the pivot. For nominal values, not-equal values are moved to the beginning, and equal values are moved to the end.

static void GClasses::GKdTree::test ( )
static

Performs unit tests for this class. Throws an exception if there is a failure.

Member Data Documentation

size_t GClasses::GKdTree::m_maxLeafSize
protected
GKdNode* GClasses::GKdTree::m_pRoot
protected
size_t GClasses::GKdTree::m_size
protected