GClasses::GMatrix Class Reference

Detailed Description

Represents a matrix or a database table.

Elements can be discrete or continuous.

References a GRelation object, which stores the meta-information about each column.

#include <GMatrix.h>

Public Member Functions

 GMatrix ()
 Makes an empty 0x0 matrix.

 GMatrix (size_t rows, size_t cols)
 Construct a rows x cols matrix with all elements of the matrix assumed to be continuous.

 GMatrix (std::vector< size_t > &attrValues)
 Construct a matrix with a mixed relation. That is, one with some continuous attributes (columns), and some nominal attributes (columns).

 GMatrix (GRelation *pRelation)
 Create an empty matrix whose attributes/column types are specified by pRelation.

 GMatrix (const GMatrix &orig, size_t rowStart=0, size_t colStart=0, size_t rowCount=(size_t)-1, size_t colCount=(size_t)-1)
 Copy-constructor.

 GMatrix (const GDomNode *pNode)
 Load from a DOM.

 ~GMatrix ()

void add (const GMatrix *pThat, bool transpose=false, double scalar=1.0)
 Matrix add.

GVec & back (size_t reverse_index=0)
 Returns a reference to a row indexed from the back of the matrix. Index 0 (default) is the last row, index 1 is the second-to-last row, etc.

const GVec & back (size_t reverse_index=0) const

double baselineValue (size_t nAttribute) const
 Returns the mean if the specified attribute is continuous, otherwise returns the most common nominal value in the attribute.

double boundingSphere (GVec &outCenter, size_t *pIndexes, size_t indexCount, GDistanceMetric *pMetric) const
 Finds a sphere that tightly bounds all the points in the specified vector of row-indexes.

void centerMeanAtOrigin ()
 Shifts the data such that the mean occurs at the origin. Only continuous values are affected. Nominal values are left unchanged.

void centroid (GVec &outCentroid, const double *pWeights=NULL) const
 Computes the arithmetic means of all attributes. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix.

GMatrix * cholesky (bool tolerant=false)
 Computes the Cholesky decomposition of this matrix: a lower-triangular matrix L (a kind of matrix square root) such that multiplying L by its transpose reproduces the original matrix.
 
void clipColumn (size_t col, double dMin, double dMax)
 Clips the values in the specified column to fall between dMin and dMax (inclusive).

void col (size_t index, double *pOutVector)
 Copies the specified column into pOutVector.

size_t cols () const
 Returns the number of columns in the dataset.

double columnMax (size_t nAttribute) const
 Returns the maximum value in the specified column (not counting UNKNOWN_REAL_VALUE). Returns -1e300 if there are no known values in the column.

double columnMean (size_t nAttribute, const double *pWeights=NULL, bool throwIfEmpty=true) const
 Computes the arithmetic mean of the values in the specified column. If pWeights is NULL, then each row is given equal weight. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix. If there are no values in this column with any weight, then it will throw an exception if throwIfEmpty is true, or else return UNKNOWN_REAL_VALUE.

double columnMedian (size_t nAttribute, bool throwIfEmpty=true) const
 Computes the median of the values in the specified column. If there are no values in this column, then it will throw an exception if throwIfEmpty is true, or else return UNKNOWN_REAL_VALUE.

double columnMin (size_t nAttribute) const
 Returns the minimum value in the specified column (not counting UNKNOWN_REAL_VALUE). Returns 1e300 if there are no known values in the column.

double columnSquaredMagnitude (size_t col) const
 Returns the squared magnitude of the vector in the specified column.

double columnSum (size_t col) const
 Returns the sum of the values in the specified column.

double columnSumSquaredDifference (const GMatrix &that, size_t col, double *pOutSAE=NULL) const
 Computes the sum-squared distance between the specified column of this and that. If the column is a nominal attribute, then Hamming distance is used. If pOutSAE is not NULL, the sum of absolute errors will be placed there.

double columnVariance (size_t nAttr, double mean) const
 Computes the sample variance of a single attribute.

void copy (const GMatrix &that, size_t rowStart=0, size_t colStart=0, size_t rowCount=(size_t)-1, size_t colCount=(size_t)-1)
 Copies (deep) all the data and metadata from that.

void copyBlock (const GMatrix &source, size_t srcRow=0, size_t srcCol=0, size_t hgt=INVALID_INDEX, size_t wid=INVALID_INDEX, size_t destRow=0, size_t destCol=0, bool checkMetaData=true)
 Copies values from a rectangular region of the source matrix into this matrix. The wid and hgt values are clipped if they exceed the size of the source matrix. An exception is thrown if the destination is not big enough to hold the values at the specified location. If checkMetaData is true, then this will throw an exception if the data types are incompatible.

void copyCols (const GMatrix &that, size_t firstCol, size_t colCount)
 Copies the specified range of columns (including meta-data) from that matrix into this matrix, replacing all data currently in this matrix.

size_t countPrincipalComponents (double d, GRand *pRand) const
 Computes the minimum number of principal components necessary so that less than the specified portion of the deviation in the data is unaccounted for.

size_t countUniqueValues (size_t col, size_t maxCount=(size_t)-1) const
 Counts the number of unique values in the specified column. If maxCount unique values are found, it immediately returns maxCount.

size_t countValue (size_t attribute, double value) const
 Returns the number of occurrences of the specified value in the specified attribute.

double covariance (size_t nAttr1, double dMean1, size_t nAttr2, double dMean2, const double *pWeights=NULL) const
 Computes the covariance between two attributes. If pWeights is NULL, each row is given a weight of 1. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix.

GMatrix * covarianceMatrix () const
 Computes the covariance matrix of the data.

void deleteColumns (size_t index, size_t count)
 Deletes count columns, starting at column index. This does not reallocate the rows, but it does shift the elements, which is a slow operation, especially if there are many columns that follow those being deleted.

void deleteRow (size_t index)
 Swaps the specified row with the last row, and then deletes it.

void deleteRowPreserveOrder (size_t index)
 Deletes the specified row and shifts everything after it up one slot.
 
double determinant ()
 Computes the determinant of this matrix.

double dihedralCorrelation (const GMatrix *pThat, GRand *pRand) const
 Computes the cosine of the dihedral angle between this subspace and pThat subspace.

bool doesHaveAnyMissingValues () const
 Returns true iff this matrix is missing any values.

void dropValue (size_t attr, int val)
 Drops any occurrences of the specified value, and removes it as a possible value.

double eigenValue (const GVec &eigenVector)
 Computes the eigenvalue that corresponds to the specified eigenvector of this matrix.

double eigenValue (const double *pMean, const double *pEigenVector, GRand *pRand) const
 Computes the eigenvalue that corresponds to *pEigenVector.

void eigenVector (double eigenvalue, GVec &outVector)
 Computes the eigenvector that corresponds to the specified eigenvalue of this matrix. Note that this method overwrites the contents of this matrix, so make a copy first if you still need it.

GMatrix * eigs (size_t nCount, GVec &eigenVals, GRand *pRand, bool mostSignificant)
 Computes nCount eigenvectors and the corresponding eigenvalues using the power method (which is only accurate if a small number of eigenvalues/vectors are needed).

void ensureDataHasNoMissingNominals () const
 Throws an exception if this data contains any missing values in a nominal attribute.

void ensureDataHasNoMissingReals () const
 Throws an exception if this data contains any missing values in a continuous attribute.

double entropy (size_t nColumn) const
 Measures the entropy of the specified attribute.

void fill (double val, size_t colStart=0, size_t colCount=INVALID_INDEX)
 Fills all elements in the specified range of columns with the specified value. If no column range is specified, all columns are filled.

void fillNormal (GRand &rand, double deviation=1.0)
 Fills all elements with random values from a Normal distribution.

void fillUniform (GRand &rand, double min=0.0, double max=1.0)
 Fills all elements with random values from a uniform distribution.

void fixNans ()
 Replaces any occurrences of NAN in the matrix with the corresponding values from an identity matrix.

void flush ()
 Deletes all the rows in this matrix.

void fromVector (const double *pVector, size_t nRows)
 Copies the data from pVector over this dataset.

GVec & front ()
 Returns a reference to the first row.

const GVec & front () const

bool gaussianElimination (double *pVector)
 Computes y in the equation M*y=x (or y=M^(-1)x), where M is this dataset, which must be a square matrix, and x is pVector as passed in, and y is pVector after the call.

bool isAttrHomogenous (size_t col) const
 Returns true iff the specified attribute contains homogeneous values. (Unknown values are treated as homogeneous with anything.)

bool isHomogenous () const
 Returns true iff each of the last labelDims columns in the data is homogeneous.

bool leastCorrelatedVector (GVec &out, const GMatrix *pThat, GRand *pRand) const
 Computes the vector in this subspace that has the greatest distance from its projection into pThat subspace.

double linearCorrelationCoefficient (size_t attr1, double attr1Origin, size_t attr2, double attr2Origin) const
 Computes the linear correlation coefficient between the two specified attributes.

void load (const char *szFilename)
 Loads a file, automatically detecting whether it is in ARFF or raw (binary) format.

void loadArff (const char *szFilename)
 Loads an ARFF file and replaces the contents of this matrix with it.

void loadRaw (const char *szFilename)
 Loads a raw (binary) file and replaces the contents of this matrix with it.

void LUDecomposition ()
 Performs an in-place LU-decomposition, such that the lower triangle of this matrix (including the diagonal) specifies L, and the upper triangle of this matrix (not including the diagonal) specifies U, and all values of U along the diagonal are ones. (The upper triangle of L and the lower triangle of U are all zeros.)

void makeIdentity ()
 Sets this dataset to an identity matrix. (It doesn't change the number of columns or rows. It just overwrites the existing values.)

double measureInfo () const
 Computes the sum of the entropies of all attributes (using the variance in place of entropy for continuous attributes).

void mergeVert (GMatrix *pData, bool ignoreMismatchingName=false)
 Steals all the rows from pData and adds them to this set. (You still have to delete pData.) Both datasets must have the same number of columns.

void mirrorTriangle (bool upperToLower)
 Copies one of the triangular submatrices over the other, making a symmetric matrix.

void multiply (double scalar)
 Multiplies every element in the dataset by scalar. Behavior is undefined for nominal columns.

void multiply (const GVec &vectorIn, GVec &vectorOut, bool transpose=false) const
 Multiplies this matrix by the column vector vectorIn to get vectorOut.

void newColumns (size_t n)
 Adds 'n' new columns to the matrix. (This resizes every row and copies all the existing data, which is rather inefficient.) The values in the new columns are not initialized.

GVec & newRow ()
 Adds a new row to the matrix. (The values in the row are not initialized.) Returns a reference to the new row.

void newRows (size_t nRows)
 Adds "nRows" uninitialized rows to this matrix.

void normalizeColumn (size_t col, double dInMin, double dInMax, double dOutMin=0.0, double dOutMax=1.0)
 Normalizes the specified column.

GMatrix & operator= (const GMatrix &orig)
 Make *this into a copy of orig.

bool operator== (const GMatrix &other) const
 Returns true iff *this and other are the same size, have compatible relations, and contain identical entries.

GVec & operator[] (size_t index)
 Returns a reference to the specified row.

const GVec & operator[] (size_t index) const
 Returns a const reference to the specified row.

void pairedTTest (size_t *pOutV, double *pOutT, size_t attr1, size_t attr2, bool normalize) const
 Performs a paired t-test with data from the two specified attributes.

void parseArff (const char *szFile, size_t nLen)
 Parses an ARFF file and replaces the contents of this matrix with it.

void parseArff (GArffTokenizer &tok)
 Parses an ARFF file and replaces the contents of this matrix with it.

void principalComponent (GVec &outVector, const GVec &centroid, GRand *pRand) const
 Iteratively computes the principal component vector (the eigenvector of the covariance matrix) of the data, using an efficient algorithm.

void principalComponentAboutOrigin (GVec &outVector, GRand *pRand) const
 Computes the first principal component assuming the mean is already subtracted out of the data.

void principalComponentIgnoreUnknowns (GVec &outVector, const GVec &centroid, GRand *pRand) const
 Computes the principal component while ignoring missing values.

void print (std::ostream &stream=std::cout, char separator= ',') const
 Prints this matrix in ARFF format to the specified stream.

GMatrix * pseudoInverse ()
 Computes the Moore-Penrose pseudoinverse of this matrix (using the SVD method). The caller is responsible for deleting the matrix this returns.
 
const GRelation & relation () const
 Returns a const reference to the relation object, which holds meta-data about the attributes (columns).
 
void releaseAllRows ()
 Abandons (leaks) all the rows in this matrix.

GVec * releaseRow (size_t index)
 Swaps the specified row with the last row, and then releases it from the dataset.

GVec * releaseRowPreserveOrder (size_t index)
 Releases the specified row from the dataset and shifts everything after it up one slot.

void removeComponent (const GVec &centroid, const GVec &component)
 Removes the specified component from the data. (component should already be normalized.)

void removeComponentAboutOrigin (const GVec &component)
 Removes the specified component assuming the mean is zero.

void replaceMissingValuesRandomly (size_t nAttr, GRand *pRand)
 Replaces all missing values by copying a randomly selected non-missing value in the same attribute.

void replaceMissingValuesWithBaseline (size_t nAttr)
 Replace missing values with the appropriate measure of central tendency.

void reserve (size_t n)
 Allocates space for the specified number of rows (to avoid superfluous resizing).

void resize (size_t rows, size_t cols)
 Resizes this matrix. Assigns all columns to be continuous, and leaves all element values uninitialized.

void reverseRows ()
 Reverses the row order.

GVec & row (size_t index)
 Returns a reference to the specified row.

const GVec & row (size_t index) const
 Returns a const reference to the specified row.

size_t rows () const
 Returns the number of rows in the dataset.

void saveArff (const char *szFilename)
 Saves the dataset to a file in ARFF format.

void saveRaw (const char *szFilename)
 Saves the dataset to a file in raw (binary) format.

void scaleColumn (size_t col, double scalar)
 Scales the column by the specified scalar.

GDomNode * serialize (GDom *pDoc) const
 Marshals this object to a DOM, which may be saved to a variety of serial formats.

void setCol (size_t index, const double *pVector)
 Copies pVector over the specified column.

void setRelation (GRelation *pRelation)
 Sets the relation for this dataset, which specifies the number of columns, and their data types. If there are one or more rows in this matrix, and the new relation does not have the same number of columns as the old relation, then this will throw an exception. Takes ownership of pRelation. That is, the destructor will delete it.

void shuffle (GRand &rand, GMatrix *pExtension=NULL)
 Randomizes the order of the rows.

void shuffle2 (GRand &rand, GMatrix &other)
 Shuffles the order of the rows. Also shuffles the rows in "other" in the same way, such that corresponding rows are preserved.

void shuffleLikeCards ()
 This is an inferior way to shuffle the data.

void singularValueDecomposition (GMatrix **ppU, double **ppDiag, GMatrix **ppV, bool throwIfNoConverge=false, size_t maxIters=80)
 Performs SVD on A, where A is this m-by-n matrix.

void sort (size_t nDimension)
 Sorts the data from smallest to largest in the specified dimension.

template<typename CompareFunc >
void sort (CompareFunc &pComparator)
 Sorts rows according to the specified compare function. (Return true to indicate that the first row comes before the second row.)

void sortPartial (size_t row, size_t col)
 Partially sorts the data by the specified column, such that the specified row holds the same row it would hold if the data were fully sorted, earlier rows contain values <= to it in that column, and later rows contain values >= to it in that column. Unlike sort, which has O(m*log(m)) complexity, this method has O(m) complexity. This might be useful, for example, for efficiently finding the row with a median value in some attribute, or for separating data by a threshold in some value.
 
void splitByPivot (GMatrix *pGreaterOrEqual, size_t nAttribute, double dPivot, GMatrix *pExtensionA=NULL, GMatrix *pExtensionB=NULL)
 Splits this set of data into two sets based on the values in attribute nAttribute. Rows whose value is greater than or equal to dPivot are moved into pGreaterOrEqual; rows whose value is less than dPivot stay in this data set.
 
void splitBySize (GMatrix &other, size_t nOtherRows)
 Removes the last nOtherRows rows from this data set and puts them in "other". (Order is preserved.)
 
void splitCategoricalKeepIfEqual (GMatrix *pOtherValues, size_t nAttr, int nValue, GMatrix *pExtensionA=NULL, GMatrix *pExtensionB=NULL)
 Moves all rows that do not have the specified value in the specified attribute into pOtherValues, keeping the rows that do.
 
void splitCategoricalKeepIfNotEqual (GMatrix *pSingleClass, size_t nAttr, int nValue, GMatrix *pExtensionA=NULL, GMatrix *pExtensionB=NULL)
 Moves all rows with the specified value in the specified attribute into pSingleClass.

void subtract (const GMatrix *pThat, bool transpose)
 Matrix subtract. Subtracts the values in *pThat from *this.

double sumSquaredDifference (const GMatrix &that, bool transpose=false) const
 Computes the sum-squared difference between this and that.

double sumSquaredDiffWithIdentity ()
 Returns the sum-squared difference between this matrix and an identity matrix.

double sumSquaredDistance (const GVec &point) const
 Computes the sum-squared distance between point and all of the points in the dataset.

void swapColumns (size_t nAttr1, size_t nAttr2)
 Swaps two columns.

GVec * swapRow (size_t i, GVec *pNewRow)
 Swaps pNewRow in for row i, and returns row i. The caller is then responsible for deleting the row that is returned.

void swapRows (size_t a, size_t b)
 Swaps the two specified rows.

void takeRow (GVec *pRow, size_t pos=(size_t)-1)
 Adds an already-allocated row to this dataset. If pos is specified, the new row will be inserted at the specified position.

size_t toReducedRowEchelonForm ()
 Converts the matrix to reduced row echelon form.

void toVector (double *pVector) const
 Copies all the data from this dataset into pVector.

double trace ()
 Returns the sum of the diagonal elements.

GMatrix * transpose ()
 Returns a pointer to a new dataset that is this dataset transposed. (All columns in the returned dataset will be continuous.)

void weightedPrincipalComponent (GVec &outVector, const GVec &centroid, const double *pWeights, GRand *pRand) const
 Computes the first principal component of the data with each row weighted according to the vector pWeights. (pWeights must have an element for each row.)

void wilcoxonSignedRanksTest (size_t attr1, size_t attr2, double tolerance, int *pNum, double *pWMinus, double *pWPlus) const
 Performs the Wilcoxon signed-ranks test on the two specified attributes.


Static Public Member Functions

static GMatrix * align (GMatrix *pA, GMatrix *pB)
 This uses the Kabsch algorithm to rotate and translate pB in order to minimize RMS with pA. (pA and pB must have the same number of rows and columns.)
 
static GSimpleAssignment bipartiteMatching (GMatrix &a, GMatrix &b, GDistanceMetric &metric)
 Performs a bipartite matching between the rows of a and b using a Linear Assignment Problem (LAP) optimizer.
 
static GMatrix * kabsch (GMatrix *pA, GMatrix *pB)
 This computes K=kabsch(A,B), such that K is an n-by-n matrix, where n is pA->cols(). K is the optimal orthonormal rotation matrix to align A and B, such that A(K^T) minimizes sum-squared error with B, and BK minimizes sum-squared error with A. (This rotates around the origin, so typically you will want to subtract the centroid from both pA and pB before calling this.)

static GMatrix * mergeHoriz (const GMatrix *pSetA, const GMatrix *pSetB)
 Merges two datasets side-by-side. The resulting dataset will contain the attributes of both datasets. Both pSetA and pSetB (and the resulting dataset) must have the same number of rows.

static GMatrix * multiply (const GMatrix &a, const GMatrix &b, bool transposeA, bool transposeB)
 Matrix multiply.

static double normalizeValue (double dVal, double dInMin, double dInMax, double dOutMin=0.0, double dOutMax=1.0)
 Normalize a value from the input min and max to the output min and max.

static void test ()
 Performs unit tests for this class. Throws an exception if there is a failure.
 

Protected Member Functions

double determinantHelper (size_t nEndRow, size_t *pColumnList)
 
void inPlaceSquareTranspose ()
 
void singularValueDecompositionHelper (GMatrix **ppU, double **ppDiag, GMatrix **ppV, bool throwIfNoConverge, size_t maxIters)
 

Protected Attributes

GRelation * m_pRelation
 
std::vector< GVec * > m_rows
 

Constructor & Destructor Documentation

GClasses::GMatrix::GMatrix ( )

Makes an empty 0x0 matrix.

GClasses::GMatrix::GMatrix ( size_t  rows,
size_t  cols 
)

Construct a rows x cols matrix with all elements of the matrix assumed to be continuous.

It is okay to initially set rows to 0 and later call newRow to add each row. (Adding columns later, however, is not very computationally efficient.)

GClasses::GMatrix::GMatrix ( std::vector< size_t > &  attrValues)

Construct a matrix with a mixed relation. That is, one with some continuous attributes (columns), and some nominal attributes (columns).

attrValues specifies the number of nominal values supported in each attribute (column), or 0 for a continuous attribute.

Initially, this matrix will have 0 rows, but you can add more rows by calling newRow or newRows.

GClasses::GMatrix::GMatrix ( GRelation *  pRelation)

Create an empty matrix whose attributes/column types are specified by pRelation.

Takes ownership of pRelation. That is, the destructor will delete pRelation.

Initially, this matrix will have 0 rows, but you can add more rows by calling newRow or newRows.

GClasses::GMatrix::GMatrix ( const GMatrix &  orig,
size_t  rowStart = 0,
size_t  colStart = 0,
size_t  rowCount = (size_t)-1,
size_t  colCount = (size_t)-1 
)

Copy-constructor.

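Copies orig, making a new relation object and new storage for the rows (with the same content). The optional rowStart, colStart, rowCount, and colCount parameters select a sub-block of orig to copy; the default values copy the whole matrix.

Parameters
orig: the GMatrix object to copy

GClasses::GMatrix::GMatrix ( const GDomNode *  pNode)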

Load from a DOM.

GClasses::GMatrix::~GMatrix ( )

Member Function Documentation

void GClasses::GMatrix::add ( const GMatrix *  pThat,
bool  transpose = false,
double  scalar = 1.0 
)

Matrix add.

Adds scalar * pThat to this. (If transpose is true, adds scalar * the transpose of pThat to this.) Both datasets must have the same dimensions. Behavior is undefined for nominal columns.
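
For illustration, here is a minimal standalone C++ sketch of the arithmetic add describes. It assumes a plain row-major std::vector<std::vector<double>> (the hypothetical alias Matrix) in place of GMatrix/GVec, whose internals are not documented here; the function name addScaled is likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GMatrix: a row-major matrix of continuous values.
using Matrix = std::vector<std::vector<double>>;

// Adds scalar * that (or scalar * transpose(that)) to self, element by element.
void addScaled(Matrix& self, const Matrix& that, bool transpose = false, double scalar = 1.0)
{
    const std::size_t rows = self.size();
    const std::size_t cols = rows ? self[0].size() : 0;
    // The (possibly transposed) operand must have the same shape as self.
    assert(transpose ? that.size() == cols : that.size() == rows);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const double v = transpose ? that[c][r] : that[r][c];
            self[r][c] += scalar * v;
        }
    }
}
```

Setting scalar to -1.0 gives a subtraction, which is one reason a single scaled-add routine is convenient.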

static GMatrix* GClasses::GMatrix::align ( GMatrix *  pA,
GMatrix *  pB 
)
static

This uses the Kabsch algorithm to rotate and translate pB in order to minimize RMS with pA. (pA and pB must have the same number of rows and columns.)

GVec& GClasses::GMatrix::back ( size_t  reverse_index = 0)
inline

Returns a reference to a row indexed from the back of the matrix. Index 0 (default) is the last row, index 1 is the second-to-last row, etc.

const GVec& GClasses::GMatrix::back ( size_t  reverse_index = 0) const
inline

double GClasses::GMatrix::baselineValue ( size_t  nAttribute) const

Returns the mean if the specified attribute is continuous, otherwise returns the most common nominal value in the attribute.
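
A standalone sketch of the rule baselineValue describes, operating on a single column stored in a std::vector<double>. The name baseline, the isNominal flag, and the tie-breaking rule (the smallest of equally common values wins) are assumptions of this sketch rather than documented library behavior, and it does not treat missing values specially.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Returns the mean of a continuous column, or the most frequent value of a
// nominal column (nominal values are stored as small whole numbers).
double baseline(const std::vector<double>& column, bool isNominal)
{
    assert(!column.empty());
    if (!isNominal) {
        double sum = 0.0;
        for (double v : column) sum += v;
        return sum / static_cast<double>(column.size());
    }
    // std::map iterates in ascending key order, so ties favor the smaller value.
    std::map<double, std::size_t> counts;
    for (double v : column) ++counts[v];
    double best = counts.begin()->first;
    std::size_t bestCount = 0;
    for (const auto& kv : counts) {
        if (kv.second > bestCount) {
            best = kv.first;
            bestCount = kv.second;
        }
    }
    return best;
}
```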

static GSimpleAssignment GClasses::GMatrix::bipartiteMatching ( GMatrix &  a,
GMatrix &  b,
GDistanceMetric &  metric 
)
static

Performs a bipartite matching between the rows of a and b using the Linear Assignment Problem (LAP) optimizer.

Treats the rows of the matrices a and b as vectors and calculates the distances between these vectors using metric. Returns an optimal assignment from rows of a to rows of b that minimizes the sum of the costs of the assignments.

Each row is considered to be a vector in multidimensional space. The cost of assigning a row of a to a row of b is the distance that metric returns for that pair of rows. The cost must not be $-\infty$ for any pair of rows. Other than that, there are no limitations on the cost function.

Because of the limitations of GDistanceMetric, a and b must have the same number of columns.

If $m$ is $\max(rows(a), rows(b))$ then this routine requires $\Theta(rows(a) \cdot rows(b))$ memory and $O(m^3)$ time.
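
The documentation does not say which LAP algorithm is used. As one possibility, the following standalone sketch implements the Hungarian (Kuhn-Munkres) algorithm in its potential-based form, which needs $O(n^2 m)$ cost evaluations for $n \le m$ rows and so stays within the time bound above. Euclidean distance stands in for GDistanceMetric, and matchRows, euclidean, and the Matrix alias are hypothetical names. The sketch requires rows(a) <= rows(b) (swap the arguments otherwise), and unlike the routine described above it recomputes costs on demand instead of storing the full cost table.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Euclidean distance between two equal-length rows (stand-in for GDistanceMetric).
double euclidean(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += (x[i] - y[i]) * (x[i] - y[i]);
    return std::sqrt(s);
}

// Hungarian algorithm with row/column potentials. Requires rows(a) <= rows(b).
// Returns match, where match[i] is the row of b assigned to row i of a.
std::vector<std::size_t> matchRows(const Matrix& a, const Matrix& b)
{
    const std::size_t n = a.size(), m = b.size();
    assert(n <= m);
    const double INF = std::numeric_limits<double>::infinity();
    // Indices are 1-based; column 0 is a sentinel used to start each search.
    std::vector<double> u(n + 1, 0.0), v(m + 1, 0.0);
    std::vector<std::size_t> p(m + 1, 0);   // p[j]: row of a matched to column j
    std::vector<std::size_t> way(m + 1, 0); // predecessor column on the search path
    for (std::size_t i = 1; i <= n; ++i) {
        p[0] = i;
        std::size_t j0 = 0;
        std::vector<double> minv(m + 1, INF);
        std::vector<bool> used(m + 1, false);
        do { // grow a shortest augmenting path from row i
            used[j0] = true;
            const std::size_t i0 = p[j0];
            double delta = INF;
            std::size_t j1 = 0;
            for (std::size_t j = 1; j <= m; ++j) {
                if (used[j]) continue;
                const double cur = euclidean(a[i0 - 1], b[j - 1]) - u[i0] - v[j];
                if (cur < minv[j]) { minv[j] = cur; way[j] = j0; }
                if (minv[j] < delta) { delta = minv[j]; j1 = j; }
            }
            for (std::size_t j = 0; j <= m; ++j) { // update potentials
                if (used[j]) { u[p[j]] += delta; v[j] -= delta; }
                else minv[j] -= delta;
            }
            j0 = j1;
        } while (p[j0] != 0);
        do { // flip the matching along the path
            const std::size_t j1 = way[j0];
            p[j0] = p[j1];
            j0 = j1;
        } while (j0 != 0);
    }
    std::vector<std::size_t> match(n);
    for (std::size_t j = 1; j <= m; ++j)
        if (p[j] != 0) match[p[j] - 1] = j - 1;
    return match;
}
```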

Parameters
a: the matrix containing the vectors of set a. Must have the same number of columns as the matrix containing the vectors of set b. Each row is considered to be a vector in multidimensional space.
b: the matrix containing the vectors of set b. Must have the same number of columns as the matrix containing the vectors of set a. Each row is considered to be a vector in multidimensional space.
metric: given a row of a and a row of b, returns the cost of assigning that row of a to that row of b.
Returns
the optimal assignment in which each of the rows of a or b (whichever has fewer rows) is assigned to a row of the other matrix.

double GClasses::GMatrix::boundingSphere ( GVec &  outCenter,
size_t *  pIndexes,
size_t  indexCount,
GDistanceMetric *  pMetric 
) const

Finds a sphere that tightly bounds all the points in the specified vector of row-indexes.

Returns the squared radius of the sphere, and stores its center in outCenter.

void GClasses::GMatrix::centerMeanAtOrigin ( )

Shifts the data such that the mean occurs at the origin. Only continuous values are affected. Nominal values are left unchanged.

void GClasses::GMatrix::centroid ( GVec &  outCentroid,
const double *  pWeights = NULL 
) const

Computes the arithmetic means of all attributes. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix.
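
A minimal sketch of the weighted column means, assuming a row-major std::vector matrix (the hypothetical alias Matrix) and that the weighted sums are normalized by the total weight. The documentation does not state how weights are normalized, so treat that as an assumption; weightedCentroid is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Column-wise (optionally weighted) arithmetic mean of all rows.
// weights == nullptr gives every row weight 1.
std::vector<double> weightedCentroid(const Matrix& m, const double* weights = nullptr)
{
    assert(!m.empty());
    const std::size_t cols = m[0].size();
    std::vector<double> sum(cols, 0.0);
    double totalWeight = 0.0;
    for (std::size_t r = 0; r < m.size(); ++r) {
        const double w = weights ? weights[r] : 1.0;
        totalWeight += w;
        for (std::size_t c = 0; c < cols; ++c) sum[c] += w * m[r][c];
    }
    assert(totalWeight > 0.0);
    for (double& s : sum) s /= totalWeight; // normalize by the total weight
    return sum;
}
```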

GMatrix* GClasses::GMatrix::cholesky ( bool  tolerant = false)

Computes the Cholesky decomposition of this matrix: a lower-triangular matrix L (a kind of matrix square root) such that multiplying L by its transpose reproduces the original matrix.

Behavior is undefined if there are nominal attributes. If tolerant is true, it will return even if it cannot compute accurate results. If tolerant is false (the default) and this matrix is not positive definite, it will throw an exception.
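
The following standalone C++ sketch shows the classic Cholesky-Banachiewicz recurrence for a symmetric positive-definite matrix. It models only the default (non-tolerant) behavior, throwing when a non-positive pivot appears; what tolerant mode does internally is not documented here, and the Matrix alias and choleskyLower are hypothetical names. Only the lower triangle of the input is read.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Cholesky-Banachiewicz: returns lower-triangular L with L * L^T == a.
// Throws if a is not (numerically) positive definite.
Matrix choleskyLower(const Matrix& a)
{
    const std::size_t n = a.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n); // input must be square
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= 0.0) throw std::runtime_error("matrix is not positive definite");
                L[i][i] = std::sqrt(sum);
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }
    return L;
}
```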

void GClasses::GMatrix::clipColumn ( size_t  col,
double  dMin,
double  dMax 
)

Clips the values in the specified column to fall between dMin and dMax (inclusive).

void GClasses::GMatrix::col ( size_t  index,
double *  pOutVector 
)

Copies the specified column into pOutVector.

size_t GClasses::GMatrix::cols ( ) const
inline

Returns the number of columns in the dataset.

double GClasses::GMatrix::columnMax ( size_t  nAttribute) const

Returns the maximum value in the specified column (not counting UNKNOWN_REAL_VALUE). Returns -1e300 if there are no known values in the column.

double GClasses::GMatrix::columnMean ( size_t  nAttribute,
const double *  pWeights = NULL,
bool  throwIfEmpty = true 
) const

Computes the arithmetic mean of the values in the specified column. If pWeights is NULL, then each row is given equal weight. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix. If there are no values in this column with any weight, then it will throw an exception if throwIfEmpty is true, or else return UNKNOWN_REAL_VALUE.
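
A sketch of the weighted mean with missing values skipped. kUnknownReal is a hypothetical stand-in for UNKNOWN_REAL_VALUE (its real value is not given here), columnMeanSketch is a hypothetical name, and the sketch additionally assumes non-negative weights.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for UNKNOWN_REAL_VALUE; the library's actual constant may differ.
constexpr double kUnknownReal = -1e308;

// Weighted mean of one column. Unknown entries are skipped; weights == nullptr
// gives every row weight 1.
double columnMeanSketch(const std::vector<double>& column, const double* weights = nullptr,
                        bool throwIfEmpty = true)
{
    double sum = 0.0;
    double totalWeight = 0.0;
    for (std::size_t r = 0; r < column.size(); ++r) {
        if (column[r] == kUnknownReal) continue; // missing values contribute nothing
        const double w = weights ? weights[r] : 1.0;
        assert(w >= 0.0); // this sketch assumes non-negative weights
        sum += w * column[r];
        totalWeight += w;
    }
    if (totalWeight <= 0.0) {
        if (throwIfEmpty) throw std::runtime_error("no weighted values in column");
        return kUnknownReal;
    }
    return sum / totalWeight;
}
```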

double GClasses::GMatrix::columnMedian ( size_t  nAttribute,
bool  throwIfEmpty = true 
) const

Computes the median of the values in the specified column. If there are no values in this column, then it will throw an exception if throwIfEmpty is true, or else return UNKNOWN_REAL_VALUE.
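
A sketch using std::nth_element, which places the middle element in linear average time without a full sort. How the library handles an even number of values is not documented; this sketch averages the two middle values. kUnknownReal is again a hypothetical stand-in for UNKNOWN_REAL_VALUE, and columnMedianSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for UNKNOWN_REAL_VALUE; the library's actual constant may differ.
constexpr double kUnknownReal = -1e308;

// Median of the known values in a column (taken by value so it can be reordered).
// For an even count this sketch averages the two middle values (an assumption).
double columnMedianSketch(std::vector<double> column, bool throwIfEmpty = true)
{
    // Drop missing values before looking for the middle element.
    column.erase(std::remove(column.begin(), column.end(), kUnknownReal), column.end());
    if (column.empty()) {
        if (throwIfEmpty) throw std::runtime_error("no known values in column");
        return kUnknownReal;
    }
    const std::size_t mid = column.size() / 2;
    assert(mid < column.size()); // holds because column is non-empty here
    // Put the upper-middle value at index mid; values before it are <= it.
    std::nth_element(column.begin(), column.begin() + mid, column.end());
    const double upper = column[mid];
    if (column.size() % 2 == 1) return upper;
    // The lower-middle value is the largest element in front of mid.
    const double lower = *std::max_element(column.begin(), column.begin() + mid);
    return (lower + upper) / 2.0;
}
```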

double GClasses::GMatrix::columnMin ( size_t  nAttribute) const

Returns the minimum value in the specified column (not counting UNKNOWN_REAL_VALUE). Returns 1e300 if there are no known values in the column.

double GClasses::GMatrix::columnSquaredMagnitude ( size_t  col) const

Returns the squared magnitude of the vector in the specified column.

double GClasses::GMatrix::columnSum ( size_t  col) const

Returns the sum of the values in the specified column.

double GClasses::GMatrix::columnSumSquaredDifference ( const GMatrix &  that,
size_t  col,
double *  pOutSAE = NULL 
) const

Computes the sum-squared distance between the specified column of this and that. If the column is a nominal attribute, then Hamming distance is used. If pOutSAE is not NULL, the sum of absolute errors will be placed there.
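
A minimal sketch of the per-column error computation on two std::vector<double> columns. The isNominal flag replaces the relation lookup the real method would perform, columnSSD is a hypothetical name, and counting each nominal mismatch as 1 in the absolute error as well is an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum-squared difference between two columns. For a nominal column each
// mismatch counts 1 (Hamming distance). Optionally reports the sum of absolute errors.
double columnSSD(const std::vector<double>& a, const std::vector<double>& b,
                 bool isNominal, double* outSAE = nullptr)
{
    assert(a.size() == b.size());
    double sse = 0.0, sae = 0.0;
    for (std::size_t r = 0; r < a.size(); ++r) {
        if (isNominal) {
            const double miss = (a[r] == b[r]) ? 0.0 : 1.0;
            sse += miss;
            sae += miss;
        } else {
            const double d = a[r] - b[r];
            sse += d * d;
            sae += std::fabs(d);
        }
    }
    if (outSAE) *outSAE = sae;
    return sse;
}
```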

double GClasses::GMatrix::columnVariance ( size_t  nAttr,
double  mean 
) const

Computes the sample variance of a single attribute.
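
Because the mean is passed in, a caller can compute it once (for example with columnMean) and reuse it. This sketch assumes the usual $n - 1$ divisor for a sample variance; the documentation does not state the divisor, and sampleVariance is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sample variance of a column around a caller-supplied mean,
// using the n - 1 (Bessel-corrected) divisor.
double sampleVariance(const std::vector<double>& column, double mean)
{
    assert(column.size() >= 2);
    double sum = 0.0;
    for (double v : column) sum += (v - mean) * (v - mean);
    return sum / static_cast<double>(column.size() - 1);
}
```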

void GClasses::GMatrix::copy ( const GMatrix &  that,
size_t  rowStart = 0,
size_t  colStart = 0,
size_t  rowCount = (size_t)-1,
size_t  colCount = (size_t)-1 
)

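Copies (deep) all the data and metadata from that. The optional rowStart, colStart, rowCount, and colCount parameters restrict the copy to a sub-block of that; the default values copy the whole matrix.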

void GClasses::GMatrix::copyBlock ( const GMatrix &  source,
size_t  srcRow = 0,
size_t  srcCol = 0,
size_t  hgt = INVALID_INDEX,
size_t  wid = INVALID_INDEX,
size_t  destRow = 0,
size_t  destCol = 0,
bool  checkMetaData = true 
)

Copies values from a rectangular region of the source matrix into this matrix. The wid and hgt values are clipped if they exceed the size of the source matrix. An exception is thrown if the destination is not big enough to hold the values at the specified location. If checkMetaData is true, then this will throw an exception if the data types are incompatible.
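
A sketch of the clipping and bounds rules, assuming a row-major std::vector matrix (hypothetical alias Matrix). Passing static_cast<std::size_t>(-1) for hgt or wid plays the role of INVALID_INDEX here (an assumption), the checkMetaData type check is omitted because the sketch has no column types, and copyBlockSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Copies an hgt-by-wid block of src starting at (srcRow, srcCol) into dest at
// (destRow, destCol). hgt and wid are clipped to the source; dest must be big enough.
void copyBlockSketch(Matrix& dest, const Matrix& src, std::size_t srcRow, std::size_t srcCol,
                     std::size_t hgt, std::size_t wid, std::size_t destRow, std::size_t destCol)
{
    const std::size_t srcRows = src.size();
    const std::size_t srcCols = srcRows ? src[0].size() : 0;
    assert(srcRow <= srcRows && srcCol <= srcCols);
    hgt = std::min(hgt, srcRows - srcRow); // clip to what the source actually has
    wid = std::min(wid, srcCols - srcCol);
    const std::size_t destRows = dest.size();
    const std::size_t destCols = destRows ? dest[0].size() : 0;
    if (destRow + hgt > destRows || destCol + wid > destCols)
        throw std::out_of_range("destination is too small for the block");
    for (std::size_t r = 0; r < hgt; ++r)
        std::copy_n(src[srcRow + r].begin() + srcCol, wid, dest[destRow + r].begin() + destCol);
}
```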

void GClasses::GMatrix::copyCols ( const GMatrix &  that,
size_t  firstCol,
size_t  colCount 
)

Copies the specified range of columns (including meta-data) from that matrix into this matrix, replacing all data currently in this matrix.

size_t GClasses::GMatrix::countPrincipalComponents ( double  d,
GRand *  pRand 
) const

Computes the minimum number of principal components necessary so that less than the specified portion of the deviation in the data is unaccounted for.

For example, if the data projected onto the first 3 principal components contains 90 percent of the deviation that the original data contains, then if you pass the value 0.1 to this method, it will return 3.

size_t GClasses::GMatrix::countUniqueValues ( size_t  col,
size_t  maxCount = (size_t)-1 
) const

Counts the number of unique values in the specified column. If maxCount unique values are found, it immediately returns maxCount.

size_t GClasses::GMatrix::countValue ( size_t  attribute,
double  value 
) const

Returns the number of occurrences of the specified value in the specified attribute.

double GClasses::GMatrix::covariance ( size_t  nAttr1,
double  dMean1,
size_t  nAttr2,
double  dMean2,
const double *  pWeights = NULL 
) const

Computes the covariance between two attributes. If pWeights is NULL, each row is given a weight of 1. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix.

GMatrix* GClasses::GMatrix::covarianceMatrix ( ) const

Computes the covariance matrix of the data.

void GClasses::GMatrix::deleteColumns ( size_t  index,
size_t  count 
)

Deletes count columns, starting at column index. This does not reallocate the rows, but it does shift the elements, which is slow, especially if many columns follow those being deleted.

void GClasses::GMatrix::deleteRow ( size_t  index)

Swaps the specified row with the last row, and then deletes it.

void GClasses::GMatrix::deleteRowPreserveOrder ( size_t  index)

Deletes the specified row and shifts everything after it up one slot.
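The difference between deleteRow and deleteRowPreserveOrder can be illustrated with a std::vector of rows. This is a hedged sketch with hypothetical function names; GMatrix stores row pointers rather than row values, but the effect on row order is analogous.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<double>;

// Constant time: swap the doomed row with the last row, then drop the last
// row. Row order is NOT preserved.
void deleteRowSwap(std::vector<Row>& rows, std::size_t index)
{
    assert(index < rows.size());
    std::swap(rows[index], rows.back());
    rows.pop_back();
}

// Linear time: erase the row and shift every later row up one slot.
void deleteRowPreserveOrderSketch(std::vector<Row>& rows, std::size_t index)
{
    assert(index < rows.size());
    rows.erase(rows.begin() + static_cast<std::ptrdiff_t>(index));
}
```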

double GClasses::GMatrix::determinant ( )

Computes the determinant of this matrix.

double GClasses::GMatrix::determinantHelper ( size_t  nEndRow,
size_t *  pColumnList 
)
protected
double GClasses::GMatrix::dihedralCorrelation ( const GMatrix *  pThat,
GRand *  pRand 
) const

Computes the cosine of the dihedral angle between this subspace and pThat subspace.

bool GClasses::GMatrix::doesHaveAnyMissingValues ( ) const

Returns true iff this matrix is missing any values.

void GClasses::GMatrix::dropValue ( size_t  attr,
int  val 
)

Drops any occurrences of the specified value, and removes it as a possible value.

double GClasses::GMatrix::eigenValue ( const GVec &  eigenVector)

Computes the eigenvalue that corresponds to the specified eigenvector of this matrix.

double GClasses::GMatrix::eigenValue ( const double *  pMean,
const double *  pEigenVector,
GRand *  pRand 
) const

Computes the eigenvalue that corresponds to *pEigenVector.

After you compute the principal component, you can call this to obtain the eigenvalue that corresponds to that principal component vector (eigenvector).

void GClasses::GMatrix::eigenVector ( double  eigenvalue,
GVec &  outVector 
)

Computes the eigenvector that corresponds to the specified eigenvalue of this matrix. Note that this method destroys the contents of this matrix, so make a copy first if you need them.

GMatrix* GClasses::GMatrix::eigs ( size_t  nCount,
GVec &  eigenVals,
GRand *  pRand,
bool  mostSignificant 
)

Computes nCount eigenvectors and the corresponding eigenvalues using the power method, which is accurate only if a small number of eigenvalues/vectors are needed.

If mostSignificant is true, the largest eigenvalues are found. If mostSignificant is false, the smallest eigenvalues are found.

void GClasses::GMatrix::ensureDataHasNoMissingNominals ( ) const

Throws an exception if this data contains any missing values in a nominal attribute.

void GClasses::GMatrix::ensureDataHasNoMissingReals ( ) const

Throws an exception if this data contains any missing values in a continuous attribute.

double GClasses::GMatrix::entropy ( size_t  nColumn) const

Measures the entropy of the specified attribute.

void GClasses::GMatrix::fill ( double  val,
size_t  colStart = 0,
size_t  colCount = INVALID_INDEX 
)

Fills all elements in the specified range of columns with the specified value. By default (colStart = 0, colCount = INVALID_INDEX), every column is filled.

void GClasses::GMatrix::fillNormal ( GRand &  rand,
double  deviation = 1.0 
)

Fills all elements with random values from a Normal distribution.

void GClasses::GMatrix::fillUniform ( GRand &  rand,
double  min = 0.0,
double  max = 1.0 
)

Fills all elements with random values drawn from a uniform distribution between min and max.

void GClasses::GMatrix::fixNans ( )

Replaces any occurrences of NAN in the matrix with the corresponding values from an identity matrix.

void GClasses::GMatrix::flush ( )

Deletes all the rows in this matrix.

void GClasses::GMatrix::fromVector ( const double *  pVector,
size_t  nRows 
)

Copies the data from pVector over this dataset.

nRows specifies the number of rows of data in pVector.

GVec& GClasses::GMatrix::front ( )
inline

Returns a reference to the first row.

const GVec& GClasses::GMatrix::front ( ) const
inline
bool GClasses::GMatrix::gaussianElimination ( double *  pVector)

Computes y in the equation M*y=x (or y=M^(-1)x), where M is this dataset, which must be a square matrix, and x is pVector as passed in, and y is pVector after the call.

If there are multiple solutions, it finds the one for which all the variables in the null-space have a value of 1. If there are no solutions, it returns false. Note that this method destroys the contents of this dataset, so make a copy first if you need them.
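For the fully determined case, solving M*y=x in place can be sketched with Gaussian elimination and partial pivoting. This is an illustrative stand-in, not GClasses code: solveInPlace is a hypothetical name, and unlike gaussianElimination it simply returns false for singular systems instead of choosing a null-space solution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Solves M*y = x in place: on entry v holds x, on return it holds y.
// Uses Gaussian elimination with partial pivoting; m is overwritten.
// Returns false if m is (numerically) singular.
bool solveInPlace(std::vector<std::vector<double>>& m, std::vector<double>& v)
{
    const std::size_t n = m.size();
    assert(v.size() == n);
    for (std::size_t c = 0; c < n; c++) {
        // Choose the row with the largest magnitude in column c as the pivot.
        std::size_t best = c;
        for (std::size_t r = c + 1; r < n; r++)
            if (std::fabs(m[r][c]) > std::fabs(m[best][c]))
                best = r;
        if (std::fabs(m[best][c]) < 1e-12)
            return false;
        std::swap(m[c], m[best]);
        std::swap(v[c], v[best]);
        // Eliminate column c from every row below the pivot.
        for (std::size_t r = c + 1; r < n; r++) {
            const double f = m[r][c] / m[c][c];
            for (std::size_t k = c; k < n; k++)
                m[r][k] -= f * m[c][k];
            v[r] -= f * v[c];
        }
    }
    // Back-substitution on the resulting upper-triangular system.
    for (std::size_t c = n; c-- > 0;) {
        double s = v[c];
        for (std::size_t k = c + 1; k < n; k++)
            s -= m[c][k] * v[k];
        v[c] = s / m[c][c];
    }
    return true;
}
```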

void GClasses::GMatrix::inPlaceSquareTranspose ( )
protected
bool GClasses::GMatrix::isAttrHomogenous ( size_t  col) const

Returns true iff the specified attribute contains homogeneous values. (Unknowns are counted as homogeneous with anything.)

bool GClasses::GMatrix::isHomogenous ( ) const

Returns true iff each of the last labelDims columns in the data is homogeneous.

static GMatrix* GClasses::GMatrix::kabsch ( GMatrix *  pA,
GMatrix *  pB 
)
static

This computes K=kabsch(A,B), such that K is an n-by-n matrix, where n is pA->cols(). K is the optimal orthonormal rotation matrix to align A and B, such that A(K^T) minimizes sum-squared error with B, and BK minimizes sum-squared error with A. (This rotates around the origin, so typically you will want to subtract the centroid from both pA and pB before calling this.)
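In two dimensions the optimal rotation has a closed form, which makes the idea behind kabsch easy to check. Assuming both point sets are already centered, the angle theta that minimizes the sum of |R(theta) a_i - b_i|^2 is atan2 of the summed cross products over the summed dot products, and K is then [[cos theta, -sin theta], [sin theta, cos theta]] (with points as rows, A(K^T) has rows (K a_i)^T). This is a hypothetical 2-D sketch (Pt and kabschAngle2D are illustrative names); the general n-dimensional Kabsch algorithm uses an SVD instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Optimal 2-D rotation angle theta minimizing sum_i |R(theta) a_i - b_i|^2
// for point sets that are already centered at the origin.
// Maximizing sum_i b_i . (R a_i) = cos(theta)*sDot + sin(theta)*sCross
// gives theta = atan2(sCross, sDot).
double kabschAngle2D(const std::vector<Pt>& a, const std::vector<Pt>& b)
{
    assert(a.size() == b.size());
    double sDot = 0.0, sCross = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) {
        sDot += a[i].x * b[i].x + a[i].y * b[i].y;
        sCross += a[i].x * b[i].y - a[i].y * b[i].x;
    }
    return std::atan2(sCross, sDot);
}
```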

bool GClasses::GMatrix::leastCorrelatedVector ( GVec &  out,
const GMatrix *  pThat,
GRand *  pRand 
) const

Computes the vector in this subspace that has the greatest distance from its projection into pThat subspace.

Returns true if the results are computed.

Returns false if the subspaces are so nearly parallel that out cannot be computed with accuracy.

double GClasses::GMatrix::linearCorrelationCoefficient ( size_t  attr1,
double  attr1Origin,
size_t  attr2,
double  attr2Origin 
) const

Computes the linear correlation coefficient between the two specified attributes.

Usually you will want to pass the mean values for attr1Origin and attr2Origin.

void GClasses::GMatrix::load ( const char *  szFilename)

Loads a file, automatically detecting whether it is in ARFF or raw (binary) format.

void GClasses::GMatrix::loadArff ( const char *  szFilename)

Loads an ARFF file and replaces the contents of this matrix with it.

void GClasses::GMatrix::loadRaw ( const char *  szFilename)

Loads a raw (binary) file and replaces the contents of this matrix with it.

void GClasses::GMatrix::LUDecomposition ( )

Performs an in-place LU decomposition, such that the lower triangle of this matrix (including the diagonal) specifies L, the upper triangle of this matrix (not including the diagonal) specifies U, and all values of U along the diagonal are ones. (The upper triangle of L and the lower triangle of U are all zeros.)
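The layout described above (L with its diagonal in the lower triangle, unit-diagonal U in the strict upper triangle) is what Crout's algorithm produces. Below is a minimal in-place sketch, assuming a square matrix stored as nested std::vectors and no pivoting (so it throws on a zero pivot). croutInPlace is a hypothetical name, and GClasses may handle pivots differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// In-place Crout LU decomposition without pivoting.
// Afterwards the lower triangle (with diagonal) holds L and the strictly
// upper triangle holds U, whose diagonal is implicitly all ones.
void croutInPlace(std::vector<std::vector<double>>& a)
{
    const std::size_t n = a.size();
    for (std::size_t j = 0; j < n; j++) {
        assert(a[j].size() == n);
        // Column j of L: L[i][j] = A[i][j] - sum_{k<j} L[i][k] * U[k][j].
        for (std::size_t i = j; i < n; i++) {
            double s = a[i][j];
            for (std::size_t k = 0; k < j; k++)
                s -= a[i][k] * a[k][j];
            a[i][j] = s;
        }
        if (std::fabs(a[j][j]) < 1e-12)
            throw std::runtime_error("zero pivot; this sketch does not pivot");
        // Row j of U to the right of the diagonal.
        for (std::size_t i = j + 1; i < n; i++) {
            double s = a[j][i];
            for (std::size_t k = 0; k < j; k++)
                s -= a[j][k] * a[k][i];
            a[j][i] = s / a[j][j];
        }
    }
}
```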

void GClasses::GMatrix::makeIdentity ( )

Sets this dataset to an identity matrix. (It doesn't change the number of columns or rows. It just stomps over existing values.)

double GClasses::GMatrix::measureInfo ( ) const

Computes the sum entropy of the data (or the sum variance for continuous attributes).

static GMatrix* GClasses::GMatrix::mergeHoriz ( const GMatrix *  pSetA,
const GMatrix *  pSetB 
)
static

Merges two datasets side-by-side. The resulting dataset will contain the attributes of both datasets. Both pSetA and pSetB (and the resulting dataset) must have the same number of rows.

void GClasses::GMatrix::mergeVert ( GMatrix *  pData,
bool  ignoreMismatchingName = false 
)

Steals all the rows from pData and adds them to this set. (You still have to delete pData.) Both datasets must have the same number of columns.

void GClasses::GMatrix::mirrorTriangle ( bool  upperToLower)

Copies one of the triangular submatrices over the other, making a symmetric matrix.

Parameters
upperToLower: If true, copies the upper triangle of this matrix over the lower triangle. Otherwise, copies the lower triangle of this matrix over the upper triangle.

void GClasses::GMatrix::multiply ( double  scalar)

Multiplies every element in the dataset by scalar. Behavior is undefined for nominal columns.

void GClasses::GMatrix::multiply ( const GVec &  vectorIn,
GVec &  vectorOut,
bool  transpose = false 
) const

Multiplies this matrix by the column vector vectorIn to get vectorOut.

(If transpose is true, then it multiplies the transpose of this matrix by vectorIn to get vectorOut.)

vectorIn should have the same number of elements as columns (or rows, if transpose is true).

vectorOut should have the same number of elements as rows (or columns, if transpose is true).

Note
If transpose is true, then vectorIn is treated as a row vector and is multiplied by this matrix to get vectorOut.

static GMatrix* GClasses::GMatrix::multiply ( const GMatrix &  a,
const GMatrix &  b,
bool  transposeA,
bool  transposeB 
)
static

Matrix multiply.

For convenience, you can also specify that neither, one, or both of the inputs are virtually transposed prior to the multiplication. (If you want the results to come out transposed, you can use the equality (AB)^T=(B^T)(A^T) to figure out how to specify the parameters.)
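The virtual-transpose flags can be sketched by swapping indices instead of materializing a transposed copy. This illustrative version works over nested std::vectors (Mat and multiplySketch are hypothetical names), assumes non-empty inputs, and is not the GClasses implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Returns op(A) * op(B), where op(X) is X or X^T depending on the flags.
// The transposes are "virtual": indices are swapped instead of copying.
Mat multiplySketch(const Mat& a, const Mat& b, bool transposeA, bool transposeB)
{
    assert(!a.empty() && !b.empty());
    // Dimensions of op(A) and op(B).
    const std::size_t aRows = transposeA ? a[0].size() : a.size();
    const std::size_t aCols = transposeA ? a.size() : a[0].size();
    const std::size_t bRows = transposeB ? b[0].size() : b.size();
    const std::size_t bCols = transposeB ? b.size() : b[0].size();
    assert(aCols == bRows);
    Mat out(aRows, std::vector<double>(bCols, 0.0));
    for (std::size_t i = 0; i < aRows; i++) {
        for (std::size_t j = 0; j < bCols; j++) {
            double s = 0.0;
            for (std::size_t k = 0; k < aCols; k++) {
                const double aik = transposeA ? a[k][i] : a[i][k];
                const double bkj = transposeB ? b[j][k] : b[k][j];
                s += aik * bkj;
            }
            out[i][j] = s;
        }
    }
    return out;
}
```

For example, to obtain (AB)^T directly, pass B and A with both flags set, since (AB)^T = (B^T)(A^T).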

void GClasses::GMatrix::newColumns ( size_t  n)

Adds 'n' new columns to the matrix. (This resizes every row and copies all the existing data, which is rather inefficient.) The values in the new columns are not initialized.

GVec& GClasses::GMatrix::newRow ( )

Adds a new row to the matrix. (The values in the row are not initialized.) Returns a reference to the new row.

void GClasses::GMatrix::newRows ( size_t  nRows)

Adds "nRows" uninitialized rows to this matrix.

void GClasses::GMatrix::normalizeColumn ( size_t  col,
double  dInMin,
double  dInMax,
double  dOutMin = 0.0,
double  dOutMax = 1.0 
)

Normalizes the specified column by linearly mapping its values from the range [dInMin, dInMax] to [dOutMin, dOutMax].

static double GClasses::GMatrix::normalizeValue ( double  dVal,
double  dInMin,
double  dInMax,
double  dOutMin = 0.0,
double  dOutMax = 1.0 
)
static

Normalizes a value from the input range [dInMin, dInMax] to the output range [dOutMin, dOutMax].
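The mapping is a linear rescaling. A minimal sketch of the formula follows, assuming dInMax differs from dInMin (normalizeValueSketch is a hypothetical name). normalizeColumn presumably applies the same mapping to every value in a column.

```cpp
#include <cassert>

// Linearly maps dVal from [dInMin, dInMax] onto [dOutMin, dOutMax].
double normalizeValueSketch(double dVal, double dInMin, double dInMax,
                            double dOutMin = 0.0, double dOutMax = 1.0)
{
    assert(dInMax != dInMin);
    return (dVal - dInMin) / (dInMax - dInMin) * (dOutMax - dOutMin) + dOutMin;
}
```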

GMatrix& GClasses::GMatrix::operator= ( const GMatrix &  orig)

Makes *this into a copy of orig.

Copies orig, making a new relation object and new storage for the rows (with the same content).

Parameters
orig: the GMatrix object to copy
Returns
a reference to this GMatrix object
bool GClasses::GMatrix::operator== ( const GMatrix &  other) const

Returns true iff all the entries in *this and other are identical and their relations are compatible, and they are the same size.

Returns
true iff all the entries in *this and other are identical, their relations are compatible, and they are the same size
GVec& GClasses::GMatrix::operator[] ( size_t  index)
inline

Returns a reference to the specified row.

const GVec& GClasses::GMatrix::operator[] ( size_t  index) const
inline

Returns a const reference to the specified row.

void GClasses::GMatrix::pairedTTest ( size_t *  pOutV,
double *  pOutT,
size_t  attr1,
size_t  attr2,
bool  normalize 
) const

Performs a paired T-Test with data from the two specified attributes.

pOutV will hold the degrees of freedom. pOutT will hold the T-value. You can use GMath::tTestAlphaValue to convert these to a P-value.
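A sketch of the paired t statistic, assuming the textbook definition t = mean(d) / (sd(d) / sqrt(n)) with d[i] = a[i] - b[i] and n - 1 degrees of freedom. The GClasses normalize flag is not modeled, and pairedTTestSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Paired t-test on two equal-length samples.
// d[i] = a[i] - b[i]; t = mean(d) / (sd(d) / sqrt(n)); degrees of freedom = n - 1.
void pairedTTestSketch(const std::vector<double>& a, const std::vector<double>& b,
                       std::size_t* pOutV, double* pOutT)
{
    assert(a.size() == b.size() && a.size() >= 2);
    const std::size_t n = a.size();
    double mean = 0.0;
    for (std::size_t i = 0; i < n; i++)
        mean += a[i] - b[i];
    mean /= static_cast<double>(n);
    // Sample standard deviation of the differences.
    double ss = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        const double d = (a[i] - b[i]) - mean;
        ss += d * d;
    }
    const double sd = std::sqrt(ss / static_cast<double>(n - 1));
    *pOutV = n - 1;
    *pOutT = mean / (sd / std::sqrt(static_cast<double>(n)));
}
```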

void GClasses::GMatrix::parseArff ( const char *  szFile,
size_t  nLen 
)

Parses the ARFF-formatted text in szFile (nLen characters long) and replaces the contents of this matrix with it.

void GClasses::GMatrix::parseArff ( GArffTokenizer &  tok)

Parses an ARFF file and replaces the contents of this matrix with it.

void GClasses::GMatrix::principalComponent ( GVec &  outVector,
const GVec &  centroid,
GRand *  pRand 
) const

This is an efficient algorithm for iteratively computing the principal component vector (the eigenvector of the covariance matrix with the largest eigenvalue) of the data.

See "EM Algorithms for PCA and SPCA" by Sam Roweis, 1998 NIPS.

The size of outVector will be the number of columns in this matrix. (To compute the next principal component, call removeComponent, then call this method again.)
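For a single component, each EM step in Roweis's method is proportional to multiplying the current estimate by the (unnormalized) covariance matrix, so plain power iteration gives a compact sketch of the same fixed point. This illustration uses hypothetical names, a fixed iteration count, and a fixed starting vector (which fails if it happens to be orthogonal to the principal component); it is not the GClasses code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Approximates the first principal component of the rows about centroid c
// by power iteration: v <- sum_i (x_i - c) * ((x_i - c) . v), then normalize.
// This applies the unnormalized covariance matrix without forming it.
Vec principalComponentSketch(const std::vector<Vec>& rows, const Vec& c,
                             std::size_t iters = 100)
{
    const std::size_t dims = c.size();
    assert(dims > 0);
    Vec v(dims, 0.0);
    v[0] = 1.0; // fixed start; a random start is more robust
    for (std::size_t it = 0; it < iters; it++) {
        Vec next(dims, 0.0);
        for (const Vec& x : rows) {
            double proj = 0.0;
            for (std::size_t j = 0; j < dims; j++)
                proj += (x[j] - c[j]) * v[j];
            for (std::size_t j = 0; j < dims; j++)
                next[j] += (x[j] - c[j]) * proj;
        }
        double norm = 0.0;
        for (double e : next)
            norm += e * e;
        norm = std::sqrt(norm);
        assert(norm > 0.0);
        for (std::size_t j = 0; j < dims; j++)
            v[j] = next[j] / norm;
    }
    return v;
}
```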

void GClasses::GMatrix::principalComponentAboutOrigin ( GVec &  outVector,
GRand *  pRand 
) const

Computes the first principal component assuming the mean is already subtracted out of the data.

void GClasses::GMatrix::principalComponentIgnoreUnknowns ( GVec &  outVector,
const GVec &  centroid,
GRand *  pRand 
) const

Computes principal components, while ignoring missing values.

void GClasses::GMatrix::print ( std::ostream &  stream = std::cout,
char  separator = ',' 
) const

Prints this matrix in ARFF format to the specified stream.

GMatrix* GClasses::GMatrix::pseudoInverse ( )

Computes the Moore-Penrose pseudoinverse of this matrix (using the SVD method). You are responsible for deleting the matrix this returns.

const GRelation& GClasses::GMatrix::relation ( ) const
inline

Returns a const reference to the relation object, which holds meta-data about the attributes (columns).

void GClasses::GMatrix::releaseAllRows ( )

Abandons (leaks) all the rows in this matrix.

GVec* GClasses::GMatrix::releaseRow ( size_t  index)

Swaps the specified row with the last row, and then releases it from the dataset.

The caller is responsible for deleting the row (a GVec) that this method returns.

GVec* GClasses::GMatrix::releaseRowPreserveOrder ( size_t  index)

Releases the specified row from the dataset and shifts everything after it up one slot.

The caller is responsible for deleting the row this method returns.

void GClasses::GMatrix::removeComponent ( const GVec &  centroid,
const GVec &  component 
)

Removes the specified component from the data. (component should already be normalized to unit length.)

This might be useful, for example, to remove the first principal component from the data so you can then proceed to compute the second principal component, and so forth.
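Removing a unit-length component about the centroid amounts to subtracting each row's projection onto it. A minimal sketch, assuming rows are nested std::vectors and u is already normalized (removeComponentSketch is a hypothetical name; GClasses may differ in details).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Removes the unit-length component u from every row, measured about the
// centroid c: x <- x - ((x - c) . u) * u.
void removeComponentSketch(std::vector<Vec>& rows, const Vec& c, const Vec& u)
{
    for (Vec& x : rows) {
        assert(x.size() == u.size() && c.size() == u.size());
        double proj = 0.0;
        for (std::size_t j = 0; j < u.size(); j++)
            proj += (x[j] - c[j]) * u[j];
        for (std::size_t j = 0; j < u.size(); j++)
            x[j] -= proj * u[j];
    }
}
```

After removing the first principal component this way, recomputing the principal component yields the second one.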

void GClasses::GMatrix::removeComponentAboutOrigin ( const GVec &  component)

Removes the specified component assuming the mean is zero.

void GClasses::GMatrix::replaceMissingValuesRandomly ( size_t  nAttr,
GRand *  pRand 
)

Replaces all missing values by copying a randomly selected non-missing value in the same attribute.

void GClasses::GMatrix::replaceMissingValuesWithBaseline ( size_t  nAttr)

Replace missing values with the appropriate measure of central tendency.

If the specified attribute is continuous, replaces all missing values in that attribute with the mean. If the specified attribute is nominal, replaces all missing values in that attribute with the most common value.
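A sketch of this baseline replacement follows. The sentinel values are assumptions for illustration (kUnknownReal = -1e308 for continuous values and kUnknownNominal = -1 for nominal values), ties for the most common value go to the smallest value, and replaceMissingWithBaselineSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Assumed sentinels for missing values (hypothetical, for illustration).
const double kUnknownReal = -1e308;
const double kUnknownNominal = -1.0;

// Replaces missing values in column `col` with the mean of the known values
// (continuous) or the most common known value (nominal).
void replaceMissingWithBaselineSketch(std::vector<std::vector<double>>& rows,
                                      std::size_t col, bool nominal)
{
    const double unknown = nominal ? kUnknownNominal : kUnknownReal;
    double baseline = unknown; // stays unknown if no value is known
    if (nominal) {
        std::map<double, std::size_t> counts;
        for (const auto& r : rows)
            if (r[col] != unknown)
                counts[r[col]]++;
        std::size_t best = 0;
        for (const auto& kv : counts) {
            if (kv.second > best) {
                best = kv.second;
                baseline = kv.first;
            }
        }
    } else {
        double sum = 0.0;
        std::size_t n = 0;
        for (const auto& r : rows) {
            if (r[col] != unknown) {
                sum += r[col];
                n++;
            }
        }
        if (n > 0)
            baseline = sum / static_cast<double>(n);
    }
    for (auto& r : rows)
        if (r[col] == unknown)
            r[col] = baseline;
}
```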

void GClasses::GMatrix::reserve ( size_t  n)
inline

Allocates space for the specified number of rows (to avoid superfluous resizing).

void GClasses::GMatrix::resize ( size_t  rows,
size_t  cols 
)

Resizes this matrix. Assigns all columns to be continuous, and replaces all element values with garbage.

void GClasses::GMatrix::reverseRows ( )

Reverses the row order.

GVec& GClasses::GMatrix::row ( size_t  index)
inline

Returns a reference to the specified row.

const GVec& GClasses::GMatrix::row ( size_t  index) const
inline

Returns a const reference to the specified row.

size_t GClasses::GMatrix::rows ( ) const
inline

Returns the number of rows in the dataset.

void GClasses::GMatrix::saveArff ( const char *  szFilename)

Saves the dataset to a file in ARFF format.

void GClasses::GMatrix::saveRaw ( const char *  szFilename)

Saves the dataset to a file in raw (binary) format.

void GClasses::GMatrix::scaleColumn ( size_t  col,
double  scalar 
)

Scales the column by the specified scalar.

GDomNode* GClasses::GMatrix::serialize ( GDom *  pDoc) const

Marshalls this object to a DOM, which may be saved to a variety of serial formats.

void GClasses::GMatrix::setCol ( size_t  index,
const double *  pVector 
)

Copies pVector over the specified column.

void GClasses::GMatrix::setRelation ( GRelation *  pRelation)

Sets the relation for this dataset, which specifies the number of columns, and their data types. If there are one or more rows in this matrix, and the new relation does not have the same number of columns as the old relation, then this will throw an exception. Takes ownership of pRelation. That is, the destructor will delete it.

void GClasses::GMatrix::shuffle ( GRand &  rand,
GMatrix *  pExtension = NULL 
)

Randomizes the order of the rows.

If pExtension is non-NULL, then it will also be shuffled such that corresponding rows are preserved.
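Keeping two matrices aligned while shuffling just means applying every swap of a Fisher-Yates shuffle to both. This sketch uses std::mt19937 in place of GRand and hypothetical names; it illustrates the idea rather than reproducing GClasses.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Row = std::vector<double>;

// Fisher-Yates shuffle of `rows`. If `extension` is non-null, the same swaps
// are applied to it, so row i of both still correspond afterwards.
void shuffleSketch(std::vector<Row>& rows, std::mt19937& rng,
                   std::vector<Row>* extension = nullptr)
{
    assert(extension == nullptr || extension->size() == rows.size());
    for (std::size_t i = rows.size(); i > 1; i--) {
        // Pick a random slot among the first i (still unshuffled) rows.
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        const std::size_t j = pick(rng);
        std::swap(rows[i - 1], rows[j]);
        if (extension)
            std::swap((*extension)[i - 1], (*extension)[j]);
    }
}
```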

void GClasses::GMatrix::shuffle2 ( GRand &  rand,
GMatrix &  other 
)

Shuffles the order of the rows. Also shuffles the rows in "other" in the same way, such that corresponding rows are preserved.

void GClasses::GMatrix::shuffleLikeCards ( )

This is an inferior way to shuffle the data.

void GClasses::GMatrix::singularValueDecomposition ( GMatrix **  ppU,
double **  ppDiag,
GMatrix **  ppV,
bool  throwIfNoConverge = false,
size_t  maxIters = 80 
)

Performs SVD on A, where A is this m-by-n matrix.

You are responsible for calling delete(*ppU), delete(*ppV), and delete[] *ppDiag.

Parameters
ppU: *ppU will be set to an m-by-m matrix whose columns are the eigenvectors of A(A^T).
ppDiag: *ppDiag will be set to an array of n doubles holding the square roots of the corresponding eigenvalues.
ppV: *ppV will be set to an n-by-n matrix whose rows are the eigenvectors of (A^T)A.
throwIfNoConverge: If true, throws an exception if the SVD solver does not converge. If false, no exception is thrown.
maxIters: The maximum number of iterations to perform in the SVD solver.

void GClasses::GMatrix::singularValueDecompositionHelper ( GMatrix **  ppU,
double **  ppDiag,
GMatrix **  ppV,
bool  throwIfNoConverge,
size_t  maxIters 
)
protected
void GClasses::GMatrix::sort ( size_t  nDimension)

Sorts the data from smallest to largest in the specified dimension.

template<typename CompareFunc >
void GClasses::GMatrix::sort ( CompareFunc &  pComparator)
inline

Sorts rows according to the specified compare function. (Return true to indicate that the first row comes before the second row.)

void GClasses::GMatrix::sortPartial ( size_t  row,
size_t  col 
)

This partially sorts the specified column, such that the specified row will contain the same row as if it were fully sorted, and previous rows will contain a value <= to it in that column, and later rows will contain a value >= to it in that column. Unlike sort, which has O(m*log(m)) complexity, this method has O(m) complexity. This might be useful, for example, for efficiently finding the row with a median value in some attribute, or for separating data by a threshold in some value.
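The C++ standard library provides this kind of partial ordering as std::nth_element, which runs in linear time on average. A sketch over rows stored as std::vector<double> (sortPartialSketch is a hypothetical name, not the GClasses implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Partially orders rows by column `col` so that rows[row] is the row a full
// sort would put there, rows before it have values <= it in that column,
// and rows after it have values >= it.
void sortPartialSketch(std::vector<Row>& rows, std::size_t row, std::size_t col)
{
    assert(row < rows.size());
    std::nth_element(rows.begin(), rows.begin() + static_cast<std::ptrdiff_t>(row),
                     rows.end(),
                     [col](const Row& a, const Row& b) { return a[col] < b[col]; });
}
```

To find the median row in a column, pass row = rows.size() / 2.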

void GClasses::GMatrix::splitByPivot ( GMatrix *  pGreaterOrEqual,
size_t  nAttribute,
double  dPivot,
GMatrix *  pExtensionA = NULL,
GMatrix *  pExtensionB = NULL 
)

Splits this set of data into two sets. Rows whose value in attribute nAttribute is greater than or equal to dPivot are moved into pGreaterOrEqual. Rows whose value is less than dPivot stay in this data set.

If pExtensionA is non-NULL, then it will also split pExtensionA such that corresponding rows are preserved.

void GClasses::GMatrix::splitBySize ( GMatrix &  other,
size_t  nOtherRows 
)

Removes the last nOtherRows rows from this data set and puts them in "other". (Order is preserved.)

void GClasses::GMatrix::splitCategoricalKeepIfEqual ( GMatrix *  pOtherValues,
size_t  nAttr,
int  nValue,
GMatrix *  pExtensionA = NULL,
GMatrix *  pExtensionB = NULL 
)

Moves all rows that do not have the specified value in the specified attribute into pOtherValues.

If pExtensionA is non-NULL, then it will also split pExtensionA such that corresponding rows are preserved.

void GClasses::GMatrix::splitCategoricalKeepIfNotEqual ( GMatrix *  pSingleClass,
size_t  nAttr,
int  nValue,
GMatrix *  pExtensionA = NULL,
GMatrix *  pExtensionB = NULL 
)

Moves all rows with the specified value in the specified attribute into pSingleClass.

If pExtensionA is non-NULL, then it will also split pExtensionA such that corresponding rows are preserved.

void GClasses::GMatrix::subtract ( const GMatrix *  pThat,
bool  transpose 
)

Matrix subtract. Subtracts the values in *pThat from *this.

(If transpose is true, subtracts the transpose of *pThat from this.) Both datasets must have the same dimensions. Behavior is undefined for nominal columns.

Parameters
pThat: pointer to the matrix to subtract from *this
transpose: If true, the transpose of *pThat is subtracted. If false, *pThat is subtracted.

double GClasses::GMatrix::sumSquaredDifference ( const GMatrix &  that,
bool  transpose = false 
) const

Computes the sum-squared difference between the elements of this and that.

If transpose is true, computes the difference between this and the transpose of that.

double GClasses::GMatrix::sumSquaredDiffWithIdentity ( )

Returns the sum squared difference between this matrix and an identity matrix.

double GClasses::GMatrix::sumSquaredDistance ( const GVec point) const

Computes the sum-squared distance between pPoint and all of the points in the dataset.

If pPoint is NULL, it computes the sum-squared distance with the origin.

Note
that this is equal to the sum of all the eigenvalues times the number of dimensions, so you can efficiently compute eigenvalues as the difference in sumSquaredDistance with the mean after removing the corresponding component, and then dividing by the number of dimensions. This is more efficient than calling eigenValue.
void GClasses::GMatrix::swapColumns ( size_t  nAttr1,
size_t  nAttr2 
)

Swaps two columns.

GVec* GClasses::GMatrix::swapRow ( size_t  i,
GVec *  pNewRow 
)

Swaps pNewRow in for row i and returns the old row i. The caller is then responsible for deleting the returned row.

void GClasses::GMatrix::swapRows ( size_t  a,
size_t  b 
)

Swaps the two specified rows.

void GClasses::GMatrix::takeRow ( GVec *  pRow,
size_t  pos = (size_t)-1 
)

Adds an already-allocated row to this dataset. If pos is specified, the new row is inserted at the specified position.

static void GClasses::GMatrix::test ( )
static

Performs unit tests for this class. Throws an exception if there is a failure.

size_t GClasses::GMatrix::toReducedRowEchelonForm ( )

Converts the matrix to reduced row echelon form.

void GClasses::GMatrix::toVector ( double *  pVector) const

Copies all the data from this dataset into pVector.

pVector must be big enough to hold rows() * cols() doubles.

double GClasses::GMatrix::trace ( )

Returns the sum of the diagonal elements.

GMatrix* GClasses::GMatrix::transpose ( )

Returns a pointer to a new dataset that is this dataset transposed. (All columns in the returned dataset will be continuous.)

The returned matrix must be deleted by the caller.

Returns
A pointer to a new dataset that is this dataset transposed. All columns in the returned dataset will be continuous. The caller is responsible for deleting the returned dataset.
void GClasses::GMatrix::weightedPrincipalComponent ( GVec &  outVector,
const GVec &  centroid,
const double *  pWeights,
GRand *  pRand 
) const

Computes the first principal component of the data with each row weighted according to the vector pWeights. (pWeights must have an element for each row.)

void GClasses::GMatrix::wilcoxonSignedRanksTest ( size_t  attr1,
size_t  attr2,
double  tolerance,
int *  pNum,
double *  pWMinus,
double *  pWPlus 
) const

Performs the Wilcoxon signed-ranks test on the two specified attributes.

If two values are closer than tolerance, they are considered to be equal.
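A sketch of the statistics this test produces, assuming the usual conventions: pairs whose difference is within tolerance of zero are dropped, tied absolute differences share the average of their ranks, and W+ and W- sum the ranks of the positive and negative differences. The exact tie handling in GClasses is not documented here, and wilcoxonSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Wilcoxon signed-ranks statistics for paired samples a and b.
// Pairs whose difference is smaller than `tolerance` in magnitude are dropped;
// absolute differences within `tolerance` of each other share an average rank.
void wilcoxonSketch(const std::vector<double>& a, const std::vector<double>& b,
                    double tolerance, int* pNum, double* pWMinus, double* pWPlus)
{
    assert(a.size() == b.size());
    std::vector<double> d;
    for (std::size_t i = 0; i < a.size(); i++)
        if (std::fabs(a[i] - b[i]) >= tolerance)
            d.push_back(a[i] - b[i]);
    // Sort by absolute difference so ranks can be assigned in order.
    std::sort(d.begin(), d.end(),
              [](double x, double y) { return std::fabs(x) < std::fabs(y); });
    double wMinus = 0.0, wPlus = 0.0;
    std::size_t i = 0;
    while (i < d.size()) {
        // Find the run of tied absolute differences starting at i.
        std::size_t j = i + 1;
        while (j < d.size() && std::fabs(d[j]) - std::fabs(d[i]) < tolerance)
            j++;
        // Ranks i+1 .. j share their average.
        const double rank = 0.5 * static_cast<double>(i + 1 + j);
        for (std::size_t k = i; k < j; k++) {
            if (d[k] < 0)
                wMinus += rank;
            else
                wPlus += rank;
        }
        i = j;
    }
    *pNum = static_cast<int>(d.size());
    *pWMinus = wMinus;
    *pWPlus = wPlus;
}
```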

Member Data Documentation

GRelation* GClasses::GMatrix::m_pRelation
protected
std::vector<GVec*> GClasses::GMatrix::m_rows
protected