Detailed Description

Represents a matrix or a database table.

Elements can be discrete or continuous.

References a GRelation object, which stores the meta-information about each column.

#include <GMatrix.h>

Public Member Functions
	GMatrix ()
	Makes an empty 0x0 matrix. More...

	GMatrix (size_t rows, size_t cols)
	Construct a rows x cols matrix with all elements of the matrix assumed to be continuous. More...

	GMatrix (std::vector< size_t > &attrValues)
	Construct a matrix with a mixed relation. That is, one with some continuous attributes (columns), and some nominal attributes (columns). More...

	GMatrix (GRelation *pRelation)
	Create an empty matrix whose attributes/column types are specified by pRelation. More...

	GMatrix (const GMatrix &orig, size_t rowStart=0, size_t colStart=0, size_t rowCount=(size_t)-1, size_t colCount=(size_t)-1)
	Copy-constructor. More...

	GMatrix (const GDomNode *pNode)
	Load from a DOM. More...

	~GMatrix ()

void	add (const GMatrix *pThat, bool transpose=false, double scalar=1.0)
	Matrix add. More...

GVec &	back (size_t reverse_index=0)
	Returns a reference to a row indexed from the back of the matrix. Index 0 (the default) is the last row, index 1 is the second-to-last row, etc. More...

const GVec &	back (size_t reverse_index=0) const

double	baselineValue (size_t nAttribute) const
	Returns the mean if the specified attribute is continuous, otherwise returns the most common nominal value in the attribute. More...

double	boundingSphere (GVec &outCenter, size_t *pIndexes, size_t indexCount, GDistanceMetric *pMetric) const
	Finds a sphere that tightly bounds all the points in the specified vector of row-indexes. More...

void	centerMeanAtOrigin ()
	Shifts the data such that the mean occurs at the origin. Only continuous values are affected. Nominal values are left unchanged. More...

void	centroid (GVec &outCentroid, const double *pWeights=NULL) const
	Computes the arithmetic means of all attributes. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix. More...

GMatrix *	cholesky (bool tolerant=false)
	This computes the square root of this matrix. (If you take the matrix that this returns and multiply it by its transpose, you should get the original dataset again.) (Returns a lower-triangular matrix.) More...

void	clipColumn (size_t col, double dMin, double dMax)
	Clips the values in the specified column to fall between dMin and dMax (inclusively). More...

void	col (size_t index, double *pOutVector)
	Copies the specified column into pOutVector. More...

size_t	cols () const
	Returns the number of columns in the dataset. More...

double	columnMax (size_t nAttribute) const
	Returns the maximum value in the specified column (not counting UNKNOWN_REAL_VALUE). Returns -1e300 if there are no known values in the column. More...

double	columnMean (size_t nAttribute, const double *pWeights=NULL, bool throwIfEmpty=true) const
	Computes the arithmetic mean of the values in the specified column. If pWeights is NULL, then each row is given equal weight. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix. If there are no values in this column with any weight, then it will throw an exception if throwIfEmpty is true, or else return UNKNOWN_REAL_VALUE. More...

double	columnMedian (size_t nAttribute, bool throwIfEmpty=true) const
	Computes the median of the values in the specified column. If there are no values in this column, then it will throw an exception if throwIfEmpty is true, or else return UNKNOWN_REAL_VALUE. More...

double	columnMin (size_t nAttribute) const
	Returns the minimum value in the specified column (not counting UNKNOWN_REAL_VALUE). Returns 1e300 if there are no known values in the column. More...

double	columnSquaredMagnitude (size_t col) const
	Returns the squared magnitude of the vector in the specified column. More...

double	columnSum (size_t col) const
	Returns the sum of the values in the specified column. More...

double	columnSumSquaredDifference (const GMatrix &that, size_t col, double *pOutSAE=NULL) const
	Computes the sum-squared distance between the specified column of this and that. If the column is a nominal attribute, then Hamming distance is used. If pOutSAE is not NULL, the sum absolute error will be placed there. More...

double	columnVariance (size_t nAttr, double mean) const
	Computes the sample variance of a single attribute. More...

void	copy (const GMatrix &that, size_t rowStart=0, size_t colStart=0, size_t rowCount=(size_t)-1, size_t colCount=(size_t)-1)
	Copies (deep) all the data and metadata from that. More...

void	copyBlock (const GMatrix &source, size_t srcRow=0, size_t srcCol=0, size_t hgt=INVALID_INDEX, size_t wid=INVALID_INDEX, size_t destRow=0, size_t destCol=0, bool checkMetaData=true)
	Copies values from a rectangular region of the source matrix into this matrix. The wid and hgt values are clipped if they exceed the size of the source matrix. An exception is thrown if the destination is not big enough to hold the values at the specified location. If checkMetaData is true, then this will throw an exception if the data types are incompatible. More...

void	copyCols (const GMatrix &that, size_t firstCol, size_t colCount)
	Copies the specified range of columns (including meta-data) from that matrix into this matrix, replacing all data currently in this matrix. More...

size_t	countPrincipalComponents (double d, GRand *pRand) const
	Computes the minimum number of principal components necessary so that less than the specified portion of the deviation in the data is unaccounted for. More...

size_t	countUniqueValues (size_t col, size_t maxCount=(size_t)-1) const
	Counts the number of unique values in the specified column. If maxCount unique values are found, it immediately returns maxCount. More...

size_t	countValue (size_t attribute, double value) const
	Returns the number of occurrences of the specified value in the specified attribute. More...

double	covariance (size_t nAttr1, double dMean1, size_t nAttr2, double dMean2, const double *pWeights=NULL) const
	Computes the covariance between two attributes. If pWeights is NULL, each row is given a weight of 1. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix. More...

GMatrix *	covarianceMatrix () const
	Computes the covariance matrix of the data. More...

void	deleteColumns (size_t index, size_t count)
	Deletes some columns. This does not reallocate the rows, but it does shift the elements, which is a slow operation, especially if there are many columns that follow those being deleted. More...

void	deleteRow (size_t index)
	Swaps the specified row with the last row, and then deletes it. More...

void	deleteRowPreserveOrder (size_t index)
	Deletes the specified row and shifts everything after it up one slot. More...

double	determinant ()
	Computes the determinant of this matrix. More...

double	dihedralCorrelation (const GMatrix *pThat, GRand *pRand) const
	Computes the cosine of the dihedral angle between this subspace and pThat subspace. More...

bool	doesHaveAnyMissingValues () const
	Returns true iff this matrix is missing any values. More...

void	dropValue (size_t attr, int val)
	Drops any occurrences of the specified value, and removes it as a possible value. More...

double	eigenValue (const GVec &eigenVector)
	Computes the eigenvalue that corresponds to the specified eigenvector of this matrix. More...

double	eigenValue (const double *pMean, const double *pEigenVector, GRand *pRand) const
	Computes the eigenvalue that corresponds to *pEigenvector. More...

void	eigenVector (double eigenvalue, GVec &outVector)
	Computes the eigenvector that corresponds to the specified eigenvalue of this matrix. Note that this method trashes this matrix, so make a copy first if you care. More...

GMatrix *	eigs (size_t nCount, GVec &eigenVals, GRand *pRand, bool mostSignificant)
	Computes nCount eigenvectors and the corresponding eigenvalues using the power method (which is only accurate if a small number of eigenvalues/vectors are needed.) More...

void	ensureDataHasNoMissingNominals () const
	Throws an exception if this data contains any missing values in a nominal attribute. More...

void	ensureDataHasNoMissingReals () const
	Throws an exception if this data contains any missing values in a continuous attribute. More...

double	entropy (size_t nColumn) const
	Measures the entropy of the specified attribute. More...

void	fill (double val, size_t colStart=0, size_t colCount=INVALID_INDEX)
	Fills all elements in the specified range of columns with the specified value. If no column ranges are specified, the default is to set all of them. More...

void	fillNormal (GRand &rand, double deviation=1.0)
	Fills all elements with random values from a Normal distribution. More...

void	fillUniform (GRand &rand, double min=0.0, double max=1.0)
	Fills all elements with random values from a uniform distribution. More...

void	fixNans ()
	Replaces any occurrences of NAN in the matrix with the corresponding values from an identity matrix. More...

void	flush ()
	Deletes all the rows in this matrix. More...

void	fromVector (const double *pVector, size_t nRows)
	Copies the data from pVector over this dataset. More...

GVec &	front ()
	Returns a reference to the first row. More...

const GVec &	front () const

bool	gaussianElimination (double *pVector)
	Computes y in the equation M*y=x (or y=M^(-1)x), where M is this dataset, which must be a square matrix, and x is pVector as passed in, and y is pVector after the call. More...

bool	isAttrHomogenous (size_t col) const
	Returns true iff the specified attribute contains homogenous values. (Unknowns are counted as homogenous with anything) More...

bool	isHomogenous () const
	Returns true iff each of the last labelDims columns in the data are homogenous. More...

bool	leastCorrelatedVector (GVec &out, const GMatrix *pThat, GRand *pRand) const
	Computes the vector in this subspace that has the greatest distance from its projection into pThat subspace. More...

double	linearCorrelationCoefficient (size_t attr1, double attr1Origin, size_t attr2, double attr2Origin) const
	Computes the linear coefficient between the two specified attributes. More...

void	load (const char *szFilename)
	Loads a file, automatically detecting whether it is ARFF or raw (binary). More...

void	loadArff (const char *szFilename)
	Loads an ARFF file and replaces the contents of this matrix with it. More...

void	loadRaw (const char *szFilename)
	Loads a raw (binary) file and replaces the contents of this matrix with it. More...

void	LUDecomposition ()
	Performs an in-place LU-decomposition, such that the lower triangle of this matrix (including the diagonal) specifies L, and the upper triangle of this matrix (not including the diagonal) specifies U, and all values of U along the diagonal are ones. (The upper triangle of L and the lower triangle of U are all zeros.) More...

void	makeIdentity ()
	Sets this dataset to an identity matrix. (It doesn't change the number of columns or rows. It just stomps over existing values.) More...

double	measureInfo () const
	Computes the sum entropy of the data (or the sum variance for continuous attributes) More...

void	mergeVert (GMatrix *pData, bool ignoreMismatchingName=false)
	Steals all the rows from pData and adds them to this set. (You still have to delete pData.) Both datasets must have the same number of columns. More...

void	mirrorTriangle (bool upperToLower)
	Copies one of the triangular submatrices over the other, making a symmetric matrix. More...

void	multiply (double scalar)
	Multiplies every element in the dataset by scalar. Behavior is undefined for nominal columns. More...

void	multiply (const GVec &vectorIn, GVec &vectorOut, bool transpose=false) const
	Multiplies this matrix by the column vector vectorIn to get vectorOut. More...

void	newColumns (size_t n)
	Adds 'n' new columns to the matrix. (This resizes every row and copies all the existing data, which is rather inefficient.) The values in the new columns are not initialized. More...

GVec &	newRow ()
	Adds a new row to the matrix. (The values in the row are not initialized.) Returns a reference to the new row. More...

void	newRows (size_t nRows)
	Adds "nRows" uninitialized rows to this matrix. More...

void	normalizeColumn (size_t col, double dInMin, double dInMax, double dOutMin=0.0, double dOutMax=1.0)
	Normalizes the specified column. More...

GMatrix &	operator= (const GMatrix &orig)
	Make *this into a copy of orig. More...

bool	operator== (const GMatrix &other) const
	Returns true iff all the entries in this matrix and other are identical, their relations are compatible, and they are the same size. More...

GVec &	operator[] (size_t index)
	Returns a reference to the specified row. More...

const GVec &	operator[] (size_t index) const
	Returns a const reference to the specified row. More...

void	pairedTTest (size_t *pOutV, double *pOutT, size_t attr1, size_t attr2, bool normalize) const
	Performs a paired T-Test with data from the two specified attributes. More...

void	parseArff (const char *szFile, size_t nLen)
	Parses an ARFF file and replaces the contents of this matrix with it. More...

void	parseArff (GArffTokenizer &tok)
	Parses an ARFF file and replaces the contents of this matrix with it. More...

void	principalComponent (GVec &outVector, const GVec &centroid, GRand *pRand) const
	This is an efficient algorithm for iteratively computing the principal component vector (the eigenvector of the covariance matrix) of the data. More...

void	principalComponentAboutOrigin (GVec &outVector, GRand *pRand) const
	Computes the first principal component assuming the mean is already subtracted out of the data. More...

void	principalComponentIgnoreUnknowns (GVec &outVector, const GVec &centroid, GRand *pRand) const
	Computes principal components, while ignoring missing values. More...

void	print (std::ostream &stream=std::cout, char separator= ',') const
	Prints this matrix in ARFF format to the specified stream. More...

GMatrix *	pseudoInverse ()
	Computes the Moore-Penrose pseudoinverse of this matrix (using the SVD method). You are responsible to delete the matrix this returns. More...

const GRelation &	relation () const
	Returns a const reference to the relation object, which holds meta-data about the attributes (columns). More...

void	releaseAllRows ()
	Abandons (leaks) all the rows in this matrix. More...

GVec *	releaseRow (size_t index)
	Swaps the specified row with the last row, and then releases it from the dataset. More...

GVec *	releaseRowPreserveOrder (size_t index)
	Releases the specified row from the dataset and shifts everything after it up one slot. More...

void	removeComponent (const GVec &centroid, const GVec &component)
	Removes the component specified by component from the data. (component should already be normalized.) More...

void	removeComponentAboutOrigin (const GVec &component)
	Removes the specified component assuming the mean is zero. More...

void	replaceMissingValuesRandomly (size_t nAttr, GRand *pRand)
	Replaces all missing values by copying a randomly selected non-missing value in the same attribute. More...

void	replaceMissingValuesWithBaseline (size_t nAttr)
	Replace missing values with the appropriate measure of central tendency. More...

void	reserve (size_t n)
	Allocates space for the specified number of patterns (to avoid superfluous resizing) More...

void	resize (size_t rows, size_t cols)
	Resizes this matrix. Assigns all columns to be continuous, and replaces all element values with garbage. More...

void	reverseRows ()
	Reverses the row order. More...

GVec &	row (size_t index)
	Returns a reference to the specified row. More...

const GVec &	row (size_t index) const
	Returns a const reference to the specified row. More...

size_t	rows () const
	Returns the number of rows in the dataset. More...

void	saveArff (const char *szFilename)
	Saves the dataset to a file in ARFF format. More...

void	saveRaw (const char *szFilename)
	Saves the dataset to a file in raw (binary) format. More...

void	scaleColumn (size_t col, double scalar)
	Scales the column by the specified scalar. More...

GDomNode *	serialize (GDom *pDoc) const
	Marshalls this object to a DOM, which may be saved to a variety of serial formats. More...

void	setCol (size_t index, const double *pVector)
	Copies pVector over the specified column. More...

void	setRelation (GRelation *pRelation)
	Sets the relation for this dataset, which specifies the number of columns, and their data types. If there are one or more rows in this matrix, and the new relation does not have the same number of columns as the old relation, then this will throw an exception. Takes ownership of pRelation. That is, the destructor will delete it. More...

void	shuffle (GRand &rand, GMatrix *pExtension=NULL)
	Randomizes the order of the rows. More...

void	shuffle2 (GRand &rand, GMatrix &other)
	Shuffles the order of the rows. Also shuffles the rows in "other" in the same way, such that corresponding rows are preserved. More...

void	shuffleLikeCards ()
	This is an inferior way to shuffle the data. More...

void	singularValueDecomposition (GMatrix **ppU, double **ppDiag, GMatrix **ppV, bool throwIfNoConverge=false, size_t maxIters=80)
	Performs SVD on A, where A is this m-by-n matrix. More...

void	sort (size_t nDimension)
	Sorts the data from smallest to largest in the specified dimension. More...

template<typename CompareFunc >
void	sort (CompareFunc &pComparator)
	Sorts rows according to the specified compare function. (Return true to indicate that the first row comes before the second row.) More...

void	sortPartial (size_t row, size_t col)
	This partially sorts the specified column, such that the specified row will contain the same row as if it were fully sorted, and previous rows will contain a value <= to it in that column, and later rows will contain a value >= to it in that column. Unlike sort, which has O(m*log(m)) complexity, this method has O(m) complexity. This might be useful, for example, for efficiently finding the row with a median value in some attribute, or for separating data by a threshold in some value. More...

void	splitByPivot (GMatrix *pGreaterOrEqual, size_t nAttribute, double dPivot, GMatrix *pExtensionA=NULL, GMatrix *pExtensionB=NULL)
	Splits this set of data into two sets. Values greater-than-or-equal-to dPivot stay in this data set. Values less than dPivot go into pLessThanPivot. More...

void	splitBySize (GMatrix &other, size_t nOtherRows)
	Removes the last nOtherRows rows from this data set and puts them in "other". (Order is preserved.) More...

void	splitCategoricalKeepIfEqual (GMatrix *pOtherValues, size_t nAttr, int nValue, GMatrix *pExtensionA=NULL, GMatrix *pExtensionB=NULL)
	Moves all rows with the specified value in the specified attribute into pOtherValues. More...

void	splitCategoricalKeepIfNotEqual (GMatrix *pSingleClass, size_t nAttr, int nValue, GMatrix *pExtensionA=NULL, GMatrix *pExtensionB=NULL)
	Moves all rows with the specified value in the specified attribute into pSingleClass. More...

void	subtract (const GMatrix *pThat, bool transpose)
	Matrix subtract. Subtracts the values in pThat from this. More...

double	sumSquaredDifference (const GMatrix &that, bool transpose=false) const
	Computes the squared distance between this and that. More...

double	sumSquaredDiffWithIdentity ()
	Returns the sum squared difference between this matrix and an identity matrix. More...

double	sumSquaredDistance (const GVec &point) const
	Computes the sum-squared distance between point and all of the points in the dataset. More...

void	swapColumns (size_t nAttr1, size_t nAttr2)
	Swaps two columns. More...

GVec *	swapRow (size_t i, GVec *pNewRow)
	Swap pNewRow in for row i, and return row i. The caller is then responsible to delete the row that is returned. More...

void	swapRows (size_t a, size_t b)
	Swaps the two specified rows. More...

void	takeRow (GVec *pRow, size_t pos=(size_t)-1)
	Adds an already-allocated row to this dataset. If pos is specified, the new row will be inserted at the specified position. More...

size_t	toReducedRowEchelonForm ()
	Converts the matrix to reduced row echelon form. More...

void	toVector (double *pVector) const
	Copies all the data from this dataset into pVector. More...

double	trace ()
	Returns the sum of the diagonal elements. More...

GMatrix *	transpose ()
	Returns a pointer to a new dataset that is this dataset transposed. (All columns in the returned dataset will be continuous.) More...

void	weightedPrincipalComponent (GVec &outVector, const GVec &centroid, const double *pWeights, GRand *pRand) const
	Computes the first principal component of the data with each row weighted according to the vector pWeights. (pWeights must have an element for each row.) More...

void	wilcoxonSignedRanksTest (size_t attr1, size_t attr2, double tolerance, int *pNum, double *pWMinus, double *pWPlus) const
	Performs the Wilcoxon signed ranks test from the two specified attributes. More...

Static Public Member Functions
static GMatrix *	align (GMatrix *pA, GMatrix *pB)
	This uses the Kabsch algorithm to rotate and translate pB in order to minimize RMS with pA. (pA and pB must have the same number of rows and columns.) More...

static GSimpleAssignment	bipartiteMatching (GMatrix &a, GMatrix &b, GDistanceMetric &metric)
	Performs a bipartite matching between the rows of a and b using the Linear Assignment Problem (LAP) optimizer. More...

static GMatrix *	kabsch (GMatrix *pA, GMatrix *pB)
	This computes K=kabsch(A,B), such that K is an n-by-n matrix, where n is pA->cols(). K is the optimal orthonormal rotation matrix to align A and B, such that A(K^T) minimizes sum-squared error with B, and BK minimizes sum-squared error with A. (This rotates around the origin, so typically you will want to subtract the centroid from both pA and pB before calling this.) More...

static GMatrix *	mergeHoriz (const GMatrix *pSetA, const GMatrix *pSetB)
	Merges two datasets side-by-side. The resulting dataset will contain the attributes of both datasets. Both pSetA and pSetB (and the resulting dataset) must have the same number of rows. More...

static GMatrix *	multiply (const GMatrix &a, const GMatrix &b, bool transposeA, bool transposeB)
	Matrix multiply. More...

static double	normalizeValue (double dVal, double dInMin, double dInMax, double dOutMin=0.0, double dOutMax=1.0)
	Normalize a value from the input min and max to the output min and max. More...

static void	test ()
	Performs unit tests for this class. Throws an exception if there is a failure. More...

Protected Member Functions
double	determinantHelper (size_t nEndRow, size_t *pColumnList)

void	inPlaceSquareTranspose ()

void	singularValueDecompositionHelper (GMatrix **ppU, double **ppDiag, GMatrix **ppV, bool throwIfNoConverge, size_t maxIters)

Protected Attributes
GRelation *	m_pRelation

std::vector< GVec * >	m_rows

Constructor & Destructor Documentation

GClasses::GMatrix::GMatrix ( )

Makes an empty 0x0 matrix.

GClasses::GMatrix::GMatrix	(	size_t	rows,
		size_t	cols
	)

Construct a rows x cols matrix with all elements of the matrix assumed to be continuous.

It is okay to initially set rows to 0 and later call newRow to add each row. Adding columns later, however, is not very computationally efficient.

GClasses::GMatrix::GMatrix ( std::vector< size_t > & attrValues )

Construct a matrix with a mixed relation. That is, one with some continuous attributes (columns), and some nominal attributes (columns).

attrValues specifies the number of nominal values supported in each attribute (column), or 0 for a continuous attribute.

Initially, this matrix will have 0 rows, but you can add more rows by calling newRow or newRows.

GClasses::GMatrix::GMatrix ( GRelation * pRelation )

Create an empty matrix whose attributes/column types are specified by pRelation.

Takes ownership of pRelation. That is, the destructor will delete pRelation.

Initially, this matrix will have 0 rows, but you can add more rows by calling newRow or newRows.

GClasses::GMatrix::GMatrix	(	const GMatrix &	orig,
		size_t	rowStart = `0`,
		size_t	colStart = `0`,
		size_t	rowCount = `(size_t)-1`,
		size_t	colCount = `(size_t)-1`
	)

Copy-constructor.

Copies orig, making a new relation object and new storage for the rows (with the same content).

Parameters

orig	the GMatrix object to copy

GClasses::GMatrix::GMatrix ( const GDomNode * pNode )

Load from a DOM.

GClasses::GMatrix::~GMatrix ( )

Member Function Documentation

void GClasses::GMatrix::add	(	const GMatrix *	pThat,
		bool	transpose = `false`,
		double	scalar = `1.0`
	)

Matrix add.

Adds scalar * pThat to this. (If transpose is true, adds scalar * the transpose of pThat to this.) Both datasets must have the same dimensions. Behavior is undefined for nominal columns.
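The arithmetic described above can be sketched without the library. The following is a minimal stand-alone illustration, assuming a matrix stored as nested `std::vector<double>` rows; `Matrix` and `addScaled` are hypothetical names for this sketch and are not part of GClasses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Adds scalar * that (or scalar * transpose(that) when transpose is true)
// to self, in place. Hypothetical stand-in, not the library's code.
void addScaled(Matrix& self, const Matrix& that, bool transpose = false, double scalar = 1.0)
{
    for (std::size_t r = 0; r < self.size(); ++r) {
        for (std::size_t c = 0; c < self[r].size(); ++c) {
            // Element (r, c) of the transpose is element (c, r) of the original.
            const std::size_t srcRow = transpose ? c : r;
            const std::size_t srcCol = transpose ? r : c;
            assert(srcRow < that.size() && srcCol < that[srcRow].size());
            self[r][c] += scalar * that[srcRow][srcCol];
        }
    }
}
```

With transpose set, the operand must have the transposed shape of the target for every index to be valid.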

static GMatrix* GClasses::GMatrix::align	(	GMatrix *	pA,
		GMatrix *	pB
	)

static

This uses the Kabsch algorithm to rotate and translate pB in order to minimize RMS with pA. (pA and pB must have the same number of rows and columns.)

GVec& GClasses::GMatrix::back ( size_t reverse_index = 0 )

inline

Returns a reference to a row indexed from the back of the matrix. Index 0 (the default) is the last row, index 1 is the second-to-last row, etc.

const GVec& GClasses::GMatrix::back ( size_t reverse_index = 0 ) const

inline

double GClasses::GMatrix::baselineValue ( size_t nAttribute ) const

Returns the mean if the specified attribute is continuous, otherwise returns the most common nominal value in the attribute.

static GSimpleAssignment GClasses::GMatrix::bipartiteMatching	(	GMatrix &	a,
		GMatrix &	b,
		GDistanceMetric &	metric
	)

static

Performs a bipartite matching between the rows of a and b using the Linear Assignment Problem (LAP) optimizer.

Treats the rows of the matrices a and b as vectors and calculates the distances between these vectors using metric. Returns an optimal assignment from rows of a to rows of b that minimizes the sum of the costs of the assignments.

Each row is considered to be a vector in multidimensional space. The cost is the distance given by metric when called on each row of a and row of b in turn. The cost must not be $-\infty$ for any pair of rows. Other than that, there are no limitations on the cost function.

Because of the limitations of GDistanceMetric, a and b must have the same number of columns.

If $m$ is $\max(rows(a), rows(b))$ then this routine requires $\Theta(rows(a) \cdot rows(b))$ memory and $O(m^3)$ time.

Parameters

a	the matrix containing the vectors of set a. Must have the same number of columns as the matrix containing the vectors of set b. Each row is considered to be a vector in multidimensional space.
b	the matrix containing the vectors of set b. Must have the same number of columns as the matrix containing the vectors of set a. Each row is considered to be a vector in multidimensional space.
metric	given a row of a and a row of b, returns the cost of assigning a to b.

Returns: the optimal assignment in which each of the rows of a or b (whichever has fewer rows) is assigned to a row of the other matrix
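To make the objective concrete, here is a brute-force sketch that tries every assignment and keeps the cheapest one. It is not the LAP optimizer the library uses and runs in factorial time, so it is only suitable for tiny inputs. It assumes squared Euclidean distance as the cost and that a has no more rows than b; `Matrix`, `squaredDistance`, and `bruteForceMatching` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Squared Euclidean distance; stands in for the metric argument.
double squaredDistance(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        sum += (x[i] - y[i]) * (x[i] - y[i]);
    return sum;
}

// Returns result[i] = index of the row of b assigned to row i of a,
// chosen so that the total cost is minimal. Requires a.size() <= b.size().
std::vector<std::size_t> bruteForceMatching(const Matrix& a, const Matrix& b)
{
    assert(a.size() <= b.size());
    std::vector<std::size_t> perm(b.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::vector<std::size_t> best;
    double bestCost = std::numeric_limits<double>::infinity();
    do {
        // The first a.size() entries of perm define one candidate assignment.
        double cost = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i)
            cost += squaredDistance(a[i], b[perm[i]]);
        if (cost < bestCost) {
            bestCost = cost;
            best.assign(perm.begin(), perm.begin() + static_cast<std::ptrdiff_t>(a.size()));
        }
    } while (std::next_permutation(perm.begin(), perm.end()));
    return best;
}
```

A dedicated LAP solver reaches the same optimum in the polynomial time stated above; the exhaustive search only shows what is being minimized.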

double GClasses::GMatrix::boundingSphere	(	GVec &	outCenter,
		size_t *	pIndexes,
		size_t	indexCount,
		GDistanceMetric *	pMetric
	)		const

Finds a sphere that tightly bounds all the points in the specified vector of row-indexes.

Returns the squared radius of the sphere, and stores its center in outCenter.

void GClasses::GMatrix::centerMeanAtOrigin ( )

Shifts the data such that the mean occurs at the origin. Only continuous values are affected. Nominal values are left unchanged.

void GClasses::GMatrix::centroid	(	GVec &	outCentroid,
		const double *	pWeights = `NULL`
	)		const

Computes the arithmetic means of all attributes. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix.

GMatrix* GClasses::GMatrix::cholesky ( bool tolerant = false )

This computes the square root of this matrix. (If you take the matrix that this returns and multiply it by its transpose, you should get the original dataset again.) (Returns a lower-triangular matrix.)

Behavior is undefined if there are nominal attributes. If tolerant is true, it will return even if it cannot compute accurate results. If tolerant is false (the default) and this matrix is not positive definite, it will throw an exception.
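The "square root" here is the Cholesky factor: a lower-triangular L with L times its transpose equal to the original matrix. A minimal stand-alone sketch of the textbook algorithm follows, assuming a symmetric input stored as nested vectors; `Matrix` and `choleskyLower` are hypothetical names, and this sketch only models the non-tolerant behavior (it throws when the input is not positive definite).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns lower-triangular L such that L * transpose(L) == a.
// Throws if a is not positive definite.
Matrix choleskyLower(const Matrix& a)
{
    const std::size_t n = a.size();
    Matrix l(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n); // the input must be square
        for (std::size_t j = 0; j <= i; ++j) {
            // Subtract the contribution of the columns already computed.
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k)
                sum -= l[i][k] * l[j][k];
            if (i == j) {
                if (sum <= 0.0)
                    throw std::runtime_error("matrix is not positive definite");
                l[i][i] = std::sqrt(sum);
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    return l;
}
```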

void GClasses::GMatrix::clipColumn	(	size_t	col,
		double	dMin,
		double	dMax
	)

Clips the values in the specified column to fall between dMin and dMax (inclusively).
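Inclusive clipping can be sketched as follows. This is a stand-alone illustration on nested vectors, not the library's implementation; `Matrix` and `clipColumnSketch` are hypothetical names, and how the library treats unknown values is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Forces every value in column col into the closed interval [dMin, dMax].
void clipColumnSketch(Matrix& m, std::size_t col, double dMin, double dMax)
{
    assert(dMin <= dMax);
    for (auto& row : m) {
        assert(col < row.size());
        row[col] = std::max(dMin, std::min(dMax, row[col]));
    }
}
```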

void GClasses::GMatrix::col	(	size_t	index,
		double *	pOutVector
	)

Copies the specified column into pOutVector.

size_t GClasses::GMatrix::cols ( ) const

inline

Returns the number of columns in the dataset.

double GClasses::GMatrix::columnMax ( size_t nAttribute ) const

Returns the maximum value in the specified column (not counting UNKNOWN_REAL_VALUE). Returns -1e300 if there are no known values in the column.

double GClasses::GMatrix::columnMean	(	size_t	nAttribute,
		const double *	pWeights = `NULL`,
		bool	throwIfEmpty = `true`
	)		const

Computes the arithmetic mean of the values in the specified column. If pWeights is NULL, then each row is given equal weight. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix. If there are no values in this column with any weight, then it will throw an exception if throwIfEmpty is true, or else return UNKNOWN_REAL_VALUE.
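A weighted mean with an empty-column fallback might look like the sketch below. It makes two assumptions that the text above does not confirm: missing entries are skipped, and the missing-value sentinel is the placeholder `kUnknownReal` (the library's actual UNKNOWN_REAL_VALUE may differ). `Matrix` and `weightedColumnMean` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Placeholder sentinel for a missing value (assumption, see lead-in).
constexpr double kUnknownReal = -1e308;

// Returns sum(w_i * x_i) / sum(w_i) over the known values in column col.
// A null pWeights means every row has weight 1.
double weightedColumnMean(const Matrix& m, std::size_t col,
                          const double* pWeights = nullptr, bool throwIfEmpty = true)
{
    double weightedSum = 0.0;
    double totalWeight = 0.0;
    for (std::size_t r = 0; r < m.size(); ++r) {
        assert(col < m[r].size());
        const double v = m[r][col];
        if (v == kUnknownReal)
            continue; // assumption: missing values carry no weight
        const double w = pWeights ? pWeights[r] : 1.0;
        weightedSum += w * v;
        totalWeight += w;
    }
    if (totalWeight == 0.0) {
        if (throwIfEmpty)
            throw std::runtime_error("no weighted values in column");
        return kUnknownReal;
    }
    return weightedSum / totalWeight;
}
```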

double GClasses::GMatrix::columnMedian	(	size_t	nAttribute,
		bool	throwIfEmpty = `true`
	)		const

Computes the median of the values in the specified column. If there are no values in this column, then it will throw an exception if throwIfEmpty is true, or else return UNKNOWN_REAL_VALUE.

double GClasses::GMatrix::columnMin ( size_t nAttribute ) const

Returns the minimum value in the specified column (not counting UNKNOWN_REAL_VALUE). Returns 1e300 if there are no known values in the column.

double GClasses::GMatrix::columnSquaredMagnitude ( size_t col ) const

Returns the squared magnitude of the vector in the specified column.

double GClasses::GMatrix::columnSum ( size_t col ) const

Returns the sum of the values in the specified column.

double GClasses::GMatrix::columnSumSquaredDifference	(	const GMatrix &	that,
		size_t	col,
		double *	pOutSAE = `NULL`
	)		const

Computes the sum-squared distance between the specified column of this and that. If the column is a nominal attribute, then Hamming distance is used. If pOutSAE is not NULL, the sum absolute error will be placed there.

double GClasses::GMatrix::columnVariance	(	size_t	nAttr,
		double	mean
	)		const

Computes the sample variance of a single attribute.
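The mean is passed in rather than recomputed, so a caller would typically obtain it first (for example from columnMean). A minimal sketch of the usual sample-variance formula, with divisor n - 1, is shown below. Whether the library uses exactly this divisor is an assumption based on the phrase "sample variance", and `sampleVariance` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sample variance of the values about a precomputed mean:
// sum((x_i - mean)^2) / (n - 1). Requires at least two values.
double sampleVariance(const std::vector<double>& column, double mean)
{
    assert(column.size() > 1);
    double sumSquares = 0.0;
    for (double v : column)
        sumSquares += (v - mean) * (v - mean);
    return sumSquares / static_cast<double>(column.size() - 1);
}
```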

void GClasses::GMatrix::copy	(	const GMatrix &	that,
		size_t	rowStart = `0`,
		size_t	colStart = `0`,
		size_t	rowCount = `(size_t)-1`,
		size_t	colCount = `(size_t)-1`
	)

Copies (deep) all the data and metadata from that.

void GClasses::GMatrix::copyBlock	(	const GMatrix &	source,
		size_t	srcRow = `0`,
		size_t	srcCol = `0`,
		size_t	hgt = `INVALID_INDEX`,
		size_t	wid = `INVALID_INDEX`,
		size_t	destRow = `0`,
		size_t	destCol = `0`,
		bool	checkMetaData = `true`
	)

Copies values from a rectangular region of the source matrix into this matrix. The wid and hgt values are clipped if they exceed the size of the source matrix. An exception is thrown if the destination is not big enough to hold the values at the specified location. If checkMetaData is true, then this will throw an exception if the data types are incompatible.
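The clipping and bounds rules can be illustrated with a stand-alone sketch. It assumes nested-vector storage, uses the hypothetical constant `kWholeExtent` in place of INVALID_INDEX, and leaves out the checkMetaData step entirely; `Matrix` and `copyBlockSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical "take everything" marker, standing in for INVALID_INDEX.
constexpr std::size_t kWholeExtent = std::numeric_limits<std::size_t>::max();

void copyBlockSketch(Matrix& dest, const Matrix& source,
                     std::size_t srcRow = 0, std::size_t srcCol = 0,
                     std::size_t hgt = kWholeExtent, std::size_t wid = kWholeExtent,
                     std::size_t destRow = 0, std::size_t destCol = 0)
{
    const std::size_t srcRows = source.size();
    const std::size_t srcCols = source.empty() ? 0 : source[0].size();
    assert(srcRow <= srcRows && srcCol <= srcCols);
    // Clip the requested region to what the source actually holds.
    hgt = std::min(hgt, srcRows - srcRow);
    wid = std::min(wid, srcCols - srcCol);
    // Refuse to write past the edge of the destination.
    const std::size_t destRows = dest.size();
    const std::size_t destCols = dest.empty() ? 0 : dest[0].size();
    if (destRow + hgt > destRows || destCol + wid > destCols)
        throw std::out_of_range("destination too small for block");
    for (std::size_t r = 0; r < hgt; ++r)
        for (std::size_t c = 0; c < wid; ++c)
            dest[destRow + r][destCol + c] = source[srcRow + r][srcCol + c];
}
```

Note the asymmetry: an oversized request is silently clipped on the source side, but an undersized destination is an error.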

void GClasses::GMatrix::copyCols	(	const GMatrix &	that,
		size_t	firstCol,
		size_t	colCount
	)

Copies the specified range of columns (including meta-data) from that matrix into this matrix, replacing all data currently in this matrix.

size_t GClasses::GMatrix::countPrincipalComponents	(	double	d,
		GRand *	pRand
	)		const

Computes the minimum number of principal components necessary so that less than the specified portion of the deviation in the data is unaccounted for.

For example, if the data projected onto the first 3 principal components contains 90 percent of the deviation that the original data contains, then if you pass the value 0.1 to this method, it will return 3.

size_t GClasses::GMatrix::countUniqueValues	(	size_t	col,
		size_t	maxCount = `(size_t)-1`
	)		const

Counts the number of unique values in the specified column. If maxCount unique values are found, it immediately returns maxCount.
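
The early exit on maxCount can be sketched as below. This is a standalone illustration, not the library's implementation; countUniqueSketch is a hypothetical name and std::set is one possible way to track the values seen so far.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Counts distinct values in one column, giving up as soon as maxCount
// distinct values have been seen.
std::size_t countUniqueSketch(const std::vector<std::vector<double>>& rows,
                              std::size_t col,
                              std::size_t maxCount = (std::size_t)-1)
{
    std::set<double> seen;
    for (const auto& r : rows) {
        assert(col < r.size());
        seen.insert(r[col]);
        if (seen.size() >= maxCount)
            return maxCount; // stop scanning once the cap is reached
    }
    return seen.size();
}
```

A small cap is useful when the caller only needs to know whether a column has "at least k" distinct values, since the scan stops without visiting the remaining rows.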

size_t GClasses::GMatrix::countValue	(	size_t	attribute,
		double	value
	)		const

Returns the number of occurrences of the specified value in the specified attribute.

double GClasses::GMatrix::covariance	(	size_t	nAttr1,
		double	dMean1,
		size_t	nAttr2,
		double	dMean2,
		const double *	pWeights = `NULL`
	)		const

Computes the covariance between two attributes. If pWeights is NULL, each row is given a weight of 1. If pWeights is non-NULL, then it is assumed to be a vector of weights, one for each row in this matrix.

GMatrix* GClasses::GMatrix::covarianceMatrix ( ) const

Computes the covariance matrix of the data.

void GClasses::GMatrix::deleteColumns	(	size_t	index,
		size_t	count
	)

Deletes some columns. This does not reallocate the rows, but it does shift the elements, which is a slow operation, especially if there are many columns that follow those being deleted.

void GClasses::GMatrix::deleteRow ( size_t index )

Swaps the specified row with the last row, and then deletes it.

void GClasses::GMatrix::deleteRowPreserveOrder ( size_t index )

Deletes the specified row and shifts everything after it up one slot.
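
The difference between deleteRow and deleteRowPreserveOrder can be shown with a minimal sketch. std::vector stands in for the matrix's row storage, and both function names are hypothetical; this is not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Constant-time in the number of rows: the last row takes the place of
// the deleted one, so row order is not preserved.
void deleteRowSketch(Matrix& m, std::size_t index)
{
    assert(index < m.size());
    std::swap(m[index], m.back());
    m.pop_back();
}

// Linear in the number of rows: every later row shifts up one slot,
// so the relative order of the remaining rows is kept.
void deleteRowPreserveOrderSketch(Matrix& m, std::size_t index)
{
    assert(index < m.size());
    m.erase(m.begin() + static_cast<std::ptrdiff_t>(index));
}
```

Deleting row 0 from rows 1, 2, 3, 4 leaves 4, 2, 3 with the swapping variant and 2, 3, 4 with the order-preserving variant.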

double GClasses::GMatrix::determinant ( )

Computes the determinant of this matrix.

double GClasses::GMatrix::determinantHelper	(	size_t	nEndRow,
		size_t *	pColumnList
	)

protected

double GClasses::GMatrix::dihedralCorrelation	(	const GMatrix *	pThat,
		GRand *	pRand
	)		const

Computes the cosine of the dihedral angle between this subspace and pThat subspace.

bool GClasses::GMatrix::doesHaveAnyMissingValues ( ) const

Returns true iff this matrix is missing any values.

void GClasses::GMatrix::dropValue	(	size_t	attr,
		int	val
	)

Drops any occurrences of the specified value, and removes it as a possible value.

double GClasses::GMatrix::eigenValue ( const GVec & eigenVector )

Computes the eigenvalue that corresponds to the specified eigenvector of this matrix.

double GClasses::GMatrix::eigenValue	(	const double *	pMean,
		const double *	pEigenVector,
		GRand *	pRand
	)		const

Computes the eigenvalue that corresponds to *pEigenvector.

After you compute the principal component, you can call this to obtain the eigenvalue that corresponds to that principal component vector (eigenvector).

void GClasses::GMatrix::eigenVector	(	double	eigenvalue,
		GVec &	outVector
	)

Computes the eigenvector that corresponds to the specified eigenvalue of this matrix. Note that this method trashes this matrix, so make a copy first if you care.

GMatrix* GClasses::GMatrix::eigs	(	size_t	nCount,
		GVec &	eigenVals,
		GRand *	pRand,
		bool	mostSignificant
	)

Computes nCount eigenvectors and the corresponding eigenvalues using the power method (which is only accurate if a small number of eigenvalues/vectors are needed.)

If mostSignificant is true, the largest eigenvalues are found. If mostSignificant is false, the smallest eigenvalues are found.

void GClasses::GMatrix::ensureDataHasNoMissingNominals ( ) const

Throws an exception if this data contains any missing values in a nominal attribute.

void GClasses::GMatrix::ensureDataHasNoMissingReals ( ) const

Throws an exception if this data contains any missing values in a continuous attribute.

double GClasses::GMatrix::entropy ( size_t nColumn ) const

Measures the entropy of the specified attribute.

void GClasses::GMatrix::fill	(	double	val,
		size_t	colStart = `0`,
		size_t	colCount = `INVALID_INDEX`
	)

Fills all elements in the specified range of columns with the specified value. If no column ranges are specified, the default is to set all of them.

void GClasses::GMatrix::fillNormal	(	GRand &	rand,
		double	deviation = `1.0`
	)

Fills all elements with random values from a Normal distribution.

void GClasses::GMatrix::fillUniform	(	GRand &	rand,
		double	min = `0.0`,
		double	max = `1.0`
	)

Fills all elements with random values from a uniform distribution.

void GClasses::GMatrix::fixNans ( )

Replaces any occurrences of NAN in the matrix with the corresponding values from an identity matrix.

void GClasses::GMatrix::flush ( )

Deletes all the rows in this matrix.

void GClasses::GMatrix::fromVector	(	const double *	pVector,
		size_t	nRows
	)

Copies the data from pVector over this dataset.

nRows specifies the number of rows of data in pVector.

GVec& GClasses::GMatrix::front ( )

inline

Returns a reference to the first row.

const GVec& GClasses::GMatrix::front ( ) const

inline

bool GClasses::GMatrix::gaussianElimination ( double * pVector )

Computes y in the equation M*y=x (or y=M^(-1)x), where M is this dataset, which must be a square matrix, and x is pVector as passed in, and y is pVector after the call.

If there are multiple solutions, it finds the one for which all the variables in the null-space have a value of 1. If there are no solutions, it returns false. Note that this method trashes this dataset (so make a copy first if you care).

void GClasses::GMatrix::inPlaceSquareTranspose ( )

protected

bool GClasses::GMatrix::isAttrHomogenous ( size_t col ) const

Returns true iff the specified attribute contains homogenous values. (Unknowns are counted as homogenous with anything)

bool GClasses::GMatrix::isHomogenous ( ) const

Returns true iff each of the last labelDims columns in the data are homogenous.

static GMatrix* GClasses::GMatrix::kabsch	(	GMatrix *	pA,
		GMatrix *	pB
	)

static

This computes K=kabsch(A,B), such that K is an n-by-n matrix, where n is pA->cols(). K is the optimal orthonormal rotation matrix to align A and B, such that A(K^T) minimizes sum-squared error with B, and BK minimizes sum-squared error with A. (This rotates around the origin, so typically you will want to subtract the centroid from both pA and pB before calling this.)

bool GClasses::GMatrix::leastCorrelatedVector	(	GVec &	out,
		const GMatrix *	pThat,
		GRand *	pRand
	)		const

Computes the vector in this subspace that has the greatest distance from its projection into pThat subspace.

Returns true if the results are computed.

Returns false if the subspaces are so nearly parallel that out cannot be computed with accuracy.

double GClasses::GMatrix::linearCorrelationCoefficient	(	size_t	attr1,
		double	attr1Origin,
		size_t	attr2,
		double	attr2Origin
	)		const

Computes the linear correlation coefficient between the two specified attributes.

Usually you will want to pass the mean values for attr1Origin and attr2Origin.

void GClasses::GMatrix::load ( const char * szFilename )

Loads a file, automatically detecting whether it is ARFF or raw (binary).

void GClasses::GMatrix::loadArff ( const char * szFilename )

Loads an ARFF file and replaces the contents of this matrix with it.

void GClasses::GMatrix::loadRaw ( const char * szFilename )

Loads a raw (binary) file and replaces the contents of this matrix with it.

void GClasses::GMatrix::LUDecomposition ( )

Performs an in-place LU-decomposition, such that the lower triangle of this matrix (including the diagonal) specifies L, and the upper triangle of this matrix (not including the diagonal) specifies U, and all values of U along the diagonal are ones. (The upper triangle of L and the lower triangle of U are all zeros.)
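
The storage layout described above (L with its diagonal, U with an implied unit diagonal) matches a Crout-style factorization. The following is a minimal sketch of that scheme, assuming no pivoting and a square input. It is not taken from the library; luDecomposeSketch is a hypothetical name and std::vector stands in for GMatrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// In-place Crout factorization without pivoting. Afterwards a[i][j] holds
// L[i][j] for i >= j and U[i][j] for i < j; U's diagonal is implicitly 1.
void luDecomposeSketch(Matrix& a)
{
    std::size_t n = a.size();
    for (std::size_t j = 0; j < n; ++j) {
        assert(a[j].size() == n); // square matrices only
        // Column j of L (rows j..n-1).
        for (std::size_t i = j; i < n; ++i) {
            double sum = 0.0;
            for (std::size_t k = 0; k < j; ++k)
                sum += a[i][k] * a[k][j];
            a[i][j] -= sum;
        }
        // Row j of U (columns j+1..n-1).
        for (std::size_t i = j + 1; i < n; ++i) {
            double sum = 0.0;
            for (std::size_t k = 0; k < j; ++k)
                sum += a[j][k] * a[k][i];
            assert(a[j][j] != 0.0); // a zero pivot would need row exchanges
            a[j][i] = (a[j][i] - sum) / a[j][j];
        }
    }
}
```

For the input rows (4, 3) and (6, 3) this yields rows (4, 0.75) and (6, -1.5): L has rows (4, 0) and (6, -1.5), U has rows (1, 0.75) and (0, 1), and their product reproduces the input.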

void GClasses::GMatrix::makeIdentity ( )

Sets this dataset to an identity matrix. (It doesn't change the number of columns or rows. It just stomps over existing values.)

double GClasses::GMatrix::measureInfo ( ) const

Computes the sum entropy of the data (or the sum variance for continuous attributes).

static GMatrix* GClasses::GMatrix::mergeHoriz	(	const GMatrix *	pSetA,
		const GMatrix *	pSetB
	)

static

Merges two datasets side-by-side. The resulting dataset will contain the attributes of both datasets. Both pSetA and pSetB (and the resulting dataset) must have the same number of rows.

void GClasses::GMatrix::mergeVert	(	GMatrix *	pData,
		bool	ignoreMismatchingName = `false`
	)

Steals all the rows from pData and adds them to this set. (You still have to delete pData.) Both datasets must have the same number of columns.

void GClasses::GMatrix::mirrorTriangle ( bool upperToLower )

Copies one of the triangular submatrices over the other, making a symmetric matrix.

Parameters

upperToLower If true, copies the upper triangle of this matrix over the lower triangle. Otherwise, copies the lower triangle of this matrix over the upper triangle.
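
A minimal sketch of the two copy directions, assuming a square matrix. mirrorTriangleSketch is a hypothetical name and std::vector stands in for GMatrix; this is not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Makes m symmetric by copying one triangle over the other.
// The diagonal is left untouched.
void mirrorTriangleSketch(Matrix& m, bool upperToLower)
{
    std::size_t n = m.size();
    for (std::size_t i = 0; i < n; ++i) {
        assert(m[i].size() == n); // square matrices only
        for (std::size_t j = i + 1; j < n; ++j) {
            if (upperToLower)
                m[j][i] = m[i][j]; // element above the diagonal wins
            else
                m[i][j] = m[j][i]; // element below the diagonal wins
        }
    }
}
```

This is handy after an algorithm has filled in only one triangle of a matrix that is known to be symmetric, such as a covariance matrix.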

void GClasses::GMatrix::multiply ( double scalar )

Multiplies every element in the dataset by scalar. Behavior is undefined for nominal columns.

void GClasses::GMatrix::multiply	(	const GVec &	vectorIn,
		GVec &	vectorOut,
		bool	transpose = `false`
	)		const

Multiplies this matrix by the column vector vectorIn to get vectorOut.

(If transpose is true, then it multiplies the transpose of this matrix by vectorIn to get vectorOut.)

vectorIn should have the same number of elements as columns (or rows, if transpose is true).

vectorOut should have the same number of elements as rows (or columns, if transpose is true).

Note: if transpose is true, then vectorIn is treated as a row vector and is multiplied by this matrix to get vectorOut.
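
The size rules for the two modes can be checked against a small standalone sketch. It is not the library's implementation; multiplyVecSketch is a hypothetical name, std::vector stands in for GMatrix and GVec, and the result is returned by value rather than written to an output argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Computes M * in, or M^T * in when transpose is true.
std::vector<double> multiplyVecSketch(const Matrix& m,
                                      const std::vector<double>& in,
                                      bool transpose = false)
{
    std::size_t rows = m.size();
    std::size_t cols = rows ? m[0].size() : 0;
    // Input length: cols normally, rows when transposed.
    assert(in.size() == (transpose ? rows : cols));
    // Output length: rows normally, cols when transposed.
    std::vector<double> out(transpose ? cols : rows, 0.0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (transpose)
                out[c] += m[r][c] * in[r];
            else
                out[r] += m[r][c] * in[c];
        }
    }
    return out;
}
```

With the 2x3 matrix whose rows are (1, 2, 3) and (4, 5, 6), the input (1, 1, 1) gives (6, 15), and with transpose set the input (1, 1) gives (5, 7, 9).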

static GMatrix* GClasses::GMatrix::multiply	(	const GMatrix &	a,
		const GMatrix &	b,
		bool	transposeA,
		bool	transposeB
	)

static

Matrix multiply.

For convenience, you can also specify that neither, one, or both of the inputs are virtually transposed prior to the multiplication. (If you want the results to come out transposed, you can use the equality (AB)^T=(B^T)(A^T) to figure out how to specify the parameters.)
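
The "virtual transpose" flags and the identity (AB)^T=(B^T)(A^T) can be illustrated with a standalone sketch. multiplySketch and elem are hypothetical names and std::vector stands in for GMatrix; the sketch returns the product by value, whereas the library method returns a GMatrix pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Element (r, c) of m, or of m transposed when t is true.
static double elem(const Matrix& m, bool t, std::size_t r, std::size_t c)
{
    return t ? m[c][r] : m[r][c];
}

// Product of a and b, where either input may be read as if transposed
// without building the transposed matrix.
Matrix multiplySketch(const Matrix& a, const Matrix& b,
                      bool transposeA, bool transposeB)
{
    assert(!a.empty() && !b.empty());
    std::size_t rowsA  = transposeA ? a[0].size() : a.size();
    std::size_t innerA = transposeA ? a.size() : a[0].size();
    std::size_t innerB = transposeB ? b[0].size() : b.size();
    std::size_t colsB  = transposeB ? b.size() : b[0].size();
    assert(innerA == innerB);
    Matrix out(rowsA, std::vector<double>(colsB, 0.0));
    for (std::size_t i = 0; i < rowsA; ++i)
        for (std::size_t j = 0; j < colsB; ++j)
            for (std::size_t k = 0; k < innerA; ++k)
                out[i][j] += elem(a, transposeA, i, k) * elem(b, transposeB, k, j);
    return out;
}
```

To get the transposed product of A and B, swap the operands and transpose both: multiplySketch(B, A, true, true) equals the transpose of multiplySketch(A, B, false, false).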

void GClasses::GMatrix::newColumns ( size_t n )

Adds 'n' new columns to the matrix. (This resizes every row and copies all the existing data, which is rather inefficient.) The values in the new columns are not initialized.

GVec& GClasses::GMatrix::newRow ( )

Adds a new row to the matrix. (The values in the row are not initialized.) Returns a reference to the new row.

void GClasses::GMatrix::newRows ( size_t nRows )

Adds "nRows" uninitialized rows to this matrix.

void GClasses::GMatrix::normalizeColumn	(	size_t	col,
		double	dInMin,
		double	dInMax,
		double	dOutMin = `0.0`,
		double	dOutMax = `1.0`
	)

Normalizes the specified column.

static double GClasses::GMatrix::normalizeValue	(	double	dVal,
		double	dInMin,
		double	dInMax,
		double	dOutMin = `0.0`,
		double	dOutMax = `1.0`
	)

static

Normalize a value from the input min and max to the output min and max.
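
A minimal sketch of the mapping, assuming the usual linear rescaling from [dInMin, dInMax] to [dOutMin, dOutMax]. The formula is an assumption, since this page does not spell it out, and normalizeValueSketch is a hypothetical name.

```cpp
#include <cassert>

// Linearly maps v from the range [inMin, inMax] to [outMin, outMax].
// Values outside the input range map outside the output range.
double normalizeValueSketch(double v, double inMin, double inMax,
                            double outMin = 0.0, double outMax = 1.0)
{
    assert(inMax != inMin); // an empty input range has no defined mapping
    return (v - inMin) / (inMax - inMin) * (outMax - outMin) + outMin;
}
```

For example, 5 in the input range [0, 10] maps to 0.5 with the default output range and to 0 with the output range [-1, 1].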

GMatrix& GClasses::GMatrix::operator= ( const GMatrix & orig )

Make *this into a copy of orig.

Copies orig, making a new relation object and new storage for the rows (with the same content).

Parameters

orig	the GMatrix object to copy

Returns: a reference to this GMatrix object

bool GClasses::GMatrix::operator== ( const GMatrix & other ) const

Returns true iff all the entries in *this and other are identical and their relations are compatible, and they are the same size.

Returns: true iff all the entries in *this and other are identical, their relations are compatible, and they are the same size

GVec& GClasses::GMatrix::operator[] ( size_t index )

inline

Returns a reference to the specified row.

const GVec& GClasses::GMatrix::operator[] ( size_t index ) const

inline

Returns a const reference to the specified row.

void GClasses::GMatrix::pairedTTest	(	size_t *	pOutV,
		double *	pOutT,
		size_t	attr1,
		size_t	attr2,
		bool	normalize
	)		const

Performs a paired T-Test with data from the two specified attributes.

pOutV will hold the degrees of freedom. pOutT will hold the T-value. You can use GMath::tTestAlphaValue to convert these to a P-value.
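
For orientation, the textbook paired t statistic can be sketched as below. This is an assumption about what the two outputs mean, not the library's code: pairedTTestSketch and PairedT are hypothetical names, the two columns are passed as plain vectors, and the effect of the normalize flag is not described on this page and is not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct PairedT {
    std::size_t v; // degrees of freedom
    double t;      // t statistic
};

// Textbook paired t-test on the per-row differences a[i] - b[i].
PairedT pairedTTestSketch(const std::vector<double>& a,
                          const std::vector<double>& b)
{
    assert(a.size() == b.size() && a.size() >= 2);
    std::size_t n = a.size();
    double mean = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        mean += a[i] - b[i];
    mean /= static_cast<double>(n);
    double ss = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double d = (a[i] - b[i]) - mean;
        ss += d * d;
    }
    // Sample standard deviation of the differences (n - 1 denominator).
    double sd = std::sqrt(ss / static_cast<double>(n - 1));
    // t is infinite or NaN if all differences are identical (sd == 0).
    double t = mean / (sd / std::sqrt(static_cast<double>(n)));
    return PairedT{n - 1, t};
}
```

With columns (2, 4, 6, 8) and (1, 2, 3, 4) the differences are (1, 2, 3, 4), giving 3 degrees of freedom and a t-value equal to the square root of 15.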

void GClasses::GMatrix::parseArff	(	const char *	szFile,
		size_t	nLen
	)

Parses an ARFF file and replaces the contents of this matrix with it.

void GClasses::GMatrix::parseArff ( GArffTokenizer & tok )

Parses an ARFF file and replaces the contents of this matrix with it.

void GClasses::GMatrix::principalComponent	(	GVec &	outVector,
		const GVec &	centroid,
		GRand *	pRand
	)		const

This is an efficient algorithm for iteratively computing the principal component vector (the eigenvector of the covariance matrix) of the data.

See "EM Algorithms for PCA and SPCA" by Sam Roweis, 1998 NIPS.

The size of outVector will be the number of columns in this matrix. (To compute the next principal component, call removeComponent, then call this method again.)
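
The iterative idea can be sketched with a simple power-iteration-style update, where each pass projects the centered rows onto the current estimate and re-accumulates them. This is a simplified illustration in the spirit of the cited approach, not the library's implementation: principalComponentSketch is a hypothetical name, std::vector stands in for GMatrix and GVec, the iteration count is arbitrary, and a fixed start vector replaces the random one that the pRand argument suggests.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Iteratively estimates the first principal component of the rows of
// data about the given centroid. Returns a unit-length vector.
std::vector<double> principalComponentSketch(const Matrix& data,
                                             const std::vector<double>& centroid,
                                             std::size_t iters = 100)
{
    std::size_t dims = centroid.size();
    assert(dims > 0);
    std::vector<double> p(dims, 0.0);
    p[0] = 1.0; // assumption: fixed start for reproducibility
    for (std::size_t it = 0; it < iters; ++it) {
        std::vector<double> next(dims, 0.0);
        for (const auto& row : data) {
            assert(row.size() == dims);
            double dot = 0.0;
            for (std::size_t j = 0; j < dims; ++j)
                dot += (row[j] - centroid[j]) * p[j];
            for (std::size_t j = 0; j < dims; ++j)
                next[j] += dot * (row[j] - centroid[j]);
        }
        double mag = 0.0;
        for (std::size_t j = 0; j < dims; ++j)
            mag += next[j] * next[j];
        mag = std::sqrt(mag);
        if (mag == 0.0)
            break; // degenerate: no variance along the current estimate
        for (std::size_t j = 0; j < dims; ++j)
            p[j] = next[j] / mag;
    }
    return p;
}
```

A fixed start vector fails if it happens to be orthogonal to the principal component, which is one reason a random start is the safer choice in practice.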

void GClasses::GMatrix::principalComponentAboutOrigin	(	GVec &	outVector,
		GRand *	pRand
	)		const

Computes the first principal component assuming the mean is already subtracted out of the data.

void GClasses::GMatrix::principalComponentIgnoreUnknowns	(	GVec &	outVector,
		const GVec &	centroid,
		GRand *	pRand
	)		const

Computes principal components, while ignoring missing values.

void GClasses::GMatrix::print	(	std::ostream &	stream = `std::cout`,
		char	separator = `','`
	)		const

Prints this matrix in ARFF format to the specified stream.

GMatrix* GClasses::GMatrix::pseudoInverse ( )

Computes the Moore-Penrose pseudoinverse of this matrix (using the SVD method). You are responsible to delete the matrix this returns.

const GRelation& GClasses::GMatrix::relation ( ) const

inline

Returns a const reference to the relation object, which holds meta-data about the attributes (columns).

void GClasses::GMatrix::releaseAllRows ( )

Abandons (leaks) all the rows in this matrix.

GVec* GClasses::GMatrix::releaseRow ( size_t index )

Swaps the specified row with the last row, and then releases it from the dataset.

The caller is responsible to delete the row this method returns.

GVec* GClasses::GMatrix::releaseRowPreserveOrder ( size_t index )

Releases the specified row from the dataset and shifts everything after it up one slot.

The caller is responsible to delete the row this method returns.

void GClasses::GMatrix::removeComponent	(	const GVec &	centroid,
		const GVec &	component
	)

Removes the component specified by component from the data. (component should already be normalized.)

This might be useful, for example, to remove the first principal component from the data so you can then proceed to compute the second principal component, and so forth.
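
Removing a component amounts to subtracting each centered row's projection onto that unit vector. The sketch below assumes exactly that reading. removeComponentSketch is a hypothetical name and std::vector stands in for GMatrix and GVec; it is not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// For each row x, subtracts ((x - centroid) . component) * component.
// component is assumed to have unit length.
void removeComponentSketch(Matrix& data,
                           const std::vector<double>& centroid,
                           const std::vector<double>& component)
{
    std::size_t dims = centroid.size();
    assert(component.size() == dims);
    for (auto& row : data) {
        assert(row.size() == dims);
        double dot = 0.0;
        for (std::size_t j = 0; j < dims; ++j)
            dot += (row[j] - centroid[j]) * component[j];
        for (std::size_t j = 0; j < dims; ++j)
            row[j] -= dot * component[j];
    }
}
```

After this step the centered data has no extent along the removed direction, so a second principal component search finds the next most significant direction.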

void GClasses::GMatrix::removeComponentAboutOrigin ( const GVec & component )

Removes the specified component assuming the mean is zero.

void GClasses::GMatrix::replaceMissingValuesRandomly	(	size_t	nAttr,
		GRand *	pRand
	)

Replaces all missing values by copying a randomly selected non-missing value in the same attribute.

void GClasses::GMatrix::replaceMissingValuesWithBaseline ( size_t nAttr )

Replace missing values with the appropriate measure of central tendency.

If the specified attribute is continuous, replaces all missing values in that attribute with the mean. If the specified attribute is nominal, replaces all missing values in that attribute with the most common value.

void GClasses::GMatrix::reserve ( size_t n )

inline

Allocates space for the specified number of patterns (to avoid superfluous resizing).

void GClasses::GMatrix::resize	(	size_t	rows,
		size_t	cols
	)

Resizes this matrix. Assigns all columns to be continuous, and replaces all element values with garbage.

void GClasses::GMatrix::reverseRows ( )

Reverses the row order.

GVec& GClasses::GMatrix::row ( size_t index )

inline

Returns a reference to the specified row.

const GVec& GClasses::GMatrix::row ( size_t index ) const

inline

Returns a const reference to the specified row.

size_t GClasses::GMatrix::rows ( ) const

inline

Returns the number of rows in the dataset.

void GClasses::GMatrix::saveArff ( const char * szFilename )

Saves the dataset to a file in ARFF format.

void GClasses::GMatrix::saveRaw ( const char * szFilename )

Saves the dataset to a file in raw (binary) format.

void GClasses::GMatrix::scaleColumn	(	size_t	col,
		double	scalar
	)

Scales the column by the specified scalar.

GDomNode* GClasses::GMatrix::serialize ( GDom * pDoc ) const

Marshalls this object to a DOM, which may be saved to a variety of serial formats.

void GClasses::GMatrix::setCol	(	size_t	index,
		const double *	pVector
	)

Copies pVector over the specified column.

void GClasses::GMatrix::setRelation ( GRelation * pRelation )

Sets the relation for this dataset, which specifies the number of columns, and their data types. If there are one or more rows in this matrix, and the new relation does not have the same number of columns as the old relation, then this will throw an exception. Takes ownership of pRelation. That is, the destructor will delete it.

void GClasses::GMatrix::shuffle	(	GRand &	rand,
		GMatrix *	pExtension = `NULL`
	)

Randomizes the order of the rows.

If pExtension is non-NULL, then it will also be shuffled such that corresponding rows are preserved.

void GClasses::GMatrix::shuffle2	(	GRand &	rand,
		GMatrix &	other
	)

Shuffles the order of the rows. Also shuffles the rows in "other" in the same way, such that corresponding rows are preserved.
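
Keeping corresponding rows together means applying the same sequence of swaps to both matrices. A minimal sketch using a Fisher-Yates shuffle follows. It is not the library's implementation: shufflePairedSketch is a hypothetical name, std::vector stands in for GMatrix, and std::mt19937 is used in place of GRand.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Fisher-Yates shuffle of the rows of a, with every swap mirrored in b
// so that row i of a still corresponds to row i of b afterwards.
void shufflePairedSketch(Matrix& a, Matrix& b, std::mt19937& rng)
{
    assert(a.size() == b.size());
    for (std::size_t i = a.size(); i > 1; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::size_t j = pick(rng);
        std::swap(a[i - 1], a[j]);
        std::swap(b[i - 1], b[j]);
    }
}
```

This is the typical pattern for shuffling a feature matrix together with its label matrix before splitting into training and test sets.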

void GClasses::GMatrix::shuffleLikeCards ( )

This is an inferior way to shuffle the data.

void GClasses::GMatrix::singularValueDecomposition	(	GMatrix **	ppU,
		double **	ppDiag,
		GMatrix **	ppV,
		bool	throwIfNoConverge = `false`,
		size_t	maxIters = `80`
	)

Performs SVD on A, where A is this m-by-n matrix.

You are responsible to delete(*ppU), delete(*ppV), and delete[] *ppDiag.

Parameters

ppU	*ppU will be set to an m-by-m matrix where the columns are the eigenvectors of A(A^T).
ppDiag	*ppDiag will be set to an array of n doubles holding the square roots of the corresponding eigenvalues.
ppV	*ppV will be set to an n-by-n matrix where the rows are the eigenvectors of (A^T)A.
throwIfNoConverge	If true, throws an exception if the SVD solver does not converge. Otherwise, non-convergence is ignored.
maxIters	the maximum number of iterations to perform in the SVD solver

void GClasses::GMatrix::singularValueDecompositionHelper	(	GMatrix **	ppU,
		double **	ppDiag,
		GMatrix **	ppV,
		bool	throwIfNoConverge,
		size_t	maxIters
	)

protected

void GClasses::GMatrix::sort ( size_t nDimension )

Sorts the data from smallest to largest in the specified dimension.

template<typename CompareFunc >

void GClasses::GMatrix::sort ( CompareFunc & pComparator )

inline

Sorts rows according to the specified compare function. (Return true to indicate that the first row comes before the second row.)

void GClasses::GMatrix::sortPartial	(	size_t	row,
		size_t	col
	)

This partially sorts the specified column, such that the specified row will contain the same row as if it were fully sorted, and previous rows will contain a value <= to it in that column, and later rows will contain a value >= to it in that column. Unlike sort, which has O(m*log(m)) complexity, this method has O(m) complexity. This might be useful, for example, for efficiently finding the row with a median value in some attribute, or for separating data by a threshold in some value.
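
The same contract (the chosen row ends up in its sorted position, with smaller-or-equal values before it and greater-or-equal values after it) is what std::nth_element provides, so it can serve as a standalone illustration. sortPartialSketch is a hypothetical name and std::vector stands in for GMatrix; the library's own algorithm is not shown on this page and may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Rearranges rows so that m[row] is the row a full sort on column col
// would place there, with no guarantee about the order of other rows.
void sortPartialSketch(Matrix& m, std::size_t row, std::size_t col)
{
    assert(row < m.size());
    std::nth_element(
        m.begin(), m.begin() + static_cast<std::ptrdiff_t>(row), m.end(),
        [col](const std::vector<double>& a, const std::vector<double>& b) {
            return a[col] < b[col];
        });
}
```

Calling this with row set to half the row count places a median row in the middle in linear time on average, without paying for a full sort.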

void GClasses::GMatrix::splitByPivot	(	GMatrix *	pGreaterOrEqual,
		size_t	nAttribute,
		double	dPivot,
		GMatrix *	pExtensionA = `NULL`,
		GMatrix *	pExtensionB = `NULL`
	)

Splits this set of data into two sets. Values greater-than-or-equal-to dPivot stay in this data set. Values less than dPivot go into pLessThanPivot.

If pExtensionA is non-NULL, then it will also split pExtensionA such that corresponding rows are preserved.

void GClasses::GMatrix::splitBySize	(	GMatrix &	other,
		size_t	nOtherRows
	)

Removes the last nOtherRows rows from this data set and puts them in "other". (Order is preserved.)

void GClasses::GMatrix::splitCategoricalKeepIfEqual	(	GMatrix *	pOtherValues,
		size_t	nAttr,
		int	nValue,
		GMatrix *	pExtensionA = `NULL`,
		GMatrix *	pExtensionB = `NULL`
	)

Moves all rows with the specified value in the specified attribute into pOtherValues.

If pExtensionA is non-NULL, then it will also split pExtensionA such that corresponding rows are preserved.

void GClasses::GMatrix::splitCategoricalKeepIfNotEqual	(	GMatrix *	pSingleClass,
		size_t	nAttr,
		int	nValue,
		GMatrix *	pExtensionA = `NULL`,
		GMatrix *	pExtensionB = `NULL`
	)

Moves all rows with the specified value in the specified attribute into pSingleClass.

If pExtensionA is non-NULL, then it will also split pExtensionA such that corresponding rows are preserved.

void GClasses::GMatrix::subtract	(	const GMatrix *	pThat,
		bool	transpose
	)

Matrix subtract. Subtracts the values in *pThat from *this.

(If transpose is true, subtracts the transpose of *pThat from this.) Both datasets must have the same dimensions. Behavior is undefined for nominal columns.

Parameters

pThat	pointer to the matrix to subtract from *this
transpose	If true, the transpose of pThat is subtracted. If false, pThat is subtracted

double GClasses::GMatrix::sumSquaredDifference	(	const GMatrix &	that,
		bool	transpose = `false`
	)		const

Computes the squared distance between this and that.

If transpose is true, computes the difference between this and the transpose of that.

double GClasses::GMatrix::sumSquaredDiffWithIdentity ( )

Returns the sum squared difference between this matrix and an identity matrix.

double GClasses::GMatrix::sumSquaredDistance ( const GVec & point ) const

Computes the sum-squared distance between pPoint and all of the points in the dataset.

If pPoint is NULL, it computes the sum-squared distance with the origin.

Note that this is equal to the sum of all the eigenvalues times the number of dimensions, so you can efficiently compute eigenvalues as the difference in sumSquaredDistance with the mean after removing the corresponding component, and then dividing by the number of dimensions. This is more efficient than calling eigenValue.

void GClasses::GMatrix::swapColumns	(	size_t	nAttr1,
		size_t	nAttr2
	)

Swaps two columns.

GVec* GClasses::GMatrix::swapRow	(	size_t	i,
		GVec *	pNewRow
	)

Swap pNewRow in for row i, and return row i. The caller is then responsible to delete the row that is returned.

void GClasses::GMatrix::swapRows	(	size_t	a,
		size_t	b
	)

Swaps the two specified rows.

void GClasses::GMatrix::takeRow	(	GVec *	pRow,
		size_t	pos = `(size_t)-1`
	)

Adds an already-allocated row to this dataset. If pos is specified, the new row will be inserted at the specified position.

static void GClasses::GMatrix::test ( )

static

Performs unit tests for this class. Throws an exception if there is a failure.

size_t GClasses::GMatrix::toReducedRowEchelonForm ( )

Converts the matrix to reduced row echelon form.

void GClasses::GMatrix::toVector ( double * pVector ) const

Copies all the data from this dataset into pVector.

pVector must be big enough to hold rows() * cols() doubles.

double GClasses::GMatrix::trace ( )

Returns the sum of the diagonal elements.

GMatrix* GClasses::GMatrix::transpose ( )

Returns a pointer to a new dataset that is this dataset transposed. (All columns in the returned dataset will be continuous.)

The returned matrix must be deleted by the caller.

Returns: A pointer to a new dataset that is this dataset transposed. All columns in the returned dataset will be continuous. The caller is responsible for deleting the returned dataset.

void GClasses::GMatrix::weightedPrincipalComponent	(	GVec &	outVector,
		const GVec &	centroid,
		const double *	pWeights,
		GRand *	pRand
	)		const

Computes the first principal component of the data with each row weighted according to the vector pWeights. (pWeights must have an element for each row.)

void GClasses::GMatrix::wilcoxonSignedRanksTest	(	size_t	attr1,
		size_t	attr2,
		double	tolerance,
		int *	pNum,
		double *	pWMinus,
		double *	pWPlus
	)		const

Performs the Wilcoxon signed ranks test from the two specified attributes.

If two values are closer than tolerance, they are considered to be equal.

Member Data Documentation

GRelation* GClasses::GMatrix::m_pRelation

protected

std::vector<GVec*> GClasses::GMatrix::m_rows

protected
