Detailed Description

This class clusters the rows according to a sparse similarity metric, then uses the baseline vector in each cluster to make predictions.
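
The idea above can be illustrated with a minimal, library-independent sketch. It assumes that the "baseline vector" of a cluster is the per-item mean rating among the users in that cluster, and that an item with no ratings in a cluster falls back to the global mean; neither detail is confirmed by this page. The names Rating, clusterBaselines, and predictFromBaseline are hypothetical and are not part of GClasses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One observed rating: user (row), item (column), and the rating value.
struct Rating {
    std::size_t user;
    std::size_t item;
    double value;
};

// Hypothetical helper (not a GClasses API): computes one baseline vector per
// cluster, assumed here to be the mean rating of each item within the cluster.
// Items that nobody in the cluster rated fall back to the global mean rating.
std::vector<std::vector<double>> clusterBaselines(
    const std::vector<Rating>& ratings,
    const std::vector<std::size_t>& userCluster, // cluster index of each user
    std::size_t clusters,
    std::size_t items)
{
    double globalSum = 0.0;
    for (const Rating& r : ratings)
        globalSum += r.value;
    const double globalMean =
        ratings.empty() ? 0.0 : globalSum / static_cast<double>(ratings.size());

    std::vector<std::vector<double>> sum(clusters, std::vector<double>(items, 0.0));
    std::vector<std::vector<std::size_t>> count(
        clusters, std::vector<std::size_t>(items, 0));

    // Accumulate rating sums and counts per (cluster, item) cell.
    for (const Rating& r : ratings) {
        assert(r.user < userCluster.size() && r.item < items);
        const std::size_t c = userCluster[r.user];
        assert(c < clusters);
        sum[c][r.item] += r.value;
        ++count[c][r.item];
    }

    // Turn sums into means, using the global mean for empty cells.
    for (std::size_t c = 0; c < clusters; ++c)
        for (std::size_t i = 0; i < items; ++i)
            sum[c][i] = count[c][i] > 0
                ? sum[c][i] / static_cast<double>(count[c][i])
                : globalMean;
    return sum;
}

// Predicts a rating by looking up the baseline of the user's cluster.
double predictFromBaseline(
    const std::vector<std::vector<double>>& baselines,
    const std::vector<std::size_t>& userCluster,
    std::size_t user,
    std::size_t item)
{
    assert(user < userCluster.size());
    return baselines[userCluster[user]][item];
}
```

In this sketch every user in a cluster receives the same prediction for a given item, which is the trade-off of a cluster-baseline approach: prediction is a table lookup, at the cost of ignoring differences between users inside a cluster.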

#include <GRecommender.h>

Inherits GClasses::GCollaborativeFilter.

Public Member Functions
	GSparseClusterRecommender (size_t clusters)

virtual	~GSparseClusterRecommender ()

size_t	clusterCount ()
	Returns the number of clusters.

virtual void	impute (GVec &vec, size_t dims)
	See the comment for GCollaborativeFilter::impute.

virtual double	predict (size_t user, size_t item)
	See the comment for GCollaborativeFilter::predict.

virtual GDomNode *	serialize (GDom *pDoc) const
	See the comment for GCollaborativeFilter::serialize.

void	setClusterer (GSparseClusterer *pClusterer, bool own)
	Set the clustering algorithm to use.

virtual void	train (GMatrix &data)
	See the comment for GCollaborativeFilter::train.

Public Member Functions inherited from GClasses::GCollaborativeFilter
	GCollaborativeFilter ()

	GCollaborativeFilter (const GDomNode *pNode, GLearnerLoader &ll)

virtual	~GCollaborativeFilter ()

void	basicTest (double minMSE)
	Performs a basic unit test on this collaborative filter.

double	crossValidate (GMatrix &data, size_t folds, double *pOutMAE=NULL)
	This randomly assigns each rating to one of the folds. Then, for each fold, it calls train with a dataset that contains everything except for the ratings in that fold. It predicts values for the items in the fold, and returns the mean-squared difference between the predictions and the actual ratings. If pOutMAE is non-NULL, it will be set to the mean-absolute error.

GMatrix *	precisionRecall (GMatrix &data, bool ideal=false)
	This divides the data into two equal-size parts. It trains on one part, and then measures the precision/recall using the other part. It returns a three-column data set with recall scores in column 0 and corresponding precision scores in column 1. The false-positive rate is in column 2. (So, if you want a precision-recall plot, just drop column 2. If you want an ROC curve, drop column 1 and swap the remaining two columns.) This method assumes the ratings range from 0 to 1, so be sure to scale the ratings to fit that range before calling this method. If ideal is true, then it will ignore your model and report the ideal results as if your model always predicted the correct rating. (This is useful because it shows the best possible results.)

GRand &	rand ()
	Returns a reference to the pseudo-random number generator associated with this object.

double	trainAndTest (GMatrix &train, GMatrix &test, double *pOutMAE=NULL)
	This trains on the training set, and then tests on the test set. Returns the mean-squared difference between the predicted and actual ratings. If pOutMAE is non-NULL, it will be set to the mean-absolute error.

void	trainDenseMatrix (const GMatrix &data, const GMatrix *pLabels=NULL)
	Train from an m-by-n dense matrix, where m is the number of users and n is the number of items. All attributes must be continuous. Missing values are indicated with UNKNOWN_REAL_VALUE. If pLabels is non-NULL, then the labels will be appended as additional items.
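
The evaluation scheme described for crossValidate and trainAndTest can be sketched independently of the library. The sketch below only shows the two generic pieces: assigning each rating to a random fold, and computing the mean-squared and mean-absolute errors from paired predictions and actual ratings. The names ErrorStats, assignFolds, and errorStats are hypothetical, and the use of std::mt19937 is an assumption made for illustration; the library uses its own GRand generator, and its actual implementation may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Mean-squared error and mean-absolute error over a set of predictions.
struct ErrorStats {
    double mse;
    double mae;
};

// Hypothetical helper: assigns each of n ratings to one of `folds` folds,
// chosen uniformly at random. fold[i] is the fold that holds rating i out.
std::vector<std::size_t> assignFolds(std::size_t n, std::size_t folds, unsigned seed)
{
    assert(folds > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, folds - 1);
    std::vector<std::size_t> fold(n);
    for (std::size_t& f : fold)
        f = pick(rng);
    return fold;
}

// Hypothetical helper: compares predictions against actual ratings,
// element by element, and returns both error measures.
ErrorStats errorStats(const std::vector<double>& predicted,
                      const std::vector<double>& actual)
{
    assert(predicted.size() == actual.size() && !predicted.empty());
    double squared = 0.0;
    double absolute = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double diff = predicted[i] - actual[i];
        squared += diff * diff;
        absolute += std::fabs(diff);
    }
    const double n = static_cast<double>(predicted.size());
    return ErrorStats{squared / n, absolute / n};
}
```

A cross-validation loop built on these pieces would, for each fold k, train on the ratings whose fold index is not k, predict the ratings whose fold index is k, and pass all held-out predictions to errorStats.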

Static Public Member Functions
static void	test ()
	Performs unit tests. Throws if a failure occurs. Returns if successful.

Static Public Member Functions inherited from GClasses::GCollaborativeFilter
static double	areaUnderCurve (GMatrix &data)
	Pass in the data returned by the precisionRecall function (unmodified), and this will compute the area under the ROC curve.
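
As an illustration of how an area under the ROC curve can be obtained from the three-column layout that precisionRecall describes (recall in column 0, precision in column 1, false-positive rate in column 2), here is a minimal trapezoidal-rule sketch. It is not the library's implementation. The type PrRow and the function rocAreaTrapezoid are hypothetical, and anchoring the curve at (0, 0) and (1, 1) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One row in the layout described for precisionRecall:
// [0] = recall, [1] = precision, [2] = false-positive rate.
using PrRow = std::array<double, 3>;

// Hypothetical helper: area under the ROC curve by the trapezoidal rule.
// The ROC curve plots recall (true-positive rate) against false-positive
// rate, so column 1 (precision) is ignored.
double rocAreaTrapezoid(const std::vector<PrRow>& rows)
{
    std::vector<std::pair<double, double>> roc; // (false-positive rate, recall)
    roc.reserve(rows.size() + 2);
    roc.emplace_back(0.0, 0.0); // assumed start of the curve
    for (const PrRow& r : rows) {
        assert(r[0] >= 0.0 && r[0] <= 1.0 && r[2] >= 0.0 && r[2] <= 1.0);
        roc.emplace_back(r[2], r[0]);
    }
    roc.emplace_back(1.0, 1.0); // assumed end of the curve

    // Order the points by false-positive rate, then by recall.
    std::sort(roc.begin(), roc.end());

    double area = 0.0;
    for (std::size_t i = 1; i < roc.size(); ++i) {
        const double width = roc[i].first - roc[i - 1].first;
        const double meanHeight = 0.5 * (roc[i].second + roc[i - 1].second);
        area += width * meanHeight;
    }
    return area;
}
```

With no rows at all, the sketch reduces to the diagonal from (0, 0) to (1, 1) and yields 0.5, the value expected of a model that ranks at chance level.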

Protected Attributes
size_t	m_clusters

size_t	m_items

bool	m_ownClusterer

GSparseClusterer *	m_pClusterer

GMatrix *	m_pPredictions

size_t	m_users

Protected Attributes inherited from GClasses::GCollaborativeFilter
GRand	m_rand

Additional Inherited Members
Protected Member Functions inherited from GClasses::GCollaborativeFilter
GDomNode *	baseDomNode (GDom *pDoc, const char *szClassName) const
	Child classes should use this in their implementation of serialize.