Detailed Description

A naive Bayes classifier.
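For reference, this is the standard textbook decision rule of a naive Bayes classifier. This page does not spell out the exact computation GNaiveBayes performs. The rule picks the label value $c$ that maximizes the prior times the product of per-feature likelihoods, assuming the $d$ features are conditionally independent given the label. Implementations usually evaluate the right-hand (log-space) form to avoid floating-point underflow.

```latex
\hat{y} = \arg\max_{c} \; P(y = c) \prod_{j=1}^{d} P(x_j \mid y = c)
        = \arg\max_{c} \left[ \log P(y = c) + \sum_{j=1}^{d} \log P(x_j \mid y = c) \right]
```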

#include <GNaiveBayes.h>

Public Member Functions
	GNaiveBayes ()

	GNaiveBayes (const GDomNode *pNode)
	Load from a DOM.

virtual	~GNaiveBayes ()

void	autoTune (GMatrix &features, GMatrix &labels)
	Uses cross-validation to find a set of parameters that works well with the provided data.

virtual void	clear ()
	See the comment for GSupervisedLearner::clear.

double	equivalentSampleSize ()
	Returns the equivalent sample size (the number of samples of each possible value that are added by default to prevent zero probabilities).

virtual void	predict (const GVec &in, GVec &out)
	See the comment for GSupervisedLearner::predict.

virtual void	predictDistribution (const GVec &in, GPrediction *pOut)
	See the comment for GSupervisedLearner::predictDistribution.

virtual GDomNode *	serialize (GDom *pDoc) const
	Marshal this object into a DOM, which can then be converted to a variety of serial formats.

void	setEquivalentSampleSize (double d)
	Sets the equivalent sample size. To prevent a value that never appears in the training data from forcing the joint probability to zero, each value is given at least as much representation as specified here. (The default is 0.5, which is as if there were half of a sample for each value.)

virtual void	trainIncremental (const GVec &in, const GVec &out)
	Adds a single training sample to the collection.

virtual void	trainSparse (GSparseMatrix &features, GMatrix &labels)
	See the comment for GIncrementalLearner::trainSparse. This method assumes that all values in features are binary (0 or 1).

Public Member Functions inherited from GClasses::GIncrementalLearner
	GIncrementalLearner ()
	General-purpose constructor.

	GIncrementalLearner (const GDomNode *pNode)
	Deserialization constructor.

virtual	~GIncrementalLearner ()
	Destructor.

void	beginIncrementalLearning (const GRelation &featureRel, const GRelation &labelRel)
	You must call this method before you call trainIncremental.

void	beginIncrementalLearning (const GMatrix &features, const GMatrix &labels)
	A version of beginIncrementalLearning that supports data-dependent filters.

virtual bool	canTrainIncrementally ()
	Returns true.

virtual bool	isFilter ()
	Only the GFilter class should return true from this method.

Public Member Functions inherited from GClasses::GSupervisedLearner
	GSupervisedLearner ()
	General-purpose constructor.

	GSupervisedLearner (const GDomNode *pNode)
	Deserialization constructor.

virtual	~GSupervisedLearner ()
	Destructor.

void	basicTest (double minAccuracy1, double minAccuracy2, double deviation=1e-6, bool printAccuracy=false, double warnRange=0.035)
	This is a helper method used by the unit tests of several model learners.

virtual bool	canGeneralize ()
	Returns true because fully supervised learners have an internal model that allows them to generalize to previously unseen rows.

void	confusion (GMatrix &features, GMatrix &labels, std::vector< GMatrix * > &stats)
	Generates a confusion matrix containing the total counts of the number of times each value was expected and predicted. (Rows represent target values, and columns represent predicted values.) stats should be an empty vector. This method resizes stats to the number of dimensions in the label vector. The caller is responsible for deleting all of the matrices that this method puts in stats. For continuous labels, the corresponding entry will be NULL.

void	precisionRecall (double *pOutPrecision, size_t nPrecisionSize, GMatrix &features, GMatrix &labels, size_t label, size_t nReps)
	label specifies which output to measure. (It should be 0 if there is only one label dimension.) The measurement is performed nReps times and the results are averaged together. nPrecisionSize specifies the number of points at which the function is sampled. pOutPrecision should be an array big enough to hold nPrecisionSize elements for every possible label value. (If the attribute is continuous, it should just be big enough to hold nPrecisionSize elements.)

const GRelation &	relFeatures ()
	Returns a reference to the feature relation (meta-data about the input attributes).

const GRelation &	relLabels ()
	Returns a reference to the label relation (meta-data about the output attributes).

double	sumSquaredError (const GMatrix &features, const GMatrix &labels, double *pOutSAE=NULL)
	Computes the sum-squared-error for predicting the labels from the features. For categorical labels, Hamming distance is used.

void	train (const GMatrix &features, const GMatrix &labels)
	Call this method to train the model.

virtual double	trainAndTest (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels, double *pOutSAE=NULL)
	Trains and tests this learner. Returns sum-squared-error.
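To make the row/column convention described for confusion concrete, here is a minimal standalone sketch for a single nominal label dimension. It works on plain integer label values rather than GMatrix, and the function name confusionMatrix is hypothetical; it is not the GClasses implementation.

```cpp
#include <cstddef>
#include <vector>

// Builds a confusion matrix for one nominal label dimension.
// counts[t][p] is the number of rows whose target (expected) value is t and
// whose predicted value is p, i.e. rows = targets, columns = predictions.
// Assumes expected and predicted have the same length and every value is
// less than numValues.
std::vector<std::vector<std::size_t>> confusionMatrix(
    const std::vector<std::size_t>& expected,
    const std::vector<std::size_t>& predicted,
    std::size_t numValues)
{
    std::vector<std::vector<std::size_t>> counts(
        numValues, std::vector<std::size_t>(numValues, 0));
    for (std::size_t i = 0; i < expected.size(); ++i)
        ++counts[expected[i]][predicted[i]];
    return counts;
}
```

The diagonal holds correct predictions, so for example `confusionMatrix({0, 0, 1}, {0, 1, 1}, 2)` has one off-diagonal entry, at row 0, column 1.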

Public Member Functions inherited from GClasses::GTransducer
	GTransducer ()
	General-purpose constructor.

	GTransducer (const GTransducer &that)
	Copy-constructor. Throws an exception to prevent models from being copied by value.

virtual	~GTransducer ()

virtual bool	canImplicitlyHandleMissingFeatures ()
	Returns true iff this algorithm supports missing feature values. If it cannot, then an imputation filter will be used to predict missing values before any feature-vectors are passed to the algorithm.

virtual bool	canImplicitlyHandleNominalFeatures ()
	Returns true iff this algorithm can implicitly handle nominal features. If it cannot, then the GNominalToCat transform will be used to convert nominal features to continuous values before passing them to it.

virtual bool	canImplicitlyHandleNominalLabels ()
	Returns true iff this algorithm can implicitly handle nominal labels (a.k.a. classification). If it cannot, then the GNominalToCat transform will be used during training to convert nominal labels to continuous values, and to convert categorical predictions back to nominal labels.

double	crossValidate (const GMatrix &features, const GMatrix &labels, size_t nFolds, double *pOutSAE=NULL, RepValidateCallback pCB=NULL, size_t nRep=0, void *pThis=NULL)
	Performs n-fold cross-validation on the given features and labels and returns the sum-squared error, using trainAndTest for each fold. pCB is an optional callback for reporting intermediate statistics; pass NULL if you do not want intermediate reporting. nRep is the repetition number that is passed to the callback. pThis is an arbitrary pointer that is passed to the callback for you to use however you want; it does not affect this method. If pOutSAE is not NULL, the sum of absolute errors is stored there.

GTransducer &	operator= (const GTransducer &other)
	Throws an exception to prevent models from being copied by value.

GRand &	rand ()
	Returns a reference to the random number generator associated with this object. For example, you could use it to change the random seed, to make this algorithm behave differently. This might be important, for example, in an ensemble of learners.

double	repValidate (const GMatrix &features, const GMatrix &labels, size_t reps, size_t nFolds, double *pOutSAE=NULL, RepValidateCallback pCB=NULL, void *pThis=NULL)
	Performs cross-validation reps times and returns the average score. pCB is an optional callback for reporting intermediate statistics; pass NULL if you do not want intermediate reporting. pThis is an arbitrary pointer that is passed to the callback for you to use however you want; it does not affect this method. If pOutSAE is not NULL, the sum of absolute errors is stored there.

virtual bool	supportedFeatureRange (double *pOutMin, double *pOutMax)
	Returns true if this algorithm supports any feature value, or if it does not implicitly handle continuous features. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.

virtual bool	supportedLabelRange (double *pOutMin, double *pOutMax)
	Returns true if this algorithm supports any label value, or if it does not implicitly handle continuous labels. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.

std::unique_ptr< GMatrix >	transduce (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2)
	Predicts a set of labels to correspond with features2, such that these labels will be consistent with the patterns exhibited by features1 and labels1.

void	transductiveConfusionMatrix (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels, std::vector< GMatrix * > &stats)
	Makes a confusion matrix for a transduction algorithm.
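The crossValidate description (n folds, trainAndTest on each fold, summed error) can be sketched as follows. This is a standalone illustration over row indices, not the GTransducer code: it splits rows into contiguous folds without shuffling (the real method may partition differently), and the trainAndTest callback is a hypothetical stand-in for GSupervisedLearner::trainAndTest.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Assigns row indices 0..n-1 to nFolds contiguous folds whose sizes differ
// by at most one. No shuffling is done in this sketch.
std::vector<std::vector<std::size_t>> makeFolds(std::size_t n, std::size_t nFolds)
{
    std::vector<std::vector<std::size_t>> folds(nFolds);
    for (std::size_t f = 0; f < nFolds; ++f) {
        std::size_t begin = f * n / nFolds;
        std::size_t end = (f + 1) * n / nFolds;
        for (std::size_t i = begin; i < end; ++i)
            folds[f].push_back(i);
    }
    return folds;
}

// Each fold serves once as the test set while all other rows form the
// training set; the per-fold errors returned by the (hypothetical)
// trainAndTest callback are summed.
double crossValidateSketch(
    std::size_t n, std::size_t nFolds,
    const std::function<double(const std::vector<std::size_t>& trainRows,
                               const std::vector<std::size_t>& testRows)>& trainAndTest)
{
    std::vector<std::vector<std::size_t>> folds = makeFolds(n, nFolds);
    double total = 0.0;
    for (std::size_t f = 0; f < nFolds; ++f) {
        std::vector<std::size_t> trainRows;
        for (std::size_t g = 0; g < nFolds; ++g)
            if (g != f)
                trainRows.insert(trainRows.end(), folds[g].begin(), folds[g].end());
        total += trainAndTest(trainRows, folds[f]);
    }
    return total;
}
```

repValidate, as described above, would repeat this reps times (typically with a different row order each time) and average the results.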

Static Public Member Functions
static void	test ()
	Performs unit tests for this class. Throws an exception if there is a failure.

Static Public Member Functions inherited from GClasses::GSupervisedLearner
static void	test ()
	Runs some unit tests related to supervised learning. Throws an exception if any problems are found.

Protected Member Functions
virtual void	beginIncrementalLearningInner (const GRelation &featureRel, const GRelation &labelRel)
	See the comment for GIncrementalLearner::beginIncrementalLearningInner.

virtual bool	canImplicitlyHandleContinuousFeatures ()
	See the comment for GTransducer::canImplicitlyHandleContinuousFeatures.

virtual bool	canImplicitlyHandleContinuousLabels ()
	See the comment for GTransducer::canImplicitlyHandleContinuousLabels.

virtual void	trainInner (const GMatrix &features, const GMatrix &labels)
	See the comment for GSupervisedLearner::trainInner.

Protected Member Functions inherited from GClasses::GIncrementalLearner
virtual void	beginIncrementalLearningInner (const GMatrix &features, const GMatrix &labels)

Protected Member Functions inherited from GClasses::GSupervisedLearner
GDomNode *	baseDomNode (GDom *pDoc, const char *szClassName) const
	Child classes should use this in their implementation of serialize.

size_t	precisionRecallContinuous (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label)
	This is a helper method used by precisionRecall.

size_t	precisionRecallNominal (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label, int value)
	This is a helper method used by precisionRecall.

void	setupFilters (const GMatrix &features, const GMatrix &labels)
	This method determines which data filters (normalize, discretize, and/or nominal-to-cat) are needed and trains them.

virtual std::unique_ptr< GMatrix >	transduceInner (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2)
	See GTransducer::transduce.

Protected Attributes
double	m_equivalentSampleSize

size_t	m_nSampleCount

GNaiveBayesOutputAttr **	m_pOutputs

Protected Attributes inherited from GClasses::GSupervisedLearner
GRelation *	m_pRelFeatures

GRelation *	m_pRelLabels

Protected Attributes inherited from GClasses::GTransducer
GRand	m_rand

Additional Inherited Members
Static Protected Member Functions inherited from GClasses::GSupervisedLearner
static void	addInterpolatedFunction (double *pOut, size_t nOutVals, double *pIn, size_t nInVals)
	Adds the function pIn to pOut after interpolating pIn to be the same size as pOut. (This is a helper function used by precisionRecall.)

Constructor & Destructor Documentation

GClasses::GNaiveBayes::GNaiveBayes ( )

GClasses::GNaiveBayes::GNaiveBayes ( const GDomNode * pNode )

Load from a DOM.

virtual GClasses::GNaiveBayes::~GNaiveBayes ( )

virtual

Member Function Documentation

void GClasses::GNaiveBayes::autoTune	(	GMatrix &	features,
		GMatrix &	labels
	)

Uses cross-validation to find a set of parameters that works well with the provided data.
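The tunable parameter documented on this page is the equivalent sample size, so one plausible reading of autoTune is a grid search over candidate values scored by cross-validation. The sketch below shows only that selection step. The candidate grid and the cvError callback (which would train with a given parameter value and return its cross-validated error) are hypothetical, not the actual search space or internals of GNaiveBayes::autoTune.

```cpp
#include <functional>
#include <limits>
#include <vector>

// Returns the candidate value with the lowest cross-validated error.
// cvError is a hypothetical caller-supplied function; candidates must be
// non-empty. Ties keep the earliest candidate because of the strict "<".
double gridSearch(const std::vector<double>& candidates,
                  const std::function<double(double)>& cvError)
{
    double best = candidates.front();
    double bestErr = std::numeric_limits<double>::infinity();
    for (double c : candidates) {
        double err = cvError(c);
        if (err < bestErr) {
            bestErr = err;
            best = c;
        }
    }
    return best;
}
```

After the search, the chosen value would be applied with setEquivalentSampleSize before the final training run.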

virtual void GClasses::GNaiveBayes::beginIncrementalLearningInner	(	const GRelation &	featureRel,
		const GRelation &	labelRel
	)

protected virtual

See the comment for GIncrementalLearner::beginIncrementalLearningInner.

Implements GClasses::GIncrementalLearner.

virtual bool GClasses::GNaiveBayes::canImplicitlyHandleContinuousFeatures ( )

inline protected virtual

See the comment for GTransducer::canImplicitlyHandleContinuousFeatures.

Reimplemented from GClasses::GTransducer.

virtual bool GClasses::GNaiveBayes::canImplicitlyHandleContinuousLabels ( )

inline protected virtual

See the comment for GTransducer::canImplicitlyHandleContinuousLabels.

Reimplemented from GClasses::GTransducer.

virtual void GClasses::GNaiveBayes::clear ( )

virtual

See the comment for GSupervisedLearner::clear.

Implements GClasses::GSupervisedLearner.

double GClasses::GNaiveBayes::equivalentSampleSize ( )

inline

Returns the equivalent sample size (the number of samples of each possible value that are added by default to prevent zero probabilities).

virtual void GClasses::GNaiveBayes::predict	(	const GVec &	in,
		GVec &	out
	)

virtual

See the comment for GSupervisedLearner::predict.

Implements GClasses::GSupervisedLearner.
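For a single nominal label, predicting with naive Bayes usually means choosing the label value with the highest posterior score. The following is a minimal standalone sketch of that step from raw counts, computed in log space. The CountTables layout is hypothetical (not the internal GNaiveBayesOutputAttr structure), and adding ess to every count, including the label prior, is an assumption based on the description of setEquivalentSampleSize.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical count tables for a categorical naive Bayes model:
//   classCounts[c]          number of training rows with label c
//   featureCounts[j][c][v]  rows with label c whose feature j equals v
struct CountTables {
    std::vector<double> classCounts;
    std::vector<std::vector<std::vector<double>>> featureCounts;
};

// Returns the label value with the highest log posterior score for the
// nominal input x. ess pseudo-counts keep every factor strictly positive,
// so the logarithms are always finite when ess > 0.
std::size_t predictLabel(const CountTables& t, const std::vector<std::size_t>& x, double ess)
{
    double total = 0.0;
    for (double n : t.classCounts)
        total += n;
    const std::size_t numClasses = t.classCounts.size();
    std::size_t best = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < numClasses; ++c) {
        // Smoothed log prior.
        double score = std::log((t.classCounts[c] + ess) / (total + ess * numClasses));
        // Add the smoothed log likelihood of each observed feature value.
        for (std::size_t j = 0; j < x.size(); ++j) {
            const std::vector<double>& counts = t.featureCounts[j][c];
            score += std::log((counts[x[j]] + ess) / (t.classCounts[c] + ess * counts.size()));
        }
        if (score > bestScore) {
            bestScore = score;
            best = c;
        }
    }
    return best;
}
```

Summing logarithms instead of multiplying probabilities gives the same argmax while avoiding underflow when there are many features.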

virtual void GClasses::GNaiveBayes::predictDistribution	(	const GVec &	in,
		GPrediction *	pOut
	)

virtual

See the comment for GSupervisedLearner::predictDistribution.

Implements GClasses::GSupervisedLearner.
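A predicted distribution needs normalized probabilities rather than raw scores. A common way to get them from per-class log scores (log prior plus summed log likelihoods) is the log-sum-exp trick shown below. This standalone sketch returns a plain vector and does not use the GPrediction type; whether GNaiveBayes normalizes exactly this way is not stated on this page.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts per-class log scores into probabilities that sum to 1.
// Subtracting the maximum score before exponentiating keeps the largest
// term at exp(0) = 1, so the sum cannot underflow to zero even when every
// raw score is very negative. Assumes logScores is non-empty.
std::vector<double> normalizeLogScores(const std::vector<double>& logScores)
{
    const double maxScore = *std::max_element(logScores.begin(), logScores.end());
    std::vector<double> probs(logScores.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logScores.size(); ++i) {
        probs[i] = std::exp(logScores[i] - maxScore);
        sum += probs[i];
    }
    for (double& p : probs)
        p /= sum;
    return probs;
}
```

With scores of -1000 and -1001, exponentiating directly would give 0/0, while this version returns roughly 0.731 and 0.269.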

virtual GDomNode* GClasses::GNaiveBayes::serialize ( GDom * pDoc ) const

virtual

Marshal this object into a DOM, which can then be converted to a variety of serial formats.

Implements GClasses::GSupervisedLearner.

void GClasses::GNaiveBayes::setEquivalentSampleSize ( double d )

inline

Sets the equivalent sample size. To prevent a value that never appears in the training data from forcing the joint probability to zero, each value is given at least as much representation as specified here. (The default is 0.5, which is as if there were half of a sample for each value.)
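The description reads like additive (Laplace-style) smoothing: each possible value receives d pseudo-samples on top of its observed count. A minimal sketch of that estimate follows. The exact formula GNaiveBayes uses internally is an assumption here, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Estimates P(feature = v | label = c) with ess pseudo-samples added to
// each of the feature's possible values.
//   valueCounts[v]  training rows with label c and feature value v
// The smoothed estimates over all v still sum to 1.
double smoothedProbability(const std::vector<double>& valueCounts, std::size_t v, double ess)
{
    double total = 0.0;
    for (double n : valueCounts)
        total += n;
    return (valueCounts[v] + ess) / (total + ess * valueCounts.size());
}
```

With counts {3, 0} and the default of 0.5, the unseen value gets probability 0.5 / 4 = 0.125 instead of 0, so it lowers the joint probability without wiping it out.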

static void GClasses::GNaiveBayes::test ( )

static

Performs unit tests for this class. Throws an exception if there is a failure.

virtual void GClasses::GNaiveBayes::trainIncremental	(	const GVec &	in,
		const GVec &	out
	)

virtual

Adds a single training sample to the collection.

Implements GClasses::GIncrementalLearner.
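Naive Bayes trains incrementally because it only needs counts: adding one sample increments a few counters and never revisits earlier samples. The sketch below illustrates this with a hypothetical table layout (not the GNaiveBayes internals). Its constructor sizes the tables from the label and feature value counts, which is roughly the job beginIncrementalLearning must do before trainIncremental is called.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical count tables for a categorical naive Bayes model with one
// nominal label.
struct NaiveBayesCounts {
    std::size_t sampleCount = 0;
    std::vector<double> classCounts;                              // [label]
    std::vector<std::vector<std::vector<double>>> featureCounts;  // [feature][label][value]

    // valuesPerFeature[j] is the number of possible values of feature j.
    NaiveBayesCounts(std::size_t numLabels, const std::vector<std::size_t>& valuesPerFeature)
        : classCounts(numLabels, 0.0)
    {
        for (std::size_t k : valuesPerFeature)
            featureCounts.push_back(
                std::vector<std::vector<double>>(numLabels, std::vector<double>(k, 0.0)));
    }

    // Adds one training sample: x holds nominal feature values, y the label.
    void trainIncremental(const std::vector<std::size_t>& x, std::size_t y)
    {
        ++sampleCount;
        classCounts[y] += 1.0;
        for (std::size_t j = 0; j < x.size(); ++j)
            featureCounts[j][y][x[j]] += 1.0;
    }
};
```

Because the tables only grow by addition, batch training over a whole matrix gives the same counts as calling trainIncremental once per row.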

virtual void GClasses::GNaiveBayes::trainInner	(	const GMatrix &	features,
		const GMatrix &	labels
	)

protectedvirtual

See the comment for GSupervisedLearner::trainInner.

Implements GClasses::GSupervisedLearner.

virtual void GClasses::GNaiveBayes::trainSparse	(	GSparseMatrix &	features,
		GMatrix &	labels
	)

virtual

See the comment for GIncrementalLearner::trainSparse. This method assumes that all values in features are binary (0 or 1).

Implements GClasses::GIncrementalLearner.
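The binary-values assumption is what makes sparse training cheap. If each row lists only the features that are 1, only those entries need to be visited, and the count of zeros for any feature can be derived instead of stored. The sketch below shows the idea with a hypothetical row-as-index-list layout; it does not use the GSparseMatrix API.

```cpp
#include <cstddef>
#include <vector>

// Per-label counts for binary features.
struct BinaryCounts {
    std::vector<double> classCounts;              // [label]
    std::vector<std::vector<double>> onesCounts;  // [label][feature]: rows where feature == 1
};

// rows[i] lists the indices of the features equal to 1 in row i; every other
// feature is implicitly 0. labels[i] is the nominal label of row i.
BinaryCounts countSparse(const std::vector<std::vector<std::size_t>>& rows,
                         const std::vector<std::size_t>& labels,
                         std::size_t numFeatures, std::size_t numLabels)
{
    BinaryCounts c{std::vector<double>(numLabels, 0.0),
                   std::vector<std::vector<double>>(numLabels, std::vector<double>(numFeatures, 0.0))};
    for (std::size_t i = 0; i < rows.size(); ++i) {
        c.classCounts[labels[i]] += 1.0;
        // Only nonzero entries are visited, so the cost per row depends on
        // the number of 1s, not on numFeatures.
        for (std::size_t j : rows[i])
            c.onesCounts[labels[i]][j] += 1.0;
    }
    return c;
}

// Rows of label y where feature j is 0 are never counted explicitly.
double zerosCount(const BinaryCounts& c, std::size_t y, std::size_t j)
{
    return c.classCounts[y] - c.onesCounts[y][j];
}
```

This derivation only works because each feature has exactly two possible values; with arbitrary values, the zero entries would carry no information about which nonzero value was absent.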

Member Data Documentation

double GClasses::GNaiveBayes::m_equivalentSampleSize

protected

size_t GClasses::GNaiveBayes::m_nSampleCount

protected

GNaiveBayesOutputAttr** GClasses::GNaiveBayes::m_pOutputs

protected
