GClasses
|
A class for parsing CSV files (or tab-separated files, whitespace-separated files, etc.). This class does not support classic Mac line endings, so if your data comes from a Mac, replace every '\r' with '\n' before using this class.
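The line-ending conversion mentioned above can be sketched as a small standalone helper. The name `normalizeLineEndings` is hypothetical and not part of GClasses. Collapsing "\r\n" pairs into one '\n' is an assumption added here so that Windows files do not gain blank lines; the class description only mentions replacing '\r'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of GClasses): converts classic Mac ('\r')
// and Windows ("\r\n") line endings to Unix ('\n').
std::string normalizeLineEndings(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '\r')
        {
            // Treat "\r\n" as one line break; a lone '\r' also becomes '\n'.
            if (i + 1 < in.size() && in[i + 1] == '\n')
                ++i;
            out.push_back('\n');
        }
        else
        {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

The normalized text could then be handed to the string overload, parse(outMatrix, text.c_str(), text.size()).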
#include <GMatrix.h>
Public Member Functions | |
GCSVParser () | |
~GCSVParser () | |
void | columnNamesInFirstRow () |
Indicate that the first row specifies column names. | |
void | parse (GMatrix &outMatrix, const char *szFilename) |
Load the specified file, and parse it. | |
void | parse (GMatrix &outMatrix, const char *pString, size_t len) |
Parse the given string. | |
std::string & | report (size_t column) |
Return a string that reports the status of the specified column. (This should only be called after parsing.) | |
void | setClearlyNumericalThreshold (size_t n) |
Specify the number of unique numerical values before a column is deemed to be clearly numerical. | |
void | setMaxVals (size_t n) |
Specify the maximum number of values to allow in a categorical attribute. Parsing is aborted for any column that contains non-numerical values and has more than this number of unique values. | |
void | setNominalAttr (size_t attr) |
Indicate that the specified attribute should be treated as nominal. | |
void | setRealAttr (size_t attr) |
Indicate that the specified attribute should be treated as real. | |
void | setSeparator (char c) |
Specify the separating character. '\0' indicates that an arbitrary amount of whitespace is used for separation. | |
void | setTimeFormat (size_t attr, const char *szFormat) |
Specify that a certain attribute should be expected to be a date or time stamp that follows a given format. For example, szFormat might be "YYYY-MM-DD hh:mm:ss". | |
void | tolerant () |
Ignore inconsistencies in the number of values in each row. (Using this is very dangerous.) | |
Protected Attributes | |
size_t | m_clearlyNumericalThreshold |
bool | m_columnNamesInFirstRow |
std::map< size_t, std::string > | m_formats |
size_t | m_maxVals |
std::vector< std::string > | m_report |
char | m_separator |
std::map< size_t, size_t > | m_specifiedNominal |
std::map< size_t, size_t > | m_specifiedReal |
bool | m_tolerant |
GClasses::GCSVParser::GCSVParser | ( | ) |
GClasses::GCSVParser::~GCSVParser | ( | ) |
void GClasses::GCSVParser::columnNamesInFirstRow | ( | ) |
inline |
Indicate that the first row specifies column names.
void GClasses::GCSVParser::parse | ( | GMatrix & | outMatrix, |
const char * | szFilename | ||
) |
Load the specified file, and parse it.
void GClasses::GCSVParser::parse | ( | GMatrix & | outMatrix, |
const char * | pString, | ||
size_t | len | ||
) |
Parse the given string.
std::string& GClasses::GCSVParser::report | ( | size_t | column | ) |
inline |
Return a string that reports the status of the specified column. (This should only be called after parsing.)
void GClasses::GCSVParser::setClearlyNumericalThreshold | ( | size_t | n | ) |
inline |
Specify the number of unique numerical values before a column is deemed to be clearly numerical.
void GClasses::GCSVParser::setMaxVals | ( | size_t | n | ) |
inline |
Specify the maximum number of values to allow in a categorical attribute. Parsing is aborted for any column that contains non-numerical values and has more than this number of unique values.
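The rule described for setMaxVals can be illustrated with a standalone sketch. This is not the library's implementation: `ColumnKind`, `looksNumeric`, and `classifyColumn` are hypothetical names, numeric detection via `std::strtod` is an assumption, and the clearly-numerical threshold is deliberately left out because its exact interaction is not described here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>
#include <string>
#include <vector>

// Hypothetical outcome of inspecting one column.
enum class ColumnKind { Real, Nominal, Aborted };

// Assumption: a value counts as numerical if strtod consumes all of it.
bool looksNumeric(const std::string& s)
{
    if (s.empty())
        return false;
    char* end = nullptr;
    std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Sketch of the documented rule: a column with non-numerical values and
// more than maxVals unique values is aborted; otherwise it is nominal.
ColumnKind classifyColumn(const std::vector<std::string>& values, std::size_t maxVals)
{
    std::set<std::string> unique(values.begin(), values.end());
    bool allNumeric = true;
    for (const std::string& v : unique)
    {
        if (!looksNumeric(v))
        {
            allNumeric = false;
            break;
        }
    }
    if (allNumeric)
        return ColumnKind::Real;
    return unique.size() > maxVals ? ColumnKind::Aborted : ColumnKind::Nominal;
}
```

With maxVals set to 2, a column holding "red", "blue", "red" would be nominal under this sketch, while a column holding "a", "b", "c" would be aborted.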
void GClasses::GCSVParser::setNominalAttr | ( | size_t | attr | ) |
Indicate that the specified attribute should be treated as nominal.
void GClasses::GCSVParser::setRealAttr | ( | size_t | attr | ) |
Indicate that the specified attribute should be treated as real.
void GClasses::GCSVParser::setSeparator | ( | char | c | ) |
inline |
Specify the separating character. '\0' indicates that an arbitrary amount of whitespace is used for separation.
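To make the separator convention concrete, here is a minimal standalone sketch of splitting one line into fields. The function `splitFields` is hypothetical and is not how GCSVParser is implemented. It assumes no quoting or escaping and only mirrors the documented meaning of '\0' as "any run of whitespace".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into fields.
// sep == '\0' means any run of whitespace separates fields.
std::vector<std::string> splitFields(const std::string& line, char sep)
{
    std::vector<std::string> fields;
    if (sep == '\0')
    {
        std::size_t i = 0;
        while (i < line.size())
        {
            // Skip the whitespace run, then collect one token.
            while (i < line.size() && std::isspace(static_cast<unsigned char>(line[i])))
                ++i;
            std::size_t start = i;
            while (i < line.size() && !std::isspace(static_cast<unsigned char>(line[i])))
                ++i;
            if (i > start)
                fields.push_back(line.substr(start, i - start));
        }
        return fields;
    }
    // Single-character separator: empty fields are preserved.
    std::size_t start = 0;
    while (true)
    {
        std::size_t pos = line.find(sep, start);
        if (pos == std::string::npos)
        {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}
```

Note the difference in this sketch: with ',' the input "a,b,,c" yields four fields including an empty one, whereas with '\0' consecutive whitespace never produces an empty field.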
void GClasses::GCSVParser::setTimeFormat | ( | size_t | attr, |
const char * | szFormat | ||
) |
Specify that a certain attribute should be expected to be a date or time stamp that follows a given format. For example, szFormat might be "YYYY-MM-DD hh:mm:ss".
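As an illustration of what a format such as "YYYY-MM-DD hh:mm:ss" implies, the sketch below matches a timestamp against a pattern character by character. `TimeFields` and `parseTimestamp` are hypothetical names. The interpretation of the letters Y, M, D, h, m, s as digit placeholders is an assumption drawn from the example format, and how the library converts a timestamp into a matrix value is not covered here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical container for the parsed components.
struct TimeFields
{
    int year = 0, month = 0, day = 0, hour = 0, minute = 0, second = 0;
};

// Returns true if text matches format. Pattern letters consume one digit
// each; every other character must match literally. No range checks are
// performed (e.g. month 13 is accepted).
bool parseTimestamp(const std::string& text, const std::string& format, TimeFields& out)
{
    if (text.size() != format.size())
        return false;
    TimeFields t;
    for (std::size_t i = 0; i < format.size(); ++i)
    {
        int* field = nullptr;
        switch (format[i])
        {
            case 'Y': field = &t.year; break;
            case 'M': field = &t.month; break;
            case 'D': field = &t.day; break;
            case 'h': field = &t.hour; break;
            case 'm': field = &t.minute; break;
            case 's': field = &t.second; break;
            default: break;
        }
        if (field)
        {
            if (!std::isdigit(static_cast<unsigned char>(text[i])))
                return false;
            *field = *field * 10 + (text[i] - '0');
        }
        else if (text[i] != format[i])
        {
            return false;
        }
    }
    out = t;
    return true;
}
```

A value that does not fit the pattern, such as one using '/' where the format has '-', is rejected and leaves the output untouched.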
void GClasses::GCSVParser::tolerant | ( | ) |
inline |
Ignore inconsistencies in the number of values in each row. (Using this is very dangerous.)
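Since tolerant mode is flagged as dangerous, one option is to audit the data before enabling it. The sketch below is a standalone pre-check, not part of GClasses: `findInconsistentRows` is a hypothetical name, and it assumes '\n' line endings, a single-character separator, and no quoted fields.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pre-check: returns the 0-based indices of rows whose field
// count differs from that of the first row.
std::vector<std::size_t> findInconsistentRows(const std::string& text, char sep)
{
    std::vector<std::size_t> bad;
    std::istringstream in(text);
    std::string line;
    std::size_t expected = 0;
    std::size_t row = 0;
    while (std::getline(in, line))
    {
        // Field count is one more than the number of separators.
        std::size_t fields =
            1 + static_cast<std::size_t>(std::count(line.begin(), line.end(), sep));
        if (row == 0)
            expected = fields;
        else if (fields != expected)
            bad.push_back(row);
        ++row;
    }
    return bad;
}
```

If this returns an empty vector, the rows are consistent under these assumptions and tolerant mode should not be needed.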
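size_t GClasses::GCSVParser::m_clearlyNumericalThreshold |
protected |
bool GClasses::GCSVParser::m_columnNamesInFirstRow |
protected |
std::map<size_t, std::string> GClasses::GCSVParser::m_formats |
protected |
size_t GClasses::GCSVParser::m_maxVals |
protected |
std::vector<std::string> GClasses::GCSVParser::m_report |
protected |
char GClasses::GCSVParser::m_separator |
protected |
std::map<size_t, size_t> GClasses::GCSVParser::m_specifiedNominal |
protected |
std::map<size_t, size_t> GClasses::GCSVParser::m_specifiedReal |
protected |
bool GClasses::GCSVParser::m_tolerant |
protected |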