GClasses::GTokenizer Class Reference

Detailed Description

This is a simple tokenizer that reads a file one token at a time. To use it, derive a child class that defines the character sets you need. Example:

class MyTokenizer : public GTokenizer
{
public:
    GCharSet m_whitespace, m_alphanum, m_float, m_commanewline;

    MyTokenizer(const char* szFilename)
    : GTokenizer(szFilename), m_whitespace("\t\n\r "), m_alphanum("a-zA-Z0-9"), m_float("-.,0-9e"), m_commanewline(",\n") {}

    virtual ~MyTokenizer() {}
};
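
The character-set strings in the example above ("a-zA-Z0-9", "-.,0-9e") read like range specifications. The following standalone sketch shows one way such a string could be expanded into a membership table. It is not the real GCharSet: the class name CharSetSketch and its find method are hypothetical, and it assumes that a '-' between two characters denotes an inclusive range while a '-' at the start or end of the string is a literal character.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a character set; not the real GCharSet.
class CharSetSketch {
public:
    // Assumed syntax: "a-z" is an inclusive range; a '-' that is first or
    // last in the spec is treated as a literal character.
    explicit CharSetSketch(const std::string& spec) {
        for (std::size_t i = 0; i < spec.size(); ++i) {
            unsigned char lo = static_cast<unsigned char>(spec[i]);
            if (i + 2 < spec.size() && spec[i + 1] == '-') {
                unsigned char hi = static_cast<unsigned char>(spec[i + 2]);
                for (unsigned int c = lo; c <= hi; ++c) m_bits.set(c);
                i += 2;  // consumed "lo-hi"
            } else {
                m_bits.set(lo);
            }
        }
    }

    // Returns true if c is a member of the set.
    bool find(char c) const { return m_bits.test(static_cast<unsigned char>(c)); }

private:
    std::bitset<256> m_bits;
};
```

A table lookup like this keeps membership tests constant-time, which matters when every character of a file is checked against one or more sets.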

#include <GTokenizer.h>

Public Member Functions

 GTokenizer (const char *szFilename)
 Opens the specified file.
 
 GTokenizer (const char *pFile, size_t len)
 Uses the provided buffer of data. (If len is 0, then it will read until a null-terminator is found.)
 
virtual ~GTokenizer ()
 
void advance (size_t n)
 Advances past the next 'n' characters. (Stops if the end-of-file is reached.)
 
char * appendToToken (const char *string)
 Appends a string to the current token (without modifying the file), and returns the full modified token.
 
size_t col ()
 Returns the current column index, which is the number of characters that have been read since the last newline character, plus 1.
 
void expect (const char *szString)
 Reads past the specified string of characters. If the characters that are read from the file do not exactly match those in the string, an exception is thrown.
 
bool has_more ()
 Returns whether there is more data to be read.
 
size_t line ()
 Returns the current line number. (Begins at 1. Each time a '\n' is encountered, the line number is incremented. Mac line-endings do not increment the line number.)
 
char * nextArg (GCharSet &delimiters, char escapeChar= '\\')
 Returns the next token defined by the given delimiters.
 
char * nextUntil (GCharSet &delimeters, size_t minLen=1)
 Reads until the next character would be one of the specified delimiters. The delimiter character is not read. Throws an exception if fewer than minLen characters are read. The token is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.
 
char * nextUntilNotEscaped (char escapeChar, GCharSet &delimeters)
 Reads until the next character would be one of the specified delimiters and the current character is not escapeChar. The token is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.
 
char * nextWhile (GCharSet &set, size_t minLen=1)
 Reads while the next character is one of the specified characters. Throws an exception if fewer than minLen characters are read. The token is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.
 
char peek ()
 Returns the next character in the stream. Returns '\0' if there are no more characters in the stream. (This could theoretically be ambiguous if the next character in the stream is '\0', but presumably this class is mostly used for parsing text files, and that character should not occur in a text file.)
 
char peek (size_t n)
 Peek up to GTOKENIZER_MAX_LOOKAHEAD characters ahead. If n=0, returns the next character to be read. If n=1, returns the second character ahead to be read, and so on. If n>=GTOKENIZER_MAX_LOOKAHEAD, throws an exception.
 
void skip (GCharSet &delimeters)
 Reads past any characters specified in the list of delimiters. If szDelimeters is NULL, then any characters <= ' ' are considered to be delimiters. (This method is similar to nextWhile, except that it does not buffer the characters it reads.)
 
void skipTo (GCharSet &delimeters)
 Skips until the next character is one of the delimiters. (This method is the same as nextUntil, except that it does not buffer what it reads.)
 
size_t tokenLength ()
 Returns the length of the last token that was returned.
 
char * trim (GCharSet &set)
 Returns the previously-returned token, except with any of the specified characters trimmed off of both the beginning and end of the token. For example, this method could be used to convert " tok " to "tok". (Calling this method will not change the value returned by tokenLength.)
 

Protected Member Functions

void bufferChar (char c)
 Read the next character into the token buffer.
 
char get ()
 Returns the next character in the stream. If the next character is EOF, then it returns '\0'.
 
void growBuf ()
 Double the size of the token buffer.
 
char * nullTerminate ()
 Add a '\0' to the end of the token buffer and return the token buffer.
 

Protected Attributes

size_t m_line
 
size_t m_lineCol
 
char * m_pBufEnd
 
char * m_pBufPos
 
char * m_pBufStart
 
std::istream * m_pStream
 
char m_q [GTOKENIZER_MAX_LOOKAHEAD]
 
size_t m_qCount
 
size_t m_qPos
 

Constructor & Destructor Documentation

GClasses::GTokenizer::GTokenizer ( const char *  szFilename)

Opens the specified file.

GClasses::GTokenizer::GTokenizer ( const char *  pFile,
size_t  len 
)

Uses the provided buffer of data. (If len is 0, then it will read until a null-terminator is found.)

virtual GClasses::GTokenizer::~GTokenizer ( )
virtual

Member Function Documentation

void GClasses::GTokenizer::advance ( size_t  n)

Advances past the next 'n' characters. (Stops if the end-of-file is reached.)

char* GClasses::GTokenizer::appendToToken ( const char *  string)

Appends a string to the current token (without modifying the file), and returns the full modified token.

void GClasses::GTokenizer::bufferChar ( char  c)
protected

Read the next character into the token buffer.

size_t GClasses::GTokenizer::col ( )

Returns the current column index, which is the number of characters that have been read since the last newline character, plus 1.

void GClasses::GTokenizer::expect ( const char *  szString)

Reads past the specified string of characters. If the characters that are read from the file do not exactly match those in the string, an exception is thrown.
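
The matching rule described above can be illustrated with a short standalone sketch. It works on a plain std::istream rather than a GTokenizer; the function name expectSketch and the use of std::runtime_error are assumptions made for illustration, since this page does not say which exception type the library throws.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring the documented behaviour of expect():
// consume exactly the characters in 'expected', or throw on any mismatch.
void expectSketch(std::istream& in, const std::string& expected) {
    for (char want : expected) {
        int got = in.get();
        if (got == std::istream::traits_type::eof() || static_cast<char>(got) != want)
            throw std::runtime_error("expected \"" + expected + "\"");
    }
}
```

For instance, calling expectSketch(in, "key=") on a stream holding "key=value" would leave 'v' as the next character to be read.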

char GClasses::GTokenizer::get ( )
protected

Returns the next character in the stream. If the next character is EOF, then it returns '\0'.

void GClasses::GTokenizer::growBuf ( )
protected

Double the size of the token buffer.

bool GClasses::GTokenizer::has_more ( )

Returns whether there is more data to be read.

size_t GClasses::GTokenizer::line ( )

Returns the current line number. (Begins at 1. Each time a '\n' is encountered, the line number is incremented. Mac line-endings do not increment the line number.)
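
The line and column rules in this reference (line numbers start at 1, only '\n' advances the line, and the column is the count of characters read since the last newline plus 1) can be sketched as a small position tracker. The struct PositionSketch and its member names are hypothetical; the sketch only mirrors the documented counting rules and does not claim to match the library's internals.

```cpp
#include <cstddef>
#include <string>

// Hypothetical position tracker; names are illustrative only.
struct PositionSketch {
    std::size_t line = 1;     // line numbers begin at 1
    std::size_t lineCol = 0;  // characters read since the last '\n'

    // Updates the position for one consumed character.
    void consume(char c) {
        if (c == '\n') { ++line; lineCol = 0; }
        else           { ++lineCol; }  // '\r' is counted like any other character
    }

    // Column index as documented: characters since the last newline, plus 1.
    std::size_t col() const { return lineCol + 1; }
};
```

Because a lone '\r' is treated as an ordinary character here, a file with classic Mac line endings would be reported as a single long line, which matches the note above.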

char* GClasses::GTokenizer::nextArg ( GCharSet & delimiters,
char  escapeChar = '\\' 
)

Returns the next token delimited by the given delimiters. Quoting with " or ' is allowed, as is escaping with an escape character.

The token may include delimiter characters if it is enclosed in quotes or the delimiters are escaped.

If the next token begins with single or double quotes, then the token will be delimited by the quotes. If a newline character or the end-of-file is encountered before the matching quote, then an exception is thrown. The quotation marks are included in the token. The escape character is ignored inside quotes (unlike what would happen in C++).

If the first character of the token is not an apostrophe or quotation mark, then the escape character can be used to escape any special characters. That is, if the escape character appears, then the next character is interpreted to be part of the token. The escape character is consumed but not included in the token. Thus, if the input is (The \\rain\\ in \"spain\") (not including the parentheses) and the escapeChar is '\', then the token read will be (The \rain\ in "spain").

No token may extend over multiple lines. The newline character therefore acts as an unescapable delimiter, no matter what set of delimiters is passed to the function.

Parameters
    delimiters    the set of delimiters used to separate tokens
    escapeChar    the character that can be used to escape delimiters when quoting is not active

Returns
    a pointer to an internal character buffer containing the null-terminated token

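
The quoting and escaping rules above can be sketched as a standalone function over a std::string. This is not the library's implementation: the name nextArgSketch, the use of std::runtime_error, and the std::string-based delimiter set are all assumptions. The sketch also assumes that the terminating delimiter is left unread and that leading delimiters are not skipped, since neither point is stated above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical re-implementation of the documented nextArg() rules over a
// std::string; 'pos' is advanced past the characters that were consumed.
std::string nextArgSketch(const std::string& in, std::size_t& pos,
                          const std::string& delimiters, char escapeChar = '\\') {
    std::string token;
    if (pos < in.size() && (in[pos] == '"' || in[pos] == '\'')) {
        const char quote = in[pos];
        token += in[pos++];  // the opening quote is part of the token
        while (true) {
            if (pos >= in.size() || in[pos] == '\n')
                throw std::runtime_error("unterminated quoted token");
            const char c = in[pos++];
            token += c;  // escapeChar has no special meaning inside quotes
            if (c == quote) return token;
        }
    }
    while (pos < in.size()) {
        const char c = in[pos];
        if (c == '\n' || delimiters.find(c) != std::string::npos)
            break;  // the delimiter itself is left unread (an assumption)
        ++pos;
        if (c == escapeChar) {
            // The escape character is dropped and the next character is kept
            // literally. A newline cannot be escaped.
            if (pos < in.size() && in[pos] != '\n') token += in[pos++];
        } else {
            token += c;
        }
    }
    return token;
}
```

With a comma as the only delimiter, the input from the example above, (The \\rain\\ in \"spain\"), yields the single token (The \rain\ in "spain"), and the input ('a,b',c) yields ('a,b') as its first token with the quotes included.
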
char* GClasses::GTokenizer::nextUntil ( GCharSet & delimeters,
size_t  minLen = 1 
)

Reads until the next character would be one of the specified delimiters. The delimiter character is not read. Throws an exception if fewer than minLen characters are read. The token is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.

char* GClasses::GTokenizer::nextUntilNotEscaped ( char  escapeChar,
GCharSet & delimeters 
)

Reads until the next character would be one of the specified delimiters and the current character is not escapeChar. The token is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.
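
One plausible reading of the rule above is sketched below: reading stops before a delimiter unless the character read just before it was the escape character. The function name nextUntilNotEscapedSketch and the std::string-based delimiter set are hypothetical, and the sketch assumes that the escape character stays in the returned token, which is not stated above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: stop before a delimiter unless the character just
// read was escapeChar. Assumption: the escape character stays in the token.
std::string nextUntilNotEscapedSketch(const std::string& in, std::size_t& pos,
                                      char escapeChar, const std::string& delimiters) {
    std::string token;
    char prev = '\0';
    while (pos < in.size()) {
        const char c = in[pos];
        const bool isDelimiter = delimiters.find(c) != std::string::npos;
        if (isDelimiter && prev != escapeChar) break;  // unescaped delimiter ends the token
        token += c;
        prev = c;
        ++pos;
    }
    return token;
}
```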

char* GClasses::GTokenizer::nextWhile ( GCharSet & set,
size_t  minLen = 1 
)

Reads while the character is one of the specified characters. Throws an exception if fewer than minLen characters are read. The token returned by this method will have been copied into an internal buffer, null-terminated, and a pointer to that buffer is returned.
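
nextWhile and its counterpart nextUntil differ only in the loop condition: one reads while characters belong to a set, the other reads until a character belongs to a set. The standalone sketch below illustrates both, including the minLen check. The function names, the std::string-based character sets, and the std::runtime_error exception are assumptions for illustration, not the library's API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helpers over a std::string; 'pos' plays the role of the
// read position. Neither function consumes the character that stops it.
std::string nextWhileSketch(const std::string& in, std::size_t& pos,
                            const std::string& set, std::size_t minLen = 1) {
    const std::size_t start = pos;
    while (pos < in.size() && set.find(in[pos]) != std::string::npos) ++pos;
    if (pos - start < minLen) throw std::runtime_error("token too short");
    return in.substr(start, pos - start);
}

std::string nextUntilSketch(const std::string& in, std::size_t& pos,
                            const std::string& delimiters, std::size_t minLen = 1) {
    const std::size_t start = pos;
    while (pos < in.size() && delimiters.find(in[pos]) == std::string::npos) ++pos;
    if (pos - start < minLen) throw std::runtime_error("token too short");
    return in.substr(start, pos - start);
}
```

Alternating the two calls is a common way to walk through input such as "width = 42": nextUntilSketch with " =" reads the name, nextWhileSketch with " =" reads the separator, and nextWhileSketch with the digits reads the value. Passing minLen as 0 allows an empty token instead of an exception.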

char* GClasses::GTokenizer::nullTerminate ( )
protected

Add a '\0' to the end of the token buffer and return the token buffer.

char GClasses::GTokenizer::peek ( )

Returns the next character in the stream. Returns '\0' if there are no more characters in the stream. (This could theoretically be ambiguous if the next character in the stream is '\0', but presumably this class is mostly used for parsing text files, and that character should not occur in a text file.)

char GClasses::GTokenizer::peek ( size_t  n)

Peek up to GTOKENIZER_MAX_LOOKAHEAD characters ahead. If n=0, returns the next character to be read. If n=1, returns the second character ahead to be read, and so on. If n>=GTOKENIZER_MAX_LOOKAHEAD, throws an exception.
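
Bounded lookahead of this kind is often built on a small circular queue in front of the stream, and the protected members m_q, m_qPos and m_qCount suggest something similar here. The sketch below is a hypothetical illustration of that technique, not the library's code: the class LookaheadSketch, the constant kMaxLookahead (standing in for GTOKENIZER_MAX_LOOKAHEAD, whose value is not given on this page) and the use of std::out_of_range are all assumptions.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>

// Hypothetical lookahead reader backed by a circular queue.
class LookaheadSketch {
public:
    static constexpr std::size_t kMaxLookahead = 8;  // assumed value

    explicit LookaheadSketch(std::istream& in) : m_in(in), m_qPos(0), m_qCount(0) {}

    // n = 0 is the next character to be read, n = 1 the one after it, etc.
    char peek(std::size_t n) {
        if (n >= kMaxLookahead) throw std::out_of_range("lookahead too far");
        while (m_qCount <= n) {  // pull characters until slot n is filled
            int c = m_in.get();
            if (c == std::istream::traits_type::eof()) return '\0';
            m_q[(m_qPos + m_qCount) % kMaxLookahead] = static_cast<char>(c);
            ++m_qCount;
        }
        return m_q[(m_qPos + n) % kMaxLookahead];
    }

    // Consumes one character, taking it from the queue when possible.
    char get() {
        if (m_qCount == 0) {
            int c = m_in.get();
            return c == std::istream::traits_type::eof() ? '\0' : static_cast<char>(c);
        }
        char c = m_q[m_qPos];
        m_qPos = (m_qPos + 1) % kMaxLookahead;
        --m_qCount;
        return c;
    }

private:
    std::istream& m_in;
    char m_q[kMaxLookahead];
    std::size_t m_qPos, m_qCount;
};
```

Capping n below the queue size guarantees that the queue can never overflow, which is one reason a tokenizer might throw for larger values rather than grow the buffer.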

void GClasses::GTokenizer::skip ( GCharSet & delimeters)

Reads past any characters specified in the list of delimiters. If szDelimeters is NULL, then any characters <= ' ' are considered to be delimiters. (This method is similar to nextWhile, except that it does not buffer the characters it reads.)

void GClasses::GTokenizer::skipTo ( GCharSet & delimeters)

Skips until the next character is one of the delimiters. (This method is the same as nextUntil, except that it does not buffer what it reads.)

size_t GClasses::GTokenizer::tokenLength ( )

Returns the length of the last token that was returned.

char* GClasses::GTokenizer::trim ( GCharSet & set)

Returns the previously-returned token, except with any of the specified characters trimmed off of both the beginning and end of the token. For example, this method could be used to convert " tok " to "tok". (Calling this method will not change the value returned by tokenLength.)
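
The trimming behaviour described above can be illustrated with a standalone helper. The name trimSketch is hypothetical, and unlike trim it takes the token and the character set as plain strings and returns a new std::string rather than a pointer into an internal buffer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strips any characters found in 'set' from both ends
// of 'token' and returns the result as a new string.
std::string trimSketch(const std::string& token, const std::string& set) {
    const std::size_t first = token.find_first_not_of(set);
    if (first == std::string::npos) return "";  // nothing but trimmable characters
    const std::size_t last = token.find_last_not_of(set);
    return token.substr(first, last - first + 1);
}
```

Characters from the set that appear in the middle of the token are left alone, so trimSketch("a b", " ") returns the input unchanged.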

Member Data Documentation

size_t GClasses::GTokenizer::m_line
protected
size_t GClasses::GTokenizer::m_lineCol
protected
char* GClasses::GTokenizer::m_pBufEnd
protected
char* GClasses::GTokenizer::m_pBufPos
protected
char* GClasses::GTokenizer::m_pBufStart
protected
std::istream* GClasses::GTokenizer::m_pStream
protected
char GClasses::GTokenizer::m_q[GTOKENIZER_MAX_LOOKAHEAD]
protected
size_t GClasses::GTokenizer::m_qCount
protected
size_t GClasses::GTokenizer::m_qPos
protected