Detailed Description

This is a simple tokenizer that reads a file one token at a time. To use it, make a child class that defines the character sets you need. Example:

class MyTokenizer : public GTokenizer
{
public:
    GCharSet m_whitespace, m_alphanum, m_float, m_commanewline;

    MyTokenizer(const char* szFilename)
    : GTokenizer(szFilename),
      m_whitespace("\t\n\r "), m_alphanum("a-zA-Z0-9"),
      m_float("-.,0-9e"), m_commanewline(",\n")
    {}

    virtual ~MyTokenizer() {}
};

#include <GTokenizer.h>
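
The character-set strings in the example above appear to use range notation such as "a-z". The following is a minimal, self-contained sketch of how such a specification could be expanded into a membership table. It is not the GCharSet implementation: SketchCharSet and contains are hypothetical names, and the rule that '-' is taken literally when it is not between two characters is an assumption.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a character set; not the real GCharSet.
class SketchCharSet {
public:
    explicit SketchCharSet(const std::string& spec) {
        for (std::size_t i = 0; i < spec.size(); ++i) {
            const unsigned char lo = static_cast<unsigned char>(spec[i]);
            // Assumed rule: "x-y" is a range only when '-' sits between two characters.
            if (i + 2 < spec.size() && spec[i + 1] == '-') {
                const unsigned char hi = static_cast<unsigned char>(spec[i + 2]);
                for (unsigned int c = lo; c <= hi; ++c)
                    m_bits.set(c);
                i += 2; // skip the '-' and the upper bound
            } else {
                m_bits.set(lo); // a single literal character
            }
        }
    }

    // Returns true if c belongs to the set.
    bool contains(char c) const {
        return m_bits.test(static_cast<unsigned char>(c));
    }

private:
    std::bitset<256> m_bits;
};
```

Under these assumptions, "-.,0-9e" contains the literal '-', '.', ',' and 'e' plus the digits, which matches the intent of m_float in the example.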

Public Member Functions
	GTokenizer (const char *szFilename)
	Opens the specified filename. charSets is a class that inherits from GCharSetHolder. More...

	GTokenizer (const char *pFile, size_t len)
	Uses the provided buffer of data. (If len is 0, then it will read until a null-terminator is found.) More...

virtual	~GTokenizer ()

void	advance (size_t n)
	Advances past the next 'n' characters. (Stops if the end-of-file is reached.) More...

char *	appendToToken (const char *string)
	Appends a string to the current token (without modifying the file), and returns the full modified token. More...

size_t	col ()
	Returns the current column index, which is the number of characters that have been read since the last newline character, plus 1. More...

void	expect (const char *szString)
	Reads past the specified string of characters. If the characters that are read from the file do not exactly match those in the string, an exception is thrown. More...

bool	has_more ()
	Returns whether there is more data to be read. More...

size_t	line ()
	Returns the current line number. (Begins at 1. Each time a '\n' is encountered, the line number is incremented. Mac line-endings do not increment the line number.) More...

char *	nextArg (GCharSet &delimiters, char escapeChar= '\\')
	Returns the next token defined by the given delimiters. More...

char *	nextUntil (GCharSet &delimeters, size_t minLen=1)
	Reads until the next character would be one of the specified delimiters. The delimiter character is not read. Throws an exception if fewer than minLen characters are read. The token returned by this method will have been copied into an internal buffer, null-terminated, and a pointer to that buffer is returned. More...

char *	nextUntilNotEscaped (char escapeChar, GCharSet &delimeters)
	Reads until the next character would be one of the specified delimiters, and the current character is not escapeChar. The token returned by this method will have been copied into an internal buffer, null-terminated, and a pointer to that buffer is returned. More...

char *	nextWhile (GCharSet &set, size_t minLen=1)
	Reads while the character is one of the specified characters. Throws an exception if fewer than minLen characters are read. The token returned by this method will have been copied into an internal buffer, null-terminated, and a pointer to that buffer is returned. More...

char	peek ()
	Returns the next character in the stream. Returns '\0' if there are no more characters in the stream. (This could theoretically be ambiguous if the next character in the stream is '\0', but presumably this class is mostly used for parsing text files, and that character should not occur in a text file.) More...

char	peek (size_t n)
	Peek up to GTOKENIZER_MAX_LOOKAHEAD characters ahead. If n=0, returns the next character to be read. If n=1, returns the second character ahead to be read, and so on. If n>=GTOKENIZER_MAX_LOOKAHEAD, throws an exception. More...

void	skip (GCharSet &delimeters)
	Reads past any characters specified in the list of delimiters. If szDelimeters is NULL, then any characters <= ' ' are considered to be delimiters. (This method is similar to nextWhile, except that it does not buffer the characters it reads.) More...

void	skipTo (GCharSet &delimeters)
	Skip until the next character is one of the delimiters. (This method is the same as nextUntil, except that it does not buffer what it reads.) More...

size_t	tokenLength ()
	Returns the length of the last token that was returned. More...

char *	trim (GCharSet &set)
	Returns the previously-returned token, except with any of the specified characters trimmed off of both the beginning and end of the token. For example, this method could be used to convert " tok " to "tok". (Calling this method will not change the value returned by tokenLength.) More...

Protected Member Functions
void	bufferChar (char c)
	Read the next character into the token buffer. More...

char	get ()
	Returns the next character in the stream. If the next character is EOF, then it returns '\0'. More...

void	growBuf ()
	Double the size of the token buffer. More...

char *	nullTerminate ()
	Add a '\0' to the end of the token buffer and return the token buffer. More...

Protected Attributes
size_t	m_line

size_t	m_lineCol

char *	m_pBufEnd

char *	m_pBufPos

char *	m_pBufStart

std::istream *	m_pStream

char	m_q [GTOKENIZER_MAX_LOOKAHEAD]

size_t	m_qCount

size_t	m_qPos

Constructor & Destructor Documentation

GClasses::GTokenizer::GTokenizer ( const char * szFilename )

Opens the specified filename. charSets is a class that inherits from GCharSetHolder.

GClasses::GTokenizer::GTokenizer	(	const char *	pFile,
		size_t	len
	)

Uses the provided buffer of data. (If len is 0, then it will read until a null-terminator is found.)

virtual GClasses::GTokenizer::~GTokenizer ( )

virtual

Member Function Documentation

void GClasses::GTokenizer::advance ( size_t n )

Advances past the next 'n' characters. (Stops if the end-of-file is reached.)

char* GClasses::GTokenizer::appendToToken ( const char * string )

Appends a string to the current token (without modifying the file), and returns the full modified token.

void GClasses::GTokenizer::bufferChar ( char c )

protected

Read the next character into the token buffer.

size_t GClasses::GTokenizer::col ( )

Returns the current column index, which is the number of characters that have been read since the last newline character, plus 1.

void GClasses::GTokenizer::expect ( const char * szString )

Reads past the specified string of characters. If the characters that are read from the file do not exactly match those in the string, an exception is thrown.
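
As an illustration of the behavior described above, here is a minimal sketch that works on a std::string and a position index instead of a file. sketchExpect is a hypothetical helper, not part of GTokenizer, and the choice to leave already-matched characters consumed when a mismatch occurs is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: advances pos past 'expected', or throws on the first mismatch.
inline void sketchExpect(const std::string& input, std::size_t& pos,
                         const std::string& expected) {
    for (char want : expected) {
        // Running out of input counts as a mismatch.
        if (pos >= input.size() || input[pos] != want)
            throw std::runtime_error("expected \"" + expected + "\"");
        ++pos; // this character matched, so consume it
    }
}
```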

char GClasses::GTokenizer::get ( )

protected

Returns the next character in the stream. If the next character is EOF, then it returns '\0'.

void GClasses::GTokenizer::growBuf ( )

protected

Double the size of the token buffer.

bool GClasses::GTokenizer::has_more ( )

Returns whether there is more data to be read.

size_t GClasses::GTokenizer::line ( )

Returns the current line number. (Begins at 1. Each time a '\n' is encountered, the line number is incremented. Mac line-endings do not increment the line number.)
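
The line and column rules documented for line() and col() can be sketched as a small position tracker. This is an illustration of the documented counting rules only, not the class's own code; SketchPosition and positionAfter are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical position tracker that mirrors the documented rules.
struct SketchPosition {
    std::size_t line = 1;    // line numbers begin at 1
    std::size_t lineCol = 0; // characters read since the last '\n'

    void consume(char c) {
        if (c == '\n') {
            ++line;          // only '\n' starts a new line
            lineCol = 0;
        } else {
            ++lineCol;       // '\r' is counted like any other character
        }
    }

    // Column index: characters read since the last newline, plus 1.
    std::size_t col() const { return lineCol + 1; }
};

// Returns the position reached after reading all of 'text'.
inline SketchPosition positionAfter(const std::string& text) {
    SketchPosition p;
    for (char c : text)
        p.consume(c);
    return p;
}
```

Because only '\n' is counted, text that uses a lone '\r' as its line ending stays on line 1 while the column keeps growing.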

char* GClasses::GTokenizer::nextArg	(	GCharSet &	delimiters,
		char	escapeChar = `'\\'`
	)

Returns the next token delimited by the given delimiters.

Supports quoting with " or ' and escaping with an escape character.

The token may include delimiter characters if it is enclosed in quotes or the delimiters are escaped.

If the next token begins with single or double quotes, then the token will be delimited by the quotes. If a newline character or the end-of-file is encountered before the matching quote, then an exception is thrown. The quotation marks are included in the token. The escape character is ignored inside quotes (unlike what would happen in C++).

If the first character of the token is not an apostrophe or quotation mark, then the escape character can be used to escape any special characters. That is, if the escape character appears, then the next character is interpreted to be part of the token. The escape character is consumed but not included in the token. Thus, if the input is (The \\rain\\ in \"spain\") (not including the parentheses) and the escapeChar is '\', then the token read will be (The \rain\ in "spain").

No token may extend over multiple lines, thus the new-line character acts as an unescapable delimiter, no matter what set of delimiters is passed to the function.

Parameters

delimiters	the set of delimiters used to separate tokens
escapeChar	the character that can be used to escape delimiters when quoting is not active

Returns: a pointer to an internal character buffer containing the null-terminated token
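
The quoting and escaping rules above can be sketched over a std::string as follows. This is a reading of the documented behavior, not the library's implementation. sketchNextArg is a hypothetical name, and two details are assumptions: leading delimiters are not skipped, and an escape character followed by a newline or the end of input simply ends the token.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads one argument from 'input' starting at 'pos'.
inline std::string sketchNextArg(const std::string& input, std::size_t& pos,
                                 const std::string& delimiters,
                                 char escapeChar = '\\') {
    std::string token;
    if (pos < input.size() && (input[pos] == '"' || input[pos] == '\'')) {
        const char quote = input[pos];
        token += input[pos++];               // the opening quote is kept
        while (true) {
            if (pos >= input.size() || input[pos] == '\n')
                throw std::runtime_error("unterminated quoted token");
            const char c = input[pos++];
            token += c;                      // escapeChar is not special inside quotes
            if (c == quote)
                return token;                // the closing quote is kept too
        }
    }
    while (pos < input.size()) {
        const char c = input[pos];
        if (c == '\n')
            break;                           // a newline always ends the token
        if (c == escapeChar) {
            ++pos;                           // consume the escape character
            if (pos >= input.size() || input[pos] == '\n')
                break;                       // assumption: nothing left to escape
            token += input[pos++];           // the escaped character joins the token
            continue;
        }
        if (delimiters.find(c) != std::string::npos)
            break;                           // an unescaped delimiter is left unread
        token += c;
        ++pos;
    }
    return token;
}
```

With an empty delimiter set, this sketch turns the documented input (The \\rain\\ in \"spain\") into (The \rain\ in "spain").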

char* GClasses::GTokenizer::nextUntil	(	GCharSet &	delimeters,
		size_t	minLen = `1`
	)

Reads until the next character would be one of the specified delimiters. The delimiter character is not read. Throws an exception if fewer than minLen characters are read. The token returned by this method will have been copied into an internal buffer, null-terminated, and a pointer to that buffer is returned.
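
A minimal sketch of this read-until behavior, using a std::string and a position index in place of the file and the internal buffer. sketchNextUntil is a hypothetical helper, and std::runtime_error stands in for whatever exception type the library actually throws.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: collects characters until a delimiter or the end of input.
inline std::string sketchNextUntil(const std::string& input, std::size_t& pos,
                                   const std::string& delimiters,
                                   std::size_t minLen = 1) {
    std::string token;
    // Stop in front of the delimiter; it is not consumed.
    while (pos < input.size() && delimiters.find(input[pos]) == std::string::npos)
        token += input[pos++];
    if (token.size() < minLen)
        throw std::runtime_error("token is shorter than minLen");
    return token;
}
```

Since the delimiter is left unread, a second call at the same position returns an empty token, which only succeeds when minLen is 0.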

char* GClasses::GTokenizer::nextUntilNotEscaped	(	char	escapeChar,
		GCharSet &	delimeters
	)

Reads until the next character would be one of the specified delimiters, and the current character is not escapeChar. The token returned by this method will have been copied into an internal buffer, null-terminated, and a pointer to that buffer is returned.

char* GClasses::GTokenizer::nextWhile	(	GCharSet &	set,
		size_t	minLen = `1`
	)

Reads while the character is one of the specified characters. Throws an exception if fewer than minLen characters are read. The token returned by this method will have been copied into an internal buffer, null-terminated, and a pointer to that buffer is returned.

char* GClasses::GTokenizer::nullTerminate ( )

protected

Add a '\0' to the end of the token buffer and return the token buffer.

char GClasses::GTokenizer::peek ( )

Returns the next character in the stream. Returns '\0' if there are no more characters in the stream. (This could theoretically be ambiguous if the next character in the stream is '\0', but presumably this class is mostly used for parsing text files, and that character should not occur in a text file.)

char GClasses::GTokenizer::peek ( size_t n )

Peek up to GTOKENIZER_MAX_LOOKAHEAD characters ahead. If n=0, returns the next character to be read. If n=1, returns the second character ahead to be read, and so on. If n>=GTOKENIZER_MAX_LOOKAHEAD, throws an exception.
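
Bounded lookahead of this kind is commonly built on a small circular queue in front of the stream, and the protected members m_q, m_qCount and m_qPos suggest something similar here. The sketch below shows that general technique and is not taken from the library's source. SketchLookahead, sketchPeekString and SKETCH_MAX_LOOKAHEAD (with its value of 8) are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

const std::size_t SKETCH_MAX_LOOKAHEAD = 8; // hypothetical limit

// Hypothetical lookahead reader built on a circular queue.
class SketchLookahead {
public:
    explicit SketchLookahead(std::istream& in) : m_in(in), m_qPos(0), m_qCount(0) {}

    // Returns the character n positions ahead without consuming it, or '\0' past the end.
    char peek(std::size_t n = 0) {
        if (n >= SKETCH_MAX_LOOKAHEAD)
            throw std::out_of_range("lookahead too large");
        while (m_qCount <= n) {              // fill the queue until slot n exists
            const int c = m_in.get();
            if (c == std::char_traits<char>::eof())
                return '\0';
            m_q[(m_qPos + m_qCount) % SKETCH_MAX_LOOKAHEAD] = static_cast<char>(c);
            ++m_qCount;
        }
        return m_q[(m_qPos + n) % SKETCH_MAX_LOOKAHEAD];
    }

    // Consumes and returns the next character, or '\0' at the end of input.
    char get() {
        if (m_qCount == 0) {                 // nothing queued: read straight from the stream
            const int c = m_in.get();
            return c == std::char_traits<char>::eof() ? '\0' : static_cast<char>(c);
        }
        const char c = m_q[m_qPos];
        m_qPos = (m_qPos + 1) % SKETCH_MAX_LOOKAHEAD;
        --m_qCount;
        return c;
    }

private:
    std::istream& m_in;
    char m_q[SKETCH_MAX_LOOKAHEAD];
    std::size_t m_qPos, m_qCount;
};

// Convenience for experiments: peeks n characters ahead in a string.
inline char sketchPeekString(const std::string& text, std::size_t n) {
    std::istringstream in(text);
    SketchLookahead la(in);
    return la.peek(n);
}
```

Because n is always smaller than the queue capacity, the queue can never hold more than SKETCH_MAX_LOOKAHEAD characters, so no overflow check is needed when filling it.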

void GClasses::GTokenizer::skip ( GCharSet & delimeters )

Reads past any characters specified in the list of delimiters. If szDelimeters is NULL, then any characters <= ' ' are considered to be delimiters. (This method is similar to nextWhile, except that it does not buffer the characters it reads.)

void GClasses::GTokenizer::skipTo ( GCharSet & delimeters )

Skip until the next character is one of the delimiters. (This method is the same as nextUntil, except that it does not buffer what it reads.)

size_t GClasses::GTokenizer::tokenLength ( )

Returns the length of the last token that was returned.

char* GClasses::GTokenizer::trim ( GCharSet & set )

Returns the previously-returned token, except with any of the specified characters trimmed off of both the beginning and end of the token. For example, this method could be used to convert " tok " to "tok". (Calling this method will not change the value returned by tokenLength.)
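
The trimming rule can be illustrated with std::string. sketchTrim is a hypothetical free function that returns a new string; the real method works on the tokenizer's internal buffer, which this sketch does not model.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strips characters in 'set' from both ends of 'token'.
inline std::string sketchTrim(const std::string& token, const std::string& set) {
    const std::size_t first = token.find_first_not_of(set);
    if (first == std::string::npos)
        return "";                           // every character was in the set
    const std::size_t last = token.find_last_not_of(set);
    return token.substr(first, last - first + 1);
}
```

Characters from the set that appear in the middle of the token are left alone, so only the ends change.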

Member Data Documentation

size_t GClasses::GTokenizer::m_line

protected

size_t GClasses::GTokenizer::m_lineCol

protected

char* GClasses::GTokenizer::m_pBufEnd

protected

char* GClasses::GTokenizer::m_pBufPos

protected

char* GClasses::GTokenizer::m_pBufStart

protected

std::istream* GClasses::GTokenizer::m_pStream

protected

char GClasses::GTokenizer::m_q[GTOKENIZER_MAX_LOOKAHEAD]

protected

size_t GClasses::GTokenizer::m_qCount

protected

size_t GClasses::GTokenizer::m_qPos

protected
