This is a simple tokenizer that reads a file one token at a time. To use it, make a child class that defines several character sets. Example:
class MyTokenizer : public GTokenizer
{
public:
    GCharSet m_whitespace, m_alphanum, m_float, m_commanewline;

    MyTokenizer(const char* szFilename)
    : GTokenizer(szFilename), m_whitespace("\t\n\r "), m_alphanum("a-zA-Z0-9"), m_float("-.,0-9e"), m_commanewline(",\n") {}

    virtual ~MyTokenizer() {}
};
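The character-set strings in the example ("a-zA-Z0-9", "-.,0-9e") look like range specifications. The sketch below shows one way such a string could be expanded into a membership set. It is an assumption, not the library's implementation: `SketchCharSet` and its `contains` method are hypothetical names, and the rule that `x-y` means an inclusive range while a leading or trailing '-' is literal is inferred only from the example strings.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for GCharSet; NOT the library's implementation.
// Assumed rule: "x-y" denotes the inclusive range x..y, and a '-' that
// cannot form a range (for example a leading one) is a literal character.
class SketchCharSet {
public:
    explicit SketchCharSet(const std::string& spec) {
        for (std::size_t i = 0; i < spec.size(); ++i) {
            const unsigned char lo = static_cast<unsigned char>(spec[i]);
            if (i + 2 < spec.size() && spec[i + 1] == '-') {
                // Range form: add every character from lo to hi.
                const unsigned char hi = static_cast<unsigned char>(spec[i + 2]);
                assert(lo <= hi);
                for (unsigned int c = lo; c <= hi; ++c)
                    m_bits.set(c);
                i += 2; // skip the '-' and the upper bound
            } else {
                m_bits.set(lo); // single literal character
            }
        }
    }

    // Returns true if c is a member of the set.
    bool contains(char c) const {
        return m_bits.test(static_cast<unsigned char>(c));
    }

private:
    std::bitset<256> m_bits;
};
```

Under these assumptions, `SketchCharSet("-.,0-9e")` would contain '-', '.', ',', the digits, and 'e', which matches the characters one would expect in a floating-point literal.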
|
| GTokenizer (const char *szFilename) |
| Opens the specified file.
|
|
| GTokenizer (const char *pFile, size_t len) |
| Uses the provided buffer of data. (If len is 0, then it will read until a null-terminator is found.)
|
|
virtual | ~GTokenizer () |
|
void | advance (size_t n) |
| Advances past the next 'n' characters. (Stops if the end-of-file is reached.)
|
|
char * | appendToToken (const char *string) |
| Appends a string to the current token (without modifying the file), and returns the full modified token.
|
|
size_t | col () |
| Returns the current column index, which is the number of characters that have been read since the last newline character, plus 1.
|
|
void | expect (const char *szString) |
| Reads past the specified string of characters. If the characters that are read from the file do not exactly match those in the string, an exception is thrown.
|
|
bool | has_more () |
| Returns whether there is more data to be read.
|
|
size_t | line () |
| Returns the current line number. (Begins at 1. Each time a '\n' is encountered, the line number is incremented. Mac line-endings do not increment the line number.)
|
|
char * | nextArg (GCharSet &delimiters, char escapeChar= '\\') |
| Returns the next token defined by the given delimiters.
|
|
char * | nextUntil (GCharSet &delimeters, size_t minLen=1) |
| Reads until the next character would be one of the specified delimiters. The delimiter character is not read. Throws an exception if fewer than minLen characters are read. The returned token is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.
|
|
char * | nextUntilNotEscaped (char escapeChar, GCharSet &delimeters) |
| Reads until the next character would be one of the specified delimiters and the current character is not escapeChar. The returned token is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.
|
|
char * | nextWhile (GCharSet &set, size_t minLen=1) |
| Reads while the next character is one of the specified characters. Throws an exception if fewer than minLen characters are read. The returned token is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.
|
|
char | peek () |
| Returns the next character in the stream. Returns '\0' if there are no more characters in the stream. (This could theoretically be ambiguous if the next character in the stream is '\0', but presumably this class is mostly used for parsing text files, and that character should not occur in a text file.)
|
|
char | peek (size_t n) |
| Peeks up to GTOKENIZER_MAX_LOOKAHEAD characters ahead. If n=0, returns the next character to be read. If n=1, returns the second character ahead to be read, and so on. If n>=GTOKENIZER_MAX_LOOKAHEAD, throws an exception.
|
|
void | skip (GCharSet &delimeters) |
| Reads past any characters specified in the list of delimiters. If szDelimeters is NULL, then any characters <= ' ' are considered to be delimiters. (This method is similar to nextWhile, except that it does not buffer the characters it reads.)
|
|
void | skipTo (GCharSet &delimeters) |
| Skips until the next character is one of the delimiters. (This method is the same as nextUntil, except that it does not buffer what it reads.)
|
|
size_t | tokenLength () |
| Returns the length of the last token that was returned.
|
|
char * | trim (GCharSet &set) |
| Returns the previously-returned token, except with any of the specified characters trimmed off of both the beginning and end of the token. For example, this method could be used to convert " tok " to "tok". (Calling this method will not change the value returned by tokenLength.)
|
|
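The reading and position-tracking rules listed above can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not GTokenizer itself: `SketchTokenizer` is a hypothetical class, character sets are passed as plain strings rather than GCharSet objects, tokens are returned as `std::string` rather than a pointer into an internal buffer, and `std::runtime_error` stands in for whatever exception type the library throws.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for illustration only; NOT the GTokenizer class.
class SketchTokenizer {
public:
    explicit SketchTokenizer(const std::string& text) : m_text(text) {}

    bool has_more() const { return m_pos < m_text.size(); }
    // Returns '\0' when no characters remain, as described for peek().
    char peek() const { return has_more() ? m_text[m_pos] : '\0'; }
    std::size_t line() const { return m_line; } // begins at 1
    std::size_t col() const { return m_col; }   // chars since last '\n', plus 1

    // Advances past up to n characters, stopping at the end of the input.
    // Only '\n' increments the line number; '\r' counts as an ordinary character.
    void advance(std::size_t n) {
        while (n-- > 0 && has_more()) {
            if (m_text[m_pos++] == '\n') { ++m_line; m_col = 1; }
            else ++m_col;
        }
    }

    // Reads while the next character is in 'set'.
    const std::string& nextWhile(const std::string& set, std::size_t minLen = 1) {
        return read(set, true, minLen);
    }
    // Reads until the next character is in 'delims'; the delimiter is not read.
    const std::string& nextUntil(const std::string& delims, std::size_t minLen = 1) {
        return read(delims, false, minLen);
    }
    // Like nextWhile, but does not buffer what it reads.
    void skip(const std::string& set) {
        while (has_more() && inSet(set, peek())) advance(1);
    }

private:
    static bool inSet(const std::string& set, char c) {
        return set.find(c) != std::string::npos;
    }
    const std::string& read(const std::string& set, bool whileInSet, std::size_t minLen) {
        m_token.clear();
        while (has_more() && inSet(set, peek()) == whileInSet) {
            m_token += peek();
            advance(1);
        }
        if (m_token.size() < minLen)
            throw std::runtime_error("token shorter than minLen at line " + std::to_string(m_line));
        return m_token;
    }

    std::string m_text;
    std::size_t m_pos = 0, m_line = 1, m_col = 1;
    std::string m_token;
};

// Small walk-through of the stand-in.
void sketchTokenizerDemo() {
    const std::string letters = "abcdefghijklmnopqrstuvwxyz";
    SketchTokenizer t("abc, de\rf\ngh");
    assert(t.nextWhile(letters) == "abc");
    t.skip(", ");
    assert(t.nextUntil("\n") == "de\rf"); // '\r' is read like any other character
    assert(t.line() == 1);                // no '\n' has been consumed yet
    t.advance(1);                         // consume the '\n'
    assert(t.line() == 2 && t.col() == 1);
}
```

Keeping the line and column bookkeeping inside `advance` means every reading method updates the position consistently, which is one plausible way to obtain the behavior described for `line()` and `col()`.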
char* GClasses::GTokenizer::nextArg (GCharSet & delimiters, char escapeChar = '\\')
Returns the next token delimited by the given delimiters.
Supports quoting with " or ' and escaping with an escape character.
The token may include delimiter characters if it is enclosed in quotes or the delimiters are escaped.
If the next token begins with single or double quotes, then the token will be delimited by the quotes. If a newline character or the end-of-file is encountered before the matching quote, then an exception is thrown. The quotation marks are included in the token. The escape character is ignored inside quotes (unlike what would happen in C++).
If the first character of the token is not an apostrophe or quotation mark, then the escape character can be used to escape any special characters. That is, if the escape character appears, then the next character is interpreted to be part of the token. The escape character is consumed but not included in the token. Thus, if the input is (The \\rain\\ in \"spain\") (not including the parentheses) and the escapeChar is '\', then the token read will be (The \rain\ in "spain").
No token may extend over multiple lines. The newline character therefore acts as an unescapable delimiter, no matter what set of delimiters is passed to the function.
Parameters
    delimiters | the set of delimiters used to separate tokens
    escapeChar | the character that can be used to escape delimiters when quoting is not active
Returns
    a pointer to an internal character buffer containing the null-terminated token
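The quoting and escaping rules described for nextArg can be sketched as a free function over an in-memory string. This is an illustration, not the library's code: `sketchNextArg` is a hypothetical name, delimiters are passed as a plain string, and `std::runtime_error` stands in for the library's exception. Several details are assumptions the documentation does not settle: leading delimiters are not skipped, the terminating delimiter is not consumed, and an escape character directly before a newline or the end of input is consumed and ends the token.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the nextArg rules; NOT the library's implementation.
// Reads one token from 'in' starting at 'pos' and advances 'pos' past it.
std::string sketchNextArg(const std::string& in, std::size_t& pos,
                          const std::string& delimiters, char escapeChar = '\\') {
    std::string token;

    // Quoted token: runs to the matching quote, quotes included,
    // and the escape character has no special meaning inside.
    if (pos < in.size() && (in[pos] == '"' || in[pos] == '\'')) {
        const char quote = in[pos];
        token += in[pos++];
        while (true) {
            if (pos >= in.size() || in[pos] == '\n')
                throw std::runtime_error("unterminated quoted token");
            const char c = in[pos++];
            token += c;
            if (c == quote)
                return token;
        }
    }

    // Unquoted token: a newline always ends it, even after an escape character.
    while (pos < in.size() && in[pos] != '\n') {
        const char c = in[pos];
        if (c == escapeChar) {
            ++pos; // the escape character is consumed but not stored
            if (pos >= in.size() || in[pos] == '\n')
                break; // assumption: nothing left to escape, so the token ends
            token += in[pos++]; // the escaped character joins the token
            continue;
        }
        if (delimiters.find(c) != std::string::npos)
            break; // assumption: the delimiter itself is left unread
        token += c;
        ++pos;
    }
    return token;
}

// Reproduces the example from the description, with ',' as the only delimiter.
void sketchNextArgDemo() {
    std::size_t pos = 0;
    const std::string input = "The \\\\rain\\\\ in \\\"spain\\\"";
    assert(sketchNextArg(input, pos, ",") == "The \\rain\\ in \"spain\"");
}
```

In the demo, the C++ string literal for `input` spells out the characters (The \\rain\\ in \"spain\"), so each backslash in the input appears doubled in the source code.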