===== Split Line - A Clean and Small String Tokenizer =====
=== Overview ===
//split_line// is a clean STL string tokenizer written in C++ in fewer than 100 lines of code. In its simplest form it fills a vector of strings with the tokens of a line of text, splitting at space, tab, carriage return and newline. In its most complex form it supports user-provided delimiters, a user-provided quote character, a user-provided escape character, a special character for comments, and a limited ability to resume tokenization with another part of the string.
=== Features ===
* splits a line of text into words delimited by one or more delimiters
* user can provide delimiters (defaults to \t\r\n and space)
* user can provide one special character for quoted text (defaults to ")
* user can provide one special escape character (defaults to \)
* user can provide one special character for comments (disabled by default)
* limited support to resume at another part of the string
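
The first feature, splitting at runs of one or more delimiters, can be illustrated without any of the quoting, escaping or comment handling. The following is only a minimal sketch and not the code from the download: ''split_simple'' is a hypothetical helper that relies on nothing but ''std::string::find_first_not_of'' and ''std::string::find_first_of''.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of split_line): splits only at runs of delimiters.
std::vector<std::string> split_simple(const std::string& line,
                                      const std::string& delimiters = " \t\r\n") {
    std::vector<std::string> tokens;
    // Skip leading delimiters to find the start of the first token.
    std::string::size_type begin = line.find_first_not_of(delimiters);
    while (begin != std::string::npos) {
        // The token ends at the next delimiter or at the end of the line.
        std::string::size_type end = line.find_first_of(delimiters, begin);
        if (end == std::string::npos)
            tokens.push_back(line.substr(begin));
        else
            tokens.push_back(line.substr(begin, end - begin));
        // Skip the whole run of delimiters before the next token.
        begin = line.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

Calling ''split_simple("  a,b;;c \t d", " \t,;")'' yields the four tokens ''a'', ''b'', ''c'' and ''d''; consecutive delimiters never produce empty tokens.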
=== Download ===
* {{projects:split_line-1.0.zip}}
=== Code Example ===
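#include <iostream>
#include <string>
#include <vector>
// The declaration of split_line (see Documentation) must be visible here.
using namespace std;

int main(int argc, char *argv[]) {
    vector<string> tokens;
    string line = "Writing programs \"in C++\" is \
Fun!!";
    split_line(tokens, line);
    cout << "Tokens:" << endl;
    for (unsigned int i = 0; i < tokens.size(); i++)
        cout << "'" << tokens[i] << "'" << endl;
    return 0;
}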
Output:
Tokens:
'Writing'
'programs'
'in C++'
'is'
'Fun!!'
=== Documentation ===
A more complex example can be found in the function readFile() of [[cfg_parser]]. split_line itself works like a state machine with 5 states (see enum SPLIT_LINE_STATE). You can pass in the starting state of the machine, which in some cases lets you resume tokenization of a string. In resuming mode (start_state != SL_NORMAL) the characters read are appended to the last string in the string vector //ret// until the state switches back to SL_NORMAL. [[cfg_parser]] uses this behaviour to read multiline values. However, this feature does not let you split a string at an arbitrary position yourself and then hand the pieces to split_line one after another (using the returned state as the new start_state). In most cases the outcome will differ from what you might expect!
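
To make the resuming behaviour and its limitation more concrete, here is a cut-down sketch with only two states (normal and quoted). It is not the split_line implementation: ''DemoState'' and ''demo_split'' are hypothetical names, the quote character is fixed to ''"'', and the exact behaviour of the real function may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical two-state reduction of the tokenizer: delimiters and quotes only.
enum DemoState { DEMO_NORMAL, DEMO_QUOTED };

// Appends the tokens of `part` to `ret` and returns the state reached at its end.
// If start_state is not DEMO_NORMAL, characters are appended to ret.back()
// until the state switches back to DEMO_NORMAL.
DemoState demo_split(std::vector<std::string>& ret, const std::string& part,
                     DemoState start_state = DEMO_NORMAL) {
    const std::string delimiters = " \t\r\n";
    DemoState state = start_state;
    bool in_token = false;  // true while ret.back() is still being built
    if (state != DEMO_NORMAL) {
        if (ret.empty()) ret.push_back(std::string());
        in_token = true;    // resume: keep appending to the last token
    }
    for (std::string::size_type i = 0; i < part.size(); ++i) {
        const char c = part[i];
        if (state == DEMO_QUOTED) {
            if (c == '"') state = DEMO_NORMAL;  // closing quote
            else ret.back() += c;               // delimiters are plain text here
        } else if (c == '"') {
            if (!in_token) { ret.push_back(std::string()); in_token = true; }
            state = DEMO_QUOTED;                // opening quote
        } else if (delimiters.find(c) != std::string::npos) {
            in_token = false;                   // token finished
        } else {
            if (!in_token) { ret.push_back(std::string()); in_token = true; }
            ret.back() += c;
        }
    }
    return state;
}
```

Under these assumptions, tokenizing ''key "first line'' returns ''DEMO_QUOTED'', and passing that state along with the next piece ''\nsecond line" tail'' extends the second token to ''first line\nsecond line'' before ''tail'' starts a new one. Cutting ''hello world'' into ''hel'' and ''lo world'' does not work the same way: the first call ends in ''DEMO_NORMAL'', so the second call starts a fresh token and the result is ''hel'', ''lo'', ''world'' rather than ''hello'', ''world''.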
enum SPLIT_LINE_STATE {
    SL_NORMAL,
    SL_ESCAPE,
    SL_SAFEMODE,
    SL_SAFEESCAPE,
    SL_COMMENT
};
// Splits line into tokens and stores them in ret. Supports delimiters and escape characters.
// Ignores special characters between a pair of safemode_char and between comment_char
// and the line end '\n'.
// Returns the SPLIT_LINE_STATE the parser was in when it returned.
int split_line(std::vector<std::string>& ret, std::string& line,
               const std::string& delimiters = " \t\r\n", char escape_char = '\\',
               char safemode_char = '"', char comment_char = '\0',
               int start_state = SL_NORMAL);
== State Diagram ==
{{ projects:splitline.png }}
**Legend**
* character read in / action
* eat: append the character to the current token
* finish: append token to token list and start with a new token
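
The "eat" and "finish" actions can be sketched as a five-state machine. This is an illustration under assumptions, not the code from the download: the names ''SketchState'' and ''sketch_split_line'' are hypothetical, the transitions are guessed from the state names rather than taken from the diagram, and resuming via a start state is left out to keep it short.

```cpp
#include <string>
#include <vector>

// Hypothetical names; the states mirror the five states listed in the documentation.
enum SketchState { SK_NORMAL, SK_ESCAPE, SK_SAFEMODE, SK_SAFEESCAPE, SK_COMMENT };

// "finish": append the current token to the token list and start a new one.
static void finish(std::vector<std::string>& ret, std::string& token, bool& has_token) {
    if (has_token) ret.push_back(token);
    token.clear();
    has_token = false;
}

SketchState sketch_split_line(std::vector<std::string>& ret, const std::string& line,
                              const std::string& delimiters = " \t\r\n",
                              char escape_char = '\\', char safemode_char = '"',
                              char comment_char = '\0') {
    SketchState state = SK_NORMAL;
    std::string token;
    bool has_token = false;  // lets an empty quoted string "" count as a token

    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        switch (state) {
        case SK_NORMAL:
            if (c == escape_char) {
                state = SK_ESCAPE;
            } else if (c == safemode_char) {
                has_token = true;
                state = SK_SAFEMODE;
            } else if (comment_char != '\0' && c == comment_char) {
                finish(ret, token, has_token);
                state = SK_COMMENT;
            } else if (delimiters.find(c) != std::string::npos) {
                finish(ret, token, has_token);
            } else {
                token += c;  // "eat"
                has_token = true;
            }
            break;
        case SK_ESCAPE:      // the escaped character is always eaten
            token += c;
            has_token = true;
            state = SK_NORMAL;
            break;
        case SK_SAFEMODE:    // inside quotes only escape and quote are special
            if (c == escape_char) state = SK_SAFEESCAPE;
            else if (c == safemode_char) state = SK_NORMAL;
            else token += c;
            break;
        case SK_SAFEESCAPE:
            token += c;
            state = SK_SAFEMODE;
            break;
        case SK_COMMENT:     // skip everything up to the line end
            if (c == '\n') state = SK_NORMAL;
            break;
        }
    }
    finish(ret, token, has_token);  // flush a token left open at the end of input
    return state;
}
```

With ''#'' as the comment character, this sketch turns the C++ literal ''"a\\ b \"c # d\" # rest\nnext"'' into the tokens ''a b'', ''c # d'' and ''next''. An unterminated quote such as ''x "open'' still yields ''x'' and ''open'', but the returned state is ''SK_SAFEMODE'' instead of ''SK_NORMAL''.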
=== License ===
This software is licensed under the CC-GNU GPL.