Revisions: 25.02.2006 13:32 (cwacha); 16.11.2016 23:18 (current revision, external edit)

===== Split Line - A Clean and Small String Tokenizer =====
=== Overview ===
//split_line// is a clean STL string tokenizer written in C++ in fewer than 100 lines of code. In its simplest form, it splits a line of text at spaces, tabs, carriage returns and newlines and stores the resulting tokens in a vector of strings. In its most complex form, it supports user-provided delimiters, a user-provided quote character, a user-provided escape character, a special character for comments, and a limited ability to resume tokenization in another part of the string.

=== Features ===
  * splits a line of text into words separated by one or more delimiters
  * user-definable delimiters (defaults to space, \t, \r and \n)
  * one user-definable quote character for quoted text (defaults to ")
  * one user-definable escape character (defaults to \)
  * one user-definable comment character (disabled by default)
  * limited support for resuming tokenization in another part of the string
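
The basic rule that a run of several delimiters counts as a single separator can be illustrated with ''std::string::find_first_of'' and ''find_first_not_of''. This is only a minimal sketch for comparison, not split_line's implementation: the helper name ''split_on_delimiters'' is hypothetical, and it ignores quotes, escape characters and comments entirely.

```cpp
#include <string>
#include <vector>

// Hypothetical helper for illustration only; not part of split_line.
// Splits `line` at every run of characters from `delimiters`.
// Quotes, escape characters and comments are NOT handled here.
std::vector<std::string> split_on_delimiters(const std::string& line,
                                             const std::string& delimiters = " \t\r\n") {
    std::vector<std::string> tokens;
    // Skip leading delimiters to find the start of the first token.
    std::string::size_type start = line.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        // The token ends at the next delimiter, or at the end of the line (npos).
        std::string::size_type end = line.find_first_of(delimiters, start);
        // If end is npos, substr clamps the length to the rest of the string.
        tokens.push_back(line.substr(start, end - start));
        // Skip the whole run of delimiters, so consecutive delimiters never yield empty tokens.
        start = line.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

For example, ''split_on_delimiters("a,,b;c", ",;")'' returns the three tokens a, b and c. With the default delimiters, a quoted section such as "in C++" would be split at its space into two tokens, which is exactly what split_line's quote character prevents.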

=== Download ===
  * {{projects:split_line-1.0.zip}}

=== Code Example ===

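<code cpp>
#include <iostream>
#include <string>
#include <vector>

// hypothetical header name: include the file from split_line-1.0.zip that declares split_line()
#include "split_line.h"

using namespace std;

int main() {
    vector<string> tokens;
    // Adjacent string literals are concatenated by the compiler.
    string line = "Writing    programs     \"in C++\"  is   "
                  "     Fun!!";

    // Default settings: delimiters " \t\r\n", quote character '"', escape character '\\'.
    split_line(tokens, line);

    cout << "Tokens:" << endl;
    for (size_t i = 0; i < tokens.size(); i++)
        cout << "'" << tokens[i] << "'" << endl;

    return 0;
}
</code>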

Output:
<code>
Tokens:
'Writing'
'programs'
'in C++'
'is'
'Fun!!'
</code>

=== Documentation ===

A more complex example of its use is the function readFile() in [[cfg_parser]]. Internally, split_line works as a state machine with five states (see enum SPLIT_LINE_STATE). The caller can pass the state in which the machine starts, which in some cases makes it possible to resume tokenization of a string. In resuming mode (start_state != SL_NORMAL), the characters read are appended to the last string in the vector //ret// until the state switches back to SL_NORMAL. [[cfg_parser]] uses this behaviour to read multi-line values.

However, this feature does not let you split a string at an arbitrary position yourself and pass the second part to split_line, using the returned state as the new start_state. In most cases the result will differ from what you might expect!

<code cpp>
#include <string>
#include <vector>

// states of the split_line state machine
enum SPLIT_LINE_STATE {
    SL_NORMAL,
    SL_ESCAPE,
    SL_SAFEMODE,
    SL_SAFEESCAPE,
    SL_COMMENT
};

// Splits line into tokens and stores them in ret. Supports delimiters and an escape
// character; special characters are not interpreted between a pair of safemode_char
// characters or between comment_char and the end of the line ('\n').
// Returns the SPLIT_LINE_STATE the parser was in when it returned.
int split_line(std::vector<std::string>& ret, std::string& line,
               const std::string& delimiters = " \t\r\n", char escape_char = '\\',
               char safemode_char = '"', char comment_char = '\0',
               int start_state = SL_NORMAL);
</code>

== State Diagram ==

{{ projects:splitline.png }}

**Legend**

  * transition labels: character read / action taken
  * eat: append the character to the current token
  * finish: append the current token to the token list and start a new token
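
The diagram may be easier to follow next to code. The following is a minimal sketch of a tokenizer built from the same five states and the eat/finish actions of the legend; it is not the code from the download, and ''split_line_sketch'' is a hypothetical name. The transitions are assumptions inferred from the state names and parameter descriptions: the quote and escape characters themselves are not copied into the token, and a comment runs until '\n'. The sketch also mimics the documented resuming mode: when start_state is not SL_NORMAL, the characters read are appended to the last string in ret.

```cpp
#include <string>
#include <vector>

// States named after the documented SPLIT_LINE_STATE values.
enum SPLIT_LINE_STATE { SL_NORMAL, SL_ESCAPE, SL_SAFEMODE, SL_SAFEESCAPE, SL_COMMENT };

// Hypothetical sketch of the state machine; NOT the code from split_line-1.0.zip.
// Assumed transitions: quote and escape characters are not copied into the token,
// and a comment runs until '\n'.
int split_line_sketch(std::vector<std::string>& ret, const std::string& line,
                      const std::string& delimiters = " \t\r\n",
                      char escape_char = '\\', char safemode_char = '"',
                      char comment_char = '\0', int start_state = SL_NORMAL) {
    int state = start_state;
    std::string token;
    bool has_token = false;  // true once the current token has been started

    // Resuming mode: keep appending to the last token of the previous call.
    if (state != SL_NORMAL && state != SL_COMMENT && !ret.empty()) {
        token = ret.back();
        ret.pop_back();
        has_token = true;
    }

    // "eat": append the character to the current token.
    auto eat = [&](char c) { token += c; has_token = true; };
    // "finish": append the current token (if any) to ret and start a new one.
    auto finish = [&]() {
        if (has_token) ret.push_back(token);
        token.clear();
        has_token = false;
    };

    for (char c : line) {
        switch (state) {
        case SL_NORMAL:
            if (c == escape_char) { has_token = true; state = SL_ESCAPE; }
            else if (c == safemode_char) { has_token = true; state = SL_SAFEMODE; }
            else if (comment_char != '\0' && c == comment_char) { finish(); state = SL_COMMENT; }
            else if (delimiters.find(c) != std::string::npos) finish();
            else eat(c);
            break;
        case SL_ESCAPE:      // take the next character literally
            eat(c);
            state = SL_NORMAL;
            break;
        case SL_SAFEMODE:    // inside quotes, delimiters are ordinary characters
            if (c == escape_char) state = SL_SAFEESCAPE;
            else if (c == safemode_char) state = SL_NORMAL;
            else eat(c);
            break;
        case SL_SAFEESCAPE:  // escaped character inside quotes
            eat(c);
            state = SL_SAFEMODE;
            break;
        case SL_COMMENT:     // ignore everything up to the end of the line
            if (c == '\n') state = SL_NORMAL;
            break;
        }
    }
    finish();  // store the last token, even if a quote is still open
    return state;
}
```

With this sketch, a caller reading a file line by line can pass the returned state back as start_state for the next line, so a quoted value that spans two lines ends up in one token (the real readFile() in [[cfg_parser]] may do this differently). The sketch also shows the documented caveat: splitting "abcd" into "ab" and "cd" yields two tokens, because the state after "ab" is SL_NORMAL and a call that starts in SL_NORMAL always begins a new token.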

=== License ===

<html>

<!-- Creative Commons License -->
<a href="http://creativecommons.org/licenses/GPL/2.0/">
<img alt="CC-GNU GPL" border="0" src="http://creativecommons.org/images/public/cc-GPL-a.png" /></a><br />
This software is licensed under the <a href="http://creativecommons.org/licenses/GPL/2.0/">CC-GNU GPL</a>.
<!-- /Creative Commons License -->

</html>