Differences
This page shows the differences between two versions.
split_line [25.02.2006 13:32 (18 years ago)] – cwacha | split_line [16.11.2016 23:18 (8 years ago)] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ===== Split Line - A Clean and Small String Tokenizer ===== | ||
+ | === Overview === | ||
+ | // | ||
+ | |||
+ | === Features === | ||
+ | * splits a line of text into words delimited by one or more delimiters | ||
+ | * user can provide delimiters (defaults to \t\r\n and space) | ||
+ | * user can provide one special character for quoted text (defaults to ") | ||
+ | * user can provide one special escape character (defaults to \) | ||
+ | * user can provide one special character for comments (disabled by default) | ||
+ | * limited support for resuming tokenization at another part of the string | ||
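
To make the first features concrete, here is a minimal sketch of delimiter splitting with an optional comment character. It is not the library's code: the name `sketch_split` and its parameters are hypothetical, and quoting, escaping, and resuming are left out on purpose.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (NOT the library's split_line): splits `line` on any
// character in `delims`. A non-zero `comment_char` ends parsing at that
// character; '\0' means comments are disabled, matching the stated default.
std::vector<std::string> sketch_split(const std::string& line,
                                      const std::string& delims = " \t\r\n",
                                      char comment_char = '\0') {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : line) {
        if (comment_char != '\0' && c == comment_char) {
            break;  // the rest of the line is a comment
        }
        if (delims.find(c) != std::string::npos) {
            // A run of several delimiters produces no empty tokens.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

With the default arguments, `sketch_split("a  b\tc")` yields the three tokens `a`, `b`, and `c`; passing `'#'` as the third argument drops everything from the first `#` onward.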
+ | |||
+ | === Download === | ||
+ | * {{projects: | ||
+ | |||
+ | === Code Example === | ||
+ | |||
+ | <code cpp> | ||
+ | int main(int argc, char *argv[]) { | ||
+ | vector< | ||
+ | string line = " | ||
+ | | ||
+ | |||
+ | split_line(tokens, | ||
+ | |||
+ | cout << " | ||
+ | for(unsigned int i = 0; i < tokens.size(); | ||
+ | cout << "'" | ||
+ | |||
+ | return 0; | ||
+ | } | ||
+ | </code> | ||
+ | |||
+ | Output: | ||
+ | <code> | ||
+ | Tokens: | ||
+ | ' | ||
+ | ' | ||
+ | 'in C++' | ||
+ | ' | ||
+ | ' | ||
+ | </code> | ||
+ | === Documentation === | ||
+ | |||
+ | A more complex example can be found in [[cfg_parser]], in the function readFile(). split_line works like a state machine with 5 states (see enum SPLIT_LINE_STATE). You can pass in the starting state of the machine, which lets you resume tokenization of a string in some cases. In resuming mode (start_state != SL_NORMAL), the characters read are appended to the last string in the string vector //ret// until the state switches back to SL_NORMAL. [[cfg_parser]] uses this behaviour to read multiline values. However, this feature does not let you split a string at an arbitrary position yourself and then pass the pieces to split_line (using the returned state as the new start_state): in most cases the outcome will differ from what you expect. | ||
+ | |||
+ | <code cpp> | ||
+ | enum SPLIT_LINE_STATE { | ||
+ | SL_NORMAL, | ||
+ | SL_ESCAPE, | ||
+ | SL_SAFEMODE, | ||
+ | SL_SAFEESCAPE, | ||
+ | SL_COMMENT, | ||
+ | }; | ||
+ | |||
+ | // splits line into tokens and stores them in ret. Supports delimiters, escape characters, | ||
+ | // ignores special characters between safemode_char and between comment_char and line end ' | ||
+ | // returns SPLIT_LINE_STATE the parser was in when returning | ||
+ | int split_line(std:: | ||
+ | </code> | ||
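
The resuming behaviour described above can be sketched as a small five-state machine. Only the state names come from the enum above; the function name `sketch_split_line`, the type `SketchState`, the parameter order, and every transition are assumptions made for illustration and may differ from the real split_line. In this sketch the comment state does not reopen the last token.

```cpp
#include <string>
#include <vector>

// State names mirror the documented enum; the type name is hypothetical.
enum SketchState { SL_NORMAL, SL_ESCAPE, SL_SAFEMODE, SL_SAFEESCAPE, SL_COMMENT };

// Hypothetical sketch, NOT the library's implementation. Appends tokens to
// `ret` and returns the state the machine was in at the end of `line`.
SketchState sketch_split_line(std::vector<std::string>& ret, const std::string& line,
                              SketchState state = SL_NORMAL,
                              const std::string& delims = " \t\r\n",
                              char quote = '"', char escape = '\\', char comment = '\0') {
    std::string token;
    bool open = false;  // true while a token is being built (allows empty quoted tokens)

    // Resuming (assumed semantics): reopen the last stored string so that
    // newly read characters are appended to it.
    if (state != SL_NORMAL && state != SL_COMMENT && !ret.empty()) {
        token = ret.back();
        ret.pop_back();
        open = true;
    }

    auto eat = [&](char c) { token += c; open = true; };  // append to current token
    auto finish = [&]() {                                 // store token, start a new one
        if (open) ret.push_back(token);
        token.clear();
        open = false;
    };

    for (char c : line) {
        switch (state) {
        case SL_NORMAL:
            if (c == escape) state = SL_ESCAPE;
            else if (c == quote) { state = SL_SAFEMODE; open = true; }
            else if (comment != '\0' && c == comment) { finish(); state = SL_COMMENT; }
            else if (delims.find(c) != std::string::npos) finish();
            else eat(c);
            break;
        case SL_ESCAPE:     eat(c); state = SL_NORMAL;   break;
        case SL_SAFEMODE:   // delimiters and comment characters are kept verbatim
            if (c == escape) state = SL_SAFEESCAPE;
            else if (c == quote) state = SL_NORMAL;
            else eat(c);
            break;
        case SL_SAFEESCAPE: eat(c); state = SL_SAFEMODE; break;
        case SL_COMMENT:    // skip everything up to the end of the line
            if (c == '\n') state = SL_NORMAL;
            break;
        }
    }
    finish();
    return state;
}
```

A multiline quoted value, similar in spirit to the [[cfg_parser]] use case, would then be read by feeding the returned state back in: calling the sketch on `name "first line\n` leaves the machine in SL_SAFEMODE, and a second call on `second line" tail` with that state extends the last token to `first line\nsecond line` before adding `tail`.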
+ | |||
+ | == State Diagram == | ||
+ | |||
+ | {{ projects: | ||
+ | |||
+ | **Legend** | ||
+ | |||
+ | * character read in / action | ||
+ | * eat: append the character to the current token | ||
+ | * finish: append token to token list and start with a new token | ||
+ | |||
+ | |||
+ | === License === | ||
+ | |||
+ | This software is licensed under a Creative Commons license. | ||
+ | |||