tokenizing

How do I tokenize a string in C++?

Java has a convenient split method: String str = "The quick brown fox"; String[] results = str.split(" "); Is there an easy way to do this in C++? ...

Best way to parse Space Separated Text

I have a string like this: /c SomeText\MoreText "Some Text\More Text\Lol" SomeText I want to tokenize it, but I can't just split on the spaces. I've come up with a somewhat ugly parser that works, but I'm wondering if anyone has a more elegant design. This is in C#, by the way. EDIT: My ugly version, while ugly, is O(N) and may actually be ...

What is the easiest/best/most correct way to iterate through the characters of a string in Java?

StringTokenizer? Convert the String to a char[] and iterate over that? Something else? ...

Splitting strings in python

I have a string which is like this: this is [bracket test] "and quotes test " I'm trying to write something in Python to split it up by space while ignoring spaces within square braces and quotes. The result I'm looking for is: ['this','is','bracket test','and quotes test '] ...
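One possible approach to the question above, as a minimal sketch: a single regular expression that tries a bracketed span, then a quoted span, then a bare word. The names `TOKEN_RE` and `split_keep_groups` are made up for illustration, and the sketch assumes brackets and quotes are never nested or escaped.

```python
import re

# Alternatives are tried left to right at each position:
# [bracketed text], then "quoted text", then a run of non-whitespace.
TOKEN_RE = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')

def split_keep_groups(text):
    """Split on whitespace, keeping [...] and "..." spans as single tokens.

    Hypothetical helper: assumes no nesting and no escaped delimiters.
    """
    tokens = []
    for match in TOKEN_RE.finditer(text):
        # Exactly one of the three groups is set for each match.
        tokens.append(next(g for g in match.groups() if g is not None))
    return tokens

print(split_keep_groups('this is [bracket test] "and quotes test "'))
# → ['this', 'is', 'bracket test', 'and quotes test ']
```

The delimiters are dropped and the inner text is kept verbatim, which is why the trailing space inside the quotes survives.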

What's the best way to have stringTokenizer split up a line of text into predefined variables

I'm not sure if the title is very clear, but basically what I have to do is read a line of text from a file and split it up into 8 different string variables. Each line will have the same 8 chunks in the same order (title, author, price, etc). So for each line of text, I want to end up with 8 strings. The first problem is that the last ...

Parsing a User's Query

So here's what I'm looking to achieve. I would like to give my users a single Google-like textbox where they can type their queries, and I would like them to be able to express semi-natural language such as "view all between 1/1/2008 and 1/2/2008". It's OK if the syntax has to be fairly structured and limited to this specific domain ...

Implementing keyword comparison scheme (reverse search)

I have a constantly growing database of keywords. I need to parse incoming text inputs (articles, feeds etc) and find which keywords from the database are present in the text. The database of keywords is much larger than the text. Since the database is constantly growing (users add more and more keywords to watch for), I figure the bes...

Parsing data from txt file in J2ME

Basically, I'm creating an indoor navigation system in J2ME. I've put the location details in a .txt file, i.e. location names and their coordinates, and edges with their respective start node and end node as well as the weight (length of the node). I put both details in the same file so users don't have to download multiple files to get their ma...

How do I parse a token from a string in C?

How do I parse tokens from an input string? For example: char *aString = "Hello world". I want the output to be: "Hello" "world" ...

allow special characters and spaces in jquery wordCount

I'm using jquery DynaCloud with wordCount to create a dynamic tagcloud. I have specific terms to include in the cloud (though the frequency is different for each user), and some of the terms are multiple word, or have special characters ("&", "'", " ", etc.) as part of the term. I break the terms with specific html blocks: <pre><span...

Have you ever effectively used lexer/parser in real world application?

Recently, I started learning ANTLR and learned that a lexer and parser together can be used to construct programming languages. Other than DSLs and programming languages, have you ever directly or indirectly used lexer/parser tools (and knowledge) to solve a real-world problem? Is it possible to solve the same problem by an average progr...

Scanner vs. StringTokenizer vs. String.Split

I just learned about Java's Scanner class and now I'm wondering how it compares/competes with the StringTokenizer and String.Split. I know that the StringTokenizer and String.Split only work on Strings, so why would I want to use the Scanner for a String? Is Scanner just intended to be one-stop shopping for splitting? ...

How Do I Tokenize This String in Ruby?

I have this string: %{Children^10 Health "sanitation management"^5} And I want to tokenize it into an array of hashes: [{:keywords=>"children", :boost=>10}, {:keywords=>"health", :boost=>nil}, {:keywords=>"sanitation management", :boost=>5}] I'm aware of StringScanner and the Syntax gem (http://syntax.rubyforge.org/) ...

Smart variadic expansion based on format string

I have a daemon that reads a configuration file in order to know where to write something. In the configuration file, a line like this exists: output = /tmp/foo/%d/%s/output Or, it may look like this: output = /tmp/foo/%s/output/%d ... or simply like this: output = /tmp/foo/%s/output ... or finally: output = /tmp/output I hav...

Trim string to length ignoring HTML.

This problem is a challenging one. Our application allows users to post news on the homepage. That news is input via a rich text editor which allows HTML. On the homepage we want to only display a truncated summary of the news item. For example, here is the full text we are displaying, including HTML In an attempt to make a b...

tokenizing and converting to pig latin

Hi, this looks like homework, but please be assured that it isn't. It's just an exercise in the book we use in our C++ course; I'm trying to read ahead on pointers. The exercise tells me to split a sentence into tokens, convert each of them into pig latin, and then display them. Pig latin here is basically ...

Tokenize the text depending on some specific rules. Algorithm in C++

I am writing a program which will tokenize the input text depending upon some specific rules. I am using C++ for this. Rules: Letter 'a' should be converted to token 'V-A'; Letter 'p' should be converted to token 'C-PA'; Letter 'pp' should be converted to token 'C-PPA'; Letter 'u' should be converted to token 'V-U'. This is just a sample...
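A minimal Python sketch of one way to apply the four sample rules quoted above (the question itself asks for C++). It assumes that when two rules overlap, such as 'p' and 'pp', the longest match wins; the excerpt is truncated and does not confirm that policy. The names `RULES` and `tokenize` and the input "appu" are made up for illustration.

```python
# The four sample rules from the excerpt; the full rule set is not shown there.
RULES = {"a": "V-A", "p": "C-PA", "pp": "C-PPA", "u": "V-U"}

def tokenize(text, rules=RULES):
    """Greedy longest-match tokenizer (assumed policy, not from the source)."""
    # Try longer keys first so 'pp' is preferred over 'p'.
    keys = sorted(rules, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                tokens.append(rules[key])
                i += len(key)
                break
        else:
            # No rule matched at position i.
            raise ValueError(f"no rule matches {text[i]!r} at position {i}")
    return tokens

print(tokenize("appu"))
# → ['V-A', 'C-PPA', 'V-U']
```

With a large rule set, a trie or a compiled alternation would avoid rescanning every key at each position; the linear scan here is only for clarity.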

Tokenizing a SIC Assembler source

I've pretty much finished coding a SIC assembler for my systems programming class but I'm stumped on the tokenizing part. For example, take this line of source code: The format (free format) is: {LABEL} OPCODE {OPERAND{,X}} {COMMENT} The curls indicate that the field is optional. Also, each field must be separated by at least one sp...

Tokenizing Error: java.util.regex.PatternSyntaxException, dangling metacharacter '*'

I am using split() to tokenize a String separated with * following this format: name*lastName*ID*school*age % name*lastName*ID*school*age % name*lastName*ID*school*age I'm reading this from a file named "entrada.al" using this code: static void leer() { try { String ruta="entrada.al"; File myFile = new File (ruta...

What is more efficient a switch case or an std::map

I'm thinking about the tokenizer here. Each token calls a different function inside the parser. Which is more efficient: a map of std::functions/boost::functions, or a switch case? I thank everyone in advance for their answers. ...