tokenizing

Recognizing new line

I've got a fairly complex calculator that prints output when the user inputs ";" (and hits Enter). What I'm trying to do now is let the user print output just by hitting Enter (without the semicolon). I need to know how I can implement this. Side note: the calculator uses tokenization to read user input. This is part of the calcu...

shlex alternative for Java

Is there a shlex alternative for Java? I'd like to be able to split quote-delimited strings the way the shell would process them. For example, if I send: one two "three four" and perform a split, I'd like to receive the tokens one, two, and three four ...
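
For reference, the behaviour described here is what Python's shlex.split does. A minimal sketch of the expected result (in Python, not Java; the helper name split_like_shell is hypothetical):

```python
import shlex


def split_like_shell(line):
    """Split a line on whitespace, keeping quoted runs together as one token."""
    return shlex.split(line)


# The quotes group "three four" into a single token and are then removed.
print(split_like_shell('one two "three four"'))  # ['one', 'two', 'three four']
```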

How to best split csv strings in oracle 9i

I want to be able to split CSV strings in Oracle 9i. I've read the following article: http://www.oappssurd.com/2009/03/string-split-in-oracle.html, but I didn't understand how to make this work. Here are some of my questions pertaining to it: Would this work in Oracle 9i, and if not, why not? Is there a better way of going about splitting c...

vba string tokens

I have around 100 rows of such texts that I want to tokenize: "<word> <unknown number of spaces and tabs> <number>" I am having trouble finding tokenize functions in VBA. What would be the easiest method to tokenize such strings in VBA? Thanks in advance. ...

get list of numbers from stdin and tokenize them

How would I get a list of numbers from the user and then tokenize them? This is what I have, but it doesn't get anything except for the first number: #include <iostream> #include <sstream> #include <vector> #include <string> using namespace std; int main() { string line = ""; cin >> line; stringstream lineStream(line); ...

How to find position of nth token

We have a string that has a maximum limit of 20 words. If the user enters something that is more than 20 words, we then need to truncate the string at its 20th word. How can we automate this? We are able to find the 20th token with #GetToken(myString, 20, ' ')#, but are unsure how to find its position in order to left-trim. Any ideas...

Using boost::tokenizer with string delimiters

I've been looking at boost::tokenizer, and I've found that the documentation is very thin. Is it possible to make it tokenize a string such as "dolphin--monkey--baboon" and make every word a token, as well as every double dash a token? From the examples, I've only seen single-character delimiters being allowed. Is the library not advanced en...

What is the best way to tokenize a text file in Java?

What is the best way to tokenize a text file in Java, if [1.] I want to work with a java.io.Reader, not a String, and [2.] delimiters should be returned? I have evaluated the following classes: java.util.StringTokenizer fulfills [2.], but not [1.]; java.util.Scanner fulfills [1.], but not [2.]; java.io.StreamTokenizer seems quite complicated. I d...

Parsing Classes, Functions and Arguments in PHP

I want to create a function which receives a single argument that holds the path to a PHP file and then parses the given file and returns something like this: class NameOfTheClass function Method1($arg1, $arg2, $arg2) private function Method2($arg1, $arg2, $arg2) public function Method2($arg1, $arg2, $arg2) abstract class Anot...

register_printf_function in PHP

I need to let the user specify a custom format for a function which uses vsprintf, and since PHP doesn't have glibc's register_printf_function(), I'll have to do it with PCRE. My question is, what would be the best REGEXP to match % followed by any character and not having % before it, in a usable manner for programmatic use afterwards?...

c++ tokenize a string and include delimiters

I'm tokenizing with the following, but I'm unsure how to include the delimiters with it. void Tokenize(const string str, vector<string>& tokens, const string& delimiters) { int startpos = 0; int pos = str.find_first_of(delimiters, startpos); string strTemp; while (string::npos != pos || string::npos != startpos) { ...

Does PL/SQL have an equivalent StringTokenizer to Java's?

I use java.util.StringTokenizer for simple parsing of delimited strings in Java. I need the same type of mechanism in PL/SQL. I could write it, but if it already exists, I would prefer to use that. Does anyone know of a PL/SQL implementation? Some useful alternative? ...

Fast ESP character normalization

Hi, I'm running a search application on a FAST ESP server. Now I have this problem with character normalization. What I want is to search for 'wurth' and get a hit in 'würth'. I've tried configuring the following in esp/etc/tokenizer/tokenization.xml: <normalizationlist name="German to Norwegian"> <normalization description="Germ...

Is there a simple way I can tokenize a string without a full-blown lexer?

I'm looking to implement the Shunting-yard Algorithm, but I need some help figuring out the best way to split up a string into its tokens. If you notice, the first step of the algorithm is "read a token." This isn't exactly a trivial thing to do. Tokens can consist of numbers, operators and parens. If you are doing some...
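
The "read a token" step can often be handled with one regular expression rather than a full lexer. A minimal sketch in Python, assuming tokens are unsigned numbers, the operators + - * / ^, and parentheses (the names TOKEN_PATTERN and tokenize are hypothetical, and unary minus is left to the parser):

```python
import re

# Optional leading whitespace, then either a number (integer or decimal)
# or a single operator/parenthesis character.
TOKEN_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/^()])")


def tokenize(expression):
    """Return the list of tokens in expression, or raise ValueError on bad input."""
    text = expression.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"no valid token at or after position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


print(tokenize("3 + 4 * (2 - 1)"))
# ['3', '+', '4', '*', '(', '2', '-', '1', ')']
```

Scanning with match at an explicit position, instead of findall, means unexpected characters raise an error rather than being skipped silently.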

Word break in languages without spaces between words (e.g., Asian)?

I'd like to make MySQL full-text search work with Japanese and Chinese text, as well as any other language. The problem is that these languages, and probably others, do not normally have white space between words. Search is not useful when you must type the same sentence as is in the text. I cannot just put a space between every charact...

Approaching Text Parsing in Scala

I'm making an application that will parse commands in Scala. An example of a command would be: todo get milk for friday. So the plan is to have a pretty smart parser break the line apart and recognize the command part and the fact that there is a reference to time in the string. In general, I need to make a tokenizer in Scala. So I'm w...

Dealing with Tokens in C#

I have the following assignment for homework. Requirements: design a class called TokenGiver with the following elements: a default constructor; a parametrized constructor that takes an int; a method that adds a specified number of tokens to the number of tokens; a method that subtracts exactly ONE token from your number of tokens; a m...

C tokenize polynomial coefficients

I'm trying to put the coefficients of polynomials from a char array into an int array. I have this: char string[] = "-4x^0 + x^1 + 4x^3 - 3x^4"; and can tokenize it by the space into -4x^0 x^1 4x^3 3x^4. So I am trying to get: -4, 1, 4, 3 into an int array. int *coefficient; coefficient = new int[counter]; p = strtok(copy, " +"); ...

C++ extract polynomial coefficients

So I have a polynomial that looks like this: -4x^0 + x^1 + 4x^3 - 3x^4. I can tokenize this by space and '+' into: -4x^0, x^1, 4x^3, -, 3x^4. How could I just get the coefficients with the negative sign: -4, 1, 0, 4, -3? x is the only variable that will appear, and the terms will always appear in order. I'm planning on storing the coefficients in ...
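
One way to keep the sign attached to each coefficient is to match whole terms instead of splitting on spaces and '+'. A minimal sketch of that idea in Python rather than C++, assuming every term is written as ax^n with an optional sign and an optional integer coefficient (the names TERM_RE and coefficients are hypothetical):

```python
import re

# One term: optional sign, optional spaces, optional digits, then x^<power>.
TERM_RE = re.compile(r"([+-]?)\s*(\d*)x\^(\d+)")


def coefficients(poly):
    """Return coefficients indexed by power, with 0 for any missing power."""
    terms = {}
    for sign, digits, power in TERM_RE.findall(poly):
        value = int(digits) if digits else 1  # a bare "x^1" means coefficient 1
        if sign == "-":
            value = -value
        terms[int(power)] = value
    if not terms:
        return []
    return [terms.get(p, 0) for p in range(max(terms) + 1)]


print(coefficients("-4x^0 + x^1 + 4x^3 - 3x^4"))  # [-4, 1, 0, 4, -3]
```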

tokenize a string keeping delimiters in Python

Hi, is there any equivalent to str.split in Python that also returns the delimiters? I need to preserve the whitespace layout for my output after processing some of the tokens. Example: >>> s="\tthis is an example" >>> print s.split() ['this', 'is', 'an', 'example'] >>> print what_I_want(s) ['\t', 'this', ' ', 'is', ' ', 'an', ' '...
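
One common approach is re.split with a capturing group, which keeps the matched delimiters in the result. A minimal sketch (written for Python 3; the name split_keep_whitespace is hypothetical):

```python
import re


def split_keep_whitespace(text):
    """Split on runs of whitespace, returning the whitespace runs as tokens too."""
    # The parentheses make re.split include each delimiter in the output;
    # empty strings (from a leading or trailing delimiter) are filtered out.
    return [token for token in re.split(r"(\s+)", text) if token]


s = "\tthis is an example"
print(split_keep_whitespace(s))
# ['\t', 'this', ' ', 'is', ' ', 'an', ' ', 'example']
```

Because nothing is discarded, "".join(split_keep_whitespace(s)) gives back the original string, so the whitespace layout can be restored after processing the word tokens.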