tokenize

NSString tokenize in Objective-C

What is the best way to tokenize/split an NSString in Objective-C? ...
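For reference, the standard Objective-C answer is NSString's componentsSeparatedByString: (or componentsSeparatedByCharactersInSet: for multiple delimiters). A minimal sketch of the same behavior, written in Python for brevity:

    # Split on a fixed delimiter, then on arbitrary whitespace.
    line = "alpha,beta,gamma"
    print(line.split(","))    # ['alpha', 'beta', 'gamma']
    print("a  b\tc".split())  # ['a', 'b', 'c']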

When using XPath to find variable, taking only a piece of that variable

Hi, I am new to XPath. I am writing code to grab all the three-digit numbers from a page. They are not constant, varying between 105, 515, and 320. I want to be able to tokenize these numbers into two separate pieces... I would love to be able to grab the first digit in one XPath expression and the second two digits in a second X-...
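One route is XPath's own substring() function, which is 1-indexed. A sketch via lxml, assuming a hypothetical <span class='num'> element holds the number; the real path would come from the page in question:

    from lxml import html

    # Hypothetical markup standing in for the real page.
    doc = html.fromstring("<div><span class='num'>515</span></div>")
    first = doc.xpath("substring(//span[@class='num']/text(), 1, 1)")  # '5'
    rest  = doc.xpath("substring(//span[@class='num']/text(), 2, 2)")  # '15'
    print(first, rest)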

string slicing, php

Is there a way to slice a string? Let's say I have this variable: $output=Country=UNITED STATES (US) &City=Scottsdale, AZ &Latitude=33.686 &Longitude=-111.87 I want to slice it in a way that pulls the latitude and longitude values into separate variables; subtok is not serving the purpose ...
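The question is PHP (where explode() or parse_str() would be the natural tools), but the splitting idea is the same everywhere: break on & first, then split each pair once on =. A sketch in Python:

    output = "Country=UNITED STATES (US) &City=Scottsdale, AZ &Latitude=33.686 &Longitude=-111.87"
    # Split into key=value pairs, then split each pair once on '='.
    fields = dict(pair.split("=", 1) for pair in output.split("&"))
    latitude  = fields["Latitude"].strip()   # '33.686'
    longitude = fields["Longitude"].strip()  # '-111.87'
    print(latitude, longitude)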

tokenize module

Hello, please help. There are many token types in the tokenize module, like STRING, BACKQUOTE, AMPEREQUAL, etc. >>> import cStringIO >>> import tokenize >>> source = "{'test':'123','hehe':['hooray',0x10]}" >>> src = cStringIO.StringIO(source).readline >>> src = tokenize.generate_tokens(src) >>> src <generator object at 0x00BFBEE0> >>> src.next(...
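For anyone landing here on Python 3, a runnable version of the same experiment; generate_tokens() takes any readline callable and yields named tuples:

    import io
    import token
    import tokenize

    source = "{'test':'123','hehe':['hooray',0x10]}"
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # tok.type is the numeric token type; tok_name maps it to 'OP', 'STRING', ...
        print(token.tok_name[tok.type], repr(tok.string))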

Python3.0: tokenize & BytesIO

When attempting to tokenize a string in Python 3.0, why do I get a leading 'utf-8' before the tokens start? From the Python 3 docs, tokenize should now be used as follows: g = tokenize(BytesIO(s.encode('utf-8')).readline) However, when attempting this at the terminal, the following happens: >>> from tokenize import tokenize >>> from i...
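The leading token is expected: in Python 3, tokenize() emits an ENCODING token first so consumers know how the byte stream was decoded. A sketch that skips it:

    from io import BytesIO
    from tokenize import tokenize, ENCODING

    s = "x = 1"
    tokens = list(tokenize(BytesIO(s.encode("utf-8")).readline))
    # The first token carries the encoding, not source text.
    assert tokens[0].type == ENCODING and tokens[0].string == "utf-8"
    for tok in tokens[1:]:
        print(tok)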

What is more efficient: a switch case or an std::map?

I'm thinking about the tokenizer here. Each token calls a different function inside the parser. Which is more efficient: a map of std::function/boost::function objects, or a switch case? I thank everyone in advance for their answers. ...
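In C++ terms, a dense switch usually compiles to a jump table, so it tends to beat std::map's ordered lookup; measure before deciding. The map-of-functions style itself looks like this, sketched in Python, where a dict of callables plays the role of std::map<Token, std::function<...>>:

    def on_plus():
        return "handled +"

    def on_minus():
        return "handled -"

    # Dispatch table: token -> handler, the dynamic alternative to a switch.
    dispatch = {"+": on_plus, "-": on_minus}
    print(dispatch["+"]())  # 'handled +'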

Python3.0 - tokenize and untokenize

I am using something similar to the following simplified script to parse snippets of Python from a larger file: import io import tokenize src = 'foo="bar"' src = bytes(src.encode()) src = io.BytesIO(src) src = list(tokenize.tokenize(src.readline)) for tok in src: print(tok) src = tokenize.untokenize(src) Although the code is not...
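A condensed, runnable version of the round trip; untokenize() rebuilds the source (as bytes, since the stream starts with an ENCODING token), though spacing may be normalized:

    import io
    import tokenize

    src = 'foo="bar"'
    toks = list(tokenize.tokenize(io.BytesIO(src.encode()).readline))
    rebuilt = tokenize.untokenize(toks)
    print(rebuilt)  # b'foo="bar"'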

Tokenizing and sorting with XSLT 1.0

I have a delimited string (delimited by spaces in my example below) that I need to tokenize, sort, and then join back together and I need to do all this using XSLT 1.0. How would I do that? I know I need to use xsl:sort somehow, but everything I’ve tried so far has given me some sort of error. For example, if I run the code at the bot...
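XSLT 1.0 has no tokenize(); the usual answers are a recursive named template that peels off one token per call, or EXSLT's str:tokenize where the processor supports it, with xsl:sort applied inside an xsl:for-each over the resulting nodes. The target behavior, sketched in Python as a reference point:

    s = "delta alpha charlie bravo"
    # Tokenize on spaces, sort, and join back together.
    print(" ".join(sorted(s.split())))  # 'alpha bravo charlie delta'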

Parsing string in Ruby

I have a pretty simple string I want to parse in Ruby, and I am trying to find the most elegant solution. The string is of the format /xyz/mov/exdaf/daeed.mov?arg1=blabla&arg2=3bla3bla What I would like to have is: string1: /xyz/mov/exdaf/daeed.mov string2: arg1=blabla&arg2=3bla3bla So basically I want to tokenize on the ? but can't find a good example. ...
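In Ruby this is string.split('?', 2) or URI.parse; the idea, sketched in Python, where partition() splits on the first ? only:

    url = "/xyz/mov/exdaf/daeed.mov?arg1=blabla&arg2=3bla3bla"
    path, _, query = url.partition("?")
    print(path)   # /xyz/mov/exdaf/daeed.mov
    print(query)  # arg1=blabla&arg2=3bla3bla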

What is the best way to tokenize a text file in Java?

What is the best way to tokenize a text file in Java if [1.] I want to work with a java.io.Reader, not a String, and [2.] delimiters should be returned? I have evaluated the following classes: java.util.StringTokenizer fulfills [2.] but not [1.]; java.util.Scanner fulfills [1.] but not [2.]; java.io.StreamTokenizer seems quite complicated. I d...
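The delimiter-returning requirement is the awkward part. One illustration of the trick (in Python; a java.util.regex Matcher loop over the same pattern works the same way): a capturing group in the split pattern makes the delimiters part of the result:

    import re

    text = "one, two;three"
    # The capturing group keeps each matched delimiter in the output list.
    print(re.split(r"([,;]\s*)", text))  # ['one', ', ', 'two', ';', 'three']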

Tokenize a string with a space in java

I want to tokenize a string like this: String line = "a=b c='123 456' d=777 e='uij yyy'"; I cannot simply split on spaces like this: String [] words = line.split(" "); Any idea how I can split it so that I get tokens like a=b c='123 456' d=777 e='uij yyy'; ...
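A regex that treats a quoted span as part of its token handles this; the same pattern works with java.util.regex in a Matcher.find() loop. Sketched in Python:

    import re

    line = "a=b c='123 456' d=777 e='uij yyy'"
    # Either key='quoted value' or any run of non-space characters.
    tokens = re.findall(r"\w+='[^']*'|\S+", line)
    print(tokens)  # ["a=b", "c='123 456'", "d=777", "e='uij yyy'"]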

How to handle a tokenize error with unterminated multiline comments (python 2.6)

The following sample code: import token, tokenize, StringIO def generate_tokens(src): rawstr = StringIO.StringIO(unicode(src)) tokens = tokenize.generate_tokens(rawstr.readline) for i, item in enumerate(tokens): toktype, toktext, (srow,scol), (erow,ecol), line = item print i, token.tok_name[toktype], toktext...
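tokenize raises TokenError when it hits end of input inside an unterminated construct, so one option is to consume the generator inside a try block and keep whatever came before the failure. A sketch (Python 3 syntax; the 2.6 version differs only in print and StringIO):

    import io
    import token
    import tokenize

    src = "x = 1\ny = '''unterminated"
    gen = tokenize.generate_tokens(io.StringIO(src).readline)
    try:
        for tok in gen:
            print(token.tok_name[tok.type], repr(tok.string))
    except tokenize.TokenError as exc:
        # Raised at EOF inside the unterminated triple-quoted string.
        print("stopped early:", exc)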

C++ Boost: Split String

How can I split a string with Boost with a regex AND have the delimiter included in the result list? For example, if I have the string "1d2" and my regex is "[a-z]", I want the results in a vector as (1, d, 2). I have: std::string expression = "1d2"; boost::regex re("[a-z]"); boost::sregex_token_iterator i (expression.begin (), ...
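For the record, boost::sregex_token_iterator accepts a list of submatch indexes, where -1 means the text between matches and 0 means the match itself, which yields exactly (1, d, 2). The same keep-the-delimiter idea, sketched in Python:

    import re

    # The capturing group makes each regex match part of the result.
    print(re.split(r"([a-z])", "1d2"))  # ['1', 'd', '2']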

Java - Tokenize Parameter List

I'm trying to create a method which takes a String parameter and then returns a two-dimensional String array of parameter names and values. protected final String[][] setParams (String parms) { String[][] params; int i = 0; Pattern p = Pattern.compile(NEED_REGEX_HERE); Matcher m = p.matcher(parms); params = String[m...
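The excerpt doesn't show the parameter syntax, so assume whitespace-separated name=value pairs; a pattern with two capturing groups then yields the name/value pairs directly. Sketched in Python:

    import re

    parms = "width=10 height=20 title=hello"  # assumed format
    params = re.findall(r"(\w+)=(\S+)", parms)
    print(params)  # [('width', '10'), ('height', '20'), ('title', 'hello')]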

jQuery + Facebook style status update textbox - Autocomplete links to profiles

I have a social networking website with a status update textbox, much like Facebook. However, I would also like the user to be able to type the @ symbol while typing a new status, which brings up an autocomplete option of friends' profiles (again, the same as Facebook does). When the user selects one, it should be included in the status as ...
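The tokenizing half of the feature is spotting the @mention fragment the user is currently typing so the UI can query matching friends; a minimal sketch of that check (in Python; the real site would do this in JavaScript on each keystroke):

    import re

    def mention_prefix(text):
        # The partial @name at the end of the text, if any.
        m = re.search(r"@(\w*)$", text)
        return m.group(1) if m else None

    print(mention_prefix("Lunch with @jo"))  # 'jo'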

SWI-Prolog tokenize_atom/2 replacement?

What I need to do is break an atom into tokens. E.g.: tokenize_string('Hello, World!', L). would unify L=['Hello',',','World','!'], exactly as tokenize_atom/2 does. But when I try to use tokenize_atom/2 with non-Latin letters it fails. Is there any universal replacement, or how can I write one? Thanks in advance. ...
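The desired behavior is words and individual punctuation marks as separate tokens; with a Unicode-aware pattern the same rule covers non-Latin text. A sketch in Python, where \w matches Unicode word characters by default:

    import re

    def tokenize_string(s):
        # A word, or any single non-space, non-word character.
        return re.findall(r"\w+|[^\w\s]", s)

    print(tokenize_string("Hello, World!"))  # ['Hello', ',', 'World', '!']
    print(tokenize_string("Привет, мир!"))   # ['Привет', ',', 'мир', '!']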

Tokenizing Twitter Posts in Lucene

Hello, My question in a nutshell: Does anyone know of a TwitterAnalyzer or TwitterTokenizer for Lucene? More detailed version: I want to index a number of tweets in Lucene and keep the terms like @user or #hashtag intact. StandardTokenizer does not work because it discards the punctuation (but it does other useful stuff like keeping d...
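The tokenization rule itself is simple to state: a term is @word, #word, or word. In Lucene the usual fix is a custom Tokenizer (or WhitespaceTokenizer plus token filters); the rule, sketched as a regex in Python:

    import re

    tweet = "Loving #lucene, thanks @otisg!"
    print(re.findall(r"[@#]\w+|\w+", tweet))
    # ['Loving', '#lucene', 'thanks', '@otisg']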

How do I get the next token in a Cstring if I want to use it as an int? (c++)

My objective is to take directions from a user, and eventually a text file, to move a robot. The catch is that I must use C strings (such as char word[];) rather than std::string, and tokenize them for use. The code looks like this: void Navigator::manualDrive() { char uinput[1]; char delim[] = " "; char *token; cout ...
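In C the follow-up call is strtok(NULL, delim) to advance to the next token, then atoi() or strtol() to convert it. (Note the excerpt's char uinput[1] buffer is too small to hold a line of input.) The convert-the-next-token step, sketched in Python:

    command = "forward 10"
    word, _, amount = command.partition(" ")
    steps = int(amount)  # raises ValueError if the token isn't numeric
    print(word, steps)   # forward 10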

help with handling tokenization errors

Hi, the code is in Python. Please find below the piece of code that I use to tokenize a string. strList = list(token[STRING] for token in generate_tokens(StringIO(line).readline) if token[STRING]) I get an error that reads like: raise TokenError, ("EOF in multi-line statement", (lnum, 0)) tokenize.TokenError: ('EOF in multi-li...
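TokenError here means tokenize hit end of input mid-statement (for example an unclosed bracket or string). One workaround is a wrapper generator that yields the tokens produced before the failure instead of letting the exception escape the list comprehension. A sketch:

    import io
    from tokenize import generate_tokens, TokenError

    def safe_tokens(line):
        gen = generate_tokens(io.StringIO(line).readline)
        while True:
            try:
                yield next(gen)
            except (StopIteration, TokenError):
                return  # keep whatever tokenized cleanly

    line = "print('unterminated"
    strList = [tok.string for tok in safe_tokens(line) if tok.string]
    print(strList)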

using intrusive containers to tokenize a wide c string

I was wondering if I could tokenize my string (lpcwszBuffer) using boost::tokenizer without making a copy of the string. I heard about using intrusive containers to avoid giant memory footprints, but I'm not really sure if it's applicable here. If I was unclear, here's what I mean: size_t Tokenize(const wchar_t* lpcwszBuffer, boost::sc...
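boost::tokenizer can walk any iterator range, so you can hand it the buffer's begin and end pointers without first building a std::wstring, though the token type you instantiate it with still copies each token out. The span-based idea, sketched in Python, where matches carry indices into the original buffer rather than eager copies:

    import re

    buffer = "alpha beta gamma"
    for m in re.finditer(r"\S+", buffer):
        start, end = m.span()  # indices into the original string, no copy yet
        print(start, end, buffer[start:end])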