tokenizing

How to parse an XML-like tag with a regular expression

I need to tokenize the following tag: {TagName attrib1=”value1” attrib2=”value 3”}. I would like to write a regex to do it, but the trouble is that an attribute value can contain a space, so I can’t just split on spaces. ...
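One possible approach, sketched in Python: match the quoted values as whole units instead of splitting on spaces. This assumes straight double quotes around values and no escaped quotes inside them; the names `tokenize_tag`, `TAG_RE`, and `ATTR_RE` are made up for illustration.

```python
import re

# Assumption: values are wrapped in straight double quotes and contain no
# escaped quotes. Tag and attribute names are plain word characters.
TAG_RE = re.compile(r'^\{\s*(\w+)((?:\s+\w+="[^"]*")*)\s*\}$')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def tokenize_tag(text):
    """Return (tag_name, {attribute: value}) for one {Tag a="b"} string."""
    match = TAG_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a well-formed tag: {text!r}")
    name, attr_text = match.groups()
    # findall returns (name, value) pairs; spaces inside quotes are preserved.
    return name, dict(ATTR_RE.findall(attr_text))
```

Because the value pattern `[^"]*` consumes everything up to the closing quote, a value such as "value 3" comes back as a single token.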

Convert comma separated string to array in PL/SQL

How do I convert a comma-separated string to an array? I have the input '1,2,3', and I need to convert it into an array. ...

Tokenizing source code in Java

For a systems software development course, I'm working on a complete assembler for an instructor-invented assembly language. Currently I'm working on the tokenizer. While doing some searching, I've come across the Java StringTokenizer class...but I see that it has been essentially deprecated. It seems far easier to use, however, than the...

extracting last 2 words from a sequence of strings, space-separated

I have any sequence (or sentence) and I want to extract the last 2 strings. For example, sdfsdfds sdfs dfsd fgsd 3 dsfds should produce: 3 dsfds; and sdfsd (dfgdg)gfdg fg 6 gg should produce: 6 gg ...
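A minimal sketch of one way to do this, in Python, assuming "words" means whitespace-separated strings (the function name `last_two_words` is hypothetical):

```python
def last_two_words(sentence):
    """Return the last two whitespace-separated strings, joined by a space."""
    # split() with no argument collapses runs of whitespace and ignores
    # leading/trailing whitespace; the slice tolerates short inputs.
    return " ".join(sentence.split()[-2:])
```

With fewer than two words, the slice simply returns whatever is available rather than raising an error.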

Get unmatched value in jQuery Plugin: Tokenizing Autocomplete Text Entry

I'm using jQuery Plugin: Tokenizing Autocomplete Text Entry. I only allow one token. What I need to do is this: whenever the user enters a value that matches nothing in the list and hits Enter, I want to catch that Enter event, add the value to the master table in the database, and also add it to the list. Please let me know I'm not cl...

Ignore parentheses with string tokenizer?

I have an input that looks like: (0 0 0) I would like to ignore the parentheses and only add the numbers, in this case 0, to an ArrayList. I am using Scanner to read from a file and this is what I have so far: transitionInput = data.nextLine(); st = new StringTokenizer(transitionInput,"()", true); while (st.hasMoreTokens()) ...
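The general idea is to treat parentheses and whitespace alike as delimiters and to discard the delimiters instead of returning them as tokens. The sketch below shows that idea in Python rather than the question's Java, and assumes every remaining token is an integer; `parse_numbers` is a hypothetical name.

```python
import re


def parse_numbers(line):
    """Return the integers in a line such as '(0 0 0)', ignoring parentheses."""
    # Split on any run of parentheses or whitespace; drop the empty strings
    # produced when the line starts or ends with a delimiter.
    tokens = [tok for tok in re.split(r"[()\s]+", line) if tok]
    # Assumption: every surviving token is a valid integer literal.
    return [int(tok) for tok in tokens]
```

In Java, the same effect can likely be had by adding the space character to the delimiter set and not asking the tokenizer to return delimiters.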

Splitting comma separated string in a PL/SQL stored proc

Hi, I have a CSV string 100.01,200.02,300.03 which I need to pass to a PL/SQL stored procedure in Oracle. Inside the proc, I need to insert these values into a Number column in the table. For this, I got a working approach from here: http://stackoverflow.com/questions/1089508/how-to-best-split-csv-strings-in-oracle-9i [2) Using SQL's c...

What is proper Tokenization algorithm? & Error: TypeError: coercing to Unicode: need string or buffer, list found

Hello, I'm doing an Information Retrieval task. As part of pre-processing I want to do: stopword removal, tokenization, and stemming (Porter Stemmer). Initially, I skipped tokenization. As a result I got terms like this: broker broker' broker, broker. broker/deal broker/dealer' broker/dealer, broker/dealer. broker/dealer; broker/deale...
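Variants like broker' and broker, are what a tokenization step would normally collapse. A rough sketch of one simple tokenizer, in Python, assuming terms are runs of ASCII letters and digits and that a slash should split a term in two (both are assumptions, not requirements stated in the question; `tokenize` is a hypothetical name):

```python
import re


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters/digits as tokens."""
    # Everything else (quotes, commas, periods, slashes, semicolons) acts as
    # a separator, so "broker/dealer;" yields "broker" and "dealer".
    return re.findall(r"[a-z0-9]+", text.lower())
```

Stopword removal and stemming would then operate on the returned list, one token at a time.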