tokenizing

Rails plugin for generating unique links?

There are many places in my application where I need to generate links with unique tokens (foo.com/g6Ce7sDygw or whatever). Each link may be associated with some session data and would take the user to some specific controller/action. Does anyone know of a gem/plugin that does this? It's easy enough to implement, but would be cleaner ...

Tokenizing Twitter Posts in Lucene

Hello, My question in a nutshell: Does anyone know of a TwitterAnalyzer or TwitterTokenizer for Lucene? More detailed version: I want to index a number of tweets in Lucene and keep the terms like @user or #hashtag intact. StandardTokenizer does not work because it discards the punctuation (but it does other useful stuff like keeping d...

valgrind complains doing a very simple strtok in c

Hi, I'm trying to tokenize a string by loading an entire file into a char[] using fread. For some strange reason it is not always working, and valgrind complains in this very small sample program. Given an input like test.txt first second And the following program #include <stdio.h> #include <string.h> #include <stdlib.h> #include <s...

Ruby regex match specific string with special conditions

I'm currently trying to parse a document into tokens with the help of regex. Right now I'm trying to match the keywords in the document. For example I have the following document: Func test() Return blablaFuncblabla EndFunc The keywords that need to be matched are Func, Return and EndFunc. I've come up with the following regex: (...

How to read values from file. tokenizer

I have a file in which each line contains two numbers. The problem is that the two numbers are separated by whitespace, and it can be any number of blank spaces: one, two, or more. I want to read the line and store each of the numbers in a variable, but I'm not sure how to tokenize it. i.e 1 5 3 2 5 6 3 4 83 54 23 ...
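
The excerpt does not say which language is in use. As a rough sketch in Python only, splitting on runs of whitespace could look like the following. The names `parse_pair` and `read_pairs` and the sample lines are made up for illustration and are not from the question.

```python
def parse_pair(line):
    # str.split() with no argument splits on any run of whitespace,
    # so one, two, or more blanks between the numbers all work.
    first, second = line.split()
    return int(first), int(second)


def read_pairs(lines):
    # Skip blank lines; each remaining line is assumed to hold two integers.
    return [parse_pair(line) for line in lines if line.strip()]


# Hypothetical sample input, standing in for the lines of the file.
sample = ["1 5", "3    2", "5\t 6", ""]
pairs = read_pairs(sample)
print(pairs)
```

For a real file, the same function could be fed an open file object, since iterating over it yields one line at a time.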

ANTLR lexer mismatches tokens

I have a simple ANTLR grammar, which I have stripped down to its bare essentials to demonstrate this problem I'm having. I am using ANTLRworks 1.3.1. grammar sample; assignment : IDENT ':=' NUM ';' ; IDENT : ('a'..'z')+ ; NUM : ('0'..'9')+ ; WS : (' '|'\n'|'\t'|'\r')+ {$channel=HIDDEN;} ; Obviously, thi...

SQL query to translate a list of numbers matched against several ranges, to a list of values

I need to convert a list of numbers that fall within certain ranges into a list of values, ordered by a priority column. The table has the following values:

| YEAR | R_MIN  | R_MAX  | VAL | PRIO |
------------------------------------
  2010    18000    90100    52     6
  2010   240000   240099    82     3
  2010   250000   259999    50     5
  2...

How to get a Token from a Lucene TokenStream?

I'm trying to use Apache Lucene for tokenizing, and I am baffled at the process to obtain Tokens from a TokenStream. The worst part is that I'm looking at the comments in the JavaDocs that address my question. http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29 Somehow, an Attr...

GNU Flex, multiline rule

Hi there, I have a flex rule inside my lexer definition: operators "[]"|"[]="|"[]<"|".."|"."|".="|"+"|"+="|"-"|"-="|"/"|"/="|"*"|"*="|"%"|"%="|"++"|"--"|"^"|"^="|"~"|"&"|"&="|"|"|"|="|"<<"|"<<="|">>"|"!"|"<"|">"|">="|"<="|"=="|"!="|"&&"|"||"|"~=" Is there any way to split this rule across multiple lines to keep it clearer? I tried with \ ju...

parsing a string of ascii text into separate variables

Hi there, I have a piece of text that gets handed to me like: here is line one\n\nhere is line two\n\nhere is line three What I would like to do is break this string up into three separate variables. I'm not quite sure how one would go about accomplishing this in Python. Thanks for any help, jml ...
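
One possible approach, sketched here under the assumption that the `\n\n` in the sample stands for real newline characters (not literal backslashes), is to split on the blank-line separator and unpack the result. The helper name `split_paragraphs` is hypothetical.

```python
def split_paragraphs(text):
    # Split on the double-newline separator and drop any empty pieces.
    return [part for part in text.split("\n\n") if part]


text = "here is line one\n\nhere is line two\n\nhere is line three"

# Unpacking works only when the string holds exactly three pieces;
# a different count raises ValueError.
line_one, line_two, line_three = split_paragraphs(text)
print(line_one)
print(line_two)
print(line_three)
```

If the number of pieces is not known in advance, keeping the list and indexing into it is likely safer than unpacking into fixed variables.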

C++ String tokenisation from 3D .obj files

I'm pretty new to C++ and was looking for a good way to pull the data out of this line. A sample line that I might need to tokenise is f 11/65/11 16/70/16 17/69/17 I have a tokenisation method that splits strings into a vector as delimited by a string which may be useful static void Tokenise(const string& str, vector<string>& tokens,...

array or list into Oracle using cfprocparam

I have a list of values I want to insert into a table via a stored procedure. I figured I would pass an array to Oracle and loop through the array, but I don't see how to pass an array into Oracle. I'd pass a list, but I don't see how to work with the list to turn it into an array using PL/SQL (I'm fairly new to PL/SQL). Am I approaching ...

How can I fill a variable of my own created data type within Oracle PL/SQL?

In Oracle I've created a data type: TABLE of VARCHAR2(200) I want to have a variable of this type within a Stored Procedure (defined locally, not as an actual table in the DB) and fill it with data. Some online samples show how I'd use my type if it was filled and passed as a parameter to the stored procedure: SELECT column_value cu...

Parsing pipe delimited string into columns?

Hello, I have a column with pipe separated values such as: '23|12.1| 450|30|9|78|82.5|92.1|120|185|52|11' I want to parse this column to fill a table with 12 corresponding columns: month1, month2, month3...month12. So month1 will have the value 23, month2 the value 12.1 etc... Is there a way to parse it by a loop or delimiter instea...

Problem parsing a list in batch

I am trying to extract tokens from a list of strings using a batch script, but for some reason it ignores my string if it contains an asterisk. An example to illustrate this problem is as follows: @echo off set mylist="test1a,test1b" set mylist="test2a,test2b*" %mylist% set mylist="test3a,test3b" %mylist% echo %mylist% for %%a in ( ...

JavaCC: How can one exclude a string from a token? (A.k.a. understanding token ambiguity.)

Hello, everyone! I have already had many problems understanding how ambiguous tokens can be handled elegantly (or somehow at all) in JavaCC. Let's take this example: I want to parse an XML processing instruction. The format is: "<?" <target> <data> "?>": target is an XML name, data can be anything except ?>, because it's the closing tag...

Tokenize problem in Java with separator ". "

I need to split a text using the separator ". ". For example, I want this string: Washington is the U.S Capital. Barack is living there. To be cut into two parts: Washington is the U.S Capital. Barack is living there. Here is my code: // Initialize the tokenizer StringTokenizer tokenizer = new StringTokenizer("Washington is the ...

Tokenizing numbers for a parser

I am writing my first parser and have a few questions concerning the tokenizer. Basically, my tokenizer exposes a nextToken() function that is supposed to return the next token. These tokens are distinguished by a token-type. I think it would make sense to have the following token-types: SYMBOL (such as <, :=, ( and the like WHITESPAC...

Good way to deal with comma separated values in Oracle

I am getting passed comma separated values to a stored procedure in Oracle. I want to treat these values as a table so that I can use them in a query like: select * from tabl_a where column_b in (<csv values passed in>) What is the best way to do this in 11g? Right now we are looping through these one by one and inserting them into ...

comma separated values in Oracle function body

I've got the following Oracle function, but it does not work and errors out. I used Ask Tom's way to convert comma separated values to be used in select * from table1 where col1 in <> declared in package header: TYPE myTableType IS table of varchar2 (255); Part of package body: l_string long default iv_value_with_comma_separated|...