words

Tokenizer, Stop Word Removal, Stemming in Java

Hi there, I am looking for a class or method that takes a long string of many hundreds of words and tokenizes it, removes the stop words, and stems the remaining words for use in an IR system. For example: "The big fat cat, said 'your funniest guy i know' to the kangaroo..." The tokenizer would remove the punctuation and return an ArrayList of words the stop wo...

PHP - Which word is properly typed?

Hey, I'm looking for help on writing a script to check a list of phrases/words and compare them to one another and see which one is the properly typed phrase/word. $arr1 = array('fbook', 'yahoo msngr', 'text me later', 'how r u'); $arr2 = array('facebook', 'yahoo messenger', 'txt me l8r', 'how are you'); So, in comparing each index...

How to convert numbers to words in Erlang?

I found this interesting question about converting numbers into "words": http://stackoverflow.com/questions/309884/code-golf-number-to-words I would really like to see how you would implement this efficiently in Erlang. ...

Does WordNet have "levels"? (NLP)

For example... Chicken is an animal. Burrito is a food. WordNet allows you to do "is-a"...the hierarchy feature. However, how do I know when to stop travelling up the tree? I want a LEVEL that is consistent. For example, if presented with a bunch of words, I want WordNet to categorize all of them, but at a certain level, so it doesn'...

Limiting characters inside HTML paragraph

I want only 350 characters displayed inside the paragraph, regardless of how many characters are put into it. How can I do this? The text is just plain text in a div tag. Cheers ...

Long words breaking layout. What about with HTML input in UTF-8?

Imagine if I have in a text something like [a href="this-is-a-very-big-link"]this is ok[/a] (switch < and > with [ and ])... And also this-is-a-very-big-word. I need to cut the second case in two lines... Notice wordwrap kills the link so it is not useful for solving this sort of problem. Any idea? ...

Regex with exception of particular words

Hi everyone, I have a problem with regex. I need to make a regex that excludes a set of specified words, for example: apple, orange, juice. Given these words, it should match everything except the words above. apple (should not match) applejuice (match) yummyjuice (match) yummy-apple-juice (match) orangeapplejuice (match) orang...
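
One way to read the requirement above is "reject a string only when it is exactly one of the listed words". A minimal sketch of that reading in Python, using a negative lookahead; the helper name `build_exclusion_pattern` is made up for illustration, and the original question does not say which regex flavor is in use:

```python
import re

def build_exclusion_pattern(excluded):
    """Hypothetical helper: match any non-empty string that is not
    exactly one of the excluded words."""
    alternatives = "|".join(re.escape(word) for word in excluded)
    # (?!(?:...)$) fails only when the whole string equals an excluded word.
    return re.compile(rf"^(?!(?:{alternatives})$).+$")

pattern = build_exclusion_pattern(["apple", "orange", "juice"])

for candidate in ["apple", "applejuice", "yummyjuice",
                  "yummy-apple-juice", "orangeapplejuice"]:
    print(candidate, bool(pattern.match(candidate)))
```

Because the lookahead is anchored with `$`, compound strings such as applejuice still match even though they contain an excluded word.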

Word lists for a lot of articles - document-term matrix

I have nearly 150k articles in Turkish. I will use the articles for natural language processing research. After processing the articles, I want to store the words and their frequencies per article. I'm storing them in an RDBMS now. I have 3 tables: Articles -> article_id,text Words -> word_id, type, word Words-Article -> id, word_id, article_id, frequ...

Split a large string into multiple substrings containing 'n' number of words via python

Source text: United States Declaration of Independence How can one split the above source text into a number of sub-strings, containing an 'n' number of words? I use split(' ') to extract each word, however I do not know how to do this with multiple words in one operation. I could run through the list of words that I have, and create...
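
A minimal sketch of one approach to the question above: split once on whitespace, then join slices of n words. The function name `chunk_words` is an assumption for illustration, and the last chunk may hold fewer than n words:

```python
def chunk_words(text, n):
    """Split text into substrings of at most n words each."""
    if n <= 0:
        raise ValueError("n must be positive")
    words = text.split()  # splits on any run of whitespace
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]

sample = "When in the Course of human events it becomes necessary"
for chunk in chunk_words(sample, 4):
    print(chunk)
```

Using `split()` without an argument, rather than `split(' ')`, avoids empty strings when the text contains consecutive spaces or line breaks.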

SIFR - LINK IN A LIST

I have a problem using sIFR for links in a list. When I try to apply it to the my links are perfectly skinned (with the right font and right color) but the last word of each link is cut or displayed on another line. When I apply it to the the text is well displayed but appears as a hypertext link (blue and underlined). I have tried an...

Java- how to parse for words in a string for a specific word

How would I parse for the word "hi" in the sentence "hi, how are you?" or parse for the word "how" in "how are you?"? Example of what I want in code: String word = "hi"; String word2 = "how"; Scanner scan = new Scanner(System.in).useDelimiter("\n"); String s = scan.nextLine(); if(s.equals(word)) { System.out.println("Hey"); } if(s....
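
The excerpt above compares the whole line to the word, which only succeeds when the line is exactly that word. A sketch of the alternative, tokenizing the sentence and testing membership, shown in Python rather than the question's Java; `contains_word` is a hypothetical name, and treating a word as a run of letters and apostrophes is an assumption:

```python
import re

def contains_word(sentence, word):
    """Return True if word occurs as a whole token in sentence (case-insensitive)."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    return word.lower() in tokens

print(contains_word("hi, how are you?", "hi"))
print(contains_word("how are you?", "how"))
print(contains_word("this is it", "hi"))
```

Tokenizing first means "hi" is not reported inside "this", which a plain substring search would do.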

C++ Need to compare one string to 200.000 words...

Hi. In my C++ program, the user types in a string, "foo". I need to compare this string to the strings in my TXT files in order to report: this string is a noun! (or adjective...) I have a few TXT files, one with nouns, a second with adjectives... but each file contains about 200,000 words. How can I efficiently compare this string "foo" wi...
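
A common approach to this kind of lookup is to load each word list into a hash set once, so that each query is an average constant-time membership test rather than a scan of 200,000 lines. A sketch in Python (the question is about C++, where `std::unordered_set` plays the same role); the function names and the one-word-per-line file layout are assumptions:

```python
def load_words(path):
    """Read one word per line into a set (assumed file layout)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}

def classify(word, lexicons):
    """Return the labels of every lexicon that contains word."""
    word = word.lower()
    return [label for label, words in lexicons.items() if word in words]

# In-memory stand-ins for the noun and adjective files.
lexicons = {
    "noun": {"foo", "cat", "kangaroo"},
    "adjective": {"big", "fat"},
}
print(classify("foo", lexicons))
```

With real files, `lexicons` would be built as `{"noun": load_words(...), ...}` once at startup.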

PHP Stop Word List

I'm playing about with stop words within my code. I have an array full of words that I'd like to check, and an array of words I want to check against. At the moment I'm looping through the array one at a time and removing the word if it's in_array vs the stop word list, but I wonder if there's a better way of doing it. I've looked at ...
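
The loop described above does a linear search of the stop list for every word. A sketch of the usual alternative, keeping the stop words in a hash set and filtering in one pass, written in Python rather than PHP; the stop list here is a tiny illustrative sample, not a real one:

```python
# Illustrative sample only; a real stop list would be much longer.
STOP_WORDS = frozenset({"the", "a", "an", "to", "of", "and"})

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Keep the words that are not in the stop list, preserving order."""
    return [word for word in words if word.lower() not in stop_words]

print(remove_stop_words(["The", "big", "cat", "and", "a", "dog"]))
```

In PHP the same idea can be expressed by making the stop words array keys and testing with `isset`, or by using `array_diff` on the two arrays.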

Convert number into words using flex.

Hi, I am trying to convert an entry made with a numeric stepper in Flex into words to display in a textarea, i.e. a user uses the stepper to enter "89" as a value and the words "Eighty nine" are displayed in the text area. After much searching I haven't found anything that helps, only a few JavaScript functions. any help sampl...
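
The conversion logic itself is small and ports readily to ActionScript. A sketch in Python, limited by assumption to the range 0 to 999; `number_to_words` is a hypothetical name:

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer in the range 0..999 (assumed limit)."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    text = ONES[hundreds] + " hundred"
    return text + (" " + number_to_words(rest) if rest else "")

print(number_to_words(89).capitalize())  # prints "Eighty nine"
```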

Get n Number of words using regex in Java

I have a section of a book, complete with punctuation, line breaks etc., and I want to be able to extract the first n words from the text and divide that into 5 parts. Regex mystifies me. This is what I am trying. It creates an array of index size 0, with all the input text: public static String getNumberWords2(String s, int nWords){ ...

Split large text string into variable length strings without breaking words and keeping linebreaks and spaces

I am trying to break a large string of text into several smaller strings and give each smaller string a different maximum length. For example: "The quick brown fox jumped over the red fence. The blue dog dug under the fence." I would like to have code that can split this into smaller lines and have the first li...

Which would be better? Storing/access data in a local text file, or in a database?

Basically, I'm still working on a puzzle-related website (micro-site really), and I'm making a tool that lets you input a word pattern (e.g. "r??n") and get all the matching words (in this case: rain, rein, ruin, etc.). Should I store the words in local text files (such as words5.txt, which would have a return-delimited list of 5-letter ...
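
Whichever storage is chosen, the matching step can be the same. A sketch in Python that turns a pattern such as "r??n" into a regular expression, assuming "?" stands for exactly one character; `match_pattern` and the sample word list are made up for illustration:

```python
import re

def match_pattern(pattern, words):
    """Return the words that fit pattern, where '?' matches any one character."""
    regex = re.compile(
        "".join("." if ch == "?" else re.escape(ch) for ch in pattern)
    )
    return [word for word in words if regex.fullmatch(word)]

# Stand-in for the contents of a word list file or table.
words = ["rain", "rein", "ruin", "run", "train", "roan"]
print(match_pattern("r??n", words))
```

Grouping words by length, as the words5.txt idea suggests, means only words of the right length need to be tested for a given pattern.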

Match and replace whole words in javascript

I have a textarea. When I write, for example, "want", I want it replaced with "two". How to match and replace whole words in JavaScript? ...
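
Whole-word matching is usually done with the `\b` word-boundary assertion, which JavaScript regular expressions also support. A sketch of the idea in Python, as a stand-in for the JavaScript the question asks about; `replace_whole_word` is a hypothetical name:

```python
import re

def replace_whole_word(text, target, replacement):
    """Replace target only where it appears as a whole word."""
    pattern = rf"\b{re.escape(target)}\b"
    # A function replacement keeps backslashes in `replacement` literal.
    return re.sub(pattern, lambda _match: replacement, text)

print(replace_whole_word("I want what is wanted", "want", "two"))
```

Here "wanted" is left alone because there is no word boundary between "want" and "ed".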

Cassandra full text search like

Let's say I have a column family named Questions like below: Questions = { Who are you: { username: "user1" }, What is the answer: { username: "user1" }... } How do I search for all the questions that contain certain words? For example, get all questions that contain the word 'what'. How do I do it using python or at le...

PHP match words code not working on PHP 5.1.6

Hello, I have a piece of code that works fine on my local test server, but for some reason it does not work on the live server. The PHP version on the live server is 5.1.6. $subject = 'random words to check'; $terms = explode(' ', 'word1 word2 check'); $wordIndex = array_flip(preg_split('/\P{L}+/u', mb_strtolower($subject), -1, PREG_SPLIT_NO_EMPTY)); fo...