Without getting a degree in information retrieval, I'd like to know whether there are any algorithms for counting how often words occur in a given body of text. The goal is to get a "general feel" for what people are saying across a set of textual comments, along the lines of Wordle.
What I'd like:
ignore articles, pronouns, etc...
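One possible approach, as a minimal sketch rather than a definitive answer: lowercase the text, split it into words, drop a stop-word list (articles, pronouns and the like), and count what remains. The `STOP_WORDS` set and the `word_frequencies` name below are illustrative assumptions; a real stop-word list would be much longer.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a complete one.
STOP_WORDS = frozenset({
    "a", "an", "the", "i", "you", "he", "she", "it", "we", "they",
    "me", "him", "her", "us", "them", "my", "your", "his", "its", "our",
    "their", "this", "that", "and", "or", "but", "of", "to", "in", "on",
    "is", "are", "was", "were", "be",
})


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs, ignoring case."""
    counts = Counter()
    # Tokens are runs of ASCII letters and apostrophes (so "don't" stays whole).
    for token in re.findall(r"[a-z']+", text.lower()):
        word = token.strip("'")  # drop quote marks around a word
        if word and word not in stop_words:
            counts[word] += 1
    return counts


comments = "The service was great. I loved the service, and the staff were great!"
for word, count in word_frequencies(comments).most_common(3):
    print(word, count)
```

`Counter.most_common(n)` returns the n most frequent words, which is usually enough for a Wordle-style overview.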
Does anyone have a suggestion for where to find archives or collections of everyday English text for use in a small corpus? I have been using Project Gutenberg books for a working prototype, and would like to incorporate more contemporary language. A recent answer here pointed indirectly to a great archive of Usenet movie reviews, whic...
English is not my mother tongue. However, I have to write comments in English. I want to improve my "comment English" by studying a piece of code that is commented in good English. Please recommend an open source project that contains a lot of meaningful comments written by people with an excellent command of the language.
...
This might be a bit of a weird question, but I'll give it a shot:
HELP, my Visual Studio 2008 / ASP.NET is giving me GERMAN error messages. Besides the fact that translations tend to be not as good as the original text, I can't search for those and find relevant answers to my problems on the internet.
So: How do I switch my German Visual St...
I can't find anything other than closed-source web applications. Are there any active projects? I'd be interested in using the software in something I'm developing and getting involved.
...
There are the standard A-Z, a-z characters, but also there are hyphens, em dashes, quotes, etc.
Plus, there are all of the international characters, like umlauts, etc.
So, for an English-based system, what's the complete set? What about sets for other languages? What about UTF8, UTF16, etc?
Bonus question: How many name fields are nee...
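A hedged sketch of one way to approach this: instead of enumerating a "complete set" of characters, classify each character by its Unicode general category and allow letters, combining marks, and a short list of punctuation. UTF-8 and UTF-16 are encodings of the same Unicode characters, so the check works the same once the text is decoded. The `is_plausible_name` function and the `ALLOWED_PUNCTUATION` set are assumptions made for illustration, not an established standard.

```python
import unicodedata

# Assumption: punctuation we choose to allow inside names; adjust as needed.
ALLOWED_PUNCTUATION = frozenset(" -'.\u2019")


def is_plausible_name(name):
    """Return True if every character is a letter, a combining mark,
    or one of the allowed punctuation characters."""
    if not name.strip():
        return False
    for ch in name:
        category = unicodedata.category(ch)  # e.g. "Lu", "Ll", "Mn", "Nd"
        if category[0] in ("L", "M"):
            continue  # any letter or combining mark, in any script
        if ch in ALLOWED_PUNCTUATION:
            continue
        return False
    return True


print(is_plausible_name("Zoë O'Brien-Müller"))
print(is_plausible_name("R2-D2"))
```

This accepts umlauts and other accented letters without listing them individually, and rejects digits and symbols.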
I am looking for an existing database of English words with each word separated into syllables. My purpose is to further edit each word in any selected article based on the syllable separation.
Does anyone know an existing product or method that can help me achieve this process?
Thanks!
...
So recently in the Rails literature the non-word "dasherize" (please, no downgrades; I know "non-word" is a non-word, but I'm not publishing this stuff and I don't claim to be more intelligent than those who write books :) has become something of a de facto term, as in:
"to_xml will default to dasherizing the field names"
Now in every ot...
The app was working fine, but a few weeks later, when testing of the new version began, it started crashing. I tried it on five of the workstations, and it crashes on only two of them. The only thing I can find that those two have in common is that Windows is installed in English.
It's a DirectX 8.1 application, written in C++ with Visual Studi...
There is another problem. It happens with XP and Windows 7 (interestingly, Vista didn't show this behavior).
I have 3 keyboard layouts. English, Russian and Hebrew. The English is a customized layout, similar to the US-International but AltGr has many different languages' dead-keys to create any possible diacritics or Old English letter...
We have all seen this type of code. A calls B, which calls back A, which delegates to C, which does a few difficult-to-understand tests and, depending on the results, calls a single method of D with different parameters, which has a big switch block whose branches all do essentially similar but slightly different things. There is most likely some pol...
Whenever I need information on a programming topic, I tend to search Google directly in English. I don't even bother trying to search in Spanish, which is my mother tongue, because I know I probably won't find anything interesting.
Do you ever try to search for programming help in your mother tongue?
Do you always find your responses in tha...
I want to set the LIST_SEPARATOR in Perl, but all I get is this warning:
Name "main::LIST_SEPARATOR" used only once: possible typo at ldapflip.pl line 7.
Here is my program:
#!/usr/bin/perl -w
@vals;
push @vals, "a";
push @vals, "b";
$LIST_SEPARATOR='|';
print "@vals\n";
I am sure I am missing something obvious, but I don't see ...
I've tried PorterStemmer and Snowball, but neither works on all words; both miss some very common ones.
My test words are: "cats running ran cactus cactuses community communities", and both get fewer than half right.
Ideally the class/function would be in PHP, but I can port it if it's in another language.
See also:
Stemming algorith...
I'm creating this cool answer engine that answers "who" questions. I will share the URL soon.
However, I need a list of English words for that, so that I can find "proper nouns". Can I get an English dictionary dump or just a list of all English words, preferably both British and American?
Any help would be much appreciated!
...
What is the correct plural form of the portmanteau mutex? Is it mutexes or mutices?
...
I wish to use the following sentence as the comment on a form field. I have already come up with a short-form label for the field. This text is meant to explain the field in a bit more detail:
The country [where] you come from.
The question is: is this "where" required there, optional, or an error?
...
What is the difference between re-engineering and reverse engineering?
A simple example would be much appreciated.
...
If I want to check passwords in my application for the inclusion of English words, should I store a database of English words locally (is there a free database?) or is there a (free) web service I can use to check them remotely?
Ideally I would check the words using an Ajax call but I don't want to pass the entire English dictionary by...
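A minimal sketch of the local option, assuming the word list is kept on the server as an in-memory set and the Ajax call only triggers the check, so the dictionary never has to be sent to the browser. The function names and the `min_len` cutoff are illustrative assumptions; the file passed to `load_words` is whatever plain-text word list you have (many Unix-like systems ship one at `/usr/share/dict/words`, but that path is not guaranteed to exist).

```python
def load_words(path, min_len=4):
    """Load a one-word-per-line file into a lowercase set,
    skipping words shorter than min_len."""
    with open(path, encoding="utf-8") as fh:
        stripped = (line.strip().lower() for line in fh)
        return {word for word in stripped if len(word) >= min_len}


def find_dictionary_words(password, words, min_len=4):
    """Return every substring of the password (case-insensitive)
    that appears in the given word set."""
    lowered = password.lower()
    found = set()
    for start in range(len(lowered)):
        for end in range(start + min_len, len(lowered) + 1):
            candidate = lowered[start:end]
            if candidate in words:
                found.add(candidate)
    return found


# Hypothetical word set standing in for a real dictionary file.
sample_words = {"pass", "word", "password", "dragon"}
print(sorted(find_dictionary_words("MyPassword1", sample_words)))
```

Set lookups are constant time on average, so checking every substring of a typical password is cheap; the minimum length keeps very short words such as "a" or "it" from flagging almost every password.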
I'm wondering whether productivity can be correlated with whether a programmer is a native English speaker. I work in Japan, and I can tell you that Japanese programmers struggle with the English part of a language (reserved keywords, function names, tutorials, etc.); it's just not natural for them, and their thinking process is slow d...