english

Word frequency algorithm for natural language processing

Without getting a degree in information retrieval, I'd like to know whether there are any algorithms for counting how often words occur in a given body of text. The goal is to get a "general feel" for what people are saying across a set of textual comments, along the lines of Wordle. What I'd like: ignore articles, pronouns, etc...
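One possible way to approach the counting described above, as a minimal sketch: tokenize each comment, drop stop words, and tally the rest. The function name `word_frequencies` and the `STOP_WORDS` set are made up for illustration, and the stop-word list is deliberately tiny; a real one would be much longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (articles, pronouns, etc.).
STOP_WORDS = frozenset({
    "a", "an", "the", "i", "you", "he", "she", "it", "we", "they",
    "and", "or", "but", "is", "are", "was", "of", "to", "in", "on",
    "this", "that",
})

def word_frequencies(comments, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs across all comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase, then keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts

comments = [
    "The checkout page is slow",
    "Checkout was slow and the search is broken",
]
# The most frequent remaining words give the "general feel".
print(word_frequencies(comments).most_common(3))
```

The regex tokenizer is a simplification; it ignores non-ASCII letters and does no stemming, so "slow" and "slower" would be counted separately.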

NLP: Building (small) corpora, or "Where to get lots of not-too-specialized English-language text files?"

Does anyone have a suggestion for where to find archives or collections of everyday English text for use in a small corpus? I have been using Project Gutenberg books for a working prototype, and would like to incorporate more contemporary language. A recent answer here pointed indirectly to a great archive of Usenet movie reviews, whic...

Please recommend an open source project with quality comments in good English

English is not my mother tongue. However, I have to write comments in English. I want to improve my "comment English" by studying a piece of code that is commented in good English. Please recommend an open source project which contains a lot of meaningful comments written by people with an excellent command of the language. ...

English Error Messages in German Visual Studio 2008 / ASP.NET

This might be a slightly weird question, but I'll give it a shot: HELP, my Visual Studio 2008 / ASP.NET is giving me GERMAN error messages. Besides the fact that translations tend to be not as good as the original text, I can't search for those and find relevant answers to my problems on the internet. So: How do I switch my German Visual St...

Is there open source software available that analyses a string and guesses the gender of the author?

I can't find anything other than closed-source web applications. Are there any active projects? I'd be interested in using the software in something I'm developing and getting involved. ...

What are all of the allowable characters for people's names?

There are the standard A-Z, a-z characters, but there are also hyphens, em dashes, quotes, etc. Plus, there are all of the international characters, like umlauts, etc. So, for an English-based system, what's the complete set? What about sets for other languages? What about UTF-8, UTF-16, etc.? Bonus question: How many name fields are nee...
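Rather than enumerating a complete set, one option is to test each character's Unicode category. The sketch below accepts any letter or combining mark plus a short list of punctuation. The function name `is_plausible_name` and the `EXTRA_ALLOWED` set are assumptions for illustration, not an authoritative answer to what a name may contain.

```python
import unicodedata

# Assumed extra characters: hyphen, ASCII and typographic apostrophe,
# period, and space. Adjust to taste; this list is not exhaustive.
EXTRA_ALLOWED = frozenset("-'\u2019. ")

def is_plausible_name(name):
    """Return True if every character is a letter, a combining mark,
    or one of the explicitly allowed punctuation characters."""
    if not name.strip():
        return False
    for ch in name:
        category = unicodedata.category(ch)  # e.g. "Lu", "Ll", "Mn", "Nd"
        if category[0] in ("L", "M"):
            continue
        if ch in EXTRA_ALLOWED:
            continue
        return False
    return True

for candidate in ["Jürgen Müller", "O'Brien-Smith", "R2-D2"]:
    print(candidate, is_plausible_name(candidate))
```

Because the check works on Unicode code points, it is independent of whether the text is stored as UTF-8 or UTF-16.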

Is there a dictionary or database of English words with each word separated by syllables?

I am looking for an existing database of English words with each word separated by syllables. My purpose is to further edit each word in any selected article based on the separation of syllables. Does anyone know an existing product or method that can help me achieve this process? Thanks! ...

Misuse of English in the computer literature

So recently in the Rails literature the non-word (please, no down grades, I know non-word is a non-word but I'm not publishing this stuff and I don't claim to be more intelligent than those who write books :) P "dasherize" has become something of a de facto term, as in: "to_xml will default to dasherizing the field names". Now in every ot...

Unexplainable crash in DirectX app on Windows XP that uses the English language

The app was working fine, but now, a few weeks later, when testing of the new version began, it crashes. I tried it on five of the workstations, and it crashes on only two of them. The only thing they have in common that I can find is that those two have Windows installed in English. It's a DirectX 8.1 application, written in C++ with Visual Studi...

Windows XP and 7 adding English US, even if I already have a different (nonstandard) English layout

There is another problem. It happens with XP and Windows 7 (interestingly, Vista didn't show this behavior). I have three keyboard layouts: English, Russian and Hebrew. The English is a customized layout, similar to the US-International but AltGr has many different languages' dead-keys to create any possible diacritics or Old English letter...

Looking for a phrase that expresses convoluted code

We have all seen this type of code. A calls B, which calls back A, which delegates to C which does a few difficult to understand tests and depending on results calls a single method of D with different parameters, which has a big switch block all of which do essentially similar but slightly different things. There is most likely some pol...

Programming information in your mother tongue

Whenever I need information on a programming topic I tend to search Google directly in English. I don't even bother trying to search in Spanish, which is my mother tongue, because I know I probably won't find anything interesting. Do you ever try to search for programming help in your mother tongue? Do you always find your responses in tha...

Why can't I set $LIST_SEPARATOR in Perl?

I want to set the LIST_SEPARATOR in Perl, but all I get is this warning: Name "main::LIST_SEPARATOR" used only once: possible typo at ldapflip.pl line 7. Here is my program: #!/usr/bin/perl -w @vals; push @vals, "a"; push @vals, "b"; $LIST_SEPARATOR='|'; print "@vals\n"; I am sure I am missing something obvious, but I don't see ...

How do I do word Stemming or Lemmatization?

I've tried PorterStemmer and Snowball, but neither works on all words; both miss some very common ones. My test words are: "cats running ran cactus cactuses community communities", and both get fewer than half right. Ideally the class/function would be in PHP, but I can port it if it's in another language. See also: Stemming algorith...
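To illustrate why irregular forms such as "ran" need a lookup table rather than suffix rules, here is a toy sketch in Python (portable to PHP). It is neither the Porter nor the Snowball algorithm; the `IRREGULAR` table and the handful of suffix rules are hypothetical and tuned only to the test words in the question, so it will misfire on plenty of real words (for example, it turns "houses" into "hous").

```python
# Hypothetical exceptions table: irregular forms that no suffix rule can recover.
IRREGULAR = {"ran": "run"}

def lemmatize(word):
    """Toy lemmatizer: exceptions table first, then a few suffix rules."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"              # communities -> community
    if w.endswith(("ses", "xes", "zes", "ches", "shes")):
        return w[:-2]                    # cactuses -> cactus
    if w.endswith("ing") and len(w) > 5:
        stem = w[:-3]
        # Undo consonant doubling: runn -> run.
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if w.endswith("s") and not w.endswith(("ss", "us")):
        return w[:-1]                    # cats -> cat, but cactus is kept
    return w

test_words = "cats running ran cactus cactuses community communities".split()
print([lemmatize(w) for w in test_words])
```

A dictionary-backed lemmatizer generalizes the same idea: the exceptions table grows to cover the whole vocabulary, and the suffix rules become a fallback for unknown words.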

Can I get an English dictionary word list somewhere?

I'm creating this cool answer engine that answers "who" questions. I will share the URL soon. However, I need a list of English words for that, to find "proper nouns". Can I get an English dictionary dump or just a list of all English words, preferably British and American? Any help will be amazingly helpful! ...

Plural form of word "mutex"

What is the correct plural form of the portmanteau "mutex"? Is it mutexes or mutices? ...

Proper language to use in form field labels: A linguistic question

I wish to use the following sentence as the comment on a form field. I have already come up with a short-form label for the field. This text is meant to explain the field in a bit more detail: The country [where] you come from. The question is: is this "where" needed there, can it be used there (optional), or can it not be used there (error)? ...

What is the difference between re-engineering and reverse engineering?

What is the difference between re-engineering and reverse engineering? A simple example would be much appreciated. ...

Checking passwords against word database on server or use a web service?

If I want to check passwords in my application for the inclusion of English words, should I store a database of English words locally (is there a free database?) or is there a (free) web service I can use to check them remotely? Ideally I would check the words using an Ajax call but I don't want to pass the entire English dictionary by...
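For the local-database option, a rough sketch: keep the word list in a set on the server and test every substring of the password against it, so the dictionary never has to travel to the client. The `WORDS` set here is a made-up stand-in for a real word list loaded from a file, and `contains_dictionary_word` and `min_len` are illustrative names.

```python
# Hypothetical stand-in for a real word list loaded from disk at startup.
WORDS = frozenset({"dragon", "monkey", "password", "summer"})

def contains_dictionary_word(password, words=WORDS, min_len=4):
    """Return True if any substring of at least min_len characters
    is a known dictionary word (case-insensitive)."""
    lowered = password.lower()
    n = len(lowered)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            if lowered[start:end] in words:
                return True
    return False

print(contains_dictionary_word("Summer2024!"))
print(contains_dictionary_word("x7#Kq9!z"))
```

An Ajax endpoint would then only need to return the boolean result. The substring scan is quadratic in the password length, which is acceptable for short inputs.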

Cultural issues in programming languages

I'm wondering if productivity can be correlated with whether a programmer is a native English speaker or not. I work in Japan and I can tell you that Japanese programmers struggle to grasp the English part of a language (reserved keywords, function names, tutorials, etc.). It's just not natural for them and their thinking process is slow d...