linguistics

Is "performant" a valid word? What's the alternative?

Given that "performant" isn't officially a real word, what is an appropriate alternative term for expressing "something that performs well"? So, instead of saying something like "This iteration of the SQL query is particularly performant," what would you say instead? Or how about: We're going to go with the most performant, yet u...

Your favorite natural language parser?

This is just a poll on what parser you like to use for parsing sentences of natural language syntactically. I am interested in complete software toolkits/solutions. A good answer would list at least some of the following: The name of the parser (obviously) and a link to its webpage. The (programming!) language(s) it's written in. The (...

How do I determine if a random string sounds like English?

I have an algorithm that generates strings based on a list of input words. How do I keep only the strings that sound like English words? I.e., discard RDLO while keeping LORD. EDIT: To clarify, they do not need to be actual words in the dictionary. They just need to sound like English. For example, KEAL would be accepted. ...

Theory: "Lexical Encoding"

I am using the term "Lexical Encoding" for lack of a better one. A Word is arguably the fundamental unit of communication, as opposed to a Letter. Unicode tries to assign a numeric value to each Letter of all known Alphabets. What is a Letter to one language is a Glyph to another. Unicode 5.1 assigns more than 100,000 values to th...

About "AUTOMATIC TEXT SUMMARIZER (linguistic based)"

Hello, I have "AUTOMATIC TEXT SUMMARIZER (linguistic approach)" as my final-year project. I have collected enough research papers and gone through them. Still, I am not very clear about how to go about it. Basically, I found "AUTOMATIC TEXT SUMMARIZER (statistical based)" and found that it is much easier compared to my...

Misuse of English in the computer literature.....

So recently in the Rails literature the non-word (please, no down grades, I know non-word is a non-word but I'm not publishing this stuff and I don't claim to be more intelligent than those who write books :) P "dasherize" has become somewhat of a de-facto term as in: "to_xml will default to dasherizing the field names" Now in every ot...

Proper language to use in form field labels: A linguistic question

I wish to use the following sentence as the comment on a form field. I have already come up with a short-form label for the field. This text is meant to explain the field in a bit more detail: "The country [where] you come from." The question is: is this "where" required there, optional, or an error? ...

Best practices for searching for alternate forms of a word with Lucene

I have a site which is searchable using Lucene. I've noticed from logs that users sometimes don't find what they're looking for because they enter a singular term, but only the plural version of that term is used on the site. I would like the search to find uses of other forms of a word as well. This is a problem that I'm sure has bee...

Cheap tools for splitting German compound words

Do you know of any tool or library that splits German compound words like "Hochhaus" into single words ("Hoch", "Haus")? It would be great if it's open source and/or cheap. ...

LSA - Latent Semantic Analysis - How to code it in PHP?

Hello! I would like to implement Latent Semantic Analysis (LSA) in PHP in order to find out topics/tags for texts. Here is what I think I have to do. Is this correct? How can I code it in PHP? How do I determine which words to choose? I don't want to use any external libraries. I already have an implementation for the Singular Value Deco...

Identify tense in PHP

Hi, I'm looking for a way to analyze a string of text and find out in which tense it was written, for example: "I'm going to the store" == current, "I bought a car" == past, etc. Any tips on how I could get this done? ...

Which word stemmer should I use in nltk?

My goal is to analyze some corpus (Twitter for now) for emotional content. Just today I realized it would make a bit of sense to search for word stems as opposed to having an exhaustive list of emotional words. And so I've been exploring nltk.stem, only to realize that there are 4 different stemmers. I'd like to ask the stackover...

How can I correctly prefix a word with "a" and "an"?

I have a .NET application where, given a noun, I want it to correctly prefix that word with "a" or "an". How would I do that? Before you think the answer is to simply check if the first letter is a vowel, consider phrases like "an honest mistake" and "a used car". ...
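To make the objection in this question concrete, here is a minimal Python sketch of the naive first-letter rule (Python is used for illustration only; the question is about .NET, and `naive_article` is a hypothetical name). It shows the problem, not a solution: the rule fails on both phrases from the question because the choice depends on the first sound, not the first letter.

```python
VOWELS = "aeiou"


def naive_article(noun_phrase: str) -> str:
    """Pick 'a' or 'an' from the first letter only (the naive rule)."""
    first = noun_phrase.strip()[0].lower()
    return "an" if first in VOWELS else "a"


# Correct for "apple", wrong for both phrases from the question.
for phrase, expected in [("apple", "an"), ("honest mistake", "an"), ("used car", "a")]:
    print(f"naive: {naive_article(phrase)} {phrase} | expected: {expected} {phrase}")
```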

Translating human languages in Python

Is there a Python module for the translation of texts from one human language to another? I'm planning to work with texts that are to be pre- and post-processed with Python scripts. What other Python-integrated approaches can be used? ...

Arabic tagged corpora

Hello, does anyone know of a free Arabic tagged corpus? I am working on grammar and I need one. Thanks very much. Hani Almousli ...

Assistance with Find and Replace Regex

I have a text file, and each line is of the form: TAB WORD TAB PoS TAB FREQ#

    Word    PoS     Freq
    the     Det     61847
    of      Prep    29391
    and     Conj    26817
    a       Det     21626
    in      Prep    18214
    to      Inf     16284
    it      Pron    10875
    is      Verb    9982
    to      Prep    9343
    was     Verb    9236
    I       Pron    8875
    for     Prep    8412
    that    Conj    7308
    you     Pron    6954

Would one of you regex wizards kindly assist me in is...

Searching for Database of Entity Names (colleges, cities, personalities, countries...)

For an enterprise application research project that another person and I are working on, we are looking to remove certain content from the page to keep the posted messages universal (meaning not offensive and essentially anonymous). Right now we want to take a message that a user has posted to a message board, and remove any type of name, nam...

Is there software that outputs speech-to-text at the Phonological level?

Is there any software out there capable of taking audio files and outputting phonological (IPA) text? I understand much of the software out there takes it straight to a language, but is there one that is 'teachable'? ...

Interesting linguistics/nlp problems/projects

As I know, looking for a problem to solve (debugging, thinking up a theme for an article, whatever) is the most creative, interesting and difficult part of any problem-solving work. Or just the most difficult. But I have no idea what's going on in programming-related linguistics. I love languages and simple-for-babies-but-neither-unders...

Is there a fairly simple way for a script to tell (from context) whether "her" is a possessive pronoun?

I am writing a script to reverse all genders in a piece of text, so all gendered words are swapped - "man" is swapped with "woman", "she" is swapped with "he", etc. But there is an ambiguity as to whether "her" should be replaced with "him" or "his". ...
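The ambiguity can be seen in a minimal Python sketch of a word-for-word swap. The `SWAPS` table and `naive_swap` function are hypothetical and not taken from the asker's script; "her" is deliberately mapped to "him" only, which is wrong whenever "her" is possessive.

```python
import re

# Hypothetical swap table. "her" has a single target, which is the ambiguity at issue.
SWAPS = {
    "she": "he", "he": "she",
    "her": "him", "him": "her", "his": "her",
    "man": "woman", "woman": "man",
}


def naive_swap(text: str) -> str:
    """Swap gendered words one by one, ignoring grammatical context."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS.get(word.lower())
        if swapped is None:
            return word  # not a gendered word: leave untouched
        # Preserve a leading capital on the replaced word.
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"[A-Za-z]+", repl, text)


print(naive_swap("I saw her"))                 # object "her": swap is right
print(naive_swap("She gave her book to him"))  # possessive "her": swap is wrong
```

The second call yields "He gave him book to her", where "his" was needed. Choosing between "him" and "his" requires knowing the grammatical role of "her", which a lookup table alone cannot supply.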