I'm generating statistics for some English-language text and I would like to skip uninteresting words such as "a" and "the".
Where can I find lists of these uninteresting words?
Is a list of these words the same as a list of the most frequently used words in English?
Update: these are apparently called "stop words" and not ...
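For illustration, here is a minimal sketch of how such a list is typically applied when counting words. The `STOP_WORDS` set and the `word_counts` helper are made up for this example, and the list is deliberately tiny. A real stop-word list would be much longer and would come from whatever source you settle on.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch, not an authoritative list).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it"}


def word_counts(text, stop_words=STOP_WORDS):
    """Count words in text, ignoring case and skipping stop words."""
    # Lowercase first so "The" and "the" are treated as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


counts = word_counts("The cat and the dog sat in a tree. The cat slept.")
print(counts.most_common(3))
```

Without the filter, "the" would be the most frequent word in this sample, which is why the two kinds of list overlap heavily. They are not necessarily identical, though: a frequency list is measured from a corpus, while a stop-word list is chosen by hand for a particular task.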
I am trying to find information (and hopefully C# source code) on creating a basic AI tool that can understand English words, grammar and context.
The idea is to train the AI on as many written documents as possible and then, based on these documents, have the AI produce its own creative writing in proper English that ...
I have an arbitrarily large string of text from the user that needs to be split into 10k chunks (a potentially adjustable value) and sent off to another system for processing.
Chunks cannot be longer than 10k (or other arbitrary value)
Text should be broken with natural language context in mind
split on punctuation when possible
split ...
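A rough sketch of the rules listed so far, in Python. The function name `split_text`, the choice of `.!?;` as split points, and the fallback order (punctuation, then a space, then a hard cut) are all assumptions made for this example. The limit is counted in characters here, which may not be the unit the receiving system uses.

```python
def split_text(text, max_len=10_000):
    """Split text into chunks of at most max_len characters.

    Prefers to cut right after punctuation, then after a space,
    and only cuts mid-word when neither is available.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")

    chunks = []
    start = 0
    while len(text) - start > max_len:
        window = text[start:start + max_len]

        # Last punctuation mark inside the window, or -1 if there is none.
        cut = max(window.rfind(p) for p in ".!?;")
        if cut == -1:
            # No punctuation: fall back to the last space in the window.
            cut = window.rfind(" ")

        # Keep the delimiter with the chunk; hard cut if nothing was found.
        cut = cut + 1 if cut != -1 else max_len

        chunks.append(text[start:start + cut])
        start += cut

    if start < len(text):
        chunks.append(text[start:])
    return chunks


sample = "Hello world. This is a test! Short one? yes"
for chunk in split_text(sample, max_len=15):
    print(repr(chunk))
```

Nothing is trimmed or dropped, so joining the chunks gives back the original string exactly. If the other system does not want leading spaces, strip them on the receiving side.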