I am currently parsing a bunch of mails and want to extract words and other interesting tokens from them (even ones with spelling errors or combinations of letters and digits, like "zebra21" or "customer242"). But how can I know that "0013lCnUieIquYjSuIA" and "anr5Brru2lLngOiEAVk1BTjN" are not words and not relevant? How to extract words a...
For an NLP project of mine, I want to download a large number of pages (say, 10,000) at random from Wikipedia. Without downloading the entire XML dump, this is what I can think of:
1) Open a Wikipedia page
2) Parse the HTML for links in a breadth-first search fashion and open each page
3) Recursively open links on the pages obtained in 2
In ste...
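The breadth-first idea in the steps above can be sketched without any network access. This is a minimal sketch, not a crawler: `bfs_collect` and the `get_links` callback are hypothetical names, and `get_links(page)` is assumed to return the pages linked from `page` (in practice it would fetch and parse the HTML). A `seen` set prevents revisiting pages, and `limit` caps how many pages are collected.

```python
from collections import deque


def bfs_collect(start, get_links, limit):
    """Collect up to `limit` pages in breadth-first order starting at `start`.

    `get_links` is an assumed callback: page -> iterable of linked pages.
    """
    seen = {start}          # pages already queued, so none is visited twice
    queue = deque([start])  # FIFO queue gives breadth-first order
    order = []
    while queue and len(order) < limit:
        page = queue.popleft()
        order.append(page)
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

Note that pages gathered this way cluster around the starting page, so the result is not a uniform random sample of the site.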
void permute(string elems, int mid, int end)
{
    static int count;
    if (mid == end) {
        cout << ++count << " : " << elems << endl;
        return;
    }
    else {
        for (int i = mid; i <= end; i++) {
            swap(elems, mid, i);
            permute(elems, mid + 1, end);
            swap(elems, mid, i);
        }
    }
...
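For comparison, here is a minimal Python sketch of the same swap, recurse, swap-back pattern. It assumes that `swap(elems, mid, i)` in the snippet above is meant to exchange the characters at positions `mid` and `i` in place; the name `permutations_by_swapping` is hypothetical, and the results are collected in a list instead of being printed.

```python
def permutations_by_swapping(items):
    """Return all permutations of a string via in-place swaps and backtracking."""
    elems = list(items)
    results = []

    def permute(mid):
        # When mid reaches the last index, elems holds one complete permutation.
        if mid >= len(elems) - 1:
            results.append("".join(elems))
            return
        for i in range(mid, len(elems)):
            elems[mid], elems[i] = elems[i], elems[mid]  # place elems[i] at position mid
            permute(mid + 1)
            elems[mid], elems[i] = elems[i], elems[mid]  # undo the swap (backtrack)

    permute(0)
    return results
```

The second swap restores the array before the next loop iteration, which is what keeps each branch of the recursion independent.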
I've given this a lot of thought but haven't really been able to come up with something.
Suppose I want an m x n collection of elements sortable by any column and any row in under O(m*n), and also the ability to insert or delete a row in O(m+n) or less. Is it possible?
What I've come up with is a linked-grid, where the nodes are inser...
So I need an algorithm for programming lightning path generation. Which one is the fastest while still looking realistic?
...
I was given as homework the "Introduction to Algorithms" exercise 11.1-3 which goes as follows:
Suggest how to implement a direct-access table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations (Insert, Delete and Search) should run in O(1) time. D...
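A minimal sketch of one common approach to this kind of exercise, under two assumptions that are not stated in the excerpt: each table slot holds the head of a doubly linked list of all elements sharing that key, and Delete receives the element itself rather than a key. The names `Node` and `DirectAccessTable` are hypothetical.

```python
class Node:
    """An element with a key, satellite data, and doubly linked list pointers."""

    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.prev = None
        self.next = None


class DirectAccessTable:
    def __init__(self, universe_size):
        # slots[k] is the head of the list of elements whose key is k (or None)
        self.slots = [None] * universe_size

    def insert(self, node):
        # O(1): push the element onto the front of its slot's list
        head = self.slots[node.key]
        node.prev = None
        node.next = head
        if head is not None:
            head.prev = node
        self.slots[node.key] = node

    def delete(self, node):
        # O(1): unlink the given element; no traversal because we hold the node
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.slots[node.key] = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None

    def search(self, key):
        # O(1): return some element with this key, or None if there is none
        return self.slots[key]
```

The doubly linked list is what makes deletion constant time; with a singly linked list, finding the predecessor could take time proportional to the number of duplicates.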
I saw this question, which focuses on the "Britney Spears" problem. But I have a slightly different question. How does the algorithm determine which words or phrases need to be ranked? For instance, if I send out a tweet that says "Michael Jackson died", how does it know to pull out "Michael Jackson" but not "died"?
Or suppose that ...
According to the definition of big O, f(n) <= C*g(n) (which means f(n) = O(g(n))), it could be deduced that:
f(n) <= C
f(n) <= 2C
I think there are no big differences between these two. What I could come up with is:
f(n) = 1 - 1 / n
f(n) = 2 - 1 / n
C = 1
But what distinguishes these two complexities, since both are constant?
Cou...
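One way to look at it is to write out the definition with g(n) = 1 for the two example functions above. The labels f_1 and f_2 are introduced here only to tell the two functions apart.

```latex
\[
f_1(n) = 1 - \frac{1}{n} \le 1 \cdot g(n), \qquad
f_2(n) = 2 - \frac{1}{n} \le 2 \cdot g(n), \qquad
g(n) = 1, \quad n \ge 1 .
\]
```

Both satisfy f(n) <= C*g(n) for some fixed constant (C = 1 and C = 2 respectively), so both are O(1). The definition only requires that some constant exists, not which one it is.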
Hi,
This question is related to microcontroller programming, but anyone may suggest a good algorithm to handle this situation.
I have one central console and a set of remote sensors. The central console has a receiver, and each sensor has a transmitter operating on the same frequency, so we can only implement simplex communication.
Si...
I've read Lamport's paper on Paxos. I've also heard that it isn't used much in practice, for reasons of performance. What algorithms are commonly used for consensus in distributed systems?
...
I have a web page loaded in the browser (i.e. its DOM and element positioning are both accessible to me) and I want to find the block element (or a sorted list of such elements) that most likely contains the most content (as in a continuous block of text). The goal is to exclude things like menus, headers, footers and such.
...
I need to fill an arbitrary polygon using a near-uniform tiling of triangles. How would I do this? You may provide either references to existing algorithms or even simply ideas or hints of your own.
The following is presumed:
The polygon may be convex (but bonus points if you come up with an algorithm that works for concave shapes)
Th...
We all know there are plenty of self-balancing binary search trees (BSTs), the most famous being red-black and AVL trees. It might be useful to take a look at AA-trees and scapegoat trees too.
I want to do deletions, insertions, and searches, as with any other BST. However, it will be common to delete all values in a given range, or deletin...
Consider a black and white image like this http://img13.imageshack.us/img13/7401/10416827.jpg
What I am trying to do is find the regions where the white points are most dense. In this case there are 20-21 such dense regions (i.e. each cluster of points forms a dense region).
Can anyone give me a hint on how this can be achieved? ...
Guys,
I'm developing a log parser, and I'm reading string files larger than 150 MB. This is my approach. Is there any way to optimize what is in the while statement? The problem is that it consumes a lot of memory. I also tried a StringBuilder and ran into the same memory consumption.
private void ReadLogInThread()
{
...
I am looking for an efficient way to solve the following problem.
List 1 is a list of records that are identified by a primitive triplet:
X | Y | Z
List 2 is a list of records that are identified by three sets: one of Xs, one of Ys, and one of Zs. The Xs, Ys, and Zs are of the same 'type' as those in list one so are directly comparable with one another...
If I construct a shape using constructive solid geometry techniques, how can I construct a wireframe mesh for rendering?
I'm aware of algorithms for directly rendering CSG shapes, but I want to convert it into a wireframe mesh just once so that I can render it "normally"
To add a little more detail. Given a description of a shape such a...
Hello, I am trying to make some very elementary thing that will cycle through every possible permutation of an array.
Really this is being done in assembly, but I'll explain it in C.
Basically, say we have an array uint8_t *data=malloc(10);
I want to create an algorithm that will print every possible combination of the bytes in the ar...
Hey guys, if I have an array that looks like [A,B,C,A,B,C,A,C,B] (random order), and I want to arrange it into [A,A,A,B,B,B,C,C,C] (each group together), and the only operations allowed are:
1) query the i-th item of the array
2) swap two items in the array
How can I design an algorithm that does the job in O(n)?
Thanks!
...
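One well-known technique that fits these constraints is three-way partitioning (often called the Dutch national flag algorithm). The sketch below is a minimal version under the assumption that there are exactly three distinct values and that the caller names the one that goes first and the one that goes last; the function name `group_three_values` is hypothetical. It only reads items by index and swaps pairs, and each step either advances `i` or shrinks the unprocessed region, so it runs in O(n).

```python
def group_three_values(arr, low, high):
    """Group arr in place: `low` values first, `high` values last, the rest between."""
    lo, i, hi = 0, 0, len(arr) - 1
    while i <= hi:
        if arr[i] == low:
            # Move a low value into the front region and advance both pointers.
            arr[lo], arr[i] = arr[i], arr[lo]
            lo += 1
            i += 1
        elif arr[i] == high:
            # Move a high value into the back region; the item swapped in is
            # still unexamined, so i is not advanced.
            arr[i], arr[hi] = arr[hi], arr[i]
            hi -= 1
        else:
            i += 1
    return arr
```

For example, `group_three_values(list("ABCABCACB"), "A", "C")` groups the As at the front and the Cs at the back, leaving the Bs in the middle.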
Suppose I have 4 bitmaps, say, CMYK separations, each of which is 1 bit/pixel and represents a different color, and I want to convert them to an 8 bit/color (24bpp) bitmap. How would I do this?
If the data looks like this:
// the Cyan separation
CCCC CCCC CCCC CCCC CCCC CCCC CCCC CCCC
...
// the magenta separation, and so on..
MMMM MMMM MMMM MMMM M...
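A minimal sketch of one naive way to combine such separations, under loudly stated assumptions: each plane has already been unpacked to one 0/1 value per pixel (the packed layout above is elided, so unpacking is left out), a set bit means ink is present, and a simple subtractive model is used in which an RGB channel is 0 when its complementary ink or black is present and 255 otherwise. The function name `cmyk_bits_to_rgb` is hypothetical, and real color conversion would normally use color profiles rather than this model.

```python
def cmyk_bits_to_rgb(c, m, y, k):
    """Combine four 1-bit planes (sequences of 0/1) into packed 24bpp RGB bytes."""
    rgb = bytearray()
    for cb, mb, yb, kb in zip(c, m, y, k):
        rgb.append(0 if (cb or kb) else 255)  # red is removed by cyan or black
        rgb.append(0 if (mb or kb) else 255)  # green is removed by magenta or black
        rgb.append(0 if (yb or kb) else 255)  # blue is removed by yellow or black
    return bytes(rgb)
```

The output holds three bytes per pixel in R, G, B order; row padding and byte order for a specific bitmap file format are not handled here.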