algorithm

How to recognize words in text with non-word tokens?

I am currently parsing a batch of emails and want to extract words and other interesting tokens from them (even ones with spelling errors or combinations of letters and digits, like "zebra21" or "customer242"). But how can I tell that "0013lCnUieIquYjSuIA" and "anr5Brru2lLngOiEAVk1BTjN" are not words and are not relevant? How to extract words a...

Getting a large number of (but not all) Wikipedia pages

For an NLP project of mine, I want to download a large number of pages (say, 10000) at random from Wikipedia. Without downloading the entire XML dump, this is what I can think of:
1. Open a Wikipedia page
2. Parse the HTML for links in a breadth-first search fashion and open each page
3. Recursively open links on the pages obtained in 2
In ste...

Are there any better methods to permute a string?

    void permute(string elems, int mid, int end)
    {
        static int count;
        if (mid == end) {
            cout << ++count << " : " << elems << endl;
            return;
        } else {
            for (int i = mid; i <= end; i++) {
                swap(elems, mid, i);
                permute(elems, mid + 1, end);
                swap(elems, mid, i);
            }
        }
    ...
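One alternative to the recursive swap approach is to step through permutations in lexicographic order, which is the idea behind C++'s `std::next_permutation`. The sketch below is in Python rather than the question's C++, and the helper names `next_permutation` and `all_permutations` are made up for illustration. It is not meant as the definitive answer, only as one option. A side effect of this approach is that repeated characters do not produce duplicate permutations.

```python
def next_permutation(items):
    """Rearrange items in place into the next lexicographic permutation.

    Returns False (leaving items in ascending order) when items already
    held the last permutation, True otherwise.
    """
    # Find the rightmost element that is smaller than its successor.
    i = len(items) - 2
    while i >= 0 and items[i] >= items[i + 1]:
        i -= 1
    if i < 0:
        # Whole sequence is non-increasing: this was the last permutation.
        items.reverse()
        return False
    # Find the rightmost element greater than items[i] and swap them.
    j = len(items) - 1
    while items[j] <= items[i]:
        j -= 1
    items[i], items[j] = items[j], items[i]
    # The suffix is non-increasing; reversing it gives the smallest suffix.
    items[i + 1:] = reversed(items[i + 1:])
    return True


def all_permutations(text):
    """Return every distinct permutation of text in lexicographic order."""
    chars = sorted(text)
    result = ["".join(chars)]
    while next_permutation(chars):
        result.append("".join(chars))
    return result
```

For example, `all_permutations("aab")` yields three strings rather than six, because equal characters are never swapped past each other.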

2D Data Structure

I've given this a lot of thought but haven't really been able to come up with anything. Suppose I want an m × n collection of elements sortable by any column and any row in under O(m*n), with the ability to insert or delete a row in O(m+n) or less. Is it possible? What I've come up with is a linked grid, where the nodes are inser...

Best lightning generation/simulation algorithm?

So I need an algorithm for programming lightning path generation. Which one is the fastest while still being realistic? ...

Implementing a direct address table

I was given as homework the "Introduction to Algorithms" exercise 11.1-3 which goes as follows: Suggest how to implement a direct-access table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations (Insert, Delete and Search) should run in O(1) time. D...

How does Twitter's trending topics algorithm decide which words to extract from tweets?

I saw this question, which focuses on the "Britney Spears" problem. But I have a bit of a different question. How does the algorithm determine which words or phrases need to be ranked? For instance, if I send out a tweet that says "Michael Jackson died", how does it know to pull out "Michael Jackson" but not "died"? Or suppose that ...

What are the differences between O(1) and O(2) in algorithm-analysis?

According to the definition of big O, f(n) <= C*g(n) (which means f(n) = O(g(n))), it can be deduced that: f(n) <= C and f(n) <= 2C. I think there is no big difference between these two. What I could come up with is: f(n) = 1 - 1/n, f(n) = 2 - 1/n, C = 1. But what distinguishes these two complexities, since both are constant? Cou...
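Using only the definition quoted in the question, a short derivation suggests that the two classes coincide, because the constant C can absorb the factor of 2. The symbols C', C'' and n_0 below are introduced here for illustration and are not from the original post.

```latex
\begin{align*}
f(n) \le C \cdot 2 \ \text{ for all } n \ge n_0
  &\;\Longrightarrow\; f(n) \le C' \cdot 1 \ \text{ with } C' = 2C,
  &&\text{so } O(2) \subseteq O(1), \\
f(n) \le C \cdot 1 \ \text{ for all } n \ge n_0
  &\;\Longrightarrow\; f(n) \le C'' \cdot 2 \ \text{ with } C'' = C/2,
  &&\text{so } O(1) \subseteq O(2).
\end{align*}
```

Applied to the example in the question, f(n) = 2 - 1/n satisfies f(n) <= 2 * 1 for all n >= 1, so it is O(1) with the constant 2 instead of 1.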

What is the best way to do "polling"?

Hi, this question is related to microcontroller programming, but anyone may suggest a good algorithm to handle this situation. I have one central console and a set of remote sensors. The central console has a receiver and each sensor has a transmitter that operates on the same frequency. So we can only implement simplex communication. Si...

What are the faster Paxos-related algorithms for consensus in distributed systems?

I've read Lamport's paper on Paxos. I've also heard that it isn't used much in practice, for reasons of performance. What algorithms are commonly used for consensus in distributed systems? ...

What algorithms could I use to identify content on a web page

I have a web page loaded up in the browser (i.e. its DOM and element positioning are both accessible to me) and I want to find the block element (or a sorted list of these elements), which likely contains the most content (as in a continuous block of text). The goal is to exclude things like menus, headers, footers and such. ...

Tessellating an arbitrary polygon by tiling triangles

I need to fill an arbitrary polygon using a near-uniform tiling of triangles. How would I do this? You may provide either references to existing algorithms or even simply ideas or hints of your own. The following is presumed: The polygon may be convex (but bonus points if you come up with an algorithm that works for concave shapes) Th...

Binary Search Tree for specific intent

We all know there are plenty of self-balancing binary search trees (BSTs), the most famous being the Red-Black and AVL trees. It might be useful to take a look at AA-trees and scapegoat trees too. I want to do deletions, insertions and searches, as with any other BST. However, it will be common to delete all values in a given range, or deletin...

How to find the most dense regions in an image

Consider a black and white image like this: http://img13.imageshack.us/img13/7401/10416827.jpg What I am trying to do is find the regions where the white points are most dense. In this case there are 20-21 such dense regions (i.e., each cluster of points makes a dense region). Can anyone give me a hint on how this can be achieved? ...

How to optimize memory usage in this algorithm?

Guys, I'm developing a log parser, and I'm reading files of strings of more than 150MB. This is my approach. Is there any way to optimize what is in the while statement? The problem is that it consumes a lot of memory. I also tried a StringBuilder and faced the same memory consumption. private void ReadLogInThread() { ...

Efficient Matching Algorithm for Set Based Triplets

I am looking for an efficient way to solve the following problem. List 1 is a list of records that are identified by a primitive triplet: X | Y | Z. List 2 is a list of records that are identified by three sets: one of Xs, one of Ys, one of Zs. The Xs, Ys and Zs are of the same 'type' as those in list one, so they are directly comparable with one another...

Constructive solid geometry mesh

If I construct a shape using constructive solid geometry techniques, how can I construct a wireframe mesh for rendering? I'm aware of algorithms for directly rendering CSG shapes, but I want to convert it into a wireframe mesh just once so that I can render it "normally". To add a little more detail: given a description of a shape such a...

Creating every possible value of a fixed size array

Hello, I am trying to make some very elementary thing that will cycle through every possible permutation of an array. Really this is being done in assembly, but I'll explain it in C. Basically, say we have an array uint8_t *data=malloc(10); I want to create an algorithm that will print every possible combination of the bytes in the ar...
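One way to read this question is that every possible value of the array should be visited once, which amounts to treating the array as a multi-digit counter and incrementing it with carry. The sketch below takes that reading and is written in Python rather than C or assembly. The names `increment` and `all_values` and the `base` parameter are assumptions made here so that the demo can run on a small alphabet.

```python
def increment(data, base=256):
    """Advance data (a list of ints in range(base)) to the next value.

    Index 0 is treated as the least significant position. Returns False
    when the counter wraps around to all zeros, True otherwise.
    """
    for i in range(len(data)):
        data[i] += 1
        if data[i] < base:
            return True
        data[i] = 0  # overflow at this position: carry into the next one
    return False


def all_values(size, base=256):
    """Yield every possible content of a size-element array as a tuple."""
    data = [0] * size
    while True:
        yield tuple(data)
        if not increment(data, base):
            break
```

With `base=3` and `size=2` this yields nine tuples, starting at `(0, 0)` and ending at `(2, 2)`. For the 10-byte array in the question the same loop would have to visit 256^10 (about 1.2 × 10^24) values, so it would not finish in any practical amount of time.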

Grouping items in an array?

Hey guys, if I have an array that looks like [A,B,C,A,B,C,A,C,B] (random order), and I wish to arrange it into [A,A,A,B,B,B,C,C,C] (each group together), and the only operations allowed are: 1) query the i-th item of the array, 2) swap two items in the array. How do I design an algorithm that does the job in O(n)? Thanks! ...
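If the array holds exactly three known kinds of item, as in the example, one technique that fits these constraints is the three-way partition often called the Dutch national flag algorithm. The sketch below assumes that case. The function name `group_three` and its parameters are made up for illustration. It only reads items by index and swaps pairs, and each step either advances `mid` or shrinks `hi`, so it makes a single O(n) pass.

```python
def group_three(items, low_key="A", high_key="C"):
    """Group items in place so low_key items come first and high_key last.

    Invariant: items[:lo] are low_key, items[lo:mid] are the middle kind,
    items[hi + 1:] are high_key, and items[mid:hi + 1] are unexamined.
    """
    lo, mid, hi = 0, 0, len(items) - 1
    while mid <= hi:
        if items[mid] == low_key:
            items[lo], items[mid] = items[mid], items[lo]
            lo += 1
            mid += 1
        elif items[mid] == high_key:
            items[mid], items[hi] = items[hi], items[mid]
            hi -= 1  # the item swapped in from hi is still unexamined
        else:
            mid += 1
    return items
```

Calling `group_three(list("ABCABCACB"))` rearranges the list into the grouped order from the question.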

How is 1-bit bitmap data converted to 8bit (24bpp)?

Supposing I have 4 bitmaps, say, CMYK, all of which are 1bit/pixel and are of different colors, and I wanted to convert them to an 8bit/color (24bpp) bitmap, how would I do this? if the data looks like this: // the Cyan separation CCCC CCCC CCCC CCCC CCCC CCCC CCCC CCCC ... // the magenta separation, and so on.. MMMM MMMM MMMM MMMM M...
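A naive way to combine the four separations is to treat each 1-bit plane as "ink on or off" and apply the simple CMYK-to-RGB rule: a channel is 0 when its complementary ink or black is present, and 255 otherwise. The sketch below is in Python and rests on several assumptions not stated in the question: pixels are packed MSB-first, one row at a time, and no colour profile is applied. The names `unpack_bits` and `cmyk_row_to_rgb` are hypothetical.

```python
def unpack_bits(row_bytes, width):
    """Expand a packed 1-bit row (MSB-first, an assumption) into 0/1 ints."""
    return [(row_bytes[x // 8] >> (7 - x % 8)) & 1 for x in range(width)]


def cmyk_row_to_rgb(c_row, m_row, y_row, k_row, width):
    """Merge four 1-bit separation rows into one 24bpp RGB row."""
    c = unpack_bits(c_row, width)
    m = unpack_bits(m_row, width)
    y = unpack_bits(y_row, width)
    k = unpack_bits(k_row, width)
    out = bytearray()
    for x in range(width):
        # Naive rule: ink (or black) fully removes the complementary channel.
        red = 0 if (c[x] or k[x]) else 255
        green = 0 if (m[x] or k[x]) else 255
        blue = 0 if (y[x] or k[x]) else 255
        out += bytes((red, green, blue))
    return bytes(out)
```

Because each input plane has only two levels, the output uses at most eight distinct colours. Smoother results would need a halftone-aware or colour-managed conversion, which this sketch does not attempt.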