algorithm

Given a 1TB data set on disk with around 1KB per data record, how can I find duplicates using 512MB RAM and infinite disk space?

There is 1TB data on a disk with around 1KB per data record. How to find duplicates using 512MB RAM and infinite disk space? ...

Typecasting a floating value or using the math.h floor* functions?

Hi, I am coding up an implementation of Interpolation Search in C. The question is actually rather simple, I need to use the floating operations to do linear interpolation to find the correct index which will eventually be an integer result. In particular my probe index is: t = i + floor((((k-low)/(high-low)) * (j-i))); where, i,j,...

Reduce number of points in line

I'm searching for algorithms to reduce the LOD of polylines, lines (looped or not) of nodes. In simple words, I want to take hi-resolution coastline data and be able to reduce its LOD hundred- or thousandfold to render it in small-scale. I found polygon reduction algorithms (but they require triangles) and Laplacian smoothing, but that...

Data structure behind T9 type of dictionary

How does a T9 dictionary work? What is the data structure behind it. If we type '4663' we get 'good' when we press down button we get 'gone' then 'home' etc... EDIT: If the user types in 46 then it should show 'go' and when pressed down arrow should show 'gone' etc... ...

Enumerating large (20-digit) [probable] prime numbers

Given A, on the order of 10^20, I'd like to quickly obtain a list of the first few prime numbers greater than A. OK, my needs aren't quite that exact - it's alright if occasionally a composite number ends up on the list. What's the fastest way to enumerate the (probable) primes greater than A? Is there a quicker way than stepping throu...

Question on multi-probe Local Sensitive Hashing

Hey guys sorry to be asking this kind noob question, but because I really need some guidance on how to use Multi probe LSH pretty urgently, so I did not do much research myself. I realize there is a lib call LSHKIT available that implemented that algorithm, but I have trouble trying to figure out how to use it. Right now, I have a few ...

Efficiently check string for one of several hundred possible suffixes

I need to write a C/C++ function that would quickly check if string ends with one of ~1000 predefined suffixes. Specifically the string is a hostname and I need to check if it belongs to one of several hundred predefined second-level domains. This function will be called a lot so it needs to be written as efficiently as possible. Bitwis...

Tail-recursive pow() algorithm with memoization?

I'm looking for an algorithm to compute pow() that's tail-recursive and uses memoization to speed up repeated calculations. Performance isn't an issue; this is mostly an intellectual exercise - I spent a train ride coming up with all the different pow() implementations I could, but was unable to come up with one that I was happy with t...

Codechef practice question help needed - find trailing zeros in a factorial

I have been working on this for 24 hours now, trying to optimize it. The question is how to find the number of trailing zeroes in factorial of a number in range of 10000000 and 10 million test cases in about 8 secs. The code is as follows: #include<iostream> using namespace std; int count5(int a){ int b=0; for(int i=a;i>0;i=...

List of values as keys for a Map

I have lists of variable length where each item can be one of four unique, that I need to use as keys for another object in a map. Assume that each value can be either 0, 1, 2 or 3 (it's not integer in my real code, but a lot easier to explain this way) so a few examples of key lists could be: [1, 0, 2, 3] [3, 2, 1] [1, 0, 0, 1, 1, 3] [...

Longest substring that appears n times

For a string of length L, I want to find the longest substring that appears n (n<L) or more times in ths string. For example, the longest substring that occurs 2 or more times in "BANANA" is "ANA", once starting from index 1, and once again starting from index 3. The substrings are allowed to overlap. In the string "FFFFFF", the longes...

Weighted Average and Ratings

Maths isn't my strong point and I'm at a loss here. Basically, all I need is a simple formula that will give a weighted rating on a scale of 1 to 5. If there are very few votes, they carry less influence and the rating pressess more towards the average (in this case I want it to be 3, not the average of all other ratings). I've tried a...

Algorithm to generate 1000 distinct integers in the range [0,8000]?

What are some alternative methods to generate 1000 distinct random integers in the range [0,8000] as opposed to the following: naive method: generating a number and checking if it's already in the array. O(n^2) linear shuffle: generate sequence 0 to 8000, shuffle, take the first 1000. O(n) ...

Algorithm for count-down timer that can add on time

I'm making a general timer that has functionality to count up from 0 or count down from a certain number. I also want it to allow the user to add and subtract time. Everything is simple to implement except for the case in which the timer is counting down from some number, and the user adds or subtracts time from it. For example: (m_cloc...

Is Work Stealing always the most appropriate user-level thread scheduling algorithm?

I've been investigating different scheduling algorithms for a thread pool I am implementing. Due to the nature of the problem I am solving I can assume that the tasks being run in parallel are independent and do not spawn any new tasks. The tasks can be of varying sizes. I went immediately for the most popular scheduling algorithm "wo...

The Collatz Sequence problem

I'm trying to solve this problem, its not a homework question, its just code I'm submitting to uva.onlinejudge.org so I can learn better java trough examples. Here is the problem sample input : 3 100 34 100 75 250 27 2147483647 101 304 101 303 -1 -1 Here is simple output : Case 1: A = 3, limit = 100, number of terms = 8 Case...

Good books on numerical computation with C

Hi, I've read the post "What is the best book on numerical methods?" and I wish to ask more or less the same question but in relation to C programming. Most of the time, C programming books on numerical methods are just another version of the author's previous Fortran book on the same subject. I've seen Applied numerical methods in C by ...

C++ design question on traversing binary trees

I have a binary tree T which I would like to copy to another tree. Suppose I have a visit method that gets evaluated at every node: struct visit { virtual void operator() (node* n)=0; }; and I have a visitor algorithm void visitor(node* t, visit& v) { //do a preorder traversal using stack or recursion if (!t) return; v(t); v...

How do I find the median of numbers in linear time using heaps?

Wikipedia says: Selection algorithms: Finding the min, max, both the min and max, median, or even the k-th largest element can be done in linear time using heaps. All it says is that it can be done, and not how. Can you give me some start on how this can be done using heaps? ...

find all combinations of elements for the given number N (sum upto N)

int array [] = {1,2,3,4,5,6,7,8,9} N = 10 1,2,3,4 = 10 2,3,5 = 10 3,7 = 10 1,4,5 = 10 1,3,6 = 10 8,2 = 10 9,1 = 10 4,6 = 10 5,2,3 = 10 ... the function should return all combinations, the array may be unsorted ...