bloom-filter

Modern, high-performance Bloom filter in Python?

I'm looking for a production-quality Bloom filter implementation in Python to handle fairly large numbers of items (say 100M to 1B items with a 0.01% false positive rate). Pybloom is one option, but it seems to be showing its age: it regularly emits DeprecationWarning messages on Python 2.5. Joe Gregorio also has an implementat...
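For a sense of scale, the standard sizing formulas for a Bloom filter (m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions) can be evaluated for the numbers in the question. This is only a rough sketch: the helper name `bloom_parameters` is made up for illustration, and the result is the theoretical optimum, not what any particular library allocates.

```python
import math

def bloom_parameters(n, p):
    """Return (m_bits, k_hashes) for n items at target false positive rate p.

    Hypothetical helper: applies the textbook optimum, nothing library-specific.
    """
    # Optimal number of bits: m = -n * ln(p) / (ln 2)^2
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    # Optimal number of hash functions: k = (m / n) * ln 2
    k = max(1, round(m / n * math.log(2)))
    return m, k

# The sizes mentioned in the question: 100M and 1B items at 0.01% (p = 0.0001).
for n in (100_000_000, 1_000_000_000):
    m, k = bloom_parameters(n, 0.0001)
    print(f"n={n:,}: {m / 8 / 2**20:,.0f} MiB, k={k}")
```

At p = 0.0001 this works out to a little over 19 bits per item and 13 hash functions, so memory grows linearly with the item count.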

Opposite of Bloom filter?

Hi, I'm trying to optimize a piece of software which is basically running millions of tests. These tests are generated in such a way that there can be some repetitions. Of course, I don't want to spend time running tests which I already ran if I can avoid it efficiently. So, I'm thinking about using a Bloom filter to store the tests wh...

compact data structure like set

I am looking for a specific data structure, but I forgot its name. If I knew the name it would be trivial; I would just look it up on Wikipedia :) Basically, it is like a set, except you cannot iterate over it. You put some values in it, let's say 80k zip codes. Then you can test if a given string is definitely NOT a zip code, but you will...
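Since this question is filed under the bloom-filter tag, the behaviour described (a "no" answer is always right, a "yes" answer may be wrong) can be illustrated with a minimal Bloom filter. The sketch below is an assumption-laden toy, not a reference to any existing library: the class name, the use of SHA-256, and the double-hashing scheme (deriving k positions from two base hashes) are all choices made for illustration.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: no false negatives, some false positives."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit values
        # taken from a single SHA-256 digest (an illustrative choice).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )

zips = BloomFilter(m_bits=1 << 20, k_hashes=7)
for code in ("90210", "10001", "60614"):
    zips.add(code)

print("90210" in zips)  # an added item is always reported as present
print("00000" in zips)  # an absent item is usually, but not always, rejected
```

Note that there is no way to list the stored values: only the bit array is kept, which is what makes the structure compact.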

A few Q's about Bloom Filter implementation

Hello, I recently discovered a site that sets code katas. One of the katas caught my eye and set me looking into Bloom filters. I'm using PHP and MySQL. I have a table with roughly 45,000 words to act as a dictionary, and I've written the code to create a Bloom filter array. My questions are... At what point should the code run...

Need an efficient way to search for the following specific requirement

I have to search for a given file name (let's say a keyword) in a directory containing files. If there were only a few keywords to search for, I could use a regular search (such as creating an array of the file names in the specified directory and then comparing each file name with the given keyword). Since I need to search very large number...

Need memory efficient way to store tons of strings (was: HAT-Trie implementation in java)

I am working with a large set (5-20 million) of String keys (average length 10 chars) which I need to store in an in memory data structure that supports the following operation in constant time or near constant time: // Returns true if the input is present in the container, false otherwise public boolean contains(String input) Java's ...

How to implement a Bloom Filter in PHP?

Is there a ready-made PHP solution? ...

Using hash functions with Bloom filters

Hi, a Bloom filter uses a hash function (or several) to generate a value between 0 and m given an input string X. My question is: how do you use a hash function to generate a value in this way? For example, an MD5 hash is typically represented by a 32-character hex string, so how would I use an MD5 hashing algorithm to generate a value between 0 a...
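One common way to do what this question asks is to treat the digest as a large integer and reduce it modulo m. The sketch below assumes that approach; the function name `md5_index` and the `seed` prefix used to obtain several different hash functions from MD5 are illustrative choices, not something taken from the question.

```python
import hashlib

def md5_index(x, m, seed=0):
    """Map string x to an integer in the range [0, m) using MD5.

    Hypothetical helper: the seed is prepended to the input so that
    different seeds behave like different hash functions.
    """
    data = f"{seed}:{x}".encode("utf-8")
    digest = hashlib.md5(data).digest()  # 16 raw bytes (128 bits)
    # Interpret the bytes as one big integer, then reduce modulo m.
    return int.from_bytes(digest, "big") % m

m = 1000
# Three "different" hash functions for the same input, via three seeds.
print([md5_index("hello", m, seed=s) for s in range(3)])
```

The 32-character hex string is just another spelling of the same 128-bit number, so `int(hashlib.md5(data).hexdigest(), 16) % m` gives the same result.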

What problems have you solved using bloom filters?

I'd like to know about specific problems you - the SO reader - have solved using bloom filters and what libraries/frameworks you used if you didn't roll your own. Questions: What problems have you used bloom filters to solve? What libraries/frameworks did you use? I'm looking for first-hand experiences, so please do not answer unles...

If I have an array of keys M, and an array of targets N, how can I verify that M[i] exists in N before searching it?

Like the title says, I'm trying to find elements of M that exist in the large constant array N. Most of the time, no element of M will exist in N, so the vast majority of searches done on M are a waste of time. I'm looking for some way to create an index to check before doing a full-scale search of M. A project similar to mine creates a...