I'm looking for a production-quality Bloom filter implementation in Python to handle fairly large numbers of items (say, 100M to 1B items with a 0.01% false positive rate).
Pybloom is one option, but it seems to be showing its age: it regularly emits DeprecationWarnings on Python 2.5. Joe Gregorio also has an implementat...
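Before choosing a library, it may help to estimate how large such a filter has to be. The sketch below applies the standard Bloom filter approximations, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions; the function name `bloom_parameters` is hypothetical.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Return (m_bits, k_hashes) for n items at target false-positive rate p.

    Uses the usual approximations m = -n * ln(p) / (ln 2)^2 and
    k = (m / n) * ln 2, rounded to integers.
    """
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


for n in (100_000_000, 1_000_000_000):
    m, k = bloom_parameters(n, 0.0001)  # 0.01% expressed as a fraction
    print(f"n={n:,}: m={m:,} bits (~{m / 8 / 2**30:.2f} GiB), k={k}")
```

At p = 0.01% this works out to roughly 19.2 bits per item and k = 13, so 1B items need about 2.4 GB (about 2.2 GiB) of bit array, which is worth knowing before picking a pure-Python implementation.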
Hi,
I'm trying to optimize a piece of software which is basically running millions of tests. These tests are generated in such a way that there can be some repetitions. Of course, I don't want to spend time running tests which I already ran if I can avoid it efficiently.
So, I'm thinking about using a Bloom filter to store the tests wh...
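A minimal sketch of that idea, assuming each generated test can be reduced to a string ID (the `run_unique` helper, the test IDs, and the `run` callback are hypothetical). The filter derives its k bit positions from one SHA-256 digest by double hashing. One caveat matters here: a false positive means a test that was never run gets skipped, with probability around `error_rate`.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed at k hash positions per key."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        # Size the bit array and hash count with the standard formulas.
        self.m = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: k positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(key))


def run_unique(tests, run, seen: BloomFilter) -> int:
    """Run each test whose ID is not (probably) in `seen`; return how many ran."""
    ran = 0
    for test_id in tests:
        if test_id in seen:
            continue  # probably run already; rarely a false positive
        seen.add(test_id)
        run(test_id)
        ran += 1
    return ran
```

For millions of tests, something like `BloomFilter(capacity=10_000_000, error_rate=0.001)` keeps memory to a few tens of megabytes; if skipping even a few unrun tests is unacceptable, an exact set of hashed IDs is the safer choice.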
I am looking for a specific data structure, but I forgot its name. If I knew the name it would be trivial; I would just look it up on Wikipedia :)
Basically, it is like a set, except you cannot iterate over it.
You put some values in it, let's say 80k zip codes.
Then you can test whether a given string is definitely NOT a zip code, but you will...
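The behaviour described (no iteration, "definitely not" answers, occasional false "maybe" answers) matches a Bloom filter. A small sketch follows; the bit-array size, hash count, and sample zip codes are assumptions chosen for illustration. Here the k hash functions are BLAKE2b with a different salt for each one.

```python
import hashlib

M_BITS = 1 << 20   # about 1M bits (128 KiB); a hypothetical size for ~80k zip codes
K_HASHES = 7       # number of salted hash functions


def _positions(value: str):
    # k independent hashes: BLAKE2b with a distinct salt per hash function.
    data = value.encode("utf-8")
    for i in range(K_HASHES):
        h = hashlib.blake2b(data, digest_size=8, salt=i.to_bytes(8, "little"))
        yield int.from_bytes(h.digest(), "little") % M_BITS


def add(bits: bytearray, value: str) -> None:
    for pos in _positions(value):
        bits[pos // 8] |= 1 << (pos % 8)


def might_contain(bits: bytearray, value: str) -> bool:
    # False means "definitely never added"; True means "probably added".
    return all(bits[pos // 8] & (1 << (pos % 8)) for pos in _positions(value))


bits = bytearray(M_BITS // 8)
for zip_code in ("10001", "94103", "60614"):  # stand-ins for the real 80k codes
    add(bits, zip_code)
```

With 80k codes in this 1M-bit array the expected false-positive rate is roughly 0.2%; the stored strings themselves are never kept, which is why iteration is impossible.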
Hello, I recently discovered a site that sets coding katas.
One of the katas caught my eye and set me looking into Bloom filters.
I'm using PHP and MySQL.
I have a table with roughly 45,000 words to act as a dictionary, and I've written the code to create a Bloom filter array.
My questions are...
At what point should the code run...
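One common arrangement is to build the filter once, offline (or whenever the dictionary table changes), save the bit array, and have request-time code only load and query it. The sketch is in Python rather than PHP, and the 12-byte file header (m, k) and all function names are hypothetical; a PHP version would follow the same shape.

```python
import hashlib
import math
import struct

HEADER = struct.Struct("<QI")  # hypothetical file header: m (bits), k (hash count)


def positions(word: str, m: int, k: int) -> list[int]:
    # Double hashing over two 64-bit halves of a SHA-256 digest.
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:16], "little") | 1
    return [(h1 + i * h2) % m for i in range(k)]


def build_and_save(words, path: str, error_rate: float = 0.001) -> None:
    """Offline step: build the filter from the dictionary once and write it to disk."""
    words = list(words)
    n = max(1, len(words))
    m = math.ceil(-n * math.log(error_rate) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    bits = bytearray((m + 7) // 8)
    for w in words:
        for pos in positions(w, m, k):
            bits[pos // 8] |= 1 << (pos % 8)
    with open(path, "wb") as f:
        f.write(HEADER.pack(m, k))
        f.write(bits)


def load(path: str):
    """Request-time step: read the prebuilt filter (far cheaper than rebuilding it)."""
    with open(path, "rb") as f:
        m, k = HEADER.unpack(f.read(HEADER.size))
        bits = bytearray(f.read())
    return m, k, bits


def contains(word: str, m: int, k: int, bits: bytearray) -> bool:
    return all(bits[pos // 8] & (1 << (pos % 8)) for pos in positions(word, m, k))
```

For about 45,000 words at a 0.1% false-positive rate the saved bit array comes to roughly 80 KB, small enough to load on every request or keep in a cache.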
I have to search for a given file name (let's say Keyword) in a directory containing files. If there were only a few keywords to search for, I could use a regular search (for example, create an array of the file names in the specified directory and then compare each file name with the given keyword). Since I need to search for a very large number...
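If the keywords must match file names exactly and the directory listing fits in memory (both assumptions), an ordinary hash set built in one pass gives average constant-time lookups with no false positives; a Bloom filter mainly pays off when that set would be too large. The helper names below are hypothetical.

```python
import os


def index_file_names(directory: str) -> set[str]:
    """Build the index once: a single pass over the directory listing."""
    with os.scandir(directory) as entries:
        return {entry.name for entry in entries if entry.is_file()}


def find_keywords(directory: str, keywords) -> list[str]:
    """Return the keywords that exactly match a file name in `directory`."""
    names = index_file_names(directory)
    return [kw for kw in keywords if kw in names]
```

Building the set costs one directory scan; after that each of the many keyword checks is a hash lookup instead of a pass over every file name.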
I am working with a large set (5-20 million) of String keys (average length 10 chars) which I need to store in an in-memory data structure that supports the following operation in constant or near-constant time:
// Returns true if the input is present in the container, false otherwise
public boolean contains(String input)
Java's ...
Is there a ready-made PHP solution?
...
Hi,
A Bloom filter uses a hash function (or several) to generate a value between 0 and m given an input string X. My question is: how do you use a hash function to generate a value in this way? For example, an MD5 hash is typically represented as a 32-character hex string; how would I use an MD5 hashing algorithm to generate a value between 0 a...
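One straightforward approach is to treat the digest as a big integer and reduce it modulo m, giving a position in the range 0 to m - 1. The sketch below shows that, plus one way to get k positions from a single MD5 digest by double hashing, (h1 + i * h2) mod m; the function names are hypothetical.

```python
import hashlib


def md5_index(x: str, m: int) -> int:
    """Map string x to a bit position in [0, m) by reading its MD5 digest as an integer."""
    hex_digest = hashlib.md5(x.encode("utf-8")).hexdigest()  # 32 hex chars = 128 bits
    return int(hex_digest, 16) % m


def md5_indices(x: str, m: int, k: int) -> list[int]:
    """Derive k positions in [0, m) from one MD5 digest: (h1 + i * h2) mod m."""
    digest = hashlib.md5(x.encode("utf-8")).digest()  # the same 128 bits as 16 raw bytes
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:], "big") | 1  # force odd so the step is never zero
    return [(h1 + i * h2) % m for i in range(k)]
```

Because 2^128 is vastly larger than any practical m, the bias introduced by the modulo is negligible; the hex string is just a printable encoding of those same 128 bits.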
I'd like to know about specific problems you, the SO reader, have solved using Bloom filters, and what libraries/frameworks you used if you didn't roll your own.
Questions:
What problems have you used Bloom filters to solve?
What libraries/frameworks did you use?
I'm looking for first-hand experiences, so please do not answer unles...
Like the title says, I'm trying to find elements of M that exist in the large constant array N. Most of the time, no element of M will exist in N, so the vast majority of searches done on M are a waste of time.
I'm looking for some way to create an index to check before doing a full-scale search of M. A project similar to mine creates a...
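A Bloom filter built over N fits this "check before searching" pattern: values of M that fail the cheap filter check skip the expensive search entirely, and the exact search afterwards removes the filter's false positives. In this sketch the binary search over a sorted copy of N stands in for the question's full-scale search, and the class name, 10 bits per item, and k = 7 (about a 0.8% false-positive rate) are assumptions.

```python
import bisect
import hashlib


class PrefilterIndex:
    """Bloom filter built over N, consulted before an exact search of sorted N."""

    def __init__(self, n_values, bits_per_item: int = 10, k: int = 7) -> None:
        self.sorted_n = sorted(n_values)  # exact data, searched only on a filter hit
        self.m = max(8, bits_per_item * len(self.sorted_n))
        self.k = k
        self.bits = bytearray((self.m + 7) // 8)
        for value in self.sorted_n:
            for pos in self._positions(value):
                self.bits[pos >> 3] |= 1 << (pos & 7)

    def _positions(self, value) -> list[int]:
        # Assumes str(value) is a stable key (true for ints and strings).
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def might_contain(self, value) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(value))

    def full_search(self, value) -> bool:
        # Exact binary search; stands in for the expensive full-scale search.
        i = bisect.bisect_left(self.sorted_n, value)
        return i < len(self.sorted_n) and self.sorted_n[i] == value

    def find_common(self, m_values) -> list:
        # Cheap filter check first; only "maybe" answers pay for the full search.
        return [v for v in m_values if self.might_contain(v) and self.full_search(v)]
```

Since N is constant, the filter is built once; when most elements of M are absent, roughly 99% of them are rejected by the filter alone.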