hash-collision

Purposely create two files to have the same hash?

If someone is deliberately modifying two files so that they have the same hash, what are the ways to stop them? Can MD5 and SHA-1 prevent this in most cases? I was thinking of writing my own hash function; I figure that even if I don't do a good job, a user who doesn't know my hash may not be able to fool it. What's the best way to prevent this? ...
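A home-grown hash that relies on staying secret is fragile. The usual way to get "the attacker doesn't know my function" is a keyed hash such as HMAC over a current function like SHA-256. This matters because MD5 collisions can be produced cheaply, and a SHA-1 collision was publicly demonstrated in 2017. Below is a minimal sketch using Python's standard `hmac` module. The names `SECRET_KEY`, `file_tag` and `same_tag` are hypothetical, and generating the key in-process is an assumption: a real system would load it from secure storage.

```python
import hashlib
import hmac
import secrets

# Hypothetical key handling: generated once and kept server-side.
# Without the key, an attacker cannot compute or predict tags.
SECRET_KEY = secrets.token_bytes(32)


def file_tag(data: bytes, key: bytes = SECRET_KEY) -> str:
    """Keyed HMAC-SHA256 tag of the file contents, as hex."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def same_tag(a: bytes, b: bytes, key: bytes = SECRET_KEY) -> bool:
    """Compare two tags in constant time to avoid timing leaks."""
    return hmac.compare_digest(file_tag(a, key), file_tag(b, key))
```

Changing the key changes every tag. An attacker who can only submit files therefore has nothing fixed to aim a collision at.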

CHECKSUM() collisions in SQL Server 2005

I've got a table of 5,651,744 rows, with a primary key made of 6 columns (int x 3, smallint, varchar(39), varchar(2)). I am looking to improve the performance of this table and of another table that shares this primary key plus one additional column but has 37m rows. In anticipation of adding a column to create the hash key, I did...
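`CHECKSUM()` returns a 32-bit `int`, so collisions should be expected at these row counts even if the function spread values perfectly. Here is a back-of-the-envelope birthday estimate. It assumes uniformly distributed outputs, which is an optimistic assumption for `CHECKSUM()`. The helper name `expected_colliding_pairs` is hypothetical.

```python
def expected_colliding_pairs(n_rows: int, hash_bits: int) -> float:
    """Expected number of colliding pairs among n_rows values of an
    ideal hash_bits-bit hash: C(n, 2) / 2**hash_bits."""
    pairs = n_rows * (n_rows - 1) / 2
    return pairs / 2 ** hash_bits


print(round(expected_colliding_pairs(5_651_744, 32)))   # about 3,719 pairs
print(round(expected_colliding_pairs(37_000_000, 32)))  # about 159,373 pairs
```

Under this estimate, a checksum column is only useful for narrowing a lookup. The full 6-column key would still need to be compared.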

Uniquely identifying URLs with one 64-bit number

This is basically a math problem, but very programming-related: if I have 1 billion strings containing URLs, and I take the first 64 bits of the MD5 hash of each of them, what kind of collision frequency should I expect? How does the answer change if I only have 100 million URLs? It seems to me that collisions will be extremely rare, bu...
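The birthday approximation answers this directly. It assumes the 64-bit prefix behaves like a uniformly random value, which is reasonable for accidental (non-adversarial) collisions. The function names are illustrative.

```python
import hashlib
import math


def url_id(url: str) -> int:
    """First 64 bits of the MD5 digest of the URL, as an unsigned integer."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def collision_probability(n: int, bits: int = 64) -> float:
    """P(at least one collision) among n ideal `bits`-bit hashes (birthday approx.)."""
    expected_pairs = n * (n - 1) / 2 / 2 ** bits
    return -math.expm1(-expected_pairs)  # 1 - exp(-pairs), accurate for small values


print(collision_probability(1_000_000_000))  # roughly 0.027
print(collision_probability(100_000_000))    # roughly 0.00027
```

Because the expected number of pairs grows with n squared, 10 times fewer URLs gives roughly 100 times lower collision probability.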

What is the difference between a multi-collision and a first or second pre-image attack on a hash function?

What is the difference between a multi-collision in a hash function and a first or second preimage attack? First preimage attacks: given a hash h, find a message m such that hash(m) = h. Second preimage attacks: given a fixed message m1, find a different message m2 such that hash(m2) = hash(m1). Multi-collision attacks: generate a series o...
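The three attacks differ in how much the attacker is allowed to choose:

- a first preimage starts from a fixed hash value;
- a second preimage starts from a fixed message;
- a collision (k = 2) or multi-collision (k > 2) lets the attacker pick every message freely.

This toy sketch makes the difference concrete. It truncates SHA-256 to 16 bits so brute force finishes instantly. The truncation, the `msg-%d` message format and all function names are assumptions for illustration only. On an ideal n-bit hash, preimage search costs about 2^n tries, while a 2-collision costs about 2^(n/2).

```python
import hashlib
from collections import defaultdict
from itertools import count

BITS = 16  # toy output size so brute force finishes instantly


def toy_hash(msg: bytes) -> int:
    """SHA-256 truncated to BITS bits, standing in for a weak hash."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") >> (256 - BITS)


def message(i: int) -> bytes:
    return b"msg-%d" % i


def first_preimage(h: int) -> bytes:
    """Given only a hash value h, find any m with toy_hash(m) == h."""
    for i in count():
        if toy_hash(message(i)) == h:
            return message(i)


def second_preimage(m1: bytes) -> bytes:
    """Given a fixed m1, find m2 != m1 with the same hash."""
    target = toy_hash(m1)
    for i in count():
        m2 = message(i)
        if m2 != m1 and toy_hash(m2) == target:
            return m2


def multi_collision(k: int) -> list:
    """Find k distinct messages sharing one (freely chosen) hash value."""
    seen = defaultdict(list)
    for i in count():
        bucket = seen[toy_hash(message(i))]
        bucket.append(message(i))
        if len(bucket) == k:
            return bucket
```

The freedom to choose all messages is what makes collision search so much cheaper than the preimage searches.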

Examples of Hash-Collisions?

For demonstration purposes, what are a couple of examples of strings that collide when hashed? MD5 is a relatively standard hashing option, so it will be sufficient. ...
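Full MD5 collisions (first published in 2004) come from dedicated cryptanalytic attacks and are binary blocks, not friendly strings. For a quick demonstration, a birthday search can instead find two short strings whose MD5 digests agree in their first few bytes. This is a truncated collision, not a full MD5 collision. The `demo-%d` string format and the function names are hypothetical.

```python
import hashlib
from itertools import count


def md5_prefix(s: str, nbytes: int) -> bytes:
    return hashlib.md5(s.encode("ascii")).digest()[:nbytes]


def find_prefix_collision(nbytes: int = 4):
    """Birthday search for two different strings whose MD5 digests share
    their first nbytes bytes (about 2**(4 * nbytes) tries on average)."""
    seen = {}
    for i in count():
        s = "demo-%d" % i
        prefix = md5_prefix(s, nbytes)
        if prefix in seen:
            return seen[prefix], s
        seen[prefix] = s


a, b = find_prefix_collision()
print(a, hashlib.md5(a.encode()).hexdigest())
print(b, hashlib.md5(b.encode()).hexdigest())  # same first 8 hex digits
```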

What does it mean by "the hash table is open" in Java?

I was reading the Java API docs on the Hashtable class and came across several questions. The doc says: "Note that the hash table is open: in the case of a 'hash collision', a single bucket stores multiple entries, which must be searched sequentially." I tried the following code myself: Hashtable<String, Integer> me = new Hashtable<S...
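"Open" here refers to open hashing, also called separate chaining: each bucket holds a small collection of entries, and colliding keys share it. Below is a sketch in Python, loosely modeled on `Hashtable`. It reimplements Java's `String.hashCode()` for characters in the Basic Multilingual Plane and uses the index formula `(hash & 0x7FFFFFFF) % capacity`. The class and method names are hypothetical. `"Aa"` and `"BB"` are a classic pair with equal `hashCode()` values (2112).

```python
def java_string_hash(s: str) -> int:
    """Java String.hashCode(): h = 31*h + c over chars, as a signed 32-bit int
    (BMP characters only, where ord() matches Java's UTF-16 char)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


class ChainedTable:
    def __init__(self, capacity: int = 11):
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key: str) -> int:
        return (java_string_hash(key) & 0x7FFFFFFF) % len(self.buckets)

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # existing key: overwrite
                return
        bucket.append((key, value))  # collision: same bucket, one more entry

    def get(self, key: str):
        for k, v in self.buckets[self._index(key)]:  # sequential search
            if k == key:
                return v
        raise KeyError(key)
```

After `put("Aa", 1)` and `put("BB", 2)`, both entries sit in the same bucket, and `get` walks that bucket until it finds an equal key.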

Is a hash result ever the same as the source value?

This is more of a cryptography theory question, but is it possible that the result of a hash algorithm will ever be the same value as the source? For example, say I have a string: baf34551fecb48acc3da868eb85e1b6dac9de356 If I get the SHA1 hash on it, the result is: 4d2f72adbafddfe49a726990a1bcb8d34d3da162 In theory, is there ever a...
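Here is one way to reason about it, under the assumption that SHA-1 behaves like a random function. For a random function from a set of N values to itself, each input maps to itself with probability 1/N. The expected number of fixed points is therefore 1, and the chance of at least one is about 1 - 1/e (roughly 63%). The question's case fits this model: 40-hex-digit strings (2^160 of them) mapped to 40-hex-digit digests. Searching 2^160 inputs is infeasible, though, so nobody can say whether such a fixed point exists. The sketch below runs the same experiment on a toy scale. It maps each 16-bit value to the first 16 bits of SHA-1 of its 2-byte encoding. The reduction and the name `toy_sha1_16` are assumptions, and the count printed is whatever the search finds.

```python
import hashlib


def toy_sha1_16(x: int) -> int:
    """Map a 16-bit value to the first 16 bits of SHA-1 of its 2-byte encoding."""
    digest = hashlib.sha1(x.to_bytes(2, "big")).digest()
    return int.from_bytes(digest[:2], "big")


# Exhaustively check all 65,536 inputs for x with toy_sha1_16(x) == x.
fixed_points = [x for x in range(2 ** 16) if toy_sha1_16(x) == x]
print(len(fixed_points), fixed_points)
```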

How much more likely are hash collisions if I hash a bunch of hashes?

Say I'm using a hash to identify files, so I don't need it to be secure, I just need to minimize collisions. I was thinking that I could speed the hash up by running four hashes in parallel using SIMD and then hashing the final result. If the hash is designed to take a 512-bit block, I just step through the file taking 4x512-bit blocks a...
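The lane layout described above can be sketched as follows. Block i goes to lane i mod 4, each lane is hashed independently (the part SIMD would run in parallel), and an outer hash combines the four lane digests plus the total length. This is an illustrative construction, not a standard one. SHA-256 as the inner and outer function, the length prefix, and the name `four_lane_hash` are all assumptions. Two different files can only collide here through an outer collision or a collision inside one lane. With an ideal n-bit hash, a union bound puts that at no more than about 5 times the single-hash risk.

```python
import hashlib

BLOCK = 64  # 512-bit blocks, as in the question
LANES = 4


def four_lane_hash(data: bytes) -> bytes:
    """Hash 4 interleaved lanes of 512-bit blocks separately, then hash the results.

    Lane i receives blocks i, i+4, i+8, ...; the outer hash also covers the
    total length so that the lane split is unambiguous.
    """
    lanes = [hashlib.sha256() for _ in range(LANES)]
    for block_no, offset in enumerate(range(0, len(data), BLOCK)):
        lanes[block_no % LANES].update(data[offset:offset + BLOCK])
    outer = hashlib.sha256(len(data).to_bytes(8, "big"))
    for lane in lanes:
        outer.update(lane.digest())
    return outer.digest()
```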

Why isn't randomized probing more popular in hash table implementations?

According to various sources, such as Wikipedia and various .edu websites found by Google, the most common ways for a hash table to resolve collisions are linear or quadratic probing and chaining. Randomized probing is briefly mentioned but not given much attention. I've implemented a hash table that uses randomized probing to resolve ...
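One widely deployed relative of randomized probing is the pseudo-random probe sequence in CPython's `dict`. It mixes unused high hash bits into the step through a `perturb` value. Once `perturb` reaches 0, the recurrence `i -> 5*i + 1 (mod 2**k)` cycles through every slot. The sketch below follows that recurrence in a tiny open-addressing table. The class name, the table size, and the lack of deletion support are simplifications for illustration, not the question author's implementation.

```python
from itertools import islice


def probe_sequence(h: int, mask: int):
    """Yield slot indices for hash h in a table of size mask + 1 (a power of two)."""
    perturb = h & 0xFFFFFFFFFFFFFFFF  # treat the hash as unsigned 64-bit
    i = perturb & mask
    while True:
        yield i
        perturb >>= 5
        i = (5 * i + perturb + 1) & mask


class ProbingTable:
    """Open addressing with the probe sequence above (no deletion)."""

    def __init__(self, size_log2: int = 3):
        self.mask = (1 << size_log2) - 1
        self.slots = [None] * (self.mask + 1)

    def _probe(self, key):
        # A 64-bit perturb is 0 after 13 shifts; after that, mask + 1 more
        # steps cover every slot, so this bound visits the whole table.
        return islice(probe_sequence(hash(key), self.mask), len(self.slots) + 13)

    def put(self, key, value) -> None:
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table full")

    def get(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                break  # an empty slot ends the search
            if self.slots[i][0] == key:
                return self.slots[i][1]
        raise KeyError(key)
```

A commonly cited trade-off is cache locality. Linear probing touches adjacent memory, while pseudo-random sequences jump around the table.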

How should I be handling checksum collisions in my application?

I have a part of my application that stores files. Because we could potentially be adding many of the same file, I am first keeping a hash of each file. If two files have the same hash, then we throw out one, and both "references" to that file point to the same physical file. How much should I be worried about hash collisions? In th...
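An accidental collision of a 256-bit hash is astronomically unlikely at any realistic file count. Still, the scheme can be made independent of that probability: treat a hash match only as a candidate, and confirm with a byte-for-byte comparison before sharing storage. The sketch below is in-memory, with `bytes` standing in for files. The class name and return format are hypothetical. On disk, `filecmp.cmp(a, b, shallow=False)` performs the content comparison.

```python
import hashlib


class DedupStore:
    """Content-addressed store that never trusts the hash alone."""

    def __init__(self):
        self.groups = {}  # digest -> list of distinct blobs with that digest

    def add(self, data: bytes):
        """Return (digest, index) identifying the single stored copy of data."""
        digest = hashlib.sha256(data).hexdigest()
        group = self.groups.setdefault(digest, [])
        for index, existing in enumerate(group):
            if existing == data:  # byte-for-byte confirmation
                return digest, index  # reuse the stored copy
        group.append(data)  # new content (or a genuine collision)
        return digest, len(group) - 1
```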

What's the shortest pair of strings that causes an MD5 collision?

Up to what string length is it possible to use MD5 as a hash without having to worry about the possibility of a collision? This would presumably be calculated by generating an MD5 hash for every possible string in a particular character set, in increasing length, until a hash appears for a second time (a collision). The maximum possible...
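Two counting bounds answer this without hashing anything. The alphabet here is assumed to be the 95 printable ASCII characters.

- **Pigeonhole.** Once there are more strings than the 2^128 possible digests, some pair must collide.
- **Birthday.** Assuming MD5 behaves like a random function, a collision becomes likely after about 2^64 inputs.

The helper below finds the shortest maximum length whose string count reaches each threshold. Its name is hypothetical.

```python
PRINTABLE = 95  # printable ASCII characters, 0x20-0x7E
MD5_BITS = 128


def shortest_length_with_at_least(count_needed: int, alphabet: int = PRINTABLE) -> int:
    """Smallest L such that the number of strings of length 0..L reaches count_needed."""
    total, length = 1, 0  # start with the empty string
    while total < count_needed:
        length += 1
        total += alphabet ** length
    return length


# Pigeonhole: more strings than digests, so some pair MUST collide.
print(shortest_length_with_at_least(2 ** MD5_BITS + 1))     # 20
# Birthday bound: around 2**64 inputs, a collision becomes likely.
print(shortest_length_with_at_least(2 ** (MD5_BITS // 2)))  # 10
```

Pigeonhole does not rule out collisions below length 20; it only guarantees one at 20. Enumerating roughly 2^64 strings is also far beyond the exhaustive search the question imagines.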

HashMap collision

When there is a collision during a put in a HashMap is the map resized or is the entry added to a list in that particular bucket? ...
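In `java.util.HashMap`, a colliding entry is added to that bucket's chain. Since Java 8, a chain that reaches 8 entries may be converted to a balanced tree once the table has at least 64 buckets. Resizing is driven by the entry count: the table doubles when the size exceeds capacity * load factor (defaults 16 and 0.75). The sketch below is a loose Python model of that rule, not the JDK code. The class name and the `resizes` counter are hypothetical.

```python
class SketchMap:
    """Chained map that resizes on element count, never on collisions."""

    def __init__(self, capacity: int = 16, load_factor: float = 0.75):
        self.buckets = [[] for _ in range(capacity)]
        self.load_factor = load_factor
        self.size = 0
        self.resizes = 0

    def put(self, key, value) -> None:
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # existing key: replace, no growth
                return
        bucket.append((key, value))  # collision or not: append to the bucket
        self.size += 1
        if self.size > self.load_factor * len(self.buckets):
            self._resize(2 * len(self.buckets))

    def _resize(self, new_capacity: int) -> None:
        old = self.buckets
        self.buckets = [[] for _ in range(new_capacity)]
        for bucket in old:
            for k, v in bucket:
                self.buckets[hash(k) % new_capacity].append((k, v))
        self.resizes += 1
```

Inserting the integer keys 0, 16, 32 and 48 into the default table puts all four in bucket 0, yet no resize occurs. The 13th distinct key is what triggers the doubling.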

A couple of questions about Hash Tables

I've been reading a lot about hash tables and how to implement one in C, and I think I have almost all the concepts in my head, so I can start to code my own; I just have a couple of questions that I have yet to properly understand. As a reference, I've been reading this: http://eternallyconfuzzled.com/jsw_home.aspx 1) As I've read on the...

Moving from Linear Probing to Quadratic Probing (hash collisions)

Hi, my current implementation of a hash table uses linear probing, and now I want to move to quadratic probing (and later to chaining, and maybe double hashing too). I've read a few articles, tutorials, Wikipedia, etc., but I still don't know exactly what I should do. Linear probing basically has a step of 1, and that's easy to do...
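The only change from linear probing is the offset added on the i-th attempt: `i` for linear, `i*i` for plain quadratic. Plain quadratic probing can revisit the same few slots. With a prime table size, the first (m+1)/2 probes are guaranteed distinct, so it is usually paired with a load factor below 0.5. A common alternative adds triangular numbers `i*(i+1)/2` (equivalently, the step grows by 1 on each probe). For a power-of-two table size, this visits every slot exactly once. The function names below are illustrative.

```python
def linear_probe(h: int, m: int) -> list:
    """Slots h, h+1, h+2, ... (mod m)."""
    return [(h + i) % m for i in range(m)]


def quadratic_probe(h: int, m: int) -> list:
    """Slots h, h+1, h+4, h+9, ... (mod m): the i-th probe adds i*i."""
    return [(h + i * i) % m for i in range(m)]


def triangular_probe(h: int, m: int) -> list:
    """Slots h, h+1, h+3, h+6, ... (mod m): the i-th probe adds i*(i+1)/2.
    When m is a power of two, this visits every slot exactly once."""
    return [(h + i * (i + 1) // 2) % m for i in range(m)]
```

In an insert loop, the triangular variant is just `pos = (pos + i) % m` with `i` incremented after each failed probe.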

Are hash collisions with different file sizes just as likely as same file size?

I'm hashing a large number of files, and to avoid hash collisions, I'm also storing each file's original size; that way, even if there's a hash collision, it's extremely unlikely that the file sizes will also be identical. Is this sound (a hash collision is equally likely to be of any size), or do I need another piece of information (if a ...
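Storing the size is cheap, and it lets most non-matches be rejected before the digests are compared. Two caveats apply, though. With a cryptographic hash like SHA-256, accidental collisions are already negligible. Against deliberate collisions the size adds little, because published MD5 collision pairs typically have identical lengths. The sketch below computes a `(size, digest)` key in one pass over a binary stream. The name `fingerprint` and the 1 MiB chunk size are assumptions.

```python
import hashlib
from io import BytesIO


def fingerprint(stream, chunk_size: int = 1 << 20):
    """Return (size_in_bytes, sha256_hex) for a binary file-like object."""
    digest = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return size, digest.hexdigest()


# With a real file:  with open(path, "rb") as f: key = fingerprint(f)
print(fingerprint(BytesIO(b"hello")))
```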

Hash Table: Should I increase the element count on collisions?

Hi, right now my hash tables count every element inserted into the hash table. I use this count, together with the total hash table size, to calculate the load factor, and when it reaches about 70%, I rehash. I was thinking that maybe I should only count the inserted elements that fill an empty slot instead of all of them. Because...
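The load factor is conventionally n/m, where n is the number of stored entries. With open addressing every entry occupies its own slot, so both counts are identical. With chaining, counting only non-empty buckets can hide long chains, since many keys piling into one bucket barely move that count. The chained sketch below tracks both counts so the difference is visible. The class name, the starting capacity of 8 and the 0.7 threshold are illustrative assumptions.

```python
class ChainedCounter:
    """Chained hash table that tracks both counts discussed in the question."""

    def __init__(self, capacity: int = 8, max_load: float = 0.7):
        self.buckets = [[] for _ in range(capacity)]
        self.max_load = max_load
        self.entries = 0   # every stored element
        self.occupied = 0  # buckets holding at least one element

    def load_factor(self) -> float:
        return self.entries / len(self.buckets)

    def insert(self, key, value) -> None:
        bucket = self.buckets[hash(key) % len(self.buckets)]
        if not bucket:
            self.occupied += 1
        bucket.append((key, value))  # assumes key is not already present
        self.entries += 1
        if self.load_factor() >= self.max_load:
            self._rehash()

    def _rehash(self) -> None:
        old = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        self.entries = self.occupied = 0
        for key, value in old:
            self.insert(key, value)
```

Inserting the keys 0, 8, 16, 24 and 32 into 8 buckets gives `entries == 5` but `occupied == 1`. An occupied-bucket count would report a 12.5% load while one chain holds every element.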

Looking for an array (vs linked list) hashtable implementation in C

Hi, I'm looking for a hashtable implementation in C that stores its objects in (two-dimensional) arrays rather than linked lists. That is, if a collision happens, the object causing the collision will be stored in the next free row index rather than pushed to the head of a linked list. Plus, the objects themselves ...
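Here is one reading of this layout. Each bucket is a fixed-width row of a 2-D array, and a colliding key goes into the next free column of its row instead of a new list node. The sketch below is in Python, but a C version would use two fixed-size arrays indexed the same way. The row width of 4 and the choice to raise on a full row (rather than rehash) are assumptions, as are all names.

```python
class BucketArrayTable:
    """Hash table stored in 2-D arrays: row = bucket, columns = slots."""

    def __init__(self, rows: int = 8, row_width: int = 4):
        self.keys = [[None] * row_width for _ in range(rows)]
        self.values = [[None] * row_width for _ in range(rows)]

    def _row(self, key) -> int:
        return hash(key) % len(self.keys)

    def put(self, key, value) -> None:
        r = self._row(key)
        for c, k in enumerate(self.keys[r]):
            if k is None or k == key:  # next free column, or update in place
                self.keys[r][c] = key
                self.values[r][c] = value
                return
        raise OverflowError("bucket row %d is full" % r)  # a real table would rehash

    def get(self, key):
        r = self._row(key)
        for c, k in enumerate(self.keys[r]):
            if k is None:
                break  # columns fill left to right, so the key is absent
            if k == key:
                return self.values[r][c]
        raise KeyError(key)
```

Keeping keys and values in parallel arrays mirrors what a C struct-of-arrays layout would do. It also keeps each bucket's keys contiguous in memory.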

Image caching strategy

The scenario: I am building a web application where reports can be generated on the fly (based on information retrieved from an SQL database). These reports will contain charts, which can also be generated on the fly. Because these charts contain sensitive information, using a third-party chart API (e.g., Google Charts) is out of the questi...

SHA1 collision demo / example

This question is similar to this, but that one only references MD5 collision demos. Are there any actual SHA1 collision pairs of arbitrary messages known so far? I'd like to use these to test how various software products (my own and some third-party ones) deal with it. Doing some Google searches only turned up the oh-so prominent MD5...

Confusion about open addressing with linear probing in hash tables?

Suppose the array index given by the hash function for the string "temp" is 155, and location 155 is already occupied, so location 156 is tried. Suppose location 156 is available, so this entry is saved in location 156 instead of 155. Later I find another string, "another_temp", which maps to location 156. Again, this is saved in the next available loc...
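The scenario above can be replayed directly. The key point is that a lookup starts at the key's home slot and keeps stepping until it finds the key or hits an empty slot. A different key sitting in the home slot therefore does not end the search. The table size, the `HOME` mapping that forces the hash results 155 and 156, and the placeholder `"occupant"` entry are all hypothetical.

```python
TABLE_SIZE = 1000                          # hypothetical table size
HOME = {"temp": 155, "another_temp": 156}  # hypothetical hash results

table = [None] * TABLE_SIZE
table[155] = ("occupant", 0)  # location 155 was already taken


def home_slot(key: str) -> int:
    return HOME[key]


def insert(key: str, value) -> int:
    """Linear probing: try home, home+1, home+2, ... and return the slot used."""
    i = home_slot(key)
    while table[i] is not None:
        i = (i + 1) % TABLE_SIZE
    table[i] = (key, value)
    return i


def lookup(key: str):
    """Probe from the home slot; only an empty slot proves the key is absent."""
    i = home_slot(key)
    while table[i] is not None:
        if table[i][0] == key:
            return i, table[i][1]
        i = (i + 1) % TABLE_SIZE
    raise KeyError(key)


print(insert("temp", 1))          # 156
print(insert("another_temp", 2))  # 157
print(lookup("another_temp"))     # (157, 2): skips "temp" at 156
```

Because an empty slot ends the search, deleting an entry by simply clearing its slot would break lookups for keys stored after it. Open-addressing tables typically use a "deleted" marker (tombstone) instead.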