I have been using Git a lot recently, and I quite like how Git avoids duplicating similar data by using a hash function based on SHA-1. I was wondering whether current databases do something similar, or is this inefficient for some reason?
There is no need for this. Databases already have a good way of avoiding duplicated data: database normalization.
For example, imagine you have a column that can contain one of five different strings. Instead of storing one of those strings in every row, move them out into a separate table: create a table with two columns, one holding the string values and the other serving as the primary key. You can then store a foreign key in your original table instead of repeating the whole string.
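A minimal sketch of this in SQL (the table and column names are invented for illustration):

```sql
-- Lookup table holding each of the five possible strings exactly once.
CREATE TABLE status_lookup (
    status_id   INT PRIMARY KEY,
    status_name VARCHAR(50) NOT NULL UNIQUE
);

-- The main table stores only a small foreign key instead of the string.
CREATE TABLE orders (
    order_id  INT PRIMARY KEY,
    status_id INT NOT NULL,
    FOREIGN KEY (status_id) REFERENCES status_lookup (status_id)
);

INSERT INTO status_lookup VALUES
    (1, 'pending'), (2, 'shipped'), (3, 'delivered'),
    (4, 'cancelled'), (5, 'returned');

-- Each row references a status by id rather than repeating the text.
INSERT INTO orders VALUES (1001, 2);
```

Each string exists once in the lookup table, and every row that needs it pays only for a small integer key.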
I came up with a "reuse-based-on-hash" technique of my own (though it is probably widely used):
I computed a hash over all the fields in the row and used that hash as the primary key.
When inserting, I simply did an INSERT IGNORE (to suppress errors about the duplicate primary key). Either way, I could be sure that the row I wanted to insert was present in the database after the insertion.
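Something along these lines, assuming MySQL (INSERT IGNORE is MySQL syntax) and made-up table and column names; here the hash is computed in SQL with SHA1() for brevity, but it could just as well be computed in application code:

```sql
-- The primary key is the SHA-1 of all the other fields, so re-inserting
-- an identical row is silently ignored rather than duplicated.
CREATE TABLE measurements (
    row_hash CHAR(40) PRIMARY KEY,    -- SHA-1 hex digest of the other fields
    sensor   VARCHAR(50)   NOT NULL,
    reading  DECIMAL(10,2) NOT NULL,
    taken_at DATETIME      NOT NULL
);

INSERT IGNORE INTO measurements (row_hash, sensor, reading, taken_at)
VALUES (
    -- '|' separates the fields so different field combinations
    -- cannot concatenate to the same string.
    SHA1(CONCAT_WS('|', 'sensor-a', '21.50', '2013-05-01 12:00:00')),
    'sensor-a', 21.50, '2013-05-01 12:00:00'
);

-- Running the same statement again is a no-op: the duplicate key is
-- ignored, so the row exists exactly once afterwards.
```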
If this is a known concept, I'd be glad to hear about it!