I'm not sure if YouTube is the only website with this technology, but YouTube's Content ID is basically a system that automatically identifies and removes copyright-infringing uploads. You can read more about it here:
http://www.youtube.com/t/contentid
When one of my videos (containing a particular music track) got tagged and removed for copyright infringement, I assumed the Content ID system was probably dumb. So I ran some experiments, none of which fooled the filter:
- Added a series of beeps in the middle of the song
- Changed the pitch several times through the song
- Changed the volume a few times
- Adjusted the speed
- Added an audio overlay
- Added a few audio effects
On the other hand, I don't know of any material being falsely matched as copyrighted. A piano version of a song, for example, would not falsely trigger the censor.
I'm not ranting about my videos being removed. I'm just surprised at how effective the content censor is. I'm wondering how the algorithm correctly identifies the song as infringing copyright even after all my efforts to circumvent it. Any attempt at direct matching would have been defeated immediately, and any algorithm based on note patterns would likely be fooled by the beeps and the pitch shifting.
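To show what I mean by direct matching being defeated, here is a minimal sketch in Python. It is only a toy: the synthetic 440 Hz tone, the 8000 Hz sample rate, and the helper names (`tone`, `digest`, `change_volume`, `add_beep`) are all my own assumptions for illustration, and it says nothing about how Content ID actually works. It just hashes the raw samples and compares the hashes before and after two of the edits I tried.

```python
import hashlib
import math
import struct

SAMPLE_RATE = 8000  # assumed sample rate for this toy signal


def tone(freq_hz, seconds, amplitude=0.5):
    """Generate a sine tone as a list of floats in [-1, 1]."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def to_pcm16(samples):
    """Pack float samples as little-endian signed 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def digest(samples):
    """'Direct matching': an exact hash of the raw audio bytes."""
    return hashlib.sha256(to_pcm16(samples)).hexdigest()


def change_volume(samples, gain):
    """Scale every sample by a constant gain."""
    return [s * gain for s in samples]


def add_beep(samples, start, length, freq_hz=1000.0):
    """Mix a short beep into the signal starting at sample index `start`."""
    out = list(samples)
    for i in range(start, min(start + length, len(out))):
        out[i] += 0.2 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
    return out


original = tone(440.0, 1.0)          # stand-in for the music track
quieter = change_volume(original, 0.9)
beeped = add_beep(original, 2000, 800)

# An untouched copy matches; any small edit breaks the exact match.
exact_copy_matches = digest(original) == digest(list(original))
volume_matches = digest(original) == digest(quieter)
beep_matches = digest(original) == digest(beeped)
```

Here only the untouched copy matches; the slightly quieter version and the version with a beep both produce different hashes, which is why I assume the real system must be comparing something more robust than the raw audio data.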
This is more a matter of curiosity than an urgent question.