views:

194

answers:

4

Problem

Problem shaping

Image sequence position and size are fixed and known beforehand (it's not scaled). It will be quite short, maximum of 20 frames and in a closed loop. I want to verify (event driven by button click), that I have seen it before.

Lets say I have some image sequence, like:

If seen, I want to see the ID associated with it, if not - it will be analyzed and added as new instance of image sequence, that has been seen. I have though about this quite a while, and I admit, this might be a hard problem. I seem to be having hard time of putting this all together, can someone assist (in C#)?

Limitations and uses

I am not trying to recreate copyright detection system, like content id system Youtube has implemented (Margaret Gould Stewart at TED ( link )). The image sequence can be thought about like a (.gif) file, but it is not and there is no direct way to get binary. Similar method could be used, to avoid duplicates in "image sharing database", but it is not what I am trying to do.

My effort

Gaussian blur

Mathematica function to generate Gaussian blur kernels:

getKernel[L_] := Transpose[{L}].{L}/(Total[Total[Transpose[{L}].{L}]])
getVKernel[L_] := L/Total[L]

alt textalt textalt text
Turns out, that it is much more efficient to use 2 passes of vector kernel, then matrix kernel. Thy are based on Pascal triangle uneven rows:

{1d/4, 1d/2, 1d/4}
{1d/16, 1d/4, 3d/8, 1d/4, 1d/16}
{1d/64, 3d/32, 15d/64, 5d/16, 15d/64, 3d/32, 1d/64}

Data input, hashing, grayscaleing and lightboxing

Example of source bits, that might be useful:

  • Lightbox around the known rectangle: FrameX
  • Using MD5CryptoServiceProvider to get md5 hash of the content inside known rectangle atm.
  • Using ColorMatrix to grayscale image

Source example

Source example (GUI; code):

Get current content inside defined rectangle.

        private Bitmap getContentBitmap() {
            Rectangle r = f.r;
            Bitmap hc = new Bitmap(r.Width, r.Height);
            using (Graphics gf = Graphics.FromImage(hc)) {
                gf.CopyFromScreen(r.Left, r.Top, 0, 0, //
                    new Size(r.Width, r.Height), CopyPixelOperation.SourceCopy);
            }
            return hc;
        }

Get md5 hash of bitmap.

        private byte[] getBitmapHash(Bitmap hc) {
            return md5.ComputeHash(c.ConvertTo(hc, typeof(byte[])) as byte[]);
        }

Get grayscale of the image.

        public static Bitmap getGrayscale(Bitmap hc){
            Bitmap result = new Bitmap(hc.Width, hc.Height);
            ColorMatrix colorMatrix = new ColorMatrix(new float[][]{   
                new float[]{0.5f,0.5f,0.5f,0,0}, new float[]{0.5f,0.5f,0.5f,0,0},
                new float[]{0.5f,0.5f,0.5f,0,0}, new float[]{0,0,0,1,0,0},
                new float[]{0,0,0,0,1,0}, new float[]{0,0,0,0,0,1}});

            using (Graphics g = Graphics.FromImage(result)) {
                ImageAttributes attributes = new ImageAttributes();
                attributes.SetColorMatrix(colorMatrix);
                g.DrawImage(hc, new Rectangle(0, 0, hc.Width, hc.Height),
                   0, 0, hc.Width, hc.Height, GraphicsUnit.Pixel, attributes);
            }
            return result;
        }
+1  A: 

Perhaps you can find a way to get a binary copy of the image data of each frame in a variable. Hash that data (md5?) and store each of the hashes. Then you can see if you've ever seen that hash before. If you haven't, it's a new frame.

philfreo
+2  A: 

I think you have a few issues with this:

  1. Not all image sequences [videos] are equal [but many are similar]
  2. Where is your data coming from?
  3. How will you repesent the data related to your viewings?
  4. Size of the data

Issue #1:

Many images can differ slightly by compression, water marking, missing frames, and adding clips. I would suggest sampling the video. For example you may want to consider sub-sampling small sections of the images in the video. Additionally, to avoid noisy images and issues with lossely compression algorithms. You may want to consider grayscaling the frames sampled, and doing a gaussian blur. [Guassian because its "more natural" (short answer)] Once you have enough sub samples to where you have a good confidence of similarity to the video then store it in a database. With the samples you can hash them, or store them to do a % similarity later.

Issue #2

Your datasource is going to influence the tool kits, and libraries that you use. I would suggest keeping this simple [keep it with gifs and create a custom viewer, dont' try to write a browser plugin while developing your logic]

Issue #3

Using something like Postgres [if there are a lot of large sized objects] or SQLLite is highly suggested for indexing, storing, and recalling past meta data.

Issue #4

The size of the data will have a huge determination on recall, sampling, querying the database, etc.

Overall advice: Don't bite off more than you can handle at this stage. Start small and then grow.

Also take a look at Computer Vision algorithms for more help on the object representation/recall.

monksy
+1  A: 

The question itself is sure very interesting and challenging, however there are many practical issues as stated by @monksy.

The opportunist pragmatic in me would take a step back, look at the big picture and see if there is another way to solve the problem. For example, if you are building some kind of "image sharing community" and want to avoid duplicates in the database, you could do a simple md5 on the file (animated gifs on the web are usually always the same, it's rare that people modify them).

Another example: if you are analyzing scientific samples (like meteo sequences) it may be easier to directly embed some kind of hash in every file when generating them.

Francesco De Vittori
Well you are correct... most of the time its just copied. However sometimes longer videos are cut into smaller ones.... my suggestion avoids this.
monksy
+1  A: 

This depends on wether you only want to know wether you've seen an absolutely identical movie again, or you also want to identify movies that are very similar but have been changed a bit (made lighter, have a watermark added, compression changed, etc.)

In the first case, just take any type of hash of the file and use that (because the file will be identical on the binary level.

In the second case (which I think is what you want) you have an interesting image processing problem on your hands. You could find yourself at the front-lines of image processing science with this if you'd want. If that is the case I suggest you start reading about SURF and OpenCV, and continue on from that.

If you want to match very similar, but not identical videos, and don't want to go the ultra-robus scientific route then I'd suggest the following process:

  1. Do the gaussian blur you already do.
  2. Divide each image into a few equally sized rectangles (you'd have to test for the best number, but I'd suggest you start with 9.
  3. For each rectangle in each frame compute the full-colour histogram, then find the most occurring colour in that rectangle. This gives you 9*20 = 180 numbers. This is the "fingerprint" of this movie.
  4. Find the most similar fingerprint in your database, if it is similar enough you already know about it, otherwise you don't.

Step 4 is a bit vague because I'm not really into this field. You are currently using an MD5 hash as a sort of fingerprint, but this is unsuitable in this case because slight differences in the input of a good cryptographic hashing function produce very large differences in the hash. This will mean that two very similar frames will have a totally different MD5 hash, so from the hash you'd never know they were similar.

As long as speed of database lookups is not an issue I'd just go for the sum of square differences as a measure of fingerprint similarity, and set a threshold on that to identify equal movies. However, this is not very fast for huge datasets, and in those cases you'd probably need to transform your fingerprint to something that will allow you to find similar fingerprints faster. One thing you could do here is start by selecting all known movies with very similar average colour for the entire video, then from that select the movies that have very similar average colour in each frame, and in the ones that remain at that point do the full rectangle-by-rectangle fingerprint match. But I'm sure there are even faster options for matching 180 numbers.

jilles de wit