Akismet does an amazing job at detecting spam comments. But comments are not the only form of spam these days. What if I wanted something like Akismet to automatically detect porn images on a social networking site that allows users to upload their pics, avatars, etc.?

There are already a few image-based search engines as well as face recognition software available, so I am assuming it wouldn't be rocket science and could be done. However, I have no clue how that stuff works or how I should go about it if I want to develop it from scratch.

How should I get started?

Is there any open source project for this going on?

+3  A: 

Short answer: use a moderator ;)

Long answer: I don't think there's a project for this, because what is porn? Only legs, full nudity, midgets, etc.? It's subjective.

PoweRoy
The question is "What is the best way to programmatically detect porn images?", programmatically...
Agusti-N
I know the question, but as I said, there is no 100% accurate porn blocker because porn is subjective, and subjective judgments can't be captured in code. One person thinks it's just nudity; another thinks it's porn. A better solution is to have a 'report image' button. Same idea as Koistya Navin .NET.
PoweRoy
"Midgets etc."? Holy non-sequitur, Batman.
Doug McClean
+22  A: 

I would rather let users report bad images. Developing image recognition can take too much effort and time, and it won't be as accurate as human eyes. It's much cheaper to outsource the moderation job.

Take a look at: Amazon Mechanical Turk

"The Amazon Mechanical Turk (MTurk) is one of the suite of Amazon Web Services, a crowdsourcing marketplace that enables computer programs to co-ordinate the use of human intelligence to perform tasks which computers are unable to do."

Koistya Navin
the "let's say to indians bit" is unnecessary - it adds nothing to the statement and could be deemed racist
annakata
@annakata, you're right
Koistya Navin
@annakata - completely agree with you. @koistya - I am sad, being a coder and being on SO, that you wrote that bit.
Raj
Perhaps you could source workers from Amazon Mechanical Turk to identify the pictures. Hmmm.
CiscoIPPhone
There's probably a market for an Amazon Mechanical Turk-style website but one that specialises in this kind of subject matter.... :)
Rich
If there are lots of pictures that should be filtered, I guess some guys would work as moderators for free :))
Koistya Navin
Even on a website as big as Stack Overflow there used to be just 4 moderators.
Koistya Navin
Amazon Mechanical Turk probably costs money. Given the subject matter you would think there is a clever business model where you can get this done for free.
Ankur
I think this is a pretty unethical approach.
Noon Silk
@Ankur LOL! Great idea. Heading to nic.com to check whether PornOrNot.com is still available.
Pekka
Racism, oh come on.
John
+3  A: 

There is software that estimates the probability that an image is porn, but this is not an exact science, as computers can't recognize what is actually in a picture (a picture is only a big set of values on a grid with no meaning). You can only teach the computer what is porn and what is not by giving it examples. This has the disadvantage that it will only recognize these or similar images.

Given the repetitive nature of porn, you have a good chance of success if you train the system so that it produces few false positives. For example, if you train the system with pictures of nude people, it may also flag pictures of a beach with "almost" naked people as porn.

Similar software is the Facebook software that recently came out. It's just specialized for faces. The main principle is the same.

Technically, you would implement some kind of feature detector that uses Bayesian filtering. A simple detector may look for features like the percentage of flesh-colored pixels, or it may just compute the similarity of the current image to a set of saved porn images.
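
To make the "feature detector plus Bayes filtering" idea concrete, here is a minimal sketch. It assumes a single feature (the fraction of skin-colored pixels, computed elsewhere) that is split into a handful of buckets; the class name, the bucket count, and the add-one smoothing are illustrative choices, not something taken from any particular product.

```python
from collections import Counter


def bucket(skin_ratio, n_buckets=5):
    """Map a skin-pixel ratio in [0, 1] to a discrete bucket index."""
    return min(int(skin_ratio * n_buckets), n_buckets - 1)


class SkinRatioBayes:
    """Tiny naive Bayes filter over one discretized feature (hypothetical helper)."""

    def __init__(self, n_buckets=5):
        self.n_buckets = n_buckets
        # Per-label counts of how often each bucket was seen during training.
        self.counts = {True: Counter(), False: Counter()}
        self.totals = {True: 0, False: 0}

    def train(self, skin_ratio, flagged):
        """Record one labelled example (flagged=True means 'reported as porn')."""
        self.counts[flagged][bucket(skin_ratio, self.n_buckets)] += 1
        self.totals[flagged] += 1

    def probability_flagged(self, skin_ratio):
        """Posterior probability that an image with this ratio belongs to the flagged class."""
        b = bucket(skin_ratio, self.n_buckets)
        total = self.totals[True] + self.totals[False]
        scores = {}
        for label in (True, False):
            # Add-one smoothing keeps unseen buckets from producing a zero probability.
            prior = (self.totals[label] + 1) / (total + 2)
            likelihood = (self.counts[label][b] + 1) / (self.totals[label] + self.n_buckets)
            scores[label] = prior * likelihood
        return scores[True] / (scores[True] + scores[False])
```

With only one feature this is little more than a learned threshold, which is why such a filter shares the beach-photo problem described above; a real system would combine several features.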

This is of course not limited to porn, it's actually more a corner case. I think more common are systems that try to find other things in images ;-)

Patrick Cornelissen
Why do people down-vote this answer?
Patrick Cornelissen
because it doesn't contain anything like an algorithm, recipe, or reference.
Ian
So it's not a valid answer to explain to the person asking the question that what they are trying to achieve is not really possible? Dude, you might be a little bit more relaxed...
Patrick Cornelissen
+1  A: 

I've seen a web filtering application that does porn image filtering; sorry, I can't remember the name. It was pretty prone to false positives, but most of the time it worked.

I think the main trick is detecting "too much skin" in the picture :)

dr. evil
I can't remember the study either - but it did an edge detection and matched what appeared to be patterns of vulvas rotated or obscured. Quite interesting from an image processing aspect.
jim
A: 

Detecting porn images is still definitely an AI task, and one that is very much theoretical for now.

Harvest collective power and human intelligence by adding a button/link "Report spam/abuse". Or employ several moderators to do this job.

P.S. Really surprised how many people ask questions assuming software and algorithms are all-mighty without even thinking whether what they want could be done. Are they representatives of that new breed of programmers who have no understanding of hardware, low-level programming and all that "magic behind"?

P.S. #2. I also remember that every so often a case in which people themselves cannot decide whether a picture is porn or art is taken to court. Even after the court rules, chances are half of the people will consider the decision wrong. The most recent silly case of this kind was when a Wikipedia page got banned in the UK because of a CD cover image that features some nakedness.

User
A: 

There is no way you could do this with 100% accuracy (I would say maybe 1-5% would be plausible) with today's knowledge. You would get a much better result (than those 1-5%) just by checking the image names for sex-related words :).

@SO Troll: So true.

+18  A: 

This was written in 2000, not sure if the state of the art in porn detection has advanced at all, but I doubt it.

http://www.dansdata.com/pornsweeper.htm

PORNsweeper seems to have some ability to distinguish pictures of people from pictures of things that aren't people, as long as the pictures are in colour. It is less successful at distinguishing dirty pictures of people from clean ones.

With the default, medium sensitivity, if Human Resources sends around a picture of the new chap in Accounts, you've got about a 50% chance of getting it. If your sister sends you a picture of her six-month-old, it's similarly likely to be detained.

It's only fair to point out amusing errors, like calling the Mona Lisa porn, if they're representative of the behaviour of the software. If the makers admit that their algorithmic image recogniser will drop the ball 15% of the time, then making fun of it when it does exactly that is silly.

But PORNsweeper only seems to live up to its stated specifications in one department - detection of actual porn. It's half-way decent at detecting porn, but it's bad at detecting clean pictures. And I wouldn't be surprised if no major leaps were made in this area in the near future.

Jeff Atwood
+3  A: 

The answer is really easy: it's pretty safe to say that it won't be possible in the next two decades. Before that we will probably get good translation tools. The last time I checked, the AI guys were struggling to identify the same car in two photographs shot from slightly different angles. Take a look at how long it took them to get good enough OCR or speech recognition together. Those are recognition problems that can benefit greatly from dictionaries, and they are still far from having completely reliable solutions despite the multi-million man-months thrown at them.

That being said, you could simply add an "offensive?" link next to user-generated content and have a mod cross-check the incoming complaints.

edit:

I forgot something: if you are going to implement some kind of filter, you will need a reliable one. If your solution is only 50% right, 2000 out of 4000 users with decent images will get blocked. Expect an outrage.

Thomasz
+1  A: 

crowdsifter.com (along the AMT lines) -Brendan (from Dolores Labs, maker of crowdsifter)

Brendan OConnor
A: 

Look at file name and any attributes. There's not nearly enough information to detect even 20% of naughty images, but a simple keyword blacklist would at least detect images with descriptive labels or metadata. 20 minutes of coding for a 20% success rate isn't a bad deal, especially as a prescreen that can at least catch some simple ones before you pass the rest to a moderator for judging.
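
A rough sketch of that prescreen, assuming the file name and any metadata strings are already available as plain text. The word list and the function names are made up for illustration. Matching whole tokens rather than substrings avoids flagging a place name like "Essex" just because it contains a listed word.

```python
import re

# Hypothetical blacklist; a real one would be longer and maintained over time.
BLACKLIST = {"porn", "xxx", "nsfw", "nude", "sex"}


def tokens(text):
    """Split a file name or metadata string into lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def needs_review(filename, metadata=()):
    """Return True when the file name or any metadata string contains a blacklisted word."""
    words = tokens(filename)
    for value in metadata:
        words |= tokens(value)
    return bool(words & BLACKLIST)
```

Anything this returns True for can go to the front of the moderation queue; everything else still needs the usual checks, since most uploads carry no descriptive labels at all.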

The other useful trick is the opposite, of course: maintain a whitelist of image sources to allow without moderation or checking. If most of your images come from known safe uploaders or sources, you can just accept them blindly.

SPWorley
+3  A: 

Add an "offensive" link and store the MD5 (or other hash) of the offending image so that it can be automatically tagged in the future.
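
A minimal sketch of the hash lookup, assuming uploads are available as raw bytes. It uses SHA-256 from Python's standard hashlib instead of MD5 (the "other hash" option); the class and method names are hypothetical, and a real site would keep the digests in a database rather than in memory.

```python
import hashlib


class FlaggedImageStore:
    """In-memory set of digests for images that were reported as offensive."""

    def __init__(self):
        self._flagged = set()

    @staticmethod
    def digest(image_bytes):
        """Hex digest of the exact file contents."""
        return hashlib.sha256(image_bytes).hexdigest()

    def flag(self, image_bytes):
        """Remember an image that a user or moderator marked as offensive."""
        self._flagged.add(self.digest(image_bytes))

    def is_flagged(self, image_bytes):
        """True only if a byte-identical file was flagged before."""
        return self.digest(image_bytes) in self._flagged
```

Because the digest covers the exact bytes, this only catches identical files; any re-encoding, crop, or resize produces a different digest.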

How cool would it be if somebody had a large public database of image MD5s along with descriptive tags running as a web service? A lot of porn isn't original work (in that the person who has it now probably didn't make it), and the popular images tend to float around different places, so this could really make a difference.

rfusca
I doubt it. There is SO much porn out there (and tons more generated by the day) that your odds of seeing the same picture twice are (IMHO) rather close to zero.
Vilx-
Think about how often tub girl showed up all over for awhile. It would have gotten flagged once and then everybody else could have avoided it.
rfusca
unless it were cropped, resized, or just opened and saved again before being uploaded..
Blorgbeard
Ya, I thought about that :( eh, it was a thought.
rfusca
Better than md5, licence idée's TinEye.
Tobu
@rfusca: Damn you for mentioning tub girl, that's just sick and now I won't be able to sleep.
Alix Axel
A: 

Two options I can think of (though neither of them is programmatically detecting porn):

  1. Block all uploaded images until one of your administrators has looked at them. There's no reason why this should take a long time: you could write some software that shows 10 images a second, almost as a movie - even at this speed, it's easy for a human being to spot a potentially pornographic image. Then you rewind in this software and have a closer look.
  2. Add the usual "flag this image as inappropriate" option.
Rich
+10  A: 

This is actually reasonably easy. You can programmatically detect skin tones, and porn images tend to have a lot of skin. This will create false positives, but if that is a problem you can pass the images so detected through actual moderation. This not only greatly reduces the work for moderators but also gives you lots of free porn. It's win-win.

SpliFF
Yeah! Body Painting Porn fans surely agree! :)
belisarius
A: 

The BrightCloud web service API is perfect for this. It's a REST API for doing website lookups just like this. It contains a very large and very accurate web filtering DB and one of the categories, Adult, has over 10M porn sites identified!

Chris Harris
+1  A: 

I've heard about tools that use a very simple but quite effective algorithm. The algorithm calculates the relative number of pixels whose color value is close to some predefined "skin" colors. If that proportion is higher than some predefined threshold, the image is considered to have erotic/pornographic content. Of course, that algorithm will give false positives for close-up face photos and many other things.
Since you are writing about social networking, there will be lots of "normal" photos with a large amount of skin color in them, so you shouldn't use this algorithm to reject every picture with a positive result. But you can use it to help moderators, for example by flagging these pictures with higher priority, so that a moderator who wants to check new pictures for pornographic content can start with them.
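
The skin-ratio idea can be sketched in a few lines. This assumes the image has already been decoded into (R, G, B) tuples by whatever imaging library is in use; the color rule below is one commonly quoted RGB rule of thumb for skin under daylight, and both its thresholds and the 0.4 cutoff are illustrative assumptions that would need tuning on real data.

```python
def is_skin(r, g, b):
    """Rough RGB test for a skin-like color (illustrative thresholds, not calibrated)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_ratio(pixels):
    """Fraction of pixels that pass the skin test; 0.0 for an empty image."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(1 for r, g, b in pixels if is_skin(r, g, b)) / len(pixels)


def review_priority(pixels, threshold=0.4):
    """Suggest a moderation priority instead of rejecting the image outright."""
    return "high" if skin_ratio(pixels) >= threshold else "normal"
```

Returning a priority rather than a verdict matches the point above: a close-up portrait will score high too, so the result only decides which pictures a moderator looks at first.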

Dmitriy Matveev
I've actually seen a system similar to that in use. It's not reliable enough to be left on its own, but it does a very good job of alerting a moderator when appropriate. It's not foolproof, especially if the person is covered with only one small exposed area. The ratio doesn't quite work as reliably in reverse.
Tim Post
A: 

Maybe something like this...

luvieere
+5  A: 

How about Google SafeSearch?
I think they don't do it manually...

takien
Google SafeSearch involves a lot more. The search is based on text, and porn sites contain a lot of related text, so they are easier to filter.
aNish
+3  A: 

BOOM! Here is the algorithm.

http://www.math.admu.edu.ph/~raf/pcsc05/proceedings/AI4.pdf

Does anyone know where to get the source code for a java (or any language) implementation?

That would rock.

One algorithm called WISE has a 98% accuracy rate but a 14% false positive rate. So what you do is let the users flag the 2% false negatives, ideally with automatic removal if a certain number of users flag an image, and have moderators view the 14% false positives.
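
To see what those two rates mean for workload, here is a small back-of-the-envelope sketch. It reads "98% accuracy" as the share of porn images that get caught (which is how the 2% false negatives above are derived) and applies the 14% false positive rate to clean images only. The function names and the flag threshold of 5 are assumptions for illustration.

```python
def expected_workload(n_porn, n_clean, detection_rate=0.98, false_positive_rate=0.14):
    """Estimate how many images each group has to handle.

    Returns (missed_porn, false_alarms): porn the detector lets through,
    which users would have to flag, and clean images it flags, which
    moderators would have to clear.
    """
    missed_porn = round(n_porn * (1 - detection_rate))
    false_alarms = round(n_clean * false_positive_rate)
    return missed_porn, false_alarms


def should_auto_remove(flag_count, threshold=5):
    """Hide an image automatically once enough distinct users have flagged it."""
    return flag_count >= threshold
```

On a site where most uploads are clean, the false alarms dominate: the moderator queue grows with the number of clean images, not with the amount of porn.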

devadvocate
You found the algorithm. That's pretty darn good. The source code is often left as an exercise. After all, we aren't specifying any particular programming language, are we?
Ian
+1  A: 
Jason S
A: 

If you really have time and money:

One way of doing it is by 1) writing an image detection algorithm to find whether an object is human or not. This can be done by bitmasking an image to retrieve its "contours" and checking whether the contours fit a human contour.

2) Data mine a lot of porn images and use data mining techniques such as the C4 algorithms or Particle Swarm Optimization to learn to detect patterns that match porn images.

This will require that you identify what the contours of a naked human body look like in digitized format (this can be achieved in the same way that OCR image recognition algorithms work).

Hope you have fun! :-)

The Elite Gentleman
A: 

This one looks promising. Basically they detect skin (with calibration by recognizing faces) and determine "skin paths" (i.e. measuring the proportion of skin pixels vs. face skin pixels / skin pixels). This has decent performance. http://www.prip.tuwien.ac.at/people/julian/skin-detection

alexsee75