I want to write an online application that:

  1. reads the URL from the browser's address bar
  2. extracts its lexical features (such as n-grams)
  3. extracts its host-based features (fetches its DNS records online: the A, PTR, and TTL fields)
  4. classifies the URL as malicious or benign (using machine learning)

Can anyone help me with 1 and 3?

A: 

I don't believe this application is achievable, since you can't really determine a site's content from its URL alone.

See something like the Mozilla Phishing Protection Design Documentation and the Google Safe Browsing spec instead.

Joel L
This is a project based on a recent ACM publication, which says sites can be classified using the URL alone.
trinity
Yes, you'll probably get some better-than-random results, but the security will still be weak *and* you'll get false positives.
Joel L
A: 

I don't know which language you're planning to use.

For item 1, here is a .NET library that may be helpful:

http://msdn.microsoft.com/en-us/library/system.web.httputility.aspx
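For a Java application, the URL-splitting part of items 1 and 2 could look roughly like the sketch below. It assumes the URL string has already been obtained from the browser somehow (a stand-alone program cannot read the address bar directly, so something like a browser extension would have to pass it in). The class name `UrlLexicalFeatures` and its methods are hypothetical names made up for illustration; only `java.net.URI` is a real JDK class.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: the name and methods are illustrative only.
class UrlLexicalFeatures {

    // Returns the host part of a URL string, or null if the URL has none.
    // URI.create throws IllegalArgumentException on a malformed URL.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Returns all overlapping character n-grams of the given text.
    static List<String> charNgrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Placeholder URL; in the real application it would come from the browser.
        String url = "http://example.com/login";
        String host = hostOf(url);
        System.out.println(host);
        System.out.println(charNgrams(host, 3));
    }
}
```

Character n-grams are only one possible lexical feature; whether to take them over the whole URL, the host, or the path is a design choice the sketch leaves open.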

Maestro1024
A stand-alone Java online application.
trinity
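Since the target is Java, item 3 can probably be covered with the DNS provider that ships with the JDK's JNDI support. The following is a minimal sketch, not a definitive implementation: the class name `DnsHostFeatures` and its methods are hypothetical, `example.com` is a placeholder, and the lookups need network access and a reachable resolver. As far as I know the JNDI provider returns record values but not their TTL, so the TTL field would need a dedicated DNS library (dnsjava, for instance, exposes it per record).

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper class: the name and methods are illustrative only.
class DnsHostFeatures {

    // Builds the reverse-lookup name for a dotted IPv4 address,
    // e.g. "192.0.2.10" becomes "10.2.0.192.in-addr.arpa".
    static String reverseName(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not a dotted IPv4 address: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0]
                + ".in-addr.arpa";
    }

    // Queries records of one type (such as "A" or "PTR") for a name using
    // the JDK's JNDI DNS provider. Returns null if no such record came back.
    static Attribute lookup(String name, String type) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(name, new String[] { type });
            return attrs.get(type);
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Placeholder host; requires network access to resolve.
        Attribute a = lookup("example.com", "A");
        System.out.println("A: " + a);
        if (a != null && a.size() > 0) {
            String ip = (String) a.get(0);
            // May throw a NamingException if the address has no PTR record.
            System.out.println("PTR: " + lookup(reverseName(ip), "PTR"));
        }
    }
}
```

The PTR query is just an ordinary lookup on the reversed address under `in-addr.arpa`, which is why `reverseName` is split out as its own step.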