Hello,

given the URL of a well-known company (e.g. http://mcdonalds.com/), how would you automatically and reliably find the company name (in this case "McDonald's")?

Thanks

Edit: someone voted to close this question, so maybe I need to explain the motivation. I have a large list of company URLs and I want to find data about each company using Google Maps. Searching Google Maps with the company name works much better than searching with the URL.

Removing 'http' and 'com' does work in a lot of cases, particularly for well-known companies, but not in all of them. I found the whois records were not very helpful.
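For reference, the "remove 'http' and 'com'" approach could be sketched roughly as below. This is only an illustration: the function name `guess_name_from_url` and the short `COMMON_SUFFIXES` set are assumptions made for the example, and a real suffix list would be much longer.

```python
from urllib.parse import urlparse

# Assumed, deliberately tiny set of suffix labels for this sketch.
COMMON_SUFFIXES = {"com", "org", "net", "co", "uk", "de", "fr"}


def guess_name_from_url(url):
    """Return the host label most likely to carry the company name."""
    # urlparse only fills in hostname when the URL has a scheme ("http://...").
    host = urlparse(url).hostname or ""
    labels = [part for part in host.split(".") if part and part != "www"]
    # Drop trailing suffix labels such as "com" or "co" + "uk".
    while len(labels) > 1 and labels[-1] in COMMON_SUFFIXES:
        labels.pop()
    return labels[-1] if labels else ""


print(guess_name_from_url("http://mcdonalds.com/"))
```

The result is only a lowercase label, so spacing, capitalisation and punctuation of the real company name still have to come from somewhere else.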

I was hoping there was some kind of public database matching companies to URLs, but haven't come across one so far.

+1  A: 

You would need to create your own lookup table. For the most accurate data, you would have to parse this information from the HTML at the URL, e.g. get the page title, or look for the copyright message.

Mark Redman
Nothing would be foolproof and accurate; you would have to review the results.
Mark Redman
Yeah, I was hoping such a table already existed, which I could reuse.
Plumo
+1  A: 

Quite probably they will have it in the <title> element. Parse this and compare it to the website's domain. If there is a significant overlap, it is your match. If not, try some heuristics on the title (e.g. the name is everything before a ">>" separator or similar).
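A minimal sketch of that title comparison, using only the Python standard library, might look like this. The helper names (`extract_title`, `name_from_title`), the separator list and the 0.8 similarity threshold are all assumptions chosen for illustration, and fetching the page is left out.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def squash(text):
    # Lowercase and keep only letters and digits: "McDonald's" -> "mcdonalds".
    return re.sub(r"[^a-z0-9]", "", text.lower())


def name_from_title(title, domain_label, threshold=0.8):
    """Return the title segment that best overlaps the domain label, or None."""
    # Assumed separators: "|", ">>", "»", "::" and " - ".
    segments = [s.strip() for s in re.split(r"\||>>|»|::|\s-\s", title)]
    best, best_score = None, 0.0
    for segment in filter(None, segments):
        score = SequenceMatcher(None, squash(segment), squash(domain_label)).ratio()
        if score > best_score:
            best, best_score = segment, score
    return best if best_score >= threshold else None


page = "<html><head><title>McDonald's | Burgers and Fries</title></head></html>"
print(name_from_title(extract_title(page), "mcdonalds"))
```

When no segment clears the threshold the function returns None, which is the point where other heuristics or a manual review would have to take over.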

If it is a larger company, then you could also be lucky looking at the NIC entry (aka Whois) for their domain.

Boldewyn
If not, the meta tags will likely include it.
scunliffe
But they are a real mess. Dublin Core is far from being even known in these companies' PR departments. Parsing them for something you don't know in advance will give you a really bad success rate.
Boldewyn
A: 

You could use the whois information. There should be libraries that let you do that in a clean way. You didn't mention what type of technology you'll be using...

Ruben Bartelink
A: 

The whois database may be of some help, though there are always edge cases that you will have to handle with more effort.

mouviciel
+1  A: 

If you want to be accurate, I would say Amazon Mechanical Turk.

flybywire
Good idea, but I am too cheap!
Plumo
+1  A: 
juno
That example seems to inspect the title tag, and you suggest inspecting the meta tag. Both are good ideas, but wouldn't be reliable in general.
Plumo