I am parsing the domain name out of a string by using strchr() to find the last . (dot) and counting back until the dot before that (if any); what lies after that dot is my domain.

This is a rather nasty piece of code, and I was wondering if anyone has a better way.

The possible strings I might get are:

  • domain.com
  • something.domain.com
  • some.some.domain.com

You get the idea. I need to extract the "domain.com" part.

Before you tell me to go search in google, I already did. No answer, hence I am asking here.

Thank you for your help

EDIT:

The string I have contains a full hostname. This usually is in the form of whatever.domain.com but can also take other forms and as someone mentioned it can also have whatever.domain.co.uk. Either way, I need to parse the domain part of the hostname: domain.com or domain.co.uk

A: 

Not sure what flavor of C, but you probably want to tokenize the domain using "." as the separator.

Try this: http://www.metalshell.com/source_code/31/String_Tokenizer.html

As for the domain name, I'm not sure what your end goal is, but domains can have lots and lots of nodes: you could have a domain name like foo.baz.biz.boz.bar.co.uk.

If you just want the last 2 nodes, then use above and get the last two tokens.
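
As a rough illustration of the tokenizing idea, here is a minimal sketch in standard C. The function name last_two_labels, the 256-byte working copy, and the return convention are assumptions made for this example, not something from the linked page. It only needs two pointers, because it remembers just the two most recent tokens while walking the string. strtok() modifies its input, so the sketch works on a copy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the last two dot-separated labels of `host`
 * into `out` (capacity `outsz`). Returns 0 on success, -1 if the input is
 * too long, has no labels, or the result does not fit in `out`. */
static int last_two_labels(const char *host, char *out, size_t outsz)
{
    char copy[256];                 /* assumed upper bound on hostname length */
    const char *prev = NULL;        /* second-to-last token seen so far */
    const char *last = NULL;        /* last token seen so far */
    char *tok;
    int n;

    assert(host != NULL && out != NULL);
    if (strlen(host) >= sizeof copy)
        return -1;
    strcpy(copy, host);             /* strtok() writes into its argument */

    for (tok = strtok(copy, "."); tok != NULL; tok = strtok(NULL, ".")) {
        prev = last;
        last = tok;
    }
    if (last == NULL)
        return -1;                  /* empty string or only dots */

    if (prev != NULL)
        n = snprintf(out, outsz, "%s.%s", prev, last);
    else
        n = snprintf(out, outsz, "%s", last);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

Because strtok() skips empty tokens, a trailing dot in the input is ignored by this sketch. It always keeps exactly two labels, so it does not handle names like domain.co.uk.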

Joelio
A domain name can only have 255 octets, not quite "infinite"...
bstpierre
just the name.something, where the .something can be .com, .net, etc., or in the form of .co.uk etc.
Jessica
I tried before with strtok, but I need to keep on reading and saving strings... unless you know a good way to do it
Jessica
Not sure what your program needs to do. Do you need to take http://www.foo.bar.co.uk and turn it into co.uk?
Joelio
No, I get a string containing a full hostname. This usually is in the form of whatever.domain.com but can also take other forms, and as someone mentioned it can also be whatever.domain.co.uk. Either way, I need to parse the domain part of the hostname: domain.com or domain.co.uk
Jessica
should be easy with strtok, use the example I noted, store each token in an array of strings, keep track of the number of tokens. Then just join the last 2 or 3 tokens (depending on extension) for your answer.
Joelio
OK... how many char* do I need? 3? 4? 5?
Jessica
+2  A: 

Did you mean strrchr()?

I would probably approach this by doing:

  1. strrchr to get the last dot in the string, save a pointer here, replace the dot with a NUL ('\0').
  2. strrchr again to get the next to last dot in the string. The character after this is the start of the name you are looking for (domain.com).
  3. Using the pointer you saved in #1, put the dot back where you wrote the NUL.

Beware that names can sometimes end with a dot. If that is a valid part of your input set, you'll need to account for it.

Edit: To handle the flexibility you need in terms of example.co.uk and others, the function described above would take an additional parameter telling it how many components to extract from the end of the name.

You're on your own for figuring out how to decide how many components to extract -- as Philip Potter mentions in a comment below, this is a Hard Problem.
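
The steps above, extended with the component count from the edit, might look roughly like this. It is only a sketch: the name last_components, the cap of 8 components, and the choice to return the whole string when the name has fewer components than requested are all assumptions made for illustration. The caller still has to decide what n should be.

```c
#include <assert.h>
#include <string.h>

#define MAX_COMPONENTS 8   /* assumed cap, chosen only for this sketch */

/* Hypothetical helper: returns a pointer into `host` at the start of its
 * last `n` dot-separated components. `host` must be writable, because dots
 * are temporarily overwritten with '\0' and restored before returning.
 * If the name has n or fewer components, the whole string is returned. */
static char *last_components(char *host, int n)
{
    char *saved[MAX_COMPONENTS];   /* positions of dots replaced by '\0' */
    int nsaved = 0;
    char *start = host;
    char *dot;

    assert(host != NULL);
    if (n < 1 || n > MAX_COMPONENTS)
        return host;

    while ((dot = strrchr(host, '.')) != NULL) {
        if (nsaved == n - 1) {     /* this dot precedes the n-th component */
            start = dot + 1;
            break;
        }
        *dot = '\0';               /* hide this dot so strrchr finds the next */
        saved[nsaved++] = dot;
    }

    while (nsaved > 0)             /* put every dot back */
        *saved[--nsaved] = '.';

    return start;
}
```

With n = 2 this gives domain.com for some.some.domain.com, and with n = 3 it gives domain.co.uk for whatever.domain.co.uk. It must not be called on a string literal, since it writes to the buffer, and it does not treat a trailing dot specially.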

bstpierre
let me try it, however as Philip Potter mentioned, how can I handle domain.co.uk?
Jessica
@Jessica: you need to define what you want more clearly. For example, .jp, .uk, and .au have second-level general domains: co.uk, co.jp, com.au. But others such as .de, .es, .be, .tk don't: amazon.de etc. What exactly do you want? If you want to extract the last "meaningful" part of the domain, it's a Hard Problem.
Philip Potter
See edited question
Jessica
+2  A: 

This isn't a reply to the question itself, but an idea for an alternate approach:

Given that the code is already very nasty, I'd argue that a good way to make it less nasty, and to get a solid facility for parsing domain names and the like, is to use PCRE or a similar regular expression library. That will definitely help you out if you also want to validate that the TLD exists, for instance.

It may take some effort to learn initially, but if you need to change existing matching/parsing code, or write more string-matching code, I'd argue that a regex library may simplify this a lot in the long term, especially for more advanced matching.

Another library I recall that supports regex is glib.

Kvisle
Thanks. I would definitely try regular expressions, but I can't use any external libraries.
Jessica
What does the project currently link with? Libc only?
Kvisle
yes. only standard C and that's it.
Jessica