I'm trying to pull in an src value from an XML document, and in one that I'm testing it with, the src is:

<content src="content/Orwell - 1984 - 0451524934_split_2.html#calibre_chapter_2"/>

That creates a problem when trying to open the file. I'm not sure what that #(stuff) suffix is called, so I had no luck searching for an answer. I'd just like a simple way to remove it if possible. I suppose I could write a function to search for a # and remove anything after, but that would break if the filename contained a # symbol (or can a file even have that symbol?)

Thanks!

+1  A: 

From Wikipedia:

# is used in a URL of a webpage or other resource to introduce a "fragment identifier" – an id which defines a position within that resource. For example, in the URL http://en.wikipedia.org/wiki/Number_sign#Other_uses the portion after the # (Other_uses) is the fragment identifier, in this case indicating that the display should be moved to show the tag marked by ... in the HTML
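To make the quoted definition concrete, here is a minimal sketch that splits the Wikipedia URL above into its fragment and the rest. It assumes the thread is about C#/.NET (the question does not say so explicitly) and uses `System.Uri`; the class and method names are hypothetical. Note that `new Uri(string)` expects an absolute URL, so it would not accept a relative src like the one in the question as-is.

```csharp
using System;

public static class FragmentDemo
{
    // Returns the fragment identifier, including the leading '#'.
    public static string GetFragment(string absoluteUrl)
    {
        return new Uri(absoluteUrl).Fragment;
    }

    // Returns everything up to and including the query, i.e. the URL without its fragment.
    public static string WithoutFragment(string absoluteUrl)
    {
        return new Uri(absoluteUrl).GetLeftPart(UriPartial.Query);
    }
}
```

For `http://en.wikipedia.org/wiki/Number_sign#Other_uses`, `GetFragment` gives `#Other_uses` and `WithoutFragment` gives `http://en.wikipedia.org/wiki/Number_sign`.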

Mike Chess
+3  A: 

You should be OK assuming that URLs won't contain a "#"

The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it.

Source (search for "#" or "unsafe").

Therefore just use String.Split() with the "#" as the split character. This should give you 2 parts. In the highly unlikely event it gives more, just discard the last one and rejoin the remainder.
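A rough sketch of that split-and-rejoin idea, assuming C#; `FragmentStripper` and `StripFragment` are hypothetical names, not something from the question:

```csharp
using System;

public static class FragmentStripper
{
    // Hypothetical helper: removes the trailing "#fragment" from src, if there is one.
    public static string StripFragment(string src)
    {
        string[] parts = src.Split('#');

        // No '#' in the string, so there is nothing to strip.
        if (parts.Length == 1)
        {
            return src;
        }

        // Discard the last part (the fragment) and rejoin whatever is left,
        // which covers the unlikely case of more than one '#'.
        return string.Join("#", parts, 0, parts.Length - 1);
    }
}
```

With the src from the question, `StripFragment("content/Orwell - 1984 - 0451524934_split_2.html#calibre_chapter_2")` yields `content/Orwell - 1984 - 0451524934_split_2.html`.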

ChrisF
This works, thanks!
kcoppock
A: 

If you had the src in a string you could use

srcstring.Substring(0, srcstring.LastIndexOf('#'));

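This returns the src without the # and everything after it. It assumes the string actually contains a #; if it doesn't, LastIndexOf returns -1 and Substring throws. If the values you are retrieving are all web URLs, this should work: the # marks a bookmark in a URL that takes you to a specific part of the page.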

Ben Robinson
This was the easiest method for me, thanks for the help!
kcoppock
A: 

It's not safe to remove the anchor of the URL. What I mean is that Ajax-like sites make use of the anchor to keep track of the context. Take Gmail, for example. If you go to http://www.gmail.com/#inbox, you go directly to your inbox, but if you go to http://www.gmail.com/#all, you'll go to all your mail. The server can give a different response based on the anchor, even if the response is a file.

Fede
Didn't you kinda contradict your own answer here... yes, the client can render different content based on the anchor, but the server would normally always give the same response. The important part here is Ajax, which you mentioned yourself. And that happens client side.
BurningIce
Also, in my case it won't matter, these are all local XML files, it won't be an actual web request. But that's good to keep in mind for future projects, though.
kcoppock
@BurningIce, what I meant is that anchors can have side effects on the server response. When you go to http://www.gmail.com/#all, gmail doesn't send you back all your mail, and then the browser filters it based on your anchor (anchor could have been #inbox, #all, #buzz, etc). It's the server who's responding in such a way, based on the complete requested url.
Fede