I want to trim trailing whitespace at the end of all XHTML paragraphs. I am using Ruby with the REXML library.

Say I have the following in a valid XHTML file:

<p>hello <span>world</span> a </p>
<p>Hi there </p>
<p>The End </p>

I want to end up with this:

<p>hello <span>world</span> a</p>
<p>Hi there</p>
<p>The End</p>

So I was thinking that I could use XPath to select just the text nodes I want and then trim their text, which would give me the output shown above.

I started with the following XPath: //root/p/child::text()

Of course, the problem here is that it returns every text node that is a child of any p element, namely:

'hello '
' a '
'Hi there '
'The End '

Trying the following XPath gives me only the last text node of the last paragraph, not the last text node of each paragraph that is a child of the root node.

//root/p/child::text()[last()]

This only returns: 'The End '

What I would like to get from the XPath is therefore:

' a '
'Hi there '
'The End '

Can I do this with XPath? Or should I maybe be looking at using regular expressions? (That's probably more of a headache than XPath).

Cheers, Diego

+4  A: 

Your example worked for me

//p/child::text()[last()]
nickf
that only gets the last result though, he wants all of them throughout the document
Cipher
no, it gives the exact dataset he was asking for. It returns the last child text element of every p (in this case, three of them)
nickf
@nickf: You are correct. When you said it worked, I went and double-checked, and the problem seems to be with the Ruby REXML library's implementation of XPath. I won't say that for certain until I investigate further, though. It could be a setting I need to pass to REXML (or some such thing).
drylight
It looks like it is a bug in REXML.
drylight
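
Given the REXML behaviour reported above, one way to sidestep the `[last()]` predicate is to select the paragraphs and pick each one's final node in plain Ruby. The following is a minimal sketch, not a confirmed fix: the `<root>` wrapper is taken from the question's XPath, and the method name `trim_paragraph_tails` is made up for illustration. It only trims when the last child of a paragraph is a text node, so a paragraph that ends with an element is left untouched.

```ruby
require "rexml/document"

# Hypothetical helper: trims trailing whitespace from the final text node
# of every <p> that is a direct child of the document's root element.
def trim_paragraph_tails(xml)
  doc = REXML::Document.new(xml)
  doc.root.each_element("p") do |para|
    last = para.children.last
    # Skip paragraphs that are empty or that end with an element.
    next unless last.is_a?(REXML::Text)
    last.value = last.value.rstrip
  end
  doc
end

sample = <<~XML
  <root>
  <p>hello <span>world</span> a </p>
  <p>Hi there </p>
  <p>The End </p>
  </root>
XML

trimmed = trim_paragraph_tails(sample)
trimmed.root.each_element("p") { |para| puts para.to_s }
```

Because the selection of the last node happens in Ruby rather than in the XPath expression, this does not depend on how REXML evaluates `last()`.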
+1  A: 

Just in case you didn't know, XPath (and therefore XSLT) has a normalize-space() function which strips leading and trailing whitespace. Note that it also collapses each internal run of whitespace into a single space.

AmbroseChapel
Thanks for the response. Can normalize-space() or a similar function remove trailing spaces only (leaving any leading spaces alone)?
drylight
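
On the trailing-only question: the XPath 1.0 string functions do not include a trailing-only trim, so that step is probably easier on the Ruby side. The sketch below contrasts the two behaviours. Both method names are hypothetical, and `normalize_space` is only a Ruby approximation of the XPath function, using XPath's whitespace set (space, tab, CR, LF).

```ruby
# Whitespace as XPath defines it: space, tab, carriage return, line feed.
XPATH_WS = /[ \t\r\n]+/

# Approximation of XPath normalize-space(): collapse every whitespace run
# to one space, then drop a leading and a trailing space if present.
def normalize_space(str)
  str.gsub(XPATH_WS, " ").sub(/\A /, "").sub(/ \z/, "")
end

# Trailing-only trim: leading and internal whitespace are left alone.
def trim_trailing(str)
  str.sub(/[ \t\r\n]+\z/, "")
end

text = " a  b "
p normalize_space(text)
p trim_trailing(text)
```

For the paragraphs in the question, only the second behaviour keeps the space between `</span>` and `a`.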