I keep having trouble extracting data with regex: the result is not what I wanted because the string may contain newlines, spaces, HTML tags, etc. Is there any way to see what is actually in the string? The debugger seems to show only the printable text. How do you deal with this?

A: 

What I do is use a regex tester (one that uses the same regex engine you are using) and test my pattern in it. I've tried text editors that display invisible characters, but to me they only add to the confusion.

So I just go by trial and error. For instance, if a line ends in:

</a>

Then I'll try the following patterns on the regex tester until I find one that works:

</a>.
</a>..
</a>\s
</a>\s*
</a>\n
</a>\r
</a>\r\n

Etc.
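
If you would rather not paste the patterns in one at a time, the same trial-and-error loop can be scripted. This is only a rough sketch in Python, not Yahoo Pipes itself: the sample string and the helper name matching_patterns are made up for illustration, and Python's regex engine may not behave exactly like the one in Pipes.

```python
import re

# Hypothetical sample: a line ending in </a> followed by a Windows line break.
sample = '<a href="#">link</a>\r\nnext line'

# The candidate patterns from the list above.
candidates = [
    r"</a>.",
    r"</a>..",
    r"</a>\s",
    r"</a>\s*",
    r"</a>\n",
    r"</a>\r",
    r"</a>\r\n",
]

def matching_patterns(text, patterns):
    """Return (pattern, repr of the matched text) for every pattern that matches."""
    results = []
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            # repr() makes invisible characters such as \r and \n visible.
            results.append((pattern, repr(match.group(0))))
    return results

for pattern, matched in matching_patterns(sample, candidates):
    print(f"{pattern:10} matched {matched}")
```

The patterns that fail are as informative as the ones that match: here </a>\n failing while </a>\r matches tells you the line break starts with a carriage return.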

Pessimist
The question is not how to test generic regular expressions, but how to see the actual source string in Yahoo Pipes, since it shows you only the printable string and not the HTML.
CptanPanic
I understood the question. My answer still applies. When I want to find out what kinds of non-printable characters some text contains, I paste that text into a regex tester and run the above regexes against it until I get a match.
Pessimist
A: 

If the content of the string is HTML, then the debugger gives you a choice of viewing "HTML" or "Source". Source should show you any HTML tags that are there.

However, if your concern is whitespace, this may not be enough. Your only option is to "view source" on the original page.

The best course of action is to explicitly handle these possibilities in your regex. For example, if you think you might be getting whitespace in your target string, use the \s* pattern in the critical positions. That will match zero or more spaces, tabs, and newlines (you must also have the "s" option checked in the regex panel for newlines).
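
To illustrate the idea outside of Pipes, here is a minimal Python sketch. The sample markup and the names raw and extract_price are invented for the example. Note one difference that I am assuming rather than taking from Pipes: in Python, \s matches newlines without any extra option, so no flag is set here.

```python
import re

# Hypothetical source text with stray whitespace between the tags.
raw = "<b>Price:</b> \n\t <span>42</span>"

# repr() shows the newline and tab that a plain print would hide.
print(repr(raw))

# \s* at the critical position tolerates zero or more spaces, tabs, and newlines.
pattern = re.compile(r"<b>Price:</b>\s*<span>(\d+)</span>")

def extract_price(text):
    """Return the captured digits as a string, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(extract_price(raw))
```

Because \s* also matches the empty string, the same pattern still works when the tags sit right next to each other.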

However, without specific examples of the source text and the regex you are using, advice can only be generic.

Gavin Brock