Hello everyone, once again I have hit a wall.

How can I replace escape characters using regular expressions? If the tab character (\t) occurs two or more times in a row, I want to replace those occurrences with a single \t. For example, if \t\t\t appears, I want to replace it with just \t. How can I do this?

I am facing one more problem with reading a text file and applying regular expressions to it.

I am using C# to read the text file and to apply the regular expressions. When I open the text file (a file with the .txt extension), I get a normal view of the file. But when I read the same file using a TextReader and store it in a string, I get text like this:

O K\t\t\t\t\t\tEmail:
[email protected] \rPhone: + 91
992\t\r\rExperience Summary
\rBusiness Intelligence and data
warehouse designer with more than 6
years of work experience in OLAP
Project.\r\r\rTechnology\rBelow is a
list of important software products
and tools that I have worked
with.\r\rSoftware
Products\r\a\r\aOperating
Systems:\rWINDOWS NT, WINDOWS 2000,
UNIX\rDatabase Management
Systems:\rOracle 8i, Oracle 9i, Oracle
10g, SQL-Server 7.0, DB2\rSoftware
Packages:\rVSS, ER Win, M1\rFourth
Generation Language:\rPL/SQL,
SQL*PLUS\rTools &
Technologies:\rOracle Warehouse
Builder 10.1.0.4.0, ORACLE 9i AS,
ORACLE Discoverer Reports Data Stage
8.0, Fast Track 8.5, DB@ Cube, JavaScript, JSP, JDEV, BI BEANS, ASP,
ASP.NET, Ab
Initio\r\r\a\r\a\v\r\r\fAssignments\rThe
details of the various assignments
that I have handled are listed here,
in chronological
order.\r\rName\r\aAvery Dennison Data
Warehousing\r\a\r\aClient\r\aAvery
Dennison, he challenge in the project
is to feed EDW from existing
warehouses which has data at an
aggregated
level.\r\a\r\a\r\rName\r\aAOL BI
(Omniture)\rite team. Designing,
coding and testing along with
coordination with Onsite team.
\r\a\r\aTools & Technologies\r\aUnix
Platform, Oracle 10g , Py. Not only
delivering the correct requirement but
also the performance has to be in
acceptable
range.\r\a\r\a\r\r\r\r\r\r\r\r\r\r\r\rName\r\aAIW
Events (ABSA)\r\a\r\aClient\r\aABSA,
South Africa\r\a\r\aP

i.e., all the escape characters like \s, \r, \f are visible. Because of this, the regular expression that works with normal text doesn't work when I read the same text into a string variable.

Does anyone know how to solve this problem?

Thanks

I have one more query. I want to match text at the end of a line. I tried to use $ for this. For example, to match text ending with "assignment", I used the regex assignment$. It worked with normal text, but when I run this regex on the text given by StreamReader, it doesn't work. StreamReader gives strings like Assignments\r\r\f. How can I match the end or the start of a line in this kind of text?

+1  A: 

You're trying to match the string "\r", right? You'll have to escape the escape character to do it:

"(\\r)*"

This expression will match "\r" any number of times. It works because "\\" escapes to a literal "\". You can apply the same idea to match "\t", too.
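To make the escaping levels concrete, here is a minimal C# sketch. The class and method names are made up for illustration. It assumes the string read from the file contains real control characters (which a debugger typically displays as \r, \t and so on), not a literal backslash followed by a letter:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class EscapeDemo
{
    // "\\r" in a regular C# string (same as @"\r") hands \r to the regex
    // engine, which matches a real carriage return character (U+000D).
    public static bool ContainsCarriageReturn(string input)
    {
        return Regex.IsMatch(input, "\\r");
    }

    // @"\\r" hands \\r to the regex engine, which matches a literal
    // backslash followed by the letter r (two visible characters).
    public static bool ContainsLiteralBackslashR(string input)
    {
        return Regex.IsMatch(input, @"\\r");
    }
}
```

If ContainsCarriageReturn returns true for your text, the escape sequences you see are only how the tool displays the control characters, and patterns such as @"\r" or @"\t" will match them directly.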

Welbog
Thanks. I have one more query. I want to match text at the end of a line. I tried to use $ for this. For example, to match text ending with "assignment", I used the regex assignment$. It worked with normal text, but when I run this regex on the text given by StreamReader, it doesn't work. StreamReader gives strings like Assignments\r\r\f. How can I match the end or the start of a line in this kind of text?
Shekhar
@shekhar: You can add a check for `\r\r\f$` instead of just `$`, like this: `"assignments\\r\\r\\f$"`
Welbog
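A more general variant, as a rough sketch: in .NET, $ only treats \n as a line terminator (even with RegexOptions.Multiline), so text whose lines end in \r or \f never satisfies it. One option is to replace $ with a lookahead that accepts any of those characters or the end of the input. The names LineEndMatcher and EndsLineWith are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LineEndMatcher
{
    // Lookahead that treats \r, \n, \f or the very end of the input
    // as the end of a line, without consuming those characters.
    private const string EndOfLine = @"(?=[\r\n\f]|\z)";

    // True if 'word' occurs immediately before one of those line ends.
    public static bool EndsLineWith(string text, string word)
    {
        // Regex.Escape keeps any special characters in 'word' literal.
        return Regex.IsMatch(text, Regex.Escape(word) + EndOfLine);
    }
}
```

For the start of a line, the mirror image would be a lookbehind such as (?<=\A|[\r\n\f]). Another option is to normalize the line endings to \n first and then use $ with RegexOptions.Multiline.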
+2  A: 
/\t{2,}/\t/

replaces two or more consecutive tabs with a single tab.
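Since the question is about C#, the same substitution could look roughly like this with Regex.Replace. This is a sketch, and the class and method names are made up for the example:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TabCollapser
{
    // Collapse every run of two or more tab characters into one tab.
    // The verbatim string @"\t{2,}" passes \t to the regex engine, where
    // it matches a real tab; the replacement "\t" is a single tab character.
    public static string CollapseTabs(string input)
    {
        return Regex.Replace(input, @"\t{2,}", "\t");
    }
}
```

A single tab is left untouched because the quantifier {2,} requires at least two in a row.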

SilentGhost
+1  A: 

You could replace \\t\\t\\t with \\t{3}

Darin Dimitrov
You could do this with just \t{3}
Manu
@Manu, you are correct, I've modified my post. Thanks for the remark.
Darin Dimitrov
+1  A: 

For the tab character, use something like this:

/(\t)+/\1/g
  1. Make a group containing one character (the tab) and match it one or more times, as many as possible.
  2. Replace the full match with the single captured character.
  3. (Global) Apply the pattern to the whole text.

Then you could use the same expression for the other escaped chars you want to replace.
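In C#, the group-and-backreference idea can be extended to several characters at once. The sketch below is one possible reading of it, not necessarily what was intended above: it captures one control character and uses \1+ to swallow the repeats of that same character. The set [\t\r\f] and the names RunCollapser and CollapseRuns are assumptions chosen for the example:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RunCollapser
{
    // ([\t\r\f]) captures one tab, carriage return or form feed;
    // \1+ then requires one or more repeats of that same character.
    // "$1" puts back a single copy of the captured character.
    public static string CollapseRuns(string input)
    {
        return Regex.Replace(input, @"([\t\r\f])\1+", "$1");
    }
}
```

Because the backreference must repeat the exact character that was captured, a mixed sequence such as a tab followed by a carriage return is left as it is.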

UlfR