Hi All,

I've got a Perl script which consumes an XML file on Linux, and occasionally there are CRLFs (hex 0D0A, DOS newlines) in some of the node values.

The system which produces the XML file writes it all as a single line, and it looks as if it occasionally decides that this is too long and writes a CRLF into one of the data elements. Unfortunately there's nothing I can do about the providing system.

I just need to remove these from the string before I process it.

I've tried all sorts of regex replacements using the Perl character classes and hex values, and nothing seems to work.

I've even run the input file through dos2unix before processing and I still can't get rid of the erroneous characters.

Does anyone have any ideas?

Many Thanks,

+3  A: 

Typical. After battling for about 2 hours, I solved it within 5 minutes of asking the question.

$output =~ s/[\x0A\x0D]//g; 

Finally got it.

HeHasMoments
Rubber duck effect. It never fails! :)
OMG_peanuts
Keep in mind that this removes every instance of the characters `\r` and `\n`, not just the sequence `\r\n` (just in case `\r` or `\n` could be valid values that you need in other places).
Eric Strom
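
To illustrate the distinction raised in the comment above, here is a minimal sketch. The sample string and variable names are made up for the demonstration and are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample: a CRLF inside a value, plus a lone LF that might be intentional.
my $input = "first\r\nsecond\nthird";

# Character class: removes every CR and every LF, wherever they occur.
(my $all_removed = $input) =~ s/[\x0D\x0A]//g;

# Literal sequence: removes only a CR that is immediately followed by an LF.
(my $pairs_removed = $input) =~ s/\x0D\x0A//g;

printf "class removes %d chars, sequence removes %d chars\n",
    length($input) - length($all_removed),
    length($input) - length($pairs_removed);
```

With this input the character class also eats the lone LF between "second" and "third", while the sequence version leaves it in place. Which one is right depends on whether a bare `\n` can legitimately appear in the data.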
+3  A: 
$output =~ tr/\x{d}\x{a}//d;

These are both whitespace characters, so if the terminators are always at the end, you can right-trim with

$output =~ s/\s+\z//;
Greg Bacon
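
As a rough sketch of how the two statements above behave differently, assuming a made-up value with one CRLF in the middle and one at the end:

```perl
use strict;
use warnings;

# Hypothetical sample value: embedded CRLF plus a trailing CRLF.
my $output = "some\r\nvalue\r\n";

# tr with /d deletes the listed characters and returns how many it deleted.
my $deleted = ($output =~ tr/\x{d}\x{a}//d);
print "deleted $deleted characters, left [$output]\n";

# The right-trim only touches whitespace at the very end of the string,
# so the embedded CRLF survives.
my $trimmed = "some\r\nvalue\r\n";
$trimmed =~ s/\s+\z//;
print "trimmed length: ", length($trimmed), "\n";
```

The right-trim is therefore only a fit when the stray terminators sit at the end of the value. For terminators in the middle of a node value, as described in the question, the `tr` form is the one that removes them.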
tr/// is faster than a regex here...
drewk
+2  A: 

A few options:
1. Replace all occurrences of CR/LF with LF: $output =~ s/\r\n/\n/g; # instead of \r\n you might want to use \015\012
2. Remove all trailing whitespace: $output =~ s/\s+$//;
3. Slurp and split:

#!/usr/bin/perl

use strict;
use warnings;

   sub main{  
      createfile();  
      outputfile();
   }

   main();

   sub createfile{
      (my $file = $0)=~ s/\.pl/\.txt/;

      open my $fh, ">", $file or die "Cannot write $file: $!";
         print $fh "1\n2\r\n3\n4\r\n5";
      close $fh;
   }

   sub outputfile{
      (my $filei = $0)=~ s/\.pl/\.txt/;
      (my $fileo = $0)=~ s/\.pl/out\.txt/;

      open my $fin, "<", $filei or die "Cannot read $filei: $!";
         local $/;                                # slurp the file
         my $text = <$fin>;                       # store the text
         my @text = split(/(?:\r\n|\n)/, $text);  # split on dos or unix newlines
      close $fin;

      local $" = ", ";                            # change array scalar separator
      open my $fout, ">", $fileo or die "Cannot write $fileo: $!";
         print $fout "@text";                     # should output numbers separated by comma space
      close $fout;
   }
vol7ron
+1 slurp, +1 split
Armando
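
For comparison, the first two options from the list in this answer can be sketched on an in-memory string. The sample text is invented, and the octal escapes are written in CR-then-LF order (`\015` is CR, `\012` is LF).

```perl
use strict;
use warnings;

# Hypothetical sample with DOS line endings.
my $text = "line one\015\012line two\015\012";

# Option 1: turn each CRLF into a single LF (the line break itself stays).
(my $unix = $text) =~ s/\015\012/\012/g;

# Option 2: strip whitespace, including CR and LF, from the end only.
(my $trimmed = $text) =~ s/\s+$//;

print "option 1 length: ", length($unix), "\n";
print "option 2 length: ", length($trimmed), "\n";
```

Note that option 1 normalizes the line endings rather than removing them, so it would still leave an LF inside a node value. If the goal is to remove the break entirely, as in the question, deleting both characters is the closer match.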