ansaurus

Question

Answer 1

A:

An inelegant Perl one-liner that should do the trick, though not particularly quickly.

cat file | perl -e '
    $x=0;
    while(<>){
        s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/g;
        print;
        $x++;
        if($x==3){
            print"\n";
            $x=0;
        }
}' > output

Mimisbrunnr 2010-03-18 04:41:52

Instead of `cat file`, just use `<file`.

Arkku 2010-04-11 02:49:34

@Arkku - That would work just as well. It's an old habit of mine, and I'm more comfortable with cat $FILE |

Mimisbrunnr 2010-04-11 04:55:48

@Mimisbrunnr: It fires up a useless `cat`, though. On some highly restricted systems there's a low limit on the number of simultaneous processes, which it counts towards. Also, it can be a significant slowdown if the process itself is a fast reader, e.g. try `cat /dev/zero | dd bs=1k count=1000` vs `dd bs=1k count=1000 </dev/zero`. I get 7.5MB/s with `cat` and 32.7MB/s without.

Arkku 2010-04-11 05:01:00

(As a real-world example I've encountered, the parsing of a multi-gigabyte file cat-ed to the parser by a person also habitually using `cat |` proved to be a major slowdown in the process… =)

Arkku 2010-04-11 05:02:55

@Arkku - Noted, thanks, I will discontinue the use of cat except where actually needed.

Mimisbrunnr 2010-04-11 20:14:02

Answer 2

A:

You can do this:

perl -e '$i=1; while(<>){chomp;$s.=$_;if($i%3==0){$s=~s{>\s+<}{><};print "$s\n";$s="";}$i++;}' file

codaddict 2010-03-18 04:43:07

chomp is no good because it leaves behind too much whitespace, unless our asker is okay with that.

Mimisbrunnr 2010-03-18 04:46:06

@Mimisbrunnr: if you look carefully I use a regex to get rid of the extra spaces.

codaddict 2010-03-18 04:47:19

@codaddict - I apologize, I spoke before fully reading your code.

Mimisbrunnr 2010-03-18 04:52:29

That also depends on whether the closing `</abc>` is always 2 lines after `<abc>`. Why not match on the actual pattern?
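A minimal sketch of what matching on the actual pattern could look like, written in Python rather than Perl. It assumes the input consists of `<abc ...>` blocks that each end with `</abc>`, spread over any number of lines; the function name `join_blocks` and the sample data are hypothetical, since the original question text is not shown here.

```python
import re

def join_blocks(text, tag="abc"):
    """Join each <tag ...> ... </tag> block onto one line,
    however many lines the block spans in the input."""
    open_re = re.compile(r"<%s\b" % re.escape(tag))  # opening tag, e.g. <abc
    close = "</%s>" % tag                            # closing tag, e.g. </abc>
    out = []
    buf = None  # None while we are outside a block
    for line in text.splitlines():
        stripped = line.strip()
        if buf is None:
            if open_re.search(stripped):
                buf = [stripped]      # a new block starts here
            else:
                out.append(line)      # pass unrelated lines through unchanged
                continue
        else:
            buf.append(stripped)      # still inside the current block
        if close in stripped:
            out.append("".join(buf))  # block complete: emit it as one line
            buf = None
    if buf is not None:
        out.append("".join(buf))      # unterminated block: emit what we have
    return "\n".join(out)

if __name__ == "__main__":
    # Assumed sample input; the second block spans four lines instead of three.
    sample = '<abc a="1">\n    <val>0.25</val>\n</abc>\n<abc a="2">\n    <val>0.25</val>\n\n</abc>\n'
    print(join_blocks(sample))
```

Unlike counting three lines at a time, this keeps working when a block is longer or shorter than expected, at the cost of assuming the tag name is known in advance.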

2010-03-18 05:38:23

Answer 3

+1 A:

$ awk '
    /<abc/ && NR > 1 {print ""}
    {gsub(" +"," "); printf "%s",$0}
' file
<abc a="1"> <val>0.25</val></abc>
<abc a="2"> <val>0.25</val></abc>
<abc a="3"> <val>0.35</val></abc>

ghostdog74 2010-03-18 04:50:51

+1 You'll also want: `END {print ""}` to ensure the file ends with a newline.

glenn jackman 2010-03-18 11:16:16

Answer 4

A:

sed '/<abc/,/<\/abc>/{:a;N;s/\n//g;s|<\/abc>|<\/abc>\n|g;H;ta}'  file

2010-03-18 05:10:55

Answer 5

A:

tr "\n" " "<myfile|sed 's|<\/abc>|<\/abc>\n|g;s/[ \t]*<abc/<abc/g;s/>[ \t]*</></g'

2010-03-18 05:33:42

Answer 6

+2 A:

In vim you could do this with

:g/<abc/ .,/<\/abc/ join

This would leave a space between some of the elements, which you could then remove with

:%s/> *</></g

In general I would recommend using a proper XML parsing library in a language like Python, Ruby or Perl for manipulating XML files (I recommend Python+ElementTree), but in this case it is simple enough to get away with using a regex solution.
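For comparison, a rough sketch of the parser-based route with Python's standard `xml.etree.ElementTree`. It assumes the `<abc>` elements sit inside a single wrapper element (called `<root>` here purely for illustration, since a well-formed XML document needs one root); the function name `compact_children` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

def compact_children(xml_text):
    """Return each child of the root element serialized on a single line,
    with whitespace-only text between tags removed."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        # iter() visits the child itself and all of its descendants
        for el in child.iter():
            if el.text is not None and not el.text.strip():
                el.text = None  # drop indentation before the first sub-element
            if el.tail is not None and not el.tail.strip():
                el.tail = None  # drop the newline/indentation after the element
        lines.append(ET.tostring(child, encoding="unicode"))
    return "\n".join(lines)

if __name__ == "__main__":
    # Assumed sample document with a hypothetical <root> wrapper.
    sample = (
        '<root>\n'
        '<abc a="1">\n    <val>0.25</val>\n</abc>\n'
        '<abc a="2">\n    <val>0.25</val>\n</abc>\n'
        '</root>'
    )
    print(compact_children(sample))
```

Because the parser works on elements rather than lines, the result does not depend on how many lines each `<abc>` block happens to occupy.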

Dave Kirby 2010-03-18 07:32:16

Answer 7

+1 A:

Bash:

while read s; do echo -n $s; read s; echo -n $s; read s; echo $s; done < file.xml

pazhitnov 2010-03-18 13:19:42

Answer 8

+1 A:

You can record a macro. Basically, what I would do is begin with my cursor at the start of the first line. Press 'qa' (records a macro to the a register). Then press shift-V to begin line-wise visual mode. Then search for the ending tag with '/\/abc'. Then press shift-J to join the lines. Then you would have to move the cursor to the next tag, probably with 'j^', and press 'q' to stop recording. You can then rerun the recording with '@a', or specify 10000@a if you like. If the tags are different or not right after each other, you just need to change how you find the opening and closing tags, e.g. by using searches.

Neg_EV 2010-03-18 15:21:46

Obviously this is a vim based solution...

Neg_EV 2010-03-18 15:27:00

Answer 9

+1 A:

In Vim:

position on first line
qq: start recording macro
gJgJ: joins next two lines without adding spaces
j: go down
q: stop recording
N@q: N = number of lines (actually around 1/3rd of all lines as they get condensed on the go)

kemp 2010-03-20 23:32:08

Answer 10

A:

This should work in ex mode:

:%s/$^<abc.*>$^M^$.*$^M^$^<\/abc>$.*^M/\1\2\3^M/g

It should leave extra spaces (or a tab) around the value, but you could remove them depending on what they are (\t or \ \ \ \ ).

What you are searching for here is (pattern1)[enter](pattern2)[enter](pattern3)[enter], and you are replacing it with (pattern1)(pattern2)(pattern3)[enter].

The ^M is entered with Ctrl+V Ctrl+M.

2010-03-24 18:08:12

Answer 11

+1 A:

sed '/^<abc/{N;N;s/\n\| //g}'

# remove \n or "space" 
# Result

<abca="1"><val>0.25</val></abc>
<abca="2"><val>0.25</val></abc>
<abca="3"><val>0.35</val></abc>

2010-03-28 16:38:09


Combining multiple lines into one line
