I have the following input file, which you might recognize as a Debian Packages file:

Package: nimbox-apexer-sales
Version: 1.0.0-201007241449
Architecture: i386
Maintainer: Ricardo Marimon <[email protected]>
Installed-Size: 124
Depends: nimbox-apexer-root
Filename: binary/nimbox-apexer-sales_1.0.0-201007241449_i386.deb
Size: 68880
MD5sum: c4538f2913d76b57110ba73d0b87cc16
Section: base
Priority: optional
Description: Sales Application for NiMbox.

Package: nimbox-tomcat
Version: 6.0.26-5
Architecture: i386
Maintainer: Ricardo Marimon <[email protected]>
Installed-Size: 6144
Depends: sun-java6-jdk
Filename: binary/nimbox-tomcat_6.0.26-5_i386.deb
Size: 5490024
MD5sum: 5f2ccbe6137af2842e1c81bc217444e3
Section: base
Priority: optional
Description: Tomcat Servlet Application Server for NiMbox
 NiMbox requires a servlet application server in order to work.  The current
 NiMbox implementation requires a Tomcat Servlet Application.

The real file has many such entries, and I want to produce the following file:

nimbox-apexer-sales 1.0.0-201007241449
nimbox-tomcat 6.0.26-5

The Package and Version values should be separated by a tab so that I can later extract them with cut. I'm pretty sure this can be done with sed. I went over the sed one-liners, but this is probably a bit more complex. Any ideas?

+1  A: 

Assuming that your file name is test.txt:

grep -P '^Package: |^Version:' test.txt  | awk '{ print $2 }' | sed -e 'N;s/\n/ /'

Where:

  1. grep -P '^Package: |^Version:' - selects the lines beginning with 'Package: ' or 'Version:'
  2. awk '{ print $2 }' - prints the second field, which drops the 'Package:' and 'Version:' labels
  3. sed -e 'N;s/\n/ /' - joins each pair of lines (name, then version) into one line, separated by a space
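
For what it's worth, the three steps above can probably be folded into a single awk pass that also emits the tab the question asks for. This is only a sketch: the function name pkg_versions is made up for illustration, and it assumes every stanza lists Version after Package and that neither value contains whitespace.

```shell
# pkg_versions (hypothetical name): read a Packages file on stdin and
# print "name<TAB>version" for each stanza.
pkg_versions() {
  awk '
    $1 == "Package:" { name = $2 }            # remember the package name
    $1 == "Version:" { print name "\t" $2 }   # emit the pair, tab-separated
  '
}

# Usage, with test.txt as in the command above:
#   pkg_versions < test.txt | cut -f2
```

Continuation lines of a Description start with a space, so their first field never equals one of the two labels and they are skipped.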
Vlad Lazarenko
This works beautifully too. I had to accept @rafl's answer, though, just for the `grep-dctrl` find.
rmarimon
+1  A: 

When working with Debian Packages files, you might find grep-dctrl useful. It's incredibly flexible both in how it lets you limit the data it outputs and in how it formats that output. Instead of trying to parse the Packages file format myself, I'd just ask grep-dctrl to do it for me and print only the bits of information I'm actually interested in:

$ grep-dctrl -n -s Package,Version nimbox /var/lib/apt/lists/..._Packages

That would give you something like:

nimbox-apexer-sales
1.0.0-201007241449

nimbox-tomcat
6.0.26-5

With that, it's only a matter of joining the right lines together, which is easy enough with, for example, perl:

$ ... | perl -p -0e's/(?<!^)\n(?!\n)/ /mg; s/\n\n/\n/g'
nimbox-apexer-sales 1.0.0-201007241449
nimbox-tomcat 6.0.26-5

or any set of other standard UNIX tools you happen to like.

It's certainly possible to go directly from the Packages file format to what you want, but using tools specialized for the job seems like a good idea to me.
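
As one example of the "other standard UNIX tools" route, awk's paragraph mode looks like a natural fit for the blank-line-separated output shown above. A minimal sketch, assuming the input is exactly that shape (name, version, blank line); the function name join_stanzas is hypothetical:

```shell
# join_stanzas (hypothetical name): read blank-line-separated groups of
# "name" and "version" lines on stdin, print one tab-separated line per group.
join_stanzas() {
  # RS="" turns on paragraph mode: each blank-line-separated group is a record.
  # FS="\n" makes each line of the group a field; OFS is the output tab.
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" } { print $1, $2 }'
}

# Usage:
#   grep-dctrl -n -s Package,Version nimbox Packages | join_stanzas
```

Using a tab here rather than a space keeps the result usable with cut, as the question wanted.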

rafl
Great grep-dctrl command.
rmarimon
Actually settled for `grep-dctrl -n -s Package,Version nimbox Packages | paste -s -d "\t \n"`
rmarimon
A: 

Using RPMs, the solution would have been:

rpm -qa --queryformat "%{NAME}\t%{VERSION}\n"

Too bad for the sed challenge.

gawi
+1  A: 

Pure sed solution (using FreeBSD sed on Mac OS X):

# See: 
# http://sed.sourceforge.net/sedfaq3.html#s3.3: ... (6) Relentless ...
# http://sed.sourceforge.net/sed1line.txt: ... # if a line begins with ...

sed -n '/^Package:/{
:a
N
/\nVersion:/!ba
p
}' file |
sed -E -e :a -e $'$!N;s/\\nVersion: */\t/;ta' -e 'P;D' |
sed -e 's/^Package: *//'
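
The same idea might also fit in a single sed process by parking the package name in the hold space. Treat this as a sketch rather than a tested replacement for the pipeline above: the function name sed_pkg_versions is made up, and it assumes Version always follows Package within a stanza. The tab is produced with printf because BSD sed does not expand \t in a replacement.

```shell
# sed_pkg_versions (hypothetical name): Packages file on stdin,
# "name<TAB>version" lines on stdout.
sed_pkg_versions() {
  tab=$(printf '\t')   # a literal tab character
  # Package line: strip the label and save the name in the hold space.
  # Version line: strip the label, append it to the saved name, swap the
  # pair into the pattern space, turn the embedded newline into a tab, print.
  sed -n \
    -e '/^Package: */{s/^Package: *//;h;}' \
    -e "/^Version: */{s/^Version: *//;H;x;s/\\n/${tab}/p;}"
}

# Usage:
#   sed_pkg_versions < file
```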
trevor
+1  A: 

Here is a sed version:

  sed -ne 's/Package: \(.*\)/\1/p' \
      -ne 's/Version: \(.*\)/\1/p' < filename \
      | sed 'N;s/\n/ /g'
dheerosaur
Works beautifully. Will change the paste command I had for the last part of your sed command. Thanks !!!
rmarimon