Folks,

I have a file that contains LDAP entries, and I want to remove every "version: 1" line from the second occurrence onward, keeping only the first. I know sed can do things like this, but I am very new to it and don't know how to proceed. This is a Solaris 10 machine, and the file looks like this:

version: 1
dn: uid=tuser1,ou=people,o=example.com,o=isp
cn: tuser1
uidNumber: 3
gidNumber: 3
homeDirectory: /export/home/tuser1
loginShell: /bin/sh
objectClass: posixAccount
objectClass: shadowAccount
objectClass: account
objectClass: top
uid: tuser1
shadowLastChange:
userPassword:

version: 1
dn: uid=tuser2,ou=people,o=example.com,o=isp
uidNumber: 20
cn: tuser1
gidNumber: 3
homeDirectory: /export/home/tuser2
loginShell: /bin/sh
objectClass: posixAccount
objectClass: shadowAccount
objectClass: account
objectClass: top
uid: tuser1
shadowLastChange:
userPassword: 

version: 1
dn: uid=tuser3,ou=people,o=example.com,o=isp
uidNumber: 10
cn: tuser3
gidNumber: 3
homeDirectory: /export/home/tuser3
loginShell: /bin/sh
objectClass: posixAccount
objectClass: shadowAccount
objectClass: account
objectClass: top
uid: tuser3
shadowLastChange:
userPassword: 

version: 1
dn: uid=loperp,ou=people,o=example.com,o=isp
uid: loperp
userPassword:
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
objectClass: top
sn: pop
cn: loper

version: 1
dn: uid=tuser4,ou=people,o=example.com,o=isp
userPassword: 
uid: tuser4
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
objectClass: top
sn: User4
cn: Test
+5  A: 

With GNU sed

sed -ni '0,/version: 1/{p; d}; /version: 1/!p' ldap.txt

EDIT: This was initially wrong. When the first line wasn't version, it printed duplicates.

The GNU version is simpler. It prints (p) from the beginning of the file through the first line matching the version regex, inclusive. For each line in that range, after printing we delete the pattern space and start a new cycle (d); that is, we read the next line and go back to the beginning of the script, which avoids double printing. Unlike the standard 1,/regex/, the GNU-only 0,/regex/ range can end on the very first line: if line 1 matches, the range stops there instead of running on to the next matching line.

If we haven't d'ed (so we're after the first version: 1), we then simply print every line that doesn't match the regex (!).
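
To make the difference between the two range forms concrete, here is a small sketch that runs both on a made-up two-entry sample (the sample data and variable names are invented for illustration). It assumes GNU sed is the sed on your PATH; it will not work with the stock Solaris sed.

```shell
# Made-up sample: two entries, each preceded by a version line.
sample='version: 1
dn: uid=a
cn: a

version: 1
dn: uid=b
cn: b'

# GNU-only 0,/re/: the range may end on line 1, so only the first
# version line is printed.
gnu_out=$(printf '%s\n' "$sample" | sed -n '0,/version: 1/{p; d}; /version: 1/!p')

# For contrast, 1,/re/ starts looking for the end of the range on line 2,
# so the range runs to the second version line and both are printed.
std_out=$(printf '%s\n' "$sample" | sed -n '1,/version: 1/{p; d}; /version: 1/!p')

printf '%s\n' "$gnu_out"
```

Comparing gnu_out and std_out shows why the 0,/regex/ form is the one that matters when the file starts with a version line, as the file in the question does.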

With standard sed (-i is an extension rather than part of POSIX, so omit it where it is unsupported):

sed -ni 'p; /version: 1/ b nov; d; :nov /version: 1/!p; n; b nov' ldap.txt

This begins by simply printing every line (p). After that print, if the line matches the regex, we branch (b) to the nov (no version) label; the label name is up to us. If we do not branch, we delete the pattern space and start a new cycle (d), which means reading the next line and going back to the beginning of the script. In nov, we print the line if it does not match (same as the GNU version). We then read the next line (n) and branch back to nov. This loop continues until the end of the file.


I (Jonathan Leffler) can confirm @kuti's observations on Solaris 10 standard 'sed'; what works is:

/bin/sed -n 'p
/version: 1/ b nov
d
:nov
/version: 1/!p
n
b nov' ldap.txt

The 'semi-colons in lieu of newlines' trick does not seem to work universally with Solaris 'sed'. Specifically, at the least, there cannot be a semi-colon after any use of a label.

This seems to work:

/bin/sed -n 'p; /version: 1/ b nov
d; :nov
/version: 1/!p; n; b nov' ldap.txt

(I can't think how to present the fix in a comment - the multiline formatting is crucial here.)
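
As a quick check of the one-command-per-line form, the sketch below runs it on a made-up sample whose first line is not a version line, which is the case the earlier EDIT note mentions. The sample data and the variable names are invented for illustration; the sed script itself is the one shown above.

```shell
# Made-up sample: the first line is NOT a version line.
sample='# exported entries
version: 1
dn: uid=a

version: 1
dn: uid=b'

# One command per line, so no semicolon ever follows a label.
out=$(printf '%s\n' "$sample" | sed -n 'p
/version: 1/ b nov
d
:nov
/version: 1/!p
n
b nov')

printf '%s\n' "$out"
```

The leading comment line appears once, the first version line is kept, and the second version line is dropped.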

Matthew Flaschen
I tried the first suggestion without the -i option. It works, but it deletes all occurrences, which is not what I am trying to accomplish. I want to keep the first occurrence of version: 1 but delete the rest.
kuti
@kuti, are you sure you're using GNU sed? The default Solaris version is not. Did you try the second solution?
Matthew Flaschen
Here is what I get:
root@solix# sed -n 'p; /version: 1/ b nov; d; :nov /version: 1/!p; n; b nov' zdir1-user-entry-full.txt
Label too long: p; /version: 1/ b nov; d; :nov /version: 1/!p; n; b nov
kuti
Hmm, I posted a version (still second) with one-character labels. Try that.
Matthew Flaschen
Thanks, @Jonathan.
Matthew Flaschen
@Matthew, perfect. That one works. I really appreciate it. Could you explain a little of what is going on inside sed? It is a bit cryptic to me. What do the p and b nov elements do inside your sed script?
kuti
@kuti, I added explanations. Let me know if I can clarify anything else.
Matthew Flaschen
@Matthew, thanks. It is clearer now. Looks like I need to read more about sed :), happy camper now.
kuti
+2  A: 

A simple answer uses awk:

awk '{ if ($0 ~ /^version: 1$/) { if (count++ == 0) print; }
       else print;
     }'

This assumes that you really mean you want only the first 'version: 1' line and don't mind keeping multiple 'version: 2' lines, etc.
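
To illustrate that assumption, here is a small sketch that feeds the same awk program a made-up sample containing both 'version: 1' and 'version: 2' lines (the sample data and variable names are invented for illustration):

```shell
# Made-up sample mixing two different version lines.
sample='version: 1
dn: uid=a
version: 2
dn: uid=b
version: 1
dn: uid=c
version: 2'

# Same awk program as above, reading the sample from standard input.
out=$(printf '%s\n' "$sample" | awk '{ if ($0 ~ /^version: 1$/) { if (count++ == 0) print; }
       else print;
     }')

printf '%s\n' "$out"
```

Only the second 'version: 1' line is dropped; both 'version: 2' lines pass through untouched.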

Jonathan Leffler
No, that's not what I meant. I need to keep the first occurrence of version: 1 and delete the rest of the version: 1 lines from the list.
kuti
@kuti: I'm puzzled - you say "that's not what I meant" but then go on to request what (the bug-fixed version of) my answer gives you.
Jonathan Leffler
Hmm. It deleted all occurrences in mine when I tried your awk solution. Maybe I did not explain the problem well. Long story short, I have a list of users whose details I need to get from the LDAP server. I do this in a for loop:
for i in `cat $LDAP1-user-list.txt`
do
ldapsearch -h $LDAP1 -D "cn=Directory Manager" -w $PWD -b o=$DOMAIN,o=$MYO uid=$i >> $LDAP1-user-entry-full.txt
done
The loop above gets all the information I need, but with an extra version: 1 after each user. I need to keep the first version: 1 and delete the rest, since this is a standard LDIF file.
kuti
@kuti: When I tried the awk solution on Solaris 10 with both GNU and Sun versions of awk on the test data from the question, I got the first 'version: 1' in the output. The comments about 'version: 2' etc are not directly relevant - you only have 'version: 1' lines in your data and don't need to worry about alternatives (or, if the version ever changes, you'll have to do a good deal of work, not only here but everywhere else). You could, of course, adapt the loop in your comment to add the 'version: 1' line up front, then for each ldapsearch, delete the version line since you don't want it.
Jonathan Leffler
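
The last suggestion in that comment (emit the version line once, then strip it from each search result) could look roughly like the sketch below. It is only an outline under stated assumptions: fetch_entry is a hypothetical stand-in for the real ldapsearch call from the loop in the earlier comment, and build_ldif is a made-up helper name.

```shell
# Hypothetical stand-in for the real ldapsearch call; it emits a version
# line followed by one fake entry so the sketch can run anywhere.
fetch_entry() {
    printf 'version: 1\ndn: uid=%s,ou=people\nuid: %s\n\n' "$1" "$1"
}

# Hypothetical helper: write the version line once, then every entry
# with its own version line filtered out.
build_ldif() {
    printf 'version: 1\n'
    for user in "$@"; do
        fetch_entry "$user" | grep -v '^version: 1$'
    done
}

out=$(build_ldif tuser1 tuser2)
printf '%s\n' "$out"
```

In real use, the body of fetch_entry would be replaced by the ldapsearch command, and the output of build_ldif redirected to the combined file.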
A: 

Here's another awk version:

awk '/version: 1/{c++}c>1{gsub("version: 1","")}1' file
ghostdog74
A: 

Using ed (see man 1 ed), we can mark the line containing the first match and then address everything from the line after the mark to the end of the file:

#  'm+1,$  
#  ... which creates a line address space of:  
#  /first line matched + 1/,/last line/

# http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
[[ $(grep -c -m 1 '^version: 1' file) -eq 1 ]] && \
cat <<-'EOF' | sed -e 's/^ *//' -e 's/ *$//' | ed -s file
   H
  /^version: 1/km
  'm+1,$g/^version: 1/d
  wq
EOF
yabt