I have a file in which "title" appears many times. How can I use the sed command to count how many times "title" appears, counting only lines where "title" is the first string? For example:

# title
title
title

should output a count of 2, because in the first line "title" is not the first string.

Update

I used awk to count the total number of occurrences:

awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt

But how can I tell awk to count only the lines where title is the first string, as in the example above?

+2  A: 

I don't think sed would be appropriate, unless you use it in a pipeline to convert your file so that the word you need appears on separate lines, and then use grep -c to count the occurrences.

I like Jonathan's idea of using tr to convert spaces to newlines. The beauty of this method is that successive spaces get converted into multiple blank lines, but that doesn't matter, because grep will count just the lines with the word 'title'.
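A minimal sketch of that pipeline, assuming words are separated by spaces only (tabs and punctuation are not split). Here `tr -s` squeezes each run of spaces into a single newline, and `grep -cx` counts only lines that are exactly `title`, so a word such as `titles` is not counted. The function name `count_title_words` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count whole-word occurrences of "title" in file "$1".
# Assumes words are separated by spaces; tabs and punctuation are not split.
count_title_words() {
  # Turn each run of spaces into one newline so every word sits on its own line,
  # then count the lines that consist of exactly "title".
  tr -s ' ' '\n' < "$1" | grep -cx 'title'
}

sample=$(mktemp)
printf '# title\ntitle title\ntitles\n' > "$sample"
count_title_words "$sample"   # prints 3
rm -f "$sample"
```

Note that this counts every whole-word "title", not only the first word of each line, which is what the question originally asked before it was edited.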

pavium
Beat me by 15 seconds - drat. I should be less verbose.
Jonathan Leffler
I think I should give up on sed, then. Maybe awk will do the magic for me. Please see the updated question.
baltusaj
+1  A: 

Revised answer

Succinctly, you can't - sed is not the correct tool for the job (it cannot count).

sed -n '/^title/p' file | grep -c '^'

This looks for lines starting with title and prints them, feeding the output into grep to count them (the pattern '^' matches every line, so grep -c counts all the lines it receives). Or, equivalently:

grep -c '^title' file
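Note that '^title' also matches lines starting with words such as "titles" or "title:". A minimal sketch that counts only lines whose first blank-separated word is exactly "title" (skipping leading blanks, much as awk does for $1), using a POSIX extended regular expression; the function name `count_first_word_title` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count lines whose first blank-separated word is exactly "title".
# Leading blanks are skipped (as awk does for $1); "titles" or "title:" do not count.
count_first_word_title() {
  # "title" must be followed by a blank or by the end of the line.
  grep -cE '^[[:blank:]]*title([[:blank:]]|$)' "$1"
}

sample=$(mktemp)
printf '# title\ntitle\ntitle\ntitles\n  title x\n' > "$sample"
count_first_word_title "$sample"   # prints 3
rm -f "$sample"
```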

Original answer - before the question was edited

Succinctly, you can't - it is not the correct tool for the job.

grep -c title file

sed -n /title/p file | wc -l

The second uses sed as a surrogate for grep and sends the output to 'wc' to count lines. Both count the number of lines containing 'title', rather than the number of occurrences of title. You could fix that with something like:

tr ' ' '\n' < file |
grep -c title

The 'tr' command converts blanks into newlines, thus putting each space-separated word on its own line, and therefore grep only gets to count lines containing the word title. That works unless you have sequences such as 'title-entitlement', where no space separates the two occurrences of title.
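If every occurrence should be counted, including ones with no space between them as in 'title-entitlement', one option is grep's -o flag, which prints each match on its own line. This is a sketch assuming a grep that supports -o (GNU and BSD grep do; it is not in POSIX), and `count_title_substrings` is a hypothetical name. It counts substrings, so the "title" inside "entitlement" counts too.

```shell
#!/bin/sh
# Hypothetical helper: count every occurrence of the substring "title" in "$1",
# including several on one line and ones embedded in other words.
# Assumes a grep with -o (GNU and BSD grep have it; POSIX grep does not).
count_title_substrings() {
  # -o prints each match on its own line; wc -l then counts the matches.
  grep -o 'title' "$1" | wc -l
}

sample=$(mktemp)
printf 'title-entitlement\nfivetitlesixtitle\n# title\n' > "$sample"
count_title_substrings "$sample"   # prints the count, 5 for this sample
rm -f "$sample"
```

Some wc implementations pad the number with leading spaces, so compare it numerically rather than as a string.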

Jonathan Leffler
+1  A: 

Just one gawk command will do. Don't use grep -c, because it only counts lines containing "title", regardless of how many times "title" appears in each line.

$ more file
#         title
#  title
one
two
#title
title title
three
title junk title
title
four
fivetitlesixtitle
last

$ awk '!/^#.*title/{m=gsub("title","");total+=m}END{print "total: "total}' file
total: 7

If you just want "title" as the first string, use == instead of ~:

awk '$1 == "title"{++c}END{print c+0}' file
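To see why the question's command overcounts, here is a sketch running both versions on the sample from the question. With FS=: the first field is the whole line up to any colon, and ~ /title/ matches "title" anywhere inside it, so "# title" is counted; with the default field splitting, $1 is the first blank-separated word and == requires an exact match. The function names are hypothetical, and `c+0` makes awk print 0 rather than an empty line when nothing matches.

```shell
#!/bin/sh
# The question's command: with FS=: the whole line (up to any colon) is $1,
# and ~ /title/ matches "title" anywhere inside it.
count_regex() {
  awk '$1 ~ /title/ {++c} END {print c+0}' FS=: "$1"
}
# The == version: default field splitting, exact comparison of the first word.
count_exact() {
  awk '$1 == "title" {++c} END {print c+0}' "$1"
}

sample=$(mktemp)
printf '# title\ntitle\ntitle\n' > "$sample"
count_regex "$sample"   # prints 3: "# title" is counted too
count_exact "$sample"   # prints 2
rm -f "$sample"
```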
ghostdog74
How can I count only the lines where title is the first string? That would give total: 3 in your example.
baltusaj
Since the question got changed (probably while you were answering), there is no longer a need to count the number of occurrences of title anywhere in a line - only those at the start of the line count.
Jonathan Leffler
@Jonathan, it doesn't matter. This method does it all. If the requirement changes to counting "title" everywhere, only a minimal change to the code is needed.
ghostdog74
Thanks ghostdog74 for helping
baltusaj
+1  A: 
sed 's/title/title\n/g' file | grep -c title
That's essentially the same as the first part of Jonathan Leffler's answer.
Dennis Williamson
Yes, it looks similar, but not quite: it's a different way of doing it in sed.
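The sed answer above relies on `\n` in the replacement text producing a newline, which GNU sed supports but POSIX leaves unspecified. A minimal portable sketch of the same idea (end the line right after every "title", then count the lines that still contain it) uses a backslash followed by a literal newline instead; the function name `count_title_sed` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of "title" in "$1" using only POSIX sed.
count_title_sed() {
  # & re-inserts the match; the escaped newline ends the line right after it,
  # so each occurrence lands on a separate line for grep -c to count.
  sed 's/title/&\
/g' "$1" | grep -c 'title'
}

sample=$(mktemp)
printf 'title junk title\nfivetitlesixtitle\nnone\n' > "$sample"
count_title_sed "$sample"   # prints 4
rm -f "$sample"
```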
+1  A: 

Never say never. Pure sed (although it may require the GNU version).

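#!/bin/sed -nf
# based on a script from the sed info file (info sed)
# section 4.8 Numbering Non-blank Lines (cat -b)
# modified to count lines that begin with "title"

# skip lines that do not begin with "title"
/^title/! be

# swap the counter (kept in the hold space) into the pattern space
x
# first match: start the counter at 0
/^$/ s/^.*$/0/
# all nines: prepend a 0 so the carry has a digit to land on
/^9*$/ s/^/0/
# mark the last non-9 digit and the trailing 9s with an x
s/.9*$/x&/
h
# keep only the marked tail and add one to each digit (9 wraps to 0)
s/^.*x//
y/0123456789/1234567890/
# reassemble: unchanged prefix + incremented tail
x
s/x.*$//
G
s/\n//
# store the updated counter back in the hold space
h

:e

# on the last line, print the counter (0 if no line matched)
$ {x;s/^$/0/;p}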
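The heart of the script is the decimal-increment step, which is easier to follow in isolation. The sketch below is a hypothetical helper, `sed_increment`, that applies only those commands to each input line and prints the number plus one; the hold space is used as scratch storage, just as in the script.

```shell
#!/bin/sh
# Hypothetical helper isolating the counter-increment step of the script above:
# reads one decimal number per line on stdin and prints it plus one.
sed_increment() {
  sed '
    # all nines (or empty): prepend a 0 to absorb the carry
    /^9*$/ s/^/0/
    # mark the last digit that is not 9, plus the trailing 9s, with an x
    s/.9*$/x&/
    # keep a copy of the marked number in the hold space
    h
    # keep only the marked tail and add one to each of its digits (9 wraps to 0)
    s/^.*x//
    y/0123456789/1234567890/
    # swap: pattern = marked number, hold = incremented tail
    x
    # drop the marker and old tail, then append the new tail
    s/x.*$//
    G
    s/\n//
  '
}

printf '0\n41\n199\n9\n' | sed_increment   # prints 1, 42, 200 and 10, one per line
```

Saved under a name such as count-title.sed (a hypothetical file name), the full script can be run as `sed -n -f count-title.sed file`, or made executable and run directly via its #! line.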
Dennis Williamson