views: 274
answers: 2

Hi there, I have a truckload of files containing SQL commands, and I've been asked to extract all the database table names from them. How can I use grep and sed to parse the files and create a list of the unique table names in a text file, one per line?

The table names all seem to start with "db_", which is handy!

What would be the best way to use grep and sed together to pull the table names out?

Yours in head-scratchiness,

Buzz

+6  A: 

This will search for lines containing the table names. The output of this will quickly reveal if a more selective search is needed:

grep "\<db_[a-zA-Z0-9_]*" *.sql

Once the proper search is sorted out, remove all other characters from the lines with table names:

grep "\<db_[a-zA-Z0-9_]*" *.sql  |  sed 's/.*\(\<db_[a-zA-Z0-9_]*\).*/\1/'

Once that's running, add on a sort and remove duplicates:

grep "\<db_[a-zA-Z0-9_]*" *.sql  |  sed 's/.*\(\<db_[a-zA-Z0-9_]*\).*/\1/'  |  sort  |  uniq
wallyk
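
Putting the three steps together and redirecting to a file gives the one-name-per-line list the question asks for. A minimal sketch, assuming the files match *.sql and table_names.txt is a hypothetical output name (-h suppresses the filename prefixes grep adds when searching multiple files):

grep -h "\<db_[a-zA-Z0-9_]*" *.sql | sed 's/.*\(\<db_[a-zA-Z0-9_]*\).*/\1/' | sort | uniq > table_names.txt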
+1. To catch potentially several table names on the same line, you can split the lines with tr: `grep .. *.sql | tr ' ' '\n' | grep .. | sed`. The first grep eases the burden on `tr`; the second filters out the non-"db_" parts.
orip
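
Spelled out with the patterns carried over from the answer above, that comment's pipeline might look like the following. This is a sketch, not a drop-in: splitting on spaces assumes the names are whitespace-delimited in the SQL.

grep "\<db_[a-zA-Z0-9_]*" *.sql | tr ' ' '\n' | grep "\<db_" | sed 's/.*\(\<db_[a-zA-Z0-9_]*\).*/\1/' | sort | uniq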
A: 

You just need grep:

grep -owE "db_[a-zA-Z0-9_]+" file | sort -u

or awk

awk '{for(i=1;i<=NF;i++) if($i ~ /^db_[a-zA-Z0-9_]+/) print $i}' file
ghostdog74
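
Note that the awk version prints duplicates and keeps whatever punctuation is attached to a field, so "db_foo," comes out with the comma. A hedged variant that trims everything from the first non-name character and dedupes; the sub() call and the sort -u are additions here, not part of the answer above:

awk '{for(i=1;i<=NF;i++) if($i ~ /^db_[a-zA-Z0-9_]+/) {sub(/[^a-zA-Z0-9_].*$/, "", $i); print $i}}' file | sort -u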