I have a text file like the one below:

11:00AM JOHN STAMOS 1983-08-07 I like Pizza Hut
12:00AM JACK SPARROW PIRATE 1886-09-07 I like Pizza Hut and DOminoz
11:00AM SANTA 1986-04-01 I like cold beer

How do I sort the above file on the date column? The problem is that the name column has a variable number of words: some people have a first, middle, and last name, whereas others have only a first name, and so on, so the date is not always in the same column.

A: 
python3 -c 'import re, sys; print("".join(sorted(sys.stdin, key=lambda x: re.findall(r"\d{4}-\d{2}-\d{2}", x))), end="")' < file.txt
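The key works because zero-padded YYYY-MM-DD strings compare in the same order as the dates they stand for. As an illustrative sketch (the script name sort_by_date.py and the functions date_key and sort_by_date are hypothetical, not from the answer), the comparison can instead be made on parsed datetime.date values, which also raises ValueError on an impossible date such as 2020-13-01. Unlike the one-liner, which keys on the list of all dates in the line, this version uses only the first date.

```python
import re
import sys
from datetime import date

# A YYYY-MM-DD date anywhere in the line; its column position does not matter.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def date_key(line: str) -> date:
    """Return the first date in the line as a date; undated lines sort first."""
    m = DATE_RE.search(line)
    return date.fromisoformat(m.group()) if m else date.min


def sort_by_date(lines):
    # sorted() is stable, so lines with the same date keep their input order.
    return sorted(lines, key=date_key)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 sort_by_date.py file.txt
    with open(sys.argv[1]) as f:
        sys.stdout.writelines(sort_by_date(f.readlines()))
```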
gnibbler
+1  A: 

What you need to do is copy the date to the front of each line and then sort, which by default uses the whole line as the sort key. Then remove the date again.

I used sed to match everything up to the (last) date, which I located by its nnnn-nn-nn format, and copy that date to the front of the line.

After the sort, use sed to delete the date from the front again (cut -c12- would be simpler, since the prefix is the 10-character date plus one space).

This works on Linux:

sed 's/^\(.* \([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] \)\)/\2\1/' | 
sort | 
sed 's/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] //'

Giving:

12:00AM JACK SPARROW PIRATE 1886-09-07 I like Pizza Hut and DOminoz
11:00AM JOHN STAMOS 1983-08-07 I like Pizza Hut
11:00AM SANTA 1986-04-01 I like cold beer

This works for your data, but it could easily become awkward if your data changes (for example, if a line contains more than one date).
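The decorate, sort, undecorate pattern described above can also be sketched in Python. Here the date is attached as the first element of a tuple rather than pasted onto the text, so nothing has to be cut off afterwards. The function name sort_by_last_date is hypothetical. Like the sed expression, it uses a greedy .* so it picks the last date on the line; unlike the sed version, which leaves undated lines unchanged, it gives them an empty key so they sort first.

```python
import re

# Greedy ".*" makes the group capture the LAST date on the line,
# mirroring the greedy ".*" in the sed expression.
LAST_DATE_RE = re.compile(r".*(\d{4}-\d{2}-\d{2})")


def sort_by_last_date(lines):
    # Decorate: pair each line with its (last) date; undated lines get "".
    decorated = []
    for line in lines:
        m = LAST_DATE_RE.match(line)
        decorated.append((m.group(1) if m else "", line))
    # Sort: tuples compare by date first, then by the whole line,
    # much as sort compares the date-prefixed lines.
    decorated.sort()
    # Undecorate: drop the date again.
    return [line for _, line in decorated]
```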

Adrian Pronk
+1  A: 
sed 's/\([0-9]\{4\}\(-[0-9]\{2\}\)\{2\}\)/|\1/' | sort -t '|' -k 2 | sed 's/|//'
gnibbler
+1: I like your idea of inserting an alternative delimiter at the appropriate position. I think that's probably more flexible than my answer.
Adrian Pronk
A: 

Pure Bash:

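declare -a array   # input lines, in their original order
declare -a order   # sparse array: YYYYMMDD date key -> line indices

# $infile must hold the path to the input file.
# mapfile reads one line per element without word splitting or globbing.
mapfile -t array < "$infile"

index=0
for line in "${array[@]}"; do
  # Find the date wherever it sits in the line.
  [[ "$line" =~ ([[:digit:]]+)-([[:digit:]]+)-([[:digit:]]+) ]]
  key="${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}"
  # Append rather than overwrite, so lines sharing a date are all kept.
  if [ -z "${order[key]}" ] ; then
    order[key]="$index"
  else
    order[key]="${order[key]} $index"
  fi
  ((index++))
done

# Indexed arrays expand in ascending subscript order, i.e. by date.
for i in ${order[*]}; do
  printf "%s\n" "${array[i]}"
done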

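This turns each date into a numeric key (YYYYMMDD) and uses it as a subscript of the sparse indexed array order, where each entry holds the space-separated indices of the lines with that date. Bash expands an indexed array in ascending subscript order, so the final loop prints the lines sorted by date.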
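The same bucket idea can be sketched in Python. A dict iterates in insertion order rather than key order, so the keys are sorted explicitly, which is the step Bash's sparse indexed arrays perform implicitly. The function name sort_by_date_buckets is hypothetical. Like the Bash version, it assumes zero-padded months and days, so that concatenating the three fields yields a YYYYMMDD number that orders chronologically.

```python
import re
from collections import defaultdict

# Same pattern as the Bash regex: three digit groups separated by hyphens.
DATE_RE = re.compile(r"(\d+)-(\d+)-(\d+)")


def sort_by_date_buckets(lines):
    # Map each YYYYMMDD key to the indices of the lines carrying that date.
    # Keeping a list per key is what makes duplicate dates safe.
    buckets = defaultdict(list)
    for index, line in enumerate(lines):
        m = DATE_RE.search(line)
        key = int("".join(m.groups())) if m else 0  # undated lines go first
        buckets[key].append(index)
    # Walk the keys in ascending order and emit each bucket's lines in input order.
    return [lines[i] for key in sorted(buckets) for i in buckets[key]]
```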

fgm
This fails if more than one line has the same date. Otherwise, it's clever.
Dennis Williamson
Yes, you are right. I just improved the solution.
fgm