views: 3194 · answers: 8
We have a list of (let's say 50) reports that get dumped into various folders depending on certain conditions. All the reports have standard names, e.g. D099C.LIS, D18A0.LIS, etc.

Sometimes a report can exist in up to 5 different locations, and I need to generate a list of all the locations of the most recent version of each report.

I can do it easily using code, or redirecting "dir" or "ls" output into a text file and then manipulating it in Excel, but I'd prefer a simpler (hopefully a one-liner) solution either using DOS, bash, or PowerShell.

The best I've come up with so far in PowerShell (I've done something similar using bash) is:

ls -r -fi *.lis | sort @{expression={$_.Name}}, @{expression={$_.LastWriteTime};Descending=$true} | select Directory, Name, lastwritetime

That will recursively list all files with *.lis extension, then sort it by name (asc) and date (desc), and then display the directory, name, and date.

This gives this sort of output:

C:\reports\LESE            D057A.LIS                  28/01/2009 09:00:43
C:\reports\JCSW            D057A.LIS                  27/01/2009 10:50:21
C:\reports\ALID            D075A.LIS                  04/02/2009 12:34:12
C:\reports\JCSW            D075B.LIS                  05/02/2009 10:07:15
C:\reports\ALID            D075B.LIS                  30/01/2009 09:14:57
C:\reports\BMA3            D081A.LIS                  01/09/2008 14:51:36

What I obviously need to do now is remove the files that aren't the most recent versions, so that the output looks like this (not too worried about formatting yet):

C:\reports\LESE            D057A.LIS                  28/01/2009 09:00:43
C:\reports\JCSW            D075B.LIS                  05/02/2009 10:07:15
C:\reports\BMA3            D081A.LIS                  01/09/2008 14:51:36

Anyone have any ideas?

[edit] Some good ideas and answers to this question. Unfortunately I can't mark all as accepted, but EBGreen's (edited) answer worked without modification. I'll add working solutions here as I verify them.

bash:

 ls -lR --time-style=long-iso | awk 'BEGIN{OFS="\t"}{print $5,$6,$7,$8}' | grep ".LIS" | sort -k4 -k2r -k3r | uniq -f3
 ls -lR --time-style=long-iso | awk 'BEGIN{OFS="\t"}{print $5,$6,$7,$8}' | grep ".LIS" | sort -k4 -k2r -k3r | awk '!x[$4]++'

PowerShell:

  ls -r -fi *.lis | sort @{expression={$_.Name}}, @{expression={$_.LastWriteTime};Descending=$true} | select Directory, Name, lastwritetime | Group-Object Name | %{$_.Group | Select -first 1}
  ls -r . *.lis | sort -desc LastWriteTime | group Name | %{$_.Group[0]} | ft Directory,Name,LastWriteTime
  ls -r -fi *.lis | sort @{expression={$_.Name}}, @{expression={$_.LastWriteTime};Descending=$true} | unique | ft Directory,Name,LastWriteTime
+8  A: 
ls -r -fi *.lis | sort @{expression={$_.Name}}, @{expression={$_.LastWriteTime};Descending=$true} | select Directory, Name, lastwritetime | Group-Object Name | %{$_.Group | Select -first 1}
EBGreen
That only gives the first 3 in the list. It's not what I want at all. I need a list of the most recent version of every file.
ilitirit
Aaah...I misunderstood. I'll work on that.
EBGreen
See if that works now.
EBGreen
Fantastic, thanks
ilitirit
Epic pipe! I haven't ever used the @{Name;Expression}-type hashtables in a sort.
Peter Seale
+2  A: 

In bash you could pipe your answers through uniq. I'm not sure of the exact structure for the results of your bash 1-liner but the right arguments to -w N and -s N ought to do it.
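For reference, `uniq -f N` skips the first N fields when comparing adjacent lines, so with the filename in the last column and the newest copy sorted first, duplicate names collapse to the newest line. A minimal sketch over made-up data (size, date, time, name):

```shell
# uniq -f3 ignores the first 3 fields (size/date/time) when comparing,
# so adjacent lines with the same name collapse, keeping the first
# (newest) one. Sample lines are invented.
printf '%s\n' \
  '120 2009-01-28 09:00 D057A.LIS' \
  '118 2009-01-27 10:50 D057A.LIS' \
  '200 2009-02-05 10:07 D075B.LIS' |
uniq -f3
```

Note that `uniq` only collapses *adjacent* duplicates, which is why the sort by name must come first.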

Nick Fortescue
Good idea, I'll try it.
ilitirit
It worked, thanks: ls -lR --time-style=long-iso | awk 'BEGIN{OFS="\t"}{print $5,$6,$7,$8}' | grep ".LIS" | sort -k4 -k2r -k3r | uniq -f3
ilitirit
cool. You could probably do it all in awk (it has associative arrays) but what you've just commented is just as readable.
Nick Fortescue
A: 

Can you use perl? Something like:

your command | perl -e 'while (<STDIN>) { ($dir,$name,$date) = split; $hash{$name} = [$dir,$date]; } foreach (keys %hash) { print "$hash{$_}[0] $_ $hash{$_}[1]\n"; }'

This could be wrong in the details (it's been too long since I used perl in anger) but the basic idea being to keep a hash of results keyed on filename and always overwriting the previous entry when encountering a new entry. That way, as long as the order of lines coming in is right, you'll only get the most recently touched files coming out.
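The same overwrite-wins idea can be sketched without perl, using awk's associative arrays; the lines below are invented sample data, assumed sorted oldest-first so the newest entry for each name wins:

```shell
# Keep one line per filename (field 2): later lines overwrite earlier
# ones in the array, so with oldest-first input the newest survives.
# The trailing sort is only to make the (unordered) output deterministic.
printf '%s\n' \
  'JCSW D057A.LIS 2009-01-27' \
  'LESE D057A.LIS 2009-01-28' \
  'BMA3 D081A.LIS 2008-09-01' |
awk '{ latest[$2] = $0 } END { for (k in latest) print latest[k] }' | sort
```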

AndyB
+1  A: 

The problem seems to be finding unique entries based on a particular field. awk can be used to solve this. I saw a blog entry with one approach. For example, in bash one could do:

find . -name "*.lis" -print | xargs ls -tr | awk -F/ '!x[$NF]++'
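The `!x[$NF]++` pattern prints a line only the first time its last field (with `-F/`, the filename) has been seen; a tiny illustration with invented paths:

```shell
# With -F/ the last field ($NF) is the filename. x[$NF]++ is 0 (false)
# the first time a name appears, so !x[$NF]++ prints only that first
# occurrence; later paths with the same name are skipped.
printf '%s\n' \
  'reports/JCSW/D075B.LIS' \
  'reports/ALID/D075B.LIS' \
  'reports/LESE/D057A.LIS' |
awk -F/ '!x[$NF]++'
```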

I find it easier just to use ls, rather than streaming the results from find because of the way the data is ordered. I've added your suggestion of using awk as a possible solution though.
ilitirit
A: 

$f = ls -r -fi *.lis | sort name,lastWriteTime -desc

# Remove -whatIf to delete the files

$f[1..$f.length] | Remove-Item -whatIf

Shay Levy
This just lists all the files (and deletes them afterwards). I don't want to see all the files, just the most recent versions.
ilitirit
+1  A: 

Powershell:

ls -r . *.lis | sort -desc LastWriteTime | sort -u Name | ft Directory,Name,LastWriteTime

Explanation:

  1. get the files recursively
  2. sort the files descending by LastWriteTime
  3. sort the files by Name, selecting unique files (only the first).
  4. format the resulting FileInfo objects in a table with Directory, Name and Time

Alternative which does not rely on sort being stable:

ls -r . *.lis | sort -desc LastWriteTime | group Name | %{$_.Group[0]} | ft Directory,Name,LastWriteTime
  1. get the files recursively
  2. sort the files descending by LastWriteTime
  3. group the files by name
  4. for each group select the first (index zero) item of the group
  5. format the resulting FileInfo objects in a table with Directory, Name and Time
This doesn't work: "sort -u Name" re-sorts the data and doesn't honour the original sort by date. That's why I used "sort @{expression={$_.Name}}, @{expression={$_.LastWriteTime};Descending=$true}"
ilitirit
Ah, but "sort" is stable, i.e. if sort keys are equal, it preserves their initial order. However, I must admit that this is empirical and I haven't been able to verify that from the docs. I have edited and added an alternative.
I've tested the original and it doesn't work, unfortunately. I'll try your second version in a bit.
ilitirit
At least one of the examples in the official doc mentions that *group* is stable (it sorts before grouping).
I've just confirmed that your second example works.
ilitirit
+1  A: 

Another alternative in PowerShell, more "script" like:

ls -r . *.lis | sort LastWriteTime | %{$f=@{}} {$f[$_.Name]=$_} {$f.Values} | ft Directory,Name,LastWriteTime
  1. get the files recursively
  2. sort them ascending by last write time
  3. initialize a hashmap (associative array)
  4. for each file assign it using the name as key - later entries will overwrite previous ones
  5. get the Values of the hashmap (excluding keys)
  6. format as a table

Note, the FileInfo objects are retained throughout the pipeline. You can still access any property/method of the objects or format them any way you like.
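The same hashmap pattern translates to bash 4 associative arrays; a sketch over made-up input that is assumed pre-sorted oldest-first:

```shell
#!/bin/bash
# Overwrite-wins hash in bash 4: key on the filename, keep the last
# line seen for each name. With oldest-first input, the newest copy
# survives. The data in the here-doc is invented for illustration; the
# final sort only makes the unordered hash output deterministic.
declare -A latest
while read -r dir name time; do
  latest[$name]="$dir $name $time"   # a newer entry replaces an older one
done <<'EOF'
JCSW D057A.LIS 2009-01-27
LESE D057A.LIS 2009-01-28
BMA3 D081A.LIS 2008-09-01
EOF
for k in "${!latest[@]}"; do echo "${latest[$k]}"; done | sort
```

In practice you would feed the loop from something like `find . -name '*.lis'` piped through a time-sorted `ls`, rather than a here-doc.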

A: 

ls -ARFlrt | awk '{print $6,$7,$8}' | grep 2010 | sort -n

I was looking for something similar. The above helped me get the listing I was after in bash. The grep is optional (of course). Thanks.

Fregus