I am new to shell scripting, so I need some help with how to approach this problem.

I have a directory which contains files in the following format. The files are in a directory called /incoming/external/data

AA_20100806.dat
AA_20100807.dat
AA_20100808.dat
AA_20100809.dat
AA_20100810.dat
AA_20100811.dat
AA_20100812.dat

As you can see, each filename includes a timestamp, i.e. [RANGE]_[YYYYMMDD].dat

What I need to do is find out which of these files has the newest date, using the timestamp in the filename rather than the system timestamp. I then need to store that filename in a variable, move that file to one directory, and move the rest to a different directory.

+2  A: 

Try:

$ ls -lr

Hope it helps.

Pablo Santa Cruz
Hi, wouldn't that sort it using the system timestamp for the file? I am interested in the timestamp in the actual filename. Thanks
ziggy
No, it sorts the files by name according to your locale. If you wanted to sort by the system timestamp, you would need the `-t` flag.
igor
+1  A: 

Use:

ls -r -1 AA_*.dat | head -n 1

(assuming there are no other files matching AA_*.dat)

igor
+1  A: 
ls -1 AA* | sort -r | head -n 1
ghostdog74
+2  A: 

For those who just want an answer, here it is:

ls | sort -n -t _ -k 2 | tail -1

Here's the thought process that led me here.

I'm going to assume the [RANGE] portion could be anything.

Start with what we know.

  • Working Directory: /incoming/external/data
  • Format of the Files: [RANGE]_[YYYYMMDD].dat

We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.

Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):

I guess we don't need sed, since we can work with the entire output of the ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):

NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`

Then it's just a matter of putting the underscore back in, which shouldn't be too hard.

EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.

Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):

ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'

... while writing this out, I discovered that you can just do this:

ls | sort -n -t _ -k 2 | tail -1

I'll break it down into parts.

ls

Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.

awk -F_ '{print $1 " " $2}'

This is the AWK command. It allows you to take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscore (_). I do this with the -F option. This gives me two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" "), followed by the second half ($2). Note that the space was the part that was missing from my initial suggestion. Also, this is unnecessary, since you can specify a separator in the sort command below.
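To make the split concrete, here is a minimal sketch that runs one of the sample filenames from the question through the same awk command and then restores the underscore with sed. The variable names `split` and `restored` are only for illustration.

```shell
# Split one sample filename on "_" and print the two halves with a space between.
split=$(printf '%s\n' 'AA_20100806.dat' | awk -F_ '{print $1 " " $2}')
echo "$split"

# Put the underscore back, replacing the first space on the line.
restored=$(printf '%s\n' "$split" | sed 's/ /_/')
echo "$restored"
```

This relies on each name containing exactly one underscore and no spaces, which holds for the files listed in the question.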

Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:

sort -n -k 2

This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:

sort -n -t _ -k 2

This achieves the same result. Now you only want the last file, so:

tail -1

If you used awk to separate the file (which is just adding extra complexity, so don't do it), you can replace the space with an underscore again with sed:

sed 's/ /_/'
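As a quick check of why the field separator matters when the [RANGE] part varies, here is a small sketch. The three names are made up for the example (only the AA prefix appears in the question), and `by_name` and `by_date` are illustrative variable names.

```shell
# Hypothetical names with different [RANGE] prefixes, one per line.
names='ZZ_20100806.dat
AA_20100812.dat
MM_20100809.dat'

# A plain sort compares whole names, so the prefix decides the order.
by_name=$(printf '%s\n' "$names" | sort | tail -n 1)

# Splitting on "_" and sorting numerically on field 2 orders by the date.
by_date=$(printf '%s\n' "$names" | sort -n -t _ -k 2 | tail -n 1)

echo "last by name: $by_name"
echo "last by date: $by_date"
```

With a single prefix, as in the question, both orderings agree; the difference only shows up once the prefixes differ.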

Some good info here, but I'm sure most people aren't going to read down to the bottom like this.

Marc Reside
I tried this but it didn't work. Could you explain what exactly it is doing? Thanks
ziggy
Well, I updated after testing. I had to fix something in my awk command, and then discovered how it really wasn't needed. Solution is at the top, explanation is long and not necessary, but I enjoyed writing it.
Marc Reside
+3  A: 

This should work:

newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))

mv "$newest" newdir
mv "${others[@]}" otherdir

It won't work if there are spaces in the filenames, although you could modify the IFS variable to handle that.
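If spaces in filenames are a concern, one alternative is to skip ls and arrays altogether and loop over a glob. The following is only a sketch under some assumptions: it runs in a throwaway directory created with mktemp, the sample filenames are invented, every matching name ends in _YYYYMMDD.dat with a purely numeric date, and newdir and otherdir already exist. It sticks to POSIX sh features (no arrays).

```shell
# Set up a scratch directory with sample files (all names here are made up).
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
mkdir newdir otherdir
touch 'AA_20100806.dat' 'B B_20100812.dat' 'CC_20100809.dat'

# Pass 1: find the name whose date part is numerically largest.
newest=
newest_date=0
for f in *_*.dat; do
    [ -f "$f" ] || continue      # skip the literal pattern if nothing matched
    d=${f##*_}                   # drop everything through the last "_"
    d=${d%.dat}                  # drop the extension, leaving YYYYMMDD
    if [ "$d" -gt "$newest_date" ]; then
        newest=$f
        newest_date=$d
    fi
done

# Pass 2: move the newest file one way and everything else the other way.
for f in *_*.dat; do
    [ -f "$f" ] || continue
    if [ "$f" = "$newest" ]; then
        mv -- "$f" newdir/
    else
        mv -- "$f" otherdir/
    fi
done

echo "newest: $newest"
```

Because every expansion is quoted and nothing is parsed from ls output, a name such as "B B_20100812.dat" is handled as a single file.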

Dennis Williamson
Hi, what are the round brackets for?
ziggy
@ziggy: Do you mean the outer set on the second line? They create an array which is used in the last line.
Dennis Williamson
Hi Dennis, I was referring to both the inner and outer round brackets. I tried running the above but the brackets are causing syntax errors. I am using the Bourne shell. Are these Korn shell specific constructs?
ziggy
@ziggy: The inner parentheses (really `$()`) are for command substitution. They work better than, but perform the same function as, backticks. The syntax I've shown is for Bash, which you tagged your question. It should also work with ksh. The `$()` should work in `sh` but the array syntax won't because the Bourne shell doesn't have arrays.
Dennis Williamson
I like this solution. If ziggy wants to do this in a script, he can always specify #!/usr/sh at the start of the script, can't he? There's not much in the Bourne shell that can't be done in sh.
Marc Reside
@Marc: The Bourne shell *is* `sh` (perhaps you mean "Bash" - aka "Bourne Again"). There's an awful lot that Bash can do that the Bourne shell can't. For example, the array syntax shown requires Bash (or ksh or zsh) since, as I said, Bourne (`sh`) doesn't have arrays. So the shebang line would need to be `#!/bin/bash` (rather than `#!/bin/sh` - where you have "/usr").
Dennis Williamson
@Dennis: Thanks. Too many shells for me. Now that I dig deeper, I tend to use bash as my default go-to, which was indeed what I was thinking. I don't tend to need to write too many complicated scripts day-to-day, though, so all my scripting is in sh, or on occasion ksh.
Marc Reside
+1  A: 

Due to the naming convention of the files, alphabetical order is the same as date order. I'm pretty sure that in bash '*' expands alphabetically (but I cannot find any evidence in the manual page); ls certainly sorts alphabetically, so the file with the newest date would be the last one alphabetically.

Therefore, in bash

mv $(ls | tail -1) first-directory
mv * second-directory

Should do the trick.

If you want to be more specific about the choice of file, then replace * with something else - for example AA_*.dat
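For what it's worth, here is a small sketch of the same idea that uses the shell's own glob expansion instead of ls, on the assumption that the expansion comes back in sorted order (the exact ordering follows the current locale's collation). It runs in a scratch directory with made-up sample files and uses the positional parameters as a stand-in for an array, so it does not need bash.

```shell
# Scratch directory with sample files, created deliberately out of order.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
touch AA_20100806.dat AA_20100812.dat AA_20100809.dat

# Load the expanded glob into the positional parameters.
set -- AA_*.dat
count=$#

# Walk the list; after the loop, newest holds the last name in the expansion.
for f in "$@"; do
    newest=$f
done

echo "$count files, newest is $newest"
```

If nothing matches, the pattern is left unexpanded, so a real script would want to check that "$newest" actually exists before moving anything.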

Beano
This also works, but I am trying to avoid relying on the system to do the sorting for me (i.e. via the ls command). Thanks
ziggy
Why do you not want to rely on `ls` - what do you mean by 'system'?
Beano