ansaurus

Question

Modification of date format within a text file

Answer 1

+2 A:

A bit clunky, but you could do:

sed -e 's/^\(..\)JAN\(..\)/\2\/01\/\1/'
sed -e 's/^\(..\)FEB\(..\)/\2\/02\/\1/'
...

To run sed in place, see the -i command-line option:

sed -i -e ...
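
If launching sed once per month is a concern, the substitutions can also be chained in a single invocation with repeated -e options, so the input is read only once. A minimal sketch (convert_dates is a hypothetical wrapper name used only for illustration; the patterns are the ones above, extended to all twelve months):

```shell
# Hypothetical helper: rewrite a leading yyMONdd date as dd/mm/yy in one sed pass.
# Reads stdin, writes stdout; lines without a matching date pass through unchanged.
convert_dates() {
    sed -e 's/^\(..\)JAN\(..\)/\2\/01\/\1/' \
        -e 's/^\(..\)FEB\(..\)/\2\/02\/\1/' \
        -e 's/^\(..\)MAR\(..\)/\2\/03\/\1/' \
        -e 's/^\(..\)APR\(..\)/\2\/04\/\1/' \
        -e 's/^\(..\)MAY\(..\)/\2\/05\/\1/' \
        -e 's/^\(..\)JUN\(..\)/\2\/06\/\1/' \
        -e 's/^\(..\)JUL\(..\)/\2\/07\/\1/' \
        -e 's/^\(..\)AUG\(..\)/\2\/08\/\1/' \
        -e 's/^\(..\)SEP\(..\)/\2\/09\/\1/' \
        -e 's/^\(..\)OCT\(..\)/\2\/10\/\1/' \
        -e 's/^\(..\)NOV\(..\)/\2\/11\/\1/' \
        -e 's/^\(..\)DEC\(..\)/\2\/12\/\1/'
}
```

It would be used as a filter, e.g. convert_dates < input.txt > output.txt, where both file names are placeholders.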

Edit

Just to point out that this answers a previous version of the question where AWK was not specified.

Draemon 2009-07-21 11:22:43

Thanks ! I'll take a look at this.

Ackheron 2009-07-21 11:25:51

I have to ask - why the downvote? I said it was clunky, but it took seconds to write, and it works. The AWK solutions are nice, but more complex.

Draemon 2009-07-22 11:04:23

Ehm sorry, I don't really get the point of the votes . . . I didn't accept your post as the final answer because I wanted to wait. But I don't think I "downvoted" it. Or if I did, it wasn't intentional. As it is another way to solve the issue, it's helpful! I "upvoted" it now. :) Thx !

Ackheron 2009-07-22 23:06:59

It's clunky because you also execute sed 12 times (once for each month), making it inefficient. I think that's why it got downvoted.

ghostdog74 2009-07-22 23:48:59

Ackheron: No, you did the right thing - you accepted the best answer - it doesn't downvote unless you specifically click the down arrow.

ghostdog74: sure it's clunky (like I said), but I'm sure the performance difference would be negligible in real terms.

Draemon 2009-07-23 09:14:06

You are reading the file 12 times. If the file is huge, that would be a problem in terms of performance. You can "improve" on it by taking out the extra "sed" calls and running sed just once.

ghostdog74 2009-07-24 00:53:25

The OP didn't say the file was huge, or that performance was a primary concern. My solution answers the original question, performs perfectly well for most real-life cases, and overall is simple to understand. You could trade simplicity for performance, but not until you can justify it for the particular scenario in question.

Draemon 2009-07-26 13:03:32

A good programmer/coder has to look at all possible scenarios to make good and resilient code. Why wait for things to happen?

ghostdog74 2009-07-26 13:33:47

No. A good programmer does not write code for all possible scenarios. A good programmer writes code for all *probable* scenarios, and ensures his code is easy to change/extend when unexpected requirements emerge. Maybe you should have suggested handwritten assembly for ultimate performance? Clarity is way more important.

Draemon 2009-07-26 15:16:57

Answer 2

+4 A:

I don't think grep is the right tool for the job myself. You need something a little more expressive like Perl or awk:

echo '07JAN01, -0.24729E+07, -0.46713E+07, 0.35581E+07
07JAN02, -0.24729E+07, -0.46713E+07, 0.35581E+07
07AUG03, -0.24729E+07, -0.46713E+07, 0.35581E+07' | awk -F, '
{
    yy=substr($1,1,2);
    mm=substr($1,3,3);
    mm=(index(":JAN:FEB:MAR:APR:MAY:JUN:JUL:AUG:SEP:OCT:NOV:DEC",mm)+2)/4;
    dd=substr($1,6,2);
    printf "%02d/%02d/%02d,%s,%s,%s\n",dd,mm,yy,$2,$3,$4
}'

which generates:

01/01/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
02/01/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
03/08/07, -0.24729E+07, -0.46713E+07, 0.35581E+07

Obviously, that's just pumping some test data through a command line awk script. You'd be better off putting that into an actual awk script file and running your input through it.

If datechg.awk contains:

{
    yy=substr($1,1,2);
    mm=substr($1,3,3);
    mm=(index(":JAN:FEB:MAR:APR:MAY:JUN:JUL:AUG:SEP:OCT:NOV:DEC",mm)+2)/4;
    dd=substr($1,6,2);
    printf "%02d/%02d/%02d,%s,%s,%s\n",dd,mm,yy,$2,$3,$4
}

then:

echo '07JAN01, -0.24729E+07, -0.46713E+07, 0.35581E+07
07JAN02, -0.24729E+07, -0.46713E+07, 0.35581E+07
07AUG03, -0.24729E+07, -0.46713E+07, 0.35581E+07' | awk -F, -f datechg.awk

also produces:

01/01/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
02/01/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
03/08/07, -0.24729E+07, -0.46713E+07, 0.35581E+07

The way this works is as follows. Each line is split into fields (-F, sets the field separator to a comma) and we extract and process the relevant parts of field 1 (the date). By this I mean the year and day are reversed and the textual month is turned into a numeric month by searching a string for it and manipulating the index where it was found, so that it falls in the range 1 through 12.

This is the only (relatively) tricky bit and is done with some basic mathematics: the index function simply finds the position within the string of your month (where the first char is 1). So JAN is at position 2, FEB at 6, MAR at 10, ..., DEC at 46 (the set {2, 6, 10, ..., 46}). They're 4 apart so we're going to need to divide by 4 eventually to get consecutive month numbers but first we add 2 so the division will work well. Adding that 2 gives you the set {4, 8, 12, ..., 48}. Then you divide by 4 to get {1, 2, 3, ... 12} and there's your month number:

Text   Pos   +2   /4
----   ---   --   --
JAN      2    4    1
FEB      6    8    2
MAR     10   12    3
APR     14   16    4
MAY     18   20    5
JUN     22   24    6
JUL     26   28    7
AUG     30   32    8
SEP     34   36    9
OCT     38   40   10
NOV     42   44   11
DEC     46   48   12

Then we just output the new information. Obviously, this is likely to barf if you provide bad data but I'm assuming either:

the data is good; or
you'll add your own error checks.
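
As a rough idea of what such error checks might look like, here is a sketch that validates the shape of field 1 before converting it. The wrapper name convert_checked, the wording of the warning, and the choice to pass bad lines through unchanged are all assumptions for illustration, not part of the original script. It only checks the format (two digits, a known month, two digits), not whether the day is valid for that month:

```shell
# Hypothetical variant of the awk script with basic validation added.
# Lines whose first field is not yyMONdd are reported on stderr and
# passed through unchanged; everything else is converted as before.
convert_checked() {
    awk -F, '
    {
        # Position of the month name in the lookup string, 0 if absent.
        pos = index(":JAN:FEB:MAR:APR:MAY:JUN:JUL:AUG:SEP:OCT:NOV:DEC", substr($1, 3, 3))

        # Reject anything that is not two digits, three capitals, two digits,
        # or whose three capitals are not a known month.
        if ($1 !~ /^[0-9][0-9][A-Z][A-Z][A-Z][0-9][0-9]$/ || pos == 0) {
            print "line " NR ": unrecognised date \"" $1 "\"" > "/dev/stderr"
            print
            next
        }

        yy = substr($1, 1, 2)
        mm = (pos + 2) / 4
        dd = substr($1, 6, 2)
        printf "%02d/%02d/%02d,%s,%s,%s\n", dd, mm, yy, $2, $3, $4
    }'
}
```

Writing to "/dev/stderr" works in the common awk implementations on systems that provide that path; elsewhere the warning would need to be piped to something like cat 1>&2 instead.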

Regarding modifying the files directly, the time-honored UNIX tradition is to use a shell script to save the current file elsewhere, process it to create a new file, then overwrite the old file with the new file (but not touching the saved file, in case something goes horribly wrong).

I won't make my answer any longer by detailing that, you've probably fallen asleep already :-)
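
For anyone who does want it spelled out, the save/process/overwrite routine could look roughly like the sketch below. The function name convert_in_place and the .bak and .new suffixes are hypothetical choices, and the awk program is simply inlined from above:

```shell
# Hypothetical helper: convert_in_place FILE
# Keeps FILE.bak as an untouched copy, writes the converted data to FILE.new,
# and replaces FILE only if awk finished successfully.
convert_in_place() {
    file=$1

    # 1. Save the current file elsewhere; this copy is never modified.
    cp -p "$file" "$file.bak" || return 1

    # 2. Process the original into a new file.
    if awk -F, '
        {
            yy = substr($1, 1, 2)
            mm = substr($1, 3, 3)
            mm = (index(":JAN:FEB:MAR:APR:MAY:JUN:JUL:AUG:SEP:OCT:NOV:DEC", mm) + 2) / 4
            dd = substr($1, 6, 2)
            printf "%02d/%02d/%02d,%s,%s,%s\n", dd, mm, yy, $2, $3, $4
        }' "$file" > "$file.new"
    then
        # 3. Overwrite the old file with the new one.
        mv "$file.new" "$file"
    else
        rm -f "$file.new"
        return 1
    fi
}
```

Running convert_in_place data.txt (data.txt being a placeholder name) would leave the converted data in data.txt and the original in data.txt.bak.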

paxdiablo 2009-07-21 12:04:17

Thanks a lot ... I've just pasted the code, and it works perfectly. Now I need to study the syntax to understand how it really works ;) Especially this:

mm=(index(":JAN:FEB:MAR:APR:MAY:JUN:JUL:AUG:SEP:OCT:NOV:DEC",mm)+2)/4;

Thanks a LOT, pax! It's nice to see that some people are still willing to help newbies, with precise and concise answers ;)

Ackheron 2009-07-21 12:27:57

@Ackheron: the index simply finds the position within the string of your month (first char is 1). So JAN = 2, FEB = 6, MAR = 10, ..., DEC = 46. Then you add 2 to get 4, 8, 12, ..., 48. Then you divide by 4 to get 1, 2, 3, ... 12. See the update.

paxdiablo 2009-07-21 14:14:37

awk is the best. Take any line-based input with fields separated by whitespace and pipe it into awk; you can access each field individually just using $1, $2 etc. ($0 is the whole line). E.g. cat myapachelog | awk '{print $10}' to display just the bytes transferred in a single column, or cat myapachelog | awk '{total += $10} END {print total}' to output the total bytes served from the logfile.

Flubba 2009-07-22 00:11:02

@Pax: Thanks a lot for the update, it really helps me understand how it works now! ;) Really clear. And that's smart!

@Flubba: Thank you for suggesting AWK as the best, but I guess there must be some cases where AWK struggles to meet the programmer's needs? Right? ;)

Ackheron 2009-07-22 18:30:31

Answer 3

+1 A:

awk 'BEGIN{
    OFS=FS=","
    # create table of mapping of months to numbers
    s=split("JAN:FEB:MAR:APR:MAY:JUN:JUL:AUG:SEP:OCT:NOV:DEC",d,":")
    for(o=1;o<=s;o++){
        m=sprintf("%02d",o)   # add a leading 0 if single digit
        date[d[o]]=m
    }
}
{
    yr=substr($1,1,2)
    mth=substr($1,3,3)
    day=substr($1,6,2)
    $1=day"/"date[mth]"/"yr    
}1' file

ghostdog74 2009-07-21 14:35:22

Thanks for your solution, ghostdog74. I'll give this one a try as well.

Ackheron 2009-07-22 18:34:17
