I have a field in a text file exported from a database. The field contains addresses, which are sometimes quite long, and the database allows them to span multiple lines. When exported, each newline character gets replaced with a dollar sign, like this:

first part of very long address$second part of very long address$third part of very long address

Not every address has multiple lines and no address contains more than three lines. The length of each line is variable.

I'm massaging the data for import into MS Access, which is used for a mail merge. I want to split the field on the $ sign if it's there. If the field contains only one line, I want to set my two extra output fields to a zero-length string so that I don't end up with blank lines in the address when it gets printed.

I have an awk file that works correctly on all the other data in the text file, but I need to get this last bit working. I tried the code below. Aside from the syntax error I get at the else, I'm not sure this is a good way to do what I want. This is being done with gawk on Windows.

```awk
BEGIN { FS = "|" }
$1 != "HEADER" {
    if ($6 ~ /\$/)
        split($6, arr, "$")
        address = arr[1]
        addresstwo = arr[2]
        addressthree = arr[3]
        addressLength = length(address)
        addressTwoLength = length(addresstwo)
        addressThreeLength = length(addressthree)

    else {
        address = $6
        addressLength = length($6)
        addresstwo = ""
        addressTwoLength = length(addresstwo)
        addressthree = ""
        addressThreeLength = length(addressthree)
    }

    printf("%*s\t%*s\t%*s\n",
           addressLength, address, addressTwoLength, addresstwo, addressThreeLength, addressthree)
}
```
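The syntax error has a simple cause. In awk, an `if` without braces controls only the single statement that follows it. The `if` therefore ends after the `split()` line, the remaining assignments run unconditionally, and the later `else` has no `if` to attach to. Wrapping each branch in `{ }` fixes the error.

The `if` can also be dropped altogether. `split()` empties the array before filling it and returns the number of pieces, which is 1 when the field contains no `$`. Plain `%s` never pads, so the `length()`/`%*s` bookkeeping is unnecessary. Below is a minimal sketch built on those points. It keeps the question's `$6` and tab-separated output. The wrapper name `split_address6` is hypothetical, and `"[$]"` is used as the separator so that `$` is matched literally in any awk.

```sh
# Hypothetical wrapper: reads pipe-delimited lines, skips the HEADER line,
# and prints field 6 as exactly three tab-separated parts.
split_address6() {
  awk -F'|' '
    $1 != "HEADER" {
      # split() clears arr first and returns how many pieces it found.
      # "[$]" is a bracket expression, so the dollar sign is literal.
      n = split($6, arr, "[$]")
      address      = arr[1]
      addresstwo   = (n >= 2) ? arr[2] : ""
      addressthree = (n >= 3) ? arr[3] : ""
      # %s prints each string as-is; no padding, no width needed.
      printf("%s\t%s\t%s\n", address, addresstwo, addressthree)
    }
  ' "$@"
}
```

Usage: `split_address6 export.txt`. With gawk installed under that name, replace `awk` with `gawk`.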

EDIT: Sorry about that. Here's a sample of the input:

```
HEADER|0000000130|0000527350|0000171250|0000058000|0000756600|0000814753|0000819455|100106
rec1|ILL/COLORADO COLLEGE$TUTT LIBRARY|1021 N CASCADE$COLORADO SPRINGS, CO 80903|
rec2|ILL /PIKES PEAK LIBRARY DISTRICT|20 N. CASCADE AVE. / PO BOX 1579$COLORADO SPRINGS, CO 80903|
rec3|DOE,JOHN|PO Box 8034|
rec4|ILL/GEORGIA INSTITUTE OF TECHNOLOGY|INFORMATION DELIVERY DEPT$704 CHERRY ST$ATLANTA, GA 30332-0900
```

I match only lines without HEADER in them, and I need to split the text strings on the $ signs. The strings between the pipes should not be padded (which is why I was trying to get the length in my original code). For this example there are 6 output fields, and any field with no data is simply an empty string (which is also what I was trying to do in the code). The output I want is:

```
rec1|ILL/COLORADO COLLEGE|TUTT LIBRARY|1021 N CASCADE|COLORADO SPRINGS, CO 80903||
rec2|ILL /PIKES PEAK LIBRARY DISTRICT||20 N. CASCADE AVE. / PO BOX 1579|COLORADO SPRINGS, CO 80903||
rec3|DOE,JOHN||PO Box 8034|||
rec4|ILL/GEORGIA INSTITUTE OF TECHNOLOGY||INFORMATION DELIVERY DEPT|704 CHERRY ST|ATLANTA, GA 30332-0900|
```
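For the sample layout, one approach is to split each `$`-joined field into a fixed number of slots and fill the missing slots with empty strings. That keeps the column count constant for every record, which matters for an Access import. The sketch below is based on the expected output. It assumes that `$1` is the record id, `$2` is the name (two output slots) and `$3` is the address (three output slots), and it always ends each line with a `|`. The names `split_records` and `add_parts` are hypothetical.

```sh
# Hypothetical wrapper: turns rec|name|address| lines into
# rec|name1|name2|addr1|addr2|addr3| lines, skipping the HEADER line.
split_records() {
  awk -F'|' -v OFS='|' '
    # Append the dollar-separated pieces of str to the global line "out",
    # always emitting exactly "slots" fields; missing pieces become "".
    function add_parts(str, slots,    parts, n, i) {
      n = split(str, parts, "[$]")
      for (i = 1; i <= slots; i++)
        out = out OFS ((i <= n) ? parts[i] : "")
    }
    $1 != "HEADER" {
      out = $1
      add_parts($2, 2)   # name: up to two lines
      add_parts($3, 3)   # address: up to three lines
      print out OFS      # trailing "|" as in the expected output
    }
  ' "$@"
}
```

Usage: `split_records export.txt > for_access.txt`. From a Windows `cmd` prompt, where single quotes don't work, you could save the text between the single quotes to a file (for example `split_records.awk`, a made-up name) and run `gawk -F"|" -v OFS="|" -f split_records.awk export.txt`. Any trailing `\r` from CRLF line endings ends up in the unused field after the last `|`, so it doesn't leak into the address parts.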

Hope that helps! Let me know if this still isn't clear.