I have a HUGE file with a lot of HL7 segments. It must be split into 1000 (or so) smaller files. Since it is HL7 data, there is a pattern (logic) to go by: each chunk of data starts with "MSH|" and ends where the next line starting with "MSH|" begins.

The script must be Windows (cmd) based or VBS, as I cannot install any software on that machine.

File structure:

MSH|abc|123|....
s2|sdsd|2323|
...
..
MSH|ns|43|...
...
..
.. 
MSH|sdfns|4343|...
...
..
asds|sds

MSH|sfns|3|...
...
..
as|ss

The file in the above example must be split into 2 or 3 files. Also, the file comes from UNIX, so newlines must remain as they are in the source file.

Any help? _Ub

A: 

HL7 has a lot of segment types; I assume you know that in your file every message starts with an MSH segment. So, have you tried scanning the file for the string "(newline)MSH|"? Just keep a running buffer and dump it into an output file when it gets too big.
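To make the buffering idea concrete, here is a minimal sketch of the logic in Python. It is only an illustration, since Python is not available on the asker's machine, but the same steps carry over to VBS or any other language. It assumes the whole file fits in memory, treats CR, LF, or CR+LF as a line terminator, and works on raw bytes so the original newlines are never rewritten. The names `split_messages`, `write_chunks`, `prefix`, `per_file`, and the `.hl7` extension are hypothetical choices for this example.

```python
import re
from pathlib import Path

# "MSH|" counts as a message start only at the very beginning of the data
# or immediately after a CR or LF byte.
MSH_START = re.compile(rb"(?:\A|(?<=[\r\n]))MSH\|")


def split_messages(data: bytes):
    """Return one bytes object per message; line terminators are untouched."""
    starts = [m.start() for m in MSH_START.finditer(data)]
    if not starts:
        # No MSH found: return the data as a single chunk (or nothing if empty).
        return [data] if data else []
    starts[0] = 0  # keep any bytes before the first MSH with the first message
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


def write_chunks(messages, out_dir, prefix="part-", per_file=1):
    """Write messages to numbered files, per_file messages in each file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(0, len(messages), per_file):
        path = out / f"{prefix}{i // per_file:05d}.hl7"
        # write_bytes performs no newline translation
        path.write_bytes(b"".join(messages[i:i + per_file]))
        paths.append(path)
    return paths
```

A call such as `write_chunks(split_messages(Path("big.hl7").read_bytes()), "out", per_file=50)` would then produce one file per 50 messages, where `big.hl7` and `out` are placeholder names. Grouping by message count rather than by byte size keeps every message whole; concatenating the output files in order reproduces the input exactly.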

Greg Harman
Yes, that is what I want to do. I tried that using DOS commands, but could not get very far. Also, the newlines are not Windows (CR + LF), they are just CR. I could figure out a solution in Perl or using some smart text editor, but I can't install any software on this server. Thanks, _U
UB
I am no VBS expert, but it does appear that it supports regular expressions: http://www.aspfree.com/c/a/Windows-Scripting/Regular-Expressions-in-VBScript/
Greg Harman
+1  A: 

This is a sample script that I used to split large HL7 files into separate files, with the new file names based on the name of the data file. It uses REBOL, which does not require installation, i.e. the core version does not make any registry entries.

I have a more generalised version that scans an incoming directory, splits each file into single messages, and then waits for the next file to arrive.

Rebol [
    file: %split-hl7.r
    author: "Graham Chiu"
    date: 17-Feb-2010
    purpose: {split HL7 messages into single messages}
]

fn: %05112010_0730.dat
outdir: %05112010_0730/

if not exists? outdir [
    make-dir outdir
]

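data: read fn   ; load the whole input file into memory
cnt: 0          ; running index used in the output file names
; output name prefix: the input file name minus its 4-character extension, plus "-"
filename: join copy/part form fn -4 + length? form fn "-"
; every message after the first begins at a newline followed by "MSH"
separator: rejoin [ newline "MSH"]
parse/all data [
    some [
        ; copy up to the next separator, or to the end of input for the last message
        [ copy result to separator | copy result to end ]
        (
            write to-file rejoin [ outdir filename cnt ".txt" ] result
            print "Got result"
            ?? result
            cnt: cnt + 1
        )
        1 skip  ; step past the newline so the next message starts at "MSH"
    ]
]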
Graham Chiu
Thanks, I'll see if I can figure this new language out.
UB