The best I was able to do with sed was this script:
s/[[:blank:]]*|[[:blank:]]*/|/g
s/[[:blank:]]*$//
s/^|/null|/
In my tests, this ran about 30% faster than your sed script. The increase in performance comes from combining the first two regexen and omitting the "g" flag where it's not needed.
However, 30% faster is only a mild improvement (it should still take about an hour and a half to run the above script on your 1GB data file). I wanted to see if I could do any better.
In the end, no other method I tried (awk, perl, and other approaches with sed) fared any better, except -- of course -- a plain ol' C implementation. As would be expected with C, the code is a bit verbose for posting here, but if you want a program that's likely going to be faster than any other method out there, you may want to take a look at it.
In my tests, the C implementation finishes in about 20% of the time it takes for your sed script. So it might take about 25 minutes or so to run on your Unix server.
I didn't spend much time optimizing the C implementation. No doubt there are a number of places where the algorithm could be improved, but frankly, I don't know if it's possible to shave a significant amount of time beyond what it already achieves. If anything, I think it certainly places an upper limit on what kind of performance you can expect from other methods (sed, awk, perl, python, etc).
Edit: The original version had a minor bug that could cause it to print the wrong thing at the end of the output (e.g. a "null" that shouldn't be there). I had some time today to take a look at it and fixed that. I also optimized away a call to strlen(), which gave it another slight performance boost.
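For a rough idea of what a C version of the three sed commands involves, here is a minimal sketch. It is not the implementation benchmarked above, just an illustration of the single-pass idea. The names `normalize_line`, `filter_stream`, and `MAX_LINE` are made up for this example, and it assumes no input line is longer than `MAX_LINE` bytes.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 65536  /* assumed upper bound on line length (hypothetical) */

/* Space or tab: the same characters [[:blank:]] matches in the sed script. */
int is_blank(char c) { return c == ' ' || c == '\t'; }

/*
 * Rewrite one line in place, mirroring the three sed commands:
 *   1. drop spaces/tabs on both sides of every '|'
 *   2. drop trailing spaces/tabs
 *   3. if the result starts with '|', prefix it with "null"
 * The line must not include its newline. Steps 1 and 2 only shrink the
 * text; step 3 adds 4 bytes, so the buffer needs that much spare room.
 */
void normalize_line(char *line)
{
    char *src = line, *dst = line;

    while (*src) {
        if (*src == '|') {
            while (dst > line && is_blank(dst[-1]))
                dst--;                  /* un-copy blanks before the '|' */
            *dst++ = '|';
            src++;
            while (is_blank(*src))
                src++;                  /* skip blanks after the '|' */
        } else {
            *dst++ = *src++;
        }
    }
    while (dst > line && is_blank(dst[-1]))
        dst--;                          /* trim trailing blanks */
    *dst = '\0';

    if (line[0] == '|') {
        memmove(line + 4, line, (size_t)(dst - line) + 1);  /* shift text and NUL */
        memcpy(line, "null", 4);
    }
}

/* Apply normalize_line to every line of `in`, writing the result to `out`. */
void filter_stream(FILE *in, FILE *out)
{
    static char buf[MAX_LINE + 4];      /* +4 leaves room for the "null" prefix */

    while (fgets(buf, MAX_LINE, in)) {
        size_t len = strlen(buf);
        int had_newline = len > 0 && buf[len - 1] == '\n';

        if (had_newline)
            buf[len - 1] = '\0';
        normalize_line(buf);
        fputs(buf, out);
        if (had_newline)
            fputc('\n', out);
    }
}
```

A `main` that simply calls `filter_stream(stdin, stdout)` turns this into a filter you can run as `./a.out < data.txt > clean.txt`. Each byte is read once and written at most once, which is where most of the advantage over a regex engine would come from.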