My code passes a big bunch of text data to a legacy lib, which is responsible for storing it. However, it tends to remove trailing whitespace, which is a problem when I read the data back. Since I cannot change the legacy code, I thought about replacing all spaces with some uncommon ASCII character before storing, and replacing them back when I read the text.

  1. Is this a bad idea, considering that I cannot touch the legacy storage code?
  2. Which character can I use as a substitute? I was considering some character above 180.

The only whitespace in the data will be spaces (no tabs or newlines). The data is alphanumeric, with some special characters.

A: 

Well, you could replace spaces with ASCII 254 before passing the text to the legacy system.
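A minimal sketch of this substitution, assuming "ASCII 254" means code point U+00FE (as the comments below point out, 254 is outside 7-bit ASCII, so which byte actually gets stored depends on the legacy lib's encoding). It also shows the catch: the mapping is only reversible if the substitute never occurs in the input, so the sketch refuses such input.

```python
# Assumed substitute: code point 254 (U+00FE, "þ" in Latin-1). How the legacy
# lib encodes it on disk is an assumption, not something this code controls.
SUBSTITUTE = "\xfe"

def encode_spaces(text: str) -> str:
    """Replace every space with the substitute before storing."""
    if SUBSTITUTE in text:
        # Decoding would turn this character into a space, corrupting the data.
        raise ValueError("input already contains the substitute character")
    return text.replace(" ", SUBSTITUTE)

def decode_spaces(stored: str) -> str:
    """Turn substitutes back into spaces after reading."""
    return stored.replace(SUBSTITUTE, " ")
```

Because U+00FE is not whitespace, a trailing-whitespace strip leaves the encoded text alone.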

Raj
ASCII is a 7-bit encoding
anon
Yep! Extended ASCII is only 8 bit. :)
1s2a3n4j5e6e7v
@1s2a3n4j5e6e7v: "Extended ASCII" is a misnomer at best. It doesn't refer to any defined encoding.
Joachim Sauer
@Joachim Sauer http://www.asciitable.com/
1s2a3n4j5e6e7v
@1s2a3n4j5e6e7v as that site says: "The most popular is presented below." - Latin 1 I think.
Douglas Leeder
Not Latin 1 - I'm not sure what encoding that is showing but Latin 1 has the letters later in the sequence.
Douglas Leeder
Yep! Wikipedia says: "The use of the term is sometimes criticized, because it can be mistakenly interpreted that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, both of which are untrue."
1s2a3n4j5e6e7v
@1s2a3n4j5e6e7v: I know that site. I get it as an answer whenever I point out that "Extended ASCII" is not a single encoding. However, it's crap. It contains some pretty misleading statements and doesn't even contain links to further documentation that could resolve any misunderstandings that could arise from the text.
Joachim Sauer
+1  A: 

How about a control character (below 32, except CR/LF/TAB/NULL)?
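A minimal sketch of picking such a control character, assuming the whole data set is available to scan so that one fixed character can be used for every string (the decoder has to know which one was used). It also skips VT, FF and 0x1C-0x1F, on the assumption that the legacy lib's idea of "whitespace" might include them: C's `isspace()` treats VT and FF as whitespace, and Python's `str.isspace()` treats all six as whitespace.

```python
from typing import Iterable

# Control characters below 32, minus NUL, TAB, LF and CR (as the answer says),
# and minus VT, FF and 0x1C-0x1F, which common whitespace definitions include
# and which a stripping routine might therefore remove (an assumption).
EXCLUDED = {0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x1C, 0x1D, 0x1E, 0x1F}
CANDIDATES = [chr(c) for c in range(32) if c not in EXCLUDED]

def pick_control_char(texts: Iterable[str]) -> str:
    """Return the first candidate that occurs nowhere in the given texts."""
    used = set()
    for text in texts:
        used.update(text)
    for ch in CANDIDATES:
        if ch not in used:
            return ch
    raise ValueError("every candidate control character already occurs in the data")
```

The chosen character is then used as in `text.replace(" ", sub)` on the way in and `stored.replace(sub, " ")` on the way out.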

Amnon
I like ☺, personally.
JAB
+2  A: 

You can use the tilde (~) symbol. It rarely occurs in text. If it does appear, you can escape it with a '\'.
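A minimal sketch of the tilde-plus-backslash idea as a full escaping scheme (the exact rules here are an assumption): a space becomes a bare `~`, while a literal `~` or `\` gets a `\` prefix. Escaping the backslash itself is what makes the mapping lossless, and the encoded text never contains a space.

```python
ESCAPE = "\\"
MARKER = "~"

def encode(text: str) -> str:
    out = []
    for ch in text:
        if ch in (ESCAPE, MARKER):
            out.append(ESCAPE + ch)  # literal '\' or '~' gets a backslash prefix
        elif ch == " ":
            out.append(MARKER)       # a space becomes a bare tilde
        else:
            out.append(ch)
    return "".join(out)

def decode(stored: str) -> str:
    out = []
    chars = iter(stored)
    for ch in chars:
        if ch == ESCAPE:
            try:
                out.append(next(chars))  # the next character is taken literally
            except StopIteration:
                raise ValueError("dangling escape character at end of input") from None
        elif ch == MARKER:
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)
```

For example, `encode("a~b\\ c ")` yields `a\~b\\~c~`, which decodes back to the original.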

1s2a3n4j5e6e7v
+1  A: 

Since you can't change the legacy code, it's essentially a black box (even if you somehow know what's going on intellectually). Therefore the only correct answer is: try out which character works, and use that. (And if no character works, the problem is impossible. That's why legacy code sucks.)
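A minimal sketch of "try out which character works", assuming you can wrap a store-then-read cycle through the legacy lib in a single function. `legacy_roundtrip` below is a hypothetical stand-in that only mimics the reported trailing-whitespace stripping; in practice it would call the real lib.

```python
from typing import Callable, Iterable, Optional

def legacy_roundtrip(text: str) -> str:
    # Hypothetical stand-in for "store via the legacy lib, then read back".
    # It only simulates the reported behaviour: trailing whitespace is removed.
    return text.rstrip()

def find_working_substitute(candidates: Iterable[str],
                            roundtrip: Callable[[str], str] = legacy_roundtrip) -> Optional[str]:
    """Return the first candidate that survives a store/read cycle in trailing position."""
    for ch in candidates:
        probe = "x" + ch * 3  # place the candidate where stripping would hit it
        if roundtrip(probe) == probe:
            return ch
    return None  # no candidate works: the problem is not solvable this way
```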

Kilian Foth
+6  A: 

If all you need to protect is trailing space (embedded spaces are OK), then what about appending '$' or a similar character to the end of every string?

Then you can simply remove it when reading it back.
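A minimal sketch of the sentinel approach, following the refinement in the comments below: always append exactly one `$` when storing and always remove exactly one when reading, so text that already ends with `$` is still handled correctly.

```python
SENTINEL = "$"

def to_storage(text: str) -> str:
    # Always append the sentinel, even if the text already ends with "$".
    return text + SENTINEL

def from_storage(stored: str) -> str:
    # Always remove exactly one trailing sentinel added by to_storage.
    if not stored.endswith(SENTINEL):
        raise ValueError("stored text is missing the sentinel")
    return stored[:-len(SENTINEL)]
```

Since the stored string always ends with a non-whitespace character, the legacy lib has no trailing whitespace to strip.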

You might have a problem if the legacy system already has data in it, but you can read all the existing data to find a character (or string) that never appears at the end of any existing entry, and use that to mark new strings (and protect the whitespace in them).
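A minimal sketch of that scan, assuming the existing entries can be read into memory and that a single-character marker is enough. The candidate set (`string.punctuation`) is an arbitrary choice for illustration. Existing entries never end with the chosen marker, so on read-back a trailing marker identifies a new-format string.

```python
import string
from typing import Iterable

def pick_end_marker(existing: Iterable[str], candidates: str = string.punctuation) -> str:
    """Return a candidate character that never ends any existing stored string."""
    endings = {s[-1] for s in existing if s}
    for ch in candidates:
        if ch not in endings:
            return ch
    raise ValueError("every candidate already ends some existing string")

def read_back(stored: str, marker: str) -> str:
    # New-format strings end with the marker; pre-existing ones never do.
    return stored[:-1] if stored.endswith(marker) else stored
```

New strings are stored as `text + marker`; old entries come back unchanged (their trailing whitespace is, of course, already gone).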

Douglas Leeder
But what if the text already starts or ends with $?
Steven
add another one. It's like dot stuffing in SMTP.
jdizzle
Or you can always add a $ when storing and remove it when reading. It's different than in SMTP since the character isn't used to mark the end of the stream.
Amnon
@Amnon - that's what I meant - add a $ to every string you put in, and remove the $ from every string you extract.
Douglas Leeder
+4  A: 

How about using Base64 encoding for the whole text? That way it could also handle non-ASCII text such as UTF-8. The drawback is that the encoded text is about a third larger (4 output characters for every 3 input bytes), which matters if the legacy system restricts text length.
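A minimal sketch using Python's standard `base64` module, assuming the text is first encoded as UTF-8. The Base64 alphabet (A-Z, a-z, 0-9, `+`, `/` and `=` for padding) contains no whitespace, so nothing can be stripped.

```python
import base64

def b64_store(text: str) -> str:
    # UTF-8 bytes -> Base64; the result contains only ASCII letters, digits, '+', '/', '='.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def b64_load(stored: str) -> str:
    # Reverse the steps: Base64 -> bytes -> UTF-8 text.
    return base64.b64decode(stored).decode("utf-8")
```

The size cost is easy to check: `b64_store("abc")` is 4 characters long, and `b64_store("abcd")` is 8 because of padding.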

oherrala
+1. This is the first lossless solution.
Steven
This solution would certainly work. It just has more overhead than Leeder's solution.
jdizzle
+1  A: 

All the answers so far give solutions that break as soon as the replacement character already appears in the text you supply. It doesn't matter whether it is a tilde, a control character or $. The only right solution is to encode the text before saving it and decode it when you retrieve it.

What you must do is find an encoding scheme that encodes the space character. For instance, you can use URL encoding/decoding, since it encodes space characters.
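A minimal sketch with Python's `urllib.parse`, assuming percent-encoding of UTF-8 is acceptable to the legacy lib. `quote` with `safe=""` turns a space into `%20` and a literal `%` into `%25`, so the encoding is reversible with `unquote`. (`quote_plus` would use `+` for spaces instead.)

```python
from urllib.parse import quote, unquote

def url_encode(text: str) -> str:
    # safe="" also encodes '/'; spaces become "%20" and '%' becomes "%25".
    return quote(text, safe="")

def url_decode(stored: str) -> str:
    return unquote(stored)
```

For example, `url_encode("50% off  ")` gives `50%25%20off%20%20`. Each space costs three characters, so space-heavy text grows noticeably.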

Steven