I have a form that accepts a list of values, one per line of a textArea. In my Servlet, I tokenize the string I receive from that textArea on the newline sequence "\r\n", like so:

String[] partNumberList = originalPartNumberString.split("\r\n");

This appears to work fine. I get an array of values as expected. I believe this is because the browser handles standardizing the way newlines are sent to the server, regardless of what OS / browser the form data is being sent from (see this post). I've tested in IE, Firefox, Chrome ... everything appears to work fine with that and I feel pretty confident about it.

After receiving the values on the server side, I use them for some lookups, etc., and then write them back to the textArea for the response. I write them back the same way I received them: I build a new String, separating each value with "\r\n", and then set the value of the textArea to that String.

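// StringBuilder is enough here; the buffer is never shared between threads
StringBuilder invalidReturnPartList = new StringBuilder();

for (int i = 0; i < requestedPartList.length; i++)
{
    // Terminate every value with CR LF, matching what the form submits
    invalidReturnPartList.append(requestedPartList[i]);
    invalidReturnPartList.append("\r\n");
}

return invalidReturnPartList.toString();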

This also tests OK for me in all browsers I have tried. However, I'm just nervous about whether I'm covering all my bases here ... if someone is running a Mac, will the "\r\n" translate correctly on their browser? What about Linux? I would think everything would be handled in the browser, but I am just not sure here... so my question is, does this look right to you, or have I missed something?

Thanks in advance.

+1  A: 

If you look up the HTTP protocol definition, you'll find that:

HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body (see appendix 19.3 for
tolerant applications). The end-of-line marker within an entity-body is defined by its associated media type, as described in section 3.7.

But that rule does not apply to the body. I assume you send the form information with a POST request, so I assume the content type text/plain is used, and in that case I think the following applies:

3.7.1 Canonicalization and Text Defaults

Internet media types are registered with a canonical form. An
entity-body transferred via HTTP messages MUST be represented in the
appropriate canonical form prior to its transmission except for "text" types, as defined in the next paragraph.

When in canonical form, media subtypes of the "text" type use CRLF as the text line break. HTTP relaxes this requirement and allows the transport of text media with plain CR or LF alone representing a line break when it is done consistently for an entire entity-body. HTTP applications MUST accept CRLF, bare CR, and bare LF as being representative of a line break in text media received via HTTP.

That means it would be acceptable for a browser to send you UNIX-style line endings (a bare LF).
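If you wanted the server side to be tolerant in the way that paragraph describes, a minimal sketch could look like the following. The class and method names are made up for illustration; the only real API used is String.split with a regular expression.

```java
import java.util.Arrays;

class TolerantLineSplit {
    // Accepts CRLF, bare CR, or bare LF as a line break.
    // CRLF is listed first so it is consumed as one separator, not two.
    static String[] splitLines(String text) {
        return text.split("\\r\\n|\\r|\\n");
    }

    public static void main(String[] args) {
        // Mixed line endings in one input, purely for demonstration
        String input = "A1\r\nB2\nC3\rD4";
        System.out.println(Arrays.toString(splitLines(input))); // prints [A1, B2, C3, D4]
    }
}
```

Note that String.split drops trailing empty strings, so a final line break at the end of the input does not produce an extra empty entry.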

(Both paragraphs are from http://www.ietf.org/rfc/rfc2616.txt)

chris166
I see your point. However, I think when the form is submitted, it's submitted with Content Type "application/x-www-form-urlencoded". This post (http://stackoverflow.com/questions/760282/do-line-endings-distinctions-apply-for-html-forms) references a document (http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1) which seems to imply that line breaks are always replaced with "CR LF" when a form is submitted with the "application/x-www-form-urlencoded" content type ... "Line breaks, as in multi-line text field values, are represented as CR LF pairs, i.e. `%0D%0A'."
JasonStoltz
Yes, you're right. I thought the content type also started with text/*. Well, then your code should work ;)
chris166
This post is old, but still, I wanted to give you some credit ... I just gave you an upvote for this answer. I ended up answering my own question, but I couldn't have without this answer to put me on the right track.
JasonStoltz
+1  A: 

I'm going to attempt to answer my own question here.

Since the values of the textArea are form data, and the form is submitted to the server with Content Type "application/x-www-form-urlencoded", the new lines are converted to "CR LF" by the browser before submitting to the server according to the HTML spec (see http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1).

So in this case, my code should work consistently, regardless of browser or OS.
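To make that concrete, here is a minimal sketch of what happens to a urlencoded textArea value. The class name, method name, and sample part numbers are made up for illustration. In a real Servlet the container does the decoding for you when you call request.getParameter(...); URLDecoder just stands in for that step here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Arrays;

class FormLineBreakDemo {
    // Decodes one urlencoded form value and splits it on CR LF,
    // the line break the HTML spec prescribes for this content type.
    static String[] decodeAndSplit(String encodedValue) {
        try {
            String decoded = URLDecoder.decode(encodedValue, "UTF-8");
            return decoded.split("\r\n");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // %0D%0A is the urlencoded form of CR LF
        String encoded = "A1%0D%0AB2%0D%0AC3";
        System.out.println(Arrays.toString(decodeAndSplit(encoded))); // prints [A1, B2, C3]
    }
}
```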

However, if I were trying to implement the same code client-side (let's say, with JavaScript), perhaps to validate the form before submission ... that may be a different story. Since the form data has not been canonicalized at this point, it is most likely dependent on whatever the platform/browser uses for new lines. In that case, I would probably need to check not only for "\r\n", but also for "\r" and "\n".
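One way to handle all three cases without changing the existing split is to canonicalize the text to CR LF first. The sketch below stays in Java, like the rest of this thread, but the same regular expression should carry over to JavaScript's replace. The class and method names are hypothetical.

```java
class LineEndingNormalizer {
    // Rewrites CRLF, bare CR, and bare LF to CRLF.
    // CRLF comes first in the alternation so an existing pair is not doubled.
    static String toCrLf(String text) {
        return text.replaceAll("\\r\\n|\\r|\\n", "\r\n");
    }

    public static void main(String[] args) {
        String mixed = "A1\nB2\rC3\r\nD4";
        // After normalizing, the original split("\r\n") works unchanged
        String[] parts = toCrLf(mixed).split("\r\n");
        System.out.println(parts.length); // prints 4
    }
}
```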

JasonStoltz