I have an ASP.NET MVC application that allows the user to upload a file that should only contain plain text.

I am looking for a simple approach to validate that the file does indeed contain only text.

For my purposes I am happy to define text as any of the characters that I can see printed on my GB QWERTY keyboard.

Business rules mean that my uploaded file won't contain any accented characters, so it doesn't matter if the code accepts or rejects these.

Approaches so far that have not worked:

  • Checking the content-type; no good as this is dependent on the file extension rather than on the file's contents
  • Checking char.IsControl for each character; no good as the file can contain pipe (|) characters which are considered to be control characters

I'd rather avoid using a lengthy Regex pattern to get this to work.

+2  A: 

It sounds like you want ASCII characters 32-126 plus a few odds and ends like 9 (horizontal tab), 13 (carriage return), 10 (line feed), etc.
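For what it's worth, that range can be checked without any regular expression at all. The following is only a minimal sketch: the class and method names (PlainTextCheck, IsPlainText) are made up for illustration, and it assumes the uploaded file has already been read into a string.

```csharp
using System.Linq;

public static class PlainTextCheck
{
    // Hypothetical helper: returns true when every character is printable
    // ASCII (32-126), a horizontal tab (9), a line feed (10) or a
    // carriage return (13). An empty string counts as valid.
    public static bool IsPlainText(string content)
    {
        return content.All(c =>
            (c >= ' ' && c <= '~') || c == '\t' || c == '\n' || c == '\r');
    }
}
```

Note that a couple of characters printed on a GB keyboard, such as £ and ¬, fall outside ASCII 32-126 and would be rejected by this check; they would need to be added explicitly if the files can contain them.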

"I'd rather avoid using a lengthy Regex pattern to get this to work."

As long as that doesn't mean 'no regular expressions at all', you can use the accepted answer from this Stack Overflow question (I've added the horizontal tab character to the original):

^[\x09\x0a\x0d\x20-\x7e]*$
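Here is a rough sketch of how a pattern like this might be applied to an upload in C#. It uses a positive character class (tab, line feed, carriage return and printable ASCII), so a match means every character in the file is allowed. The names UploadValidator and ContainsOnlyText are invented for this example, and the stream is assumed to be something like the InputStream of the posted file. Decoding as ISO-8859-1 is a deliberate assumption: it maps each byte to the character with the same code, so any byte above 126 stays visible to the pattern instead of being silently replaced.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class UploadValidator
{
    // Allowed: tab (09), line feed (0a), carriage return (0d)
    // and printable ASCII from space (20) to tilde (7e).
    private static readonly Regex AllowedText =
        new Regex(@"^[\x09\x0a\x0d\x20-\x7e]*$", RegexOptions.Compiled);

    // Hypothetical helper: reads the whole stream and reports whether
    // it contains only the allowed characters.
    public static bool ContainsOnlyText(Stream upload)
    {
        // Byte-order-mark detection is switched off so that a BOM is
        // treated as ordinary (disallowed) bytes rather than skipped.
        var latin1 = Encoding.GetEncoding("ISO-8859-1");
        using (var reader = new StreamReader(upload, latin1, false))
        {
            return AllowedText.IsMatch(reader.ReadToEnd());
        }
    }
}
```

This reads the entire file into memory, which is fine for small uploads; for large files a byte-by-byte scan of the stream would avoid that.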
Jeff Sternal
In the end we settled on using a very simple Regex of [A-Za-z0-9/|], which covers just the characters that are allowed in the file.
Richard Ev