ansaurus

Question

System.IO.File.ReadAllText(path) does not read the html file.

Answer 1

A:

I'll take a wild guess:
The file contains Unicode sequences for extended chars, and the diagnosis is based on a (mismatched) length.

If I debug the code, it looks like "<\0h\0t\0m\0l\0>\0<\0h\0e\0a\0d\0>\0\r\0\n\0<\0M\0E\0T\0A\0 \0h\0t\0t\0p\0-\0e\0q\0u\0i\0v\0=\0\"\0C\0o\0n\0t\0e\0n

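That is a valid beginning of an HTML file except for the very first char. The \0 after every character is what UTF-16 (little-endian) text looks like when it is decoded with a single-byte or UTF-8 encoding. The file is probably damaged by a missing Unicode marker (byte-order mark) at the start. This damage was probably caused when the file was written and is not easily repairable now.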
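
A minimal C# sketch of that diagnosis (an illustration, not code from the question): it encodes a hypothetical "<html>" snippet as UTF-16 little-endian without a byte-order mark, then decodes the same bytes once with a single-byte encoding (ASCII here) and once with UTF-16. The class and method names are made up for the example.

```csharp
using System.Text;

public static class Utf16Demo
{
    // Encoding.Unicode is UTF-16 little-endian; GetBytes never writes a BOM,
    // so the result mimics a file saved without its marker bytes.
    public static byte[] EncodeUtf16LeNoBom(string text) =>
        Encoding.Unicode.GetBytes(text);

    // A single-byte decode turns every 0x00 high byte into its own '\0' char.
    // (File.ReadAllText falls back to UTF-8 when there is no BOM, which
    // does the same thing for ASCII-range text.)
    public static string DecodeAsSingleByte(byte[] bytes) =>
        Encoding.ASCII.GetString(bytes);

    // Decoding with the correct encoding restores the original text.
    public static string DecodeAsUtf16Le(byte[] bytes) =>
        Encoding.Unicode.GetString(bytes);
}
```

The first decode reproduces the "<\0h\0t\0..." pattern seen in the debugger. If such a string is passed to MessageBox.Show, the native dialog will typically stop at the first \0, so only "<" is visible.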

You could try setting WebClient.Encoding to UTF8 (and try a few others, such as ASCII, as well).

Henk Holterman 2010-03-15 11:43:12

@Henk Holterman: One file could be damaged, but there are two files that display like <\0h\0t\0m\0l\0>\0<\0h\0e\0a\0d\0>\0\r\0\n\0<\0M\0E\0T\0A\0 \0h\0t\0t\0p\0-\0e\0q\0u\0i\0v\0=\0\"\0C\0o\0n\0t\0e\0n. Neither file can be read.

Harikrishna 2010-03-15 12:58:09

Please use ReadAllBytes (it does work) and post the first 10 bytes as Hex.

Henk Holterman 2010-03-15 13:04:00
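
Henk's request takes only a few lines of C#. A minimal sketch, assuming the path of one of the affected files is passed in; HexPeek and FirstBytesAsHex are hypothetical names. File.ReadAllBytes returns the raw bytes with no decoding applied, so a missing or unexpected byte-order mark becomes visible.

```csharp
using System.IO;
using System.Linq;

public static class HexPeek
{
    // Returns the first `count` bytes of a file as space-separated hex,
    // e.g. "3C 00 68 00". No text encoding is involved, so nothing is
    // hidden or altered by decoding.
    public static string FirstBytesAsHex(string path, int count = 10)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return string.Join(" ", bytes.Take(count).Select(b => b.ToString("X2")));
    }
}
```

For reference, a UTF-16 LE file with a BOM normally starts with FF FE, and a UTF-8 file with a BOM starts with EF BB BF. Output such as "3C 00 68 00 74 00 ..." with no FF FE in front would match Henk's diagnosis of UTF-16 text missing its marker.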

And why can't the other file be damaged?

Henk Holterman 2010-03-15 13:06:31

@Henk Holterman: Thanks for the answer. You were right: both files were damaged. I copied the HTML content of the file into a new Notepad file and saved it as an HTML file, and the new file, with the same HTML content, can now be read.

Harikrishna 2010-03-16 05:45:41

@Henk Holterman: But can a file be unreadable for security reasons? That is, is there any web page whose HTML content we cannot read for security or other reasons?

Harikrishna 2010-03-16 05:47:56

@Harikrishna: No, this is not a security issue but something to do with the encoding and the marker bytes (byte-order mark). Good old Notepad to the rescue.

Henk Holterman 2010-03-16 08:11:12
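
Notepad worked because re-saving rewrote the text in an encoding that File.ReadAllText could decode. A rough programmatic equivalent, as a sketch under assumptions: the test for BOM-less UTF-16 LE below is a heuristic (every odd-indexed byte in the first 64 bytes is zero), which holds for ASCII-range markup but not for arbitrary text, and BomRepair is a hypothetical name. It rewrites the file in place, so try it on a copy first.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomRepair
{
    // Heuristic, not a standard rule: no BOM, even length, and every
    // odd-indexed byte in the sample is 0x00 (true for ASCII-range UTF-16 LE).
    public static bool LooksLikeUtf16LeWithoutBom(byte[] bytes, int sample = 64)
    {
        if (bytes.Length < 2 || bytes.Length % 2 != 0) return false;
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) return false; // already has a UTF-16 LE BOM
        int n = Math.Min(sample, bytes.Length);
        for (int i = 1; i < n; i += 2)
            if (bytes[i] != 0x00) return false;
        return true;
    }

    // If the heuristic matches, decodes the bytes as UTF-16 LE and rewrites
    // the file as UTF-8 with a BOM. Returns true if the file was rewritten.
    public static bool RepairInPlace(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        if (!LooksLikeUtf16LeWithoutBom(bytes)) return false;
        string text = Encoding.Unicode.GetString(bytes);
        File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
        return true;
    }
}
```

After RepairInPlace returns true, File.ReadAllText(path) sees the UTF-8 BOM and decodes the text correctly; a second call returns false because the file no longer matches the heuristic.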

Answer 2

A:

Does the MessageBox show anything? Any error? What does varText.Length show?

// Requires: using System.IO; using System.Text; using System.Windows.Forms;
string varText = File.ReadAllText(varFile, Encoding.Default);
MessageBox.Show(varFile + " Text: " + varText + " Length: " + varText.Length);

Verify in the MessageBox that the file path is correct, and verify that your application has the same access rights as you do when you read the file with Notepad.
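
A small diagnostic along these lines, as a hedged sketch (ReadDiagnostics and Describe are hypothetical names): it reports whether the file exists, the decoded length, and the number of \0 characters, and it shows each \0 as the two visible characters \0 so that a MessageBox does not cut the text short. Note that File.Exists also returns false when the caller lacks permission to read the file, so a "not found" result can point to an access-rights problem as well.

```csharp
using System.IO;
using System.Linq;
using System.Text;

public static class ReadDiagnostics
{
    // Returns a one-line report. A missing file does not throw here, but
    // File.ReadAllText can still throw for other I/O or access errors.
    public static string Describe(string path, Encoding encoding)
    {
        // File.Exists also returns false if the caller may not read the file.
        if (!File.Exists(path)) return "Not found or not accessible: " + path;

        string text = File.ReadAllText(path, encoding);
        int nulls = text.Count(c => c == '\0');

        // Show at most 40 chars, with each '\0' made visible as "\0".
        string preview = text.Length > 40 ? text.Substring(0, 40) : text;
        return path + " Length: " + text.Length + " Nulls: " + nulls
            + " Text: " + preview.Replace("\0", "\\0");
    }
}
```

Passing the returned string to MessageBox.Show would display the whole report, because it no longer contains any real \0 characters.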

MadBoy 2010-03-15 11:54:50

@MadBoy: The MessageBox displays only the path. Nothing else is shown for the text or the length, except one other symbol after the path, which is "<".

Harikrishna 2010-03-15 12:03:03

@MadBoy: The path displayed in the MessageBox is correct.

Harikrishna 2010-03-15 12:04:27

Can you post the html file somewhere?

MadBoy 2010-03-15 12:39:01

Also, please try a different encoding (change Encoding.Default to something else).

MadBoy 2010-03-15 12:58:57
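
One way to follow the suggestion to try other encodings, as a sketch: decode the same raw bytes with several candidates and compare the results. The candidate list and the EncodingProbe name are assumptions for illustration. For this problem, the candidate whose result contains no \0 characters and reads as normal HTML is the likely match; having no \0 characters alone is not proof, since UTF-16 big-endian also produces none here, just unreadable characters.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class EncodingProbe
{
    // Decodes the raw file bytes with each candidate encoding so the results
    // can be compared side by side. Encoding.GetString does not strip a BOM,
    // unlike File.ReadAllText, so a BOM shows up as a leading '\uFEFF'.
    public static Dictionary<string, string> DecodeWithCandidates(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        var candidates = new (string Label, Encoding Encoding)[]
        {
            ("UTF-8", Encoding.UTF8),
            ("ASCII", Encoding.ASCII),
            ("UTF-16LE", Encoding.Unicode),
            ("UTF-16BE", Encoding.BigEndianUnicode),
        };

        var results = new Dictionary<string, string>();
        foreach (var (label, encoding) in candidates)
            results[label] = encoding.GetString(bytes);
        return results;
    }
}
```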

@MadBoy: OK.

Harikrishna 2010-03-15 13:00:40
