I have an ASP.NET 2.0 app. I am trying to upload a file, read its lines, and display them in a textbox. This works fine for a .txt file, but if I upload a Word doc, I get all kinds of gibberish (it looks like XML-based formatting) surrounding the text. Here is my code:

    Dim s As New StringBuilder
    Dim rdr As StreamReader

    If FileUpload1.HasFile Then

        rdr = New StreamReader(FileUpload1.FileContent)

        Do Until rdr.EndOfStream
            s.Append(rdr.ReadLine() & ControlChars.NewLine)
        Loop

        TextBox1.Text = s.ToString()

    End If
+1  A: 

StreamReader doesn't support Word-formatted files. It just reads streams of characters. You need a library that actually understands the Word file format. This isn't an easy problem at all: it's not always clear how you would convert any given portion of a Word document into plain text.

Wahnfrieden
A: 

You can use the "Word.ApplicationClass" class.

However, you should first read Microsoft's "Considerations for server-side Automation of Office".

Liberated from another donor:

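    // Requires a reference to the Word interop assembly and a Windows Forms
    // context (the Clipboard needs an STA thread).
    Word.ApplicationClass wordApp = new Word.ApplicationClass();
    object file = path;
    object nullobj = System.Reflection.Missing.Value;

    // The number of optional arguments depends on the Word interop version;
    // this call keeps the twelve used in the original snippet.
    Word.Document doc = wordApp.Documents.Open(
        ref file, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj);

    // Select the whole document and copy it to the clipboard as plain text.
    doc.ActiveWindow.Selection.WholeStory();
    doc.ActiveWindow.Selection.Copy();
    IDataObject data = Clipboard.GetDataObject();
    txtFileContent.Text = data.GetData(DataFormats.Text).ToString();

    // Close the document and quit Word so no winword.exe process is left behind.
    doc.Close(ref nullobj, ref nullobj, ref nullobj);
    wordApp.Quit(ref nullobj, ref nullobj, ref nullobj);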

As mentioned in my comment below, this may work for you as well: http://npoi.codeplex.com/

Jay
-1: This is a very bad idea in a server application like an ASP.NET application. It's unsupported, may have licensing implications, and very often fails in unpredictable ways that are difficult to debug. The best bet is: just don't do it in a server application.
John Saunders
I agree this is not the best solution: it can cause locking and is not recommended, but it can also work if done correctly. It's worth a try/mention for his scenario. Here is another option too: http://npoi.codeplex.com/
Jay
+1: Although what John says is absolutely true, for my Form application it works perfectly.
ajdams
+1  A: 

But if I do a word doc, I get all kinds of gibberish (looks like xml-based formatting) surrounding the text.

That's because the Word document file contains that XML-based formatting. You will see the same thing if you open the file with a dumb text reader (e.g. Notepad.exe, or the type command at the command line).

To extract the text from the surrounding formatting, you'll need to use software (e.g. Word itself, winword.exe) to save or get the document in plain-text format.
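If the upload really is a flat XML Word file, as the "xml-based formatting" suggests, a rough alternative to driving Word is to parse the XML and keep only the text runs. The following is a minimal sketch, not a complete converter. It assumes the text sits in w:t elements inside w:p paragraphs, in either the Word 2003 or the Office Open XML namespace. The class name WordXmlText and the method Extract are hypothetical names chosen for illustration. It ignores tables, headers, footnotes and fields, and it will not work on a binary 97-style .doc.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class WordXmlText
{
    // Namespaces assumed for the w: prefix (Word 2003 XML and Office Open XML).
    const string Word2003Ns = "http://schemas.microsoft.com/office/word/2003/wordml";
    const string OpenXmlNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static bool IsWordNs(string ns)
    {
        return ns == Word2003Ns || ns == OpenXmlNs;
    }

    // Returns the text of every w:t run, with a line break after each w:p paragraph.
    public static string Extract(Stream xml)
    {
        StringBuilder sb = new StringBuilder();
        bool inText = false; // true while the reader is inside a w:t element

        using (XmlReader reader = XmlReader.Create(xml))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.LocalName == "t" && IsWordNs(reader.NamespaceURI)
                            && !reader.IsEmptyElement)
                        {
                            inText = true;
                        }
                        break;

                    case XmlNodeType.Text:
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        // Keep character data only when it belongs to a text run.
                        if (inText) sb.Append(reader.Value);
                        break;

                    case XmlNodeType.EndElement:
                        if (!IsWordNs(reader.NamespaceURI)) break;
                        if (reader.LocalName == "t") inText = false;
                        else if (reader.LocalName == "p") sb.AppendLine();
                        break;
                }
            }
        }
        return sb.ToString();
    }
}
```

In the page from the question this would be called with the upload stream, e.g. WordXmlText.Extract(FileUpload1.FileContent), and the result assigned to the textbox. For a .docx the same XML lives in word/document.xml inside the ZIP package, so that entry would have to be opened first.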

ChrisW
Imagine a 97-style Word doc. Sweet dreams ;)
Kawa
A 97-style Word doc wouldn't have "xml-based formatting", but Word does support COM automation (which may allow you to automate it to save a document as text).
ChrisW