How can I find the position or location of a string in a given document? I have a Word document and I want to store all of its words and their positions in a database, which is why I need to find the position of each word.

Please tell me how I can find the position or location of a word or string in a given document.

I intend to use VB.NET or C#, and the documents are .doc files.

Thanks in advance.

A: 

Mmmm... I haven't found a smarter solution :-/ but maybe this helps. I'll assume that you have some version of MS Office installed on your system.

First of all, you have to add a reference in your project to the Microsoft COM component called "Microsoft Word ?* Object Library".

*? This part of the name depends on the version of your MS Office.

After you've added the reference, you can test this code:

using System;
using System.Collections.Generic;
using System.Text;
using Word;

namespace ConsoleApplication1
{
    class Program
    {

        static void Main(string[] args)
        {

            // Find the full path of our document

            System.IO.FileInfo ExecutableFileInfo = new System.IO.FileInfo(System.Reflection.Assembly.GetEntryAssembly().Location);            
            object docFileName = System.IO.Path.Combine(ExecutableFileInfo.DirectoryName, "document.doc");

            // Create the needed Word.Application and Word.Document objects

            object nullObject = System.Reflection.Missing.Value;
            Word.Application application = new Word.ApplicationClass();
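            // Note: the number of optional arguments to Documents.Open depends on the Word version;
            // add further "ref nullObject" entries if your interop library requires them.
            Word.Document document = application.Documents.Open(ref docFileName, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject);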


            string wholeTextContent = document.Content.Text; 
            wholeTextContent = wholeTextContent.Replace('\r', ' '); // Replace paragraph marks with spaces so words in adjacent paragraphs stay separate
            string[] splittedTextContent = wholeTextContent.Split(' '); // Get the separate words

            int index = 1;
            foreach (string singleWord in splittedTextContent)
            {
                if (singleWord.Trim().Length > 0) // We don't need to store white space
                {
                    Console.WriteLine("Word: " + singleWord + "(position: " + index.ToString() + ")");
                    index++;
                }
            }

            // Close the document and quit Word to release their resources

            document.Close(ref nullObject, ref nullObject, ref nullObject);
            application.Quit(ref nullObject, ref nullObject, ref nullObject);
            document = null;
            application = null;

            Console.ReadLine(); 
        }
    }
}

I tested it and it looks like it works =)
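
The loop above reports only the ordinal position of each word. If the character offset within the extracted text is also wanted for the database, one possible variation is sketched below. It is not part of the code tested above, and `WordPositions` and `Find` are hypothetical names chosen for illustration. It scans the text once, treats any whitespace (including the '\r' paragraph marks) as a separator, and records both values. It needs no Word reference, so it can be tried on a plain string first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original answer.
public static class WordPositions
{
    // Returns one entry per word: (word, ordinal position starting at 1,
    // character offset into the text starting at 0).
    public static List<Tuple<string, int, int>> Find(string text)
    {
        var result = new List<Tuple<string, int, int>>();
        int ordinal = 1;
        int i = 0;

        while (i < text.Length)
        {
            // Skip any run of whitespace (spaces, tabs, '\r' paragraph marks).
            while (i < text.Length && char.IsWhiteSpace(text[i]))
            {
                i++;
            }

            // Consume one word: everything up to the next whitespace character.
            int start = i;
            while (i < text.Length && !char.IsWhiteSpace(text[i]))
            {
                i++;
            }

            if (i > start)
            {
                result.Add(Tuple.Create(text.Substring(start, i - start), ordinal, start));
                ordinal++;
            }
        }

        return result;
    }
}
```

Assuming the document is opened as in the code above, calling WordPositions.Find(document.Content.Text) would give the list to store; Item1 is the word, Item2 its ordinal position and Item3 its offset in the extracted string.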

Javier Morillo
Thanks a lot, it works. I just increased the number of missing arguments to 16 because of my version of the Word COM library, and it needs a few touches as it only displays the last paragraph of any document. Thanks so much, though.
ryder1211212
=) I'm glad it helped you. Greetings.
Javier Morillo
While I was writing this awful code, I thought it would be a good idea to learn more about OpenOffice libraries for .NET...
Javier Morillo
Not a bad idea, boss. At the moment I'm neck deep in projects. Thanks a lot again for the help; I have added stopword removal and am able to save what I need in the database. P.S. In the code above I'm trying to change the filename dynamically from an ASPX page using Process.Start and getting the values from Main(), but no luck yet.
ryder1211212
Sorted, thank you.
ryder1211212