



To illustrate my requirements, consider the following directory structure:

C:\Dev\Projects\Test Project
C:\Dev\Projects\Test Project\Test.cs
C:\Dev\Projects\Foo\foo.cs (containing the word test)

The basic document will have id, type, name and content fields, where type will be file or folder and name will be either the file name or the folder name.

When searching for "test" I should get:

C:\Dev (ancestor of a result)
C:\Dev\Projects (ancestor of a result)
C:\Dev\Projects\Test Project (result)
C:\Dev (ancestor of a result)
C:\Dev\Projects (ancestor of a result)
C:\Dev\Projects\Test Project (ancestor of a result)
C:\Dev\Projects\Test Project\Test.cs (result)
C:\Dev (ancestor of a result)
C:\Dev\Projects (ancestor of a result)
C:\Dev\Projects\Foo (ancestor of a result)
C:\Dev\Projects\Foo\foo.cs (result)

Even better if it is possible to avoid duplicates:

C:\Dev (ancestor of a result)
C:\Dev\Projects (ancestor of a result)
C:\Dev\Projects\Test Project (result)
C:\Dev\Projects\Test Project\Test.cs (result)
C:\Dev\Projects\Foo (ancestor of a result)
C:\Dev\Projects\Foo\foo.cs (result)
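
To make the de-duplicated listing above concrete, here is a minimal sketch of how it could be derived from the raw hits after the search has run. It assumes the search already returned the matching paths as strings; the function name with_ancestors and the "result"/"ancestor" labels are hypothetical and only serve the illustration. It uses plain Python and does not touch Lucene.

```python
from pathlib import PureWindowsPath


def with_ancestors(result_paths):
    """Return (path, label) rows: every hit preceded by its ancestors, each path once."""
    results = {PureWindowsPath(p) for p in result_paths}
    seen = set()
    rows = []
    for raw in result_paths:
        path = PureWindowsPath(raw)
        # parents is ordered nearest-first, so reverse it to walk from the top down
        chain = list(path.parents)[::-1] + [path]
        for node in chain:
            # skip the bare drive root (C:\), which the listing does not show
            if node.parent == node:
                continue
            # skip anything already emitted for an earlier hit
            if node in seen:
                continue
            seen.add(node)
            label = "result" if node in results else "ancestor"
            rows.append((str(node), label))
    return rows


if __name__ == "__main__":
    hits = [
        r"C:\Dev\Projects\Test Project",
        r"C:\Dev\Projects\Test Project\Test.cs",
        r"C:\Dev\Projects\Foo\foo.cs",
    ]
    for path, label in with_ancestors(hits):
        print(f"{path} ({label})")
```

This only post-processes the hits; it still needs one lookup per ancestor if the ancestor documents themselves have to be loaded from the index.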

When searching for "project" I should get:

C:\Dev (ancestor of a result)
C:\Dev\Projects (ancestor of a result)
C:\Dev\Projects\Test Project (result)

When searching for "foo" I should get:

C:\Dev (ancestor of a result)
C:\Dev\Projects (ancestor of a result)
C:\Dev\Projects\Foo (result)
C:\Dev\Projects\Foo\foo.cs (result)

Thanks for any help


If you generate your index once, or have only a very small number of writes, you could solve this at indexing time.

For each document you would store an additional field called "path" that holds a tokenized list of the words in the document's own path and in the names of all elements beneath it:

name: C:\Dev\Projects
path: C:, Dev, Projects, Test, Test Project, Test.cs, Foo, Foo.cs (use whatever tokenizer you want)

Then index the field as INDEXED:true STORED:false and use it when searching for matches:

query: +path:"Foo"

This should return all the documents that have Foo as a child element. Keep in mind that this solution is very costly for writes and may be impractical for a very large tree structure with many thousands of leaves.
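
As a rough illustration of the idea, the sketch below builds such a "path" token set for every document and runs a single-term match against it. It is plain Python rather than Lucene code, and everything in it is an assumption made for the example: the tokenizer (lowercased whole name plus its alphanumeric words) and the function names tokenize, build_path_fields and search. It only covers names, not file content.

```python
import re
from pathlib import PureWindowsPath


def tokenize(name):
    """Hypothetical tokenizer: the lowercased name plus its alphanumeric words."""
    lowered = name.lower()
    return {lowered, *re.findall(r"[a-z0-9]+", lowered)}


def build_path_fields(all_paths):
    """Map each document path to the tokens of its own path and of everything below it."""
    nodes = [PureWindowsPath(p) for p in all_paths]
    fields = {}
    for node in nodes:
        tokens = set()
        for other in nodes:
            # take the document itself and every document that sits beneath it
            if other == node or node in other.parents:
                for part in other.parts:
                    # the drive part is "C:\", so drop the trailing backslash
                    tokens |= tokenize(part.rstrip("\\"))
        fields[str(node)] = tokens
    return fields


def search(fields, term):
    """Return the paths whose "path" tokens contain the term, in indexing order."""
    wanted = term.lower()
    return [path for path, tokens in fields.items() if wanted in tokens]


if __name__ == "__main__":
    docs = [
        r"C:\Dev",
        r"C:\Dev\Projects",
        r"C:\Dev\Projects\Test Project",
        r"C:\Dev\Projects\Test Project\Test.cs",
        r"C:\Dev\Projects\Foo",
        r"C:\Dev\Projects\Foo\foo.cs",
    ]
    for path in search(build_path_fields(docs), "foo"):
        print(path)
```

The nested loop makes the write cost visible: adding or renaming one element means recomputing the field for every ancestor. Note also that because the field contains the document's own full path, documents below a matching folder are returned as well.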
