views:

343

answers:

2

Are there any good examples (websites or books) around of how to build a full text search engine in F#?

+1  A: 

Do you want to write this yourself? Or do you simply need the functionality?

If you need the functionality, an embedded/in-memory database with Full Text Search support may do the trick. Since it's .Net, I'd recommend SQLite ADO.Net Provider as the open-source contender. It's really good ( Support LINQ before any other provider out there, design-time support, etc.), and the FTS support is under very active development. I think Google is working on that. There is also VistaDB Database. I'm using that mainly now. It should have FTS support. Entirely .Net, which gives it some integration advantages.

If you have to do it yourself check-out books on Information Retrieval. I've read a few, but know nothing that stands out from the crowd. Amazon might help there.

kervin
I want it to write myself to have a learning experience.
Michiel Borkent
+1  A: 

I have written a search engine in F# using just a few lines of code. You can read about that in my poster and access the full implementation in

Stefan Savev's home page

The basic idea is shown in the code below, but more explanations are actually needed than the code itself. Those are available at my website as well.

This code creates the index on disk of a collection of documents. Indexing is done in external memory.

1.   let create_postings in_name tmp_dir out_name =
2.     let process_doc (doc_id, doc_text) = 
3.         doc_text |> tokenize |> stopword |> stem 
4a.        |> List.count
4b.        |> ListExt.map(fun (word, tf) -> (word, (doc_id, tf)) 
5.     in_name 
6.     |> as_lines
7.     |> Seq.map_concat extract_docs 
8.     |> Seq.map_concat process_doc
9a.    |> External.group_by (fun (w, _) -> w) 
9b.       (fun (_, docid_and_tf) -> docid_and_tf) 
9c.       (fun lst -> (List.length lst, lst)) 
9d.       tmp_dir
9e.       (External.ElemDesc())
10.    |> output out_name