I would like to design a web app that allows me to sort, browse, and display various attributes (e.g. title, tag, description) for a collection of man pages.

Specifically, these are R documentation files within an R package that houses a collection of data sets, maintained by several people in an SVN repository. The format of these files is .Rd, which is LaTeX-like, but different.

R has functions for converting these man pages to HTML or PDF, but I'd like a web interface that lets users click on a particular keyword and bring up a list (with brief excerpts) of the man pages that carry that keyword in a \keyword{} tag.
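
For context, a minimal .Rd file looks roughly like this (contents made up):

\name{mydata}
\docType{data}
\title{An example data set}
\description{
  A brief description of the data.
}
\keyword{datasets}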

Also, the generated HTML is somewhat ugly, and I'd like to be able to provide my own CSS.

One obvious option is to load all the metadata I desire into a database like MySQL and design my site to run queries and fetch the appropriate data.

I'd like to avoid that to minimize upkeep for future maintainers. The number of files is small (<500) and the amount of data is small (only a couple of hundred lines per file).

My current leaning is to have a script pull the desired metadata from each file into a summary.json file, then load that file in PHP, decode it, and loop through the array looking for the items whose attributes match the current query (e.g. all docs with keyword1 AND keyword2).
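
For the extraction step, I'm picturing something like this (untested; it assumes the .Rd files live under man/ and that \title{} and \keyword{} each fit on one line, which isn't always true for real .Rd fields):

$summary = array();
foreach (glob("man/*.Rd") as $file) {
    $text = file_get_contents($file);
    // grab the first \title{...} and every \keyword{...}
    preg_match('/\\\\title\{([^}]*)\}/', $text, $title);
    preg_match_all('/\\\\keyword\{([^}]*)\}/', $text, $kw);
    $summary[basename($file, ".Rd")] = array(
        "title"    => isset($title[1]) ? trim($title[1]) : "",
        "keywords" => $kw[1],
    );
}
file_put_contents("summary.json", json_encode($summary));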

I was starting in that direction with the following...

// load the summary and decode it into an associative array
$contents = file_get_contents("summary.json");
$c = json_decode($contents, true);
foreach ($c as $ind => $val) {
    // ... test $val against the current query here
}
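
Filling in that loop for the AND case might look like this (assuming each entry carries a "keywords" array as in the extraction sketch above):

$wanted = array("keyword1", "keyword2");   // e.g. taken from $_GET
$matches = array();
foreach ($c as $ind => $val) {
    // keep entries whose keyword list contains every requested term
    if (!array_diff($wanted, $val["keywords"])) {
        $matches[$ind] = $val;
    }
}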

Another idea was to write a script that would convert these .Rd files to XML. In that case, are there any lightweight frameworks that make it easy to sort and search a small collection of XML files?

I'm not sure if XQuery is overkill or if I have time to dig into it...
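
Maybe PHP's built-in SimpleXML with XPath would already be lightweight enough for a collection this small; something like this (element names made up):

$xml = simplexml_load_file("summary.xml");
// every <doc> element containing <keyword>datasets</keyword>
$hits = $xml->xpath('//doc[keyword="datasets"]');
foreach ($hits as $doc) {
    echo $doc->title, "\n";
}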

I think I'm suffering from too-many-options syndrome, with all the AJAX temptations. Any help is greatly appreciated.

I'm looking for a super simple solution. How might some of you out there approach this?

+1  A: 

My approach would be to parse the keywords from the files (from your description, I assume they have a special notation that distinguishes them from normal words/text) and store this data as a search index somewhere. It does not have to be MySQL; SQLite would surely be enough for your project. A search would then be very simple.
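
A minimal sketch with PDO and SQLite (untested; table and column names are made up, and $summary is the parsed metadata, e.g. the array from the extraction sketch in the question):

$db = new PDO("sqlite:index.db");
$db->exec("CREATE TABLE IF NOT EXISTS pages (file TEXT, keyword TEXT)");
$ins = $db->prepare("INSERT INTO pages (file, keyword) VALUES (?, ?)");
foreach ($summary as $file => $meta) {
    foreach ($meta["keywords"] as $kw) {
        $ins->execute(array($file, $kw));
    }
}
// look up every file tagged with a given keyword
$q = $db->prepare("SELECT file FROM pages WHERE keyword = ?");
$q->execute(array("datasets"));
$files = $q->fetchAll(PDO::FETCH_COLUMN);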

Parsing the files could be automated as a post-commit hook in your Subversion repository.

Karsten
A: 

Why don't you create a SUMMARIES table with a column for each of the summary's fields? Then you could index it with a full-text index, assigning a different weight to each field.

You don't need MySQL; you can use SQLite, which has Google's full-text indexing extension (FTS3) built in.
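
A rough sketch (untested; names made up, and it assumes your SQLite build includes the FTS3 module):

$db = new PDO("sqlite:fts.db");
$db->exec("CREATE VIRTUAL TABLE summaries USING fts3(title, description, keywords)");
$ins = $db->prepare("INSERT INTO summaries (title, description, keywords) VALUES (?, ?, ?)");
$ins->execute(array("An example data set", "A brief description...", "datasets"));
// MATCH searches across all indexed columns
$q = $db->prepare("SELECT title FROM summaries WHERE summaries MATCH ?");
$q->execute(array("datasets"));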

vartec