We currently use print2flash (http://print2flash.com) to convert user-submitted documents (Word documents, RTF, PowerPoint, etc.) into Flash-based documents that can be viewed online (à la docstoc and scribd).

We would like to extract the text inside these files for full-text indexing. Are there any tools or libraries we can use to accomplish this?

We are developing in ASP.NET / C# and have tried third-party tools such as SWFTools (http://www.swftools.org), but the results have been inconsistent and subpar.

PS: We would like to do the indexing after the original document has been converted to Flash, because that gives us fewer file formats to deal with.

A: 

Your best bet is a third-party Flash parsing library. SWF is a dense binary format and painful to parse by hand. That said, the format is well understood, and the official specification is available here: http://www.adobe.com/devnet/swf/
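
To give a feel for what "dense" means in practice, here is a minimal sketch of the first step any parser has to do: unwrap the file and walk its tag list. It is written in Python for brevity rather than C#, and it rests on several assumptions: only the FWS (uncompressed) and CWS (zlib-compressed) signatures are handled, the function names `swf_body` and `iter_tags` are made up for illustration, and no text is actually decoded. As far as I understand the spec, the text tags (DefineText, DefineText2) store glyph indices rather than characters, so getting real strings out also means reading the font tags, which is where a library earns its keep.

```python
import struct
import zlib

# Tag codes for the text-bearing tags, as listed in the SWF specification.
TEXT_TAGS = {11: "DefineText", 33: "DefineText2", 37: "DefineEditText"}


def swf_body(data: bytes) -> bytes:
    """Return everything after the 8-byte header, decompressed if needed."""
    signature = data[:3]
    if signature == b"FWS":      # uncompressed
        return data[8:]
    if signature == b"CWS":      # zlib-compressed after the first 8 bytes
        return zlib.decompress(data[8:])
    raise ValueError(f"unsupported SWF signature: {signature!r}")


def iter_tags(data: bytes):
    """Yield (tag_code, payload) for each tag, stopping after the End tag."""
    body = swf_body(data)

    # The body starts with a bit-packed RECT: 5 bits give the field width,
    # followed by four fields of that width, padded to a whole byte.
    nbits = body[0] >> 3
    rect_bytes = (5 + 4 * nbits + 7) // 8
    pos = rect_bytes + 4         # also skip frame rate and frame count

    while pos + 2 <= len(body):
        (code_and_len,) = struct.unpack_from("<H", body, pos)
        pos += 2
        code = code_and_len >> 6        # upper 10 bits: tag code
        length = code_and_len & 0x3F    # lower 6 bits: short length
        if length == 0x3F:              # long form: real length follows
            (length,) = struct.unpack_from("<I", body, pos)
            pos += 4
        yield code, body[pos:pos + length]
        pos += length
        if code == 0:                   # End tag
            break
```

Filtering the output with something like `[code for code, _ in iter_tags(data) if code in TEXT_TAGS]` at least tells you whether a converted file carries text tags at all, or whether the converter rendered the text as shapes, in which case there is nothing left to index.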

ashes999