Hey,
I'm trying to figure out the best practice for implementing a complex algorithm on data stored in a relational database.
Specifically, I want to implement a variation of the k-means clustering algorithm (here used for document clustering) on a large MS SQL Server database containing the TF-IDF vectors of many documents, which serve as the algorithm's input.
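To make the workload concrete, here is a minimal sketch of plain k-means over sparse TF-IDF vectors. It is not the variation mentioned above. It assumes cosine similarity, a common choice for TF-IDF that is often called spherical k-means, and it naively seeds with the first k documents, where k-means++ seeding is the usual improvement. The names `kmeans_cosine`, `normalize` and `dot` are hypothetical.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse TF-IDF vector: term -> weight


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm > 0 else dict(v)


def dot(a: Vector, b: Vector) -> float:
    """Sparse dot product: iterate over the smaller vector only."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(docs: List[Vector], k: int, max_iter: int = 20) -> List[int]:
    """Return a cluster index for each document (hypothetical helper)."""
    if k < 1 or k > len(docs):
        raise ValueError("k must be between 1 and the number of documents")
    docs = [normalize(d) for d in docs]
    # Naive seeding (assumption): the first k documents become the initial centroids.
    centroids = [dict(d) for d in docs[:k]]
    assignment = [-1] * len(docs)
    for _ in range(max_iter):
        # Assignment step: each document joins the centroid with the highest cosine similarity.
        new_assignment = [max(range(k), key=lambda c: dot(d, centroids[c])) for d in docs]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the normalized mean of its members.
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if not members:
                continue  # empty cluster: keep the previous centroid
            summed: Vector = {}
            for d in members:
                for t, w in d.items():
                    summed[t] = summed.get(t, 0.0) + w
            centroids[c] = normalize(summed)  # same direction as the mean
    return assignment
```

The assignment step compares every document with every centroid on each pass, so it dominates the running time. That is the step worth benchmarking, whether it is written as a set-based T-SQL join over a (document, term, weight) table or as managed code.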
My first thought was to do the entire thing in SQL using stored procedures, functions, views and the other standard SQL Server tools. Then I thought I might instead write managed code (I'm fluent in C#) that runs inside SQL Server through its CLR integration.
Performance is a concern here, so I need to take it into consideration as well.
I would appreciate any advice on the path I should take.
Thank you!