views:

110

answers:

1

Short Desc:

I'm curious to see if I can use SQL Analysis services or some other SQL Server service to mine some data for me that will show commonalities between SQL TEXT fields in a dataset.

Long Desc

I am looking at a subset of data that consists of about 10,000 rows of TEXT blobs which are used as a notes column in a issue tracking (ticketing) software. I would like to use something out of the box (without having to build something) that might be able to parse through all of the rows and find commonly used byte sequences in the "Notes" column. In other words, I want to find commonly used phrases (two to three word phrases, so 9 - 20 character sections of the TEXT blob). This will help me better determine if associate's notes contain similar phrases (troubleshooting techniques) that we could standardize in our troubleshooting process flow.

Closing Note

I'd really rather not build an application to do this as my method will probably not be the most efficient way to do it.

Alternatively, if no one knows of an out of the box solution, could you recommend any algorithms I might be able to use in code, where I could do string comparisons against an array of values?

Hopefully all this makes sense. Please let me know in the comments if anything needs clarification.

+1  A: 

You can use the text mining transforms in SQL Server Integration Services (Enterprise and Dev editions) to perform term lookup and term extraction, and then use Association Rules in Analysis Services to build a term association model. There's an aging tutorial that should still work (free login required)

We'll be announcing something related to this later this year - you can leave feedback on our website (predixionsoftware.com) if you're interested in beta testing.

-Jamie MacLennan CTO Predixion Software

Jamie MacLennan
Thank you very much for the response. Your article walked me through exactly what I needed. I posted some feedback on your site.
regex