I'm working on a web-based medical billing code search engine for my software start-up. It will let users search for ICD-9 codes (and the related ICD-10 and clinical codes) used in medical diagnoses and medical billing.

The problem with building the data files for the search is that the Centers for Disease Control only releases the codes as .rtf (Rich Text Format) files, which are fine for printing and reading but difficult to load into a database programmatically.

There are a few existing ICD-9 search engines out there, and I know dozens of other programming teams have tackled this problem for every crappy practice management system on the market. But due to the nature of the field, things are very closed: no one has publicly released their data, code to parse these .rtf files, or a repository for the data. The reason is that each company sees the X hours it put into parsing these files into database data as a barrier to entry for competitors.

What other specific programming problems do programmers keep solving over and over again, and how do you break the vicious cycle and be the first person to release data or a solution? Is it ever worth breaking this cycle by releasing your solution and making things easier for competitors? How do you profit or benefit from releasing a solution to a problem like this?

+1  A: 

I'd think basic CRUD operations on a database would be another example, though I'm not sure that is specific enough. It is a simple problem in some respects, but it is likely something that gets done over and over again.

JB King
Projects like Ruby on Rails and Django seem to have solved a lot of the CRUD problems for web-based database applications.
MikeN
+1  A: 

Looks to me like you can pay $20 and get it on CD-ROM. I bet most of the companies do something like this.

http://www.cdc.gov/nchs/products/elec_prods/subject/icd96ed.htm

Also there was an email address you might inquire into:

Questions and comments regarding the function or format of these 
files should be addressed to the Data Dissemination Branch, 
National Center for Health Statistics, at (301) 458-4636, or by 
e-mail to:  [email protected].

One more reference to buy it is at: https://catalog.ama-assn.org/Catalog/product/product_detail.jsp?productId=prod1270004

Although they want $200.

Chris Lively
I actually tried to order the CD-ROM, and the Government Printing Office told me they don't have the CDs and there is an unknown delay in getting more of them. I'm assuming that the data on them is not much more useful than the .rtf files.
MikeN
Also, this wouldn't address updates to the codes which occur every year.
MikeN
That's $200 for one of the two code sets. And not for resale.
le dorfier
I'm just trying to point out that a few simple Google searches turned up ways to purchase the information in a usable format. Mike needs to decide how much money he's willing to invest in a data export versus the amount of time it would take to parse the data on a recurring basis.
Chris Lively
+1  A: 

What you do is build the piece that parses the files, then build a database to manage the data. Then you approach your "competitors" with an option for them to access your database for a fee. Make it well known that you're up to date with all the records. Try to get an in at the CDC so that you can get all the new files first.
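As a rough illustration of the "parse, then load into a database" step, here is a minimal Python sketch. It assumes the .rtf files have already been converted to plain text by some other tool, and that each code sits on its own line as a code followed by its description. That layout is an assumption for illustration, not the CDC's documented format, and the names `CODE_LINE`, `parse_code_lines`, `load_codes`, and `search` are hypothetical.

```python
import re
import sqlite3

# Assumed line layout (NOT the CDC's documented format):
#   "<code><whitespace><description>"
# Matches numeric codes (250, 250.0), V codes (V70.0) and E codes (E800.1).
CODE_LINE = re.compile(
    r"^(?P<code>(?:\d{3}|V\d{2}|E\d{3})(?:\.\d{1,2})?)\s+(?P<desc>\S.*)$"
)


def parse_code_lines(lines):
    """Yield (code, description) pairs for lines that fit the assumed layout."""
    for line in lines:
        match = CODE_LINE.match(line.strip())
        if match:
            yield match.group("code"), match.group("desc").strip()


def load_codes(conn, lines):
    """Create the table if needed and insert or update every parsed code."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS icd9 ("
        "code TEXT PRIMARY KEY, description TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO icd9 (code, description) VALUES (?, ?)",
        parse_code_lines(lines),
    )
    conn.commit()


def search(conn, term):
    """Return (code, description) rows whose description contains term."""
    cursor = conn.execute(
        "SELECT code, description FROM icd9 "
        "WHERE description LIKE ? ORDER BY code",
        (f"%{term}%",),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Illustrative lines only; not copied from the CDC files.
    sample = [
        "ENDOCRINE DISEASES",
        "250.0  Diabetes mellitus without mention of complication",
        "V70.0  Routine general medical examination at a health care facility",
    ]
    connection = sqlite3.connect(":memory:")
    load_codes(connection, sample)
    print(search(connection, "diabetes"))
```

A regex like this would pick up false positives (any line that happens to start with three digits and a space), so a real parser would need rules tuned to the actual file layout. Because the loader uses `INSERT OR REPLACE`, rerunning it on each yearly release would overwrite changed descriptions rather than duplicate rows.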

Remember, a competitor is just a customer who doesn't know he wants to buy your product yet. :)

Drew
That is exactly what every other person who solved this problem thought! But here I am solving it again, unwilling to give them a dime for their old, outdated solutions!
MikeN
Heh. I'm sure that you, being a much better developer (since their solutions are old and outdated), can create something that won't become outdated and can corner this market. Right? ;P
Drew
+4  A: 

There's a whole ecosystem built on organizing, reviewing, distilling, authorizing, selling, distributing, analyzing and supporting these codes. It includes, for instance, the American Medical Association (who basically owns the diagnostic codes) and the insurance companies, who live or die by filtering claims for payment as closely as possible (based on the service codes correlated with the diagnostic codes using arcane formulae). They all have cash flow requirements that require recurring churn. Once you have a product, you'll be part of the ecosystem too.

le dorfier