I'm working on a web-based medical billing code search engine for my software start-up. It will let users search for ICD-9 codes (and related ICD-10 and clinical codes) used in medical diagnoses and billing.
The problem with building the data files for the search is that the Centers for Disease Control and Prevention (CDC) only releases the codes as Rich Text Format (.rtf) files, which are fine for printing and reading but difficult to load into a database programmatically.
There are a few existing ICD-9 search engines out there, and I know dozens of other programming teams have tackled this problem for every crappy practice management system on the market. But due to the nature of the field, things are very closed: no one has publicly released their data, their code for parsing these .rtf files, or a repository for the data. The reason is that each company sees the hours it put into parsing these files into database data as a barrier to entry for competitors.
What other kinds of specific programming problems do programmers keep solving over and over again, and how do you break the vicious cycle by being the first person to release data or a solution? Is it ever worth breaking this cycle, releasing your solution, and making things easier for competitors? How do you profit or benefit from releasing a solution to a problem like this?