Hello!
I'm trying to figure out a way to strip all HTML tags from records in a database and then create XML from the result.
Any ideas?
It's built on ASP.NET 2.0 with SQL Server.
Check this question: Using C# regular expressions to remove HTML tags. What exactly do you mean by creating XML?
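For a rough idea of what the regex route looks like, here is a minimal sketch. It is written in Python rather than C# to keep it short, but the pattern itself should carry over to .NET's `Regex.Replace`. The helper name `strip_tags_regex` is made up for illustration. Keep in mind that a pattern like this is only a heuristic: it can misfire on a `>` inside an attribute value, on comments, and on script contents.

```python
import re
from html import unescape

# Matches anything that looks like a tag: "<", then any run of non-">" characters, then ">".
TAG_RE = re.compile(r"<[^>]+>")

def strip_tags_regex(html_text):
    """Drop tag-like spans, then decode entities such as &amp;."""
    return unescape(TAG_RE.sub("", html_text))

print(strip_tags_regex("<p>Hello <b>world</b> &amp; more</p>"))
```

This is probably good enough for simple, predictable markup, and not something to rely on for arbitrary HTML.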
Why not just parse the page into a DOM tree and then walk the elements, pulling out the values you need and any attributes you deem necessary?
If you wrote the HTML files yourself, they should be well-formed, so this would be easy.
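As a sketch of the parser-based idea, here is one way it could look using Python's standard `html.parser`. Note the assumptions: this parser is event-based rather than a full DOM tree, the names `TextExtractor` and `extract_text` are invented for the example, and skipping `script` and `style` contents is my own choice. The point is only that a parser, not a pattern, decides what counts as markup.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes and ignores the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def extract_text(html_text):
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(parser.parts).split())

print(extract_text("<p>Hello <b>world</b></p> <script>var x = 1 < 2;</script>"))
```

In .NET the equivalent would be whichever HTML or XML parser you have available. If the markup really is well-formed, an XML parser alone may be enough.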
Don't strip the HTML in the database or with SQL. Instead, strip it out at the last mile in your application code with a scraper.
Google this: "HTML Scraper". HTML screen-scraping tools read HTML content and output the content minus the HTML. Or, alternatively, search Stack Overflow for "Screen-scraping HTML".
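To make the "last mile" idea concrete, here is a hedged sketch of stripping in application code and then emitting XML. Everything about the data is assumed: the rows are hard-coded `(id, body)` pairs standing in for whatever your SQL Server query returns, and the `records`/`record` element names are invented. It is in Python for brevity, and the same shape should work with an XML writer in .NET.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class _Text(HTMLParser):
    """Minimal text collector: keeps text nodes, drops all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(html_text):
    parser = _Text()
    parser.feed(html_text)
    parser.close()
    return " ".join("".join(parser.parts).split())

def records_to_xml(rows):
    """Build one <record> element per (id, body) row, with HTML removed."""
    root = ET.Element("records")
    for row_id, body in rows:
        rec = ET.SubElement(root, "record", id=str(row_id))
        rec.text = strip_html(body)  # ElementTree escapes &, <, > on output
    return ET.tostring(root, encoding="unicode")

# Hypothetical rows; in practice these would come from the database query.
print(records_to_xml([(1, "<p>Fish &amp; chips</p>"), (2, "<div>Plain <i>text</i></div>")]))
```

Letting the XML library do the escaping means the stripped text can safely contain characters like `&` or `<` without breaking the output document.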