I was wondering how stackoverflow parses all sorts of different code and identifies keywords, special characters, whitespace formatting, etc. It does this for most code I believe, and I've noticed it's even sophisticated enough to understand the relationships between everything it parses, like so:
String mystring1 = "inquotes"; //incomment
String mystring2 = "inquotes//incomment";
String mystring3 = //incomment"inquotes";
Many IDEs do this also. How is this done?
Edit: Further explaination - I am not asking about the parsing of the text, my question is, once I am past that part.. is there something like a universal XML Scheme, or cross-code format hierarchy that describes which strings are keywords, which characters denote comments, text strings, logic operators, etc. Or must I become a syntax guru for any language I wish to parse accurately?