Try prefixing your regex with ^(.*?)
(match any characters from the beginning of the string, non-greedy). It will match anything at all that occurs at the start of the string, but as little as it can while still letting the rest of the regex match. That first capture group therefore grabs everything before the point where your original pattern starts matching.
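For illustration, here is a minimal Python sketch of that idea. The pattern `<b>(.*?)</b>` and the sample string are hypothetical stand-ins for your regex and input, and `re.DOTALL` is only an assumption that the leading text may span several lines.

```python
import re

# Hypothetical stand-in for "your regex": captures the text inside <b>...</b>.
original = r"<b>(.*?)</b>"

# Prefix with ^(.*?) so group 1 lazily collects everything before the match.
# re.DOTALL lets "." cross newlines in case the leading text spans lines.
prefixed = re.compile(r"^(.*?)" + original, re.DOTALL)

m = prefixed.search("intro text <b>bold</b> tail")

# The added group becomes group 1, so the original groups shift up by one.
leading = m.group(1)  # the text that the original pattern skipped
inner = m.group(2)    # what used to be group 1 of the original pattern

print(repr(leading), repr(inner))
```

Note that the prefix adds a capture group, so any group numbers you already use in your code have to be bumped by one.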
Why don't you use an HTML parser for this?
You should be using an XML parser, not regexes. XML is not a regular language, hence not easily parseable by a regular expression. Don't do it.
Never use regular expressions or basic string parsing to process XML. Every language in common usage right now has perfectly good XML support. XML is a deceptively complex standard, and it's unlikely your code will be correct in the sense that it will properly parse all well-formed XML input. Even if it does, you're wasting your time because (as just mentioned) every language in common usage has XML support. It is unprofessional to use regular expressions to parse XML.
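As a rough sketch of what "use the XML support" can look like, here is Python's standard-library `xml.etree.ElementTree`. The sample document and its element names (`note`, `to`, `body`) are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; the element names are assumptions for this sketch.
doc = "<note><to>Ann</to><body>Hello <b>world</b>!</body></note>"

root = ET.fromstring(doc)

# Text of a direct child element.
to_text = root.find("to").text

# All text inside <body>, including text in nested tags and after them.
body_text = "".join(root.find("body").itertext())

# Input that is not well-formed is rejected instead of being half-matched.
try:
    ET.fromstring("<a><b></a>")
    well_formed = True
except ET.ParseError:
    well_formed = False

print(to_text, body_text, well_formed)
```

The parser deals with nesting and reports malformed input with an error, which is exactly the part that is hard to get right with a regex.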
I think trying to parse and validate the entire text in one regular expression is likely to give you problems. The text you are parsing is not a regular language, so regular expressions are not well designed for this purpose.
Instead I would recommend that you first tokenize the input into individual tags and the text between them. You can use a simple regular expression to find single tags. This is a much simpler problem, and one that regular expressions handle quite well. Once you have tokenized the input, you can iterate over the tokens with an ordinary loop and apply formatting to the text as appropriate.
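The tokenize-then-loop approach could be sketched like this in Python. Everything specific here is an assumption for the sake of the example: the `<b>` tag, the choice to "format" bold text by upper-casing it, and the helper names `tokenize` and `render`.

```python
import re

# One token is a single tag like <b> or </b>, a run of text containing no "<",
# or a lone "<" that does not start a tag (kept as plain text).
TOKEN = re.compile(r"<[^<>]+>|[^<]+|<")

def tokenize(text):
    """Split text into a flat list of tag tokens and text tokens."""
    return TOKEN.findall(text)

def render(text):
    """Upper-case the text inside <b>...</b>; other tags are dropped."""
    out = []
    bold_depth = 0
    for tok in tokenize(text):
        is_tag = tok.startswith("<") and tok.endswith(">")
        if tok == "<b>":
            bold_depth += 1
        elif tok == "</b>":
            # Ignore a stray closing tag instead of going negative.
            bold_depth = max(0, bold_depth - 1)
        elif is_tag:
            continue  # tags this sketch does not know about are skipped
        else:
            out.append(tok.upper() if bold_depth else tok)
    return "".join(out)

tokens = tokenize("a <b>bold</b> c")
result = render("a <b>bold</b> c")
print(tokens, result)
```

The regex only has to recognise one tag at a time. The nesting state lives in the loop (here a simple depth counter), which is the part a single regular expression cannot track.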