If you can target .NET Framework 3.5 or later, you don't need to scan the document on every change: just subscribe to the TextChanged event and use the TextChangedEventArgs.Changes property to get the list of changes.
Whenever you receive a TextChanged event, iterate through the Changes collection and construct a TextRange from each change's Offset, AddedLength, and RemovedLength. Expand this TextRange as needed to cover the region whose formatting must be recalculated, then do the formatting calculation and update as a separate step (in a Dispatcher.BeginInvoke callback) so you don't end up with recursive TextChanged events.
    richTextBox.TextChanged += (obj, e) =>
    {
        var document = richTextBox.Document;
        var length = document.ContentStart.GetOffsetToPosition(document.ContentEnd);
        int totalAdd = 0;
        int totalRemove = 0;
        foreach (var change in e.Changes)
        {
            // Widen the range by the running totals of the earlier changes in this event.
            var expandBy = Math.Max(totalAdd, totalRemove);
            var startIndex = change.Offset - expandBy;
            var endIndex = change.Offset + expandBy + Math.Max(change.AddedLength, change.RemovedLength);

            // Clamp both ends to the document bounds.
            startIndex = Math.Max(startIndex, 0);
            endIndex = Math.Min(endIndex, length);

            var startPointer = document.ContentStart.GetPositionAtOffset(startIndex);
            var endPointer = startPointer.GetPositionAtOffset(endIndex - startIndex);
            var range = new TextRange(startPointer, endPointer);

            // Queue the formatting so it runs after this handler has returned.
            Dispatcher.BeginInvoke(DispatcherPriority.Normal, new Action(() =>
            {
                DoParsingAndFormatting(ExpandRangeToUnitOfParsing(range));
            }));

            totalAdd += change.AddedLength;
            totalRemove += change.RemovedLength;
        }
    };
If you want to find the paragraph where a change begins or ends, you can use range.Start.Paragraph and range.End.Paragraph.
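As one possible illustration, ExpandRangeToUnitOfParsing could be built on those two properties. This is only a sketch that assumes the paragraph is your unit of parsing; the RangeExpansion class name is made up for the example, and a real parser may need a wider or narrower unit.

```csharp
using System.Windows.Documents;

static class RangeExpansion
{
    // Hypothetical helper: widens a range so that it covers every
    // paragraph it touches. Assumes paragraphs are the unit of parsing.
    public static TextRange ExpandRangeToUnitOfParsing(TextRange range)
    {
        // TextPointer.Paragraph is null when the position is not inside
        // a Paragraph, so fall back to the original position in that case.
        TextPointer start = range.Start.Paragraph?.ContentStart ?? range.Start;
        TextPointer end = range.End.Paragraph?.ContentEnd ?? range.End;
        return new TextRange(start, end);
    }
}
```

To call it unqualified, as the event handler does, you would either move the method into the window class or add a using static directive for the helper class.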
Also, in many situations it helps to store a copy of all the text in the document separately from the FlowDocument itself. Then, as changes are applied, you can update the formatting as you go without having to reread the document. Note that the text should not be stored in a single large array, but rather split into small pieces (perhaps around 1000 characters each) and accessed through a tree that organizes the pieces by index. The reason is that inserting a character at the beginning of a huge array is very expensive, because every character after it has to be shifted.
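A rough sketch of the piece-based storage might look like the following. It is deliberately simplified: the pieces live in a flat list and are located by a linear scan, not through the index tree described above, and the ChunkedText class and its members are invented for this example. It does show the main point, which is that an insert only shifts characters inside one small piece.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical text store: small pieces in a flat list (not an index tree).
sealed class ChunkedText
{
    private const int MaxChunk = 1000; // piece size suggested in the text
    private readonly List<StringBuilder> chunks =
        new List<StringBuilder> { new StringBuilder() };

    public int Length { get; private set; }

    public void Insert(int index, string text)
    {
        if (index < 0 || index > Length)
            throw new ArgumentOutOfRangeException(nameof(index));
        int i = Locate(index, out int local);
        chunks[i].Insert(local, text);
        Length += text.Length;
        // Split an oversized piece so later edits only shift a small buffer.
        while (chunks[i].Length > MaxChunk)
        {
            string tail = chunks[i].ToString(MaxChunk, chunks[i].Length - MaxChunk);
            chunks[i].Length = MaxChunk;
            chunks.Insert(i + 1, new StringBuilder(tail));
            i++;
        }
    }

    public void Remove(int index, int count)
    {
        if (index < 0 || count < 0 || index + count > Length)
            throw new ArgumentOutOfRangeException(nameof(index));
        Length -= count;
        int i = Locate(index, out int local);
        while (count > 0)
        {
            int n = Math.Min(count, chunks[i].Length - local);
            chunks[i].Remove(local, n);
            count -= n;
            // Drop pieces that became empty; otherwise move to the next one.
            if (chunks[i].Length == 0 && chunks.Count > 1) chunks.RemoveAt(i);
            else i++;
            local = 0;
        }
    }

    // Linear scan for the piece containing index; a tree keyed by
    // cumulative length would make this step logarithmic.
    private int Locate(int index, out int local)
    {
        int i = 0;
        while (i < chunks.Count - 1 && index > chunks[i].Length)
        {
            index -= chunks[i].Length;
            i++;
        }
        local = index;
        return i;
    }

    public override string ToString()
    {
        var all = new StringBuilder(Length);
        foreach (var chunk in chunks) all.Append(chunk);
        return all.ToString();
    }
}
```

In the TextChanged handler you would mirror each change into this store, calling Remove for RemovedLength and Insert for the added text, although mapping FlowDocument symbol offsets to character indexes is left out of this sketch.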