Hello everyone, I have made an HTML syntax highlighter in C# and it works great, but there's one problem. It runs pretty fast while I type because it highlights line by line, but when I paste more than one line of code or open a file I have to highlight the whole file, which can take up to a minute for a file of only 150 lines. I tried highlighting just the visible lines in the RichTextBox, but then when I scroll I can't get it to highlight the newly visible text. Here is my code (note: I need to use regex so I can get the stuff between the < and > characters):

Highlight Whole File:

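    // Needs: using System; using System.Drawing; using System.Runtime.InteropServices;
    //        using System.Text.RegularExpressions; using System.Windows.Forms;

    // Win32 call used below to suspend repainting of the RichTextBox.
    [DllImport("user32.dll")]
    private static extern bool LockWindowUpdate(IntPtr hWndLock);

    // Built once instead of on every call.
    private static readonly Regex MarkupRegex = new Regex("<html>|</html>|<head.*?>|</head>|<body.*?>|</body>|<div.*?>|</div>|<span.*?>|</span>|<title.*?>|</title>|<style.*?>|</style>|<script.*?>|</script>|<link.*?/>|<meta.*?/>|<base.*?/>|<center.*?>|</center>|<a.*?>|</a>");

    public void AllMarkup()
    {
        int selectionstart = richTextBox1.SelectionStart;

        // Reset everything to black so text that is no longer a tag loses its color.
        richTextBox1.SelectAll();
        richTextBox1.SelectionColor = Color.Black;

        // Color every matched tag blue.
        foreach (Match m in MarkupRegex.Matches(richTextBox1.Text))
        {
            richTextBox1.Select(m.Index, m.Value.Length);
            richTextBox1.SelectionColor = Color.Blue;
        }

        // Put the caret back and make newly typed text black.
        richTextBox1.Select(selectionstart, 0);
        richTextBox1.SelectionColor = Color.Black;
    }

    private void pasteToolStripMenuItem_Click(object sender, EventArgs e)
    {
        try
        {
            LockWindowUpdate(richTextBox1.Handle); // stops the text from flashing
            richTextBox1.Paste();
            AllMarkup();
        }
        finally { LockWindowUpdate(IntPtr.Zero); }
    }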
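Most of that minute is probably spent in the Select/SelectionColor pairs issued for every match, each of which makes the RichEdit control update its selection and formatting, rather than in the regex itself. One way around that (a different technique from the one in the question) is to build the whole document as RTF in a single pass and assign it to the control's Rtf property once. The sketch below assumes that approach; RtfHighlighter and BuildRtf are hypothetical names, the Consolas font entry is an assumption, and the single pattern </?[A-Za-z][^>]*> stands in for the list of tag names (that list also lets <a.*?> match <abbr> and <head.*?> match <header>).

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class RtfHighlighter
{
    // Any start, end or self-closing tag; assumes no '>' inside attribute values.
    private static readonly Regex TagRegex =
        new Regex(@"</?[A-Za-z][^>]*>", RegexOptions.Compiled);

    // Builds one RTF document: tags use color 1 (blue), other text color 2 (black).
    // The Consolas font entry is an assumption; change it to match the editor.
    public static string BuildRtf(string text)
    {
        var sb = new StringBuilder();
        sb.Append(@"{\rtf1\ansi\deff0{\fonttbl{\f0 Consolas;}}");
        sb.Append(@"{\colortbl ;\red0\green0\blue255;\red0\green0\blue0;}");
        int pos = 0;
        foreach (Match m in TagRegex.Matches(text))
        {
            AppendRun(sb, 2, text, pos, m.Index - pos);  // plain text before the tag
            AppendRun(sb, 1, text, m.Index, m.Length);   // the tag itself
            pos = m.Index + m.Length;
        }
        AppendRun(sb, 2, text, pos, text.Length - pos);  // trailing plain text
        sb.Append('}');
        return sb.ToString();
    }

    // Appends text[start .. start+length) in the given color, escaping RTF specials.
    private static void AppendRun(StringBuilder sb, int color, string text, int start, int length)
    {
        if (length <= 0) return;
        sb.Append(@"\cf").Append(color).Append(' ');
        for (int i = start; i < start + length; i++)
        {
            char c = text[i];
            switch (c)
            {
                case '\\': sb.Append(@"\\"); break;
                case '{': sb.Append(@"\{"); break;
                case '}': sb.Append(@"\}"); break;
                case '\t': sb.Append(@"\tab "); break;
                case '\n': sb.Append(@"\par "); break;
                case '\r': break; // skipped; in "\r\n" the '\n' ends the paragraph
                default:
                    if (c > 127)
                        sb.Append(@"\u").Append((short)c).Append('?'); // RTF wants a signed 16-bit value
                    else
                        sb.Append(c);
                    break;
            }
        }
    }
}
```

In AllMarkup this would replace the whole loop: save richTextBox1.SelectionStart, assign richTextBox1.Rtf = RtfHighlighter.BuildRtf(richTextBox1.Text), then restore SelectionStart. Because assigning Rtf replaces all existing formatting, every color you want to keep has to come out of BuildRtf.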

I want to know if there's a better way to highlight this and make it faster, or if someone can help me make it highlight only the visible text.

Please help. :) Thanks, Tanner.
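For the visible-text route, one approach is to ask the control which characters are on screen, widen that range to whole lines, and run the regex over that slice only, so the cost is bounded by one screenful no matter how long the file is. A minimal sketch of the text-only part, assuming attribute values never contain '>'; VisibleRangeHighlighter and TagsInWindow are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VisibleRangeHighlighter
{
    // Any start, end or self-closing tag; assumes no '>' inside attribute values.
    private static readonly Regex TagRegex =
        new Regex(@"</?[A-Za-z][^>]*>", RegexOptions.Compiled);

    // Widens [first, last] to whole lines and returns the tags found in that slice.
    // Offsets in the result are absolute positions in 'text'.
    public static List<(int Start, int Length)> TagsInWindow(string text, int first, int last)
    {
        first = Math.Max(0, Math.Min(first, text.Length));
        last = Math.Max(first, Math.Min(last, text.Length));

        // Back up to the start of the line containing 'first'.
        int start = first == 0 ? 0 : text.LastIndexOf('\n', first - 1) + 1;
        // Move forward to the end of the line containing 'last'.
        int end = text.IndexOf('\n', last);
        if (end < 0) end = text.Length;

        var spans = new List<(int Start, int Length)>();
        // This Regex.Match overload only searches text[start, end); Match.Index stays absolute.
        for (Match m = TagRegex.Match(text, start, end - start); m.Success; m = m.NextMatch())
            spans.Add((m.Index, m.Length));
        return spans;
    }
}
```

To feed it, first could come from richTextBox1.GetCharIndexFromPosition(new Point(0, 0)) and last from the same call at the bottom-right corner of ClientSize; rerunning it from the RichTextBox VScroll event (and after each edit) is what picks up newly scrolled-in text. Reset the visible lines to black before coloring the returned spans blue, and check in your setup that mouse-wheel scrolling also raises VScroll.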

A:

I agree with RCIX - you'll have a hard time overall with combining Regex and HTML parsing :)

If you're going for a high-quality solution that always highlights syntax properly, you're going to need a full-blown parser. You can either use one that's already created, or you can create your own using a tool like ANTLR.
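To see where regex falls short: the question's <a.*?> pattern, applied to <a title="x > y">, stops at the first > and matches only <a title="x >. Tracking one bit of state, whether the scanner is inside a quoted attribute value, is enough to fix that case. The hand-written tokenizer below is a sketch of that idea and is much smaller than a real parser (no nesting, no error recovery, no special handling of <script> bodies); HtmlScanner, Scan and TokenKind are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public enum TokenKind { Text, Tag, Comment }

public static class HtmlScanner
{
    // Splits HTML into Text, Tag and Comment tokens (start offset + length).
    public static List<(TokenKind Kind, int Start, int Length)> Scan(string html)
    {
        var tokens = new List<(TokenKind Kind, int Start, int Length)>();
        int i = 0;
        while (i < html.Length)
        {
            int start = i;
            if (string.CompareOrdinal(html, i, "<!--", 0, 4) == 0)
            {
                // A comment runs to the next "-->" (or to the end if unterminated).
                int close = html.IndexOf("-->", i + 4, StringComparison.Ordinal);
                i = close < 0 ? html.Length : close + 3;
                tokens.Add((TokenKind.Comment, start, i - start));
            }
            else if (html[i] == '<' && i + 1 < html.Length
                     && (char.IsLetter(html[i + 1]) || html[i + 1] == '/'))
            {
                // A tag ends at the first '>' that is not inside "..." or '...'.
                char quote = '\0';
                i++;
                while (i < html.Length)
                {
                    char c = html[i++];
                    if (quote != '\0') { if (c == quote) quote = '\0'; }
                    else if (c == '"' || c == '\'') quote = c;
                    else if (c == '>') break;
                }
                tokens.Add((TokenKind.Tag, start, i - start));
            }
            else
            {
                // Text runs up to the next '<'; always consumes at least one char.
                int next = html.IndexOf('<', i + 1);
                i = next < 0 ? html.Length : next;
                tokens.Add((TokenKind.Text, start, i - start));
            }
        }
        return tokens;
    }
}
```

Each Tag or Comment token maps directly onto one coloring call, and because the scanner walks the text once, its cost grows linearly with the file.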

The creators of ANTLR have already created an HTML parser grammar. You can find it here.

If you're looking for a pre-built one, here are a few I've found:

  1. HTML Agility Pack
  2. Majestic 12 HTML Parser
  3. SGML Reader

I'm sure there are others -- this is a pretty common requirement.

Long story short, if this is anything but a simple, disposable project, I'd get a full-blown parser. Otherwise, you can continue to try and hack it with Regex.
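If you do stay with regex, a single pattern that skips over quoted attribute values and matches comments whole gets further than one alternative per tag name. A sketch with the caveats above still in force: it knows nothing about <script> contents, CDATA or malformed markup, and a bare < followed by a letter in ordinary text is taken as the start of a tag. QuoteAwareTags and Find are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuoteAwareTags
{
    // Alternative 1: a whole comment, shortest match.
    // Alternative 2: a start or end tag whose body is any run of non-quote, non-'>'
    // characters or complete "..." / '...' strings, so a quoted '>' does not end it.
    private static readonly Regex TagRegex = new Regex(
        @"<!--[\s\S]*?-->|</?[A-Za-z][A-Za-z0-9-]*(?:[^>""']|""[^""]*""|'[^']*')*>",
        RegexOptions.Compiled);

    // Returns the text of every comment or tag, in document order.
    public static List<string> Find(string html)
    {
        var result = new List<string>();
        foreach (Match m in TagRegex.Matches(html))
            result.Add(m.Value);
        return result;
    }
}
```

Each alternative inside the repeated group starts with a different kind of character, so the engine has little to backtrack over on well-formed input.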

Doug