ansaurus

Question

Is there a regular expression for finding/replacing the common start of all lines in a chunk of text?

Answer 1

+5 A:

The ^ symbol in a regular expression matches the beginning of a line (in some engines only when multiline mode is enabled). So the substitution:

s/^\t\t//g

would remove two tabs at the beginning of each line.
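For comparison, here is a minimal Java sketch of the same substitution. The class and method names are made up for illustration. In Java, ^ only matches at the start of every line when the (?m) flag is set, so the sketch assumes the whole chunk is passed in as one string.

```java
class StripTwoTabs {
    // Removes exactly two leading tabs from every line that starts with them.
    static String stripTwoTabs(String text) {
        // (?m) makes ^ match at the start of each line, not only at the
        // start of the input.
        return text.replaceAll("(?m)^\\t\\t", "");
    }

    public static void main(String[] args) {
        String text = "\t\tfoo();\n\t\t\tbar();\n\t\tbaz();";
        System.out.println(stripTwoTabs(text));
    }
}
```

Lines that start with fewer than two tabs are left unchanged.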

David Crawshaw 2009-09-09 00:09:11

Since there is only one beginning of each line, the 'g' modifier doesn't have anything to do. It's harmless, though.

pavium 2009-09-09 00:13:33

Answer 2

+1 A:

In general (i.e. if you want to match an arbitrary prefix, not necessarily two tabs), there may or may not be a way. It depends on which regular expression engine you're using. I would imagine that maybe something roughly like this might work:

\A^(.+).*?$(?:^\1.*?$)+\z

Note that I've probably screwed up the regex syntax; just think of it as regex pseudocode of sorts (\A is beginning of string, ^ is beginning of line, $ is end of line, \z is end of string).

But this really isn't a job I would do with a regular expression. A simple character-by-character parser seems much better suited.
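A character-by-character approach might look roughly like the following Java sketch. The names commonPrefix and stripCommonPrefix are hypothetical, and the sketch assumes lines are separated by "\n" and that an empty line counts as having no prefix in common with anything.

```java
class CommonPrefix {
    // Returns the longest string that every line starts with.
    static String commonPrefix(String[] lines) {
        if (lines.length == 0) {
            return "";
        }
        String prefix = lines[0];
        for (String line : lines) {
            int max = Math.min(prefix.length(), line.length());
            int i = 0;
            // Walk both strings until they differ or one of them ends.
            while (i < max && prefix.charAt(i) == line.charAt(i)) {
                i++;
            }
            prefix = prefix.substring(0, i);
        }
        return prefix;
    }

    // Removes the common prefix from every line of the text.
    static String stripCommonPrefix(String text) {
        // The -1 limit keeps trailing empty lines instead of dropping them.
        String[] lines = text.split("\n", -1);
        String prefix = commonPrefix(lines);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(lines[i].substring(prefix.length()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripCommonPrefix("// foo\n// bar\n//   baz"));
    }
}
```

Unlike the regex attempts, this handles any prefix (not just whitespace) and needs no engine-specific features.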

David Zaslavsky 2009-09-09 00:15:11

Answer 3

+1 A:

It's absolutely possible. As everyone points out, I'd never inflict this on a real project, though.

My answer, if you're curious, is here. I tried writing it in Perl, but Perl doesn't support variable-length lookbehinds.

EDIT: Fixed it! The linked code now works. If you'd like hints, just comment -- I don't want to give it away if you want to solve it yourself, though.

ojrac 2009-09-09 00:50:10

I tried this in Java using only lookahead, and got as close as you did--my solution failed to indent the *last* line. :-/

Alan Moore 2009-09-09 01:35:19

With a lookahead, that's the best you can do: each line will delete the characters it has in common with the lines that follow it. The last line has no lines to compare with.
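To illustrate the limitation, here is a hedged Java sketch of a lookahead-only pattern. It is not the code from either poster, just one possible reconstruction, and the class and method names are made up. Each line captures its leading whitespace and the lookahead requires every later line to start with the same text.

```java
class LookaheadDedent {
    static String dedent(String text) {
        // Group 1 is this line's leading whitespace. The lookahead skips the
        // rest of the line, then demands that each following line begins
        // with group 1, all the way to the end of the input (\z). The engine
        // shortens group 1 until the lookahead succeeds.
        return text.replaceAll("(?m)^([ \\t]+)(?=.*+(?:\\n\\1.*+)*+\\z)", "");
    }

    public static void main(String[] args) {
        // The last line is indented deeper than the common prefix.
        System.out.println(dedent("  a\n    b\n  c\n    d"));
    }
}
```

In this input the common indent is two spaces, but the last line has nothing after it to compare against, so in this sketch it loses all four of its leading spaces rather than just the shared two.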

ojrac 2009-09-09 04:35:34

Answer 4

+1 A:

Not in one regex. You need to make two passes: matches() to find the longest common prefix, then replaceAll() to remove it. Here's my best solution:

import java.util.regex.*;

public class Test
{
  public static void main(String[] args) throws Exception 
  {
    String target = 
        "\t\tif(editorPart instanceof ITextEditor){\n"
      + "\t\t\tITextEditor editor = (ITextEditor)editorPart;\n"
      + "\t\t\tselection = (ITextSelection) fee.fie().fum();\n"
      + "\t\t}else if( editorPart instanceof MultiPageEditorPart){\n"
      + "\t\t\t//this would be the case for the XML editor\n"
      + "\t\t\tselection = (ITextSelection) fee.fie().foe().fum();\n"
      + "\t\t}";
    System.out.printf("%n%s%n", target);

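    // Capture the leading whitespace of the first line, then require every
    // following line to start with the same text. The engine backtracks
    // group 1 until it is a prefix shared by all lines.
    Pattern p = Pattern.compile("^(\\s+).*+(?:\n\\1.*+)*+");
    Matcher m = p.matcher(target);
    if (m.matches())
    {
      String indent = m.group(1);
      // Quote the prefix so it is matched literally in the second pass.
      String result = target.replaceAll("(?m)^" + Pattern.quote(indent), "");
      System.out.printf("%n%s%n", result);
    }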
  }
}

Of course, this assumes (as Jonathan Leffler hinted at in his comment to your question) that the target string is not part of a larger string, and you're only removing whitespace. Without those assumptions the task becomes a lot more complex.

Alan Moore 2009-09-09 01:29:49
