Imagine this string:

 if(editorPart instanceof ITextEditor){
  ITextEditor editor = (ITextEditor)editorPart;
  selection = (ITextSelection) editor.getSelectionProvider().getSelection();
 }else if( editorPart instanceof MultiPageEditorPart){
  //this would be the case for the XML editor
  selection = (ITextSelection) editorPart.getEditorSite().getSelectionProvider().getSelection();
 }

I can see, visually, that the "common" start of each of these lines is two tab characters. Is there a regular expression that would remove this common start, only at the beginning of each line (including the first and last lines), so that I end up with the same string, just un-indented?

I can't simply search for "two tabs" in this case because there might be two tabs elsewhere in the text but not at the start of a line.

I've implemented this functionality with a different method, but thought it'd be a fun regex challenge, if it's possible at all.

+5  A: 

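The ^ symbol in a regular expression matches the beginning of a line (if the whole text is held in one string, most engines only do this in multiline mode, e.g. with the m modifier in Perl). So:

s/^\t\t//g

would remove two tabs at the beginning of a line.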
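
As a minimal sketch of the same idea in Java (the class and method names are made up for illustration): when the whole block sits in a single string, ^ only matches at each line start if multiline mode is switched on with (?m), and replaceAll already replaces every match, so no separate global flag is needed.

```java
public class StripTwoTabs {
    // Hypothetical helper: removes exactly two tabs from the start of every line.
    static String strip(String text) {
        // (?m) makes ^ match after each line terminator,
        // not only at the very start of the input.
        return text.replaceAll("(?m)^\\t\\t", "");
    }

    public static void main(String[] args) {
        // The two tabs before the comment are not at a line start, so they stay.
        String block = "\t\tif (a) {\n\t\t\tb();\t\t// inner tabs stay\n\t\t}";
        System.out.println(strip(block));
    }
}
```

Tabs that appear in the middle of a line are left alone, which is the case the question was worried about.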

David Crawshaw
Since there is only one beginning of each line, the 'g' modifier doesn't have anything to do. It's harmless, though.
pavium
+1  A: 

In general (i.e. if you want to match an arbitrary prefix, not necessarily two tabs), there may or may not be a way. It depends on which regular expression engine you're using. I would imagine that maybe something roughly like this might work:

\A(.+).*?$(?:\n^\1.*?$)+\z

Note that I've probably gotten some of the regex syntax wrong, so think of it as regex pseudocode of sorts (\A is beginning of string, ^ is beginning of line, $ is end of line, \z is end of string, with ^ and $ in multiline mode).

But this really isn't a job I would do with a regular expression. A simple character-by-character parser seems much better suited.
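
For what it's worth, a character-by-character version might look roughly like the sketch below. It is only an illustration under some assumptions of mine: the class and method names are invented, lines are separated by \n, and only spaces and tabs count as indentation (so a blank line means nothing is stripped).

```java
public class Unindent {
    // Hypothetical helper: strips the longest run of leading spaces/tabs
    // that every line of the text has in common.
    static String unindent(String text) {
        String[] lines = text.split("\n", -1); // -1 keeps trailing empty lines
        String prefix = null;
        for (String line : lines) {
            // Measure this line's leading whitespace.
            int end = 0;
            while (end < line.length()
                    && (line.charAt(end) == ' ' || line.charAt(end) == '\t')) {
                end++;
            }
            if (prefix == null) {
                prefix = line.substring(0, end);
                continue;
            }
            // Shrink the candidate prefix to the part this line shares with it.
            int i = 0;
            int max = Math.min(prefix.length(), end);
            while (i < max && prefix.charAt(i) == line.charAt(i)) {
                i++;
            }
            prefix = prefix.substring(0, i);
        }
        // Second pass: drop the shared prefix from every line.
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < lines.length; k++) {
            if (k > 0) out.append('\n');
            out.append(lines[k].substring(prefix.length()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unindent("\t\tif (a) {\n\t\t\tb();\n\t\t}"));
    }
}
```

Extending it to an arbitrary (non-whitespace) common prefix would mostly mean dropping the space/tab check in the first loop.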

David Zaslavsky
+1  A: 

It's absolutely possible. As everyone points out, I'd never inflict this on a real project, though.

My answer, if you're curious, is here. I tried writing it in Perl, but Perl doesn't support variable-length lookbehinds.

EDIT: Fixed it! The linked code now works. If you'd like hints, just comment -- I don't want to give it away if you want to solve it yourself, though.

ojrac
I tried this in Java using only lookahead, and got as close as you did--my solution failed to un-indent the *last* line. :-/
Alan Moore
With a lookahead, that's the best you can do: each line will delete the characters it has in common with the lines that follow it. The last line has no lines to compare with.
ojrac
+1  A: 

Not in one regex. You need to make two passes: matches() to find the longest common prefix, then replaceAll() to remove it. Here's my best solution:

import java.util.regex.*;

public class Test
{
  public static void main(String[] args) throws Exception 
  {
    String target = 
        "\t\tif(editorPart instanceof ITextEditor){\n"
      + "\t\t\tITextEditor editor = (ITextEditor)editorPart;\n"
      + "\t\t\tselection = (ITextSelection) fee.fie().fum();\n"
      + "\t\t}else if( editorPart instanceof MultiPageEditorPart){\n"
      + "\t\t\t//this would be the case for the XML editor\n"
      + "\t\t\tselection = (ITextSelection) fee.fie().foe().fum();\n"
      + "\t\t}";
    System.out.printf("%n%s%n", target);

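    // Group 1 backtracks to the longest leading whitespace shared by every line.
    Pattern p = Pattern.compile("^(\\s+).*+(?:\n\\1.*+)*+");
    Matcher m = p.matcher(target);
    if (m.matches())
    {
      String indent = m.group(1);
      // Quote the prefix so the second pass always treats it as literal text.
      String result = target.replaceAll("(?m)^" + Pattern.quote(indent), "");
      System.out.printf("%n%s%n", result);
    }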
  }
}

Of course, this assumes (as Jonathan Leffler hinted at in his comment to your question) that the target string is not part of a larger string, and you're only removing whitespace. Without those assumptions the task becomes a lot more complex.

Alan Moore