Hi all,

I am new to Java regex.

Sorry for the long posting.

I have three requirements:

1a) I have a string that contains three occurrences of the word 'TEST', each followed by ^. I need to check whether the content between the 2nd and 3rd occurrence of ^ is blank. If it is blank/empty, I need to look further and check whether the content between the 5th and 6th occurrence of ^ is "". If it is "", it should be replaced with blank/empty. Example:

Existing string:

aaaa^ 
TEST^x^^y^z^""^cccc^bbb^ 

Expected string:

aaaa^ 
TEST^x^^y^z^^cccc^bbb^ 

1b) If the content between the 2nd and 3rd occurrence of ^ is not blank and not "", then do not change the content between the 5th and 6th occurrence.

Existing string:

TEST^p^^q^r^""^lll^mmm^ 

Expected string:

TEST^p^^q^r^""^lll^mmm^

I need to repeat this check wherever the word TEST is found.

1c) If the content between the 5th and 6th occurrence of ^ is not blank and not "", and the content between the 2nd and 3rd occurrence is blank/empty, then replace the latter with STR.

Existing string:

TEST^g^^q^r^YYY^lll^mmm^ 

Expected string:

TEST^g^STR^q^r^YYY^lll^mmm^ 

I need to accomplish all the above cases in a Java regex. I could make case 1 work based on valuable input from my previous posting in this forum, but I could not make cases 2 and 3 work.

How can I accomplish cases 2 and 3 in the same regex? (I am not sure what the regex syntax is for a "not empty" content check or for an 'OR' check.) In the non-regex world, with a plain if/else approach, I can take care of the 3 cases as follows:

if (the content between 2nd and 3rd occurence of ^ is empty) 
{ 

if(content between 5th and 6th occurence of ^ is "") 
{ 
make this content empty 
} 
else 
{ 
set the content between 2nd and 3rd occurence of ^ as STR 
} 


} 

But since I need to make this check for each line which starts with the word TEST in the String, I am leaning towards regex.

So far the regex that works for case 1 is as follows:

str.replaceAll("(TEST\\^[^^]*\\^\\^[^^]*\\^[^^]*\\^)\"\"", "$1") 
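For readers unfamiliar with the syntax, the same case 1 pattern can be spelled out piece by piece. The following is only an annotated sketch using Pattern.COMMENTS; the class and method names are made up for illustration. Inside [^^], the first ^ negates the character class and the second is a literal caret, while \\^ outside a class is a literal caret.

```java
import java.util.regex.Pattern;

class Case1aRegexDemo {
    // Same pattern as the one-liner above; COMMENTS mode ignores whitespace
    // and treats everything from # to the end of the line as a comment.
    static final Pattern CASE_1A = Pattern.compile(
          "(            # group 1: everything up to and including the 5th caret\n"
        + "  TEST \\^   # the word TEST and the 1st caret\n"
        + "  [^^]* \\^  # any run of non-caret characters, then the 2nd caret\n"
        + "  \\^        # the 3rd caret right away: nothing between 2nd and 3rd\n"
        + "  [^^]* \\^  # next field and the 4th caret\n"
        + "  [^^]* \\^  # next field and the 5th caret\n"
        + ")\n"
        + "\"\"         # two double quotes directly after the 5th caret\n",
        Pattern.COMMENTS);

    // Hypothetical helper: keeps group 1 and drops the matched quotes.
    static String blankQuotedField(String in) {
        return CASE_1A.matcher(in).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(blankQuotedField("TEST^x^^y^z^\"\"^cccc^bbb^"));
        System.out.println(blankQuotedField("TEST^p^q^r^s^\"\"^lll^mmm^"));
    }
}
```

Note that this pattern does not require a caret after the quotes, so a field that merely starts with "" would also lose them.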

For the 2nd case, I tried modifying the above regex as follows, but in vain. I tried to search for a non-empty value between the 2nd and 3rd occurrence, where I assumed *\\d0$ represents EMPTY and [\\d0$] implies NOT EMPTY:

str.replaceAll("(TEST\\^[^^]*\\^[^\\d0$]\\^[^^]*\\^[^^]*\\^)\"\"", "$1") 

Any help in coming up with a regex that takes care of the above 3 use cases is highly appreciated, as I have a deadline to meet for this task.

Any help is HIGHLY appreciated.

Thanks in advance.

A: 

Hi, I'll first try to solve your (1b) problem. I am sorry, but I think you forgot to mention what action should be taken in (1b) if the content between the 2nd and 3rd occurrence of ^ IS BLANK.

1b) If the content between the 2nd and 3rd occurrence of ^ is not blank and not "", then do not change the content between the 5th and 6th occurrence.

Shekhar
1b usecase - If it is BLANK/EMPTY, ignore.
joe robles
1b usecase - If it is BLANK/EMPTY, ignore => no changes to be done.
joe robles
+1  A: 

It looks to me as if ^ is a delimiter. So it could make life much easier if you just split the String at the delimiter and work with the resulting array:

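// FileUtils is org.apache.commons.io.FileUtils; readLines returns one String per line
List<String> lines = FileUtils.readLines(myFile, myEncoding);
List<String[]> allValues = new ArrayList<String[]>();
for (String line : lines) {
    // limit -1 keeps trailing empty fields, e.g. the one after the final ^
    allValues.add(line.split("\\^", -1));
}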

The above example shows a way to process a whole CSV-like file using Apache Commons IO.
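Building on the split idea, the three cases from the question could be sketched roughly as below. This is not the asker's or any answerer's code: the class and method names are hypothetical, and it assumes that a record starts with TEST as its first field and that any value other than empty or exactly "" counts as "not blank" for case 1c.

```java
import java.util.regex.Pattern;

class SplitRecordFixer {
    /** Applies cases 1a to 1c to one record; the delimiter is matched literally. */
    static String fixRecord(String line, char delimiter) {
        String delim = String.valueOf(delimiter);
        // Pattern.quote makes ^ or | literal; limit -1 keeps trailing empty fields
        String[] f = line.split(Pattern.quote(delim), -1);
        // f[2] sits between the 2nd and 3rd delimiter, f[5] between the 5th and 6th
        if (f.length < 7 || !f[0].equals("TEST") || !f[2].isEmpty()) {
            return line; // not a TEST record, too short, or case 1b: leave untouched
        }
        if (f[5].equals("\"\"")) {
            f[5] = "";       // case 1a: blank out the quoted-empty field
        } else if (!f[5].isEmpty()) {
            f[2] = "STR";    // case 1c (assumption: empty f[5] means no change)
        }
        return String.join(delim, f);
    }

    /** Applies fixRecord to every line of a multi-line string. */
    static String fixAll(String text, char delimiter) {
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            lines[i] = fixRecord(lines[i], delimiter);
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(fixAll("aaaa^\nTEST^x^^y^z^\"\"^cccc^bbb^", '^'));
        System.out.println(fixRecord("TEST|g||q|r|YYY|lll|mmm|", '|'));
    }
}
```

Because the delimiter is quoted rather than hand-escaped, the same method works for ^ and | without changing any pattern.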

Andreas_D
A: 

I have updated the code for your new requirement. Both ^ and | have special meaning in regex, so if your delimiter is one of these special characters, it needs to be handled more carefully. The new code is:

public class Main {

    public static void main(String[] args) {
        System.out.println(replace("TEST^x^^y^z^\"\"^cccc^bbb^", '^'));//case 1a
        System.out.println(replace("TEST^x^^y^z^\"\"Something^cccc^bbb^", '^'));//case 1a
        System.out.println(replace("TEST^x^^y^z^Something\"\"^cccc^bbb^", '^'));//case 1a
        System.out.println(replace("TEST^x^Something^y^z^\"\"^cccc^bbb^", '^'));//case 1b
        System.out.println(replace("TEST^x^^y^z^\"Something\"^cccc^bbb^", '^'));//case 1c

        System.out.println(replace("TEST|x||y|z|\"\"|cccc|bbb|", '|'));//case 1a
        System.out.println(replace("TEST|x||y|z|\"\"Something|cccc|bbb|", '|'));//case 1a
        System.out.println(replace("TEST|x||y|z|Something\"\"|cccc|bbb|", '|'));//case 1a
        System.out.println(replace("TEST|x|Something|y|z|\"\"|cccc|bbb|", '|'));//case 1b
        System.out.println(replace("TEST|x||y|z|\"Something\"|cccc|bbb|", '|'));//case 1c
    }

    /*
    private static String replace(String in) {
        String intermediateResult = in.replaceAll("(TEST\\^[^^]*\\^\\^[^^]*\\^[^^]*\\^)\"\"\\^", "$1^");
        String finalResult = intermediateResult.replaceAll(
                "(TEST\\^[^^]*\\^)(\\^[^^]*\\^[^^]*\\^([^\"\\^].*|\"[^\"].*))", "$1STR$2");
        return finalResult;
    }*/

    private static String replace(String in, char deliminator) {
        String delim = "\\"+deliminator;
        String intermediateResult = in.replaceAll(
                "(TEST" + delim +
                "[^" + delim + "]*" +
                delim + delim +
                "[^" + delim + "]*" + delim +
                "[^" + delim + "]*" + delim +
                ")\"\"" + delim,
                "$1"+deliminator);

        String finalResult = intermediateResult.replaceAll(
                "(TEST" + delim +
                "[^" + delim + "]*" 
                + delim + ")(" + delim +
                "[^" + delim + "]*" + delim +
                "[^" + delim + "]*" + delim +
                "([^\"" + delim + "].*|\"[^\"].*))", "$1STR$2");
        return finalResult;
    }
}

The output is:

TEST^x^^y^z^^cccc^bbb^
TEST^x^^y^z^""Something^cccc^bbb^
TEST^x^STR^y^z^Something""^cccc^bbb^
TEST^x^Something^y^z^""^cccc^bbb^
TEST^x^STR^y^z^"Something"^cccc^bbb^
TEST|x||y|z||cccc|bbb|
TEST|x||y|z|""Something|cccc|bbb|
TEST|x|STR|y|z|Something""|cccc|bbb|
TEST|x|Something|y|z|""|cccc|bbb|
TEST|x|STR|y|z|"Something"|cccc|bbb|
Hemang
My advice is to go to http://www.regular-expressions.info/ to learn it.
Hemang
Thank you. I hate to ask, but what is the significance of ^ versus the search character ^? It works fine when the search character is ^ but fails when it is | (I tried \\| but in vain). I need to make it work in this case as well: TEST|x||y|z||cccc|bbb|. Thanks in advance.
joe robles
I tried to make the regex work for the 1a scenario (TEST|x||||""|ccc|) with | as the search literal, but had to do the following: String intermediateResult = in.replaceAll("(TEST\\\\|[\\|\\|]*\\\\|\\\\|[\\|\\|]*\\\\|[\\|\\|]*\\\\|)\"\"\\|","$1\\|" ); It does not work if I use \\|. I do not understand why it needs \\\\| for the search literal. I wish I had enough time to get better at regex, given my deadline.
joe robles
Also, the | approach is not working when the content after the 5th occurrence is xx"". It must ignore it, but it is removing the "". In the other scenario, ""xxx, it does not change it, which is correct. My requirement is to remove "" ONLY when the content between the 5th and 6th occurrence of the search literal | is exactly "". Any help in understanding where I am going wrong is highly appreciated.
joe robles
I tried out a few things, but in vain. I hate to give up, but because of the time/deadline restrictions I would like to ask how to make the replaceAll work in all three scenarios mentioned originally when the search literal is | instead of ^. Your help is HIGHLY appreciated, as all of us at work are new to regex and there is no other help to look for, given the time constraints for this production issue. Thanks a bunch in advance.
joe robles
This is in reference to the second replaceAll you sent. I am unable to understand the significance of .*|\" in the second regex pattern ([^\"\\^].*|\"[^\"].*)). I am trying to adapt your regex pattern to | instead of ^ (as per my other requirement), but in vain. I am missing how to say: if the content after the 5th occurrence of | is something other than "" or empty/blank, replace the EMPTY content between the second and third occurrence of |. Any help is HIGHLY appreciated. Thanks!
joe robles
This is your second regex, modified to handle | instead of ^: intermediateResult.replaceAll( "(TEST\\|[\\|\\|]*\\\\|)(\\\\|[\\|\\|]*\\\\|[\\|\\|]*\\|([^\"\\^].*|\"[^\"].*))", "$1ST$2" ); Your input is appreciated. I tried a few things to tell it to check for alphanumeric characters only, with no special characters (^[A-Za-z0-9]+$), but in vain.
joe robles
Sorry to bother you with a load of messages. I modified your first regex to handle | instead of ^. Your regex works like a charm for ^. When I use my modified one, it changes "" to blank when |""| is found at any occurrence other than the 5th one. I do not understand why the same regex does not work when changed to look for |. Here is the modified one: in.replaceAll( "(TEST\\|[\\|\\|]*\\\\|[\\|\\|]*\\\\|[\\|\\|]*\\\\|[\\|\\|]*\\\\|)\"\"\\|", "$1\\|" ) Any help in making it work is appreciated.
joe robles
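On the \\| versus \\\\| question raised in the comments above: in a Java string literal, "\\|" becomes the regex \|, a literal pipe, whereas "\\\\|" becomes the regex \\|, which is a literal backslash followed by the alternation operator. Likewise, [\\|\\|]* is a character class that matches a run of pipes; the caret version relies on the negated class [^^]*, whose pipe equivalent is [^|]*. A minimal sketch of case 1a for |, with made-up class and method names:

```java
class PipeEscapingDemo {
    // Same shape as the caret regex for case 1a: \\| is a literal pipe and
    // [^|]* is "any run of characters that are not a pipe".
    static String blankQuotedFieldPipe(String in) {
        return in.replaceAll("(TEST\\|[^|]*\\|\\|[^|]*\\|[^|]*\\|)\"\"\\|", "$1|");
    }

    public static void main(String[] args) {
        // Regex \| matches a literal pipe
        System.out.println("a|b".matches("a\\|b"));
        // Regex a\\|b means "a followed by a backslash" OR "b"
        System.out.println("b".matches("a\\\\|b"));
        System.out.println(blankQuotedFieldPipe("TEST|x||y|z|\"\"|cccc|bbb|"));
        // A field that only ends in "" is left alone
        System.out.println(blankQuotedFieldPipe("TEST|x||y|z|xx\"\"|cccc|bbb|"));
    }
}
```

In the replacement string only $ and \ are special, so the pipe there needs no escaping.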
This replace-method is unreadable. You should use a single string with an "X" everywhere a delimiter goes to and then use a regex to replace it ;-)
Arian
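One way to read the placeholder suggestion is sketched below: each regex is written once with X standing in for the delimiter, and X is swapped for the escaped delimiter before use. The class name and the template constants are made up for illustration, and the sketch assumes the delimiter is a punctuation character such as ^ or |, so that prefixing it with a backslash yields a literal.

```java
import java.util.regex.Matcher;

class TemplateRegexDemo {
    // X marks every spot where the delimiter goes; TEST itself contains no X.
    private static final String CASE_1A =
            "(TESTX[^X]*XX[^X]*X[^X]*X)\"\"X";
    private static final String CASE_1C =
            "(TESTX[^X]*X)(X[^X]*X[^X]*X([^\"X].*|\"[^\"].*))";

    static String replace(String in, char delimiter) {
        // String.replace is a literal substitution, so no regex rules apply here
        String escaped = "\\" + delimiter;
        String intermediate = in.replaceAll(
                CASE_1A.replace("X", escaped),
                "$1" + Matcher.quoteReplacement(String.valueOf(delimiter)));
        return intermediate.replaceAll(CASE_1C.replace("X", escaped), "$1STR$2");
    }

    public static void main(String[] args) {
        System.out.println(replace("TEST^x^^y^z^\"\"^cccc^bbb^", '^'));
        System.out.println(replace("TEST|x||y|z|\"Something\"|cccc|bbb|", '|'));
    }
}
```

After substitution the two patterns are the same as the ones the concatenation-based method builds, so the behaviour should be unchanged; only the source becomes easier to read.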
+1  A: 

Don't use a regex to parse this; use a split, e.g.:

// String.split takes a regex, so ^ must be escaped; limit -1 keeps trailing empty fields
String[] arr = str.split("\\^", -1);

and perform the logic you describe on each appropriate item in the resulting array.

Apologies if my java syntax is not correct!

El Ronnoco