All your example strings satisfy the following regex:
('(''|[^'])*'|\d+)(\s*,\s*('(''|[^'])*'|\d+))*
Meaning:
( # open group 1
' # match a single quote
(''|[^'])* # match two single quotes OR a single character other than a single quote, zero or more times
' # match a single quote
| # OR
\d+ # match one or more digits
) # close group 1
( # open group 3
\s*,\s* # match a comma possibly surrounded by white space characters
( # open group 4
' # match a single quote
(''|[^'])* # match two single quotes OR a single character other than a single quote, zero or more times
' # match a single quote
| # OR
\d+ # match one or more digits
) # close group 4
)* # close group 3 and repeat it zero or more times
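The annotated breakdown above can also live in the code itself. The following is a minimal sketch, assuming you are free to compile the pattern with Java's `Pattern.COMMENTS` flag, which makes the regex engine ignore white space and `#` comments inside the pattern. The class name `CommentedRegex` and the helper `isValid` are made up for illustration.

```java
import java.util.regex.Pattern;

public class CommentedRegex {
    // One token: a quoted string (two quotes stand for a quote inside it) or a number.
    private static final String TOKEN =
          "(              # open group\n"
        + "  '            # opening quote\n"
        + "  (''|[^'])*   # two quotes or one non-quote character, zero or more times\n"
        + "  '            # closing quote\n"
        + "  |            # OR\n"
        + "  \\d+         # one or more digits\n"
        + ")              # close group\n";

    // A token followed by zero or more ", token" repetitions.
    // Pattern.COMMENTS tells the engine to skip the white space and #-comments.
    private static final Pattern LINE = Pattern.compile(
          TOKEN
        + "(              # open repetition group\n"
        + "  \\s*,\\s*    # comma, optionally surrounded by white space\n"
        + TOKEN
        + ")*             # repeat zero or more times\n",
        Pattern.COMMENTS);

    // Hypothetical helper: true if the whole line is a valid token list.
    public static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("1, 'a', 'b'"));   // well-formed line
        System.out.println(isValid("'a', 'b', 'c"));  // last string is unterminated
    }
}
```

Since white space in the pattern is ignored in this mode, the white space around the comma still has to be spelled as `\s*`, exactly as in the one-line version.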
A small demo:
import java.util.*;
import java.util.regex.*;

public class Main {

    // Returns the tokens of a valid line, or null if the line is malformed.
    public static List<String> tokens(String line) {
        // First validate the line as a whole.
        if (!line.matches("('(''|[^'])*'|\\d+)(\\s*,\\s*('(''|[^'])*'|\\d+))*")) {
            return null;
        }
        // Then collect the individual tokens: quoted strings or numbers.
        Matcher m = Pattern.compile("'(''|[^'])*+'|\\d++").matcher(line);
        List<String> tok = new ArrayList<String>();
        while (m.find()) {
            tok.add(m.group());
        }
        return tok;
    }

    public static void main(String[] args) {
        String[] tests = {
            "1, 2, 3",
            "'a', 'b', 'c'",
            "'a','b','c'",
            "1, 'a', 'b'",
            "'this''is''one string', 1, 2",
            "'''this'' is a weird one', 1, 2",
            "'''''''', 1, 2",
            /* and some invalid ones */
            "''', 1, 2",
            "1 2, 3, 4, 'aaa'",
            "'a', 'b', 'c"
        };
        for (String t : tests) {
            System.out.println(t + " --tokens()--> " + tokens(t));
        }
    }
}
Output:
1, 2, 3 --tokens()--> [1, 2, 3]
'a', 'b', 'c' --tokens()--> ['a', 'b', 'c']
'a','b','c' --tokens()--> ['a', 'b', 'c']
1, 'a', 'b' --tokens()--> [1, 'a', 'b']
'this''is''one string', 1, 2 --tokens()--> ['this''is''one string', 1, 2]
'''this'' is a weird one', 1, 2 --tokens()--> ['''this'' is a weird one', 1, 2]
'''''''', 1, 2 --tokens()--> ['''''''', 1, 2]
''', 1, 2 --tokens()--> null
1 2, 3, 4, 'aaa' --tokens()--> null
'a', 'b', 'c --tokens()--> null
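Note that the tokens still carry their surrounding quotes and any doubled quotes. If you need the plain values, something like the following sketch might do. It assumes that two single quotes inside a string stand for one literal quote (as in SQL) and that the token has already been validated by `tokens()`. The class `Unquote` and its method `unquote` are hypothetical names, not part of the demo above.

```java
public class Unquote {
    // Hypothetical helper: strips the enclosing quotes from a quoted token and
    // turns each doubled quote into a single one. Assumes the token is valid.
    public static String unquote(String token) {
        if (token.length() >= 2 && token.startsWith("'") && token.endsWith("'")) {
            String inner = token.substring(1, token.length() - 1);
            return inner.replace("''", "'");
        }
        return token; // numbers are returned unchanged
    }

    public static void main(String[] args) {
        System.out.println(unquote("'this''is''one string'")); // this'is'one string
        System.out.println(unquote("''''''''"));               // '''
        System.out.println(unquote("42"));                     // 42
    }
}
```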
But can't you simply use an existing (and proven) CSV parser instead? Ostermiller's CSV parser comes to mind.