You need to escape the special characters (like +). Also, the 'Express' bit should have a space on either side.
leppie
2008-10-06 14:59:43
When "Express" is absent, your pattern demands two spaces before the year, so the plain editions never match. Try this:
"Visual (Basic|C\+\+|Studio) (Express )?2008"
Depending on the input, it might be enough to use:
"Visual [^ ]+ (Express )?2008"
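For what it's worth, here is a quick sketch comparing the two patterns above. The sample strings (including the made-up product "Foo") are my own assumptions, not taken from the question:

```python
import re

# Illustrative inputs only; "Visual Foo 2008" is a made-up product name.
samples = [
    "Visual Studio 2008",
    "Visual C++ Express 2008",
    "Visual Basic  2008",   # two spaces before the year
    "Visual Foo 2008",
]

# Strict: only the three known product names are accepted.
strict = re.compile(r"Visual (Basic|C\+\+|Studio) (Express )?2008")
# Loose: any run of non-space characters is accepted as the product name.
loose = re.compile(r"Visual [^ ]+ (Express )?2008")

strict_hits = [bool(strict.fullmatch(s)) for s in samples]
loose_hits = [bool(loose.fullmatch(s)) for s in samples]
print(strict_hits)  # [True, True, False, False]
print(loose_hits)   # [True, True, False, True]
```

So the looser version is only safe if nothing else in your input looks like "Visual <word> 2008".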
It should be
"Visual (Basic|C\+\+|Studio)( Express)? 2008"
>>> import re
>>> repl = 'Visual Studio 2005'
>>> regexp = re.compile(r'Visual (Studio|Basic|C\+\+)( Express)? 2008')  # raw string, so \+ reaches the regex engine unchanged
>>> test1 = 'Visual Studio 2008'
>>> test2 = 'Visual Studio Express 2008'
>>> test3 = 'Visual C++ Express 2008'
>>> test4 = 'Visual C++ Express 1008'
>>> re.sub(regexp,repl,test1)
'Visual Studio 2005'
>>> re.sub(regexp,repl,test2)
'Visual Studio 2005'
>>> re.sub(regexp,repl,test3)
'Visual Studio 2005'
>>> re.sub(regexp,repl,test4)
'Visual C++ Express 1008'
Try with:
Visual (Basic|C\+\+|Studio)( Express)? 2008
That is, escape each '+' in 'C++' and put the space inside the optional "Express" group.
Since it's Python and you don't need to capture the parenthesized parts, you can use non-capturing groups:
Visual (?:Basic|C\+\+|Studio)(?: Express)? 2008
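A small sketch of the difference, assuming you only care about the overall match. Both patterns match the same text; the only change is what ends up in `groups()`:

```python
import re

capturing = re.compile(r"Visual (Basic|C\+\+|Studio)( Express)? 2008")
non_capturing = re.compile(r"Visual (?:Basic|C\+\+|Studio)(?: Express)? 2008")

text = "Visual C++ Express 2008"

# Capturing groups record the product name and the optional " Express".
cap_groups = capturing.match(text).groups()
# Non-capturing groups group the alternatives but record nothing.
noncap_groups = non_capturing.match(text).groups()

print(cap_groups)     # ('C++', ' Express')
print(noncap_groups)  # ()

# Substitution behaves the same with either pattern.
replaced = non_capturing.sub("Visual Studio 2005", text)
print(replaced)       # Visual Studio 2005
```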
This is more explicit about the whitespace (note that \s matches any whitespace character, not only a space):
Visual\s(Basic|C\+\+|Studio)(\sExpress)?\s2008
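One thing to keep in mind, shown as a rough sketch: `\s` also accepts tabs and newlines, which may or may not be what you want. The tab/newline input below is an assumption for illustration:

```python
import re

with_class = re.compile(r"Visual\s(Basic|C\+\+|Studio)(\sExpress)?\s2008")
with_space = re.compile(r"Visual (Basic|C\+\+|Studio)( Express)? 2008")

# Hypothetical input where the words are separated by a tab and a newline.
odd_spacing = "Visual\tStudio\n2008"

class_hit = bool(with_class.fullmatch(odd_spacing))
space_hit = bool(with_space.fullmatch(odd_spacing))
print(class_hit)  # True: \s matches the tab and the newline
print(space_hit)  # False: a literal space matches only a space
```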
Unless your sample input is riddled with all sorts of permutations of your keywords, you could simplify it immensely with this:
Visual .+? 2008
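A caveat with the short version, sketched on a made-up sentence: the lazy `.+?` still starts at the first "Visual" and keeps going until it finds " 2008", so it can swallow text you meant to keep. The replacement string "X" is just a placeholder:

```python
import re

lazy = re.compile(r"Visual .+? 2008")
strict = re.compile(r"Visual (?:Basic|C\+\+|Studio)(?: Express)? 2008")

# Hypothetical input mentioning two different versions on one line.
text = "Visual Studio 2005 and Visual Basic 2008"

lazy_result = lazy.sub("X", text)
strict_result = strict.sub("X", text)
print(lazy_result)    # X
print(strict_result)  # Visual Studio 2005 and X
```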
I think this should work:
/visual (studio|basic|c\+\+)? (express)?\s?2008/i
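The `/.../i` notation is not Python syntax; a rough Python equivalent, assuming you want the same pattern, passes `re.IGNORECASE` instead. Note that because the product group is optional here, this pattern also accepts a string with no product name at all:

```python
import re

# Same pattern as above, with the /i flag expressed as re.IGNORECASE.
pattern = re.compile(r"visual (studio|basic|c\+\+)? (express)?\s?2008",
                     re.IGNORECASE)

express_hit = bool(pattern.fullmatch("Visual Studio Express 2008"))
upper_hit = bool(pattern.fullmatch("VISUAL C++ 2008"))
# Two spaces and no product name: still accepted by this pattern.
empty_product_hit = bool(pattern.fullmatch("Visual  2008"))

print(express_hit)        # True
print(upper_hit)          # True
print(empty_product_hit)  # True
```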