ansaurus

Question

Python compile all non-words except dot[.]

Answer 1

A:

Python has a convenience function for that

>>> s = "ab.cd.ef.gh"
>>> s.split(".")
['ab', 'cd', 'ef', 'gh']

Kit 2010-08-11 23:34:51

And how does that help the OP with "all non-word patterns except dot"?! This splits by dot only -- poles apart from what the OP asked.

Alex Martelli 2010-08-11 23:38:07

D'oh! Poles apart indeed. Haven't had my coffee yet. Sorry about that.

Kit 2010-08-12 02:24:28

Answer 2

+2 A:

>>> import re
>>> the_string="http://hello-world.com"
>>> re.findall(r'[\w.]+',the_string)
['http', 'hello', 'world.com']

gnibbler 2010-08-11 23:34:59

Just perfect, thanks :) Could you explain this to me?

learner 2010-08-11 23:44:26

`[\w^[.]]` is for the delimiters, while `[\w.]+` is for the words, thus we call `findall`.

Satoru.Logic 2010-08-11 23:58:41

Answer 3

A:

I'm assuming that you want to split a string on all non-word patterns except a dot.

Edit: Python doesn't support the Java-style regex syntax that you are using. I'd suggest first replacing all dots with a long string, then splitting the string, then putting the dots back in.

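import re
long_str = "ABCDEFGH"  # placeholder; must not occur in the input
# the_string is your input; avoid naming it str, which shadows the builtin
the_string = the_string.replace('.', long_str)
result = re.split(r'\W', the_string)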

Then, as you use result, replace each long_str sequence with a dot again.

This is a very bad solution, but it works.
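
The three steps above can be sketched end to end as follows. The helper name `split_keep_dots` and the sample URL are assumptions for illustration, and the placeholder must consist of word characters that never occur in the input, otherwise the round trip breaks. This sketch splits on `\W+` rather than `\W` so that runs of delimiters do not produce empty strings.

```python
import re

PLACEHOLDER = "ABCDEFGH"  # assumed never to appear in the input


def split_keep_dots(text, placeholder=PLACEHOLDER):
    """Split on non-word characters while treating '.' as a word character."""
    # Step 1: hide the dots behind a word-character placeholder.
    masked = text.replace(".", placeholder)
    # Step 2: split on runs of non-word characters.
    parts = re.split(r"\W+", masked)
    # Step 3: put the dots back.
    return [part.replace(placeholder, ".") for part in parts]


print(split_keep_dots("http://hello-world.com"))
# ['http', 'hello', 'world.com']
```

A string that starts or ends with a delimiter would still yield an empty string at that end of the list.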

Dumb Guy 2010-08-11 23:36:25

Answer 4

+1 A:

A very good reference for Python's regular expression module is available here. The following should do the trick for you.

import re
re.split(r'[\w.]+', text_string)

Or,

import re
re.findall(r'[^\w.]+', text_string)

Ashish 2010-08-11 23:38:54

try `text_string="foo|bar."`

gnibbler 2010-08-11 23:41:50

@Ashish, nope: almost every special character is "disabled" within "sets" (i.e., between brackets) in a pattern, and in particular so is the vertical bar (in its "or" sense which it would have outside brackets).
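
A quick way to check this, using gnibbler's test string. This is only a sketch; the variable names `in_set`, `outside_set`, and `pieces` are made up for illustration.

```python
import re

text_string = "foo|bar."

# Inside brackets, '|' is an ordinary character, not alternation.
in_set = re.findall(r"[a|b]", "a|b")
print(in_set)        # ['a', '|', 'b']

# Outside brackets, '|' means "or".
outside_set = re.findall(r"a|b", "a|b")
print(outside_set)   # ['a', 'b']

# With a negated set, '|' is just one more delimiter; the dot is kept.
pieces = re.split(r"[^\w.]+", text_string)
print(pieces)        # ['foo', 'bar.']
```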

Alex Martelli 2010-08-11 23:44:36

Fixed it. How's it now?

Ashish 2010-08-11 23:47:35

@Alex: I remember your post along the lines of how Python became a part of Google. Excellent read.

Ashish 2010-08-12 00:21:42

Your regexes work now, but you've got them reversed. The `split` regex should be `[^\w.]+` and the `findall` regex should be `[\w.]+`.

Alan Moore 2010-08-12 00:36:21

I felt the OP wants non-words but if words are wanted then yes, they are reversed.

Ashish 2010-08-12 06:08:42

Answer 5

A:

Your Java syntax is off, to begin with. This is what you were trying for:

[\W&&[^.]]

That matches a character from the intersection of the sets described by "any non-word character" and "any character except ." But that's overkill when you can just use:

[^\w.]

...or, "any character that's not a word character or .". It's the same in Python (and in most other flavors, too), though you probably want to match one or more of the characters:

re.split(r'[^\w.]+', the_string)

But it's probably simpler to use @gnibbler's approach of matching the parts that you want to keep, not the ones you want to throw away:

re.findall(r'[\w.]+', the_string)
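
As a rough illustration of how the two calls differ, here is a small comparison. The sample strings and variable names are assumptions, not taken from the question. Both calls agree on a string with no leading or trailing delimiters, but `re.split` returns empty strings at the ends when the input starts or ends with a delimiter, while `re.findall` does not.

```python
import re

the_string = "http://hello-world.com"

# Same result either way when delimiters only sit between the parts.
split_parts = re.split(r"[^\w.]+", the_string)
found_parts = re.findall(r"[\w.]+", the_string)
print(split_parts)   # ['http', 'hello', 'world.com']
print(found_parts)   # ['http', 'hello', 'world.com']

# A string that begins and ends with delimiters shows the difference.
edgy = "--hello-world.com!"
split_edgy = re.split(r"[^\w.]+", edgy)
found_edgy = re.findall(r"[\w.]+", edgy)
print(split_edgy)    # ['', 'hello', 'world.com', '']
print(found_edgy)    # ['hello', 'world.com']
```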

Alan Moore 2010-08-12 00:31:19

Thanks, Alan, that is really helpful.

learner 2010-08-12 16:23:32
