ansaurus

Question

Python : match string inside double quotes and bracket

Answer 1

A:

You want to use the groups feature of regular expressions:

import re
# Raw string so the backslashes reach the regex engine unchanged.
myRegExp = re.compile(r'"(?P<val1>.*?)".*?\((?P<val2>.*?)\)')
myRegExp.findall(YourStringHere)

Josiah 2010-07-13 04:13:23
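As a minimal sketch of how the named groups in Josiah's pattern could be read back, the snippet below iterates with finditer and pulls out val1 and val2 by name. The sample string is hypothetical and not taken from the question. Note that this pattern matches the ASCII quote character "; text that uses the curly quotes “ and ” would need those characters in the pattern instead.

```python
import re

# Same pattern as in the answer, written as a raw string.
pattern = re.compile(r'"(?P<val1>.*?)".*?\((?P<val2>.*?)\)')

# Hypothetical sample input, for illustration only.
sample = '"act" some text (first) and "adult" more text (second)'

# finditer yields match objects, so the groups can be accessed by name.
pairs = [(m.group("val1"), m.group("val2")) for m in pattern.finditer(sample)]
print(pairs)
```

findall would return the same tuples, but match objects make it clearer which group is which.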

Answer 2

+1 A:

>>> import re
>>> s = u"""“作為”(act) ，用於罪行或民事過失時，包括一連串作為、任何違法的不作為和一連串違法的不作為；
    “行政上訴委員會”(Administrative Appeals Board) 指根據《行政上訴委員會條例》(第442章)設立的行政上訴委員會；(由1994年第6號第32條增補)
    “成人”、“成年人”(adult)* 指年滿18歲的人；  (由1990年第32號第6條修訂)
    “飛機”、“航空器”(aircraft) 指任何可憑空氣的反作用而在大氣中獲得支承力的機器；
    “外籍人士”(alien) 指並非中國公民的人；  (由1998年第26號第4條增補)
    “修訂”(amend) 包括廢除、增補或更改，亦指同時進行，或以同一條例或文書進行上述全部或其中任何事項；  (由1993年第89號第3條修訂)
    “可逮捕的罪行”(arrestable offence) 指由法律規限固定刑罰的罪行，或根據、憑藉法例對犯者可處超過12個月監禁的罪行，亦指犯任何這類罪行的企圖；  (由1971年第30號第2條增補)
    “《基本法》”(Basic Law) 指《中華人民共和國香港特別行政區基本法》；  (由1998年第26號第4條增補)
    “行政長官”(Chief Executive) 指─"""
>>> for x,y in re.findall(u"“(.*?)”\((.*?)\)",s):
...     print x, y
... 
作為 act
行政上訴委員會 Administrative Appeals Board
成人”、“成年人 adult
飛機”、“航空器 aircraft
外籍人士 alien
修訂 amend
可逮捕的罪行 arrestable offence
《基本法》 Basic Law
行政長官 Chief Executive

If you want to use this in a program, you should use

# -*- coding: utf-8 -*-

at the top of the file, so the “ and ” are interpreted correctly

gnibbler 2010-07-13 04:19:54
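On Python 3, source files are decoded as UTF-8 by default, so the coding declaration matters mainly for Python 2. A sketch of an alternative that does not depend on the source encoding at all is to write the curly quotes as Unicode escapes; the sample string here is just the first term from the text above.

```python
import re

# \u201c and \u201d are the left and right curly double quotes.
# A normal (non-raw) string is used so Python converts the escapes itself,
# which means the regex backslashes have to be doubled.
pattern = re.compile("\u201c(.*?)\u201d\\((.*?)\\)")

# "\u4f5c\u70ba" is the term 作為 from the sample text.
sample = "\u201c\u4f5c\u70ba\u201d(act)"

result = pattern.findall(sample)
print(result)
```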

I prefer a greedy pattern `u'“([^”]+)”\\(([^)]+)\\)'`.

KennyTM 2010-07-13 06:46:29
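The difference between the two patterns shows up on entries that list two quoted terms. The sketch below (Python 3, using one line from the sample text) runs both: the lazy `.*?` version has to expand across ”、“ before it finds ”(, while the negated character class cannot cross a closing quote and therefore keeps only the term directly before the bracket.

```python
import re

line = "“成人”、“成年人”(adult)* 指年滿18歲的人；"

# Lazy version from the answer above.
lazy = re.compile(r"“(.*?)”\((.*?)\)")
# Negated-character-class version from the comment.
negated = re.compile(r"“([^”]+)”\(([^)]+)\)")

lazy_result = lazy.findall(line)
negated_result = negated.findall(line)

print(lazy_result)     # the first group spans both terms and the ”、“ between them
print(negated_result)  # the first group is only the last term before the bracket
```

Neither pattern returns both terms as separate values.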

I don't want ”、“ between Chinese words, thank you very much

Walapa 2010-07-13 09:35:58

Answer 3

A:

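To capture every quoted term when an entry lists several of them, you need two regexes: one that collects the whole run of quoted terms together with the English text in brackets, and one that splits that run into the individual terms.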

# Assume Python 3.x. Use u'...' instead of '...' for Python 2.x.
import re
collector_re = re.compile('((?:“[^”]+”、?)+)\\(([^)]+)\\)')
splitter_re = re.compile('“([^”]+)”')

def find_all_definitions(text):
    def_pairs = collector_re.finditer(text)
    for match in def_pairs:
        (chinese, english) = match.groups()
        terms = splitter_re.findall(chinese)
        yield (terms, english)

Usage:

text = '''“作為”(act) ，用於罪行或民事過失時，包括一連串作為、任何違法的不作為和一連串違法的不作為；
“行政上訴委員會”(Administrative Appeals Board) 指根據《行政上訴委員會條例》(第442章)設立的行政上訴委員會；(由1994年第6號第32條增補)
“成人”、“成年人”(adult)* 指年滿18歲的人； (由1990年第32號第6條修訂)
“飛機”、“航空器”(aircraft) 指任何可憑空氣的反作用而在大氣中獲得支承力的機器；
“外籍人士”(alien) 指並非中國公民的人；  (由1998年第26號第4條增補)
“修訂”(amend) 包括廢除、增補或更改，亦指同時進行，或以同一條例或文書進行上述全部或其中任何事項；  (由1993年第89號第3條修訂)
“可逮捕的罪行”(arrestable offence) 指由法律規限固定刑罰的罪行，或根據、憑藉法例對犯者可處超過12個月監禁的罪行，亦指犯任何這類罪行的企圖；  (由1971年第30號第2條增補)
“《基本法》”(Basic Law) 指《中華人民共和國香港特別行政區基本法》；  (由1998年第26號第4條增補)
“行政長官”(Chief Executive) 指─'''

for terms, english in find_all_definitions(text):
    print (', '.join(terms), "\t", english)

KennyTM 2010-07-13 07:01:15
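One possible use of the (terms, english) pairs is to flatten them into a lookup table. The sketch below repeats the two regexes so that it runs on its own; build_glossary is a hypothetical helper name that is not part of the answer, and the sample is a two-line excerpt of the text above.

```python
import re

collector_re = re.compile(r"((?:“[^”]+”、?)+)\(([^)]+)\)")
splitter_re = re.compile(r"“([^”]+)”")

def find_all_definitions(text):
    # Stage 1: grab the run of quoted terms plus the bracketed English.
    for match in collector_re.finditer(text):
        chinese, english = match.groups()
        # Stage 2: split the run into the individual terms.
        yield splitter_re.findall(chinese), english

def build_glossary(text):
    """Hypothetical helper: map each Chinese term to its English gloss."""
    glossary = {}
    for terms, english in find_all_definitions(text):
        for term in terms:
            glossary[term] = english
    return glossary

sample = "“成人”、“成年人”(adult)* 指年滿18歲的人；\n“外籍人士”(alien) 指並非中國公民的人；"
glossary = build_glossary(sample)
print(glossary)
```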

Answer 4

A:

If you want to get both Chinese phrases when there are two of them (as in adult and aircraft), you'll need to work harder. The code below is for Python 3.x.

#coding: utf8
import re
s = """“作為”(act) ，用於罪行或民事過失時，包括一連串作為、任何違法的不作為和一連串違法的不作為；
    “行政上訴委員會”(Administrative Appeals Board) 指根據《行政上訴委員會條例》(第442章)設立的行政上訴委員會；(由1994年第6號第32條增補)
    “成人”、“成年人”(adult)* 指年滿18歲的人；  (由1990年第32號第6條修訂)
    “飛機”、“航空器”(aircraft) 指任何可憑空氣的反作用而在大氣中獲得支承力的機器；
    “外籍人士”(alien) 指並非中國公民的人；  (由1998年第26號第4條增補)
    “修訂”(amend) 包括廢除、增補或更改，亦指同時進行，或以同一條例或文書進行上述全部或其中任何事項；  (由1993年第89號第3條修訂)
    “可逮捕的罪行”(arrestable offence) 指由法律規限固定刑罰的罪行，或根據、憑藉法例對犯者可處超過12個月監禁的罪行，亦指犯任何這類罪行的企圖；  (由1971年第30號第2條增補)
    “《基本法》”(Basic Law) 指《中華人民共和國香港特別行政區基本法》；  (由1998年第26號第4條增補)
    “行政長官”(Chief Executive) 指─"""
for zh1, zh2, en in re.findall(r"“([^”]*)”(?:、“([^”]*)”)?\((.*?)\)",s):
    print(ascii((zh1, zh2, en)))

resulting in:

('\u4f5c\u70ba', '', 'act')
('\u884c\u653f\u4e0a\u8a34\u59d4\u54e1\u6703', '', 'Administrative Appeals Board')
('\u6210\u4eba', '\u6210\u5e74\u4eba', 'adult')
('\u98db\u6a5f', '\u822a\u7a7a\u5668', 'aircraft')
('\u5916\u7c4d\u4eba\u58eb', '', 'alien')
('\u4fee\u8a02', '', 'amend')
('\u53ef\u902e\u6355\u7684\u7f6a\u884c', '', 'arrestable offence')
('\u300a\u57fa\u672c\u6cd5\u300b', '', 'Basic Law')
('\u884c\u653f\u9577\u5b98', '', 'Chief Executive')

John Machin 2010-07-13 07:24:04
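In the output above, the second element is an empty string whenever the optional group did not take part in the match. A small sketch of how those placeholders could be dropped, assuming a list of terms per entry is more convenient than a fixed three-tuple (the sample is a shortened excerpt of the text):

```python
import re

pattern = re.compile(r"“([^”]*)”(?:、“([^”]*)”)?\((.*?)\)")

sample = "“作為”(act) ，用於罪行或民事過失時\n“飛機”、“航空器”(aircraft) 指任何機器"

entries = []
for zh1, zh2, en in pattern.findall(sample):
    # findall reports an unmatched optional group as '', so filter it out.
    terms = [zh for zh in (zh1, zh2) if zh]
    entries.append((terms, en))

print(entries)
```

The pattern covers at most two quoted terms per entry; an entry with three or more would need the two-regex approach from Answer 3.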

Yes, it works. Thank you very much.

Walapa 2010-07-13 09:37:45
