I'm trying to match these kinds of strings

{@csm.foo.bar}

without matching any of these

{@[email protected]}
{@csm.foo.bar-42}

The regex I use is

r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)\}"

It gets dog slow if the string contains multiple matches. Why? It runs very fast if I take away the brace matching, like this

r"@csm.((?:[a-zA-Z0-9_]+\.?)+)"

but that's not what I want.

Any ideas?

Here is sample input:

<dockLayout id="popup" y="0" x="0" width="{@csm.screenWidth}" height="{@csm.screenHeight}">
  <dataNumber id="selopacity_Volt" name="selopacity_Volt" value="0" />
  <dataNumber id="selopacity_Amp" name="selopacity_Amp" value="0" />
  <animate  trigger="{@m_ds_ML.VIMPBM_BatteryVoltage.valstr}" triggerOn="*"  targetNode="selopacity_Volt"  targetAttr="value" to="1" dur="0ms" ease="in" />
  <animate  trigger="{@m_ds_ML.VIMPBM_BatteryVoltage.valstr}" triggerOn="65024" targetNode="selopacity_Volt"  targetAttr="value" to="0" dur="0ms" ease="in" />
  <animate  trigger="{@m_ds_ML.VIMPBM_BatteryCurrent.valstr}" triggerOn="*"  targetNode="selopacity_Amp" targetAttr="value" to="1" dur="0ms" ease="in" />
  <animate  trigger="{@m_ds_ML.VIMPBM_BatteryCurrent.valstr}" triggerOn="65024" targetNode="selopacity_Amp"  targetAttr="value" to="0" dur="0ms" ease="in" />
  <dockLayout id="item" width="{@csm.screenWidth}" height="{@csm.screenHeight}" depth="-1" clip="false" xmlns="http://www.tat.se/kastor/kml" >
 <dockLayout id="list_item_title" x="0" width="{@csm.screenWidth}" height="{@[email protected]_y}">
   <text id="volt_amp_text" x="0" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemUnselColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="{ItemTitle}" />   
 </dockLayout>    
 <dockLayout id="gear_layout" y="0" x="0" width="{@csm.screenWidth}" height="{@[email protected]_y}">
   <image id="battery_image" x="0" dockLayout.halign="left" dockLayout.valign="bottom" opacity="1" src="{@m_MenuModel.Gauges.VoltAmpereMeter.image}"/>
 </dockLayout>
 <!--DockLayout for Voltage Value-->
 <dockLayout id="volt_value" x="0" width="{@[email protected]_x}" height="{@[email protected]_y}">
   <text id="volt_value_text" x="0" opacity="{selopacity_Volt*selopacity_Amp}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="right" dockLayout.valign="bottom" string="{@m_ds_ML.VIMPBM_BatteryVoltage.valstr}" >  
   </text>
 </dockLayout>   
 <!--DockLayout for Voltage Unit-->
 <dockLayout id="volt_unit" x="{@[email protected]_x}" width="{@csm.screenWidth}" height="{@[email protected]_y}">
   <text id="volt_unit_text" x="0" opacity="{selopacity_Volt*selopacity_Amp}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="V" >   
   </text>
 </dockLayout>
 <!--DockLayout for Ampere Value-->
 <dockLayout id="ampere_value" x="0" width="{@[email protected]_x}" height="{@[email protected]_y}">
   <text id="ampere_value_text" x="0" opacity="{selopacity_Amp*selopacity_Volt}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="right" dockLayout.valign="bottom" string="{@m_ds_ML.VIMPBM_BatteryCurrent.valstr}" > 
   </text>
 </dockLayout>
 <!--DockLayout for Ampere Unit-->
 <dockLayout id="ampere_unit" x="{@[email protected]_x}" width="{@csm.screenWidth}" height="{@[email protected]_y}">
   <text id="ampere_unit_text" x="0" opacity="{selopacity_Amp*selopacity_Volt}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="A" >   
   </text>
 </dockLayout>
 <!--DockLayout for containing Data Not Available text-->
 <dockLayout id="no_data_textline" x="{@[email protected]_x}" width="{@csm.screenWidth}" height="{@[email protected]_y}">
   <text id="no_data_text" x="0" opacity="{1-(selopacity_Amp*selopacity_Volt)}" ellipsize="false" font="{@csm.listSelFont}" color="{@csm.itemSelColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="{text1}" >   
   </text>
 </dockLayout>
 <!--<rect id="test_rect1" x="{151-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />
     <rect id="test_rect1" x="{237-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />
     <rect id="test_rect1" x="{160-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />
     <rect id="test_rect1" x="{246-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />
     <rect id="test_rect8" x="0" y="{161-40}" width="320" height="1" opacity="1" fill="#00ff00" />
     <rect id="test_rect1" x="{109-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />-->
  </dockLayout>  
</dockLayout>
A: 

I'm not exactly a regex expert, but it might be due to the brace at the end of the match. You might try to match r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)" and just check manually whether a closing brace occurs at the end or not.

David Zaslavsky
Yes, that does give a big difference.
Pär Bohrarper
But then I can't use re.sub..
Pär Bohrarper
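
A minimal sketch of how substitution could still be done with the brace-less pattern, by walking the matches with re.finditer and checking the closing brace by hand. The helper name substitute and the replacement callback are hypothetical, and the dot after csm is escaped here, which the pattern in the answer does not do:

```python
import re

# Brace-less pattern from the answer above; the closing brace is checked by hand.
PREFIX = re.compile(r"\{@csm\.((?:[a-zA-Z0-9_]+\.?)+)")

def substitute(text, repl):
    """Replace each {@csm.a.b} reference with repl(name); leave the rest alone."""
    out = []
    pos = 0
    for m in PREFIX.finditer(text):
        end = m.end()
        # Accept the match only if the very next character is the closing brace.
        if text[end:end + 1] != "}":
            continue
        out.append(text[pos:m.start()])
        out.append(repl(m.group(1)))
        pos = end + 1  # skip past the closing brace
    out.append(text[pos:])
    return "".join(out)

sample = 'w="{@csm.foo.bar}" h="{@csm.foo.bar-42}"'
print(substitute(sample, lambda name: "<" + name + ">"))
# prints: w="<foo.bar>" h="{@csm.foo.bar-42}"
```

This keeps the fast pattern but costs a hand-written loop in place of a single re.sub call.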
+4  A: 

Can you supply a test case of a string for which the first match is "dog slow"? BTW, though I don't know if that matters to performance, there's an imprecision in the RE -- it matches any single character after the {@csm start, not just a dot; maybe a better expression (possibly faster as it doesn't make any dots "optional") could be:

r'\{@csm((?:\.\w+)+)\}'
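
A quick check of that pattern against strings like the ones in the question, as a sketch. The extra test strings ({@csmXfoo}, {@csm.foo.}) are made up for illustration. Note that the captured group now includes the leading dot, and that in Python 3 \w also matches non-ASCII word characters unless re.ASCII is passed:

```python
import re

fixed = re.compile(r"\{@csm((?:\.\w+)+)\}")

should_match = ["{@csm.foo.bar}", "{@csm.screenWidth}"]
should_not_match = ["{@csm.foo.bar-42}", "{@csmXfoo}", "{@csm.foo.}"]

for s in should_match:
    print(s, "->", fixed.fullmatch(s).group(1))   # e.g. ".foo.bar"
for s in should_not_match:
    print(s, "->", fixed.fullmatch(s))            # None

# re.sub works directly, because the closing brace is part of the pattern.
text = 'w="{@csm.foo.bar}" h="{@csm.foo.bar-42}"'
print(fixed.sub(lambda m: "<" + m.group(1)[1:] + ">", text))
# prints: w="<foo.bar>" h="{@csm.foo.bar-42}"
```

Because every repetition has to start with a literal dot, there is only one way to split a name into parts, so a failed match gives up quickly.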
Alex Martelli
More concise as well.
Bartosz Radaczyński
That fixed it! Thank you very much.
Pär Bohrarper
A: 

You probably need to give a better example of exactly what's slow. For a reasonably long string containing stuff that does and doesn't match:

import re

# Build a long string that alternates non-matching and matching references.
x = "".join('{@csm.foo.bar-%d}\n{@csm.foo.%dx.baz}\n' % (a, a)
            for a in range(10000))
mymatch = r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)\}"

for y in re.finditer(mymatch, x):
    print(y.group(0))

works fine, but if you've got a long enough string and you're searching it poorly you could have problems.
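
One input shape that does seem to trigger the slowdown, as a sketch: a long run of word characters after {@csm. followed by something other than the closing brace. In the question's pattern the inner + and the outer + can split such a run in many different ways, and when the } is not found the engine tries them all, so the work roughly doubles with each extra character. The names in the test above are short, which is why it runs fine. The test strings below are hypothetical, and timings will vary by machine:

```python
import re
import time

slow = re.compile(r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)\}")   # pattern from the question
fast = re.compile(r"\{@csm((?:\.\w+)+)\}")               # pattern from the accepted fix

def time_search(pattern, text):
    """Return (match_or_None, seconds) for a single search."""
    start = time.perf_counter()
    result = pattern.search(text)
    return result, time.perf_counter() - start

for n in (6, 10, 14, 18):
    # n word characters, then "-" where "}" was needed, so both patterns fail.
    text = "{@csm." + "a" * n + "-}"
    slow_result, slow_t = time_search(slow, text)
    fast_result, fast_t = time_search(fast, text)
    print(n, slow_result, "%.5fs" % slow_t, fast_result, "%.5fs" % fast_t)
```

Dropping the closing brace from the original pattern hides the problem because the greedy match then succeeds on the first attempt and never has to backtrack.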

Anthony Towns