views:

82

answers:

1

This is more out of curiosity than anything else, as I'm failing to find any useful info on Google about this function (CORE::substcont)

In profiling and optimising some old, slow, XML parsing code I've found that the following regex is calling substcont 31 times for each time the line is executed, and taking a huge amount of time:

Calls: 10000 Time: 2.65s Sub calls: 320000 Time in subs: 1.15s`

  $handle =~s/(>)\s*(<)/$1\n$2/g;
  # spent  1.09s making 310000 calls to main::CORE:substcont, avg 4µs/call
  # spent  58.8ms making  10000 calls to main::CORE:subst, avg 6µs/call

Compared to the immediately preceding line:

Calls: 10000 Time: 371ms Sub calls: 30000 Time in subs: 221ms

  $handle =~s/(.*)\s*(<\?)/$1\n$2/g;
    # spent   136ms making 10000 calls to main::CORE:subst, avg 14µs/call
    # spent  84.6ms making 20000 calls to main::CORE:substcont, avg 4µs/call

The number of substcont calls is quite surprising, especially seeing as I would've thought that the second regex would be more expensive. This is, obviously, why profiling is a Good Thing ;-)

I've subsequently changed both these line to remove the unneccessary backrefs, with dramatic results for the badly-behaving line:

Calls:10000 Time: 393ms Sub calls: 10000 Time in subs: 341ms

$handle =~s/>\s*</>\n</g;
  # spent   341ms making 10000 calls to main::CORE:subst, avg 34µs/call
  • So, my question is - why should the original have been making SO many calls to substcont, and what does substcont even do in the regex engine that takes so long?
+4  A: 

substcont is Perl's internal name for the "substitution iterator". Something to do with s///. Based on what little information I have, it seems substcont is triggered when doing a backref. That is, when $1 is present. You can play with it a bit using B::Concise.

Here's the opcodes of a simple regex without a backref.

$ perl -MO=Concise,-exec -we'$foo = "foo";  $foo =~ s/(foo)/bar/ig'
1  <0> enter 
2  <;> nextstate(main 1 -e:1) v:{
3  <$> const[PV "foo"] s
4  <#> gvsv[*foo] s
5  <2> sassign vKS/2
6  <;> nextstate(main 1 -e:1) v:{
7  <#> gvsv[*foo] s
8  <$> const[PV "bar"] s
9  </> subst(/"(foo)"/) vKS
a  <@> leave[1 ref] vKP/REFC
-e syntax OK

And one with.

$ perl -MO=Concise,-exec -we'$foo = "foo";  $foo =~ s/(foo)/$1/ig'
1  <0> enter 
2  <;> nextstate(main 1 -e:1) v:{
3  <$> const[PV "foo"] s
4  <#> gvsv[*foo] s
5  <2> sassign vKS/2
6  <;> nextstate(main 1 -e:1) v:{
7  <#> gvsv[*foo] s
8  </> subst(/"(foo)"/ replstart->9) vKS
9      <#> gvsv[*1] s
a      <|> substcont(other->8) sK/1
b  <@> leave[1 ref] vKP/REFC
-e syntax OK

That's all I can offer. You may want to try Rx, mjd's old regex debugger.

Schwern
Thanks, I guess it's time to learn about perl opcodes.... ;-)
paulw1128