views:

263

answers:

4

How can I optimize the following code, which currently takes over 2 minutes to retrieve and loop through 800+ records from a pool of over 100K records, returning 6 fields per record (adds approximately 20 seconds per additional field):

<cfset dllPath="C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\System.DirectoryServices.dll" />
<cfset LDAPPath="LDAP://" & arguments.searchPath />
<cfset theLookUp=CreateObject(".NET","System.DirectoryServices.DirectoryEntry", dllPath).init(LDAPPath) />
<cfset theSearch=CreateObject(".NET","System.DirectoryServices.DirectorySearcher", dllPath).init(theLookUp) />
<cfset theSearch.Set_Filter(arguments.theFilter) />
<cfset theObject = theSearch.FindAll() />

<cfloop index="row" from="#startRow#" to="#endRow#">
   <cfset QueryAddRow(theQuery) />
   <cfloop list="#columnList#" index="col">
     <cfloop from="0" to="#theObject.Get_Item(row).Get_Properties().Get_Item(col).Get_Count()-1#" index="item">
       <cftry>
         <cfset theQuery[col][theQuery.recordCount]=ListAppend(theQuery[col][theQuery.recordCount],theObject.Get_Item(row).Get_Properties().Get_Item(col).Get_Item(item),"|") />
         <cfcatch type="any">
         </cfcatch>
        </cftry>
      </cfloop>
    </cfloop>
  </cfloop>
+2  A: 

It's been a long time since I touched CF, but I can give some hints in pseudo-code. For one thing, this expression is extremely inefficent:

#theObject.Get_Item(row).Get_Properties().Get_Item(col).Get_Count()-1#

Take the first part for example, Get_Item(row) - your code causes CF to go retrieve the row and its properties for each iteration of the #columnList# loop; and to top it all, you're doing that TWICE per iteration of columnlist (once for loop and again for the inner cfset). If you think about it, it only needs to retrieve the row for each iteration of the outer loop (from #sfstart# to #cfend). So, in pseudo-code do this:

for each row between start and end

cfset props = #theobject.get_item(row).get_properties()#

for each col in #columnlist#

cfset currentcol = #props.getitem(col)#

cfset count = #currentcol.getcount() - 1#

foreach item from 0 to #count#

cfset #currentcol.getItem(item)# etc...

Make sense? Every time you enter a loop, cache objects that will be reused in that scope (or child scopes) in a variable. That means you are only grabbing the column object once per iteration of the column loop. All variables defined in outer scopes are available in the inner scopes, as you can see in what I've done above. I know its tempting to cut and paste from previous lines, but don't. It only hurts you in the end.

hope this helps,

Oisin

x0n
The count variable/caching is not necessary - CF will cache the from/to values for cfloop. (though not for a loop statement within cfscript)
Peter Boughton
i would have thought so, but I was trying to demonstrate a point.
x0n
+1  A: 

Additionally, using a cftry block in each loop is likely slowing this down quite a bit. Unless you are expecting individual rows to fail (and you need to continue from that point), I would suggest a single try/catch block for the entire process. Try/catch is an expensive operation.

Ben Doom
Yeah, but in this instance I expect bad or missing data...so that hit I have to live with. Question: Given the other suggestions, I assume that I could bypass that by checking the itemCount at the lowest scope
Brian
again, I can't talk for CF, but in any other language with try/catch and exceptions it's not the try/catch that is expensive, it's the creation of the exception. This is because an exception contains a stack trace - that is to say, a list of all the methods that were called up until this point.
x0n
+1  A: 

How large is the list of items for the inner loop?

Switching to an array might be faster if there is a significantly large number of items.

I have implemented this alongside x0n's suggestions...

<cfset dllPath="C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\System.DirectoryServices.dll" />
<cfset LDAPPath="LDAP://" & arguments.searchPath />
<cfset theLookUp=CreateObject(".NET","System.DirectoryServices.DirectoryEntry", dllPath).init(LDAPPath) />
<cfset theSearch=CreateObject(".NET","System.DirectoryServices.DirectorySearcher", dllPath).init(theLookUp) />
<cfset theSearch.Set_Filter(arguments.theFilter) />
<cfset theObject = theSearch.FindAll() />

<cfloop index="row" from="#startRow#" to="#endRow#">

    <cfset Props = theObject.get_item(row).get_properties() />

    <cfset QueryAddRow(theQuery) />

    <cfloop list="#columnList#" index="col">

     <cfset CurrentCol = Props.getItem(col) />

     <cfset ItemArray = ArrayNew(1)/>
     <cfloop from="0" to="#CurrentCol.getcount() - 1#" index="item">
      <cftry>
       <cfset ArrayAppend( ItemArray , CurrentCol.Get_Item(item) )/>
       <cfcatch type="any">
       </cfcatch>
      </cftry>
     </cfloop>
     <cfset theQuery[col][theQuery.recordCount] = ArrayToList( ItemArray , '|' )/>

    </cfloop>

</cfloop>
Peter Boughton
Okay, so I tested this out...and I see about a 30% reduction in the time it takes to run...which seems to make senes, because We've effectively reduced one third of the calls for items that are singular. I'm definitely interested in any othe optimzations that you can think of...
Brian
A: 

I would think that you'd want to stop doing so many evaluations inside of your loops and instead use variables to hold counts, pointers to the col object and to hold your pipe-delim string until you're ready to commit to the query object. If I've done the refactoring correctly, you should notice an improvement if you use the code below:

<cfloop index="row" from="#startRow#" to="#endRow#">
<cfset QueryAddRow(theQuery) />
<cfloop list="#columnList#" index="col">
 <cfset PipedVals = "">
 <cfset theItem = theObject.Get_Item(row).Get_Properties().Get_Item(col)>
 <cfset ColCount = theItem.Get_Count()-1>
 <cfloop from="0" to="#ColCount#" index="item">
  <cftry>
  <cfset PipedVals = ListAppend(PipedVals,theItem.Get_Item(item),"|")>
  <cfcatch type="any"></cfcatch>
  </cftry>
 </cfloop>
 <cfset QuerySetCell(theQuery,col) = PipedVals>
</cfloop>

kooshmoose
Okay, so let me digest a little...That call through the object is not akin to an array, but rather a set of chained calls? [slaps forehead] No wonder it's so heavy... I'm gonna go run this code back through...I'll bring some results back in about an hour
Brian
I think that's correct since you're instantiating a .NET object you should be able to set a variable to point to an object farther down the chain thus reducing the number of times you'd have to dig through the first object, especially if all you're looking for is a count of objects.
kooshmoose
Did you compare against my slightly different solution? I doubt you'd see much of a performance difference, but it would be interesting to get some metrics on the difference between storing iterations in an array versus a string.
kooshmoose
It works out about the same...There are only two multi-value fields (memberOf and proxyAddresses), and so the load seems to spread out better... I suspect if I just retrieved the multi-values, it'd be quicker (I saw recently that array only had perf boost when randomly searching for values)
Brian
Ah, then you might receive a little more optimization by using a switch on the ColCount variable and doing the loop only when the value is NEQ 0. Otherwise, just do a QuerySetCell without first going to a string via the loop.
kooshmoose
Also, if you know with 100% certainty the names of all the multi-value fields, then you could skip setting the count and looping over items for all single item columns.
kooshmoose
Good Point about the count test...
Brian