Howdy, CFers! We've got an incredibly frustrating situation with a CF Web Services-based API that we wrote and maintain. We had an API in place for years that was stable and working happily with Ruby, PHP, and ColdFusion clients. Then this year a .NET client came along, and we found that our web service was not interoperable with statically-typed languages due to our extensive use of structs.

We eventually realized we had to rewrite the API without structs, and we've done so. It now uses scalar values, arrays, and CFCs (which get translated to SOAP complexTypes). The .NET client is happy, and we wrote proof-of-concept clients in about 6 different languages to ensure that we'd be interoperable this time around.

To our great dismay, it appears that our ColdFusion 7 servers can't serve the new API reliably. It works for about a day or so after restarting, then the clients start getting errors like:

Error: coldfusion.xml.rpc.CFCInvocationException [java.lang.ClassNotFoundException : tafkan.remote_api.pfapi.v.trunk.rsp_pf_survey_status_array]

and

java.lang.NoClassDefFoundError: tafkan/remote_api/pfapi/v/trunk/pf_unit

Restarting the CF instances is the only way to make the problem go away. A lot of time and money was put into rebuilding the API, so everyone is really at wit's end about this.

We've noticed that the WEB-INF/cfc-skeletons directories of our CF instances eventually seem to have two copies of the classes for each of the CFCs used by the API. For example:

-rw-r--r--  Feb 17 09:15 remote_api.pfapi.v.trunk.pf_datum.class
-rw-r--r--  Feb  3 12:20 tafkan.remote_api.pfapi.v.trunk.pf_datum.class
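For anyone wanting to check their own cfc-skeletons directory for the same symptom, here is a minimal Python sketch that lists class files whose names differ only by a leading package prefix, as in the pair above. The function name and the command-line usage are hypothetical, and it only compares file names; it does not inspect what CF actually loaded.

```python
import os
import sys


def find_shadowed_classes(filenames):
    """Return (short, long) pairs where `long` is `short` plus an extra
    leading package prefix, e.g. a.b.X.class vs tafkan.a.b.X.class."""
    suffix = ".class"
    names = sorted(f[:-len(suffix)] for f in filenames if f.endswith(suffix))
    pairs = []
    for short in names:
        for long_name in names:
            # Same trailing dotted path, but one copy has more packages in front.
            if long_name != short and long_name.endswith("." + short):
                pairs.append((short, long_name))
    return pairs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: pass the path to WEB-INF/cfc-skeletons as the first argument.
    for short, long_name in find_shadowed_classes(os.listdir(sys.argv[1])):
        print(short, "<->", long_name)
```

Running it against each instance's skeleton directory after a failure would show whether the doubled class files are present at the time the errors start.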

It seems like the errors are coming from a namespace or class search path problem, so we tried switching all CFC references to be fully-qualified (dot notation starting with a mapping) instead of just simple references to CFCs in the current directory. This seemed promising, but the problem came back within 24 hours.

Environment:

  • ColdFusion 7,0,2,142559 with hf702-70523, 2-instance cluster
  • Sun Java 1.4.2_13
  • Apache 2.0.52
  • CentOS 4.5 32-bit

Maybe upgrading one of these venerable pieces of software would help? Maybe upgrading just AXIS?

We need help! I'm sure that there is someone out there with more CF/AXIS/SOAP experience than us that can help us get this problem resolved. Adobe support doesn't seem to be an option, as CF7 is EOL'ed and in extended-extended support (and that just for a few more days). We will pay the right person good money to help us figure this out. If you're that person, or think you might know who they are, please contact me ASAP!

Thanks for reading this mega-post!

Leon

Update:

Thanks to all who've joined this discussion! Here's an update on where things stand at the moment.

The service just crapped out for the first time today. One of the cluster instances was still able to generate the WSDL, while the other instance said:

AXIS error
Sorry, something seems to have gone wrong... here are the details:
Exception - java.lang.NoClassDefFoundError: tafkan/remote_api/pfapi/v/trunk/rsp_pf_numeric_array
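Since one instance can keep serving the WSDL while the other fails, it may help to probe each instance separately. Below is a rough Python sketch of the classification step only: given the body returned by a WSDL request, it reports whether it looks like WSDL or contains one of the missing-class errors quoted in this post. The function name and the "definitions" check are assumptions made for illustration; fetching the body from each instance is left out.

```python
import re

# Matches the two error shapes quoted in this post, with or without
# spaces around the colon, and with "/" or "." as the package separator.
_MISSING_CLASS = re.compile(
    r"java\.lang\.(NoClassDefFoundError|ClassNotFoundException)\s*:\s*([\w./$]+)"
)


def classify_wsdl_response(body):
    """Return (status, class_name) for a response body from the WSDL URL."""
    match = _MISSING_CLASS.search(body)
    if match:
        # Normalise tafkan/remote_api/... to tafkan.remote_api....
        return ("missing_class", match.group(2).replace("/", "."))
    if "definitions" in body:
        # Heuristic only: a WSDL document has a definitions root element.
        return ("ok", None)
    return ("unknown", None)
```

Logging the result per instance every few minutes would pin down when each one starts failing, rather than waiting for client reports.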

Both cfc-skeletons directories contain a file called tafkan.remote_api.pfapi.v.trunk.rsp_pf_numeric_array.class, and did not appear to contain the otherly-named files we've sometimes seen (remote_api.pfapi.v.trunk.rsp_pf_numeric_array.class). The files in cfc-skeletons do not appear to have been modified since the servers were started yesterday.

The uptime on both instances was about 21.5 hours. I was running without JIT (-Xint).

I've now restarted both instances. They're now running on Sun Java 1.4.2_19 (instead of _13), and JIT has been re-enabled, as it clearly wasn't causing this error and things were dramatically slower without it. I've also cleared the "save class files" check boxes.

And now, we wait again...

Update 2:

The problem persists. I'm not sure what else to try at this point. Arg!

FYI, this is cross-posted at http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:60922

A: 

How are the external clients interacting with your webservice? Just via the WSDL I presume?

Is it possible that some client app, a unit test... something, anything ... has a wrong URL... has a URL to your WSDL file with the "tafkan" in it?

If I were working on it, probably the first avenue I'd look down would be figuring out what could possibly result in that problem. Is "tafkan" a valid directory in your system? Where do the .cfc files actually live on the file system, what if any mappings are there to these paths in CF Admin, and what are the URLs that people are using to access your webservice?

The key here, I believe, is getting inside CF's head and asking it, "Why would you generate, and be looking for, a class with 'tafkan' as a package?"

marc esher
Thanks, Marc. Everyone's just using the WSDL endpoint. "tafkan" is a CF mapping that points to the web root of our application (/var/www/tafkan/htdocs). The CFCs live at /var/www/tafkan/htdocs/remote_api/pfapi/v/trunk/ . I'd prefer not to list the full URLs here, but they're of the form (https://CLIENT_SITE_URL/remote/pfapi/v/trunk/pfapi.cfc/wsdl). Your suggestion about getting inside CF's head is a good one, but I just have no idea about how to go about doing it.
sbleon
Marc, I'm pretty sure that the class names are right, and that CF just stops being able to find the class after some time. The WSDL's got targetNamespace="http://trunk.v.pfapi.remote_api.tafkan" in its schema tag, so the class name tafkan.remote_api.pfapi.v.trunk.pf_unit seems right.
sbleon
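The relationship between the targetNamespace and the class name in the comment above can be sketched in Python. This assumes the usual Axis convention of reversing the dot-separated host part of the namespace to get the package; that convention is my assumption here, not something confirmed in this thread, and the function name is made up for illustration.

```python
from urllib.parse import urlparse


def namespace_to_package(target_namespace):
    """Reverse the host components of a namespace URI to get a package name.
    Assumes the namespace has the form http://a.b.c with no path."""
    host = urlparse(target_namespace).netloc
    return ".".join(reversed(host.split(".")))


package = namespace_to_package("http://trunk.v.pfapi.remote_api.tafkan")
class_name = package + ".pf_unit"
```

Under that assumption the package comes out as tafkan.remote_api.pfapi.v.trunk, which agrees with the class path in the NoClassDefFoundError quoted in the question.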
I'm at a loss, man. Maybe contact Steven Erat (talkingtree.com), who used to be a CF support engineer with Allaire, MM, and Adobe.
marc esher
+2  A: 

I've read this thread, and the CFTalk thread. My initial thoughts about workarounds appear to have been already suggested by Mark Kruger and Dave Watts. The only other workaround idea I had was to catch the error and refresh the web service stub using the Service Factory methods. (In CF8-9 there is an Admin API method to do this; I'm not sure about CF7.)

Researching the error I narrowed down possible matches to these:

http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:144821 This was a match but unresolved

http://blog.coldfusionpowered.com/?p=28 This was a very similar error, resolved by "fixing case issues" on all CFCs & invocations.

http://stackoverflow.com/questions/1288466/cold-fusion-google-adwords-business-component-error Resolved by rewriting code and removing cfcomments (I suspect that other factors were actually responsible for solving it here)

http://forums.crystaltech.com/index.php?topic=22364.0 We're getting closer now. Resolution involved mistakenly having two document roots

http://qaix.com/coldfusion/313-410-web-service-on-cfmx-6-1-jrun-suddenly-not-working-read.shtml Exact match for error message. Exact match for having CFC mapping to doc root. Resolution was to have only 1 mapping pointing to docroot, just "/". This could be the solution. In MX 6/6.1 and maybe 7, there was a default mapping for "/" pointing to docroot. If you have another mapping pointing to docroot, then I can see how this problem might arise. Check the physical paths for mappings and try the solution here, to use only the "/" mapping.

Steven Erat
Thanks, Steven, for your detailed research. I checked and there is no "/" mapping defined. There is a /tafkan, which I'm using to find my CFCs within the API code. Since you can't use a "/" in a dot-delimited object path, it seems like my only options are to use unqualified, local object paths (e.g. "pf_unit"), which didn't work (the same problem I have now), or to use "fully-qualified" paths starting with "tafkan.", which I'm already doing. I also noticed that the last reference you gave, which sounded promising, was an every-time error, not an intermittent one. Any other ideas?
sbleon