Hi all,

I am rephrasing this question to make it, hopefully, a little more straightforward and easier to understand.

I have roughly 30 internal components that go into a single web application. That means 30 different projects, each with its own POM. I use inheritance heavily in my POMs, so one of the things they inherit is a PMD/CPD configuration to prevent code duplication.

Even though I have CPD/PMD running, it only detects duplicate code within a single project. I would like it to detect code that is duplicated across any of my projects, so that it can be refactored out. Moreover, I am looking for something that could (using the same concept/pattern) verify that none of my code duplicates code in my open source dependencies.

It would be CPD/PMD, except operating on the source jars. Scanning all projects and their dependencies for duplication would consume a large amount of memory, so for now I would just like to apply it to the internal projects. If that works, it should be relatively straightforward to scale it out.
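
To make the idea concrete, here is a minimal sketch of cross-project duplicate detection in plain Java. It is a deliberately simplified line-window approach, not CPD's actual token-based algorithm: each file is normalized (lines trimmed, blank lines and whole-line // comments dropped), a window of N consecutive lines slides over it, and any window that occurs in two or more projects is reported. The names CrossProjectDupes and sharedWindows are hypothetical, and sources are passed in as strings so the example needs no filesystem.

```java
import java.util.*;

/**
 * Simplified cross-project duplicate finder (line windows, not CPD's tokens).
 * Duplicates within a single project are ignored; only code shared between
 * two or more projects is reported.
 */
public class CrossProjectDupes {

    /** Trims each line and drops blank lines and whole-line // comments. */
    static List<String> normalize(String source) {
        List<String> out = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("//")) {
                out.add(t);
            }
        }
        return out;
    }

    /**
     * Maps every window of `window` consecutive normalized lines to the set of
     * projects containing it, then keeps only windows seen in 2+ projects.
     */
    static Map<String, Set<String>> sharedWindows(Map<String, List<String>> sourcesByProject, int window) {
        Map<String, Set<String>> seen = new HashMap<>();
        for (Map.Entry<String, List<String>> e : sourcesByProject.entrySet()) {
            for (String source : e.getValue()) {
                List<String> lines = normalize(source);
                for (int i = 0; i + window <= lines.size(); i++) {
                    String key = String.join("\n", lines.subList(i, i + window));
                    seen.computeIfAbsent(key, k -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        Map<String, Set<String>> shared = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            if (e.getValue().size() > 1) {
                shared.put(e.getKey(), e.getValue());
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        String dup = "int a = 1;\nint b = 2;\nreturn a + b;\n";
        Map<String, List<String>> projects = new HashMap<>();
        projects.put("project-a", List.of("class A {\n" + dup + "}\n"));
        projects.put("project-b", List.of("class B {\n  // copied\n" + dup + "}\n"));
        projects.put("project-c", List.of("class C { }\n"));
        // Print each shared window together with the projects it appears in.
        sharedWindows(projects, 3).forEach((code, where) ->
                System.out.println(where + ":\n" + code + "\n"));
    }
}
```

Overlapping windows are reported separately here, whereas CPD merges them into one longer match. Treating the open source dependencies as additional "projects" would cover the second half of the question, at a correspondingly higher memory cost.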

Walter

A: 

It sounds like you want to find duplicate code anywhere in your 30 projects. I can't speak for PMD; I assume you tell it to make one giant project containing all the source files from the union of the projects. But yes, this would take a lot of RAM and CPU.

Another tool that does this is the Java CloneDR. The CloneDR finds duplicate code whether it is exactly the same or close (e.g., a few edits), regardless of source code layout or intervening comments. It is pretty easy to set it up to process all the files in your set of projects.

Ira Baxter
Yes, Ira, that is correct. I have 30 projects, which I split out so I can better understand how they are related and isolate concerns a bit. Updating the POMs as frequently as I have to is a pain in the butt, but the project is much more modular: the sub-projects are much more focused and have a clear purpose.
A:

I'm not sure I got everything, but...

I'd create an aggregating module with all projects as dependencies, then use the maven-dependency-plugin and its unpack-dependencies mojo to get the sources jar of every dependency (the mojo can take a classifier as a parameter) and unpack them (perhaps into target/generated-sources/java; the build-helper-maven-plugin may help here). Finally, run pmd:cpd on the whole source base.

This may need some tweaking; I didn't test it at all.
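
The unpacking step amounts to roughly the following plain Java. This is a hypothetical stand-in for what unpack-dependencies with the sources classifier would produce, not the plugin's own code. It opens each *-sources.jar, extracts only the .java entries, and places each jar in its own subdirectory, so that two projects containing a class with the same fully qualified name do not overwrite each other (which would hide exactly the duplicates you are looking for). The class name UnpackSources and the output layout are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class UnpackSources {

    /**
     * Extracts the .java entries of each sources jar into outDir/JAR-NAME/...
     * (one subdirectory per jar) and returns the number of files written.
     */
    static int unpack(List<Path> sourceJars, Path outDir) throws IOException {
        Path root = outDir.toAbsolutePath().normalize();
        int count = 0;
        for (Path jar : sourceJars) {
            String jarName = jar.getFileName().toString().replaceFirst("\\.jar$", "");
            Path jarDir = root.resolve(jarName).normalize();
            try (ZipFile zip = new ZipFile(jar.toFile())) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = entries.nextElement();
                    if (entry.isDirectory() || !entry.getName().endsWith(".java")) {
                        continue; // skip META-INF, resources and directory entries
                    }
                    Path target = jarDir.resolve(entry.getName()).normalize();
                    if (!target.startsWith(jarDir)) {
                        // Reject "zip slip" entries such as ../../Evil.java.
                        throw new IOException("Entry escapes output dir: " + entry.getName());
                    }
                    Files.createDirectories(target.getParent());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java UnpackSources <outDir> <a-sources.jar> [<b-sources.jar> ...]
        List<Path> jars = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            jars.add(Paths.get(args[i]));
        }
        System.out.println(unpack(jars, Paths.get(args[0])) + " source files unpacked");
    }
}
```

Pointing pmd:cpd, or the standalone CPD, at the output directory then lets it compare every project's sources in a single run.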

Pascal Thivent
Pascal, thanks for your comments. I will give that a try in the future; it sounds like my best option at this point. If it works, perhaps I should contribute it back to the pmd:cpd plugin. I know most organizations don't split up projects like this, but any duplicate code is time wasted reinventing the wheel, which leads to more bugs and a bumpy ride.
Your use case is perfectly valid IMHO. Running a similarity analyzer on code spread across several modules really makes sense.
Pascal Thivent
A: 

Just run PMD:CPD as a standalone program. All it needs is a directory, and it will recurse into it. At least, it did for me: I moved all my source into one directory and ran the CPD GUI from the batch file distributed with PMD-4.2.5.
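
If the projects are checked out as ordinary directories rather than jars, the "move all source into one directory" step can be scripted. This sketch assumes the standard Maven layout (src/main/java in each project); the class name GatherSources is hypothetical. It copies each project's .java files into its own subdirectory of a single output directory and skips modules without src/main/java (such as POM-only parents), so CPD can then be pointed at that one directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GatherSources {

    /**
     * Copies every .java file under PROJECT/src/main/java into
     * OUT/PROJECT-DIR-NAME/... and returns the number of files copied.
     */
    static int gather(List<Path> projectDirs, Path outDir) throws IOException {
        int count = 0;
        for (Path project : projectDirs) {
            Path src = project.resolve("src").resolve("main").resolve("java");
            if (!Files.isDirectory(src)) {
                continue; // e.g. a POM-only parent or aggregator module
            }
            // One subdirectory per project keeps same-named classes from overwriting each other.
            Path dest = outDir.resolve(project.toAbsolutePath().normalize().getFileName().toString());
            List<Path> javaFiles;
            try (Stream<Path> walk = Files.walk(src)) {
                javaFiles = walk
                        .filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".java"))
                        .collect(Collectors.toList());
            }
            for (Path file : javaFiles) {
                Path target = dest.resolve(src.relativize(file).toString());
                Files.createDirectories(target.getParent());
                Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GatherSources <outDir> <projectDir> [<projectDir> ...]
        List<Path> projects = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            projects.add(Paths.get(args[i]));
        }
        System.out.println(gather(projects, Paths.get(args[0])) + " source files copied");
    }
}
```

Copying rather than moving leaves the original projects untouched, so the combined directory can simply be deleted and regenerated before each CPD run.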

anubistheta