Short version: What's the best way to automate compatibility testing against a large number of third-party programs?
The details:
I develop a program whose core feature is interacting with a variety of music players via their respective RPC interfaces. The RPC itself typically happens either via D-Bus or via a client library specific to a particular player. Since each music player has its own unique RPC interface, my program requires special code to handle each.
Testing all this code is increasingly a problem for me. At last count there are fifteen (!) different music players my program knows how to talk to, and the interface details can vary from one version of a player to the next. Manually testing my program against the latest version of each of the players I'm trying to support, as well as a few older versions, is tedious and error-prone, so I'm looking for a way to automate this as much as possible.
The test cases themselves aren't the problem; those are just a matter of calling a sequence of functions on a player's RPC interface and checking the return values and/or asynchronous callbacks for the expected result. No, the problem is having a framework to run the tests automatically.
Here are the challenges I see:
Each player maintains persistent state, usually as dotfiles under the user's home directory. The state consists of things like the music library, playlists, etc. These files need to be reverted to a known initial state before each test. (Deleting them entirely isn't always an option, since then the GUI-based players will present a setup wizard the next time they start instead of running normally.)
Those initial states may be partially dynamic. For example, a music library will contain full paths to the music files within it, but the paths to the actual "music" files used for testing will vary from machine to machine and won't be known until runtime.
The players to test against will probably be installed under non-standard locations which will vary from system to system, in order to have multiple versions of each installed in parallel. The framework will probably need to know which player and version it's testing against before the player is started, so it can initialize the player's state files accordingly.
Since I don't have any control over development of the music players my program interacts with, I can't modify their behavior to make it easier for me to test against them.
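To make the first two challenges concrete, here is a rough sketch of the kind of state reset I have in mind. Everything in it is hypothetical: the `restore_state` name, the idea of keeping one template directory per player and version, and the autoconf-style `@NAME@` placeholders for values that are only known at runtime. It also assumes the state files that contain paths are plain text.

```python
import shutil
from pathlib import Path


def restore_state(template_dir, home_dir, substitutions):
    """Recreate home_dir from template_dir, filling in @NAME@ placeholders.

    Hypothetical helper: template_dir holds a known-good copy of one
    player's dotfiles, and substitutions maps placeholder names to the
    values for this machine (e.g. where the test "music" files live).
    """
    template_dir, home_dir = Path(template_dir), Path(home_dir)
    if home_dir.exists():
        shutil.rmtree(home_dir)  # discard whatever the previous test left behind
    shutil.copytree(template_dir, home_dir)

    for path in home_dir.rglob("*"):
        if not path.is_file():
            continue
        try:
            original = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, so treat it as binary and leave the copy alone
        text = original
        for name, value in substitutions.items():
            text = text.replace("@%s@" % name, value)
        if text != original:
            path.write_text(text, encoding="utf-8")
    return home_dir
```

The framework would then start the player with `HOME` pointing at the restored directory, which only works for players that locate their state through `$HOME`; I haven't verified that all of them do.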
What I'd like to do is set up a VM with a bunch of different players (and a bunch of different versions of each player) installed, and then be able to test my program against each of them in turn automatically. Ideally, someone else could set up their own VM and run the tests themselves, presumably only needing to tell the test framework which players are installed where.
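For the "which players are installed where" part, I picture a small per-machine file along these lines. The INI layout, the key names, the player name, and the paths are all invented for illustration; the parsing uses only the standard library's `configparser`.

```python
import configparser

# Hypothetical per-machine description of the installed players.
EXAMPLE_CONFIG = """
[exampleplayer-1.2]
player = exampleplayer
version = 1.2
executable = /opt/players/exampleplayer-1.2/bin/exampleplayer

[exampleplayer-2.0]
player = exampleplayer
version = 2.0
executable = /opt/players/exampleplayer-2.0/bin/exampleplayer
"""


def load_targets(config_text):
    """Return one dict per installed player/version, in file order."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return [
        {
            "id": section,
            "player": parser[section]["player"],
            "version": parser[section]["version"],
            "executable": parser[section]["executable"],
        }
        for section in parser.sections()
    ]


if __name__ == "__main__":
    for target in load_targets(EXAMPLE_CONFIG):
        # The framework would reset the state files for this player and
        # version here, launch the executable, and then run the test cases.
        print(target["player"], target["version"], target["executable"])
```

Because the player and version are known before anything is launched, the framework could use them to pick the right initial state for each target.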
So, what's the best way to automate compatibility testing against a large number (several dozen) of third-party programs?
In case it affects the recommendations, my program is written in Python, and I'm using GNU autotools as the build framework.