views:

454

answers:

7

As a summer project, I have decided to practice basic algorithms and learn Haskell at the same time, by implementing large chunks of CLRS in Haskell.

Things are going pretty swimmingly, so far, but since I started graph algorithms, I've noticed a lack: test input and output! For example, I've implemented Floyd-Warshall's algorithm, and it seems to operate shiny, but I can't really figure out how to easily verify if it is actually correct! Same thing with Kruskal's, etc.

It is easy enough to make test cases in Python, but for anything large enough to be interesting (and perhaps contain a border case my code doesn't deal with), I can't really verify the answer. Finding all shortest paths, or an MST, of even a 40-vertex graph can be rather gut-wrenching by hand. :)

So, to restate the question title: is there an online repository for sample algorithm input, especially graphs? The larger the samples the better, and if it has input for other algorithms (FFT, Geometric Algorithms, etc.) that'd be swell too.

Note: formatting shouldn't be an issue, unless if the input's encrypted or something :).

Thanks! -Agor

Edit (perhaps, P.S.?)... A few notes raised by the current answers:

  • I'm really looking for plaintext input and output examples. Automated testers online for contests are always appreciated, but I'd like to see what the correct output is, and they obviously don't do that.
  • Graph algorithms are indeed the focus of this question, as they are the hardest (I think) to verify (please no snark about NP-Complete problems. :P). But I hope the title and explanation hasn't dissuaded anyone for posting example repositories for, say, dynamic programminng problems, or any other relatively complex examples!

(Second edit: updated tags, realized most were pretty inept, and this question doesn't really require haskell for an answer.)

+3  A: 

Have you looked at open-source projects that implement graph-theory algorithms? They might have good test cases you can reuse. There are quite a few such projects, for example:

  1. BGL
  2. JGraphT
  3. Lemon
  4. QuickGraph
  5. annas
  6. RGL
  7. NetworkX
Tim Sylvester
Thanks for the links! Those are both incredibly encouraging and discouraging at the smae time. :) As I've explained in other comments (and, soon, in the question itself via an edit), I'm really looking for plaintext input and output examples. The NetworkX one (its always the last one I check. :)) seems to have some nice tests in the source package complete with terminal output, so that's nice. What's nice is that, if there really is no such repository of problems, I could try to use these to essentially make my own. So, +1, thanks, etc. :)
Agor
@Agor go build it!
pageman
+4  A: 

I don't know much about graph algorithms or graph test sets, so this isn't really an answer to your question. But since your implementing this is Haskell: Have you looked at QuickCheck?

In case you already know QuickCheck, this isn't relevant, but in case you don't, here's a little teaser:

QuickCheck is a testing framework for automated tests in Haskell that works by specifying properties that should hold instead of manual test cases. QuickCheck will generate a large number of random test data to see if these properties hold. If the property fails to hold, it will try to find the smallest input for which the property fails and report that error.

For example, if you've written a list-sort function and want to test if the length of the list doesn't change during sorting:

prop_lengthDoesntChange :: [Int] -> Bool
prop_lengthDoesntChange xs = length xs == length (sort xs)

And you can call run QuickCheck like so:

Test.QuickCheck> quickCheck prop_lengthDoesntChange
OK, passed 100 tests.

The 100 tests that were run are all based on the type of the property. Since prop_lengthDoesntChange takes a list of Ints, QuickCheck will generate things like empty lists, lists will negative numbers, singleton lists and larger lists.

In case of the sort function, testing whether the length doesn't change isn't sufficient enough, you'd probably also want other properties. In fact, in general you write a lot of simple properties to test a single function, but these properties are mostly small and simple.

I think QuickCheck is a good testing and debugging tool, if you're not familiar with it, you should definitely check it out.

Tom Lokhorst
That's really cool! It reminds me of the little-known programming language ATS (http://www.ats-lang.org/#simple_ats_programs). Sadly, while it may be nice to say that the MST is always size |V|-1, I don't think it is powerful enough to verify that it actually spans all vertices. :) But this resource may be quite handy later on! +1, thank you, etc.
Agor
It's certainly powerful enough. It's just a simple Haskell function, after all. Assuming you have a generator for random graphs and a `vertices` function:`prop_MST_spans_all_vertices :: Graph -> Bool``prop_MST_spans_all_vertices g = sort $ vertices g == sort $ vertices $ mst g`
Alexey Romanov
Alexey: It appears that your example simply asserts that the MST contains the same number of nodes as the original graph. Though that is quite necessary for an MST, it is not sufficient. I think QuickCheck is pretty nifty, and I'll definitely use it for the easier-to-verify algorithms (factoring, sorting), but the core issue that I'm trying to address is that there is no easy way to verify that the results of my graph algorithms are correct. While there may be an algorithmic way of asserting as such in QuickCheck, it would be just as difficult (and potentially buggy) as the original algorithm!
Agor
+11  A: 

I think Knuth has created a large collection of data for the purpose of testing algorithms against. It's called the Stanford Graphbase.

Another idea is to work on some problems from UVA or SPOJ. Just search for the ones that are graph related. You can even see which problems are best suited for a particular type of algorithm here and here.

Mike
Thanks for the links! I looked at UVA and SPOJ, as well as the (terrifying) Stanford Graphbase. The two websites seem are pretty interesting, and I've bookmarked them. The thing is, (and I'm going to clarify this with an edit to the question), I'm looking for plaintext input *and* output. While it would be great to have those robot judges say whatever algorithm I have works *correctly*, I am more concerned with being able to compare output so I can potentially debug. But some of those questions have such input and output, so its very helpful! Thanks, +1, etc.
Agor
A: 

You might want to check the problems created by the Universidad de Valladolid for the International Collegiate Programming Contest (ACM-ICPC).

All problems are available at the UVa Online Judge.

Anyone is free to register and try to solve the problems as many times as you want to. The performance is also tested if you're concerned about it. All problems are very clearly specified, with examples of inputs and its acceptable outputs.

They do not disclose any solutions, but there is a forum dedicated to the resolution of the problems. I'll consistently find tips about stuff you might be mistaking there, and you don't need to be afraid of spoilers. The only down side is that, if you give up on a problem, it might be hard to find a working solution.

I don't know if it is a problem, but the only acceptable languages are ANSI C, C++, Java, and Pascal.

jpbochi
I just realized *medikgt* already suggested UVa... :(
jpbochi
+1  A: 

Several others have suggested UVa etc, but I didn't notice anyone recommending TopCoder. The summary page for any successful solution lists all inputs and outputs used by the System Tester (generally hundreds for each problem, including randomly generated inputs, edge cases and data crafted to trip up a specific program). For example, see this page (possibly requires registration) (screenshot)

Of course, it would require a little hassle to scrape the solutions, and considerable work to implement each problem (even with the library and problem analyses), but this would also teach you a lot!

I hope this helps!

Rodrigo Queiro
+2  A: 

DIMACS, the center for discrete mathematics and theoretical computer science at Rutgers University, used to hold implementation competitions of graph algorithms. Its public ftp server still holds problem instances and example implementations (in C and fortran). The input and output are text files.

I managed to find instances of:

There may be other fields covered as well.

Yuval F
A: 

Most of ACM regionals competitions reveals contest test data (after the event). You can easily Google for lots of them.

kuszi