ansaurus

Question

finding ALL cycles in a huge sparse matrix

Answer 1

A:

c) Is this actually an appropriate solution for the problem? My understanding of "elementary" cycles is that in the graph below, rather than picking out A-B-C-D-E-F it'll pick out A-B-F, B-C-D etc. but that's not the end of the world given the task.
    E
   / \
  D---F
 / \ / \
C---B---A

No. "Elementary" in the sense of Donald Johnson's paper means simple, that is, no node appears twice in the circle. That means the algorithm won't pick AFBCDBA, but will pick ABCDEF.

d) If necessary, I can simplify the problem by enforcing mutuality in relations - i.e. A-friends-with-B <==> B-friends-with-A, and if really necessary I can maybe cut down the data size, but realistically it is always going to be around the 1mil mark.

Undirected graphs have vector space of (non-simple) cycles (which has a nice basis, etc.), but I don't know if it'll help you.

z) Is this a P or NP task?

It is an enumeration problem, which, by itself, cannot be in P nor NP.

jpalecek 2010-05-30 16:33:27

Answer 2

A:

Finding every cycle does not sound reasonable. There will be exponential number of cycles, all overlapping each other.

As for P or NP, the slowest part is actually enumerating each cycle (because there can be so many of them). If there are no cycles, a linear algorithm exists.

Maybe you actually want to divide the graph in biconnected components? See http://en.wikipedia.org/wiki/Biconnected_component

Also it is almost NEVER reasonable to store graphs in matrices. It sounds nice in theory, but does not scale in practice, Use adjacency lists instead ( http://en.wikipedia.org/wiki/Adjacency_list )

extropy 2010-05-30 16:47:22

Answer 3

+2 A:

What you're doing is similar to a very well studied problem in data mining, known as association rule mining or more generally frequent itemset mining. What you can find with frequent itemset mining, is a little bit more specific than what you're doing, but also more useful.

We'll go with closed frequent itemset mining, what this will do is find all groups of friends, where everyone is friends with each other.

I'm going to say this right now, that Java can't do what you want it to do. It can't load that much memory and it's not efficient enough to process that data in any reasonable amount of time, you're going to need to use C/C++. I'd suggest using LCM which is a closed frequent itemset miner, but you're also going to need set the support pretty high, because of the amount of data that you have.

Another thing you might want to consider is reading up on Large Graph Mining, which is also a fairly large area of research, but Java isn't going to cut it. Also you're not going to be able to find all the cycles in your data (unless it is incredibly sparse), there's going to be far too many of them. They're also going to be overlapping and not very meaningful, what you're going to maybe possibly be able to find is several of the largest cycles.

Jacob Schlather 2010-05-30 17:53:51

Answer 4

A:

Based on your comment about fraud detection, perhaps you might find this useful: Cycle Space.

Basically you treat each cycle of the graph as a binary vector of dimension m (where m is the number of edges in the graph), denoting the edges in the cycle by 1 and edges not in the cycle by 0.

We define the sum operation on cycles as the binary vector addition: 1+1 = 0, 1+0 = 1, 0+0 = 0 and so (1,0,1) + (1,1,1) = (0,1,0) etc.

The cycle space consists of each those vectors which are either cycles or disjoint union of cycles. You can find a basis of this vector space by finding the Fundamental Cycles of some spanning tree of the graph.

This basis has m-n+1 (n is number of nodes in the graph) vectors and any member of the cycle space (and hence any cycle) can be written as a 0-1 linear combination of those basis vectors.

Since this is a banking graph, I am guessing (I have no clue, though) m-n+1 will be a small enough number for this to be a feasible approach.

Moron 2010-05-30 18:08:45

ansaurus

tags:

views:

answers:

finding ALL cycles in a huge sparse matrix

related questions