I want to run some analysis on networked data that has multiple modes (i.e. multiple types of network nodes) and multiplex relations (i.e. multiple types of network edges).

The analysis will probably involve SNA or algorithms from graph theory, e.g. tie strength, centrality, betweenness, node distance, blocks, clusters, etc.

The source data is rather unstructured, so I should first think about how to represent, store, and retrieve it.

Here are some ideas. I would appreciate any feedback or further suggestions. :)

I know that there are already some great NoSQL databases for this kind of application, for example Neo4J and InfoGrid. But for extensibility reasons (e.g. licence, web standards...) I would prefer to use RDF to store and represent my data. The tools to use would be SESAME or JENA.

The idea of representing network/graph data with RDF is trivial. For example:

Network/Graph data

         *Alice* ----lend 100USD----> *Bob* ----- likes ----> *Skiing*

represented with RDF

         *Alice* --src--> *lend_relation* <---target--- *Bob* ---likes---> *Skiing*
                                  |
                               has_value                                   
                                 \|/
                               *100USD*  

         [Alice         src       lend_relation]
         [Bob           target    lend_relation]
         [lend_relation has_value 100USD] 
         [Bob           likes     Skiing]
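To make the mapping concrete, here is a minimal Python sketch (not Sesame or Jena code) that holds the four statements above as plain tuples and collapses the reified `lend_relation` back into a single weighted edge. The helper names `match` and `lend_edges` are hypothetical, introduced only for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# The four statements from the example, as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("Alice", "src", "lend_relation"),
    ("Bob", "target", "lend_relation"),
    ("lend_relation", "has_value", "100USD"),
    ("Bob", "likes", "Skiing"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for ts, tp, to in triples:
        if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to):
            yield (ts, tp, to)


def lend_edges(triples: List[Triple]) -> List[Tuple[str, str, str]]:
    """Collapse each reified relation into one (source, target, value) edge."""
    edges = []
    for src, _, rel in match(triples, p="src"):
        for tgt, _, _ in match(triples, p="target", o=rel):
            for _, _, value in match(triples, s=rel, p="has_value"):
                edges.append((src, tgt, value))
    return edges


if __name__ == "__main__":
    print(lend_edges(TRIPLES))
```

The intermediate relation node is what lets an edge carry attributes such as the amount, at the cost of one extra hop per edge when reading the graph back.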

However, the problem is that RDF, as well as SPARQL, lacks a graph-model perspective. It is not efficient to traverse between nodes or find the (shortest) distance with an RDF query. That has to be done with extra analysis tools, for example JUNG or JGraphT: I must first construct a subgraph by querying the RDF store and then convert it into the data model used by JUNG or JGraphT. If I want extra visualization (from neither JUNG nor JGraphT), then I must construct yet another data model for the visualization toolkit. I don't know whether that is a clear or efficient integration.
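The "query, then convert, then analyse" pipeline described here can be sketched in a few lines of Python. This is only an illustration using the standard library, not JUNG or JGraphT code. It assumes the triples have already been fetched from the RDF store, and the names `project` and `shortest_path` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed result of querying the RDF store (same data as the example).
TRIPLES: List[Triple] = [
    ("Alice", "src", "lend_relation"),
    ("Bob", "target", "lend_relation"),
    ("lend_relation", "has_value", "100USD"),
    ("Bob", "likes", "Skiing"),
]


def project(triples: List[Triple]) -> Dict[str, Set[str]]:
    """Convert triples into a directed adjacency map for graph algorithms."""
    sources: Dict[str, str] = {}
    targets: Dict[str, str] = {}
    adjacency: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == "src":
            sources[o] = s
        elif p == "target":
            targets[o] = s
        elif p == "has_value":
            continue  # an edge attribute, not a link between nodes
        else:
            adjacency.setdefault(s, set()).add(o)
    # Each reified relation becomes one direct edge: source -> target.
    for rel, src in sources.items():
        if rel in targets:
            adjacency.setdefault(src, set()).add(targets[rel])
    return adjacency


def shortest_path(adjacency: Dict[str, Set[str]],
                  start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(adjacency.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(shortest_path(project(TRIPLES), "Alice", "Skiing"))
```

The projection step is where the modelling decisions live (which predicates count as edges, which are attributes), so keeping it as one small function makes it easier to reuse for a separate visualization model.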

Thanks again for any suggestions!

+1  A: 

If you want to do network analysis of your RDF data with SPARQL, you can have a look at SPARQL 1.1 Property Paths. I believe they have already been implemented in Jena/ARQ (see ARQ - Property Paths).

Property Paths, from the new SPARQL spec, allow you to query the RDF data model by defining graph patterns that are a bit more complex than the ones you could define in SPARQL 1.0.

With this feature plus some logic at the application level you might be able to implement some interesting network analysis over your data.
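As a rough illustration, the sketch below shows what such a query might look like for the lending model in the question, and emulates its meaning in plain Python. The `http://example.org/` prefix, the second loan, and the node `Carol` are assumptions added for the example. The query string is not executed here; it would be passed to an engine such as ARQ.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical SPARQL 1.1 query: everyone reachable from Alice through one
# or more loans. "/" chains two steps, "^" walks a predicate backwards,
# and "+" means one or more repetitions.
QUERY = """
PREFIX : <http://example.org/>
SELECT ?person WHERE { :Alice (:src/^:target)+ ?person }
"""

# Assumed data: Alice lends to Bob, Bob lends to Carol.
TRIPLES: List[Triple] = [
    ("Alice", "src", "loan1"),
    ("Bob", "target", "loan1"),
    ("Bob", "src", "loan2"),
    ("Carol", "target", "loan2"),
]


def step(triples: List[Triple], node: str) -> Set[str]:
    """One hop of src/^target: node -src-> relation <-target- other."""
    relations = {o for s, p, o in triples if s == node and p == "src"}
    return {s for s, p, o in triples if p == "target" and o in relations}


def one_or_more(triples: List[Triple], start: str) -> Set[str]:
    """Emulate the '+' operator: repeat the hop until nothing new appears."""
    reached: Set[str] = set()
    frontier = {start}
    while frontier:
        found: Set[str] = set()
        for node in frontier:
            found |= step(triples, node)
        frontier = found - reached  # only expand nodes not seen before
        reached |= frontier
    return reached


if __name__ == "__main__":
    print(sorted(one_or_more(TRIPLES, "Alice")))
```

Property paths report which nodes are connected, not the route taken or its length, so measures such as distance or betweenness would still need the application-level logic mentioned above.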

msalvadores