I have a big csv file which lists connections between nodes in a graph. example:

0001,95784
0001,98743
0002,00082
0002,00091

So this means that node id 0001 is connected to nodes 95784 and 98743, and so on. I need to read this into a sparse matrix in NumPy. How can I do this? I am new to Python, so tutorials on this would also help.

A: 

If you want an adjacency matrix, you can do something like:

import csv
from scipy.sparse import dok_matrix

# Dictionary-of-keys format: efficient for building a sparse matrix entry by entry
S = dok_matrix((10000, 10000), dtype=bool)
with open("your_file_name") as f:
    for line in csv.reader(f):
        S[int(line[0]), int(line[1])] = True
tkerwin
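One practical note on `dok_matrix`: it is convenient for incremental construction, but arithmetic and row slicing are usually faster after converting to CSR (compressed sparse row). A minimal sketch with made-up entries:

```python
from scipy.sparse import dok_matrix

# Build incrementally in DOK format, then convert to CSR for fast operations
S = dok_matrix((5, 5), dtype=bool)
S[0, 1] = True
S[0, 2] = True
S[3, 4] = True

C = S.tocsr()   # CSR: efficient matrix-vector products and row slicing
print(C.nnz)    # number of stored (non-zero) entries -> 3
```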
A: 

You might also be interested in Networkx, a pure python network/graphing package.

From the website:

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

>>> import networkx as nx
>>> G = nx.Graph()
>>> G.add_edge(1, 2)
>>> G.add_node("spam")
>>> print(list(G.nodes()))
[1, 2, 'spam']
>>> print(list(G.edges()))
[(1, 2)]
mavnn
A: 

Example using scipy's lil_matrix (list-of-lists sparse matrix).

Row-based linked list matrix.

This contains a list (self.rows) of rows, each of which is a sorted list of column indices of non-zero elements. It also contains a list (self.data) of lists of these elements.

$ cat 1938894-simplified.csv
0,32
1,21
1,23
1,32
2,23
2,53
2,82
3,82
4,46
5,75
7,86
8,28

Code:

#!/usr/bin/env python

import csv
from scipy import sparse

rows, columns = 10, 100
matrix = sparse.lil_matrix((rows, columns))

with open('1938894-simplified.csv') as f:
    for line in csv.reader(f):
        row, column = map(int, line)
        # Proper indexed assignment keeps .rows and .data consistent
        matrix[row, column] = 1

print(matrix.rows)

Output (`matrix.rows` holds, for each of the 10 rows, the column indices of its non-zero entries; the exact repr varies by NumPy version):

[[32] [21, 23, 32] [23, 53, 82] [82] [46] [75] [] [86] [28] []]
The MYYN
Exactly what I needed. Any good resources for scipy that you can recommend?
Ankur Chauhan
I guess http://docs.scipy.org/doc/ would be a starting point.
The MYYN
One small question: the numbers in the CSV are not indices, they are IDs, i.e. the file starts with

0001001,9304045
0001001,9308122
0001001,9309097
0001001,9311042
0001001,9401139
0001001,9404151
0001001,9407087
0001001,9408099
0001001,9501030
0001001,9503124

So how do I convert these IDs to numerical indices? The IDs serve only to identify nodes, so they may be replaced by equivalent indices as long as those are unique. How do I accomplish this? I know I can just make rows and columns as big as the largest ID, but that seems wasteful, since e.g. the indices 0-1000 would never be used.
Ankur Chauhan
I understand your concern, and I assume there is no single best way to 'compress' your data to the relevant elements; it depends largely on your goal and what you want to do with the data later. E.g., you could use a 'mapping dictionary' which maps the actual IDs to smaller numerical values.
The MYYN
If you do want to 'squeeze' your indices so that they start at 0 and go up in increments of 1 to some maximum, why not (1) sort them, producing `sorted_ixs` (`sorted_ixs = ixs; sorted_ixs.sort()`), (2) `zip(sorted_ixs, range(len(sorted_ixs)))`, producing a list of pairs matching each index with a 'squeezed' index, (3) use that list as a 'translation table' from old to new indices.
Michał Marczyk
Actually this will also sort `ixs`, I think; use `sorted_ixs = ixs[:]` if you want to keep your unsorted `ixs` around.
Michał Marczyk