+4  Q: 

Matrices in Python

Yesterday I needed a matrix type in Python.

The obvious answer would be numpy.matrix(), but I also need the matrix to store arbitrary values of mixed types, as a list does, and numpy.matrix cannot do that. For example:

>>> numpy.matrix([[1,2,3],[4,"5",6]])
matrix([['1', '2', '3'],
        ['4', '5', '6']], 
       dtype='|S4')
>>> numpy.matrix([[1,2,3],[4,5,6]])
matrix([[1, 2, 3],
        [4, 5, 6]])

As you can see, a numpy.matrix must be homogeneous in content: if a string value is present in the initializer, every value is implicitly stored as a string. Accessing individual elements confirms this:

>>> numpy.matrix([[1,2,3],[4,"5",6]])[1,1]
'5'
>>> numpy.matrix([[1,2,3],[4,"5",6]])[1,2]
'6'

A Python list, on the other hand, accepts mixed types: it can hold an integer and a string, each keeping its own type. What I need is something like a list, but with matrix-like behavior.

So I had to implement my own type. I considered two internal representations, a list of lists and a dictionary, and both have shortcomings:

  • A list of lists requires careful synchronization of the row lengths. Swapping two rows is easy, and so is removing a row; swapping two columns is less easy.
  • A dictionary with (row, column) tuples as keys is slightly better, but you have to enforce the key limits yourself (e.g. you must not insert element (5, 5) into a 3x3 matrix), and inserting, removing, or swapping rows and columns is more complex.
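A minimal sketch of the list-of-lists option, for illustration only: the class name `Table` and its method names are hypothetical, not an existing library API. It shows the asymmetry described above, where row operations touch a single inner list while column operations have to loop over every row.

```python
class Table:
    """A list-of-lists table whose cells may hold values of any type."""

    def __init__(self, rows):
        # Copy each row so later edits never mutate the caller's lists.
        self.rows = [list(row) for row in rows]
        if len({len(row) for row in self.rows}) > 1:
            raise ValueError("all rows must have the same length")

    @property
    def shape(self):
        n_cols = len(self.rows[0]) if self.rows else 0
        return len(self.rows), n_cols

    def __getitem__(self, key):
        r, c = key
        return self.rows[r][c]

    def __setitem__(self, key, value):
        r, c = key
        self.rows[r][c] = value

    # Row operations touch a single inner list.
    def swap_rows(self, i, j):
        self.rows[i], self.rows[j] = self.rows[j], self.rows[i]

    def insert_row(self, index, values):
        if self.rows and len(values) != self.shape[1]:
            raise ValueError("row length must match the number of columns")
        self.rows.insert(index, list(values))

    def remove_row(self, index):
        del self.rows[index]

    # Column operations must visit every row to keep the lengths in sync.
    def swap_columns(self, i, j):
        for row in self.rows:
            row[i], row[j] = row[j], row[i]

    def insert_column(self, index, values):
        if len(values) != len(self.rows):
            raise ValueError("column length must match the number of rows")
        for row, value in zip(self.rows, values):
            row.insert(index, value)

    def remove_column(self, index):
        for row in self.rows:
            del row[index]


t = Table([[1, "a", 2.5], [2, "b", 3.0]])
t.swap_columns(0, 2)             # [[2.5, "a", 1], [3.0, "b", 2]]
t.insert_column(1, [True, False])
t.remove_row(0)
```

Each cell keeps its own Python type, since the inner lists are ordinary lists.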

Edit: clarification. The concrete reason I need this is that I am reading CSV files. Once I collect the values from a CSV file (strings, integers, floats), I would like to swap, remove, and insert rows and columns, and perform similar operations. That is why I need a "matrix list".

My questions are:

  • Do you know if a Python data type providing this already exists (maybe in a third-party, "non-batteries-included" library)?
  • Why is this data type not provided in the standard library? Is the interest too narrow?
  • How would you have solved this need: a dictionary, a list, or some smarter solution?
+3  A: 

I'm curious why you want this functionality; as I understand it, the main reason for having matrices in numpy is linear algebra (matrix transformations and so on).

I'm not sure what the mathematical definition of the product of a decimal and a string would be.

Internally, you'll probably want to look at sparse matrix implementations (http://www.inf.ethz.ch/personal/arbenz/pycon03_contrib.pdf). There are many ways to build them (hash, list, linked list), and each has its own advantages and drawbacks. If your matrix won't have many nulls or zeroes, you can skip the sparse implementations.

Don Werve
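To make the dictionary option (and the hash-based sparse layout mentioned above) concrete, here is a sketch under these assumptions: the hypothetical `SparseTable` class stores only the cells that were set, keyed by `(row, col)` tuples, and reports unset cells as `None`. The explicit bounds check is the "limits of your key" bookkeeping from the question, and `remove_column` shows why structural edits are awkward: every key to the right of the removed column has to be rewritten.

```python
class SparseTable:
    """Cells live in a dict keyed by (row, col); unset cells read as `default`."""

    def __init__(self, n_rows, n_cols, default=None):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.default = default
        self.cells = {}

    def _check(self, r, c):
        # A dict accepts any key, so the table limits must be enforced by hand.
        if not (0 <= r < self.n_rows and 0 <= c < self.n_cols):
            raise IndexError(f"({r}, {c}) is outside a {self.n_rows}x{self.n_cols} table")

    def __getitem__(self, key):
        self._check(*key)
        return self.cells.get(key, self.default)

    def __setitem__(self, key, value):
        self._check(*key)
        self.cells[key] = value

    def swap_columns(self, i, j):
        # Re-key only the stored cells that sit in column i or j.
        def moved(c):
            return j if c == i else i if c == j else c
        self.cells = {(r, moved(c)): v for (r, c), v in self.cells.items()}

    def remove_column(self, index):
        # Drop the column, then shift every later column one place left.
        self.cells = {
            (r, c - 1 if c > index else c): v
            for (r, c), v in self.cells.items()
            if c != index
        }
        self.n_cols -= 1


s = SparseTable(3, 3)
s[0, 0] = "a"
s[2, 1] = 5
# s[5, 5] = 1 would raise IndexError on a 3x3 table.
s.swap_columns(0, 1)
s.remove_column(0)
```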
The reason is that I need to store arbitrary values read from a CSV file, and these values can be strings, floats, or integers. The csv module lets me read a CSV file line by line, but storing it as a "matrix" is up to me.
Stefano Borini
You're not describing any kind of "matrix" at all. You're describing a list of tuples.
kquinn
Yes, it is. Unfortunately, I want to be able to perform matrix-like operations on this list of tuples: swapping rows, swapping columns, inserting/removing columns, etc.
Stefano Borini
@Stefano: just because a matrix object happens to have methods to swap/insert/remove rows and columns doesn't mean it's the data structure you should use. Matrices are for things you can multiply, add, subtract, exponentiate, compute eigenvalues and eigenvectors of, etc.
David Zaslavsky
@David: for a mathematical matrix, yes, I agree. But this is not a mathematical matrix; it's more of a table. I used the word "matrix" in its general sense rather than the mathematical one.
Stefano Borini
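Following the point in the comments that the data is really a list of tuples, column operations on such a list can be sketched with the `zip(*rows)` transposition idiom: transpose, treat the columns as rows, transpose back. The helper names `transpose`, `swap_columns` and `remove_column` are hypothetical.

```python
def transpose(table):
    # zip(*table) groups the i-th element of every row, i.e. the columns.
    return [tuple(col) for col in zip(*table)]


def swap_columns(table, i, j):
    cols = transpose(table)
    cols[i], cols[j] = cols[j], cols[i]
    return transpose(cols)


def remove_column(table, index):
    # Tuples are immutable, so build new rows without the unwanted field.
    return [row[:index] + row[index + 1:] for row in table]


rows = [(1, "a", 2.5), (2, "b", 3.0)]
swapped = swap_columns(rows, 0, 2)
trimmed = remove_column(rows, 1)
```

Note that `zip` silently truncates to the shortest row, so this assumes every row has the same length.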
+6  A: 

You can have inhomogeneous types if your dtype is object:

In [1]: m = numpy.matrix([[1, 2, 3], [4, '5', 6]], dtype=object)
In [2]: m
Out[2]: 
matrix([[1, 2, 3],
        [4, '5', 6]], dtype=object)
In [3]: m[1, 1]
Out[3]: '5'
In [4]: m[1, 2]
Out[4]: 6

I have no idea what good this does you other than fancy indexing, because, as Don pointed out, you can't do math with this matrix.

Autoplectic
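A sketch of that fancy indexing, assuming a current NumPy. It uses a plain `numpy.array` with `dtype=object` rather than `numpy.matrix`, which NumPy's documentation no longer recommends; each cell keeps its own Python type, and whole rows or columns can be swapped, deleted, or inserted.

```python
import numpy as np

# dtype=object stores references to arbitrary Python objects,
# so each cell keeps its own type (int, str, float, ...).
m = np.array([[1, 2, 3], [4, "5", 6]], dtype=object)

# Swap columns 0 and 2 with fancy indexing; the right-hand side is a copy,
# so the in-place assignment is safe.
m[:, [0, 2]] = m[:, [2, 0]]

# Swap rows 0 and 1 the same way.
m[[0, 1]] = m[[1, 0]]

# np.delete and np.insert return new arrays instead of modifying m.
without_middle = np.delete(m, 1, axis=1)             # drop column 1
with_new_col = np.insert(m, 1, ["x", "y"], axis=1)   # new column at index 1
```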
+3  A: 

Have you looked at the numpy.recarray capabilities?

For instance here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html

It's designed to allow arrays with mixed datatypes.

I don't know if an array will suit your purposes or if you really need a matrix; I haven't worked with numpy matrices. But if an array is good enough, recarray might work.

The only issue I see with recarrays (perhaps I'm wrong; I don't have real experience with them) is that he would be unable to swap columns easily.
Autoplectic
I would say that if swapping rows and columns is necessary, an array is simply not the right data structure. You need each item to be independent of the others.
David Cournapeau
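The following sketch, assuming `numpy.rec.fromrecords` with the field names passed as a comma-separated string, shows what "mixed datatypes" means for a recarray: each field (column) gets its own dtype, but every cell in that field shares it. This is the coupling raised in the comments, since a single cell cannot hold a value of a different type than its field.

```python
import numpy as np

# Each field has one fixed dtype, but different fields may differ:
# here an integer, a string, and a float column.
records = [(1, "alpha", 2.5), (2, "beta", 3.75)]
table = np.rec.fromrecords(records, names="id,name,value")

# Fields are reachable as attributes on a recarray.
ids = table.id
names = table.name

# Reordering rows is plain fancy indexing.
reversed_rows = table[[1, 0]]
```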
A: 

Have you considered the csv module for working with csv files?

Python docs for csv module

saffsd
csv provides the data row by row and returns each record only as strings. I am using csv to read the file, but then I need a smarter data type to perform the transformations I need.
Stefano Borini
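Since `csv` hands back strings, a small conversion step can restore numeric types before the rows go into whatever table structure is chosen. This is a sketch: `convert` and `read_table` are hypothetical helpers, and the int-then-float fallback is one possible policy, not something the csv module does by itself.

```python
import csv
import io


def convert(field):
    """Return field as an int or float when it parses as one, else unchanged."""
    for cast in (int, float):
        try:
            return cast(field)
        except ValueError:
            pass
    return field


def read_table(text):
    # csv.reader yields each record as a list of strings.
    reader = csv.reader(io.StringIO(text))
    return [[convert(field) for field in row] for row in reader]


table = read_table('id,name,score\n1,alice,3.5\n2,"bob",4\n')
```

Alternatively, passing `quoting=csv.QUOTE_NONNUMERIC` to `csv.reader` makes the reader convert every unquoted field to `float`; integers then come back as floats, and an unquoted non-numeric field raises `ValueError`. Note also that `float` accepts strings such as `"nan"` and `"inf"`, so `convert` would turn those into floats.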