scipy

Vectorization of index operation for a scipy.sparse matrix

The following code runs too slowly even though everything seems to be vectorized. from numpy import * from scipy.sparse import * n = 100000; i = xrange(n); j = xrange(n); data = ones(n); A=csr_matrix((data,(i,j))); x = A[i,j] The problem seems to be that the indexing operation is implemented as a python function, and invoking A[i,...

Python to MATLAB: exporting list of strings using scipy.io

I am trying to export a list of text strings from Python to MATLAB using scipy.io. I would like to use scipy.io because my desired .mat file should include both numerical matrices (which I learned to do here) and text cell arrays. I tried: import scipy.io my_list = ['abc', 'def', 'ghi'] scipy.io.savemat('test.mat', mdict={'my_list': my...

Python Least-Squares Natural Splines

I am trying to find a numerical package which will fit a natural spline which minimizes weighted least squares. There is a package in scipy which does what I want for unnatural splines. import numpy as np import matplotlib.pyplot as plt from scipy import interpolate, randn x = np.arange(0,5,1.0/6) xs = np.arange(0,5,1.0/500) y = np....

Reordering matrix elements to reflect column and row clustering in naive python

Hello, I'm looking for a way to perform clustering separately on a matrix's rows and then on its columns, reorder the data in the matrix to reflect the clustering, and put it all together. The clustering problem is easily solvable, and so is the dendrogram creation (for example in this blog or in "Programming Collective Intelligence"). Howev...

STFT and ISTFT in Python

Is there any form of short-time Fourier transform with corresponding inverse transform built into SciPy or NumPy or whatever? There's the pyplot specgram function in matplotlib, which calls ax.specgram(), which calls mlab.specgram(), which calls _spectral_helper(): #The checks for if y is x are so that we can use the same function to ...

Numpy histogram of large arrays

I have a bunch of CSV datasets, about 10 GB in size each. I'd like to generate histograms from their columns. But it seems like the only way to do this in numpy is to first load the entire column into a numpy array and then call numpy.histogram on that array. This consumes an unnecessary amount of memory. Does numpy support online binnin...
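One possible approach, offered as a minimal sketch rather than a built-in NumPy feature: if the bin edges are fixed up front, the counts from numpy.histogram can be accumulated one chunk at a time, so only one chunk is in memory at once. The helper name chunked_histogram and the way the chunks are produced are assumptions made for illustration; in practice the chunks would come from reading the CSV file piece by piece.

```python
import numpy as np

def chunked_histogram(chunks, bin_edges):
    """Accumulate histogram counts over an iterable of 1-d chunks.

    Hypothetical helper: bin_edges must be chosen before reading the data,
    because every chunk has to be binned against the same edges.
    """
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for chunk in chunks:
        chunk_counts, _ = np.histogram(chunk, bins=bin_edges)
        counts += chunk_counts
    return counts

# Stand-in data: an in-memory array split into pieces to mimic chunked reads.
rng = np.random.default_rng(0)
data = rng.normal(size=10000)
edges = np.linspace(-4.0, 4.0, 41)
counts = chunked_histogram(np.array_split(data, 7), edges)
```

Values that fall outside the chosen edges are ignored, just as they are when numpy.histogram is called once on the full array with the same edges.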

Calculate Matrix Rank using scipy

I'd like to calculate the mathematical rank of a matrix using scipy. The most obvious function, numpy.rank, calculates the number of dimensions of an array (i.e. scalars have dimension 0, vectors 1, matrices 2, etc.). I am aware that the numpy.linalg.lstsq function has this capability, but I was wondering if such a fundamental operation is built into ...
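For reference, a short sketch of two ways to get the mathematical rank with NumPy: numpy.linalg.matrix_rank, and the equivalent count of singular values above a tolerance. The example matrix and the tolerance formula are assumptions chosen for illustration; the tolerance mirrors a common convention and may need adjusting for badly scaled data.

```python
import numpy as np

# Example matrix: the second row is twice the first, so the rank is 2.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])

# Direct call.
rank_direct = np.linalg.matrix_rank(A)

# Same idea by hand: count singular values above a small tolerance.
singular_values = np.linalg.svd(A, compute_uv=False)
tol = singular_values.max() * max(A.shape) * np.finfo(singular_values.dtype).eps
rank_svd = int(np.sum(singular_values > tol))
```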

Stretch array (Numpy, Python)

I have a numpy array [1,2,3,4,5,6,7,8,9,10,11,12,13,14] and want to have an array structured like [[1,2,3,4], [2,3,4,5], [3,4,5,6], ..., [11,12,13,14]]. Sure this is possible by looping over the large array and adding arrays of length four to the new array, but I'm curious if there is some secret 'magic' Python method doing just this :)...
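One way to build such overlapping windows without an explicit Python loop is integer-array indexing with broadcasting. This is a sketch under the assumption that a copy of the data is acceptable; the variable names are illustrative.

```python
import numpy as np

a = np.arange(1, 15)   # [1, 2, ..., 14]
width = 4

# Column of window start positions plus a row of in-window offsets
# broadcasts to a 2-d index array with one row per window.
starts = np.arange(len(a) - width + 1)[:, None]
offsets = np.arange(width)
windows = a[starts + offsets]
```

Here windows has one row per start position, beginning with [1, 2, 3, 4] and ending with [11, 12, 13, 14].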

Using numpy.apply

What's wrong with this snippet of code? import numpy as np from scipy import stats d = np.arange(10.0) cutoffs = [stats.scoreatpercentile(d, pct) for pct in range(0, 100, 20)] f = lambda x: np.sum(x > cutoffs) fv = np.vectorize(f) # why don't these two lines output the same values? [f(x) for x in d] # => [0, 1, 2, 2, 3, 3, 4, 4, 5, 5]...

Scipy sparse... arrays?

Hey, folks. So, I'm doing some Kmeans classification using numpy arrays that are quite sparse-- lots and lots of zeroes. I figured that I'd use scipy's 'sparse' package to reduce the storage overhead, but I'm a little confused about how to create arrays, not matrices. I've gone through this tutorial on how to create sparse matrices: h...

Scipy.cluster.hierarchy.fclusterdata + distance measure

1) I am using scipy's hcluster module, so the variable that I have control over is the threshold variable. How do I know my performance per threshold? For example, in Kmeans, this performance will be the sum of the distances from all the points to their centroids. Of course, this has to be adjusted since more clusters = less distance generally. Is there an obse...

nonzeros in csr_matrix in scipy.sparse matrices

Dear all, There is a nonzero() method for the csr_matrix of the scipy library; however, trying to use that function on csr matrices results in an error. According to the manual, it should return a tuple with row and column arrays. Any ideas on this problem? Best regards, Umut ...

Scipy Negative Distance? What?

I have an input file consisting entirely of floating-point numbers to 4 decimal places, e.g. 13359 0.0000 0.0000 0.0001 0.0001 0.0002 0.0003 0.0007 ... (the first is the id). My class uses the loadVectorsFromFile method, which multiplies each number by 10000 and then applies int() to it. On top of that, I also loop through each ...

Compiling scipy on Windows 32-bit: linker error with libf77blas.a

Has anyone tried compiling SciPy 0.7.1 on Windows using numpy-1.3.0 that was built with the pre-built ATLAS libraries (atlas3.6.0_WinNT_P4SSE2.zip) linked in the installation document? I get the following linker error, and have no ideas as to how to fix this issue. $ python setup.py config --compiler=mingw32 build --compiler=mingw32 i...

sampling integers uniformly efficiently in python using numpy/scipy

I have a problem where depending on the result of a random coin flip, I have to sample a random starting position from a string. If the sampling of this random position is uniform over the string, I thought of two approaches to do it: one using multinomial from numpy.random, the other using the simple randint function of Python standard...

efficiently finding the interval with non-zeros in scipy/numpy in Python?

Suppose I have a Python list or a 1-d array (represented in numpy). Assume that there is a contiguous stretch of non-zero elements. How can I find the start and end coordinates (i.e. indices) of the stretch of non-zeros in this list or array? For example, a = [0, 0, 0, 0, 1, 2, 3, 4] nonzero_coords(a) should return [4, 7]. for: b = ...
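A minimal sketch of the nonzero_coords function described above, assuming there is at most one contiguous stretch of non-zeros: numpy.flatnonzero returns the indices of all non-zero entries, so the first and last of those are the stretch boundaries. Returning None for an all-zero input is an assumption, since the question does not say what should happen in that case.

```python
import numpy as np

def nonzero_coords(values):
    """Return [start, end] indices of the non-zero stretch, or None if all zero."""
    idx = np.flatnonzero(np.asarray(values))
    if idx.size == 0:
        return None  # assumed behaviour for an all-zero input
    return [int(idx[0]), int(idx[-1])]

a = [0, 0, 0, 0, 1, 2, 3, 4]
coords = nonzero_coords(a)
```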

Building up an array in numpy/scipy by iteration in Python?

Often, I am building an array by iterating through some data, e.g.: my_array = [] for n in range(1000): # do operation, get value my_array.append(value) # cast to array my_array = array(my_array) I find that I have to first build a list and then cast it (using "array") to an array. Is there a way around this? All these casting c...
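Two common alternatives to the list-then-cast pattern, sketched under the assumption that the final length is known in advance: preallocate with numpy.empty and fill by index, or let numpy.fromiter consume a generator. The squaring operation is only a placeholder for the real per-item computation.

```python
import numpy as np

n = 1000

# Option 1: preallocate and fill in place (requires knowing n up front).
preallocated = np.empty(n, dtype=float)
for i in range(n):
    preallocated[i] = i * i  # placeholder for "do operation, get value"

# Option 2: build directly from a generator; count lets NumPy allocate once.
from_generator = np.fromiter((i * i for i in range(n)), dtype=float, count=n)
```

When the length is not known ahead of time, appending to a list and converting once at the end is still a reasonable choice.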

vectorizing a for loop in numpy/scipy?

I'm trying to vectorize a for loop that I have inside of a class method. The for loop has the following form: it iterates through a bunch of points and depending on whether a certain variable (called "self.condition_met" below) is true, calls a pair of functions on the point, and adds the result to a list. Each point here is an element i...

Adding a numpy array to a scipy.sparse.dok_matrix

Hi all, I have a scipy.sparse.dok_matrix (dimensions m x n) and want to add a flat numpy array with length m. for col in xrange(n): dense_array = ... dok_matrix[:,col] = dense_array However, this code raises an Exception in dok_matrix.__setitem__ when it tries to delete a non-existent key (del self[(i,j)]). So, for now ...

Fitting Gaussian KDE in numpy/scipy in Python

I am fitting a Gaussian kernel density estimator to a variable that is the difference of two vectors, called "diff", as follows: gaussian_kde_covfact(diff, smoothing_param) -- where gaussian_kde_covfact is defined as: class gaussian_kde_covfact(stats.gaussian_kde): def __init__(self, dataset, covfact = 'scotts'): self.covfac...