scipy

How to calculate cointegrations of two lists?

Hello everybody! Thank you in advance for your help! I have two lists with some stocks prices, example: a = [10.23, 11.65, 12.36, 12.96] b = [5.23, 6.10, 8.3, 4.98] I can calculate the correlation of these two lists, with: import scipy.stats scipy.stats.pearsonr(a, b)[0] But, I didn't found a method to calculate the co-integrat...

Fitting a gamma distribution with (python) Scipy

Hi folks, Can anyone help me out in fitting a gamma distribution in python? Well, I've got some data : X and Y coordinates, and I want to find the gamma parameters that fit this distribution... In the Scipy doc, it turns out that a fit method actually exists but I don't know how to use it :s.. first, in wich format the argument "data" ...

hierarchical clustering on correlations in Python scipy/numpy?

How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically clustering by correlations of each entry across the 9 conditions. I'd like to use 1-pearson correlation as the distances for clustering. Assuming I have a numpy array "X" that contains ...

scipy.io typeerror:buffer too small for requested array

I have a problem in python. I'm using scipy, where i use scipy.io to load a .mat file. The .mat file was created using MATLAB. listOfFiles = os.listdir(loadpathTrain) for f in listOfFiles: fullPath = loadpathTrain + '/' + f mat_contents = sio.loadmat(fullPath) print fullPath Here's the error: Tra...

ANCOVA in Python with Scipy/Numpy stats

I would like to know a way of performing ANCOVA(analysis of covariance) using Python with scipy. It is basically a statistical comparison of regression lines. I know Python can do ANOVA and it can also do regression line fitting with Scipy.stats. I'm not sure how to put those together to get an effective ANCOVA though, if it is possible....

problem with hierarchical clustering in Python

I am doing a hierarchical clustering a 2 dimensional matrix by correlation distance metric (i.e. 1 - Pearson correlation). My code is the following (the data is in a variable called "data"): from hcluster import * Y = pdist(data, 'correlation') cluster_type = 'average' Z = linkage(Y, cluster_type) dendrogram(Z) The error I get is: ...

taking intersection of N-many lists in python

what's the easiest way to take the intersection of N-many lists in python? if I have two lists a and b, I know I can do: a = set(a) b = set(b) intersect = a.intersection(b) but I want to do something like a & b & c & d & ... for an arbitrary set of lists (ideally without converting to a set first, but if that's the easiest / most eff...

getting smallest of coordinates that differ by N or more in Python

suppose I have a list of coordinates: data = [ [(10, 20), (100, 120), (0, 5), (50, 60)], [(13, 20), (300, 400), (100, 120), (51, 62)] ] and I want to take all tuples that either appear in each list in data, or any tuple that differs from all tuples in lists other than its own by 3 or less. How can I do this efficiently in Pyt...

Easiest way to plot values as symbols in scatter plot?

In an answer to an earlier question of mine regarding fixing the colorspace for scatter images of 4D data, Tom10 suggested plotting values as symbols in order to double-check my data. An excellent idea. I've run some similar demos in the past, but I can't for the life of me find the demo I remember being quite simple. So, what's the e...

Dendrogram generated by scipy-cluster does not show

I am using scipy-cluster to generate a hierarchical clustering on some data. As a final step of the application, I call the dendrogram function to plot the clustering. I am running on Mac OS X Snow Leopard using the built-in Python 2.6.1 and this matplotlib package. The program runs fine, but at the end the Rocket Ship icon (as I underst...

hierarchical clustering with gene expression matrix in python

how can I do a hierarchical clustering (in this case for gene expression data) in Python in a way that shows the matrix of gene expression values along with the dendrogram? What I mean is like the example here: http://www.mathworks.cn/access/helpdesk/help/toolbox/bioinfo/ug/a1060813239b1.html shown after bullet point 6 (Figure 1), whe...

averaging matrix efficiently

in Python, given an n x p matrix, e.g. 4 x 4, how can I return a matrix that's 4 x 2 that simply averages the first two columns and the last two columns for all 4 rows of the matrix? e.g. given: a = array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]) return a matrix that has the aver...

Definitive way to parse alphanumeric CSVs in Python with scipy/numpy

I've been trying to find a good and flexible way to parse CSV files in Python but none of the standard options seem to fit the bill. I am tempted to write my own but I think that some combination of what exists in numpy/scipy and the csv module can do what I need, and so I don't want to reinvent the wheel. I'd like the standard feature...

speeding up parsing of files

the following function parses a CSV file into a list of dictionaries, where each element in the list is a dictionary where the values are indexed by the header of the file (assumed to be the first line.) this function is very very slow, taking ~6 seconds for a file that's relatively small (less than 30,000 lines.) how can I speed it up...

plotting results of hierarchical clustering ontop of a matrix of data in python

How can I plot a dendrogram right on top of a matrix of values, reordered appropriately to reflect the clustering, in Python? An example is in the bottom of the following figure: http://www.coriell.org/images/microarray.gif I use scipy.cluster.dendrogram to make my dendrogram and perform hierarchical clustering on a matrix of data. H...

computing z-scores for 2D matrices in scipy/numpy in Python

How can I compute the z-score for matrices in Python? Suppose I have the array: a = array([[ 1, 2, 3], [ 30, 35, 36], [2000, 6000, 8000]]) and I want to compute the z-score for each row. The solution I came up with is: array([zs(item) for item in a]) where zs is in scipy.stats.stats. Is there a...

Selecting dictionary items by key efficiently in Python

suppose I have a dictionary whose keys are strings. How can I efficiently make a new dictionary from that which contains only the keys present in some list? for example: # a dictionary mapping strings to stuff mydict = {'quux': ..., 'bar': ..., 'foo': ...} # list of keys to be selected from mydict keys_to_select = ...

A good matplot tutorial (from beginner to intermidiate)?

Can anyone recommend a good matplot tutorial. I am a complete beginner - but have used similar software (matlab, R etc), in my halcyon days at University (i.e. a long time ago). A google search brings up a list of dubious quality, and the 'official' docs are too terse, or provide examples that are more 'edge case' (e.g. drawing dolphins...

displaying a colored 2d array in matplotlib in Python

I'd like to plot a 2-d matrix from numpy as a colored matrix in Matplotlib. I have the following 9-by-9 array: my_array = diag(ones(9)) # plot the array pcolor(my_array) I'd like to set the first three elements of the diagonal to be a certain color, the next three to be a different color, and the last three a different color. I'd l...

how to set a fixed color bar for pcolor in python matplotlib?

I am using pcolor with a custom color map to plot a matrix of values. I set my color map so that low values are white and high values are red, as shown below. All of my matrices have values between 0 and 20 (inclusive) and I'd like 20 to always be pure red and 0 to always be pure white, even if the matrix has values that don't span the...