I have a set of X,Y data points (about 10k) that are easy to plot as a scatter plot but that I would like to represent as a heatmap.

I looked through the examples in Matplotlib, and they all seem to start from precomputed heatmap cell values to generate the image.

Is there a method that converts a set of distinct x,y points to a heatmap (where zones with a higher frequency of x,y points would be "warmer")?

+2  A: 

Make a 2-dimensional array that corresponds to the cells in your final image, called, say, heatmap_cells, and initialize it to all zeros.

Choose two scaling factors, one per dimension, that define the size of one array cell in real units, say x_scale and y_scale. Choose these such that all your datapoints fall within the bounds of the heatmap array.

For each raw datapoint with x_value and y_value:

heatmap_cells[int(floor(x_value / x_scale)), int(floor(y_value / y_scale))] += 1
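
The steps above can be sketched in plain Python as follows. This is only an illustration under some assumptions: the function name bin_points, the grid sizes n_x and n_y, and the offsets x_min and y_min are not part of the answer. The offsets are added so that negative coordinates do not turn into negative array indices, and points outside the grid are skipped rather than counted.

```python
from math import floor


def bin_points(points, x_min, y_min, x_scale, y_scale, n_x, n_y):
    """Count how many points fall into each cell of an n_x by n_y grid.

    Cell [i][j] covers x in [x_min + i*x_scale, x_min + (i+1)*x_scale)
    and y in [y_min + j*y_scale, y_min + (j+1)*y_scale).
    """
    # Start with an all-zero grid, one counter per cell.
    heatmap_cells = [[0] * n_y for _ in range(n_x)]
    for x_value, y_value in points:
        i = floor((x_value - x_min) / x_scale)
        j = floor((y_value - y_min) / y_scale)
        # Skip points outside the grid; a negative index would otherwise
        # silently wrap around to the far side of the list.
        if 0 <= i < n_x and 0 <= j < n_y:
            heatmap_cells[i][j] += 1
    return heatmap_cells


# Made-up sample points; the last one lies outside the grid and is ignored.
points = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (2.9, 1.9), (-0.5, 0.0)]
cells = bin_points(points, x_min=0.0, y_min=0.0,
                   x_scale=1.0, y_scale=1.0, n_x=3, n_y=2)
print(cells)  # [[2, 0], [1, 0], [0, 1]]
```

The resulting nested list can be handed to an image-plotting call such as imshow; note that the first index here is x, so it may need transposing for display.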

meepmeep
Numpy has a function for that...
ptomato
+6  A: 

In matplotlib's plotting lexicon, I think you want a "hexbin". If you're not familiar with it, a hexbin plot is just a bivariate histogram in which the xy plane is tessellated by a regular grid of hexagons. You count the number of points falling in each hexagon, map ranges of counts to colors, and you've got a hexbin diagram. (Hexagons are a natural choice of bin geometry: all six neighbors of a hexagon are equidistant from its center, which is not true of square bins, and the hexagon is the regular polygon with the most sides that still tiles the plane.)

You want a 'heat map' from x, y data, so:

from matplotlib import pyplot as PLT
from matplotlib import cm as CM
import numpy as NP

# matplotlib.mlab.bivariate_normal was removed in Matplotlib 3.1, so define
# an equivalent here (uncorrelated case only)
def bivariate_normal(X, Y, sigmax, sigmay, mux, muy):
    z = (X - mux)**2 / sigmax**2 + (Y - muy)**2 / sigmay**2
    return NP.exp(-z / 2) / (2 * NP.pi * sigmax * sigmay)

x = y = NP.linspace(-5, 5, 100)
X, Y = NP.meshgrid(x, y)
Z1 = bivariate_normal(X, Y, 2, 2, 0, 0)
Z2 = bivariate_normal(X, Y, 4, 1, 1, 1)
ZD = Z2 - Z1
x = X.ravel()
y = Y.ravel()
z = ZD.ravel()
gridsize = 30
PLT.subplot(111)
# if "bins=None", the color of each hexagon corresponds directly to its value
# "C" is optional: it supplies a value for each (x, y) point, and each hexagon
# shows the mean of the values that fall in it; if C is None (the default),
# the result is a pure 2D histogram of counts
PLT.hexbin(x, y, C=z, gridsize=gridsize, cmap=CM.jet, bins=None)
PLT.axis([x.min(), x.max(), y.min(), y.max()])
cb = PLT.colorbar()
cb.set_label('mean value')
PLT.show()
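
Since the question asks about point frequency rather than a per-point value, here is a minimal sketch of the same call with C left out, so each hexagon is colored by how many points fall in it. The random test data, the helper names make_points and plot_hexbin_counts, and the "viridis" colormap are assumptions made for illustration, not part of the answer above.

```python
import numpy as np


def make_points(n=10_000, seed=0):
    """Generate n made-up (x, y) points from a standard normal distribution."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=n), rng.normal(size=n)


def plot_hexbin_counts(x, y, gridsize=30):
    """Draw a hexbin plot colored by the number of points per hexagon."""
    # Imported here so the data helper above can be used without Matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # With C omitted, hexbin acts as a pure 2D histogram of counts.
    hb = ax.hexbin(x, y, gridsize=gridsize, cmap="viridis")
    fig.colorbar(hb, ax=ax, label="points per hexagon")
    return fig, hb
```

To view the result, call fig, hb = plot_hexbin_counts(*make_points()) and then either fig.savefig("hexbin.png") or matplotlib.pyplot.show().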

doug
+4  A: 

If you don't want hexagons, you can use numpy's histogram2d function:

import numpy as N
import matplotlib.pyplot as P

# Generate some test data
x = N.random.randn(8873)
y = N.random.randn(8873)

heatmap, xedges, yedges = N.histogram2d(x, y, bins=50)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]

P.clf()
# histogram2d puts x along the first axis, so transpose for imshow and
# put the origin at the bottom left to match the usual x, y orientation
P.imshow(heatmap.T, extent=extent, origin='lower')
P.show()

This makes a 50x50 heatmap. If you want, say, 512x384, you can put bins=(512, 384) in the call to histogram2d.
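
As a rough sketch of the non-square case: histogram2d returns an array whose first axis is x, so it is transposed before being drawn as an image. The seeded random data, the helper name show_heatmap, and the aspect="auto" setting are assumptions for illustration.

```python
import numpy as np

# Made-up test data, same size as in the example above.
rng = np.random.default_rng(0)
x = rng.normal(size=8873)
y = rng.normal(size=8873)

# 512 bins along x and 384 bins along y.
heatmap, xedges, yedges = np.histogram2d(x, y, bins=(512, 384))
print(heatmap.shape)  # (512, 384): first axis is x, second is y

# imshow expects rows to run along y, so transpose before plotting.
image = heatmap.T
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]


def show_heatmap(image, extent):
    """Draw the binned counts as an image with x horizontal and y vertical."""
    # Imported here so the binning above runs without Matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    im = ax.imshow(image, extent=extent, origin="lower", aspect="auto")
    fig.colorbar(im, ax=ax, label="points per bin")
    return fig
```

Calling show_heatmap(image, extent) followed by matplotlib.pyplot.show() displays the result.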

ptomato