clustering

Clustering in ServiceMix 4

Hello, I'm trying to configure Apache ServiceMix 4 to provide the load-balancing feature mentioned in its documentation (for example here: http://servicemix.apache.org/clustering.html). Although it's mentioned, I couldn't find exactly how to do it. The idea is to have two ServiceMix instances (on a LAN, for example) with the same OSGi service i...

hierarchical clustering on correlations in Python scipy/numpy?

How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster the rows by the correlations of each entry across the 9 conditions. I'd like to use 1 - Pearson correlation as the distance for clustering. Assuming I have a numpy array "X" that contains ...
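One possible approach, as a minimal sketch rather than a definitive answer: scipy's `pdist` has a `"correlation"` metric that computes 1 - Pearson r between rows, and its condensed output can be passed to `linkage`. The function name `cluster_by_correlation`, the average-linkage choice, the cluster count, and the synthetic `X` below are assumptions for illustration and are not taken from the question.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_by_correlation(X, n_clusters=2, method="average"):
    """Hierarchically cluster the rows of X using 1 - Pearson r as distance.

    Hypothetical helper: the linkage method and cluster count are
    illustrative defaults, not values from the original question.
    """
    # Condensed vector of pairwise distances, each equal to 1 - Pearson r
    distances = pdist(X, metric="correlation")
    # Agglomerative clustering on the precomputed condensed distances
    Z = linkage(distances, method=method)
    # Cut the tree so that at most n_clusters flat clusters remain
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return Z, labels


# Synthetic stand-in for the real data: 5 rows rising across the
# 9 conditions and 5 rows falling, each with a little noise.
rng = np.random.default_rng(0)
base = np.linspace(0.0, 1.0, 9)
rising = [base + 0.01 * rng.standard_normal(9) for _ in range(5)]
falling = [-base + 0.01 * rng.standard_normal(9) for _ in range(5)]
X = np.vstack(rising + falling)

Z, labels = cluster_by_correlation(X, n_clusters=2)
```

The linkage matrix `Z` can also be passed to `scipy.cluster.hierarchy.dendrogram` for plotting. Note that a row with zero variance has an undefined correlation, so such rows would need to be handled before calling `pdist`.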

Which machine learning library to use

I am looking for a library that, ideally, has the following features: implements hierarchical clustering of multidimensional data (ideally on a similarity or distance matrix); implements support vector machines; is in C++; is somewhat documented (this one seems to be the hardest). I would like this to be in C++, as I am most comfortable with ...

Ejabberd clustering problem with Amazon EC2 server

Hello guys! I have been trying to install an ejabberd server on an Amazon EC2 instance. I am kind of stuck at this step right now. I am following this guide: http://tdewolf.blogspot.com/2009/07/clustering-ejabberd-nodes-using-mnes... From the guide I have successfully completed the "Set up First Node (on ejabberd1)" part, but I am stuck on part 4 ...

Physically cluster a MySQL table for better performance

I have a 30M-row table, which is currently indexed by ID. However, a lot of the requests target a much smaller subset, identified by app_user=1. Does it make sense to physically cluster the table according to this? If so, how can I do so? ...

Clustering [assessment] algorithm with distance matrix as an input

Can anyone suggest a clustering algorithm that can work with a distance matrix as input? Or an algorithm that can assess the "goodness" of a clustering, also based on the distance matrix? At the moment I'm using a modification of Kruskal's algorithm (http://en.wikipedia.org/wiki/Kruskal%27s_algorithm) to split data into two clu...
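For the "goodness" part, one commonly used measure that needs only pairwise distances is the mean silhouette width. This is a suggestion, not the asker's Kruskal-based method, and the sketch below is a hand-rolled version under stated assumptions: `silhouette_from_distances` is a hypothetical helper name, the distance matrix is assumed square and symmetric with a zero diagonal, and there must be at least two clusters.

```python
import numpy as np


def silhouette_from_distances(D, labels):
    """Mean silhouette width computed only from a square distance matrix D.

    Hypothetical helper: assumes D is symmetric with a zero diagonal and
    that labels contains at least two distinct clusters.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # convention: a singleton cluster scores 0
        # a: mean distance to the other members of the point's own cluster
        # (D[i, i] is 0, so dividing by n_same - 1 excludes the point itself)
        a = D[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores.mean()


# Toy data: six points on a line, forming two obvious groups
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(points[:, None] - points[None, :])

good = silhouette_from_distances(D, [0, 0, 0, 1, 1, 1])
bad = silhouette_from_distances(D, [0, 1, 0, 1, 0, 1])
```

Values near 1 suggest compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points. If scikit-learn is available, `sklearn.metrics.silhouette_score` with `metric="precomputed"` computes the same quantity.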

Out of memory error while using clusterdata in MATLAB

Hi, I am trying to cluster a matrix (size: 20057x2): T = clusterdata(X,cutoff); but I get this error: ??? Error using ==> pdistmex Out of memory. Type HELP MEMORY for your options. Error in ==> pdist at 211 Y = pdistmex(X',dist,additionalArg); Error in ==> linkage at 139 Z = linkagemex(Y,method,pdistArg); Error in ==>...

Which cluster node should be active?

There is a cluster and a Unix network daemon. This daemon is started on each cluster node, but only one can be active. When the active daemon breaks (whether the program breaks or the node breaks), another node should become active. I can think of a few possible algorithms, but I think there is some already done research on this and s...

Which clustering method is suitable for which kind of data?

I would like to know which type of data k-means is best suited to clustering. When does k-means fail? For which type of data set does k-means not give an accurate answer? Which type of data is COBWEB best suited to clustering? When does COBWEB fail? For which type of data set does COBWEB not give an accurate answer? ...

I want to experiment with a SQL Server cluster to gain some experience with it; where should I start?

Hi guys, I want to experiment with a SQL Server cluster to gain some experience with it. Where should I start? I only understand the basic concept of a cluster and don't have any hands-on experience. Can anyone point me to some material that I can refer to? Many thanks. ...

Plotting results of hierarchical clustering on top of a matrix of data in Python

How can I plot a dendrogram right on top of a matrix of values, reordered appropriately to reflect the clustering, in Python? An example is at the bottom of the following figure: http://www.coriell.org/images/microarray.gif I use scipy.cluster.dendrogram to make my dendrogram and perform hierarchical clustering on a matrix of data. H...

computing z-scores for 2D matrices in scipy/numpy in Python

How can I compute the z-score for matrices in Python? Suppose I have the array: a = array([[ 1, 2, 3], [ 30, 35, 36], [2000, 6000, 8000]]) and I want to compute the z-score for each row. The solution I came up with is: array([zs(item) for item in a]) where zs is in scipy.stats.stats. Is there a...
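A vectorized alternative to the list comprehension, as a minimal sketch: numpy's `mean` and `std` accept `axis` and `keepdims`, so the row statistics broadcast against the original array. The helper name `row_zscores` and its `ddof` parameter are assumptions for illustration; the array is the one from the question.

```python
import numpy as np


def row_zscores(a, ddof=0):
    """Return the z-score of every element relative to its own row.

    Hypothetical helper: ddof=0 uses the population standard deviation;
    pass ddof=1 for the sample standard deviation instead.
    """
    a = np.asarray(a, dtype=float)
    # keepdims=True keeps shape (n_rows, 1) so the result broadcasts per row
    mean = a.mean(axis=1, keepdims=True)
    std = a.std(axis=1, ddof=ddof, keepdims=True)
    return (a - mean) / std


a = np.array([[1, 2, 3],
              [30, 35, 36],
              [2000, 6000, 8000]])
z = row_zscores(a)
```

Each row of `z` then has mean 0 and standard deviation 1. A row whose values are all identical has zero standard deviation and would produce NaN, so it needs separate handling. `scipy.stats.zscore(a, axis=1)` is a library alternative that should give the same result for this input.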

How to use R-Tree for plotting large number of map markers on google maps

After searching SO and multiple articles, I haven't found a solution to my problem. What I am trying to achieve is to load 20,000 markers on Google Maps. An R-Tree seems like a good approach, but it's only helpful when searching for points within the visible part of the map. When the map is zoomed out it will return all of the points and...

Is there a way to get a "subtree" from hclust? (R)

Hello all, I wish to create a "subtree" from an hclust object. For example, let's say I have the following object: a <- list() # initialize empty object a$merge <- matrix(c(-1, -2, -3, -4, 1, 2, -5,-6, 3,4), nc=2, byrow=TRUE ) a$height <- c(1, 1.5, 3,4,4.5) # def...

How to create a "Clustergram" plot? (in R)

Hi all, I came across this interesting website with an idea for visualizing a clustering algorithm, called a "Clustergram". I am not sure how useful this really is, but in order to play with it I would like to reproduce it in R, and I am not sure how to go about doing it. How would you create a line for each item so it would sta...

k-means clustering in R on very large, sparse matrix?

Hello, I am trying to do some k-means clustering on a very large matrix. The matrix is approximately 500000 rows x 4000 cols yet very sparse (only a couple of "1" values per row). The whole thing does not fit into memory, so I converted it into a sparse ARFF file. But R obviously can't read the sparse ARFF file format. I also have th...

What technologies exist for app level clustering?

Let's say I have an application that is guaranteed to overwhelm one server even after optimizations. Is there a technology that allows a web application to be split over multiple servers while maintaining its state? Take for example a multiplayer online game. Usually in World of Warcraft there are multiple servers with each one hav...

C# - Data Clustering approach

Hi all, I am writing a program in C# in which I have a set of 200 points displayed on an image. However, the points tend to cluster in various regions, and I am looking to find a way to "cluster." In other words, maybe draw a circle/ellipse around the clustered points. Has anyone seen any way to do this? I have heard about K-means clu...

Tomcat cluster installation in a cloud

In a cloud where multiple VMs are running and IP addresses are dynamic (DHCP), what is the approach to setting up a Tomcat cluster? Kindly share your experiences and ideas. One way I can think of is that, since the DHCP address range is known, the worker.properties file could be created (generated) with all IP addresses in the given range...

Clustering on WebLogic: exception on failover

Hi all, I deployed an application on a WebLogic 10.3.2 cluster with two nodes and a load balancer in front of the cluster. I have set <core:init distributable="true" debug="true" />. My Session and Conversation classes implement Serializable. I start using the application being served by the first node. The console shows that the...