I am planning to use Orange for k-means clustering. I have gone through the tutorials, but I still have a couple of questions which I would like to ask:

I am dealing with clustering on high-dimensional vectors.

1) Is there a cosine distance implemented?

2) I do not want to put zeros in the empty values. I tried leaving the empty fields blank (no zeros) and am getting the error: SystemError: 'orange.TabDelimExampleGenerator': the number of attribute types does not match the number of attributes. How do I indicate an empty value?

3) Is there a way to incorporate an "ID" into the example table? I want to label my data by an ID (NOT a classification) for easier reference; I do not want the ID column to be an official part of my data.

4) Is there a way to output the k-means clustering differently? I would much prefer something in this format: cluster1: [ , , ...] cluster2: [ , ... ] rather than just [1, 2, 3, 1, 2, ...]

Thanks!

A: 

Four questions in one question is extremely awkward -- why not make each question its own question? It's not as if it would cost you ;-). Anyway, wrt "How do I indicate an empty value?", see the docs on the value attribute of instances of Orange.Value:

If value is continuous or unknown, no descriptor is needed. For the latter, the result is a string '?', '~' or '.' for don't know, don't care and other, respectively.
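For instance, in the tab-delimited file itself you can simply write ? in a cell that has no value, rather than leaving it empty or inventing a zero. Programmatically, something along these lines should work too (a minimal sketch; the data set name "mydata" and feature name "f1" are made up):

import orange

data = orange.ExampleTable("mydata")  # loads mydata.tab; the name is made up
data[0]["f1"] = "?"                   # mark feature f1 of the first instance as "don't know"
print data[0]                         # the unknown value is displayed as ?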

I'm not sure if by empty you mean "don't know" or "don't care", but anyway you can indicate either. Take care about distances, however -- from this other page in the docs:

Unknown values are treated correctly only by Euclidean and Relief distance. For other measure of distance, a distance between unknown and known or between two unknown values is always 0.5.

The distances listed on that page are Hamming, Maximal, Manhattan, Euclidean and Relief (Relief is like Manhattan but with correct treatment of unknown values) -- no Cosine distance is provided: you'll have to code it yourself.
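If you do roll your own, the arithmetic is only a few lines of plain Python. The sketch below takes two equal-length sequences of numbers (i.e. the continuous feature values already pulled out of your instances) and ignores the unknown-value issue discussed above; hooking it into Orange's own k-means is a separate exercise:

import math

def cosine_distance(u, v):
  # 1 - cosine similarity: 0 for vectors pointing the same way, up to 2 for opposite ones
  dot = sum(a * b for a, b in zip(u, v))
  norm_u = math.sqrt(sum(a * a for a in u))
  norm_v = math.sqrt(sum(b * b for b in v))
  if norm_u == 0 or norm_v == 0:
    return 1.0  # arbitrary convention when one vector is all zeros
  return 1.0 - dot / (norm_u * norm_v)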

For (4), with just a little Python code you can obviously format results in any way you want. The .clusters attribute of a KMeans object is a list, exactly as long as the number of data instances: if what you want is a list of lists of data instances, for example:

import orange

def loldikm(data, **k):
  # run k-means, then regroup the data instances by the cluster each was assigned to
  km = orange.KMeans(data, **k)
  results = [[] for _ in km.centroids]   # one (initially empty) list per cluster
  for i, d in zip(km.clusters, data):    # km.clusters[j] is the cluster index of data[j]
    results[i].append(d)
  return results
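For illustration, here is one way to print that list of lists in the cluster1: [...] style you asked for (a sketch: "mydata" is a made-up data set name, and any keyword arguments to loldikm are simply forwarded to KMeans):

data = orange.ExampleTable("mydata")
clusters = loldikm(data)  # pass k-means options through **k if desired
for n, members in enumerate(clusters):
  print "cluster%d:" % (n + 1), members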
Alex Martelli