Reducing dimension of dataset | ansaurus

Q:

Reducing dimension of dataset

Hi,

I'm trying to reduce the dimension of a dataset. PCA is a good method, but it gives me a new dataset made of transformed variables rather than a subset of the original ones. My goal is to determine, given a number of events (e.g. 60) and a number of trials (e.g. 6), which events are the most relevant.

For example:

The 1st, 3rd, 21st, 45th, ... events (N in total) are good enough to approximate the behavior of the dataset.
That would allow me to discard the other 60-N events and deal with only N.

For now, I'm calculating the covariance matrix and keeping the events whose correlation is smaller than some threshold.
Is there a standard metric or mathematical function for this?
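
To make the threshold approach concrete, here is a minimal sketch of one way it could work. It is an assumption about the procedure, not the exact code in use: each event is a list of per-trial values, events are visited in order, and an event is kept only if its absolute Pearson correlation with every already kept event is below the threshold. The names `pearson` and `select_events`, the sample data, and the threshold of 0.9 are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def select_events(events, threshold):
    """Greedily keep events that are weakly correlated with all kept events.

    events: list of events, each a list of values (one value per trial).
    Returns the indices of the kept events.
    """
    kept = []
    for j, event in enumerate(events):
        if all(abs(pearson(event, events[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Hypothetical data: 3 events measured over 6 trials.
events = [
    [1, 2, 3, 4, 5, 6],
    [1, -1, 1, -1, 1, -1],
    [3, 5, 7, 9, 11, 13],  # exactly 2 * event 0 + 1, so fully redundant
]
selected = select_events(events, threshold=0.9)
print(selected)  # [0, 1]
```

The result depends on the order in which events are visited, and with only a handful of trials the correlation estimates are noisy, so the threshold should be treated as a rough knob rather than a principled cutoff.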

Thanks.

A:

What you are describing is not dimensionality reduction but rather sampling. If your data is labeled (which I couldn't tell from your question), then you most probably want stratified sampling: random sampling that ensures each label is sampled with a probability approximately equal to its proportion in the original data set. See the Wikipedia article on sampling techniques; it provides a list of good reading material on this matter.
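
As a rough illustration of stratified sampling, the sketch below groups items by label and draws the same fraction from each group, so label proportions in the sample match those in the original data. It assumes the data is labeled; the function name `stratified_sample`, the labels "a" and "b", and the 25% fraction are made up for the example.

```python
import random
from collections import defaultdict


def stratified_sample(items, labels, fraction, seed=0):
    """Draw roughly `fraction` of the items from each label group.

    Returns a list of (item, label) pairs. Every label keeps at least
    one item so that small groups are not dropped entirely.
    """
    rng = random.Random(seed)

    # Group the items by their label.
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    # Sample without replacement inside each group.
    sample = []
    for label, group in by_label.items():
        k = max(1, round(fraction * len(group)))
        sample.extend((item, label) for item in rng.sample(group, k))
    return sample


# Hypothetical data: 60 items, 40 labeled "a" and 20 labeled "b".
items = list(range(60))
labels = ["a"] * 40 + ["b"] * 20
sample = stratified_sample(items, labels, fraction=0.25)

counts = defaultdict(int)
for _, label in sample:
    counts[label] += 1
print(dict(counts))  # {'a': 10, 'b': 5}
```

The 2:1 ratio between the two labels is preserved in the sample, which is the property stratified sampling is meant to guarantee.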

bgbg 2010-10-04 06:46:43
