data-analysis

Building an index of URLs: what features to include?

I am working towards building an index of URLs. The objective is to build and store a data structure whose key is a domain URL (e.g. www.nytimes.com) and whose value is a set of features associated with that URL. I am looking for your suggestions for this set of features. For example I would like to store www.nytimes.com as f...

Trend analysis using iterative value increments

We have configured iReport to generate the following graph: The real data points are in blue, the trend line is green. The problems include: (1) too many data points for the trend line; (2) the trend line does not follow a Bezier curve (spline). The source of the problem is with the incrementer class. The incrementer is provided with the data p...

Best fit curve for trend line

Problem constraints: the size of the data set, but not the data itself, is known; the data set grows by one data point at a time; the trend line is graphed one data point at a time (using a spline/Bezier curve). Graphs: The collage below shows data sets with reasonably accurate trend lines. The graphs are: Upper-left. By hour, with ~24 data ...

Finding Common Phrases in SQL Server TEXT Column

Short Desc: I'm curious to see if I can use SQL Analysis Services or some other SQL Server service to mine some data for me that will show commonalities between SQL TEXT fields in a dataset. Long Desc: I am looking at a subset of data that consists of about 10,000 rows of TEXT blobs which are used as a notes column in an issue tracking ...
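
One way to look for commonalities outside SQL Server is to export the notes and count word n-grams that recur across rows. The sketch below is only an illustration of that idea, not a SQL Server feature: the function name `common_phrases`, the sample notes, and the thresholds are all assumptions made for the example.

```python
import re
from collections import Counter


def common_phrases(notes, n=3, min_rows=2):
    """Return (phrase, row_count) pairs for n-word phrases found in at least min_rows notes."""
    rows_containing = Counter()
    for text in notes:
        words = re.findall(r"[a-z0-9']+", text.lower())
        # A set ensures a phrase repeated inside one note counts once for that row.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        rows_containing.update(phrases)
    return [(p, c) for p, c in rows_containing.most_common() if c >= min_rows]


# Hypothetical notes standing in for rows read from the TEXT column.
notes = [
    "Customer reports the printer is offline again.",
    "Printer is offline after the driver update.",
    "Reset password for the customer.",
]
print(common_phrases(notes, n=3))  # [('printer is offline', 2)]
```

Counting rows rather than raw occurrences keeps one very repetitive note from dominating the result.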

Resources to learn about engineering aspects of data analytics (OLAP, warehousing, ETL, etc.)

I'm a math/stats guy, interested in learning more about the engineering aspects of "data analytics" (probably an overly broad term, but this is definitely a case of "I don't know what I don't know", so I'm not sure how to be more specific). I'm fine with manipulating and analyzing the data once it's already stored somewhere and I can ac...

Accelerometer data analysis

Hello, I would like to know if there are some libraries/algorithms/techniques (Python, if at all possible) that help to extract features from accelerometer data (extracted from an Android phone, btw), like periodicity of movements, energy of acceleration and the like. Has anyone done this kind of task before? Thank you very much in adv...
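
As a rough illustration of two of the features mentioned (energy and periodicity), here is a minimal standard-library sketch. It assumes the three axes have already been reduced to a single evenly sampled series (for example the acceleration magnitude); the function names and the synthetic sine-wave input are made up for the example, and autocorrelation is just one possible way to estimate periodicity.

```python
import math


def signal_energy(samples):
    """Mean squared value of the signal after removing its mean."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def dominant_period(samples):
    """Estimate the period (in samples) from the autocorrelation peak, or None if unclear."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    max_lag = len(centered) // 2
    acf = [
        sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag))
        for lag in range(max_lag + 1)
    ]
    # Skip the trivial peak around lag 0: start once the autocorrelation first turns negative.
    start = next((lag for lag, value in enumerate(acf) if value < 0), None)
    if start is None:
        return None
    return max(range(start, max_lag + 1), key=lambda lag: acf[lag])


# Synthetic stand-in for accelerometer magnitude: a sine wave repeating every 20 samples.
samples = [math.sin(2 * math.pi * i / 20) for i in range(200)]
print(signal_energy(samples), dominant_period(samples))
```

Dividing the returned lag by the sampling rate (which depends on the device and is not known here) converts the period to seconds.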

Non-linear regression models in PostgreSQL using R

Background I have climate data (temperature, precipitation, snow depth) for all of Canada between 1900 and 2009. I have written a basic website and the simplest page allows users to choose category and city. They then get back a very simple report (without the parameters and calculations section): The primary purpose of the web appli...

Linking info of pairs of respondents (couples) in SPSS

I am preparing for analyses of the determinants of partner choice in SPSS, but basically I can't get off the ground because I don't know how to create new variables based on the information of each respondent's spouse (i.e. education, wages, social background, ethnicity etc.). Each respondent is currently identified by an ID#, and exis...

Finding Weekly Summary from a table

I am trying to write a SQL statement to produce a weekly summary from a table. I have a table with the following fields: UIN, Date, Staff, work_hours. Now I would like to find out how many hours each staff member has worked in one week. ...
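
The usual shape of such a query is a `GROUP BY` on the staff member and a week label derived from the date. The sketch below runs that idea against an in-memory SQLite database from Python. The table name `timesheet`, the sample rows, and the use of SQLite's `strftime('%Y-%W', ...)` are assumptions for illustration; the question does not say which database is in use, and other systems spell the week function differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timesheet (UIN INTEGER PRIMARY KEY, Date TEXT, Staff TEXT, work_hours REAL)"
)
# Hypothetical sample rows; dates are ISO strings so SQLite's date functions can parse them.
rows = [
    (1, "2024-01-01", "alice", 8.0),
    (2, "2024-01-02", "alice", 7.5),
    (3, "2024-01-08", "alice", 6.0),  # one week later
    (4, "2024-01-03", "bob", 4.0),
]
conn.executemany("INSERT INTO timesheet VALUES (?, ?, ?, ?)", rows)

# One output row per staff member per (year, week-of-year) bucket.
weekly = conn.execute(
    """
    SELECT Staff, strftime('%Y-%W', Date) AS week, SUM(work_hours) AS total_hours
    FROM timesheet
    GROUP BY Staff, week
    ORDER BY Staff, week
    """
).fetchall()

for staff, week, total_hours in weekly:
    print(staff, week, total_hours)
```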

Looking for an estimation method (data analysis)

Hi! Since I have no idea about what I am doing right now, my wording may sound funny. But seriously, I need to learn. The problem I'm facing is to come up with a method (model) to estimate how a software program works: namely running time and maximal memory usage. What I already have is a large amount of data. This data set gives an o...

Segmenting a set of data with discrete and continuous data values into one of two groups without using analysis services?

Say I have a table with the following schema (note: this example is hypothetical, though the real use case is similar).

Type     | Name     | Notes
=====================================================================================
Gender   | Gender   | Either Male or Female (not null)
GeoCoord | Location | Latitude an...

save DataTable to database

Hi, I am generating a DataTable from a webservice and I would like to save the whole DataTable into one database table. DataTable ds = //get info from webservice The DataTable is getting generated, but I am stuck on what to do next. Please show me some syntax. I don't really need the select statement there either, I just want to insert al...

What process do I take to understand an existing system for a very large hospital, for example?

I've been asked to study and document the existing system for a tertiary hospital. The hospital consists of administrative (Accounts, Admin, Engineering,...) and clinical units (Pharmacy, The process by which a patient gets in and out of the hospital,...). I would need to understand how data flows in and out and the business process. It'...

Binarization in Excel

How would you perform binarization of an attribute with five categorical values in Excel? ...
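
Binarization here usually means replacing one categorical column with one 0/1 indicator column per category. The sketch below shows that layout in Python rather than Excel, purely to make the target shape concrete; the function name `binarize` and the colour values are hypothetical.

```python
def binarize(values):
    """Return the sorted category list and one 0/1 indicator row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


# Hypothetical attribute with five categorical values.
colors = ["red", "green", "blue", "white", "black", "red"]
categories, rows = binarize(colors)

print(categories)
for value, row in zip(colors, rows):
    print(value, row)
```

In a spreadsheet the same layout can be produced with one header cell per category and a formula along the lines of `=IF($A2=B$1,1,0)` filled across and down, assuming the attribute is in column A and the category names are in row 1.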

R and SPSS difference

I will be analysing a vast amount of network traffic related data shortly. I will pre-process the data in order to analyse it. I have found that R and SPSS are among the most popular tools for statistical analysis. I will also be generating quite a lot of graphs and charts. So I was wondering what is the basic difference between these two ...

Fitting polynomial model to data in R

I've read the answers to this question and they are quite helpful, but I need help particularly in R. I have an example data set in R as follows: x <- c(32,64,96,118,126,144,152.5,158) y <- c(99.5,104.8,108.5,100,86,64,35.3,15) I want to fit a model to these data so that y = f(x). I want it to be a 3rd order polynomial model. How...
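
For reference, a third-order polynomial fit is an ordinary least-squares problem on the columns 1, x, x^2, x^3. In R this is commonly written with `lm` and `poly(x, 3, raw = TRUE)`. The sketch below spells out the same computation in plain Python on the data from the question. The function name `fit_polynomial` is an assumption, and rescaling x before solving the normal equations is a design choice of this sketch to keep the arithmetic well conditioned.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial fit; returns a function f with f(x) approximating y."""
    center = sum(xs) / len(xs)
    scale = max(abs(x - center) for x in xs) or 1.0
    ts = [(x - center) / scale for x in xs]  # rescale x to roughly [-1, 1]
    m = degree + 1
    # Normal equations A c = b with A[j][k] = sum(t^(j+k)) and b[j] = sum(y * t^j).
    A = [[sum(t ** (j + k) for t in ts) for k in range(m)] for j in range(m)]
    b = [sum(y * t ** j for t, y in zip(ts, ys)) for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        known = sum(A[r][k] * coeffs[k] for k in range(r + 1, m))
        coeffs[r] = (b[r] - known) / A[r][r]

    def f(x):
        t = (x - center) / scale
        return sum(c * t ** j for j, c in enumerate(coeffs))

    return f


# Data from the question.
x = [32, 64, 96, 118, 126, 144, 152.5, 158]
y = [99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15]
f = fit_polynomial(x, y, degree=3)
for xi, yi in zip(x, y):
    print(xi, yi, round(f(xi), 2))
```

The returned function works in the original x units, so it can be evaluated at any x to draw the fitted curve.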

Parsing semi-structured data - can I use any classifiers?

I've got a set of documents which have a semi-regular format. Rows are typically separated by newline characters, and the main components of each row are separated by spaces. Some examples are a set of furniture assembly instructions, a set of tables of contents, a set of recipes and a set of bank statements. The problem is that each s...