I need to plot a Lorenz curve of a cumulative variable as a function of the number of observations. I want both axes displayed as percentages. For example, say the observations are buyers and the y variable is the amount they bought; the buyers are already ranked in descending order, and I want a plot that says "The top 10% of buyers purchased 90% of the total bought". My dataset has a couple of million observations.

What is the best way to do this? Sub-questions:

If I need to add two variables for the quantiles of total observations and total $ bought (so as to use them to plot), what is the object that returns the row number? I tried:

user_quantile <- row(df)/nrow(df)

but I get a matrix of identical columns (user_quantile.1, user_quantile.2) of which I only need one column.

Is there instead any way to skip adding percentages as variables and only have them for axes values?

The plot has far more points than I need to draw the line. What is the best approach to minimize the computational effort and still get a nice graph?

Thanks.

+6  A: 

You may want to acquaint yourself with the excellent RSeek search engine for R content. One quick query for Lorentz curve (and Lorenz curve) led to these packages:

  • ineq: Measuring inequality, concentration, and poverty
  • reldist: Relative Distribution Methods
  • GeoXp: Interactive exploratory spatial data analysis
  • lawstat: An R package for biostatistics, public policy and law

all of which seem to supply a Lorenz curve function.
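If you would rather not pull in a package, the curve described in the question can also be sketched in base R. The following is only a minimal illustration under stated assumptions: `amount` is a hypothetical, simulated stand-in for the purchase column, already sorted in descending order, and the choice of 1,000 plotted points is arbitrary. It shows that the row number is available as `seq_len(n)`, that the percentages can live in plain vectors rather than extra data-frame columns, and that thinning the points before plotting keeps the graph cheap to draw.

```r
# Simulated stand-in for the real data (assumption): purchase amounts,
# largest buyer first, as described in the question.
set.seed(1)
amount <- sort(rlnorm(2e5, meanlog = 0, sdlog = 2), decreasing = TRUE)

n <- length(amount)

# Cumulative shares as plain vectors; nothing is added to a data frame.
buyer_share  <- seq_len(n) / n                 # row number / number of rows
amount_share <- cumsum(amount) / sum(amount)   # running total / grand total

# Thin to at most 1,000 evenly spaced points (arbitrary choice);
# the first and last observations are always kept.
idx <- unique(round(seq(1, n, length.out = 1000)))

# Prepend the origin so the line starts at (0, 0); axes are in percent.
plot(100 * c(0, buyer_share[idx]), 100 * c(0, amount_share[idx]),
     type = "l",
     xlab = "Cumulative % of buyers (largest first)",
     ylab = "Cumulative % of amount bought")
abline(0, 1, lty = 2)  # line of equality for reference

# Read one point off the curve: share bought by the top 10% of buyers.
top10 <- amount_share[ceiling(0.10 * n)]
```

With a real data frame, `amount` would be the relevant column and `n` would equal `nrow(df)`. Because the buyers are sorted from largest to smallest, this curve lies above the diagonal; the conventional Lorenz curve sorts in ascending order and lies below it.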

Dirk Eddelbuettel
A: 

I have used the GeoXp package a few times, as Dirk mentioned in the previous answer.

GeoXp Package

calejero