At least a simple SQL statement won't
work (please read the problem carefully). I
need to find the sum of all subsets and
check whether the sum of the elements of
the set is > .5 or not. Thanks – asin
Aug 18 at 7:36
Since your data is in Stata, here is the code to do what you ask in Stata (paste this code into your do-file editor):
//input the data
clear
input str10 entity str10 parent_entity value
A001 B001 .10
A001 B002 .15
A001 B003 .2
A001 B004 .3
A002 B002 .34
A002 B003 .13
A002 B111 .56
end
//create a var. holding each entity's total (the sum over its set of values)
bysort entity : egen sum_subset = total(value)
//flag the sets whose sum is > .5 (1 = yes, 0 = no)
gen byte indicator = sum_subset > .5
//attach value labels to the flag
lab def yn 0 "No" 1 "Yes", modify
lab val indicator yn
li *, clean
Keep in mind that Stata holds your data in memory, so the size of the dataset you can work with is limited by your system's memory. If you try to open your .dta file and Stata says 'op. sys. refuses to provide memory', try using the command -set mem- to increase the memory available to Stata before loading the data.
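As a rough sketch of that memory step: the file name mydata.dta is a placeholder and 1g is an arbitrary amount, so adjust both. This assumes Stata 11 or earlier. As far as I know, newer releases manage memory automatically, so the -set memory- line should not be needed there.

```stata
* Hypothetical file name; substitute your own .dta file.
* Check the file's size (observations, variables) without loading it.
describe using "mydata.dta", short

* Stata 11 and earlier: request more memory BEFORE loading the data.
* The amount (1g here) is only an example and must fit in your RAM.
clear
set memory 1g

use "mydata.dta", clear
```

If the requested amount is more than the operating system can give, the same error comes back, and the remaining options are a smaller request or a subset of the data.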
Ultimately, StefanWoe's question:
May you give us an idea of HOW huge the
data set is? Millions? Billions of
records? Also an important question:
Do you have to do this only once? Or
every day in the future? Or hundreds
of times each hour? – StefanWoe Aug 18
at 13:15
really drives the answer more than the choice of software. Automating this in Stata, even on an immense amount of data, wouldn't be difficult, but you could hit your resource limits quickly.
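One way the automation could look, as a minimal sketch: wrap the steps in a small program so they can be rerun on each new batch of data. The program name flagsets and its threshold() option are made up for illustration. The variable names entity and value are the ones from the example data, and the program assumes no variable called indicator exists yet.

```stata
* Hypothetical helper: flagsets and threshold() are illustrative names.
* Assumes the data in memory has variables entity and value.
capture program drop flagsets
program define flagsets
    syntax [, THReshold(real 0.5)]
    tempvar total
    * Per-entity total of value, held in a temporary variable
    bysort entity : egen double `total' = total(value)
    * 1 if the entity's total exceeds the threshold, 0 otherwise
    gen byte indicator = `total' > `threshold'
end
```

After loading a batch, calling -flagsets, threshold(0.5)- would add the indicator variable. How often you need to call it (once, daily, hundreds of times an hour) is what decides whether Stata's in-memory model is a good fit.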
Eric A. Booth