I am trying to convert this C code I have into a python script so it's readily accessible by more people, but I am having problems understanding this one snippet.

int i, t;
for (i = 0; i < N; i++) {
   t = (int)(T*drand48());
   z[i] = t;
   Nwt[w[i]][t]++;
   Ndt[d[i]][t]++;
   Nt[t]++;
}

N is a value (sum of one column from an array. Elemental corrected me).

T is just a numerical value.

z, w, and d are memory allocations created from the N array. They were created with this method.

w = ivec(N);
d = ivec(N);
z = ivec(N);

int *ivec(int n) //
{
   int *x = (int*)calloc(n,sizeof(int));
   assert(x);
   return x;
}

Nwt & Ndt are both arrays too, with each element being a memory allocation? (Not sure). At least, each one of them was created by using the following method, passing in two different int's.

Nwt = dmat(W,T);
Ndt = dmat(D,T);

double **dmat(int nr, int nc) //
{
   int N = nr*nc;
   double *tmp = (double*) calloc(N,sizeof(double));
   double **x  = (double**)calloc(nr,sizeof(double*));
   int r;
   assert(tmp);
   assert(x);
   for (r = 0; r < nr; r++) x[r] = tmp + nc*r;
   return x;
}

So looking at the first loop I posted, what are the following lines doing? I would like to accomplish the same thing in python, but since no memory allocation is needed, not sure what those three lines do, or how I would duplicate it in python.

Nwt[w[i]][t]++;
Ndt[d[i]][t]++;
Nt[t]++;

This is what I have so far:

for i in range(self.N):
    t = self.T * random.random()
    self.z[i] = t
    # ** INCORRECT BELOW **
    # self.Nwt[self.N[i]] = t + 1
    # self.Ndt[i] = t + 1
    # self.Nt[t + 1] += 1
+2  A: 

A suggestion for the Python part of things is to use numpy arrays to represent the matrices (and possibly the arrays too). But to be honest, you should not be concerned with that right now. That C-code looks ugly. Apart from that, different languages use different approaches to achieve the same thing. That is what makes such conversions hard. Try to get an understanding of the algorithm it implements (supposing that is what it does) and write that down in a language-agnostic way. Then think how you would implement that in Python.

Space_C0wb0y
+1  A: 

Nwt and Ndt are 2-dimensional arrays. These lines:

Nwt[w[i]][t]++;
Ndt[d[i]][t]++;

Increment by 1 the value at one of the locations in each of the arrays. If you think of the addressing as array[column][row], then the column is chosen based on the value in some other one-dimensional array w and d (respectively) for the index i. t seems to be some random index.

You don't show what dmat function is doing, so hard to break that one down.

(Can't help you on the Python side, hopefully this helps clarify the C)

quixoto
The dmat function is in the 3rd code snippet I posted in my original code. It's an allocation routine as well. I think that's what's throwing me off. I don't know C, and haven't had to do memory allocation. Thanks for your input, I am going to keep working through it and hopefully someone will be able to help with the python side.
Hallik
Oops, missed that. Yeah, it is just allocating 2D floating-point arrays with nr (number of rows) and nc (number of columns) dimensions. But note that it's doing something tricky where the actual array (x) is a "jump table" of pointers into another array (tmp), which is itself allocated and initialized to all zeroes. (Looks like you should just sit down with a pencil and paper and get a holistic sense of what the data structure is really doing before starting on the Python. I don't envy you; this is crappy, uncommented C code.)
quixoto
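To make the "jump table" layout concrete, here is a minimal Python sketch of what dmat appears to build: one flat, zero-filled block plus the starting offset of each row, mirroring `x[r] = tmp + nc*r`. The names `make_flat_matrix` and `increment` are hypothetical helpers for illustration, not part of the original program.

```python
def make_flat_matrix(nr, nc):
    """Return (flat, row_start): one zeroed block and the offset of each row."""
    flat = [0.0] * (nr * nc)                 # like tmp = calloc(nr*nc, sizeof(double))
    row_start = [nc * r for r in range(nr)]  # like x[r] = tmp + nc*r
    return flat, row_start


def increment(flat, row_start, r, c):
    """Equivalent of x[r][c]++ on the flat layout."""
    flat[row_start[r] + c] += 1


flat, row_start = make_flat_matrix(3, 4)
increment(flat, row_start, 2, 1)   # touches flat[2*4 + 1]
print(row_start)
print(flat)
```

In Python a list of lists gives the same `x[r][c]` addressing without manual offsets, so the flat form is mainly useful for understanding the C.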
Hallik
(Each line.) The statement `Nwt[w[i]][t]++;` finds a single location in the array `Nwt`, and adds one to it. The next line finds a single location in `Ndt`, and adds one to it. The "row" chosen in both arrays is the same (`t`).
quixoto
+1  A: 

Okay you seem to have a few ideas wrong. N is the size of the array.

dmat returns a matrix-like thing, represented by nr rows, where each row is an 'array' of nc doubles.

ivec returns an 'array' of n integer elements.

So w[] and d[] represent indexes to the array of doubles.

The loop that you are having trouble with is used to increment certain elements of the matrices. One index appears to be pre-stored in the w and d arrays and the other is, I suspect, generated randomly. Without knowing the intent of the code, it is a bit difficult to understand the semantics.

Specifically it might help to know: Nwt[x][y]++ means increment (add 1) the matrix element at row x col y
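As a rough illustration in Python, which has no `++` operator, the same increment is written with `+= 1`. The sizes and index values below are made up for the example.

```python
W, T = 3, 2                              # made-up sizes for illustration
Nwt = [[0.0] * T for _ in range(W)]      # W rows, T columns, all zero

x, y = 1, 0                              # made-up row and column
Nwt[x][y] += 1                           # Python spelling of Nwt[x][y]++

print(Nwt)   # [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
```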

I must also mention that this C code is ugly: no useful naming, no comments, and fearless use of C's nastiest syntax. It is really difficult to follow.

Elemental
This script scans documents looking for relationships between words. The original document it starts with has 3 columns: documentID, wordID, and wordCount (how many times that word occurred in that document). The variable N is the sum of the wordCount column.
Hallik
+1  A: 

In your translation, the first thing I would worry about is choosing sensible variable names, particularly for those arrays. Regardless, much of that translates directly.

Nwt and Ndt are 2D arrays, Nt is a one dimensional array. It looks like you're looping over all the 'columns' in the z array, and generating a random number for each one. Then you increment whichever column was picked in Nwt (row w[i]), Ndt (row d[i]) and Nt. The actual random value is stashed in z.

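# Literal translation (run the initialization further down before this loop)
import random

for i in range(N):
    t = random.randrange(T)  # like (int)(T*drand48()): an integer with 0 <= t < T
    z[i] = t
    Nwt[w[i]][t] += 1
    Ndt[d[i]][t] += 1
    Nt[t] += 1

# In place of w = ivec(N);
w = [0] * N
d = [0] * N
z = [0] * N

# In place of Nwt = dmat(W,T); each row must be its own list,
# because [[0.0] * T] * W would repeat one shared row W times
Nwt = [[0.0] * T for _ in range(W)]
Ndt = [[0.0] * T for _ in range(D)]

# Nt's allocation is not shown in the question; assumed to have length T
Nt = [0.0] * T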

EDIT: corrected w/d/z initialization from "n" to "N"

Note that there are still some things wrong here, since it looks like N must equal W, and D... so tread carefully.

jkerian
Hallik
w and d are providing indexes into Nwt and Ndt, which are very large tables. Note that we're missing the code that fills w/d with sane values; as shown, we would only ever increment the first rows of Nwt and Ndt. (None of the code you've shown will ever have a value for w[i] or d[i] other than 0.)
jkerian
The script starts by reading 3 columns from a txt file: documentID, wordID, and wordCount (how many times that word occurred in that document). The variable N is the sum of the wordCount column. Your translation is helping me a lot though! I really appreciate the help!
Hallik
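Given that description of the input, one plausible way the missing code could fill w and d is to expand each (documentID, wordID, wordCount) row into wordCount entries, so that both lists end up with N entries in total. This is an assumption about the original program, not something shown in the question; the function name `expand_counts` and the sample rows are hypothetical.

```python
def expand_counts(rows):
    """Expand (doc_id, word_id, count) rows into parallel per-token lists.

    Assumption: each row contributes `count` entries, so that
    len(w) == len(d) == N, where N is the sum of the count column.
    """
    w, d = [], []
    for doc_id, word_id, count in rows:
        w.extend([word_id] * count)
        d.extend([doc_id] * count)
    return w, d


# Made-up sample data: (documentID, wordID, wordCount)
rows = [(0, 2, 3), (0, 5, 1), (1, 2, 2)]
w, d = expand_counts(rows)
N = len(w)
print(N)
print(w)
print(d)
```

If the IDs in the file start at 1 rather than 0, they would need to be shifted down by one before being used as list indexes.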
I am still not 100% sure how it works. I have the values of `D` and `W`; they are both ints. Why are you multiplying by 0.0 in the last 5 elements in your example?
Hallik
dmat produces a "Double MATrix", an array of size W, each element of which is an array of size T. The slightly odd syntax [0.0]*3 expands to the list [0.0,0.0,0.0] in Python. If you enclose that in another expansion, you can do things like [[1]*2]*3, which will equal [[1,1],[1,1],[1,1]] (try it in the Python interpreter). Basically, I'm using that list expansion trick to replace the ivec and dmat calls with simple Python expressions that do the same thing.
jkerian
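One caveat with the nested form of that trick is worth checking in the interpreter: `[[0.0] * T] * W` repeats a reference to the same inner list W times, so incrementing one row changes every row. A short sketch with made-up sizes, comparing it with a list comprehension that builds independent rows:

```python
T, W = 2, 3                                   # made-up sizes

shared = [[0.0] * T] * W                      # W references to ONE inner list
shared[0][1] += 1
print(shared)                                 # every row shows the increment

independent = [[0.0] * T for _ in range(W)]   # W separate inner lists
independent[0][1] += 1
print(independent)                            # only row 0 changes
```

The flat case (`[0] * N`) is not affected, because the repeated elements are immutable numbers rather than lists.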