tags:

views:

139

answers:

2

in an effort to solve question #3367795 here on SO i have to cope with a number of subproblems. one of these is: in said algorithm (levenshtein distance), several arrays are allocated in memory and initialized with the lines

cdef char   *m1     = <char *>calloc(   blen + 2,    sizeof( char ) )
cdef char   *m2     = <char *>calloc(   blen + 2,    sizeof( char ) )
cdef char   *m3     = <char *>malloc( ( blen + 2 ) * sizeof( char ) )
#.........................................................................
for i from 0 <= i <= blen:
  m2[ i ] = i
  <...snip...>

blen here refers to the length of a Python bytes variable. now as far as i understand the algorithm (see my original post for the full code) and as the code for the initialization of m2 clearly shows, these arrays are meant to hold integer numbers, not characters, so one would think the correct allocations should look like

cdef int    *m3     = <int *>malloc( ( blen + 2 ) * sizeof( int ) )

and so on. can anyone with a background in C elucidate to me why char is used? also, maybe more for people inclined to Cython, why is there a cast <char *>? one would think that char *x = malloc( ... ) should suffice to define x.

+8  A: 

Despite the misleading name, char types in C language are ordinary integral types, just like short, int, long and such. Of all integral types, chars have the smallest range and occupy the smallest amount of memory. So, if in your application it is important to save as much memory as possible, it might make sense to use char instead of int.

On some hardware platforms it might turn out that int types work faster than char types, so the selection of the specific type becomes a speed-vs-memory trade-off, but, once again, in many cases when the range of char is naturally sufficient, it might make more sense to use char instead of int.

AndreyT
Just a minor side note: C99's int_fast8_t type (and derivatives) leaves the mentioned speed vs. size trade-off to the implementation. Usually it gives you the fastest type with a minimum of 8 bits (one byte on most systems, i.e. a 'char') available.
RWS
at least *if* this is meant as a small integer one should never, never ever use `char` as the type, since its signedness is undefined and might vary from platform to platform. Use `signed char`, `unsigned char` or better the mentioned `int_fast8_t` or `int8_T` etc.
Jens Gustedt
+1  A: 

Quite simply, to save memory -- but please note carefully that declaring these arrays as char limits the result distance to either 127 or 255, depending on whether the C compiler defaults to signed char or unsigned char respectively. In C, char is an integer type -- you don't need an ord() to get its integer value.

Your original code contains no mention of this limitation. Note that if a char overflows, it does so silently and the code will produce incorrect results -- 127 + 1 -> -128 (signed); 255 + 1 -> 0 (unsigned).

You didn't respond to my comment on your original question: """What are the (a) maximum (b) average sizes of your strings? Do you really need to do the whole O(M*N) thing if the two strings are nothing like each other?""" ..... Please answer that now (edit your question); had you done so then, you would have had this question answered then.

Update: Reading the original post again, I've noticed a problem: The code that reads

m1, m2 = m2, m1
strcpy( m3, m2 )

is WRONG on three grounds: (1) it doesn't shuffle the rows properly (should do strcpy() before swapping m1 and m2) (2) strcpy() will not copy anything beyond the first null (zero byte) (3) there is no need to copy anything, just shuffle the pointers

m3, m2, m1 = m2, m1, m3
John Machin
thx; your comment is not visible to me. i have added an edit to answer this point.
flow
@flow: It IS visible to you (1) at the end of the comments you will see "show 1 more comment" (2) you would have been notified of a new comment by the little envelope icon at the top of the page next to your name being highlited
John Machin
you're right. this is so subtle.
flow
@flow: What precisely am I right about? What is so subtle? Are you talking about SO's comment visibility or the strcpy error or both?
John Machin
i meant the half-greyed smallish "show 1 more comment" is 'subtle'. thx for your code fixes, i already incorporated them. right now running tests to see whether that gives correct results.
flow