views:

360

answers:

5

Porting code from 32bit to 64bit. Lots of places with

int len = strlen(pstr);

These all generate warnings now because strlen() returns size_t which is 64bit and int is still 32bit. So I've been replacing them with

size_t len = strlen(pstr);

But I just realized that this is not safe, as size_t is unsigned and it can be treated as signed by the code (I actually ran into one case where it caused a problem, thank you, unit tests!).

Blindly casting strlen return to (int) feels dirty. Or maybe it shouldn't?
So the question is: is there an elegant solution for this? I probably have a thousand lines of code like that in the codebase; I can't manually check each one of them and the test coverage is currently somewhere between 0.01 and 0.001%.

+1  A: 

You can treat site_t signed safely in most cases. The unsigned size_t will be treated as negative only when it (or the intermediate results in expressions) is bigger then 2^31 (for 32-bit) or 2^63 for 64 bit.

UPDATE: Sorry, size_t will be unsafe in constructions like while ( (size_t)t >=0 ). So right answer is to use ssize_t.

osgx
I meant the case where I then decrement the len to a point where it becomes negative. Like in a loopwhile (len > 0)
MK
loop `while (len>0)` should stop at `len == 0`.Please, show to us your example, problem in which was detected with unit tests.
osgx
Blah, sorry, I meant if (len < 0). I had a loop with that inverse check of "if (len < 0) skip something;" instead of "if (len >= 0) do something;"
MK
+4  A: 

Setting the compiler warnings to the maximum level should get you a nice report of every incorrect sign conversion. In gcc, '-Wall -Wextra' should do.

You can also use a static code analyzer like cppcheck to see if everything is right.

pau.estalella
and -wall will find all the places where size_t is being used in a signed context. You really should be using size_t
pm100
+1  A: 

As a compromise, you could use ssize_t (if available). Fake it up if not, using long long, int_fast64_t, intmax_t, or have a platform porting header which lets a suitable type be specified for a platform. ssize_t is in POSIX not standard C or C++, but if you ever hit a platform which doesn't have a signed type of the same size as size_t then I sympathise.

A cast to int is nearly safe (assuming 32 bit int on your 64 bit platform, which seems reasonable), because a string is unlikely to be more than 2^31 bytes long. A cast to a larger signed type is even safer. Customers who can afford 2^63 bytes of memory are what's known in the trade as "a good problem to have" ;-)

Of course, you could check it:

size_t ulen = strlen(pstr);
if (ulen > SSIZE_MAX) abort(); // preferably trace, log, return error, etc.
ssize_t len = (ssize_t) ulen;

Sure there's an overhead, but if you have 1000 instances then they can't all be performance-critical. For the ones which are (if any), you can do the work to investigate whether len being signed actually matters. If it doesn't, switch to size_t. If it does, rewrite or just take a risk on never meeting an object that absurdly huge. The original code would almost certainly have done the wrong thing anyway on the 32bit platform, if len had been negative as a result of strlen returning a value bigger than INT_MAX.

Steve Jessop
I agree that cast to int is nearly safe, but I don't understand what's the point of ssize_t: it is also *nerly* safe. It is slightly safer than int, but still -- size_t can be bigger than ssize_t.
MK
@MK, `ssize_t` must be equal size to `size_t`
osgx
@MK: I think the general intention of `ssize_t` is that in practice, POSIX implementations aren't going to allow individual objects greater than half the size of the available address space. It's easy enough to enforce this is `malloc`, although I don't think it's guaranteed. It's useful to have a signed size type for representing offsets which are allowed to be negative.
Steve Jessop
@osgx: by "bigger" he means that `SIZE_MAX > SSIZE_MAX`, so a value might be bigger. Not that the type is bigger.
Steve Jessop
+3  A: 

You could use ssize_t (the signed variant of size_t).

caf
A: 

Some time ago I posted a short note about this kind of issues on my blog and the short answer is:

Always use proper C++ integer types

Long answer: When programming in C++, it’s a good idea to use proper integer types relevant to particular context. A little bit of strictness always pays back. It’s not uncommon to see a tendency to ignore the integral types defined as specific to standard containers, namely size_type. It’s available for number of standard container like std::string or std::vector. Such ignorance may get its revenge easily.

Below is a simple example of incorrectly used type to catch result of std::string::find function. I’m quite sure that many would expect there is nothing wrong with the unsigned int here. But, actually this is just a bug. I run Linux on 64-bit architecture and when I compile this program as is, it works as expected. However, when I replace the string in line 1 with abc, it still works but not as expected :-)

#include <iostream>
#include <string>
using namespace std;
int main()
{
  string s = "a:b:c"; // "abc" [1]
  char delim = ':';
  unsigned int pos = s.find(delim);
  if(string::npos != pos)
  {
    cout << delim << " found in " << s << endl;
  }
}

Fix is very simply. Just replace unsigned int with std::string::size_type. The problem could be avoided if somebody who wrote this program took care of use of correct type. Not to mention that the program would be portable straight away.

I’ve seen this kind of issues quite many times, especially in code written by former C programmers who do not like to wear the muzzle of strictness the C++ types system enforces and requires. The example above is a trivial one, but I believe it presents the root of the problem well.

I recommend brilliant article 64-bit development written by Andrey Karpov where you can find a lot more on the subject.

mloskot