
When compiling a 64-bit application, why does strlen() return a 64-bit integer? Am I missing something?

I understand that strlen() returns a size_t type, and by definition this shouldn’t change, but why would strlen need to return a 64-bit integer?

The function is designed to be used with strings. With that said:

Do programmers commonly create multi-gigabyte or multi-terabyte strings? If they did, wouldn’t they need a better way to determine the string length than searching for a terminating null character?

I think this is ridiculous. In fact, maybe we need a StrLenAsync() function with a callback just to handle the ultra-long process of searching for the null terminator in a 40TB string. Sound stupid? Yeah, well, strlen() returns a 64-bit integer!

Of course, the proposed StrLenAsync() function is a joke.

+16  A: 

It looks like, when compiling for a 64-bit target, size_t is defined as 64-bit. This makes sense, since size_t is used for sizes of all kinds of objects, not just strings.
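To check this on a particular compiler, a small sketch like the following can help. It assumes C++11 or later, and `size_t_bits` and `strlen_result_bits` are hypothetical helper names, not library functions. Because `strlen` is declared as returning `size_t`, its result is exactly as wide as `size_t` is on the target being compiled for.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t
#include <cstring>  // std::strlen

// Hypothetical helper: width of std::size_t in bits for the current target
// (commonly 32 on 32-bit targets and 64 on 64-bit targets).
constexpr std::size_t size_t_bits() {
    return sizeof(std::size_t) * CHAR_BIT;
}

// Hypothetical helper: width of strlen's return value in bits. sizeof does
// not evaluate its operand, so strlen is never actually called here.
constexpr std::size_t strlen_result_bits() {
    return sizeof(std::strlen("")) * CHAR_BIT;
}
```

Compiled for a typical 64-bit target, both helpers return 64; for a typical 32-bit target, both return 32.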

Steven Sudit
Totally understood, but isn't that a lot of overhead for a function which will likely never see a return value over the maximum 32-bit unsigned integer?
NTDLS
To be exact, it's for the difference between two pointers, and strlen is just that.
Marco van de Voort
That's a bit like saying that a 32-bit size_t has 16 bits of overhead because most strings are well under 64k. :-)
Steven Sudit
True, very true.
NTDLS
That's like saying 'bool' has at least 1 bit of overhead because strings are often empty. (Sorry, couldn't resist.)
Aaron
Well, it's more like saying bool has 7 bits of overhead because the expense of bit-packing with ANDs and ORs exceeds the space savings.
Steven Sudit
size_t is 64 bits on 64-bit machines. On such machines, there is no overhead, because a register is 64 bits, and usually the return value is in a register. It would actually be a waste to define size_t as 32 bits.
Jared Oberhaus
@NTDLS: Writing code that only behaves well in situations someone thinks are "likely" is a frequent cause of bugs - and worse yet, security flaws. Even after it's been exploited countless times, a lot of people still think it's "unlikely" that someone would type raw SQL into a browser URL and swipe a database full of credit card numbers.
Bob Murphy
@Jared: Well, there's no overhead in the register, but if it winds up in a variable then it will use twice as much RAM. Is this an issue? On the one hand, it means eating through the cache sooner. On the other, 64-bit processors can handle tons of RAM. I'd call it a wash.
Steven Sudit
For the difference between two pointers (which can be negative), `ptrdiff_t` is used. `size_t` is used for non-negative things.
Johannes Schaub - litb
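The pointer-difference point from the comments can be made concrete with a naive re-implementation. This is only a sketch (`naive_strlen` is a hypothetical name, and real library versions are far more optimized): subtracting two pointers yields the signed `std::ptrdiff_t`, as Johannes notes, while the length itself is reported as `std::size_t`, matching `strlen`.

```cpp
#include <cstddef>  // std::size_t, std::ptrdiff_t

// Hypothetical naive strlen: walk to the terminating '\0', then measure the
// distance from the start of the string.
std::size_t naive_strlen(const char* s) {
    const char* p = s;
    while (*p != '\0') {
        ++p;
    }
    // p - s has the signed type std::ptrdiff_t. It can never be negative
    // here, so converting it to std::size_t (strlen's return type) is safe.
    std::ptrdiff_t distance = p - s;
    return static_cast<std::size_t>(distance);
}
```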
+6  A: 

On a 64-bit app, it's definitely possible to create a 5GB string.

The spec is not intended to keep you from doing stupid things.

Even if it weren't needed, it wouldn't be worth changing the specification of strlen away from size_t just to make the return value 4 bytes instead of 8.

Tim Sylvester
It's also possible in a 32-bit app to create a 5GB string. It just can't be mapped into the 32-bit address space all at once, so strlen would have to be rather clever about this, which it isn't. See the following interesting article for details: http://blogs.msdn.com/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx
OregonGhost
The strlen function operates on a pointer, assuming that the string follows in contiguous memory. A 32-bit pointer cannot represent a string larger than 4G (minus whatever space the O/S reserves) *in memory*. While there are certainly several ways to represent strings larger than the address space, they are irrelevant to strlen because of the assumptions built into its specification.
Tim Sylvester
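Tim Sylvester's point about contiguous memory can be checked against the largest value a `size_t` can hold (`SIZE_MAX`, spelled here as `std::numeric_limits<std::size_t>::max()`). A minimal sketch, assuming C++11 or later; `fits_in_size_t` and `five_gib` are hypothetical names. Some implementations cap object sizes lower than that (for example at `PTRDIFF_MAX`), so passing this check is necessary but not sufficient for an allocation to succeed.

```cpp
#include <cstddef>  // std::size_t
#include <limits>   // std::numeric_limits

// Hypothetical helper: can the size of a single contiguous object of `bytes`
// bytes be represented as a std::size_t on this target?
constexpr bool fits_in_size_t(unsigned long long bytes) {
    return bytes <= std::numeric_limits<std::size_t>::max();
}

// 5 GiB, standing in for the "5GB string" discussed in the answer.
constexpr unsigned long long five_gib = 5ULL * 1024 * 1024 * 1024;
```

Where `size_t` is 64 bits wide, `fits_in_size_t(five_gib)` is true; where it is 32 bits wide, it is false, which is why a 32-bit process cannot hold such a string in one contiguous piece.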
+1  A: 

Well, 1) size_t is a typedef and varies with the architecture, and 2) wouldn't it make sense to have the largest integer as a return value? Why 32 bits? Why not 16? It's 64 bits on your machine because that's the maximum string length possible.

Tyler
A: 

strlen() has to use a return type that can represent the size of the largest object in the allocation model.

You could use std::string. Its size_type is the allocator's size_type, so if you create your own allocator, std::string::size() could even use unsigned char as its return type.

Thanks to the remark in the comments: std::string is just a typedef for a specialization of std::basic_string, so you should of course use std::basic_string with a custom allocator.
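A sketch of the basic_string route, assuming C++11 or later. `SmallSizeAlloc` and `SmallString` are hypothetical names, and `std::uint32_t` is an arbitrary choice for the smaller `size_type` (the allocator requirements call for an unsigned integer type). The allocator simply forwards to `std::allocator` for the actual memory; only the size type differs.

```cpp
#include <cstdint>      // std::uint32_t
#include <memory>       // std::allocator
#include <string>       // std::basic_string, std::char_traits
#include <type_traits>  // std::is_same

// Hypothetical allocator whose size_type is 32 bits wide.
template <class T>
struct SmallSizeAlloc {
    using value_type = T;
    using size_type = std::uint32_t;

    SmallSizeAlloc() = default;
    template <class U>
    SmallSizeAlloc(const SmallSizeAlloc<U>&) noexcept {}

    T* allocate(size_type n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, size_type n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

// Stateless, so any two instances can free each other's memory.
template <class T, class U>
bool operator==(const SmallSizeAlloc<T>&, const SmallSizeAlloc<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const SmallSizeAlloc<T>&, const SmallSizeAlloc<U>&) noexcept {
    return false;
}

// std::string is fixed to std::allocator<char>, so the custom allocator has
// to be supplied through basic_string directly.
using SmallString =
    std::basic_string<char, std::char_traits<char>, SmallSizeAlloc<char>>;

static_assert(std::is_same<SmallString::size_type, std::uint32_t>::value,
              "size_type is taken from the allocator");
```

With this alias, `size()` returns a `std::uint32_t`. The trade-off is a hard cap below 4 GiB per string, and `SmallString` is a distinct type from `std::string`, so the two do not convert into each other implicitly.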

Kirill V. Lyadvinsky
You can't (in standard C++) change the allocator for std::string: it's a typedef, not a template. You have to use basic_string.
Steve Jessop
Indeed it is; I was in a hurry.
Kirill V. Lyadvinsky
+5  A: 

Here's a chart which shows the size of some basic types in the most common data models:

size (bits) ILP32  LP64  LLP64  ILP64
char            8     8      8      8
short          16    16     16     16
int            32    32     32     64
long           32    64     32     64
long long      64    64     64     64
pointer        32    64     64     64
size_t         32    64     64     64

The 32-bit Windows data model is ILP32, and the 64-bit Windows data model is LLP64.
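The chart can also be checked from code. This sketch (C++11 or later; `data_model` and the three width constants are hypothetical names) classifies the current target by the bit widths of `int`, `long`, and pointers, and reports "other" for any combination the chart does not list.

```cpp
#include <climits>  // CHAR_BIT

// Bit widths used to tell the data models in the chart apart.
constexpr unsigned int_bits = sizeof(int) * CHAR_BIT;
constexpr unsigned long_bits = sizeof(long) * CHAR_BIT;
constexpr unsigned ptr_bits = sizeof(void*) * CHAR_BIT;

// Hypothetical helper: names the data model of the current target.
constexpr const char* data_model() {
    return (int_bits == 32 && long_bits == 32 && ptr_bits == 32) ? "ILP32"
         : (int_bits == 32 && long_bits == 64 && ptr_bits == 64) ? "LP64"
         : (int_bits == 32 && long_bits == 32 && ptr_bits == 64) ? "LLP64"
         : (int_bits == 64 && long_bits == 64 && ptr_bits == 64) ? "ILP64"
         : "other";
}
```

Built for 64-bit Windows this reports "LLP64", while 64-bit Linux and macOS builds typically report "LP64".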

Nick
+1 for the big picture view, and a nice chart
Steven Sudit
I hoped it might be helpful. I'm involved in porting a very large C++ codebase to 64-bit at the moment, so I'm living and breathing this stuff right now.
Nick
Yeah, very nice chart. I saved a copy.
NTDLS
+2  A: 

It's not about whether anybody will actually make a string that size. By convention, ALL return types that indicate the number of bytes something occupies in memory are size_t.
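That convention can be spelled out with compile-time checks. A sketch assuming C++11 or later; `Sample` is a made-up type used only to have something to measure. `sizeof`, `alignof`, and `std::string::size()` all produce `std::size_t`, the same type `strlen` returns.

```cpp
#include <cstddef>      // std::size_t
#include <string>       // std::string
#include <type_traits>  // std::is_same

struct Sample { char c; double d; };  // made-up type to measure

static_assert(std::is_same<decltype(sizeof(Sample)), std::size_t>::value,
              "sizeof yields std::size_t");
static_assert(std::is_same<decltype(alignof(Sample)), std::size_t>::value,
              "alignof yields std::size_t");
static_assert(std::is_same<decltype(std::string().size()), std::size_t>::value,
              "std::string::size returns std::size_t");
```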

Larry Gritz
+2  A: 

I can think of several applications where a string of 4GB is simply not enough (computational biology and computer forensics are two HUGE ones). Don't assume that because YOU don't do it, nobody else does either.

San Jacinto
Oh no, I totally understand that. I'm just saying that you wouldn't want to pass that 4GB+ array of characters to a strlen() function. You just *might* be better off keeping track of its length while you're building it.
NTDLS
We don't use 4GB strings in computer forensics. That would be silly.
vy32
You've never indexed an entire hard drive for later examination? How about when a cell phone is taken from a scene? It's easier to index the contents of the SD card than to keep reading from the card over and over. If you are referring to using strlen() to find the length of a 4GB string, then yes, that is silly. Otherwise, I don't think I'm the one being silly here...
San Jacinto
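NTDLS's suggestion of keeping track of the length while building the data is what `std::string` already does: it stores its own size, so `size()` is constant time and never scans for a terminator. A minimal sketch, with `build_and_measure` as a hypothetical name and short literals standing in for real input such as sequence data:

```cpp
#include <cstddef>           // std::size_t
#include <initializer_list>  // std::initializer_list
#include <string>            // std::string

// Hypothetical helper: appends each chunk to a std::string and returns the
// length the string recorded while growing. Each append still measures its
// own C-string argument, but no scan of the whole buffer is needed at the end.
std::size_t build_and_measure(std::initializer_list<const char*> chunks) {
    std::string buffer;
    for (const char* chunk : chunks) {
        buffer += chunk;
    }
    return buffer.size();
}
```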