tags:

views:

1612

answers:

6
+4  A: 

2nd code is indeed "cleaner", but with a project of the size that the article is about, it is ridiculous to think refactoring like that is at best useless, at worst error prone.

However this kind of refactoring doesn't inflate an .Exe size form 1 to 2 cds

Clement Herreman
+25  A: 

OIC, this is a source-code-churn issue

At first glance the two forms are equivalent. The second one does look nicer but they do the same thing.

But then I read the cited page.

The problem is that the new guy churned the source tree, lots of it. It's bad form to troll through a giant source tree and make a meaningless change. Sure, perhaps one style is slightly better than another, but in practice, it should be a whole lot better before putting 1,000 deltas into a source code control system for people to wade through for eternity is justified.

I suspect that this was a source release, or some other unmentioned complexity caused the editing of that many files to expand their distribution. The contributions to that site are edited quite a bit, but basically the issue is understandable without specifics.

One of the problems with editing a zillion files for a style change is that the chance of an inadvertent error increases. This chance is greatly multiplied when a junior developer does it. Even for the experienced, there is Murphy's law to consider. If it happens right before a release it really is a hanging offense.

DigitalRoss
Better answered than i did !
Clement Herreman
I agree. You mentioned source code churning. Although I have never worked in such a shop, I have heard rumors of organizations that measure productivity with metrics such as how many files are checked into the source control system. If this person worked in such a shop (and I am not assuming that he did) he could receive lots of credit for minimal work.
Anthony Gatlin
A: 

Yes, the second code is cleaner, but depending on the compiler it can lead to emitting more machine code. This is entirely compiler-dependent, but the point of the WTF article is that in the second case the compiler would allocate a copy of string/integer value for each code snippet like that and in the first case it would do so only once per program.

sharptooth
No it won't. In both cases, the literal string will appear once in the data segment on program startup (taking ~15 bytes of space in the .EXE file), and will be copied to the stack when that function is called, so there will be 2 in-memory copies at that time. The WTF author got confused. (The point about making widespread no-benefit changes still stands of course.)
j_random_hacker
+1  A: 

Um, re read the article :)

The real WTF was that he touched the entire solution with these kinds of changes when he was supposed to fix a memory leak.

Also doing a change like this wouldn't matter much except potentially breaking/introducing bugs in other, maybe more complicated files, than the example one.

Makach
+15  A: 

Depending on the compiler and compiler options, initialization like this

char data_string[15] = "data data data";

results in a lot of move instructions to copy the literal data to stack.

Calling strcpy requires less instructions.

Doing this kind of thing all over a large codebase can increase the binary size significantly.

And of course, he was not spending his time on adding any value.

laalto
@DigitalRoss: Yes, I get that and didn't emphasize it since it was already covered by other answers. I concentrated on the code size issue. Examining the assembly output of various compilers reveals that there can be significant code size differences between the two init styles.
laalto
@laalto: As you can see from my own post, i was not able to reproduce that result. which compilers showed a difference?
TokenMacGuy
@TokenMacGuy: For example, i686-apple-darwin8-gcc-4.0.1
laalto
@laalto: strange, I looked at generated code too, but it seemed roughly equivalent.
DigitalRoss
I would be very surprised if there is a substantial difference, since the semantics are almost identical. (Unlike say `char *data_string = "data data data";`, which really *is* a different operation entirely as it involves no data copying.)
j_random_hacker
@j_random_hacker: `strcpy` emits a call to the library function, whereas the array assignment has the copy instructions inlined
rpetrich
@reptrich: That's one possibility, but either can do either. E.g. a common MSVC++ optimisation replaces calls to certain library functions like `strcpy()` and `memcpy()` with inlined "intrinsics"; and the compiler is free to call an internal library function to perform the in-place array initialisation.
j_random_hacker
+3  A: 

I can't get different behavior out of this. I tried it with LLVM: I had to add just a little bit of cruft at the return value so that LLVM doesn't optimize anything away, but the generated code for wtf and wtf2 are totally identical. This wtf is BAAAAAD

Input

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int wtf(int X) {
  int x;
  char data_string[15];
  x = 2;
  strcpy(data_string,"data data data");
  return 5*X+x+ data_string[X];
}
int wtf2(int X) {
  int x = 2;
  char data_string[15]="data data data";
  return 5*X+x+ data_string[X];
}
int main(int argc, char **argv) {
  printf("%d\n", wtf(atoi(argv[1]))+wtf2(atoi(argv[1])));
}

Output:

; ModuleID = '/tmp/webcompile/_3856_0.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i386-pc-linux-gnu"
@.str = internal constant [15 x i8] c"data data data\00"     ; <[15 x i8]*> [#uses=3]
@.str1 = internal constant [4 x i8] c"%d\0A\00"  ; <[4 x i8]*> [#uses=1]

define i32 @wtf(i32 %X) nounwind readnone {
entry:
    %0 = mul i32 %X, 5  ; <i32> [#uses=1]
    %1 = getelementptr [15 x i8]* @.str, i32 0, i32 %X  ; <i8*> [#uses=1]
    %2 = load i8* %1, align 1  ; <i8> [#uses=1]
    %3 = sext i8 %2 to i32  ; <i32> [#uses=1]
    %4 = add i32 %0, 2  ; <i32> [#uses=1]
    %5 = add i32 %4, %3  ; <i32> [#uses=1]
    ret i32 %5
}

define i32 @wtf2(i32 %X) nounwind readnone {
entry:
    %0 = mul i32 %X, 5  ; <i32> [#uses=1]
    %1 = getelementptr [15 x i8]* @.str, i32 0, i32 %X  ; <i8*> [#uses=1]
    %2 = load i8* %1, align 1  ; <i8> [#uses=1]
    %3 = sext i8 %2 to i32  ; <i32> [#uses=1]
    %4 = add i32 %0, 2  ; <i32> [#uses=1]
    %5 = add i32 %4, %3  ; <i32> [#uses=1]
    ret i32 %5
}

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
    %0 = getelementptr i8** %argv, i32 1  ; <i8**> [#uses=1]
    %1 = load i8** %0, align 4  ; <i8*> [#uses=1]
    %2 = tail call i32 @atoi(i8* %1) nounwind readonly  ; <i32> [#uses=2]
    %3 = getelementptr [15 x i8]* @.str, i32 0, i32 %2  ; <i8*> [#uses=1]
    %4 = load i8* %3, align 1  ; <i8> [#uses=1]
    %5 = sext i8 %4 to i32  ; <i32> [#uses=1]
    %tmp2 = mul i32 %2, 10  ; <i32> [#uses=1]
    %6 = shl i32 %5, 1  ; <i32> [#uses=1]
    %7 = add i32 %6, 4  ; <i32> [#uses=1]
    %8 = add i32 %7, %tmp2  ; <i32> [#uses=1]
    %9 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([4 x i8]* @.str1, i32 0, i32 0), i32 %8) nounwind  ; <i32> [#uses=0]
    ret i32 undef
}

declare i32 @atoi(i8*) nounwind readonly

declare i32 @printf(i8*, ...) nounwind
TokenMacGuy
I'm a little surprised that LLVM in 'non optimising' mode removes the call to strcpy.
Pete Kirkham
This is llvm with default optimization. If i had turned optimization off completely, the result wouldn't be realistic.
TokenMacGuy