OK. What next? I decided to write several functions that each return a string in a different way. Even in a release build with all size optimizations turned to maximum, function alignment might skew the numbers, so I also took a look at the code in the debugger. All the functions were packed close together, without any nops in between.
First, I will show you just the sizes of the various functions:
- (1) const char *ReturnAConstCharString() { return "test"; } = 6 bytes
- (2) String ReturnAString() { return String("test"); } = 22 bytes
- (3) void FillingAStringReference(String &reference) { reference = "test"; } = 30 bytes
- (4) auto_ptr<String> ReturnAutoPtr() { return auto_ptr<String>(new String("test")); } = 43 bytes
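For anyone who wants to repeat the measurement, here is a minimal compilable sketch of the four functions. Two things in it are assumptions on my part: `String` is taken to be an alias for `std::string` (the post never defines it), and `std::unique_ptr` stands in for `auto_ptr`, which was removed in C++17. `CheckAllVariants` is a hypothetical helper added only to show that all four produce the same text. Byte counts will differ by compiler, flags, and standard library.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Assumption: String is an alias for std::string.
typedef std::string String;

// (1) Return a pointer to a string literal; no String object is constructed.
const char *ReturnAConstCharString() { return "test"; }

// (2) Return a String by value.
String ReturnAString() { return String("test"); }

// (3) Fill a String that the caller supplies.
void FillingAStringReference(String &reference) { reference = "test"; }

// (4) Return a heap-allocated String through an owning smart pointer.
// std::unique_ptr is a stand-in for auto_ptr here.
std::unique_ptr<String> ReturnAutoPtr() {
    return std::unique_ptr<String>(new String("test"));
}

// Hypothetical helper: checks that every variant yields the same text.
void CheckAllVariants() {
    String filled;
    FillingAStringReference(filled);
    assert(String(ReturnAConstCharString()) == "test");
    assert(ReturnAString() == "test");
    assert(filled == "test");
    assert(*ReturnAutoPtr() == "test");
}
```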
Wow, not what I expected at all. The one that most people think is the most optimized solution (using a reference) apparently takes more space than just returning a string. Naturally I would have expected the const char * version to be the lightest, and 6 bytes is extremely light. However, one can't do much with such a function; it has the same effect as referencing a static variable.
Looking at the underlying code, only (4) had a loop. So the auto_ptr version would probably be the poorest performance-wise.
For most practical situations, the best solution is (2). Simply return the String. I wonder if this is always true for all classes? Probably not for the bigger classes however.
The next thing I wanted to investigate is the return load of each call. I mean, are they all the same weight, or do they come with an overhead? The requirement for my return-overhead tests is that they all return their values into a std::string. I also decided just to count instructions in the debugger, as there was really no other good way of doing it.
- sizeof(string value = ReturnAConstCharString()) = 17 bytes and 7 instructions (2 calls).
- sizeof(string value = ReturnAString()) = 43 bytes and 14 instructions (3 calls).
- sizeof(FillingAStringReference(value)) = 9 bytes and 4 instructions (1 call).
- sizeof(auto value = new auto_ptr<String>(ReturnAutoPtr())) = 20 bytes and 8 instructions (2 calls).
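Written out as real code, the four call sites might look like the sketch below. The `Call...` wrappers are hypothetical names of mine, there only so each call site can be compiled and inspected on its own. As assumptions, `String` is an alias for `std::string` and `std::unique_ptr` replaces `auto_ptr`. The last wrapper simply takes ownership of the returned pointer rather than wrapping it in a second heap-allocated smart pointer, so it is a simplification of the measured line.

```cpp
#include <cassert>
#include <memory>
#include <string>

typedef std::string String;  // assumption: String is std::string

const char *ReturnAConstCharString() { return "test"; }
String ReturnAString() { return String("test"); }
void FillingAStringReference(String &reference) { reference = "test"; }
std::unique_ptr<String> ReturnAutoPtr() {
    return std::unique_ptr<String>(new String("test"));
}

// Call site 1: a std::string is constructed from the returned pointer.
String CallConstChar() {
    std::string value = ReturnAConstCharString();
    return value;
}

// Call site 2: the returned String initializes the local variable.
String CallByValue() {
    std::string value = ReturnAString();
    return value;
}

// Call site 3: the caller owns the String and the function fills it.
String CallFillReference() {
    std::string value;
    FillingAStringReference(value);
    return value;
}

// Call site 4: the caller takes ownership of the heap-allocated String.
String CallSmartPointer() {
    std::unique_ptr<String> value(ReturnAutoPtr());
    return *value;
}
```

Only call site 3 needs the variable to exist before the call, which is a large part of why returning by value feels lighter to type.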
Now this starts to paint a clearer picture of the overhead of each method. The truth is that filling a string passed by reference has the least overhead, at least at the call site. So when a function is called from many places, filling a reference can save a lot of space.
- Const char returning: 6 + 17 = 23 bytes.
- Returning a string: 22 + 43 = 55 bytes.
- Fill a reference: 30 + 9 = 39 bytes.
- auto_ptr: 43 + 20 = 63 bytes.
OK, so our original assumptions are starting to prove correct. References seem to be outpacing returning a string. This is probably true in terms of performance as well. But what about in practice, in a program that calls each function 5 times? Five seems like a good number for a small program.
- Const char returning: 6 + 5*17 = 91 bytes. 10 calls.
- Returning a string: 22 + 5*43 = 237 bytes. 15 calls.
- Fill a reference: 30 + 5*9 = 75 bytes. 5 calls.
- auto_ptr: 43 + 5*20 = 143 bytes. 10 calls.
Slightly larger programs probably end up calling functions that return a string at least 100 times, and probably contain up to 30 different functions. What would that look like?
- Const char returning: 30*6 + 100*17 = 180 + 1700 = 1880 bytes. 230 calls.
- Returning a string: 30*22 + 100*43 = 660+4300= 4960 bytes. 330 calls.
- Fill a reference: 30*30 + 100*9 = 900 + 900 = 1800 bytes. 130 calls.
- auto_ptr: 30*43 + 100*20 = 1290+2000 = 3290 bytes. 230 calls.
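The totals in these lists all follow one formula: the number of functions times the function size, plus the number of call sites times the call-site size. A small sketch of that arithmetic, using the byte counts measured above; `TotalBytes`, the constant names, and `CheckSmallProgram` are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>

// Total code size in bytes: each function body is emitted once,
// and each call site pays its own overhead.
constexpr int TotalBytes(int functionBytes, int callSiteBytes,
                         int functionCount, int callSiteCount) {
    return functionCount * functionBytes + callSiteCount * callSiteBytes;
}

// Function size and call-site size for each variant, in bytes.
constexpr int kConstCharFn = 6,  kConstCharCall = 17;
constexpr int kByValueFn   = 22, kByValueCall   = 43;
constexpr int kFillRefFn   = 30, kFillRefCall   = 9;
constexpr int kAutoPtrFn   = 43, kAutoPtrCall   = 20;

// 30 functions and 100 call sites.
static_assert(TotalBytes(kConstCharFn, kConstCharCall, 30, 100) == 1880, "const char");
static_assert(TotalBytes(kByValueFn, kByValueCall, 30, 100) == 4960, "by value");
static_assert(TotalBytes(kFillRefFn, kFillRefCall, 30, 100) == 1800, "fill reference");
static_assert(TotalBytes(kAutoPtrFn, kAutoPtrCall, 30, 100) == 3290, "smart pointer");

// One function called from 5 places, checked at run time.
void CheckSmallProgram() {
    assert(TotalBytes(kConstCharFn, kConstCharCall, 1, 5) == 91);
    assert(TotalBytes(kByValueFn, kByValueCall, 1, 5) == 237);
    assert(TotalBytes(kFillRefFn, kFillRefCall, 1, 5) == 75);
    assert(TotalBytes(kAutoPtrFn, kAutoPtrCall, 1, 5) == 143);
}
```

With 30 functions and 100 call sites, returning by value costs 4960 - 1800 = 3160 bytes more than filling a reference.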
Still, I find returning a String so much easier than filling a reference that, frankly, adding 3K extra for this convenience is, in my mind, acceptable. However, many programmers may feel that 3K is too much, or that the overhead is too great.
The main reason is that returning a string requires far less typing than the other options, and for that reason alone I usually pick it. When speed becomes an issue, I fall back to using references. I have used auto_ptrs in the past, but my feeling is that auto_ptrs are more appropriate for larger classes that exceed 30 bytes. Also not mentioned here is the overhead of allocating memory on the heap: each call to new is much more costly than using a stack variable.
Just remember the old adage, "premature optimization is the root of all evil", and you will be just fine.