When they are represented in memory, are C++ objects the same as C structs?

With C I could do something like this:

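#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct myObj {
       int myInt;
       char myVarChar;
};

int main() {
       /* over-allocate so that 5 bytes follow myInt */
       struct myObj * testObj = (struct myObj *) malloc(sizeof(int)+5);
       testObj->myInt = 3;
       strcpy((char*)&testObj->myVarChar, "test");
       printf("String: %s", (char *) &testObj->myVarChar);
       free(testObj);
}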

I don't think C++ allows overloading the + operator for the built-in char * type, so I'd like to create my own lightweight string class which has none of the extra overhead std::string has. I think std::string is represented contiguously:

(int)length(char[])data

I want exactly the same functionality but without prefixing the length (saving 8 bytes of overhead).

Here is the code I'm using to test, but it results in a segfault:

#include <iostream>
using namespace std;
class pString {
    public:
     char c;
     pString * pString::operator=(const char *);
};


pString * pString::operator=(const char * buff) {

    cout << "Address of this: " << (uint32_t) this << endl;
    cout << "Address of this->c: " << (uint32_t) &this->c << endl;

    realloc(this, strlen(buff)+1);
    memcpy(this, buff,  strlen(buff));
    *(this+strlen(buff)) = '\0';

    return this;
};

struct myObj {
     int myInt;
     char myVarChar;
};

int main() {

    pString * myString = (pString *) malloc(sizeof(pString));
    *myString = "testing";
    cout << "'" << (char *) myString << "'"; 
}


Edit: Nobody really understands what I want to do. Yes, I know I can have a pointer to the string in the class, but that's 8 bytes more expensive than a plain C string; I wanted exactly the same internal representation. Thanks for trying, though.


Edit: The end result of what I wanted to achieve was being able to use the + operator with NO extra memory usage compared to using strcat etc.

const char * operator+(const char * first, const char * second);
+2  A: 

You cannot change the size of an object/struct in either C or C++. Their sizes are fixed at compile time.
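
A minimal sketch of that point, assuming C++11. Fixed and size_after_over_allocation are hypothetical names, not from the question. Over-allocating with malloc changes the size of the memory block, not what sizeof reports for the type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical struct mirroring the question's myObj.
struct Fixed {
    int  myInt;
    char myVarChar;
};

// sizeof is evaluated by the compiler, so it can appear in a static_assert.
static_assert(sizeof(Fixed) >= sizeof(int) + sizeof(char),
              "a struct is at least as large as the sum of its members");

// Allocates a Fixed with `extra` spare bytes behind it and returns what
// sizeof reports for the object afterwards (0 if the allocation failed).
std::size_t size_after_over_allocation(std::size_t extra) {
    Fixed* p = static_cast<Fixed*>(std::malloc(sizeof(Fixed) + extra));
    if (p == nullptr) return 0;
    std::size_t reported = sizeof(*p);  // still sizeof(Fixed)
    std::free(p);
    return reported;
}
```

Whatever value is passed as extra, the function returns sizeof(Fixed).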

anon
+2  A: 

When they are represented in memory, are C++ objects the same as C structs?

Strictly speaking, no. In general, yes. C++ classes and structs are identical in memory layout to C structs except:

  • Bit fields have different packing rules
  • Sizes are fixed at compile time
  • If there are any virtual functions, the compiler will add a vtable pointer to the memory layout.
  • If the object inherits from a base class, the new class's layout will be appended to the base class layout, including the vtable pointer, if any.
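
The last two bullets can be checked with the standard type traits. This is only a sketch assuming C++11. Plain, WithVirtual and Derived are made-up types, and the size comparison reflects what common compilers do rather than a guarantee of the standard:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types, for illustration only.
struct Plain {            // would also be a valid C struct
    int value;
};

struct WithVirtual {      // same data, plus one virtual function
    int value;
    virtual ~WithVirtual() {}
};

struct Derived : Plain {  // adds a member behind the base-class part
    char tag;
};

// Only Plain is guaranteed to have a C-compatible (standard) layout.
static_assert(std::is_standard_layout<Plain>::value, "C-compatible layout");
static_assert(!std::is_standard_layout<WithVirtual>::value,
              "virtual functions add hidden data");
static_assert(!std::is_standard_layout<Derived>::value,
              "data members in both base and derived class");

// True when the virtual function made the object larger than its data.
// Expected on mainstream compilers, but not promised by the standard.
bool virtual_adds_overhead() {
    return sizeof(WithVirtual) > sizeof(Plain);
}
```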

I don't think C++ allows overloading the + operator for the built-in char * type, so I'd like to create my own lightweight string class which has none of the extra overhead std::string has. I think std::string is represented contiguously

You cannot overload operator+ when both operands are char*: at least one operand of an overloaded operator must be of class or enumeration type. For plain pointers, + means pointer arithmetic. std::string overloads operator+ and operator+= to append char* data to the string. The string is stored in memory as a C string, plus additional information. The c_str() member function returns a pointer to the internal char array.
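
As a small illustration of the std::string behavior described here (join and c_length are hypothetical helper names, not standard functions):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: joins two C strings by routing them through std::string.
std::string join(const char* first, const char* second) {
    std::string result(first);  // copies `first` into a managed buffer
    result += second;           // std::string's own operator+= appends
    return result;
}

// Length as seen by C code reading the buffer returned by c_str().
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

The same works inline, e.g. std::string("test") + "ing", because one operand is a class type.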

In your C example, you're relying on undefined behavior: writing a whole string through a single char member runs past the end of the declared object. Don't allocate like that. It can result in Bad Things, namely bizarre segfaults.

Your C++ example is also doing Bad Things by calling realloc(this); the behavior of such a realloc is undefined. Instead, you should carry a char* member and allocate a new char[] buffer to store the characters in.

greyfade
+7  A: 
struct myObj {
   //...
   char myVarChar;
};

This won't work. You need either a fixed-size array, a pointer to char, or the struct hack. You won't be able to assign a pointer to this myVarChar.

So I'd like to create my own lightweight string class which has none of the extra overhead std::string has.

What extra overhead are you referring to? Have you tested and measured to see if std::string is really a bottleneck?

I think std::string is represented contiguously

Yes, mostly, the character buffer part. However, the following:

(int)length(char[])data

is not required by the standard. Translated: A string implementation need not use this particular layout of its data. And it may have additional data.
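
What you can rely on (since C++11) is only that the characters are contiguous and followed by a terminator. A sketch of checking that, with is_contiguous_c_string as a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if the characters of `s` sit next to each other in memory and
// are followed by a terminating '\0', i.e. the buffer is usable as a C string.
bool is_contiguous_c_string(const std::string& s) {
    const char* base = s.data();
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (&s[i] != base + i) return false;  // element i must follow element i-1
    }
    return s.c_str()[s.size()] == '\0';
}
```

Where the length, the capacity or any small-string buffer live relative to those characters is left to the implementation.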

Now, your lightweight string class is fraught with errors:

class pString {
public:
    char c; // typically this is implementation detail, should be private
    pString * pString::operator=(const char *); 
    // need ctors, dtors at least as well
    // won't you need any functions on strings?
};

Try something along the lines of the following:

/* a light-weight string class */
class lwstring { 
  public:
     lwstring(); // default ctor
     lwstring(lwstring const&); // copy ctor
     lwstring(char const*); // consume C strings as well
     lwstring& operator=(lwstring const&); // assignment
     ~lwstring(); // dtor
     size_t length() const; // string length
     bool empty() const; // empty string?
  private:
     char *_myBuf;
     size_t _mySize;
};
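
One possible way to fill in those members, as a sketch rather than a finished class. The c_str() accessor and the private copy() helper are additions for illustration and are not part of the interface above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class lwstring {
  public:
    lwstring() : _myBuf(copy("", 0)), _mySize(0) {}
    lwstring(lwstring const& other)
        : _myBuf(copy(other._myBuf, other._mySize)), _mySize(other._mySize) {}
    lwstring(char const* s)
        : _myBuf(copy(s, std::strlen(s))), _mySize(std::strlen(s)) {}
    lwstring& operator=(lwstring const& other) {
        if (this != &other) {
            char* fresh = copy(other._myBuf, other._mySize);  // may throw
            delete[] _myBuf;  // release the old buffer only after success
            _myBuf = fresh;
            _mySize = other._mySize;
        }
        return *this;
    }
    ~lwstring() { delete[] _myBuf; }
    std::size_t length() const { return _mySize; }
    bool empty() const { return _mySize == 0; }
    char const* c_str() const { return _myBuf; }  // added for this sketch

  private:
    // Returns a new[] buffer with the first n chars of src plus a '\0'.
    static char* copy(char const* src, std::size_t n) {
        char* buf = new char[n + 1];
        std::memcpy(buf, src, n);
        buf[n] = '\0';
        return buf;
    }
    char* _myBuf;
    std::size_t _mySize;
};
```

Usage would look like lwstring a("test"); lwstring b; b = a; with each object owning its own buffer.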
dirkgently
+2  A: 

There is a lot wrong with your class definition/usage. If you want to store a string you should use a pointer type, like a char* member, not an individual char. Using a single char means that only a single character of memory is allocated.

Another mistake is the allocation code where you do a realloc on this: you can potentially change the memory allocated, but not the value of this. To make it work you would have to assign the result back to this (this = (pString*)realloc(this, strlen(buff)+1);), which the language does not allow and which would be really bad practice anyway. Much better to use realloc on a char* member.

Unfortunately C++ proper has no equivalent of realloc or expand for memory obtained with new; you must instead use new and delete, doing any copying yourself.
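
A rough sketch of what "doing the copying yourself" means. grow is a hypothetical helper, and it assumes the old buffer came from new char[] and holds a valid C string:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a new buffer of `new_cap` bytes holding the
// C string from `old`, and releases `old`. Requires new_cap > strlen(old).
char* grow(char* old, std::size_t new_cap) {
    char* fresh = new char[new_cap];
    std::strcpy(fresh, old);  // copy the text, including the '\0'
    delete[] old;             // old must have come from new char[]
    return fresh;             // the caller stores this in its char* member
}
```

The important part is that the caller overwrites its own pointer with the return value, which is exactly what cannot be done with this.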

jheriko
+2  A: 

Why do you write C with classes? Why not use C++?

Paul
A: 

If you want something that is basically the same as std::string except that it doesn't know how long the string is, you should learn how std::string works, what operator overloads it has, etc. and then mimic that, with just the differences you want.

There is unlikely to be any real point to this, however.

Regarding your latest update - you say you want a design in which general application code will be passing around naked pointers to heap objects. With no automatic cleanup.

This is, quite simply, a very bad idea.

Daniel Earwicker
+14  A: 

You should not waste your time writing string classes. There's a reason people spent time writing them in the first place, and it's naive to think they wrote them because they wanted to create big, obfuscated, overhead-laden code that you could easily improve in a matter of hours.

For example, your code has quadratic complexity for memory reallocations in the assignment operator: each assignment of a string longer by 1 character will use a new memory block larger by 1 byte, resulting in heavy memory fragmentation after a "few" assignments like this. You save a few bytes but potentially lose megabytes to address space and memory page fragmentation.

Also, with this design you have no way of efficiently implementing the += operator: instead of just copying the appended string in most cases, you will always need to copy the whole string, thus reaching quadratic complexity again when you append small strings to one bigger one a few times.

Sorry, but your idea has a great chance of becoming terrible to maintain and orders of magnitude less efficient than typical string implementations like std::string.

Don't worry - this is true for practically all great ideas of "writing your own better version of a standard container" :)
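
For contrast, here is a sketch that counts how often std::string has to regrow while characters are appended one at a time. count_regrowths is a hypothetical helper. The exact count is implementation-specific, but typical implementations grow geometrically, so it stays far below the number of appends:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends `n` characters one at a time and counts how often the string's
// capacity changed, which approximates how often it had to reallocate.
std::size_t count_regrowths(std::size_t n) {
    std::string s;
    std::size_t regrowths = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != last_capacity) {
            ++regrowths;
            last_capacity = s.capacity();
        }
    }
    return regrowths;
}
```

A string that reallocates to the exact length on every append would regrow n times for n appends.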

RnR
A: 

If you want performance, you can write your class like this:

template<int max_size> class MyString
{
public:
   size_t size;
   char contents[max_size];

public:
   MyString(const char* data);
};

Set max_size to a value appropriate for the context. This way the object can be created on the stack, and no memory allocation is involved.

It is possible to create what you want by overloading operator new:
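
A sketch of how the constructor of such a fixed-capacity class might look, assuming C++11. The name FixedString, the accessors and the truncation policy are assumptions made for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fixed-capacity string: all storage lives inside the object itself.
template <std::size_t MaxSize>
class FixedString {
    static_assert(MaxSize > 0, "need room for at least the terminator");

  public:
    explicit FixedString(const char* data) : size_(std::strlen(data)) {
        if (size_ > MaxSize - 1) size_ = MaxSize - 1;  // truncate, keep room for '\0'
        std::memcpy(contents_, data, size_);
        contents_[size_] = '\0';
    }
    std::size_t size() const { return size_; }
    const char* c_str() const { return contents_; }

  private:
    std::size_t size_;
    char contents_[MaxSize];
};
```

The price is that every object occupies MaxSize bytes plus the size field, however short the string is.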

class pstring
{
public:
    int myInt;
    char myVarchar;

    void* operator new(size_t size, const char* p);
    void operator delete(void* p); 
};

void* pstring::operator new(size_t size, const char* p)
{
    assert(sizeof(pstring)==size);
    char* pm = (char*)malloc(sizeof(int) + strlen(p) +1 );
    strcpy(sizeof(int)+pm, p);
    *(int*)(pm) = strlen(p);  /* assign myInt */
    return pm;
}

void pstring::operator delete(void* p)
{
    ::free(p);
}


pstring* ps = new("test")pstring;

delete ps;
Glitch
Note that many std::string implementations automatically perform this optimization for short strings.
Daniel Earwicker
A: 
 #include <iostream>
    using namespace std;
    class pString {
        public:
            char c[1];
            pString * pString::operator=(const char *);
    };


    pString * pString::operator=(const char * buff) {

        cout << "Address of this: " << (uint32_t) this << endl;
        cout << "Address of this->c: " << (uint32_t) &this->c << endl;

        realloc(this->c, strlen(buff)+1);
        memcpy(this->c, buff,  strlen(buff));
        *(this->c+strlen(buff)) = '\0';

        return this;
    };

    struct myObj {
            int myInt;
            char myVarChar;
    };

    int main() {

        pString * myString = (pString *) malloc(sizeof(pString));
        *myString = "testing vijay";
        cout << "'" << (char*)myString << "'";
    }


This should work, but it's not advisable.
Warrior
Seg fault on my machine
Ben Reeves
+4  A: 

Wow. What you're trying to do is a complete abuse of C++, would be totally compiler dependent if it worked, and would surely land you in TheDailyWTF some day.

The reason you're getting a segfault is probably because your operator= is reallocating the object to a different address, but you're not updating the myString pointer in main. I hesitate to even call it an object at this point, since no constructor was ever called.

I think what you're trying to do is make pString a smarter pointer to a string, but you're going about it all wrong. Let me take a crack at it.

#include <cstdlib>
#include <cstring>
#include <iostream>
using namespace std;
class pString {
    public:
        char * c;
        pString & operator=(const char *);
        const char * c_str();
};


pString & pString::operator=(const char * buff) {

    // print addresses as void* so the cast also compiles on 64-bit targets
    cout << "Address of this: " << (const void *) this << endl;

    c = (char *) malloc(strlen(buff)+1);
    memcpy(c, buff,  strlen(buff));
    *(c+strlen(buff)) = '\0';

    // c is uninitialized before the malloc above, so print it only now
    cout << "Address of this->c: " << (const void *) this->c << endl;

    return *this;
}

const char * pString::c_str() {
    return c;
}

int main() {

    pString myString;
    myString = "testing";
    cout << "'" << myString.c_str() << "'";

}

Now I wouldn't use malloc but new/delete instead, but I left this as close to your original as possible.

You might think you are wasting the space of a pointer in your class, but you aren't - you're trading it for the pointer you previously kept in main. I hope this example makes it clear - the variables are the same size, and the amount of additional memory allocated by malloc/realloc is the same as well.

pString myString;
char * charString;
assert(sizeof(myString) == sizeof(charString));

P.S. I should point out that this code still needs a lot of work, it's full of holes. You need a constructor to initialize the pointer, and a destructor to free it when it's done, just for starters. You can do your own implementation of operator+, too.
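
To give an idea of where that work leads, here is a sketch (assuming C++11) with the constructor, destructor, copying and an operator+ filled in. PStr, Adopt and join are names invented for this example. It uses new[]/delete[] instead of malloc and still has a single pointer as its only data member:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class PStr {
  public:
    PStr(const char* s = "") : c_(join(s, "")) {}
    PStr(const PStr& other) : c_(join(other.c_, "")) {}
    PStr& operator=(const PStr& other) {
        if (this != &other) {
            char* fresh = join(other.c_, "");  // allocate before releasing
            delete[] c_;
            c_ = fresh;
        }
        return *this;
    }
    ~PStr() { delete[] c_; }
    const char* c_str() const { return c_; }

    // Concatenation: a single allocation sized for both halves.
    friend PStr operator+(const PStr& lhs, const char* rhs) {
        return PStr(join(lhs.c_, rhs), Adopt());
    }

  private:
    struct Adopt {};  // tag: take ownership of an existing buffer
    PStr(char* owned, Adopt) : c_(owned) {}

    // Returns a new[] buffer holding a followed by b.
    static char* join(const char* a, const char* b) {
        std::size_t la = std::strlen(a), lb = std::strlen(b);
        char* buf = new char[la + lb + 1];
        std::memcpy(buf, a, la);
        std::memcpy(buf + la, b, lb + 1);  // lb + 1 copies the '\0' too
        return buf;
    }
    char* c_;
};

// Holds on mainstream compilers: the wrapper is as small as a raw pointer.
static_assert(sizeof(PStr) == sizeof(char*), "no per-object overhead");
```

With this, PStr s = "test"; s = s + "ing" + "!"; works, and each temporary frees its own buffer.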

Mark Ransom
It's pretty close to what I'm looking for, but not exactly. sizeof(pString) is still going to be 8 even for a blank string, and a pString and a C string are not internally the same, e.g. printf("%s", (char*)myPstring); won't work.
Ben Reeves
sizeof(char *) is going to be 8 for a blank string as well. You can create an operator char *() if you want automatic conversion of the class to a pointer, but I'd make it operator const char *() instead. Explicit c_str() is safer and matches std::string.
Mark Ransom
Oh, I assume you are using a 64-bit compiler when you say sizeof(pString) will be 8; if it's 32 bit, sizeof(pString) will be 4. There's only one data member, and it's a pointer.
Mark Ransom
A: 

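Don't mind the lack of const correctness, as this is a mock-up, but how about this:

#include <cstring>
#include <iostream>
#include <utility>

class light_string {
public:
    light_string(const char* str) {
        size_t length = strlen(str);
        char*  buffer = new char[sizeof(size_t) + length + 1];

        // layout: [length][characters]['\0']; m_str points at the characters
        memcpy(buffer, &length, sizeof(size_t));
        memcpy(buffer + sizeof(size_t), str, length);
        memset(buffer + sizeof(size_t) + length, 0, 1);

        m_str = buffer + sizeof(size_t);
    }

    // without this, the implicit shallow copy would cause a double delete
    light_string(const light_string& other) : light_string(other.m_str) {}

    ~light_string() {
        char* addr = m_str - sizeof(size_t);
        delete [] addr;
    }

    light_string& operator =(const char* str) {
        light_string s = str;
        // swap only the pointers; std::swap(*this, s) would go through the
        // class's own copy operations
        std::swap(m_str, s.m_str);

        return *this;
    }

    light_string& operator =(const light_string& other) {
        return *this = other.m_str;
    }

    operator const char*() {
        return m_str;
    }

    size_t length() {
        size_t n;
        memcpy(&n, m_str - sizeof(size_t), sizeof(size_t));
        return n;
    }

private:
    char* m_str;
};


int main(int argc, char* argv[]) 
{
    std::cout<<sizeof(light_string)<<std::endl;

    return 0;
}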
+1  A: 

You are moving the "this" pointer. That's not going to work. I think what you really want is just a wrapper around a buffer.

Sanjaya R
A: 
#include <iostream>
using namespace std;
class pString {
public:
    char c;
    pString * pString::operator=(const char *);
};

pString * pString::operator=(const char * buff) {

    cout << "Address of this: " << (uint32_t) this << endl;
    cout << "Address of this->c: " << (uint32_t) &this->c << endl;

    char *newPoint = (char *)realloc(this, strlen(buff)+1);
    memcpy(newPoint, buff,  strlen(buff));
    *((char*)newPoint+strlen(buff)) = '\0';

    cout << "Address of this After: " << (uint32_t) newPoint << endl;

    return (pString*)newPoint;
};

int main() {

    pString * myString = (pString *) malloc(sizeof(pString));
    *myString = "testing";

    cout << "Address of myString: " << (uint32_t) myString << endl;

    cout << "'" << (char *) myString << "'";    
}

This works when realloc doesn't move the block, i.e.

Address of this: 1049008
Address of this->c: 1049008
Address of this After: 1049008
Address of myString: 1049008
'testing'

But when the following happens, it fails:

Address of this: 1049008
Address of this->c: 1049008
Address of this After: 1049024
Address of myString: 1049008
''

The obvious solution is to have

this = (pString*) newPoint;

But the compiler complains about an invalid lvalue in assignment. Does anyone know the correct way to update this? (Just for completeness; I doubt I'll use the code since everyone seems to hate it.) Thanks

Ben Reeves
Don't try to update "this"; use myString = (*myString = "testing"); in main. But see my answer for a better way, which doesn't use any more memory than your solution.
Mark Ransom
+2  A: 

I do not think 'this' works the way you think it works.

Specifically, you cannot reallocate this to point to a larger buffer in a member function, because whatever called that member function still has a pointer to the old 'this'. Since it's not passed by reference there is no way that you can update it.
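
The same thing can be shown with free functions, where the pointer parameter plays the role of this. A sketch assuming C++11, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Receives a copy of the caller's pointer, just as a member function
// receives `this`. Returning the new address is the only way to report it.
char* grow_by_value(char* p, std::size_t n) {
    return static_cast<char*>(std::realloc(p, n));
}

// Receives the caller's pointer itself, so the update is visible outside.
bool grow_by_reference(char*& p, std::size_t n) {
    char* moved = static_cast<char*>(std::realloc(p, n));
    if (moved == nullptr) return false;  // p is still valid in that case
    p = moved;
    return true;
}
```

A member function has no equivalent of the second form, because this is never passed by reference.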

The obvious way around that is that your class should hold a pointer to the buffer and reallocate that. However, reimplementing a string class is a good way to give yourself lots of headaches down the line. A simple wrapper function would probably accomplish what you wanted (assuming "being able to use the + operator with NO extra memory usage compared to using strcat" is really what you wanted):

void concatenate(std::string& s, const char* c) {
    s.reserve(s.size() + strlen(c));
    s.append(c);
}

There's a good chance that append does that internally anyway, though.
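
A quick way to convince yourself, as a sketch: concatenate is repeated from above so the block stands alone, and append_fits_after_reserve is a hypothetical check. It relies on the usual behavior that an append within the reserved capacity does not regrow the buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// The wrapper from above, repeated so the sketch is self-contained.
void concatenate(std::string& s, const char* c) {
    s.reserve(s.size() + std::strlen(c));
    s.append(c);
}

// Returns true if, after reserving, the append did not change the capacity,
// i.e. the string did not have to grow a second time.
bool append_fits_after_reserve(std::string s, const char* c) {
    s.reserve(s.size() + std::strlen(c));
    std::size_t reserved = s.capacity();
    s.append(c);
    return s.capacity() == reserved;
}
```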

Peter
A: 

This code is a mess and, as RnR and others suggested, is not advisable. But it works for what I want it to do:

#include <iostream>
using namespace std;

struct pString {
     /* No Member Variables, the data is the object */ 
     /* This class cannot be extended & will destroy a vtable */
    public:
     pString * pString::operator=(const char *);
};

pString& operator+(pString& first, const char *sec) {


     int lenFirst;
     int lenSec = strlen(sec);
     void * newBuff = NULL;

     if (&first == NULL)
     {
      cout << "NULL" << endl;
      lenFirst = 0; 
      newBuff = malloc(sizeof(pString)+lenFirst+lenSec+1);
     } else {
      lenFirst = strlen((char*)&first);
      newBuff= (pString*)realloc(&first, lenFirst+lenSec+1);
     }

     if (newBuff == NULL)
     {
      cout << "Realloc Failed"<< endl;
      free(&first);
      exit(0);
     }  

     memcpy((char*)newBuff+lenFirst, sec, lenSec);
     *((char*)newBuff+lenFirst+lenSec) = '\0';


     cout << "newBuff: " << (char*)newBuff << endl;

     return *(pString*)newBuff;

};


pString * pString::operator=(const char * buff) {

    cout << "Address of this: " << (uint32_t) this << endl;

    char *newPoint = (char *)realloc(this, strlen(buff)+200);
    memcpy(newPoint, buff,  strlen(buff));
    *((char*)newPoint+strlen(buff)) = '\0';

    cout << "Address of this After: " << (uint32_t) newPoint << endl;

    return (pString*)newPoint;
};


int main() {

    /* This doesn't work that well, there is something going wrong here, but it's just a proof of concept */

    cout << "Sizeof: " << sizeof(pString) << endl;

    pString * myString = NULL;

    //myString = (pString*)malloc(1);
    myString = *myString = "testing";
    pString& ref = *myString;


    //cout << "Address of myString: " << myString << endl;

    ref = ref + "test";
    ref = ref + "sortofworks" + "another" + "anothers";


    printf("FinalString:'%s'", myString);

}
Ben Reeves
How are you going to manage the deallocation of the pString objects? You seem like the kind of person who enjoys tracking down memory leaks! :)
Daniel Earwicker
I'm sure I can cobble something together :)
Ben Reeves
bk1e
This went from bad to worse. TheDailyWTF material for sure. I see you're now adding an extra 200 bytes to your allocation - so much for wanting extra efficiency. Heaven help you if you try to append 201 characters to your string.
Mark Ransom
+200 is a typo; a vtable is not needed if there are no virtuals
Ben Reeves
A: 

What you want to do doesn't and cannot work in C++. What you are looking for is the C99 feature of flexible array members. This works nicely in C99 for two reasons: first, you don't have built-in constructors, and second, you don't have inheritance (at least not as a language feature). If a class inherits from another, the memory used by the subclass is packed behind the memory of the parent class, but a flexible array needs to be at the end of the structure/class.

class pString {
    char txt[];
};

class otherString : pString { // This cannot work because now
    size_t len;               // the flexible array is not at the
};                            // end

Take std::string: it was written by C++ experts, and I'm sure they didn't leave out a "good trick" without a reason. If you still find that it doesn't perform well enough in your program, use plain C strings instead. Of course, they don't provide the sweet API you want.

quinmars
A: 

You can't realloc C++ objects. As others pointed out, this is not really a pointer you can modify, and there's no guarantee that it points to memory that realloc can manage.

One solution to concatenation is to implement a class hierarchy that will defer the real concatenation until it is needed.

Something like this

class MyConcatString;
class MyString {
public:
  MyString(const MyConcatString& c) {
    reserve(c.l.length()+c.r.length());
    operator = (c.l);
    operator += (c.r);
  }
  MyConcatString operator + (const MyString& r) const {
    return MyConcatString(*this, r);
  }
};

class MyConcatString {
public:
  friend class MyString;
  MyConcatString(const MyString& l, const MyString& r):l(l), r(r) {}
  ...
  operator MyString () {
    MyString tmp;
    tmp.reserve(l.length()+r.length());
    tmp = l;
    tmp += r;
    return tmp;
  }
private:
  const MyString& l;  // const, so they can bind to the constructor's arguments
  const MyString& r;
};

So if you have

MyString a = "hello";
MyString b = " world";
MyString c = a + b;

the last line will turn into MyString c = MyConcatString(a, b);

For more detail check "The C++ Programming Language".

Another solution is to wrap a char* inside a struct, but you seem not to like this idea.

But whatever solution you choose, objects in C++ can't be relocated.

Ismael