Rare condition in api code causing heap corruption



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 31/10/2012 at 11:19, xxxxxxxx wrote:

    User Information:
    Cinema 4D Version:   R10-11.5 
    Platform:   Windows  ;   
    Language(s) :     C++  ;

    ---------
    Hi,

    we have located an issue in c4d_memory.cpp that can cause a heap corruption by calling free() on memory allocated with C4DOS.Ge->Alloc() (where C4DOS.Ge->Free() should be called instead).

    It's very rare.

    We have a fix (workaround) for it, too.

    Is it possible to discuss this with a developer via mail or should I just post the details here?

    I basically want to know if it's safe to ship plugins with that change to c4d_memory.cpp and if there would be a possibility for an official fix.

    Cheers!

    Yves



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 31/10/2012 at 13:32, xxxxxxxx wrote:

    Originally posted by xxxxxxxx

    <ADDRESS>
    User Information:
    Cinema 4D Version:   R10-11.5 
    Platform:   Windows  ;   
    Language(s) :    
    C++  ;

    ---------
    </ADDRESS> Hi,

    we have located an issue in c4d_memory.cpp that can cause a heap corruption by calling free() on memory allocated with C4DOS.Ge->Alloc() (where C4DOS.Ge->Free() should be called instead).

    It's very rare.

    We have a fix (workaround) for it, too.

    Is it possible to discuss this with a developer via mail or should I just post the details here?

    I basically want to know if it's safe to ship plugins with that change to c4d_memory.cpp and if there would be a possibility for an official fix.

    Cheers!

    Yves

    Feel free to discuss this here. Using free() for memory allocated with GeAlloc or gNew is a bug (the OS doesn't know about the internal data structures that we use for memory management and we don't know about their internal data structures); to quote the SDK documentation ("Advice for Developers") :

    "Only use CINEMA 4D's memory model (gNew, gDelete(), GeAlloc(), GeFree()) - this will give maximum speed and stability."

    Best regards,

    Wilfried Behne



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 31/10/2012 at 17:51, xxxxxxxx wrote:

    Ok, here's what happens:

    Cinema 4D will abruptly crash, without the usual dialog indicating that C4D has crashed and that a crash report has been written, while our plugin is performing a task.

    Ok, here's what happens:

    Cinema 4D will abruptly crash, without the usual dialog indicating that C4D has crashed and that a crash report has been written, while our plugin is performing a task.

    What the problem seems to be:

    The problem seems to be a rare heap corruption caused by the way the C4D API keeps track of if memory was allocated using C4D's memory management (e.g. C4DOS.Ge->Alloc and consorts) or by the default (e.g. malloc). It occurs in C4D R10 (possibly previous versions, too, but we don't support these) and R11, as well as possibly R11.5 and newer versions but I haven't seen it happen there.

    What's happening exactly:

    Looking at c4d_memory.cpp/.h, you can see that C4D overrides new and delete also has a way of keeping track of when C4D's memory management is available and what was allocated with it. This whole part revolves around (this is for anyone else interested. If you are familiar with how it works, you can skip this part) :

    #define     STDLIB_MEM_MAGIC          -1   
    static Bool     stdlib_mem_used = FALSE;                                                            // changed to TRUE if static constructors have allocated memory before c4d's memory management was available
    

    t_C4DOS, which is NULL as long as C4D's mem management isn't available and

    void *AlienMem(size_t s, Bool clear)   
    {   
         if (s<1) s=1;   
      
         void *p = malloc(s+sizeof(VLONG));   
         if (!p) return NULL;   
            
         if (clear) memset(p,0,s+sizeof(VLONG));   
         *(VLONG * ) p = STDLIB_MEM_MAGIC;   
         p = (void * ) ((UCHAR * )p + sizeof(VLONG));   
         stdlib_mem_used = TRUE;                                                                                     // static constructor has allocated memory   
         return p;   
    }
    

    As you can see stdlib_mem_used is set to TRUE "if static constructors have allocated memory before c4d's memory management was available" (which is, when you are using the stl, pretty much unavoidable).

    AlienMem() is called by the overridden new and delete operators for as long as t_C4DOS is NULL. If a block of memory has been allocated with with AlienMem(), it extends the requested ammount of memory by sizeof(VLONG). The first few extra bytes are set to STDLIB_MEM_MAGIC so these blocks of memory can be identified and it then returns the pointer to after the leading extra VLONG.

    Why this causes a heap corruption (very, very rarely) :

    After quite some time spent on debugging this we have found that sometimes, for unknown reasons (as far as we could tell, randomly), free() is called for when deleting something (usually inside an std::vector::reserve call).
    Calling free() on memory allocated by C4D's memory management is pretty bad and causes a heap corruption.

    Before we come to the why, here's the interesting bit of a log file where we logged most memory allocation and deallocations:

    [] Allocated with C4DOS.Ge->Alloc() : 37bfaca0
    [] Freeing with C4DOS.Ge->Free() : 37a0fef0
    [] Allocated with C4DOS.Ge->Alloc() : 37e9ad30
    [] Freeing with C4DOS.Ge->Free() : 37b7ee80
    Allocated with C4DOS.Ge->Alloc() : 38143cb0
    Freeing with C4DOS.Ge->Free() : 381242c0
    [] Allocated with C4DOS.Ge->Alloc() : 37d43680
    [] Freeing with C4DOS.Ge->Free() : 37bfaca0
    [] Allocated with C4DOS.Ge->Alloc() : 37ef9820
    [] Freeing with C4DOS.Ge->Free() : 37e9ad30
    Allocated with C4DOS.Ge->Alloc() : 3820bce0
    Freeing with free() : 38143cb0
    (crash when free is called)

    Note that even though the block with the adress 38143cb0 was allocated by C4DOS.Ge->Alloc() it is being freed with free().

    For real now, why?:

    We think that this is caused by the way the identifying of AlienMem blocks is working:

    void operator delete(void *p)   
    {   
         if (p)   
         {   
              void     *temp = p;   
      
              if (stdlib_mem_used)                                                                                     // memory allocated by static constructors?   
              {   
                   if (((VLONG* )p)[-1]==STDLIB_MEM_MAGIC)                         // is this a block for the stdlib?   
                   {   
                        free((void* )((UCHAR* )p - sizeof(VLONG)));   
                        return;   
                   }   
              }   
              C4DOS.Ge->Free(temp);   
         }   
    }
    

    if (((VLONG* )p)[-1]==STDLIB_MEM_MAGIC)
    detects AlienMem blocks by checking if the preceding VLONG is equal to STDLIB_MEM_MAGIC. If this is the case it correctly calls free() instead of C4DOS.Ge->Free().

    But what if the preceding VLONG happens to be equal to STDLIB_MEM_MAGIC by pure chance? free() is called when it should not be called and causes the heap corruption. In the session the above log exerpt is from, it took about 2.5 million deallocations to happen.

    The bug is relatively reproducible when you do a lot of small memory allocations of ramdom size - or during some regular mesh processing in a very complex scene (which is usually where it happens in our case).

    Our workaround:

    We are currently working around this in a similar way as the C4D API keeps track of AlienMem blocks. We allocate one extra VLONG for every allocation (the actual allocation is passed on to C4DOS.Ge->Alloc(), etc.) and set this extra preceding VLONG to 0, to make sure it can never be STDLIB_MEM_MAGIC by chance.

    Here is the code:

    void* ourC4DOSAlloc(size_t s, int line,const CHAR *file)   
    {   
         // Alloc an extra VLONG with the mem   
         size_t use_size = s + sizeof(VLONG);   
         VLONG* mem = (VLONG* )C4DOS.Ge->Alloc(use_size, line, file);   
      
         // Write a zero to the start of the allocated mem. The important thing is that the value we write is != STDLIB_MEM_MAGIC   
         mem[0] = 0;   
      
         // Return the address of element one as the actual mem   
         return mem + 1;   
    }   
      
      
    void* ourC4DOSAllocNC(size_t s, int line,const CHAR *file)   
    {   
         // Alloc an extra VLONG with the mem   
         size_t use_size = s + sizeof(VLONG);   
         VLONG* mem = (VLONG* )C4DOS.Ge->AllocNC(use_size, line, file);   
      
         // Write a zero to the start of the allocated mem. The important thing is that the value we write is != STDLIB_MEM_MAGIC   
         mem[0] = 0;   
      
         // Return the address of element one as the actual mem   
         return mem + 1;   
    }   
      
      
    void ourC4DOSFree(void *p)   
    {   
         VLONG* actual_mem_start = (VLONG* )p - 1;   
      
         C4DOS.Ge->Free(actual_mem_start);   
    }
    

    All calls to C4DOS.Ge->Alloc(), C4DOS.Ge->AllocNC() and C4DOS.Ge->Free in c4d_memory.cpp are replaced with calls to our wrappers.

    This fixes the problem for us.

    Cheers!

    Yves



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 01/11/2012 at 01:10, xxxxxxxx wrote:

    Originally posted by xxxxxxxx

    [...]
    But what if the preceding VLONG happens to be equal to STDLIB_MEM_MAGIC by pure chance? free() is called when it should not be called and causes the heap corruption. In the session the above log exerpt is from, it took about 2.5 million deallocations to happen.

    The bug is relatively reproducible when you do a lot of small memory allocations of ramdom size - or during some regular mesh processing in a very complex scene (which is usually where it happens in our case).

    There are a few problems with your workaround:

    - the magic is either in the state of STDLIB_MEM_MAGIC or, if it is allocated by our code, set to a special magic value & size info and some other stuff. If this turns into -1 for a memory block allocated by our memory allocator, someone has overwritten it -> you've a memory trasher.

    - setting the values in "our" memory management structure to 0 as you do, not only avoids calling free(), but also avoids the memory being freed by our memory management code (the data is being checked before it is freed - at least from r11.5 on I can tell you for sure), if this memory is passed to the kernel or a different module/plugin and later on should be freed without using your changed free code -> there is a high potential for creating memory leaks.

    I suggest: try to find out where this memory trasher happens (hard to say if this is in our old code, your code or in the STL code).

    Please note that the internal memory management code has been changed partly with r11 and completely with r11.5.

    Best regards,

    Wilfried



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 01/11/2012 at 04:00, xxxxxxxx wrote:

    Originally posted by xxxxxxxx

    I suggest: try to find out where this memory trasher happens (hard to say if this is in our old code, your code or in the STL code).

    Please note that the internal memory management code has been changed partly with r11 and completely with r11.5.

    I've tried but this is the worst and most evasive bug I have ever had to deal with. I've added range checks to big areas of the code to make sure we aren't just writing to the wrong location. Also it *only* happens in R10 with the one scene we are able to reproduce it with.

    Also, if there were any memory trasher, it would certainly also trash the extra 0 we are writing. I have run it with the change, and not just a few times.

    Some more info:

    The memory directly before the block of memory causing the crash is:

    ((VLONG* )p)[-2]: 0                  (this is just 0)   
    ((VLONG* )p)[-1]: ffffffffffffffff   (this is the -1)
    

    Usually it would look somewhat like this:

    ((VLONG* )p)[-2]: 2e8afe60   
    ((VLONG* )p)[-1]: 2e8c9bb0   
    Freeing with C4DOS.Ge->Free() : 2e8c7af0
    

    Interestingly, this doesn't seem to be rare either:

    ((VLONG* )p)[-1]: 0   
    ((VLONG* )p)[-2]: 0   
    Freeing with C4DOS.Ge->Free() : 2e90b870
    

    And when running in debug mode ((VLONG* )p)[-1] is often unitialised:

    ((VLONG* )p)[-2]: 0   
    ((VLONG* )p)[-1]: baadf00dbaadf00d   
    Freeing with C4DOS.Ge->Free() : 2b97e100
    

    Whatever would be trashing the memory certainly wouldn't write 0xbaadf00d...



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 01/11/2012 at 04:39, xxxxxxxx wrote:

    Originally posted by xxxxxxxx

    • setting the values in "our" memory management structure to 0 as you do, not only avoids calling free(), but also avoids the memory being freed by our memory management code (the data is being checked before it is freed - at least from r11.5 on I can tell you for sure), if this memory is passed to the kernel or a different module/plugin and later on should be freed without using your changed free code -> there is a high potential for creating memory leaks.

    Oh, I'd also like to point out that our workaround doesn't write values in "your" memory managements structure. It is allocating an extra sizeof(VLONG) bytes for that and, as far as C4D's memory management should be concerned, everything is as it should be.
    It also doesn't avoid calls to free(). They still happen when they need to. It is just making sure they only happen when they need to.



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 01/11/2012 at 04:57, xxxxxxxx wrote:

    Originally posted by xxxxxxxx

    Originally posted by xxxxxxxx

    I suggest: try to find out where this memory trasher happens (hard to say if this is in our old code, your code or in the STL code).

    Please note that the internal memory management code has been changed partly with r11 and completely with r11.5.

    I've tried but this is the worst and most evasive bug I have ever had to deal with. I've added range checks to big areas of the code to make sure we aren't just writing to the wrong location. Also it *only* happens in R10 with the one scene we are able to reproduce it with.

    Also, if there were any memory trasher, it would certainly also trash the extra 0 we are writing. I have run it with the change, and not just a few times.

    Some more info:

    The memory directly before the block of memory causing the crash is:

    ((VLONG* )p)[-2]: 0                  (this is just 0)   
    ((VLONG* )p)[-1]: ffffffffffffffff   (this is the -1)
    

    Usually it would look somewhat like this:

    ((VLONG* )p)[-2]: 2e8afe60   
    ((VLONG* )p)[-1]: 2e8c9bb0   
    Freeing with C4DOS.Ge->Free() : 2e8c7af0
    

    Interestingly, this doesn't seem to be rare either:

    ((VLONG* )p)[-1]: 0   
    ((VLONG* )p)[-2]: 0   
    Freeing with C4DOS.Ge->Free() : 2e90b870
    

    And when running in debug mode ((VLONG* )p)[-1] is often unitialised:

    ((VLONG* )p)[-2]: 0   
    ((VLONG* )p)[-1]: baadf00dbaadf00d   
    Freeing with C4DOS.Ge->Free() : 2b97e100
    

    Whatever would be trashing the memory certainly wouldn't write 0xbaadf00d...

    0xbaadf00d is something that 's written by Microsofts memory code (see http://en.wikipedia.org/wiki/Magic_number_(programming)) to indicate uninitialized heap in release mode. I can't tell you how the memory code in R10 and before worked (the code was written years before R10 and I haven't written it; can't remember its details anymore), but I think this points to a situation where memory which was freed is still used (hard to say what causes it; we won't investigate this for a version that is out of support for half a decade). Are you sure you catched all free calls of this block (even the ones from STL, ...)?

    As mentioned, the memory management code you use since r11/r11.5 is very different and I'd suggest to avoid your workaround in newer versions due to the potential pitfalls.

    Best regards,

    Wilfried



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 01/11/2012 at 05:08, xxxxxxxx wrote:

    Originally posted by xxxxxxxx

    0xbaadf00d is something that 's written by Microsofts memory code (see http://en.wikipedia.org/wiki/Magic_number_(programming)) to indicate uninitialized heap in release mode.

    Yes, I am aware of that.

    Originally posted by xxxxxxxx

    I can't tell you how the memory code in R10 and before worked (the code was written years before R10 and I haven't written it; can't remember its details anymore), but I think this points to a situation where memory which was freed is still used (hard to say what causes it; we won't investigate this for a version that is out of support for half a decade). Are you sure you catched all free calls of this block (even the ones from STL, ...)?

    I certainly hope so :)
    The ones from the STL all seem to go through the overridden new and delete operators.

    Originally posted by xxxxxxxx

    As mentioned, the memory management code you use since r11/r11.5 is very different and I'd suggest to avoid your workaround in newer versions due to the potential pitfalls.

    Best regards,

    Wilfried

    We will be very careful deploying this workaround since it is some rather major bit surgery. For now we would only test it out in a R10 build to see if it solves this problem for all users who have run into it.

    Thanks for your reply.

    Kind regards,
    Yves



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 01/11/2012 at 05:25, xxxxxxxx wrote:

    Forgot to mention. The baadf00d came up when running the plugin is compiled in debug mode, but not running in the debugger.

    After the memory memory block preceded by 0xbaadf00dbaadf00d is freed it is correctly set to 0xfeeefeeefeeefeee (including the data before the start of the memory block, which probably contains the C4D mem management data) :

    from Visual Studio (after C4DOS.Ge->Free(p)) :

    p                    - 0x00000000249f0b20 - void *   
    ((VLONG* )p)[3]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[2]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[1]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[0]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[-1]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[-2]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[-3]     - 0x00000000254ddc60 - __int64   
    ((VLONG* )p)[-4]     - 0x00000000002e0158 - __int64
    

    Edit: and the memory blocks containing 0xbaadf00dbaadf00d *always* come from std::vector::reserve, going through the overridden new and delete operators.



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 01/11/2012 at 05:54, xxxxxxxx wrote:

    Originally posted by xxxxxxxx

    Forgot to mention. The baadf00d came up when running the plugin is compiled in debug mode, but not running in the debugger.

    After the memory memory block preceded by 0xbaadf00dbaadf00d is freed it is correctly set to 0xfeeefeeefeeefeee (including the data before the start of the memory block, which probably contains the C4D mem management data) :

    from Visual Studio (after C4DOS.Ge->Free(p)) :

    p                    - 0x00000000249f0b20 - void *   
    ((VLONG* )p)[3]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[2]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[1]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[0]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[-1]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[-2]     - 0xfeeefeeefeeefeee - __int64   
    ((VLONG* )p)[-3]     - 0x00000000254ddc60 - __int64   
    ((VLONG* )p)[-4]     - 0x00000000002e0158 - __int64
    

    Edit: and the memory blocks containing 0xbaadf00dbaadf00d *always* come from std::vector::reserve, going through the overridden new and delete operators.

    Could be, that you're dealing with a Microsoft STL bug (you're using VS 2005 I guess?). Google for it...

    E.g:

    http://blogs.stonesteps.ca/showpost.aspx?pid=16
    http://connect.microsoft.com/VisualStudio/feedback/details/694663
    http://stackoverflow.com/questions/1615518/vector-resize-function-corrupting-memory-when-size-is-too-large

    Best regards,

    Wilfried



  • THE POST BELOW IS MORE THAN 5 YEARS OLD. RELATED SUPPORT INFORMATION MIGHT BE OUTDATED OR DEPRECATED

    On 01/11/2012 at 05:58, xxxxxxxx wrote:

    Originally posted by xxxxxxxx

    Could be, that you're dealing with a Microsoft STL bug (you're using VS 2005 I guess?). Google for it...

    E.g:

    http://blogs.stonesteps.ca/showpost.aspx?pid=16
    http://connect.microsoft.com/VisualStudio/feedback/details/694663
    http://stackoverflow.com/questions/1615518/vector-resize-function-corrupting-memory-when-size-is-too-large

    Best regards,

    Wilfried

    We are using vs2010, currently, but I have tried it with vs2008 and the same thing happens.

    Thank you vey for the hint, tho. I will have a look at this.

    Regards,
    Yves


Log in to reply