When a 1.7 GB crash dump arrives as a bug report, there is a very strong suggestion that we've run out of memory and there's a leak somewhere. The question is where, and how to fix it.

The first step was to look at the crash dump in DebugDiag, which does indeed show that the C runtime heap is massive:

Heap Name                         msvcr80!_crtheap 
Heap Description                  This heap is used by msvcr80 
Reserved memory                   1.45 GBytes 
Committed memory                  1.45 GBytes (100.00% of reserved)  
Uncommitted memory                60.00 KBytes (0.00% of reserved)  
Number of heap segments           64 segments 
Number of uncommitted ranges      0 range(s) 
Size of largest uncommitted range 0 Bytes 
Calculated heap fragmentation     100.00%

It also lists the segments of that heap:

0x00030640    64.00 KBytes    64.00 KBytes 0 Bytes 0 0 Bytes 0.00% 
0x01340000 1,024.00 KBytes 1,024.00 KBytes 0 Bytes 0 0 Bytes 0.00% 
0x05ef0000     2.00 MBytes     2.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x0b9b0000     4.00 MBytes     4.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x0bfa0000     8.00 MBytes     8.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x0d580000    16.00 MBytes    16.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x10170000     2.81 MBytes     2.81 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x11020000    32.00 MBytes    32.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x14020000    64.00 MBytes    64.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x18020000   128.00 MBytes   128.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x20020000   256.00 MBytes   256.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x30020000   256.00 MBytes   256.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x40020000    96.00 MBytes    96.00 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x46020000    30.38 MBytes    30.38 MBytes 0 Bytes 0 0 Bytes 0.00% 
0x47e80000     9.57 MBytes     9.57 MBytes 0 Bytes 0 0 Bytes 0.00%

Loading the crash dump into WinDbg, we can have a look at one of those segments:

0:000> dt ntdll!_HEAP_SEGMENT 20020000
   +0x000 Entry            : _HEAP_ENTRY
   +0x008 Signature        : 0xffeeffee
   +0x00c Flags            : 0
   +0x010 Heap             : 0x00030000 _HEAP
   +0x014 LargestUnCommittedRange : 0
   +0x018 BaseAddress      : 0x20020000 
   +0x01c NumberOfPages    : 0x10000
   +0x020 FirstEntry       : 0x20020040 _HEAP_ENTRY
   +0x024 LastValidEntry   : 0x30020000 _HEAP_ENTRY
   +0x028 NumberOfUnCommittedPages : 0
   +0x02c NumberOfUnCommittedRanges : 0
   +0x030 UnCommittedRanges : (null) 
   +0x034 AllocatorBackTraceIndex : 0
   +0x036 Reserved         : 0
   +0x038 LastEntryInSegment : 0x3001ffc8 _HEAP_ENTRY
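
As a quick sanity check on that output, the segment fields can be cross-checked against the DebugDiag listing. The sketch below assumes 4 KiB pages (the usual page size for a 32-bit x86 process; the dump itself does not state it) and uses only values copied from the dt output. The names are made up for illustration.

```cpp
#include <cstdint>

// Assumption: 4 KiB pages, the usual page size on 32-bit x86 Windows.
constexpr std::uint32_t kPageSize = 0x1000;

// Values copied from the dt ntdll!_HEAP_SEGMENT output above.
constexpr std::uint32_t kBaseAddress    = 0x20020000;
constexpr std::uint32_t kNumberOfPages  = 0x10000;
constexpr std::uint32_t kLastValidEntry = 0x30020000;

// Size of a segment in bytes, given its page count.
constexpr std::uint32_t segment_bytes(std::uint32_t pages) {
    return pages * kPageSize;
}

// 0x10000 pages of 4 KiB is 256 MiB, the size DebugDiag lists for 0x20020000.
static_assert(segment_bytes(kNumberOfPages) == 256u * 1024u * 1024u,
              "segment should be 256 MiB");

// The segment ends exactly where LastValidEntry points.
static_assert(kBaseAddress + segment_bytes(kNumberOfPages) == kLastValidEntry,
              "BaseAddress + size should equal LastValidEntry");
```

With no uncommitted pages at all, every one of those 256 MiB is in use, so any address we pick inside the segment should land in live heap data.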

In the spirit of Raymond Chen's article "The poor man's way of identifying memory leaks", let's pick a random memory location in the heap and have a look.

0:000> dc 2325d320 L 100
2325d320  6c62202c 202c6861 68616c62 6c62202c  , blah, blah, bl
2325d330  202c6861 68616c62 6c62202c 202c6861  ah, blah, blah, 
2325d340  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah
2325d350  6c62202c 202c6861 68616c62 6c62202c  , blah, blah, bl
2325d360  202c6861 68616c62 6c62202c 202c6861  ah, blah, blah, 
2325d370  68616c62 0000002e 001b001b 090a012e  blah............
2325d380  782b8d18 000000bd 000000bd 00000001  ..+x............
2325d390  656d6f53 72726520 6820726f 6f207361  Some error has o
2325d3a0  72756363 20646572 66206f74 74207869  ccurred to fix t
2325d3b0  20736968 20656573 68616c62 6c62202c  his see blah, bl
2325d3c0  202c6861 68616c62 6c62202c 202c6861  ah, blah, blah, 
2325d3d0  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah
2325d3e0  6e6f202c 6120796c 75727420 65672065  , blah, blah, bl
2325d3f0  77206b65 646c756f 65707320 7420646e  ah, blah, blah,
2325d400  20656d69 6f636564 676e6964 65687420  blah, blah, blah
2325d410  78656820 6d756420 6c622070 202c6861  , blah, blah, um
2325d420  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah
2325d430  6c62202c 202c6861 68616c62 6c62202c  , blah, blah, bl
2325d440  202c6861 68616c62 6c62202c 00000061  ah, blah, bla...
2325d450  001b001b 090a01cb 782b8d18 000000bd  ..........+x....
2325d460  000000bd 00000001 656d6f53 72726520  ........Some Err
2325d470  6820726f 6f207361 72756363 20646572  or has occurred 
2325d480  66206f74 74207869 20736968 20656573  to fix this see 
2325d490  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah
2325d4a0  6c62202c 202c6861 68616c62 6c62202c  , blah, blah, bl
2325d4b0  202c6861 68616c62 6e6f202c 6120796c  ah, blah, only a
2325d4c0  75727420 65672065 77206b65 646c756f   true geek would
2325d4d0  65707320 7420646e 20656d69 6f636564   spend time deco
2325d4e0  676e6964 65687420 78656820 6d756420  ding the hex dum
2325d4f0  6c622070 202c6861 68616c62 6c62202c  p blah, blah, bl
2325d500  202c6861 68616c62 6c62202c 202c6861  ah, blah, blah, 
2325d510  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah
2325d520  6c62202c 00000061 00000000 00000000  , bla...........
2325d530  2325cc10 2325d544 2325cca4 2325d468  ..%#D.%#..%#h.%#
2325d540  00000002 2325d554 2325d534 2325d5f0  ....T.%#4.%#..%#
2325d550  00000002 2325d564 2325d544 2325d6c8  ....d.%#D.%#..%#
2325d560  00000002 2325d574 2325d554 2325d7a0  ....t.%#T.%#..%#
2325d570  00000002 2325d584 2325d564 2325d878  ......%#d.%#x.%#
2325d580  00000002 2325d594 2325d574 2325d950  ......%#t.%#P.%#
2325d590  00000002 2325d5a4 2325d584 2325da28  ......%#..%#(.%#
2325d5a0  00000002 2325d5b4 2325d594 2325db00  ......%#..%#..%#
2325d5b0  00000002 2325d5c4 2325d5a4 2325dbd8  ......%#..%#..%#
2325d5c0  00000002 2325de54 2325d5b4 2325dcb0  ....T.%#..%#..%#
2325d5d0  00000002 00000000 0016001b 090a01fa  ................
2325d5e0  782b8d18 000000bd 000000bd 00000001  ..+x............
2325d5f0  656d6f53 72726520 6820726f 6f207361  Some error has o
2325d600  72756363 20646572 66206f74 74207869  ccurred to fix t
2325d610  20736968 20656573 68616c62 6c62202c  his see blah, bl
2325d620  202c6861 68616c62 6c62202c 202c6861  ah, blah, blah, 
2325d630  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah
2325d640  6c62202c 202c6861 68616c62 6c62202c  , blah, blah, bl
2325d650  202c6861 68616c62 6c62202c 202c6861  ah, blah, blah, 
2325d660  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah
2325d670  6c62202c 202c6861 68616c62 6c62202c  , blah, blah, bl
2325d680  202c6861 68616c62 6c62202c 202c6861  ah, blah, blah, 
2325d690  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah
2325d6a0  6c62202c 202c6861 68616c62 0000002e  , blah, blah....

A lot of error messages, and all of them the same. (Repeating this with different segments shows the same thing: repeated error message strings.)

Playing spot the heap entry header suggests that 001b001b 090a012e is one (sixth line above at address 2325d378), similarly 001b001b 090a01cb, 001b0016 090c01e4 etc. further down. All of these are followed by 782b8d18. So what's that?

0:000> ln 782b8d18
(782b8d18)   mfc80!afxStringManager   |  (782b8d30)   mfc80!_afxSessionMap
Exact matches:
    mfc80!afxStringManager = class CAfxStringMgr

So that's the start of the string (the CString data begins with a pointer to its string manager), which makes sense given the contents after it. It looks like we've got a lot of error log messages. If we dig around the source code, we find that one place that stores them looks like this:

class CErrors
{
    // snipped - lots of COM interface definitions.

    class Message
    {
        CString error_text_;
        long error_level_;
    public:
        Message(const CString & sMsg, const long errorLevel);
        Message(const Message& rhs);

        // Methods snipped.
    };

    CList<Message, Message &> m_ErrorList;  
    // other data members snipped.
};

So that's what we expect we're looking for, in which case there should be a linked list of Messages. CList stores its internal data as

template<class TYPE, class ARG_TYPE = const TYPE&>
class CList : public CObject
{
    struct CNode
    {
        CNode* pNext;
        CNode* pPrev;
        TYPE data;
    };
    // Remaining definition snipped.
};

It stores the CNodes in separate blocks (so it doesn't have to allocate each one individually; see afxtempl.h in the MFC source code). So we should see 4-DWORD chunks: two pointers that are close together, another (probably fairly close) pointer to a string, and a numeric error level. We could either dump some memory and try spotting it by eye (exercise for the reader: spot the linked list segment in the memory dump above) or search for a pointer to one of the strings.
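
To make the expected pattern concrete, here is a minimal sketch of how a node should be laid out in a 32-bit process. The struct names are hypothetical mirrors, not MFC types: pointers are modelled as 32-bit integers so the sizes come out the same on any compiler, CString is assumed to be a single pointer to its character data, and the block is assumed to hold 10 nodes behind one "next block" pointer (10 being the CList default block size as far as I know).

```cpp
#include <cstddef>
#include <cstdint>

using Ptr32 = std::uint32_t;  // a 32-bit pointer value as it appears in the dump

// Hypothetical mirror of CErrors::Message in a 32-bit build.
struct Message32 {
    Ptr32        error_text;   // CString: one pointer to the character data
    std::int32_t error_level;  // long is 32 bits on Windows
};

// Hypothetical mirror of CList<Message, Message&>::CNode.
struct CNode32 {
    Ptr32     pNext;
    Ptr32     pPrev;
    Message32 data;
};

// Each node is four DWORDs: next, prev, string pointer, error level.
static_assert(sizeof(CNode32) == 16, "a node should be 4 DWORDs");
static_assert(offsetof(CNode32, data) == 8, "data follows the two links");

// Assumption: a block of nodes is one pointer to another block, then the nodes.
constexpr std::size_t plex_block_bytes(std::size_t nodes_per_block) {
    return sizeof(Ptr32) + nodes_per_block * sizeof(CNode32);
}

// With 10 nodes per block that is 164 (0xa4) bytes.
static_assert(plex_block_bytes(10) == 0xa4, "10-node block should be 0xa4 bytes");
```

So in a dc dump, a block of nodes should show up as a run of rows that each contain two nearby pointers, a string pointer and a small integer.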

Note that although the heap entry started at 2325d378 and the MFC string wrapper started at 2325d380, the pointer that the user of the CString actually sees is 2325d390, the address of the first character of the text:

2325d370  6c62202c 00000061 001b001b 090a012e   blah...........
2325d380  782b8d18 000000bd 000000bd 00000001  ..+x............
2325d390  656d6f53 72726520 6820726f 6f207361  Some error has o
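
The three addresses differ by fixed header sizes, which the sketch below spells out. It assumes the 8-byte heap entry header of this 32-bit heap and a 16-byte string header of four DWORDs (string manager pointer, data length, allocated length, reference count), matching the row at 2325d380. The struct is a hypothetical mirror written for illustration, not the real ATL/MFC declaration.

```cpp
#include <cstdint>

// Assumption: each heap entry header is 8 bytes in this 32-bit heap.
constexpr std::uint32_t kHeapEntryHeaderBytes = 8;

// Hypothetical 32-bit mirror of the header that precedes CString text.
struct CStringHeader32 {
    std::uint32_t pStringMgr;    // 782b8d18, i.e. mfc80!afxStringManager
    std::int32_t  nDataLength;   // 0xbd characters
    std::int32_t  nAllocLength;  // 0xbd characters
    std::int32_t  nRefs;         // 1
};
static_assert(sizeof(CStringHeader32) == 16, "header should be 4 DWORDs");

// Address of the string header, given the address of the heap entry.
constexpr std::uint32_t string_header_from_entry(std::uint32_t heap_entry) {
    return heap_entry + kHeapEntryHeaderBytes;
}

// Address the CString user sees, given the address of the string header.
constexpr std::uint32_t text_from_string_header(std::uint32_t header) {
    return header + static_cast<std::uint32_t>(sizeof(CStringHeader32));
}

static_assert(string_header_from_entry(0x2325d378) == 0x2325d380,
              "string header follows the heap entry header");
static_assert(text_from_string_header(0x2325d380) == 0x2325d390,
              "text follows the string header");
```

The value stored in a Message's error_text_ member is therefore the text address, 2325d390, not the address of either header.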

So this is what we search for.

0:000> s -d 20020040 30020000 2325d390
2325ccac  2325d390 00000002 00000000 0016001b  ..%#............

That looks plausible as a pointer to the string followed by a 2, which is the warning level for log messages. Let's inspect the area around the address we've found (2325ccac).

0:000> dc 2325ccac - 2c L 20
2325cc80  00000002 2325cc94 2325cc74 2325d1e0  ......%#t.%#..%#
2325cc90  00000002 2325cca4 2325cc84 2325d2b8  ......%#..%#..%#
2325cca0  00000002 2325d534 2325cc94 2325d390  ....4.%#..%#..%#
2325ccb0  00000002 00000000 0016001b 090a01d6  ................
2325ccc0  782b8d18 000000bd 000000bd 00000001  ..+x............
2325ccd0  656d6f53 72726520 6820726f 6f207361  Some error has o
2325cce0  72756363 20646572 66206f74 74207869  ccurred to fix t
2325ccf0  20736968 20656573 68616c62 6c62202c  his see blah, bl

The start of that looks like what we expect for the linked list structure, with the node we've found starting at 2325cca4 (and being the last one in that CPlex block).

So having got a node in the list we need to find the main list structure. First step is to find the start of the list via the dlb command (Dump List Backwards). (We set count to the massively large value of 0x130000 because we know there's likely to be a lot in this list.)

0:000> dlb 2325cca4 130000
... Many, many lines snipped ...
0e35ef2c  0e35ef3c 0e35ef1c 11b0d820 00000002
0e35ef1c  0e35ef2c 0e35ef0c 11b0d748 00000002
0e35ef0c  0e35ef1c 0e35eefc 11b0d5e0 00000002
0e35eefc  0e35ef0c 0e35eeec 0e461408 00000002
0e35eeec  0e35eefc 0e35eedc 0e1f16c0 00000002
0e35eedc  0e35eeec 00000000 0e1f1ce0 00000002
0:000>
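
For anyone unfamiliar with dlb, the walk it performs here amounts to following the pPrev link (the second DWORD of each node) until it is null or the count runs out. The sketch below illustrates that idea over a tiny stand-in for the dump built from the last three lines of output above. It is not how the debugger is implemented, all names are hypothetical, and it needs C++14 or later for the constexpr loops.

```cpp
#include <cstddef>
#include <cstdint>

using Ptr32 = std::uint32_t;  // a 32-bit address as printed in the dump

// One line of dlb output: the node's address followed by its four DWORDs.
struct DumpedNode {
    Ptr32         address;
    Ptr32         pNext;
    Ptr32         pPrev;
    Ptr32         text;
    std::uint32_t level;
};

// The last three lines of the dlb output, used as stand-in memory.
constexpr DumpedNode kNodes[] = {
    {0x0e35eefc, 0x0e35ef0c, 0x0e35eeec, 0x0e461408, 2},
    {0x0e35eeec, 0x0e35eefc, 0x0e35eedc, 0x0e1f16c0, 2},
    {0x0e35eedc, 0x0e35eeec, 0x00000000, 0x0e1f1ce0, 2},
};

// Find the node stored at `address`, or nullptr if we have no data for it.
template <std::size_t N>
constexpr const DumpedNode* lookup(const DumpedNode (&nodes)[N], Ptr32 address) {
    for (std::size_t i = 0; i < N; ++i) {
        if (nodes[i].address == address) return &nodes[i];
    }
    return nullptr;
}

// Follow pPrev from `start`, visiting at most `max_nodes` nodes.
// Returns the address of the node whose pPrev is null, or 0 if the
// limit is reached or the walk leaves the stand-in memory.
template <std::size_t N>
constexpr Ptr32 find_head(const DumpedNode (&nodes)[N], Ptr32 start,
                          std::size_t max_nodes) {
    Ptr32 current = start;
    for (std::size_t visited = 0; visited < max_nodes; ++visited) {
        const DumpedNode* node = lookup(nodes, current);
        if (node == nullptr) return 0;
        if (node->pPrev == 0) return current;
        current = node->pPrev;
    }
    return 0;
}

static_assert(find_head(kNodes, 0x0e35eefc, 0x130000) == 0x0e35eedc,
              "the walk should end at the node with a null pPrev");
```

The count matters: if it is smaller than the number of nodes between the starting node and the head, the walk stops early and the last line printed is not the head.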

Check whether the first element is plausible (0x00000002 looks like a message level):

0:000> dc 0e1f1ce0 
e1f1ce0  656d6f53 72726520 6820726f 6f207361  Some error has o
e1f1cf0  72756363 20646572 66206f74 74207869  ccurred to fix t
e1f1d00  20736968 20656573 68616c62 6c62202c  his see blah, bl
e1f1d10  202c6861 68616c62 6c62202c 202c6861  ah, blah, blah, 
e1f1d20  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah
e1f1d30  6c62202c 202c6861 68616c62 6c62202c  , blah, blah, bl
e1f1d40  202c6861 68616c62 6c62202c 202c6861  ah, blah, blah, 
e1f1d50  68616c62 6c62202c 202c6861 68616c62  blah, blah, blah

So what's around 0x0e35eedc?

0:000> dc 0x0e35eedc - 1c L30
0e35eec0  58f88bb4 00000001 0d9fad04 c0000001  ...X............
0e35eed0  00060016 050c019b 00000000 0e35eeec  ..............5.
0e35eee0  00000000 0e1f1ce0 00000002 0e35eefc  ..............5.
0e35eef0  0e35eedc 0e1f16c0 00000002 0e35ef0c  ..5...........5.
0e35ef00  0e35eeec 0e461408 00000002 0e35ef1c  ..5...F.......5.
0e35ef10  0e35eefc 11b0d5e0 00000002 0e35ef2c  ..5.........,.5.
0e35ef20  0e35ef0c 11b0d748 00000002 0e35ef3c  ..5.H.......<.5.

Well, the 8 bytes at 0e35eed0 look like a heap entry header and are aligned correctly for one:

0:000> dt ntdll!_HEAP_ENTRY 0e35eed0
   +0x000 Size             : 0x16
   +0x002 PreviousSize     : 6
   +0x000 SubSegmentCode   : 0x00060016 
   +0x004 SmallTagIndex    : 0x9b ''
   +0x005 Flags            : 0x1 ''
   +0x006 UnusedBytes      : 0xc ''
   +0x007 SegmentIndex     : 0x5 ''

We can also confirm this with

0:000> !heap -x 0e35eedc
Entry     User      Heap      Segment       Size  PrevSize  Unused    Flags
-----------------------------------------------------------------------------
0e35eed0  0e35eed8  00030000  0d580000        b0        30         c  busy
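
The two outputs agree once you know that dt reports Size and PreviousSize in allocation units while !heap -x reports bytes. The sketch below decodes the two raw DWORDs by hand, assuming the field layout shown by dt above and an 8-byte allocation unit for this 32-bit heap. The struct and function names are made up for illustration.

```cpp
#include <cstdint>

// Fields of the 8-byte heap entry header, in the order dt shows them.
struct HeapEntryFields {
    std::uint16_t size_units;       // +0x000
    std::uint16_t prev_size_units;  // +0x002
    std::uint8_t  small_tag_index;  // +0x004
    std::uint8_t  flags;            // +0x005
    std::uint8_t  unused_bytes;     // +0x006
    std::uint8_t  segment_index;    // +0x007
};

// Split the two little-endian DWORDs, as printed by dc, into fields.
constexpr HeapEntryFields decode(std::uint32_t first, std::uint32_t second) {
    return HeapEntryFields{
        static_cast<std::uint16_t>(first & 0xffff),
        static_cast<std::uint16_t>(first >> 16),
        static_cast<std::uint8_t>(second & 0xff),
        static_cast<std::uint8_t>((second >> 8) & 0xff),
        static_cast<std::uint8_t>((second >> 16) & 0xff),
        static_cast<std::uint8_t>(second >> 24),
    };
}

// Assumption: sizes are counted in 8-byte units in this 32-bit heap.
constexpr std::uint32_t kUnitBytes = 8;

// Whole block size in bytes, including the 8-byte header.
constexpr std::uint32_t block_bytes(const HeapEntryFields& e) {
    return e.size_units * kUnitBytes;
}

// Bytes the caller actually asked for.
constexpr std::uint32_t user_bytes(const HeapEntryFields& e) {
    return block_bytes(e) - e.unused_bytes;
}

// The header at 0e35eed0: 00060016 050c019b.
constexpr HeapEntryFields kListBlock = decode(0x00060016, 0x050c019b);
static_assert(kListBlock.size_units == 0x16, "Size");
static_assert(kListBlock.prev_size_units == 6, "PreviousSize");
static_assert(kListBlock.segment_index == 5, "SegmentIndex");
static_assert(block_bytes(kListBlock) == 0xb0, "matches !heap -x Size");
static_assert(user_bytes(kListBlock) == 0xa4, "requested size of the block");
```

The same decoding applied to the string headers seen earlier (001b001b 090a012e) gives a block of 0x1b units, or 216 bytes, per error message.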

(So what's the null in front of the 0e35eeec at address 0e35eed8, I (don't) hear you ask? Well, the blocks (CPlex structs) that CList stores its nodes in are themselves chained in a singly linked list, with each new block pointing back to the one allocated before it, so that null indicates this is the first block of nodes.)

If this is the head of the list then there should be something in the CList structure pointing to it.

0:000> !heap -srch 0e35eedc
... [snipped warning messages] ...
skipping searching 0d3b9cb8 allocation of size 0001af80 greater than 00010000
    _HEAP @ 30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        0bae6790 0010 0000  [01]   0bae6798    00078 - (busy)
          UtilityLibrary!CErrors::`vftable'
skipping searching 0e23cdf8 allocation of size 00013884 greater than 00010000
skipping searching 0e250688 allocation of size 00035b64 greater than 00010000
    _HEAP @ 30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        0e35eed0 0016 0010  [01]   0e35eed8    000a4 - (busy)
skipping searching 0e54c600 allocation of size 00013884 greater than 00010000
... [snipped warning messages] ...

The second of these looks like the second element of the list (which would be right, since it's a doubly linked list and that node points back to the head). The first is what we are looking for: it's an instance of the error log object.

0:000> dt UtilityLibrary!CErrors 0bae6798
   +0x000 __VFN_table : 0x5f76f104 
   =5f750000 classCObject     : CRuntimeClass
   =5f750000 classCCmdTarget  : CRuntimeClass
   =5f750000 _commandEntries  : [0] AFX_OLECMDMAP_ENTRY
   ... snipped ...
   =5f750000 eventsinkMap     : AFX_EVENTSINKMAP
   +0x004 m_dwRef          : 22
   +0x008 m_pOuterUnknown  : (null) 
   +0x00c m_xInnerUnknown  : 0
   +0x010 m_xDispatch      : CCmdTarget::XDispatch
   ... snipped ...
   +0x038 m_ErrorList      : CList<CErrors::Message,CErrors::Message &>
   ... snipped ...

0:000> dt UtilityLibrary!CList<CErrors::Message,CErrors::Message &> 0bae6798 + 38
   +0x000 __VFN_table      : 0x5f76efcc 
   =5f750000 classCObject  : CRuntimeClass
   +0x004 m_pNodeHead      : 0x0e35eedc CList<CErrors::Message,CErrors::Message &>::CNode
   +0x008 m_pNodeTail      : 0x6076f8d4 CList<CErrors::Message,CErrors::Message &>::CNode
   +0x00c m_nCount         : 6301570
   +0x010 m_pNodeFree      : (null) 
   +0x014 m_pBlocks        : 0x6076f840 CPlex
   +0x018 m_nBlockSize     : 10

And there we have m_pNodeHead pointing exactly to where we expect.

So the cause of the crash is indeed a CErrors object that is holding far too many error messages.
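
A back-of-the-envelope check shows that this list alone can plausibly account for the whole heap. The sketch below assumes every message has the same size as the ones seen in the dump (a 0x1b-unit heap block per string, and one 0x16-unit heap block per 10 list nodes, with 8-byte units), which is an approximation rather than something the dump proves.

```cpp
#include <cstdint>

// m_nCount from the CList dump above.
constexpr std::uint64_t kMessages = 6301570;

// Assumption: every string occupies a heap block of 0x1b units of 8 bytes,
// like the ones seen in the memory dump.
constexpr std::uint64_t kStringBlockBytes = 0x1b * 8;

// Each block of list nodes is a heap block of 0x16 units and holds
// m_nBlockSize (10) nodes.
constexpr std::uint64_t kNodeBlockBytes = 0x16 * 8;
constexpr std::uint64_t kNodesPerBlock  = 10;

// Estimated heap bytes used by the list and its strings.
constexpr std::uint64_t estimated_bytes(std::uint64_t messages) {
    return messages * kStringBlockBytes +
           ((messages + kNodesPerBlock - 1) / kNodesPerBlock) * kNodeBlockBytes;
}

// Roughly 1.47 * 10^9 bytes under these assumptions.
static_assert(estimated_bytes(kMessages) == 1472046752ULL,
              "estimate for 6,301,570 messages");
```

That is about 1.47 billion bytes, which is in the same region as the 1.45 GBytes DebugDiag reports for the C runtime heap.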