Help interpreting and acting on dump files

dlmorgan999
HouseBot Special Member
Posts: 409
Joined: Tue Jul 13, 2004 9:13 am
Location: Tigard, OR

Post by dlmorgan999 »

I'm getting the itch to try and fix the occasional crashes with my HAI plugin. I often see output like this in a dump file:


====== Begin Dump - Thursday, November 10, 2005 22:06:39 ======
Server Version = 2.22
Exception code: C0000005 ACCESS_VIOLATION
Fault address:  7C92AE22 01:00029E22 C:\WINDOWS\system32\ntdll.dll

Registers:
EAX:20726568
EBX:00000000
ECX:646E6F43
EDX:00D70608
ESI:193AFAE0
EDI:00D70000
CS:EIP:001B:7C92AE22
SS:ESP:0023:0846F98C  EBP:0846FA48
DS:0023  ES:0023  FS:003B  GS:0000
Flags:00010213

Call stack:
Address   Frame
7C92AE22  0846FA48  RtlImpersonateSelf+3A5
77C2C2DE  0846FA90  free+C3
07504453  00000001  omni_strerror+233


====== End Dump ======

I have a couple of challenges in trying to fix this. First, it doesn't always happen, and I can't make it happen on demand. Second, I'm not sure what (if anything) the dump output is telling me.

I'm using a library written by someone else to communicate with the Omni controllers, and it does declare omni_strerror. The name shows up in only two places in the code. One is in the header for the Omni protocol code:

HAIIMPORT const char *omni_strerror(int err);
and the other is in the main Omni protocol code file:

HAIEXPORT const char *omni_strerror(int err)
{
    switch (err)
    {
        case EOMNIARGUMENT :
            return "Bad omni function argument";
        case EOMNIRESPONSE :
            return "Unexpected response from Omni";
        case EOMNICRC :
            return "Bad CRC from Omni";
        case EOMNIEOD :
            return "End of Omni data";
    }

    return NULL;
}


If there are any developers out there with more experience than me in this area (I'm still fairly new to C++) I'd appreciate any advice on where to look or how to go about troubleshooting this. Thanks!



-- Dave
ScottBot
Site Admin
Posts: 2787
Joined: Thu Feb 13, 2003 6:46 pm
Location: Georgia (USA)

Post by ScottBot »

The dump information is mostly helpful when the fault is within the HouseBot code. Unfortunately, since the fault is outside HouseBot it's a bit harder to see.



What probably happens is that some memory gets corrupted, which then causes an error of some kind. The Omni code then tries to report the error and crashes inside the error reporting when it tries to free some of the corrupted memory.
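
One cheap way to test the corruption theory against the dump itself is to check whether any register values decode to printable text, since x86 stores bytes in little-endian order. The helper below is only a sketch; the name register_as_ascii is made up for illustration and is not part of HouseBot or the Omni library.

```cpp
#include <cstdint>
#include <string>

// Decode a 32-bit register value as four bytes in little-endian (x86) order.
// Returns the four characters if every byte is printable ASCII, otherwise "".
// Hypothetical helper for reading dump files by hand.
std::string register_as_ascii(std::uint32_t value)
{
    std::string text;
    for (int i = 0; i < 4; ++i)
    {
        // Lowest byte first, because that is how it sits in memory on x86.
        unsigned char byte = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
        if (byte < 0x20 || byte > 0x7E)
            return "";
        text.push_back(static_cast<char>(byte));
    }
    return text;
}
```

With the values from the dump, register_as_ascii(0x20726568) (EAX) gives "her " and register_as_ascii(0x646E6F43) (ECX) gives "Cond", while EDX (0x00D70608) is not printable. Text in registers that free() is using as pointers is at least consistent with a string having been written past the end of a heap buffer, although it does not prove it.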



The only thing I can even suggest would be to return "Unknown Error" instead of NULL from omni_strerror() when the error is unknown. It probably doesn't matter, but if the caller is not expecting a NULL to be returned, it could be very dangerous. However, I'm guessing it is only called in a very controlled way where this probably isn't an issue.



These types of memory errors can be very difficult to find, particularly if they are not reproducible. There are tools (like BoundsChecker) that can help find the point of the memory corruption, but they're a little expensive to buy if you're just trying to track down one bug. There are other memory validation tools with a trial period that you could also try.
Scott
dlmorgan999
HouseBot Special Member
Posts: 409
Joined: Tue Jul 13, 2004 9:13 am
Location: Tigard, OR

Post by dlmorgan999 »

Hi Scott,



Thanks for the response. I figured that this was likely what I would hear but it never hurts to ask. At this point I think I will do a "crash analysis" to see how many of my crashes are attributable to my Omni code and how many are from other sources. It might help me see a pattern and I can also log a support ticket (although I don't expect I'll get much of a response :( ) for the ones that appear to be internal to HouseBot.
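
A rough sketch of how that tally could be automated is shown here. It assumes every dump has a "Fault address:" line laid out like the one in my first post (address, section:offset, then the module path); the function name count_fault_modules is made up for illustration.

```cpp
#include <map>
#include <sstream>
#include <string>

// Count crashes per faulting module from the text of one or more dumps.
// Assumes lines of the form:
//   Fault address:  <address> <section:offset> <module path>
std::map<std::string, int> count_fault_modules(const std::string &dump_text)
{
    const std::string label = "Fault address:";
    std::map<std::string, int> counts;
    std::istringstream lines(dump_text);
    std::string line;

    while (std::getline(lines, line))
    {
        // Only lines that start with the label are of interest.
        if (line.compare(0, label.size(), label) != 0)
            continue;

        std::istringstream fields(line.substr(label.size()));
        std::string address, section_offset, module;
        fields >> address >> section_offset >> std::ws;
        std::getline(fields, module); // rest of line, so paths may contain spaces

        // Drop a trailing carriage return or spaces left by Windows line endings.
        while (!module.empty() && (module.back() == '\r' || module.back() == ' '))
            module.pop_back();

        if (!module.empty())
            ++counts[module];
    }
    return counts;
}
```

One caveat: the dump above faults in ntdll.dll even though omni_strerror is on the call stack, so counting by fault module alone would not attribute that crash to the Omni code. The call stack entries would need to be checked as well.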



-- Dave