There are a few good resources out there for troubleshooting crash/hang situations in Windows. Some of the resources I regularly look to include:

One recurring message that comes through is that debuggers do not lie. When you have a crash/hang situation, you can spend countless hours on pointless guessing if you cannot isolate the problem to a particular repeatable action and then determine which components are involved in that action. Take, for example, the following post on an IIS list, where the poster felt that something in their web application was causing their server to crash:

I have got a Dual Xeon 3.2 GHZ with 2 GB Ram with Windows Server 2003 Standard Version SP1 (all patched), 2 x NTFS 80GB in RAID1 and it keeps crashing intermittently with the same error each time.

I am using MySQL on the server as the Database and running ASP websites using some ASP.NET. I cannot seem to find any particular action, time or scenario that brings this about and is intermittent. I used to get them a lot more often before I defragged the hard drive on the Data Center’s advice but even today when it crashed and I did an Defrag analysis it said that 2% of files were fragmented and so I cannot understand this error. I do run some large ASP and MySQL scripts that take up to 20 minutes to finish executing that run each day but this happens fine usually and I can run these scripts anytime with no problems but it was running one of these scripts today when it crashed. I have never had ASP crash my server before and so this really bugs me to what could cause this.

ASP pages, and indeed most web-application code, run inside w3wp.exe or dllhost.exe, both of which are user-mode processes and generally unlikely to cause Windows to blue screen. So what could be causing the problem? By default, Windows Server 2003 will write a "small memory dump" file to %systemroot%\minidump when Windows bugchecks the machine.
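As a rough illustration of where to start, the sketch below picks the most recently written dump file out of the minidump folder so it can be opened in a debugger. It is only a sketch under a few assumptions: the helper names (`newest_minidump`, `default_minidump_dir`) are made up for this example, and the `C:\Windows` fallback is a guess used when the `SystemRoot` environment variable is not set.

```python
import os
from pathlib import Path
from typing import Optional


def newest_minidump(dump_dir: Path) -> Optional[Path]:
    """Return the most recently modified .dmp file in dump_dir, or None."""
    if not dump_dir.is_dir():
        return None
    dumps = [
        p for p in dump_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".dmp"
    ]
    # max() with default=None copes with a folder that holds no dumps yet.
    return max(dumps, key=lambda p: p.stat().st_mtime, default=None)


def default_minidump_dir() -> Path:
    """Build the %systemroot%\\minidump path (fallback root is an assumption)."""
    root = os.environ.get("SystemRoot", r"C:\Windows")
    return Path(root) / "Minidump"


if __name__ == "__main__":
    latest = newest_minidump(default_minidump_dir())
    print(latest if latest else "No minidump files found")
```

The newest file is usually the one to look at first, but if the crashes are frequent it is worth comparing several dumps to see whether they all point at the same component.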

Loading one such minidump file into WinDBG and running the .bugcheck and kb commands shows us the likely culprit straight away:

kd> .bugcheck
Bugcheck code 00000050
Arguments b0f08c38 00000000 b7bd93fb 00000000

kd> kb
ChildEBP RetAddr  Args to Child              
b0e73f24 8085e6cd 00000050 b0f08c38 00000000 nt!KeBugCheckEx+0x1b
b0e73f9c 8088bc08 00000000 b0f08c38 00000000 nt!MmAccessFault+0xb25
b0e73f9c b7bd93fb 00000000 b0f08c38 00000000 nt!KiTrap0E+0xdc
WARNING: Stack unwind information not available. Following frames may be wrong.
b0e74024 b0e74780 890d62b0 85361f04 00000000 pwipf2+0x53fb          <--- OUR LIKELY CULPRIT
b0e7409c b7bd895e b0e74578 00000104 b0e7467c 0xb0e74780
b0e747bc 8081dce5 89661618 85361e70 85361e70 pwipf2+0x495e
b0e747d0 808f8255 b0e74978 8969af00 00000000 nt!IofCallDriver+0x45
b0e748b8 80936af5 8969af18 00000000 850de008 nt!IopParseDevice+0xa35
b0e74938 80932de6 00000000 b0e74978 00000240 nt!ObpLookupObjectName+0x5a9
b0e7498c 808ea211 00000000 00000000 e74a4400 nt!ObOpenObjectByName+0xea
b0e74a08 808eb4ab 87f49bac 02000000 b0e74bb4 nt!IopCreateFile+0x447
b0e74a64 b7b7f58c 87f49bac 02000000 b0e74bb4 nt!IoCreateFile+0xa3
b0e74c28 b7b8804c 85f11b30 890cdc08 b0e74c4c afd!AfdBind+0x2f7
b0e74c38 8081dce5 890d4230 85f11b30 8645c920 afd!AfdDispatchDeviceControl+0x53
b0e74c4c 808f4797 85f11c0c 89a12610 85f11b30 nt!IofCallDriver+0x45
b0e74c60 808f5515 890d4230 85f11b30 89a12610 nt!IopSynchronousServiceTail+0x10b
b0e74d00 808ee0e4 00001748 000011c0 00000000 nt!IopXxxControlFile+0x5db
b0e74d34 80888c6c 00001748 000011c0 00000000 nt!NtDeviceIoControlFile+0x2a
b0e74d34 7c82ed54 00001748 000011c0 00000000 nt!KiFastCallEntry+0xfc
05e0fb34 00000000 00000000 00000000 00000000 0x7c82ed54

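pwipf2.sys is part of Threat Sentry by PrivacyWare. The third bugcheck argument (b7bd93fb), which for bugcheck 0x50 (PAGE_FAULT_IN_NONPAGED_AREA) is the address of the instruction that referenced the bad memory, resolves to pwipf2+0x53fb, the same frame flagged in the stack above. When the original poster contacted the vendor, they were able to get an updated version of that driver, and their server has been stable since.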
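The way the stack above was read can be sketched in a few lines of Python: walk the kb frames from the top and stop at the first one that does not belong to a module you already trust. This is only an illustration, not a substitute for the debugger. The function names are invented for this example, the regular expression assumes the 32-bit kb layout shown above, and treating `nt` and `afd` as trusted is an assumption made for this particular trace.

```python
import re
from typing import List, Optional, Tuple

# A kb frame line: ChildEBP, RetAddr, three argument words, then the symbol.
FRAME_RE = re.compile(
    r"^([0-9a-f]{8}) ([0-9a-f]{8}) (?:[0-9a-f]{8} ){3}(\S+)"
)

# The first frames of the trace above, copied as-is.
SAMPLE = """\
ChildEBP RetAddr  Args to Child
b0e73f24 8085e6cd 00000050 b0f08c38 00000000 nt!KeBugCheckEx+0x1b
b0e73f9c 8088bc08 00000000 b0f08c38 00000000 nt!MmAccessFault+0xb25
b0e73f9c b7bd93fb 00000000 b0f08c38 00000000 nt!KiTrap0E+0xdc
WARNING: Stack unwind information not available. Following frames may be wrong.
b0e74024 b0e74780 890d62b0 85361f04 00000000 pwipf2+0x53fb          <--- OUR LIKELY CULPRIT
b0e7409c b7bd895e b0e74578 00000104 b0e7467c 0xb0e74780
b0e747bc 8081dce5 89661618 85361e70 85361e70 pwipf2+0x495e
b0e747d0 808f8255 b0e74978 8969af00 00000000 nt!IofCallDriver+0x45
"""


def parse_frames(kb_output: str) -> List[Tuple[int, str]]:
    """Return (return_address, symbol) for every frame line in kb output."""
    frames = []
    for line in kb_output.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # header and WARNING lines do not match and are skipped
            frames.append((int(match.group(2), 16), match.group(3)))
    return frames


def module_of(symbol: str) -> Optional[str]:
    """Extract the module name from 'mod!Func+0x..' or 'mod+0x..'."""
    if "!" in symbol:
        return symbol.split("!", 1)[0]
    if "+" in symbol:
        return symbol.split("+", 1)[0]
    return None  # a raw address such as 0xb0e74780


def first_untrusted_frame(
    frames: List[Tuple[int, str]],
    trusted: Tuple[str, ...] = ("nt", "afd"),  # assumption for this trace
) -> Optional[str]:
    """Return the first frame whose module is not in the trusted list."""
    for _, symbol in frames:
        module = module_of(symbol)
        if module is not None and module not in trusted:
            return symbol
    return None


def frame_after_return_address(
    frames: List[Tuple[int, str]], address: int
) -> Optional[str]:
    """Return the frame that a given return address points into."""
    for index, (ret_addr, _) in enumerate(frames[:-1]):
        if ret_addr == address:
            return frames[index + 1][1]
    return None


if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    print(first_untrusted_frame(frames))                   # pwipf2+0x53fb
    print(frame_after_return_address(frames, 0xB7BD93FB))  # pwipf2+0x53fb
```

Both checks land on the same frame: the first non-Microsoft module on the stack, and the frame that the third bugcheck argument points into. A match like this does not prove the driver is at fault, since another component could have corrupted memory earlier, but it is a strong hint about where to look first.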

In this case, the original poster was heading down the wrong track by defragmenting their hard disk and looking for possible issues with ASP or MySQL. On the other hand, breaking out a debugger to look at the .dmp file isolated the most likely cause within a few minutes. Of course, not all debugging is this simple (most of it isn't). In my limited experience, debugging takes a long time because of the need to gather more information to eliminate possible causes and to ensure that the information you are collecting is good (rather than already corrupted by bad code). However, rarely does the debugger lie and lead you down the garden path. A rudimentary knowledge of Windows internals and the ability to examine dump files are skills that every system administrator should have.