2014-04-01

My experience troubleshooting a notebook computer that powers off suddenly


I like programming (and hence debugging), but I am not particularly interested in troubleshooting PC installation issues.

A friend of my wife's recently got a second-hand notebook PC for his son. However, the machine powers off randomly, sometimes as early as one minute after power-on. Yet if the PC is booted in Windows Safe Mode, the symptom disappears.

The PC configuration is as follows:

Model: LG R405-A
Operating System: Windows 7 Enterprise
Video card: ATI Radeon Xpress 1250

I highlight the video card because an AppCrash (in atiundag.dll) was also reported whenever PowerPoint slideshow mode was activated, and also in the Desktop Window Manager (DWM) when Windows started.

Therefore, my first suspect is the ATI driver version. However, Windows Update shows that it is already running the latest version.

Not quite believing this, I go to the ATI website (now AMD) to download a newer version of the driver. After rebooting, the AppCrash still appears. Worse still, the random power-off problem also persists.

However, I notice that right before the machine powers off, I can hear a sudden increase in the ventilation fan noise. In fact, some people on internet forums have already suggested that this symptom may be due to overheating. I then install some temperature monitoring software and find that the CPU temperature rises gradually to nearly 100°C, after which the machine shuts down.

Then I realize that the so-called randomness was simply an artifact of my testing pattern. If the machine has been off the whole night before it is powered up, it can run longer before the temperature reaches the threshold. But if I reboot the machine repeatedly without waiting for it to cool down, it starts out already hot and can reach the threshold within one minute of booting up.

At this stage, I still think the ATI driver is the root cause of the high temperature. So I go to the LG website to download the ATI driver from there. Because the Xpress 1250 is a really old chip, there is in fact no Windows 7 driver for it; the latest version on the LG website is only for Vista. I download it and try it anyway.

To my surprise, the AppCrash problem is solved immediately with the Vista driver.

However, before I can celebrate, the machine suddenly turns off again!

Then I start to wonder whether some hardware issue (e.g. blocked ventilation or degraded thermal paste) is causing the high temperature. I do not want to dismantle the notebook PC, so I resort to the Windows power plan settings and limit the maximum processor state to 50%. With this setting, the temperature stays at only around 85°C, and the machine no longer shuts down even when I play YouTube videos repeatedly. It seems that the problem is solved, if only superficially rather than fundamentally.

I then look for a fan control utility that can force the fan to stay on all the time (I have already adjusted the BIOS setting, but the fan speed is still not high). While using the SpeedFan software, I suddenly notice that both CPU cores are running at 100%.

When I check Task Manager, I find that a process named dgen.exe is eating up the CPU. It is a virus. How come I never thought of this? (In hindsight, I missed it because, although the CPU was at 100%, the machine was not sluggish and remained very responsive.) Afterwards, I update the anti-virus engine and eradicate all the remaining viruses found on the computer.

Postscript: Although the root cause has now been traced, I still cannot understand why the LG computer is designed such that, if the CPU keeps running at 100% load, the cooling system cannot keep the temperature within the working limit.