dv411

HP Z800 shuts down on startup - and randomly

Z800
Microsoft Windows 10 (64-bit)

My otherwise reliable Z800 started having issues that seem to be accelerating lately.

Is anyone else having the same symptoms, where it reliably shuts down after a cold start when getting into Windows, and then sometimes shuts down during normal operation? If so, what fixed it, if anything: a replacement PSU? Graphics card? Motherboard?

Thanks!

Symptoms:

  • reliably shuts down on startup from cold, either while getting to the login prompt or right after logging in
    • the shutdown is abrupt, i.e. unexpected and not OS-initiated; nothing in the event logs
    • I don't remember it ever shutting down like that before it gets to the Windows boot process, i.e. never during BIOS setup
    • I haven't run hardware diagnostics from a flash drive or a DVD yet; just haven't gotten to it
    • the issue started about a year ago, preceded by a period of having to press the power button twice before it would power up
    • if I let it sit at the BIOS boot prompt for 10-15 minutes (i.e. before it gets to the OS) and then start up, it goes through without a shutdown
  • occasionally, and more frequently lately, it shuts down during normal operation
    • 2-5 hours into uptime, while doing nothing unusual (reading and browsing), it shuts down abruptly

The system:

  • Dual 2.4 GHz Xeon CPUs, 18 GB RAM, 850 W PSU, Quadro CX, boot SSD, four 2 TB drives in RAID 5
  • Windows 10 Pro
Z440Roger

This is really a long shot, but the first thing that came to mind was an issue with Wake on LAN. Reading your description, though, it points to something else.

However, I would try disabling Wake on LAN in the BIOS just to rule it out. It is possible to shut down a computer over the LAN if that feature is enabled.

titustimuli
Hi dv411. Exact same symptoms and the same workaround: if I let it sit on the BIOS screen for 10 minutes, it will boot normally. While watching the screen during the wait, I noticed the following:
* CPU temps are steady, plus or minus one degree
* RAM temp increases in a logarithmic fashion; the unit will usually boot when this temp reaches 43°C
* the link between the wait time and the RAM temperature CANNOT be definitively established, since I can't isolate either phenomenon

The big question is: what hardware failure is this? (I assume it's a hardware failure, given the very honorable age of this station.)
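The warm-up behaviour described above (RAM temperature rising roughly logarithmically, with boots succeeding once it reaches about 43°C) can be turned into a rough wait-time estimate from a few readings. Below is a minimal Python sketch, assuming the "logarithmic" description holds, i.e. temp ≈ a + b*ln(t) with t in seconds since power-on. The function names and sample readings are hypothetical, and a real heat-up curve is more likely an exponential approach to a steady-state temperature, so treat the result as indicative only.

```python
import math
from typing import Optional, Sequence, Tuple


def fit_log_warmup(times_s: Sequence[float], temps_c: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of temp = a + b * ln(t).

    Assumes the 'logarithmic' warm-up described in the thread; times are
    seconds since power-on and must be > 0.
    """
    if len(times_s) != len(temps_c) or len(times_s) < 2:
        raise ValueError("need at least two paired (time, temperature) samples")
    xs = [math.log(t) for t in times_s]  # linearise: temp is linear in ln(t)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(temps_c) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, temps_c))
    b = sxy / sxx  # slope: degrees C per unit of ln(seconds)
    a = mean_y - b * mean_x  # intercept: modelled temperature at t = 1 s
    return a, b


def seconds_to_threshold(a: float, b: float, threshold_c: float = 43.0) -> Optional[float]:
    """Invert the fitted curve: t = exp((threshold - a) / b).

    Returns None if the fit shows no warming (b <= 0), since the
    threshold would then never be reached under this model.
    """
    if b <= 0:
        return None
    return math.exp((threshold_c - a) / b)


if __name__ == "__main__":
    # Hypothetical RAM readings (seconds since power-on, degrees C), not real data.
    times = [60, 120, 240, 480]
    temps = [30 + 2 * math.log(t) for t in times]
    a, b = fit_log_warmup(times, temps)
    wait = seconds_to_threshold(a, b)
    print(f"fit: temp = {a:.2f} + {b:.2f} * ln(t); estimated wait: {wait:.0f} s")
```

With these synthetic readings the fit recovers a = 30 and b = 2, and the estimated wait to reach 43°C is about 665 s (roughly 11 minutes). Fitting against ln(t) keeps this an ordinary linear regression, so no external libraries are needed.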
aldywaani

You can try any of these first:

  • Try running in Safe Mode and wait
  • Check for a BIOS update
  • Run the hardware diagnostics
  • Take the system down to a minimum config:
    • Use 2 sticks of RAM (for dual processors), or reseat them
    • Disconnect the RAID disks (are you using the SATA or SAS ports?)
    • Test with another GPU if possible, or swap PCIe slots

To me it looks like a power failure. So try booting with only 1 hard drive (the system drive), and test with another GPU if one is available.

Other hardware failures can easily be identified with LED and beep codes, but you have no beeps.

titustimuli
Update:
Today is particularly cold and the unit just won't work. After 1 hour on the setup screen I couldn't get it to start. RAM temp was 42°C due to the very cold room temperature, while it happily exceeds 44°C normally. I also noticed that temperatures decrease during boot attempts, and that in hot weather it starts straight away.
Anybody with an idea?
PS: CPU0 and CPU1 temps are equal at all times, and their value is ALWAYS EXACTLY 7°C above ambient. Typical values are 26°C (CPUs) and 19°C (ambient).
titustimuli
Thanks aldywaani.

I suspect that too. The thing is, if I keep it warm, it can work for days. But as soon as I shut it down, it may or may not start again!!!

Addressing your suggestions:
No Safe Mode available, only Windows 10 Automatic Repair. The BIOS is up to the latest version. I'm exploring the diagnostics area right now... transferring the software to USB.
dv411

One would think it's the memory, but I don't know.

P.S. Kudos on catching the CPU and memory temps while you wait; that didn't occur to me. I'll try the same, especially since some of my memory banks have been failing sporadically and getting marked as offline, although that only started a couple of months ago, while the shutdowns started over a year ago.

I still haven't replaced the PSU or the aging GPU (the Quadro CX); I've been procrastinating on this since the ultimate goal is to replace the entire thing with something a little smaller.

 



titustimuli

Thank you dv411 for the kudo 😄 Here's a screenshot of my monitoring (first 2 or 3 minutes; sorry for the quality).

Let me share some other details about memory, because that might actually be the problem (don't worry, my GPU is far older than yours 🙂 and it's a big one! A Quadro 5000, if you see what I mean...)

I think it really has to do with heat, specifically MEMORY temperature, and here's what makes me think so:

  • The only temperature that matters (and moves significantly) at startup is the RAM temperature
  • When the computer is idling, it may crash more often than when it is loaded with work (which heats up the circuits)
  • The setup screen doesn't run those sophisticated power management routines, so the memory circuits are all powered regardless of usage (hopefully!!! otherwise we'd have to bring out the hairdryer every morning)

These old ladies have got arthritis!

System Temperatures screenshot (startup + ~3 minutes)

titustimuli

OKAY GUYS, I THINK IT'S SOLVED FOR ME... WAITING TO HEAR FROM YOU!

After cleaning, cleansing, and brightening every single part of the computer (it looks brand-new from the inside 🙂), unmounting every card, every disk, every cable...

Moving the unit into every possible position, running multiple configurations of everything, from a single RAM stick and a single disk with nothing else, to the complete installation (I even tried running the computer upside down!)...

The issue persisted!

So I ran every single test and assessment I'd heard of: I passed all HP Vision tests in thorough mode (it takes quite some time!), passed the RAM tests, and even passed the BIOS DPS tests (you need to disable RAID/AHCI, switch to IDE, and reboot before the option appears in the BIOS menu).

Then I ultimately started testing hundreds of BIOS configurations (I nearly bricked the motherboard with the PCI slot compute setting 😄). Nothing changed, except when I changed...

MEMORY NODE INTERLEAVE: DISABLED and, once you set this (if it isn't displayed already, a new option will appear), NUMA SPLIT MODE: DISABLED.

From then on, with all other settings reverted to their normal positions, I was able to reboot Windows 10 five times in a row!

I wrote these lines with the computer already 'hot', since it's been running for a couple of hours now, but I'll let it cool down and report back in a few minutes on whether it can boot like a 'normal' computer: you press the power button and wait for the Windows welcome screen B-)

titustimuli
Well, after some trials, it looks like the cleaning reduced the heat-up time (but not the required RAM temperature: it still refuses to boot under 43°C). The hardware part of it can't be cured by settings or software.

More details below. dv411, do you have the same issue as this one? At first, when the unit is cold, I press the power button and hear the fans spinning, the HDD LED blinks briefly, then after 1 or maybe 1.5 seconds it abruptly shuts down, powers itself back on, and loops again. This loop continues until it can turn on the monitors, and that takes no less than 5 minutes of on/off cycles. Meanwhile, I CANNOT DO ANYTHING but wait for it to get warm.

After that, once some heat has built up inside, the station manages to get through POST and loads some of the cards' BIOSes, then starts another on/off cycle with 2 major differences: I can press F10, and when it shuts down, I need to press the power button to turn it ON again (note that my BIOS setting for after power loss is OFF, to protect the unit). These second cycles take 10+ minutes to complete. THIS improved after the cleanup (it wasn't too dirty, but it used to take up to 2 HOURS previously!!!)

Then a new cycle starts (next level unlocked 🙂). During this cycle, the unit can boot the recovery environment and runs the Windows Startup Repair diagnostics (which pass ALL tests and suggest rebooting), but not Windows itself (it shuts down again and requires a manual power ON), alternating between a normal boot attempt and a recovery boot. Note that, unlike on the BIOS screens, the RAM temperature drops fairly quickly when left on the recovery screen. So I often need to interrupt a boot attempt, go to the BIOS screen, leave it there for 3-5 minutes until it heats up past 43°C, and try again. Finally, after 5-10 attempts, Windows can FINALLY load. The whole process takes a minimum of 30 minutes.

This makes the station nearly useless: even if I leave it ON, it cools down and crashes after some time (I've never been there to monitor it, and the crash doesn't leave an event log entry). If you want, I can share the HP Vision diagnostics screens, but all tests passed without a single error!!! Note: during a boot attempt with logging enabled, it wrote to the system events that the crash was due to inaccessible OS files, so I investigated the disks and, of course, everything was OK, except maybe a 6% wear ratio on the SSD (system) disk... What do you think of that?