11-30-2018 01:15 PM
Do you know the Core Temp software? I use it to check CPU temperatures.
Look at the red circles!
You can see that the 6154's wattage is much higher than the others.
1. Xeon 2530 9.7W
2. Xeon 1650 1.0W
3. Xeon 6154 166.6W
All the CPUs are in a normal state; they are not simulating anything!
The Xeon 6154's RAM wattage is abnormally high!
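To put a number on the gap, here is a minimal Python sketch using the three idle readings listed above. The dictionary layout and the `ratio_to_median` helper are names chosen for illustration only; the wattages are the Core Temp values from the list.

```python
from statistics import median

# Idle package power as read from Core Temp (values from the list above).
idle_watts = {
    "Xeon 2530": 9.7,
    "Xeon 1650": 1.0,
    "Xeon 6154": 166.6,
}

def ratio_to_median(readings):
    """Return each reading divided by the median of all readings."""
    mid = median(readings.values())
    return {name: watts / mid for name, watts in readings.items()}

ratios = ratio_to_median(idle_watts)
for name, ratio in ratios.items():
    print(f"{name}: {idle_watts[name]:.1f} W, {ratio:.1f}x the median")
```

With only three systems this is a rough comparison, not a statistical test, but it shows the 6154 idling at roughly 17 times the median reading.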
11-30-2018 02:24 PM
The idea of modifying the cooling is only a final solution to make the system usable. A system of this kind must be 100% reliable.
I'm completely mystified as regards very high temperatures that are not related to CPU load. Running the Prime95 stress test on the overclocked E5-1680 v2 (4.3GHz on all 8 cores) for over four hours, the highest temperature, for only a few seconds here and there, was 73C, and mostly it ran at 64-70C. That CPU is rated to 85C. Long runs at 95C on a CPU rated for 82C are a worry for the longevity of the CPUs.
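As a quick illustration of the margin involved, here is a small Python sketch. The `thermal_headroom` function is a hypothetical helper, and the numbers are only the ones quoted in this post, not measured values for the Z8.

```python
def thermal_headroom(observed_c, rated_c):
    """Degrees C of margin left below the rated maximum (negative = over)."""
    return rated_c - observed_c

# E5-1680 v2 under Prime95: 73C peak against the 85C rating quoted above.
print(thermal_headroom(73, 85))   # 12
# The worrying case: long runs at 95C on a part rated for 82C.
print(thermal_headroom(95, 82))   # -13
```

A negative result means the CPU is running above its rated temperature, which is the situation that concerns me for long-term reliability.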
I appreciate the additional detail about the failure mode, and yes, it occurred to me earlier that the memory configuration of 24 slots for 4X hex-channel is unconventional and without a very long history. Also, the 32GB modules have not been on the market for even a year; they were released in February or March 2018.
Typically, when workstations have the maximum RAM, a particular type is required, but certainly HP will have specified SDRAM, LRDIMM, or whatever. It may be worth trying the system using one complete memory channel for each CPU, installed according to the proper sequence of placement.
I am very surprised that HP has not carefully analyzed this situation and made a recommendation.
I would be very interested to know how your system performs in benchmarks. There are I think 800,000 systems tested on Passmark Performance Test, but not a single example using the Xeon 6154. If you are inclined, Passmark has a free 30-day trial.
It occurred to me that you might look into the startup listings in msconfig. Perhaps there are programs loading at startup and running in the background that are using considerable CPU resources.
Difficult situation without error messages.
11-30-2018 03:31 PM
in this case i doubt that it's a cpu issue causing a blue screen/shutdown, as the xeons have built-in throttling if the temps get too high, and while the high cpu loads during the running task may not help overall, the system was designed and tested by HP, as was previously noted, to perform 24/7
i would look more towards ram related issues since the system is using a lot of ram, or towards the program itself perhaps calling for more ram than the system can provide, or the program itself may have a bug
try running a simulation that does not require/use as much ram, and if the system is stable it would point towards the specific app task or to a borderline ram issue
11-30-2018 08:32 PM - edited 11-30-2018 09:10 PM
Thank you for the detailed temperature information.
Something that is interesting is that the overall package temperatures of all of the systems are very high: 87C, 101C, and 104C. What is the ambient temperature of the room?
For comparison, here is z620_2, which is running an E5-1680 v2 overclocked on all 8 cores at 4.3GHz using z420 liquid cooling:
Note that the extra voltage added to overclock can raise the voltage to 1.451V and maximum wattage is 121.2W, idling at about 20W.
And below is z420_3, Xeon E5-1620 v2 4C@3.7/3.9GHz, air cooling:
At standard clock speed, the maximum is 35W @ 1.151V, but appears to idle at about 17W.
Both systems are set to maintain Turbo mode for longer periods, but will settle to idle at the base speeds of 3.4GHz for the 1680 v2 and 3.7GHz for the 1620 v2.
The Xeon 6154 package temperature is very high at a 104C maximum, a temperature one would see in extreme, overclocked gaming processors with a lot of added voltage. However, it is the added voltage when overclocking that can degrade a processor more than the general temperature. The E5-1680 v2 is rated to 1.331V so running at 1.45V is pushing it a bit, but the liquid cooling prevents the microprocessor controller from overheating which is the typical fatal blow to extreme gaming processors.
It's probable that the 6154 is designed to withstand extreme temperatures and as such is configured to run at near maximum performance at all times. If the system is idling at 166W, the high performance mode may maintain the 3.7GHz Turbo clock speed on a certain number of cores all the time, perhaps all of them, so the single-thread performance is maintained, which is excellent for an 18-core CPU. With its very high single-thread performance, the 6154 is an excellent choice for simulation: the best of single-threaded and multi-core applications. On Passmark its average single-thread mark is 2122. For comparison, I see there is an E5-1650 v4 system, and that has among the highest standard single-thread performance of any Xeon E5. On Passmark the 6-core 1650 v4 is rated at 2181. The overclocked E5-1680 v2 averages 2101 but at 4.3GHz scores 2368. The 6154 has a lot of cores to produce heat, but in compensation, LGA3647 has a large surface area to dissipate the heat.
However, the unusually high idling temperatures for all three systems aside, I am still not sure of the cause of the Z8 rebooting, as the rebooting appears to be unrelated to system activity. This returns to the idea of memory problems, perhaps some kind of page or indexing error. Servers place as much as possible in RAM, as its access to the processor is 1000X+ faster than the disk, and that is extremely valuable in a simulation system to avoid ever swapping to disk, if possible. Perhaps when a large memory bank is at very low saturation, it is distributed and returns a page tear error. This is the reason to consider trying the system with reduced RAM, perhaps with only one module per CPU. It may be valuable to temporarily disable the automatic reboot, when the system will not be used remotely, so as to record some error messages.
Windows is perhaps not designed as well for controlling so much RAM as, for example, SQL Server. I don't really know anything about it except that it does have a lot of control over buffer, caching, and page parameters to maintain data flow in large amounts of RAM.
Is HP still working on this problem? I agree with the earlier statement that continuing unexplained, random failures of a system of that level and cost mean that there can not be confidence over a long period and should justify a return for refund.
12-02-2018 08:31 PM
Thank you for your CPU temperature examples!
I want to know how to solve this problem!
Whom should I contact about this problem?
Which government organization handles recalls?
I would like to send this issue directly to the HP service or technical department.