04-23-2020 02:57 AM
I am having issues with an HP Z420 desktop PC that seems to be freezing under certain usage conditions.
Such use cases include:
a) after Windows software updates
b) during neural network training sessions (under both TensorFlow 2.1.0 and PyTorch) using an NVIDIA GeForce RTX 2070 GPU (latest CUDA drivers)
c) while using virtualization
Of these, c) occurs least often. I work cross-platform in a Linux VM under Oracle's VirtualBox, developing in C++ with Visual Studio Code. Compiling and debugging already lag tremendously, and if I have multiple VS Code sessions running, the lag grows to the point where the host computer (not just the VM) becomes unresponsive.
b) is the most common. TensorFlow and PyTorch are well-known AI frameworks that use all available resources, so this case is also the easiest to reproduce: just run a TensorFlow training job (RNN/CNN) that uses the GPU and leave it running. After some time the browser (development is done in Jupyter Notebook) becomes unresponsive and the computer simply crashes.
Furthermore, if you lock the computer (because training can take hours and you can't do much in parallel as TF uses all resources) there's a high chance you won't be able to unlock it afterwards. At best your training is interrupted.
If you train on the CPU, the issue is not reproducible.
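Since the freeze only shows up during GPU training, it may help to record what the GPU is doing right up to the moment of the crash. Below is a minimal sketch, assuming `nvidia-smi` (shipped with the NVIDIA driver) is on the PATH; the log file name, the 5-second interval, and the function names are illustrative choices, not anything from the original post. It reads only the first GPU reported.

```python
import csv
import subprocess
import time

# Standard nvidia-smi --query-gpu properties to record for each sample.
FIELDS = ["timestamp", "temperature.gpu", "power.draw", "utilization.gpu", "memory.used"]


def parse_sample(line):
    """Split one CSV line from nvidia-smi into a {field: value} dict of strings."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))


def read_sample():
    """Ask nvidia-smi for one sample of the first GPU (needs the driver tools on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=5):
    """Append a sample every interval_s seconds, flushing so a freeze loses little data."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        while True:
            writer.writerow(read_sample())
            f.flush()
            time.sleep(interval_s)
```

Running `log_forever()` in a separate terminal before starting the training leaves a file whose last rows show temperature, power draw, and memory use shortly before the machine stopped responding.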
As for a), it's enough to leave software updates running overnight, and by morning the machine is frozen.
To be clear from the start:
a) the power supply is HP-approved, a factory model, and has not been changed or tampered with (it's actually good enough that I have another Z420 unit running 2 RTX 2070 GPUs under Linux without any power or freezing issues)
b) the RAM is the factory model, not tampered with
c) no overclocking has been applied to the CPU for the sake of more computational power (I don't find it ethical)
d) all these issues are reproducible only under Windows 10; no other Windows version has been tried
Any help would be greatly appreciated.
04-23-2020 06:41 AM - edited 04-23-2020 06:46 AM
Can you provide the system specification?
In my view, the BIOS version is the first item to eliminate as a problem. Second would be to evaluate the core count / clock speed; third, whether there is sufficient RAM (for example, the VM may need a significantly larger RAM allotment, and the system may even require the maximum 64GB RAM); and fourth would be disk speed. This may be a system in which a fast M.2 OS/Programs drive and cache disk would make an important difference.
From my extremely limited knowledge of Tensorflow (= I've never used it), it may be that the system will be limited in multi-tasking capabilities. Tensorflow appears to be well multi-threaded, as purpose-configured systems, e.g. Titan Computers, start with 10 cores and rise to 20 cores, as well as taking advantage of GPU processing.
This is not to say that a z420 cannot be configured to run Tensorflow well. I've had two z420's with very strong performance.
Consider running Passmark Performance Test (they have a 30-day free trial) to see where the system sits in relative performance. The results can reveal the weak links.
HP z620_2 (2017) (R7) > Xeon E5-1680 v2 (8C@ 4.3GHz) / z420 Liquid Cooling / 64GB (HP/Samsung 8X 8GB DDR3-1866 ECC registered) / Quadro P2000 5GB _ GTX 1070 Ti 8GB / HP Z Turbo Drive M.2 256GB AHCI + Samsung 970 EVO M.2 NVMe 500GB + HGST 7K6000 4TB + HP / HGST Enterprise 6TB / Focusrite Scarlett 2i4 sound interface + 2X Mackie MR824 / 825W PSU /> HP OEM Windows 7 Prof.’l 64-bit > 2X Dell Ultrasharp U2715H (2560 X 1440)
[ Passmark Rating = 6280 / CPU rating = 17178 / 2D = 819 / 3D= 12629 / Mem = 3002 / Disk = 13751 / Single Thread Mark = 2368 [10.23.18]
HP z420_3: (2015) (R11) Xeon E5-1650 v2 (6C@ 4.3GHz) / z420 Liquid cooling / 32GB (HP/Samsung 4X 8GB DDR3-1866 ECC registered) / NVIDIA GeForce GTX 1060 6GB/ Samsung 860 EVO 500GB + HGST 4TB / ASUS Essence STX / Logitech z2300 2.1 / 600W PSU > Windows 7 Professional 64-bit (HP OEM ) > Samsung 40" 4K
[Passmark System Rating: = 5644 / CPU = 15293 / 2D = 847 / 3D = 10953 / Mem = 2997 Disk = 4858 /Single Thread Mark = 2384 [6.27.19]
HP z420_2 (2015) (Rev 5) > Xeon E5-1660 v2 (6-core @ 4.2GHz) / 32GB DDR3 -1866 ECC RAM / Quadro P2000 (4GB) / HP Z Turbo Drive M.2 256GB AHCI + Intel 730 480GB (9SSDSC2BP480G4R5) + Western Digital Black WD1003FZEX 1TB> Creative SB X-Fi Titanium + Logitech z2300 2.1 speakers > 600W PSU > Windows 7 Professional 64-bit > 2X Dell Ultrasharp U2715H (2560 X 1440)
[ Passmark Rating = 5920 > CPU= 15129 / 2D= 855 / 3D= 8945 / Mem= 2906 / Disk= 8576] [6.12.16] Single-Thread Mark = 2322 [4.20.17]
HP ZBook 17 G2: (2015 ) i7-4940MX Extreme (4C@3.1/ 4.0GHz) / 32GB / Quadro K3100M 4GB / Kingston 480GB SATA SSD > 17.3" LCD 1920 X1080 > HP docking station > to HP 2711x 27" LCD > Logitech 533 _2.1 speaker system
[Passmark System Rating: = 3980 / CPU = 10140 / 2D = 618 / 3D = 2779 / Mem = 2559 Disk = 4662 / Single Thread Mark = 2387 [1.3.20]
04-23-2020 07:26 AM
Thank you for your reply BambiBoomZ!
In terms of specifications, my system has:
XEON E5-1603 (4 physical @2.8GHz, no hyperthreading),
32 GB RAM,
480 GB-SSD+ 240 GB-SSD,
NVidia Geforce RTX 2070 Ventus
And the configuration for Linux VM:
4 CPUs with execution cap set to 100%
From my perspective, that is quite a lot of resources allocated to the VM.
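With 4 virtual CPUs at a 100% execution cap on a host that has 4 physical cores and no hyperthreading, the VM can in principle occupy every core. One experiment that may be worth trying, offered as an assumption rather than a diagnosis, is to leave some CPU time for the Windows host. The sketch below only builds the `VBoxManage modifyvm` command; the VM name `linux-dev` and the values 3 CPUs / 90% are hypothetical.

```python
import subprocess


def build_modifyvm_command(vm_name, cpus, execution_cap):
    """Return the VBoxManage argument list for changing a VM's CPU allocation."""
    if not 1 <= execution_cap <= 100:
        raise ValueError("execution cap must be between 1 and 100 percent")
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--cpus", str(cpus),
        "--cpuexecutioncap", str(execution_cap),
    ]


def apply(vm_name, cpus, execution_cap):
    """Run the command; the VM must be powered off for modifyvm to succeed."""
    subprocess.run(build_modifyvm_command(vm_name, cpus, execution_cap), check=True)


# "linux-dev" is a hypothetical VM name; 3 CPUs at 90% is an illustrative choice.
print(" ".join(build_modifyvm_command("linux-dev", 3, 90)))
```

If the host stays responsive with the reduced allocation, that would point toward CPU starvation rather than a Windows-specific fault.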
Regarding Tensorflow, I still believe it is something related to the OS because under Linux I have had no freeze issues.
04-23-2020 05:40 PM
In my view, the primary concern re: the system freezing is the lack of cores and the clock speed. Running neural nets, VM's, learning, and simulation are all CPU-intensive and rely on multiple cores/threads. A 4-core, 2.8GHz non-hyperthreading processor will not have sufficient resources. The Passmark CPU mark for the E5-1603 is 3597 and the single thread rating is 1354. For comparison, the E5-1650 v2 in the office z420_3: CPU = 15293 (O/C to 4.3GHz + z420 liquid cooling) / Single Thread Mark = 2384. That processor cost was $59, by the way. The main problem appears to be the lack of cores/threads and single thread performance (total clock cycles per unit time).
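To put the quoted Passmark figures side by side, here is a small worked calculation. It uses only the four numbers given above; the variable and function names are illustrative.

```python
# Passmark figures quoted in the post above.
e5_1603 = {"cpu_mark": 3597, "single_thread": 1354}
e5_1650_v2_oc = {"cpu_mark": 15293, "single_thread": 2384}  # overclocked to 4.3GHz


def ratio(metric):
    """How many times higher the E5-1650 v2 scores than the E5-1603 on a metric."""
    return round(e5_1650_v2_oc[metric] / e5_1603[metric], 2)


print(ratio("cpu_mark"))       # 4.25 (multi-threaded CPU mark)
print(ratio("single_thread"))  # 1.76 (single thread mark)
```

So the overclocked E5-1650 v2 scores roughly 4.25 times the E5-1603 on the overall CPU mark and about 1.76 times on single-thread performance.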
It also occurred to me that if the Xeon E5-1603 is the original processor in the subject system, the z420 may be running a 400W power supply and lack the lower front cooling fan. This means that under high loads on the processor and the RTX 2070, the power supply may be at the edge of adequate, although the RTX 2000 series are quite energy efficient.
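A rough way to sanity-check the power concern is a simple budget. The sketch below is illustrative only: the 130W CPU and 175W GPU figures are assumed nominal vendor ratings that should be verified for the exact parts, the 60W for drives, fans, and motherboard is a placeholder guess, and nominal ratings are not measured peak draw.

```python
def psu_headroom(psu_watts, loads):
    """Return rated PSU watts minus the sum of the estimated component loads."""
    return psu_watts - sum(loads.values())


# All values below are assumptions for illustration, not measurements:
# nominal CPU TDP, nominal GPU board power, and a placeholder for everything else.
assumed_loads = {"cpu_tdp": 130, "gpu_board_power": 175, "drives_fans_board": 60}

print(psu_headroom(400, assumed_loads))  # 35
print(psu_headroom(600, assumed_loads))  # 235
```

Under these assumptions a 400W supply would leave very little margin for transient spikes, while a 600W unit would leave considerably more, which is why verifying the actual PSU rating matters.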
If possible, consider changing the motherboard to a V2 (having a 2013 boot block date instead of 2011) and, at the minimum, use a Xeon E5-1650 v2 (6C@ 3.5/3.9GHz), which may be simply overclocked to 4.4/4.5GHz on all cores using free Intel software (Intel Extreme Tuning Utility). If the current power supply is 400W, consider buying a z420 or, better, a z620 with a Xeon E5-1650 v2; that will provide DDR3-1866 RAM instead of 1600 and an 825W PSU, meaning that even a pair of RTX 2070 will run properly. This kind of system is available these days for as little as $250-$300. Keep in mind that the current system can be used while the new one is being set up, and the sales value of the original will help offset the cost of the new one.
The current system with a change to an E5-1650 may be sufficient to learn on, although I believe that 32GB RAM may be minimal for multiple OS's with multitasking. However, running a VM and demanding programs such as Tensorflow, once the projects reach any level of complexity beyond the minimal, will require a fairly high level system.
Perhaps consider running Passmark Performance Test (free 30-day trial) and posting the full results: Rating / CPU/ 2D/ 3D/ Mem/ Disk and the Single Thread Mark. Perhaps also verify the power supply rating.