HP ZBook Fury 16 G10 Mobile Workstation PC IDS Base Model
Microsoft Windows 11

ZBook Fury 16 G10, GPU NVIDIA RTX 5000 Ada / 16 GB, driver version 555.85, bought last month. CUDA 12.5/12.1, primary setup (environment variables) for CUDA 12.1 with cuDNN 9.1.1 for Windows, pytorch 12,3. Training YOLOv8 models on several datasets (100 to 300 epochs, batch sizes 4 to 32) shows consistently bad behaviour, as follows:

For a few epochs training is normal and fast, 3 to 16 minutes per epoch depending on batch size and dataset characteristics.

nvidia-smi shows: GPU temperature 65 to 70 °C, power 60 to 70 W, GPU utilization 40 to 75%, and you can hear the fans running.

Then, for several epochs, training becomes extremely slow, 1 to 8 hours per epoch. nvidia-smi shows: GPU temperature 53 to 63 °C, power ~35 W, GPU utilization 100%, and the fans are not running.

I repeated the experiments with various datasets and YOLOv8 models, and the behaviour is similar regardless of how many epochs have already run. After a few epochs the temperature always drops below 63 °C, power drops to around 35 W, utilization rises to 100%, and training time increases hugely.
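One way to pin down when the slow state begins is to log nvidia-smi readings while training runs and flag samples that match the pattern described above (low power draw with utilization pinned near 100%). The following is a minimal sketch, not a diagnostic tool: the names `parse_sample`, `looks_throttled`, and `poll`, and the 40 W and 95% thresholds, are assumptions chosen for illustration, and on some laptops `power.draw` may be reported as `[N/A]`.

```python
import subprocess
import time

# Fields requested through nvidia-smi's --query-gpu interface.
FIELDS = "timestamp,temperature.gpu,power.draw,utilization.gpu,clocks.sm"


def _to_float(text):
    """Return a float, or None when the sensor value is unavailable (e.g. "[N/A]")."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line):
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    ts, temp, power, util, sm_clock = [part.strip() for part in line.split(",")]
    return {
        "timestamp": ts,
        "temp_c": _to_float(temp),
        "power_w": _to_float(power),
        "util_pct": _to_float(util),
        "sm_mhz": _to_float(sm_clock),
    }


def looks_throttled(sample, max_power_w=40.0, min_util_pct=95.0):
    """Flag the slow-epoch signature: low power draw at very high utilization.

    The thresholds are illustrative assumptions, not vendor-defined limits.
    """
    power, util = sample["power_w"], sample["util_pct"]
    if power is None or util is None:
        return False
    return power < max_power_w and util >= min_util_pct


def poll(interval_s=5.0):
    """Print one line per GPU every interval_s seconds until interrupted."""
    cmd = ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"]
    while True:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for line in out.splitlines():
            sample = parse_sample(line)
            tag = "SLOW?" if looks_throttled(sample) else "ok"
            print(tag, sample)
        time.sleep(interval_s)
```

Running `poll()` in a second terminal during training and comparing the timestamps with the epoch log should show whether the SM clock drops at the same moment the epochs become slow.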

 

It looks like the GPU might be throttling to prevent overheating, which would explain the lower temperature and power draw during the slow epochs, even though the reported utilization rises to 100%.
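The throttling hypothesis can be checked more directly, because nvidia-smi can report why clocks are being held down through the `clocks_throttle_reasons.active` query field (newer drivers may also expose it as `clocks_event_reasons.active`), which returns a hexadecimal bitmask. Below is a minimal sketch for decoding that mask. The function name `decode_throttle_reasons` is hypothetical, and the bit values are assumed to match NVML's `nvmlClocksThrottleReason*` constants; they should be verified against the NVML header for the installed driver.

```python
# Bit values assumed from NVML's nvmlClocksThrottleReason* constants;
# verify against nvml.h for the installed driver before relying on them.
THROTTLE_REASONS = {
    0x001: "gpu_idle",
    0x002: "applications_clocks_setting",
    0x004: "sw_power_cap",
    0x008: "hw_slowdown",
    0x010: "sync_boost",
    0x020: "sw_thermal_slowdown",
    0x040: "hw_thermal_slowdown",
    0x080: "hw_power_brake_slowdown",
    0x100: "display_clock_setting",
}


def decode_throttle_reasons(mask_text):
    """Turn a hex bitmask string such as '0x0000000000000004' into reason names."""
    mask = int(mask_text.strip(), 16)
    return [name for bit, name in sorted(THROTTLE_REASONS.items()) if mask & bit]
```

The mask itself can be read with `nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv,noheader`. If the slow epochs show a power-cap reason rather than a thermal one, that would point towards a power limit (for example the power plan or the AC adapter) rather than temperature.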

 

I want to add that I'm a Computer Vision practitioner with more than 5 years of experience, and I am currently running the same models and experiments on a Dell server with an A4000/12 GB, with much more consistent results during training; there the temperature is always over 70 °C.

 

Another remark about this ZBook/GPU: the Windows 11 Task Manager Performance tab does not register GPU utilization (it always shows 0, in contrast to nvidia-smi). No other tasks are running in parallel.

 

Please let me know what the possible causes could be and how to handle this, because at the moment my notebook is useless.

 

Thanks

 

Dragos

 

1 REPLY 1

I am running 12 systems with CUDA apps 24/7, but have nothing newer than an RTX 2080 Ti. I have to run MSI Afterburner to ensure the fans blow enough air to keep the boards cool. I also prevent the CPU from running in turbo mode, as the CPU will quickly overheat. Running 24/7 like this will ruin a notebook.

 

Your RTX 5000 may not be recognized. If you select 3D for performance, CUDA should show up at or near 100%.

BeemerBiker_0-1718978451350.png

If not, then there is a problem with the drivers not reporting the correct values. Alternatively, Windows is not displaying the report correctly.

 

If in doubt, use GPU-Z's sensor display to confirm what nvidia-smi shows.

BeemerBiker_1-1718978705170.png

 

Your cooling system serves both the CPU and the GPU.

To prevent the CPU from running in turbo, set the maximum processor state to 99%:

Enter the phrase 'edit power plan' in the Windows search box.
Then select the 'Advanced' tab.

BattPlan.png


To keep the CPU from overheating, consider using 99% instead of 100%.

Verify the CPU speed using this tool (CPU-Z).

Possibly MSI Afterburner can be used to force the GPU to a lower performance level to keep it from overheating.

 

BeemerBiker_0-1718979141641.png

 

I have read that the maximum temperature of that GPU is 93 °C, but in a laptop one should not get anywhere near that.


Thank you for using HP products and posting to the community.
I am a community volunteer and do not work for HP. If you find
this post useful click the Yes button. If I helped solve your
problem please mark this as a solution so others can find it