07-10-2019 11:37 PM - last edited on 08-14-2020 01:08 PM by JessikaV
DISCLAIMER: Before mods start to complain about this thread being created in the wrong section - this thread was not started by me. It was originally a reply to another thread that a mod converted into a separate thread without my consent - I just gladly took it over.
Hello, I'm yet another HP customer who has to deal with the very same issue: something in the firmware (presumably) causes NT- and Linux-kernel-based OSs to experience heavy CPU core usage due to an interrupt storm, causing high CPU frequencies in general (shortening the lifespan of the device) and making the fans blow like crazy. I've done the very same diagnostics (and tried some original approaches myself) and got the same results as the people who reported this issue before me - to no avail.
HP - there is clearly something wrong with the BIOS image. It happens on multiple HP laptop lines of the same generation - I own a ZBook 17 G5 with an Intel Xeon E-2176M and a Quadro P5200 on board, while people are reporting the same issue on the HP Omen and some other laptops. This issue is 100% OS-agnostic, it happens on GNU/Linux too.
- After several minutes of inactivity (sometimes even during active usage of the laptop), the "System" process starts to utilize 100% of CPU5 and 50% of CPU0, revving the CPU frequency and the fans up to the limit.
- Analysis of the .ETL file shows that the CPU is used mostly by the ntoskrnl.exe!KiStartSystemThread stack
- It is accompanied by a sudden ISR/DPC storm in the ACPI.sys module, caused by its functions:
- This is also accompanied by a sudden, permanent DPC/ISR storm in ntoskrnl.exe
- Proof that it is indeed OS-agnostic (courtesy of Jay889)
- Output of !amli dl during the storm - these events repeat infinitely
0:41:08.273 [ffff860255d930c0] RestartContext Context=ffff8602512e8010 \_GPE._L6F QTh=0 QCt=ffff8602512e8010 QFg=00000000 pbOp=0:
0:41:08.312 [ffff860241e87040] RunContext Context=ffff8602512e8010 \_GPE._L6F QTh=0 QCt=ffff8602512e8010 QFg=00000000
0:41:08.313 [ffff860241e87040] AsyncCallBack rc=0 pEvent=a2800d \_GPE._L6F QTh=ffff860241e87040 QCt=ffff8602512e8010 QFg=00000000
0:41:08.313 [ffff860241e87040] FinishedContext Context=ffff8602512e8010 rc=8004 QTh=0 QCt=0 QFg=00000000
0:41:08.334 [ffff860255d930c0] RestartContext Context=2 \_GPE._L6F QTh=0 QCt=0 QFg=00000000 pbOp=0:
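For anyone who wants to put numbers on a dump like the one above, here is a minimal Python sketch that counts how often each event type and each \_GPE control method shows up. The function name summarize_amli_log and both regular expressions are my own assumptions, inferred only from the five lines quoted in this post. This is not an official parser for the AMLI debugger output, and other dumps may be laid out differently.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the excerpt in this thread:
#   H:MM:SS.mmm [thread-address] EventName key=value ...
LINE_RE = re.compile(r"^\d+:\d+:\d+\.\d+\s+\[[0-9a-fA-F]+\]\s+(\w+)")
# GPE handler methods look like \_GPE._L6F (level) or \_GPE._E6F (edge).
GPE_RE = re.compile(r"\\_GPE\.(_[LE][0-9A-Fa-f]{2})")

# Sample taken from the dump quoted in this thread.
SAMPLE = r"""
0:41:08.273 [ffff860255d930c0] RestartContext Context=ffff8602512e8010 \_GPE._L6F QTh=0 QCt=ffff8602512e8010 QFg=00000000 pbOp=0:
0:41:08.312 [ffff860241e87040] RunContext Context=ffff8602512e8010 \_GPE._L6F QTh=0 QCt=ffff8602512e8010 QFg=00000000
0:41:08.313 [ffff860241e87040] AsyncCallBack rc=0 pEvent=a2800d \_GPE._L6F QTh=ffff860241e87040 QCt=ffff8602512e8010 QFg=00000000
0:41:08.313 [ffff860241e87040] FinishedContext Context=ffff8602512e8010 rc=8004 QTh=0 QCt=0 QFg=00000000
0:41:08.334 [ffff860255d930c0] RestartContext Context=2 \_GPE._L6F QTh=0 QCt=0 QFg=00000000 pbOp=0:
"""


def summarize_amli_log(text):
    """Return (event_counts, gpe_method_counts) for an AMLI-style dump."""
    events = Counter()
    methods = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not a log entry
        events[match.group(1)] += 1
        gpe = GPE_RE.search(line)
        if gpe:
            methods[gpe.group(1)] += 1
    return events, methods


if __name__ == "__main__":
    event_counts, method_counts = summarize_amli_log(SAMPLE)
    print("events:", dict(event_counts))
    print("GPE methods:", dict(method_counts))
```

If a single GPE method dominates a much longer capture the same way it does in this short excerpt, that is a reasonable hint about which event keeps re-firing.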
This particular problem is not caused by the HDD caddy. I don't have such a part installed in my device, just a plastic fill-in with no electronics or jumper-shorting part. The screens above come from two capture sessions: one while the problem was occurring and one as it started (it took approx. 5 minutes, and the laptop was not used during the capture). I'm willing to share the .etl files with HP Support.
My hunch is that it is related to a poor implementation of the ACPI interface in the BIOS image (Q70 in my case) and may be linked to the Intel MDS mitigations.
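For those checking this on GNU/Linux: the kernel exposes per-GPE counters under /sys/firmware/acpi/interrupts, so you can measure how fast a given GPE fires. Below is a minimal Python sketch. It assumes that the first whitespace-separated field of each counter file is the event count, and that the \_GPE._L6F handler from the dump corresponds to the file gpe6F on your machine. Both are assumptions you should verify locally, and the helper names parse_gpe_count and gpe_rate are hypothetical.

```python
import time
from pathlib import Path

# Standard Linux sysfs location for ACPI interrupt counters.
GPE_DIR = Path("/sys/firmware/acpi/interrupts")


def parse_gpe_count(content):
    """Return the counter from a sysfs GPE file.

    Assumption: the first whitespace-separated field is the count and
    the remaining fields are status flags.
    """
    fields = content.split()
    if not fields:
        raise ValueError("empty GPE counter file")
    return int(fields[0])


def gpe_rate(name="gpe6F", interval=1.0, gpe_dir=GPE_DIR):
    """Sample one GPE counter twice and return events per second."""
    path = Path(gpe_dir) / name
    before = parse_gpe_count(path.read_text())
    time.sleep(interval)
    after = parse_gpe_count(path.read_text())
    return (after - before) / interval


if __name__ == "__main__":
    try:
        print(f"gpe6F: {gpe_rate():.0f} events/s")
    except (OSError, ValueError) as exc:
        # The file is missing on non-Linux systems or if this GPE is not exposed.
        print(f"could not read GPE counter: {exc}")
```

A rate that stays in the thousands per second on an idle machine would be consistent with the storm described here. A rate near zero suggests looking at a different GPE.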
More and more customers are demanding an official statement:
1. Is HP aware of the problem?
2. Is a new version of the firmware being developed for the affected devices?
3. If it is caused by a fault on Microsoft's or Intel's side, has HP done any actual research to sort it out?
Known Affected Devices:
- HP Omen 17
- HP Omen 15 (coincidentally, many users are reporting high temperatures on this community)
- HP ZBook 17 G5 (including mine)
- ASUS Prime X299 Deluxe (FIXED by BIOS update)
- ASRock H110M-DGS (FIXED by BIOS update)
and many, many more... if you have the same issue, just put the link down in the comments and I'll pin it up here.
If you experience the same issue - don't be a stranger, leave a comment, share it on Reddit, help make HP acknowledge our hardship - many of us spent a lot of money to get this hardware and expect it to work properly.
07-16-2019 05:44 AM
Thanks for the link. I performed extensive tests and analyzed the data in WPA - aside from the hal thread spawned by acpi.sys, I also noticed continuous PpmCheckRun activity coinciding with it. This makes sense, as the problem appears when you leave the computer unused long enough for the ACPI timer to kick in a lower power state - that is the trigger. Clearly the firmware is at fault. HP's silence on this issue indicates that they may know about it already, and it may be linked to the botched Intel Spectre/MDS mitigations that were hastily implemented. As a user of HP ProLiant servers, I'm deeply disappointed by the fact that ASUS is so far ahead of HP on this case, even in the workstation segment.
07-16-2019 05:50 AM
While I'm posting this reply, the System process related to ACPI is using 20% CPU. I was only browsing the internet, no heavy usage.
Sometimes the problem also appears when I'm playing games. Actually, I'm done trying to troubleshoot this issue and I'm considering replacing my gaming laptop! 20% CPU usage at idle on a gaming laptop is a problem that deserves to be taken seriously.
Please keep this thread updated in case you manage to stop whatever triggers this issue.
Best Regards and thank you for your replies.
07-16-2019 08:43 AM
Unfortunately, ACPI problems are almost impossible for end users to solve, as this system is based on cancerous binary blobs instead of an open, declarative language to be interpreted by the OS. Today I'll try to isolate potential triggers by forcing the C-states configuration on the CPU. I noticed (although it requires further investigation) that it happens less often if you set the power profile to High Performance. This profile is somewhat hidden in the current build of Windows 10 in order to streamline the power profile slider, but you can still bring it back by choosing it in Windows Mobility Center. BTW, the CPU is not the only part handled by ACPI routines. If power handling is seriously messed up, any such event is able to trigger an interrupt storm.
07-16-2019 12:07 PM
Hi Jay889, I updated the thread OP to be more noticeable. Let's make it gain some traction. I'm absolutely not willing to settle for generic responses or for this issue being swept under the rug - not on my watch.
I'll wait a few days for this thread to get trending, and if we don't get any response from HP Agents, I'll start escalating this issue.