928 fatal PCIe error on a Z840

10-18-2017 01:36 PM
I am using an HP Z840 workstation for FPGA development at work. I have a special-purpose Xilinx FPGA board that connects to PCIe slot 2 (any other slot behaves the same way) as a x4 link in a x8 slot, through an external PCIe extension cable. Depending on the loaded FPGA hardware design and other conditions, the FPGA can suddenly disappear as a PCI device. When that happens, I get a fatal POST error, "928 fatal pcie error - Sudden link down", etc., and the end result is an OS crash and possible loss of unsaved data.
On further investigation and some searching, I learned that PCIe slots should have the "HotPlug" and "Surprise" bits set in their Slot Capabilities registers; otherwise, a PCIe device that suddenly disappears causes a fatal error. In the output of "lspci -vvv" on Ubuntu Linux (my installed OS), both bits are cleared for every PCIe slot on the motherboard. It appears that the BIOS needs to set these bits for external or other special-purpose PCIe devices to work properly. Can HP tell me how to set these bits through the BIOS configuration settings or, if that option is currently unavailable, provide a new BIOS version that either sets these capability bits by default or gives the user an option to do so?
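For readers who want to check the same thing without eyeballing `lspci -vvv`: in the PCIe spec, Hot-Plug Surprise and Hot-Plug Capable are bits 5 and 6 of the Slot Capabilities register, which sits at offset 0x14 inside the PCI Express capability of the downstream port (typically the root port) that the slot hangs off; `lspci` prints them as `HotPlug` and `Surprise` on the `SltCap:` line. The sketch below walks the standard capability list in a raw config-space dump and decodes those two bits. It is a minimal sketch, assuming the dump comes from something like `/sys/bus/pci/devices/<bdf>/config` read as root (unprivileged reads typically return only the first 64 bytes); the function names are hypothetical.

```python
import struct

# Offsets and bits from the PCI/PCIe configuration-space layout.
PCI_STATUS = 0x06            # Status register (16-bit)
PCI_STATUS_CAP_LIST = 0x10   # Status bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34   # Pointer to the first capability
PCI_CAP_ID_EXP = 0x10        # Capability ID of the PCI Express capability
PCI_EXP_FLAGS = 0x02         # PCI Express Capabilities register (16-bit)
PCI_EXP_FLAGS_SLOT = 0x0100  # Bit 8: Slot Implemented
PCI_EXP_SLTCAP = 0x14        # Slot Capabilities register (32-bit)
SLTCAP_SURPRISE = 1 << 5     # Hot-Plug Surprise
SLTCAP_HOTPLUG = 1 << 6      # Hot-Plug Capable


def find_pcie_cap(cfg: bytes):
    """Return the offset of the PCI Express capability, or None if absent."""
    if len(cfg) < 0x40:
        return None
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC   # low two bits are reserved
    seen = set()
    while ptr and ptr not in seen and ptr + 1 < len(cfg):
        seen.add(ptr)                       # guard against looping lists
        if cfg[ptr] == PCI_CAP_ID_EXP:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC           # follow the "next" pointer
    return None


def slot_hotplug_bits(cfg: bytes):
    """Return (hotplug_capable, surprise) for a port with a slot, else None."""
    cap = find_pcie_cap(cfg)
    if cap is None or cap + PCI_EXP_SLTCAP + 4 > len(cfg):
        return None
    (flags,) = struct.unpack_from("<H", cfg, cap + PCI_EXP_FLAGS)
    if not flags & PCI_EXP_FLAGS_SLOT:
        return None                         # endpoint or port without a slot
    (sltcap,) = struct.unpack_from("<I", cfg, cap + PCI_EXP_SLTCAP)
    return bool(sltcap & SLTCAP_HOTPLUG), bool(sltcap & SLTCAP_SURPRISE)


def read_config(path: str) -> bytes:
    """Read a raw config-space dump, e.g. /sys/bus/pci/devices/<bdf>/config (needs root)."""
    with open(path, "rb") as f:
        return f.read()
```

Run against the root port above the FPGA slot, a result of `(False, False)` would match what the OP reports from `lspci`; `None` means the dump has no PCIe capability or no slot implemented (for example, the FPGA endpoint itself).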
Steps to recreate: Connect a programmable PCIe device (for example, an FPGA-based PCIe device) to any available PCIe slot on the HP Z840 workstation. Boot the machine with the PCIe device initially active. While Linux is running, programmatically deactivate the device. The BIOS issues the 928 fatal error (Surprise Link Down), and Linux crashes as a result within a second or so. NOTE: Unplugging an external PCIe graphics card from a running system should produce a similar fatal error (unverified).
Expected behavior: Removing a PCIe device should not result in a fatal error. The OS is otherwise capable of recovering from such "HotPlug"/"Surprise" PCIe events, as provided for by the PCIe standards.
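The "Sudden link down" wording lines up with what PCIe Advanced Error Reporting (AER) calls a Surprise Down Error: bit 5 of the Uncorrectable Error Status register on the downstream port, shown as `SDES` on the `UESta:`/`UESvrt:` lines of `lspci -vvv`. The spec's default Uncorrectable Error Severity value (0x00062030) marks that bit fatal, which is at least consistent with the 928 message being reported as fatal; whether the Z840 firmware actually uses these registers to raise error 928 is an assumption, not something the thread confirms. The sketch below decodes a status value against a severity mask; the table and function names are hypothetical.

```python
# Bit positions shared by the AER Uncorrectable Error Status/Mask/Severity registers.
UNCORR_ERRORS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}

# Spec default severity: DLP, Surprise Down, FCP, Receiver Overflow and
# Malformed TLP are fatal; everything else defaults to non-fatal.
DEFAULT_UESVRT = (1 << 4) | (1 << 5) | (1 << 13) | (1 << 17) | (1 << 18)


def classify(uesta: int, uesvrt: int = DEFAULT_UESVRT):
    """Return (error name, "fatal" | "non-fatal") for each known bit set in uesta."""
    result = []
    for bit, name in sorted(UNCORR_ERRORS.items()):
        if uesta & (1 << bit):
            severity = "fatal" if uesvrt & (1 << bit) else "non-fatal"
            result.append((name, severity))
    return result
```

Passing a custom `uesvrt` shows the other half of the picture: the same Surprise Down bit is classified non-fatal when its severity bit is cleared.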
BIOS POST fatal error leading to Linux OS crashes
03-08-2018 05:10 PM
I suspect the issue for both posters is not with the Z840 but with the PCIe card(s) themselves.
You can try cleaning the card contacts and reseating the card, or moving the card to another slot.
You can also try going into the BIOS and restricting the PCIe slot the card sits in to v2.0 speeds (5 GT/s).
Also, the HP "Maintenance and Service Guide" for the Z440/Z640/Z840 lists the solution as:
"Move the card to a different slot. If the problem persists, replace the card."
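On the speed suggestion: PCIe 2.0 signals at 5 GT/s per lane, not 5 GB/s. With 8b/10b encoding, that leaves roughly 500 MB/s per lane in each direction, so an x4 link like the OP's would carry about 2 GB/s at Gen2 (versus roughly 3.9 GB/s for x4 at Gen3 with 128b/130b). Whether a BIOS speed limit actually took effect can be checked in the Link Status register (`LnkSta:` in `lspci -vvv`), whose bits 3:0 hold the current link speed and bits 9:4 the negotiated width. A minimal sketch of both calculations, ignoring packet and protocol overhead; the names are hypothetical.

```python
# Current Link Speed codes (Link Status bits 3:0) as seen on real links;
# strictly, the code indexes the Supported Link Speeds Vector.
SPEEDS = {
    1: (2.5, 8 / 10),      # Gen1: 2.5 GT/s, 8b/10b
    2: (5.0, 8 / 10),      # Gen2: 5 GT/s, 8b/10b
    3: (8.0, 128 / 130),   # Gen3: 8 GT/s, 128b/130b
    4: (16.0, 128 / 130),  # Gen4: 16 GT/s, 128b/130b
}


def decode_lnksta(lnksta: int):
    """Return (speed code, GT/s per lane, lane count) from a 16-bit Link Status value."""
    speed = lnksta & 0xF
    width = (lnksta >> 4) & 0x3F
    gts = SPEEDS[speed][0]   # raises KeyError for reserved/unknown codes
    return speed, gts, width


def approx_gb_per_s(speed: int, width: int) -> float:
    """Approximate per-direction throughput in GB/s, accounting for line coding only."""
    gts, efficiency = SPEEDS[speed]
    # GT/s * coding efficiency = Gb/s of data per lane; times lanes; /8 -> GB/s
    return gts * efficiency * width / 8
```

For example, a `LnkSta` value of `0x0042` decodes to speed code 2 (5 GT/s) at x4, about 2 GB/s per direction.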
03-08-2018 05:51 PM
The same PCIe card works in another, lower-end machine, so I doubt it is a card or user issue.
IMHO, HP needs to issue a new BIOS that corrects the problem. It is sad that such a costly machine is not matched by equally good customer service from HP. It has been almost 6 months, and there has still been no attempt to reach out to customers.
03-09-2018 02:54 AM
HP workstations follow the PCIe spec exactly as it is written by the group that creates the PCIe standard.
Other motherboard makers may or may not follow this spec as closely.
As I pointed out to you, setting the slot this card resides in (and possibly the slots of other cards too) to a lower bus speed may allow the card to work.
In regard to your statement, "HP needs to issue a new BIOS with the problem correction IMHO":
1. HP workstations are validated with numerous software vendor packages and hardware configurations, and undergo exhaustive testing before the software vendor will allow HP to list them as an approved configuration.
2. If there were a basic hardware/BIOS design issue, it would most likely have been found years ago, when the system was still new and under active support (no such problem was ever listed in the BIOS updates).
3. The HP Z820 was released in the 2012/2013 timeframe and is therefore past HP's timeframe for active support (the Spectre issue excluded), so HP is no longer providing active support unless you have an extended Care Pack for the system in place.
If you google "928 fatal pcie error", you will see that this is a generic error that affects many people, and it almost always is due to improper contact between the card and the PCIe socket and/or the PCIe bus speed.
03-09-2018 01:23 PM
I am not interested in sparring with you over semantics.
However, I feel compelled to clarify a few things in light of your comment:
1. I am talking about the HP Z840 (not the Z820 as you say; I don't know whether there is a significant difference, but I assume there may be), which was purchased new in Sep/Oct 2017, shortly before my original request for support.
2. The card works correctly except under the specific circumstances I mentioned in my original post (when the PCIe device suddenly and programmatically disappears while the machine is already up). This is not a situation that motherboard companies (like HP or its subcontractors) usually test, but the proper behavior is definitely not to throw a stupid PCIe fatal error that helps nobody and, to the best of my understanding, does not follow the PCIe standards (I am a fairly knowledgeable hardware engineer, IMHO). If it were a simple dust or contact problem, we would have figured that out in the last 6 months, don't you think?
3. There are always corner cases when developing products that must meet various industry standards; that is a given, and all reasonable people understand it. So the discovery of a bug is not unusual, nor does it reflect on the quality of a company's products (in this case, HP's). However, a lack of interest in solving a customer's issue (however small or insignificant it may seem) is often more revealing of a company's internal culture and trustworthiness. So, if you are an HP employee, please try to work with the customer as best you can, or at the very least don't prematurely impose your flawed technical judgement on them.