06-10-2017 02:33 AM - edited 06-10-2017 09:56 AM
I have a Z620 that came with dual E5-2630 v2 CPUs. I have upgraded these to dual E5-2660 v2, but since doing so I keep getting the following message on boot:
932 - Warning one of the QPI links is not operating
In addition to this message I also occasionally get the following message:
929 - Fatal MCA error. QPI0 error detected CPU0
Once the boot completes, the workstation seems to work fine, although the performance increase isn't as much as I had hoped for from the upgrade (this is only really evident in stress testing, but I wonder if it has something to do with these errors). However, it is a real annoyance. I have had a look around, and as far as I can tell it has something to do with the number of QPI links, but as far as I can see the number on an E5-2630 v2 is the same as on an E5-2660 v2, so I can't understand why it is happening. I have upgraded to the latest BIOS, but this doesn't fix the issue. This leads to the following questions:
Are there any fixes for this?
If there aren't, which v2 CPUs can I upgrade to so that I won't see this error?
06-10-2017 09:11 AM - edited 06-10-2017 09:24 AM
I'm a bit confused, as the post mentions: "...it had a dual CPU E5-2630 v2. I have upgraded these to dual E5-2630 v2."
I don't know exactly how QPI links work, but they are the links between multiple processors. Errors in QPI links usually trace back to the socket, the motherboard, or the CPU.
If the system was not showing the error with the previous CPUs, then the most likely cause is a fault in the new CPU0.
If you are inclined, consider installing the free trial of Passmark Performance Test and running the CPU test. According to Passmark, the average CPU Mark for a single E5-2630 v2 is 10453, and for dual processors 16267.
You might try swapping the two CPUs; if the error message is then listed as occurring on CPU1, the processor needs to be replaced. But Xeon E5s have extremely good reliability (170,000 hours MTBF, I think), and I'd be inclined to look for incidental problems. While making the swap, inspect the socket carefully for bent pins, old thermal paste debris, or dust in the pins. If the error message still lists CPU0 after the swap, the fault is in the socket or motherboard.
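The swap test above boils down to a simple rule, which can be sketched in a few lines of Python. This is only an illustration of the reasoning, not an HP diagnostic; the function name and the "CPU0" / "CPU1" strings are hypothetical and simply stand for whichever processor the 929 message names before and after the swap.

```python
def diagnose_after_swap(error_cpu_before: str, error_cpu_after: str) -> str:
    """Interpret the CPU named in the error before and after swapping the two chips.

    Hypothetical helper: the arguments are the labels read off the boot
    message (e.g. "CPU0"), typed in by hand.
    """
    if error_cpu_before == error_cpu_after:
        # The error stayed with the socket position even though the chip changed.
        return "socket or motherboard"
    # The error moved to the other position, i.e. it followed the chip.
    return "processor"


print(diagnose_after_swap("CPU0", "CPU1"))  # processor
print(diagnose_after_swap("CPU0", "CPU0"))  # socket or motherboard
```

The rule only holds if nothing else (RAM, riser seating, BIOS settings) changed between the two boots.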
HP z620_2 (2017) > Xeon E5-1680 v2 (8-core@ 4.1GHz) / 64GB DDR3-1866 ECC Reg / Quadro P2000 5GB / HP Z Turbo Drive M.2 256GB + Intel 730 480GB + Seagate Constellation ES.3 1TB / ASUS Essence STX PCIe sound card / 825W PSU / Windows 7 Prof.’l 64-bit > 2X Dell Ultrasharp U2715H (2560 X 1440) / Logitech z2300 2.1 Sound
[Passmark Rating = 6166 / CPU rating = 16934 / 2D = 820 / 3D= 8849 / Mem = 2991 / Disk = 13794] 4.24.17 Single Thread Mark = 2252
06-10-2017 09:53 AM
Thanks for the reply. I just realised I had made a typo. It should read that I upgraded from the dual E5-2630 v2 to E5-2660 v2. Sorry about that.
I have also tried swapping the CPUs from one socket to the other, and the error remains with the same CPU socket. I also tried another E5-2660 v2 CPU, and the same error occurred.
06-10-2017 03:35 PM - edited 06-10-2017 03:40 PM
Oh, no problem. I'd assumed you'd changed to another model processor.
Just to verify: did the system perform well and without error messages with both E5-2630 v2s?
Also, you might consider going to BIOS setup and resetting to factory defaults, plus checking that all processors and all cores are enabled.
In Control Panel > System and Device Manager, how many processors / cores are listed? Is the total amount of RAM shown correctly? I had the thought that the error may have an oblique connection to a memory problem and you might like to try swapping and reseating the RAM associated with CPU0.
As the problem appears to be isolated to the CPU0 socket / motherboard and not the processors, if only to eliminate the possibility, consider removing CPU0 and making a close inspection of the socket for bent pins, debris, dust, etc., and have a look at the motherboard surrounding the sockets for anything discolored or burnt.
It would also be interesting to try the Passmark test with both CPUs running and then with the 2nd CPU riser removed. If the CPU score is about the same, that would verify the QPI link problem. Especially if the system runs on CPU0 only without the error message, my guess (I hope others will comment!) is that the motherboard is faulty.
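To put a number on "about the same", here is a rough sketch in Python of the comparison I have in mind. The two reference scores are the Passmark averages for the E5-2630 v2 quoted earlier in the thread (the E5-2660 v2 figures will differ), and the 1.3 threshold is an arbitrary assumption of mine, not a Passmark or HP figure.

```python
# Passmark averages quoted earlier in this thread for the E5-2630 v2.
PASSMARK_SINGLE = 10453
PASSMARK_DUAL = 16267


def scaling_ratio(dual_score: float, single_score: float) -> float:
    """How many times faster the dual-CPU run scored than the single-CPU run."""
    return dual_score / single_score


def second_cpu_contributes(dual_score: float, single_score: float,
                           min_ratio: float = 1.3) -> bool:
    """True if the dual score is clearly above the single score.

    min_ratio is an assumed cut-off chosen for illustration only.
    """
    return scaling_ratio(dual_score, single_score) >= min_ratio


# Reference scaling from the published averages.
print(round(scaling_ratio(PASSMARK_DUAL, PASSMARK_SINGLE), 2))  # 1.56
```

If your own two runs give a ratio close to 1.0 instead of somewhere near the reference value, the second processor is adding little, which would fit a QPI link problem.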
06-10-2017 04:19 PM
Yes, the system worked perfectly fine with the old E5-2630 v2 CPUs and had no errors.
All cores and threads are enabled and showing and RAM is displaying fine. The system works as it should once it boots up past the error message.
Also, if I remove the 2nd CPU board and boot with just CPU0, it boots up fine with no error message. I have also tested this using some CPU-intensive mining software, and performance is about half with just 1 CPU (as you would expect).
Tomorrow I will put the old E5-2630 v2 CPUs back in to test whether the error comes up.
06-12-2017 07:06 AM
As I re-read the thread this morning, the fact that the system worked properly without the 2nd CPU riser may suggest a memory problem. You might consider using both E5-2660 v2s but with only one RAM module on the main board and one on the CPU riser board. The modules should be placed in the first positions on both boards according to the order of placement in the manual. However, if you have already changed back to the E5-2630 v2s, those results will be revealing also.
08-11-2017 03:30 AM
I am running into the same situation. I just bought a refurbished Z620 with a riser card that came with 2x E5-2609. It seemed to work fine at first, but now it has begun to report at boot:
929-Fatal MCA error
QPI0 error detected CPU 1
Generic interconnect level
II other transaction Generic PP - bus interconnect error - phy control error.
932-Warning one of the QPI links is not operating
I tried swapping the memory DIMMs and even using slower DIMMs, so I would say it is not a memory issue. I took the CPU out and applied some compressed air, but the result is the same.
The 929 does not show every time; the 932, I would say, "always".
As for the rest, I cannot confirm what Damo123 said about the system working as before, since I had no operating system installed. I booted a Linux live system and got some recoverable kernel errors.
The BIOS is updated to the latest version, and the system does not report any error if the CPU riser card is out. Starting to suspect a riser card problem 😞
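In case it helps anyone compare notes, this is roughly how such errors could be picked out of a saved kernel log. It is a minimal sketch in Python: the keyword list is my assumption about what machine-check related lines tend to contain, and the sample text is illustrative, not my actual log.

```python
import re

# Assumed keywords for machine-check related kernel messages.
MCE_PATTERN = re.compile(r"\bmce\b|machine check|hardware error", re.IGNORECASE)


def find_mce_lines(log_text: str) -> list[str]:
    """Return the lines of a kernel log that mention machine-check errors."""
    return [line for line in log_text.splitlines() if MCE_PATTERN.search(line)]


# Illustrative sample only; in practice, read the text from a file
# produced with something like `dmesg > dmesg.txt`.
sample = (
    "mce: [Hardware Error]: Machine check events logged\n"
    "usb 1-1: new high-speed USB device\n"
)
for line in find_mce_lines(sample):
    print(line)
```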
Damo123, did you find a solution to it?
Thanks in advance,
08-11-2017 06:38 AM - edited 08-11-2017 07:05 AM
Are you using ECC registered RAM? What is the amount on the mainboard, amount on the riser board, and module size? I have read that a symmetrical amount on the mainboard and riser plus the proper module installation sequence is important. There is a diagram on the inside of the access door for the order of placing the RAM. For example, in my dual CPU z620, I have 4X 8GB mainboard + 4X 8GB riser of HP DDR3-1600 ECC registered.
The conventional assessment for this situation is that the 2nd CPU riser has a defect, or the riser connector socket on the mainboard has a fault: bent pins or debris in the pins. Another possibility is bent pins or debris in the riser CPU socket. The last consideration is the possible impending failure of the riser E5-2609. Can you try the riser E5-2609 on the mainboard to test it?
HP z620_1 (2012) (Rev 3) 2X Xeon E5-2690 (8-core @ 2.9 / 3.8GHz) / 64GB DDR3-1600 ECC reg / Quadro K2200 (4GB) + Tesla M2090 (6GB) / HP Z Turbo Drive (256GB) + Samsung 850 Evo 250GB + Seagate Constellation ES.3 (1TB) / 800W / Windows 7 Professional 64-bit > HP 2711x (27" 1920 X 1080)
[ Passmark System Rating= 5675 / CPU= 22625 / 2D= 815 / 3D = 3580 / Mem = 2522 / Disk = 12640 ] 9.25.16 Single Thread Mark = 1903
[ Cinebench R15: CPU = 2209 cb / Single core 130 cb / OpenGL= 119.23 fps / MP Ratio 16.84x] 10.31.16