08-11-2017 08:28 AM
Hi BambiBoomZ, thanks for the tips. Now my answer to your questions:
- I have tried different RAM combinations. All ECC. All previously working.
--> 12x 8GB ECC DDR3-1333 (8x in MB, 4x in riser)
--> 8x 8GB + 4x 2GB ECC DDR3-1333 (8x 8GB in MB, 4x 2GB in riser)
--> 8x 8GB ECC DDR3-1066 (4x in MB, 4x in riser)
All combinations, same result. I share your fear that it is most likely the riser connector or the MB pins for the riser. I will look at the MB pins under a magnifier, and also at the CPU socket on the riser.
Later I will use some isopropyl alcohol spray to "clean" the riser connector.
But before anything else I will swap the CPUs, as you suggest, to check the (I reckon unlikely) case of the CPU being the broken part. I will let you know the outcome.
This situation is rather unfortunate, as I could swear it worked well when I first installed the riser, but now it is constantly reporting this :(. The situation is even more "painful" as the riser came WITHOUT the VERY SPECIFIC CPU heatsink and the only one that fits is a water-cooled one, so I had to do the full mod (and thus I also swapped the MB heatsink), which took me quite a while... 😞
Anyhow, I will keep you posted! Thanks for the swift reply!
Best from Italy,
08-11-2017 08:43 AM
Hi there Damo123,
Thanks for this comforting answer, nonetheless! It is already good news!! This situation hit me just as I was hunting for new E5 v2 processors for the z620, so now I am debating whether it is worth going dual-CPU or just staying with a single CPU for the time being...
The fact that it works actually makes sense: it is likely that some of the pins on the QPI "bus" have issues, but QPI has a built-in reliability mode.
"Although the initial implementations use single four-quadrant links, the QPI specification permits other implementations. Each quadrant can be used independently. On high-reliability servers, a QPI link can operate in a degraded mode. If one or more of the 20+1 signals fails, the interface will operate using 10+1 or even 5+1 remaining signals, even reassigning the clock to a data signal if the clock fails."
This feature is perhaps better explained here: http://www.hardwaresecrets.com/everything-you-need-to-know-about-the-quickpath-interconnect-qpi/4/
So it would likely work well, and in most situations (if not all) the bandwidth impact of losing some QPI lanes won't be noticeable...
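To put rough numbers on that degraded mode, here is a minimal Python sketch. It is only an estimate under my own assumptions, none of which come from the quoted text: a full-width link carries 16 payload bits per transfer on its 20 data lanes, throughput scales linearly with the number of lanes still active, and 8.0 GT/s is used purely as an example transfer rate. The function name `qpi_bandwidth_gb_s` is made up for illustration.

```python
# Assumptions (not from the thread): a full-width QPI link carries 16 payload
# bits per transfer on its 20 data lanes, and throughput scales linearly with
# the number of lanes that remain active in degraded mode.
FULL_WIDTH_LANES = 20
PAYLOAD_BITS_FULL_WIDTH = 16


def qpi_bandwidth_gb_s(transfer_rate_gt_s: float, active_lanes: int) -> float:
    """Estimate one-direction bandwidth in GB/s for a given link width."""
    if active_lanes not in (20, 10, 5):
        raise ValueError("expected a link width of 20, 10 or 5 data lanes")
    # Payload bits per transfer shrink in proportion to the active lanes.
    payload_bits = PAYLOAD_BITS_FULL_WIDTH * active_lanes / FULL_WIDTH_LANES
    # GT/s * bits per transfer / 8 bits per byte = GB/s.
    return transfer_rate_gt_s * payload_bits / 8


if __name__ == "__main__":
    # 8.0 GT/s is an example value, not a measurement from this machine.
    for lanes in (20, 10, 5):
        rate = qpi_bandwidth_gb_s(8.0, lanes)
        print(f"{lanes:>2} lanes: {rate:.1f} GB/s per direction")
```

Under these assumptions a half-width link still moves half the full-width figure, which is why a workload that rarely saturates the link between the two CPUs might not show any difference.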
Again, thanks for your swift answer!
08-12-2017 05:28 AM
Dear BambiBoomZ, and *especially* Damo123: SUCCESS!!
Now, what I did:
- Swapped the CPUs (the failing CPU1 did not report any failure in the CPU0 socket). Cleaned the thermal paste off the top of the CPUs and rubbed the bottom with isopropyl alcohol on a microfibre cloth.
- Cleaned the riser card's bottom connector with isopropyl alcohol spray (did it 2-3 times and then let it dry for several hours).
- Checked the pins of all connectors (riser and CPU sockets -> all alright!).
- Reapplied thermal paste on the CPUs and put them gently in the sockets. This time I made sure the screws were reasonably tight.
Then, for the riser card insertion:
- I laid the case horizontally, so that the riser is inserted from above instead of sideways.
- Inserted the riser very slowly (and without any additional PCIe cards around).
- When the lateral latches made a "click" I still gently pushed both edges of the riser down...
And... Voilà! (I hope this can be of help to Damo123 and/or others.)
Now, back to the xeon e5 v2 hunt! 🙂
08-12-2017 05:39 AM
I am no artist with handcrafts, but I am sure you will understand my happiness now that it is back. I put some hours of Dremel work and thinking into this little project, and it was certainly frustrating to see it fail at the very end... I wasn't that thrilled to do it, but the riser came without a CPU heatsink and watercooled heads were the only ones that fit...
05-06-2018 11:24 AM
This Javato artist :)) Do you remember how many problems arose with the Z820 to Z620 watercooler compatibility?
This is a bit cutting-edge technology, but it looks like the Corsair watercooler does its job. And the back of the Z620 looks a bit like Star Wars.
05-06-2018 11:35 AM - edited 05-06-2018 11:41 AM
you have a defective CPU with a failing/failed QPI link, or bent CPU socket pins (or, in this model, a bad CPU riser card)
the QPI links are used in dual/quad CPU setups to allow data to flow between the CPUs
the E5 series CPUs have dual QPI links
this is why swapping the CPUs may change the QPI error location
i had an issue with a pre-production E5 that had a slower QPI link than the production models, so i had to place the pre-production E5 in the CPU 0 socket for the system to work with both CPUs
if i placed the production E5 with the faster QPI link in the CPU 0 socket, the pre-production E5 in the second socket would spit out a QPI error
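The swap test described above boils down to a small decision rule, sketched here in Python. This is only my reading of the reasoning in this thread, not an HP diagnostic procedure; the function name `interpret_cpu_swap` and the result strings are made up for illustration, and it ignores special cases such as the mismatched QPI speeds mentioned above.

```python
from typing import Optional


def interpret_cpu_swap(error_socket_before: int,
                       error_socket_after: Optional[int]) -> str:
    """Interpret where a QPI error shows up before and after swapping two CPUs.

    Sockets are numbered 0 and 1. Pass None as error_socket_after when the
    error no longer appears after the swap.
    """
    if error_socket_before not in (0, 1):
        raise ValueError("error_socket_before must be 0 or 1")
    if error_socket_after is None:
        # Reseating alone cleared it, as happened earlier in this thread.
        return "error gone: suspect dirty contacts or poor seating"
    if error_socket_after not in (0, 1):
        raise ValueError("error_socket_after must be 0, 1 or None")
    if error_socket_after == error_socket_before:
        # Same socket complains with a different CPU in it.
        return "error stayed with the socket: suspect socket pins, riser or board"
    # The error moved together with the CPU.
    return "error followed the CPU: suspect the CPU or its QPI link"
```

For example, `interpret_cpu_swap(1, None)` matches the case reported earlier, where the error disappeared after cleaning and reseating.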
05-06-2018 11:53 AM - edited 05-06-2018 11:59 AM
I had this QPI error 15 days ago:
I did this: opened the case, pulled out the riser board, cleaned the RAM modules, swapped the RAM modules around before reinstalling them, and reseated the riser board. It worked for 15 days until today. Today the Z620 shut down by itself, and when booting up it showed the error in the first picture. On the 2nd boot it showed the error in the 2nd picture.
Do you think it was reseating the riser board, rather than the RAM modules, that removed the error? I did not remove any CPU or CPU heatsink, so I rule out the CPUs as the source of the error.
CONFIG: Z620 - Dual E5-2690 + 32GB ECC 1600 DELL Certified (2*8GB on mainboard + 2*8GB on riser board)
05-16-2018 01:53 PM
As mentioned in my post, just clean the contacts. Take the riser out, take the CPUs out, and use compressed air and isopropyl spray on every connector, both the pin side and the socket side. Let it dry for a *few* hours before reinserting. The base of the Xeons is flat, so put some isopropyl alcohol on a microfibre cloth and rub gently. Let that dry too. Insert the riser from above (i.e. lay the case horizontally) and press on both ends.
I had the same issues. I hope this could help. I have not had any issue since then. Happy z620 owner! 🙂