07-03-2013 01:21 AM
After reading the logs collected with salinfo, I finally found out that the cause of the machine checks is the level 2 data cache in the CPU. So I need to replace my Deerfield (CPUID1=0x1f010504) with a compatible one. I have a few questions and would be grateful for any advice.
1) For instance, is any 1.0, 1.3 or 1.4GHz Madison with the same CPUID compatible? Of course, I mean only electrical issues, not software ones: all pre-Montecito Itanium 2 chips can work with zx1 and the firmware of the zx2000 machine.
2) say, I buy a 1.3GHz chip instead of my 1.0GHz one. Did I get it right that the new chip should come with its own VRM and cooler?
3) The "HP CPU replacement 2.5mm hex tool" seems to be a rare thing on the market, unlike used CPUs. Any suggestions on what more common tool could be used instead?
Thank you in advance.
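On the CPUID comparison in question 1, here is a minimal sketch of how the quoted value could be split into its fields. It assumes the value follows the Itanium CPUID register 3 (version information) layout, with one byte each for number, revision, model, family and archrev starting from the low end; whether the "CPUID1" label in the logs refers to that register is an assumption. The function name `decode_cpuid_version` is hypothetical.

```python
def decode_cpuid_version(value: int) -> dict:
    """Split a version-information word into one-byte fields.

    Assumed layout (low bits first): number, revision, model,
    family, archrev. This is an assumption about the logged value.
    """
    return {
        "number": value & 0xFF,            # bits 7:0
        "revision": (value >> 8) & 0xFF,   # bits 15:8
        "model": (value >> 16) & 0xFF,     # bits 23:16
        "family": (value >> 24) & 0xFF,    # bits 31:24
        "archrev": (value >> 32) & 0xFF,   # bits 39:32
    }


# Value quoted in the post above.
fields = decode_cpuid_version(0x1F010504)
for name, val in fields.items():
    print(f"{name:8s} = 0x{val:02x}")
```

Under that assumption the value splits into family 0x1f, model 0x01 and revision 0x05, so a candidate chip's family and model can be compared separately from its revision.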
07-03-2013 08:51 AM
I haven't actually worked on these before, but the data I found suggests the CPUs are of the same style as in the rx2600 server. Those CPU modules are separate from the VRM/power pod, but do have the heatsink and cooling fans attached. You will need to locate a CPU meant specifically for that server (or your workstation), as others have different mounting setups. I don't know if the zx2000 firmware or system board will restrict your options though. One part that I found for the zx2000 is as follows:
AB617-69010 1.4GHz (Madison, 400MHz front side bus, 1.5MB Level-3 cache)
As for the tool, you will need a T15 tamper-proof Torx bit to unscrew the CPU from the board (first remove the power pod). Then you will need a 2.5mm hex bit with a long shaft to unlock the CPU from the socket. You need about 4 inches of shaft to get through the heatsink and fan assembly and reach the locking mechanism on the motherboard.
The following document for the rx2600 shows the CPU and tool pretty clearly starting on page 93:
07-04-2013 10:32 AM
Hello Bob, thank you again and again!
The most misleading thing about "the tool" is that I confused the hex holes on the 4 screws of the cooler with the aperture for which I need the tool. OK, any 2.5mm hex pin can help with the CPU lock. And to "Unscrew the Turbo Fan Heatsink Captive Screws" I need a T15 tamper-proof Torx. Tons of thanks for this, Bob!
08-25-2013 01:59 PM
You'll be really surprised to see my unlimited stupidity ))
Now I have replaced:
memory (for obvious reasons)
power supply (because there could be some problems with voltage not detected by the ipmi sensors)
and finally, the motherboard, because the new CPU didn't change anything
- and I still get the same "Firmware Error"
But with the new motherboard things seem a bit different (giving me a chance to pretend to be an unlucky person despite being in fact a stupid one). For instance, without any memory installed, the old system always raised a "No Memory" error, while now I get the same "Firmware Error", which leads to the suspicion that no memory has been tested by the time this error occurs. Another unexpected symptom is the duration of booting: with the old motherboard, it took about 15 sec to show a red-blinking LED and perform 6 beeps, while now it's about 40 sec from pressing the power button to raising the error.
Querying the BMC over a null-modem cable revealed that after testing power and fans and enabling ACPI, the system detects "SFW HANG", where "sfw", I believe, stands for "system firmware" (EFI?). OK, the "hang" is really consistent with the time the system needs to raise the error. Then I can enter "P 0" followed by "P 1" and repeat the same error many times, so the BMC seems to work correctly on the new motherboard. But how can I become certain that the CPU was really started and was running the EFI code, at least until it went into an infinite loop and was killed by the BMC? I experimented with 2 CPUs, 1.4GHz and 1.5GHz; both seem alive with my old motherboard, at least until they hit the machine check when booting Linux. But now I have literally no info about the ability of my system to execute ia64 code.
Maybe there is a magic *trick* needed after the motherboard replacement? But if that's the case, why is it required for EFI while the BMC can run properly?
Thank you in advance,
08-25-2013 03:10 PM
I recall now that the BMC version of the *new* m/b is 1.4, which is (chronologically) older than the one used with the *old* machine. Maybe the EFI which comes with the 1.4 BMC does not support Madison chips? ((