Archived: This topic has been archived. Information and links in this thread may no longer be available or relevant.
keke99
Message 1 of 15

Solved!

divide-by-zero weird problem on HP 8200 Elite SFF

Greetings,

 

I have posted my message here, as it now appears that my problem is hardware specific.  My machine has the following specs:

HP Compaq 8200 Elite SFF
Intel Core i5-2500 CPU @ 3.3 GHz
Memory 8 GB

Windows 7 x64

(I can provide more details if required)

 

 

This has been puzzling me for more than a week now!

We have an application where we're running into a divide-by-zero. That application is purposely built to raise exceptions in such cases, with a call to the _controlfp_s function to change the masks on floating point exceptions.

Now, when we run into a divide-by-zero on pretty much all of our machines, the Visual Studio 2005 debugger breaks at the proper location within our source files. However, on this machine, the break location is all over the place and appears to be unrelated to the actual cause of the break. So as a test, I built a simple C Win32 program with just the following lines of code:

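#include <float.h>   /* _controlfp_s, _EM_*, _MCW_EM (MSVC-specific) */

int main(int argc, char *argv[])
{
    float temp1, temp2, temp3;
    unsigned int control;

    /* Mask only underflow and inexact; every other floating-point
       exception, including divide-by-zero, is left unmasked and traps. */
    _controlfp_s(&control, _EM_UNDERFLOW | _EM_INEXACT, _MCW_EM);

    temp1 = 1.0f;
    temp2 = 0.0f;
    temp3 = temp1 / temp2;   /* the debugger is expected to break here */

    return 0;
}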

On all those "good" machines, the debugger does break at the temp3 line. The machines I tested successfully covered a range of hardware and OSes, including Server 2008 R2, Windows XP 32-bit and Windows 7 Pro x64. However, on this machine, the debugger breaks at:

C:\Program Files (x86)\Microsoft Visual Studio 8\VC\crt\src\tidtable.c

function:

__set_flsgetvalue()

Looking at the registers as I step through the assembly code, everything looks fine until I hit the "fstp" instruction... then all the registers seem to be messed up (vs looking as expected on a good machine). When comparing the stack on the good vs bad, I also see stack entries on the bad machine, which I don't see on the good one...


Now... here's what I've done to troubleshoot:
1- Made a disk image of the original HP disk onto a spare disk and restored factory image.
2- Only installed Visual Studio 2005 Pro - C++ portion.
3- Tested code above, still failed.

4- Updated all service packs on Windows and tested code above.  Failed.

5- Updated all service packs on Studio 2005 and tested code above. Failed.

6- Uninstalled all bloatware and tested code above. Failed.

7- Updated BIOS and firmware and tested code above. Failed.

8- Installed Visual Studio 2010 and tested code above. Failed.

 

Then... I decided to take my spare imaged disk and install it on a similar but slightly older HP computer we have, which has the following specs:

 

HP Compaq 8000 Elite SFF
Intel Core 2 Duo E8400 @ 3 GHz
Memory 4 GB

 

The installation went fine, even though it was using the recovery partition of my 8200 machine.  On that machine, I simply installed the OS and Visual Studio 2005 Pro - C++ portion.  No service packs whatsoever (even though Studio 2005 requires an update for Vista) and the code worked as expected!!
 

So this leaves me with little else but the hardware!  Also note that I tried disabling multiple CPUs and all BIOS optimization on the 8200, with no success.

I cannot think of anything else I can try... please help!

keke99
Author
Message 2 of 15

I'd love to hear from someone who has the same machine and any version of Studio installed, to see whether they get the same results.


Big_Dave
Message 3 of 15

Hi,

 

Mathematically, division by 0 is an "undefined" operation.  Typically, software error handling is the remedy, along with data validation before the operation is started to avoid such issues. Chip and BIOS updates should be presenting the error handling routines with the appropriate error codes.  At that point, it's up to the software to deal with the issue.

 

I know one well-known hardware vendor had a flawed chip design with a similar issue many years back and ended up issuing a field warning and replacement assistance. This happened around the time of the "year 2000" issues.

 

 

HP ENVY 6055, HP Deskjet 1112
HP Envy 17", i7-8550u,16GB, 512GB NVMe, 4K screen, Windows 10 x64
Custom PC - Z590, i7-11700K, 32GB, dual 512 GB NVMe, gen4 2 TB m.2 SSD, 4K screen, OC'd to 5 Ghz, NVIDIA 3080 10GB

keke99
Author
Message 4 of 15

Hi,

 

Thanks for posting back.  However, I do know about divisions by zero (which can be indeterminate, +infinity or -infinity, depending on the numerator)... As I pointed out in my description, we already take care of that in the software by making sure the code generates the exception as it happens, instead of letting the damage go unnoticed until something else "breaks".  This is a simulator and is not your average software program.  But the important question here is why it is behaving this way on this machine... this code has been around for years and works perfectly fine on all other machines and OSes we've had in the past.

 

I do remember that hardware problem you're referring to.  I don't remember the details though.  I hope some HP support people will be viewing this and can possibly test it out on their machines.  I don't know... what if this were a fundamental problem similar to the one you're talking about... doubtful... but who knows at this point.


keke99
Author
Message 5 of 15 (accepted solution)

Wow!

 

After showing this post to a friend... he found an Intel document describing very similar symptoms on some of their CPUs... I then found the exact document that applies to our specific CPU (Intel i5-2500) here:


http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/2nd-gen-core-deskt...

See erratum BJ1

 

This pretty much exactly describes what I'm running into!  I never (seriously) thought it would be an issue at THAT level!

 


Big_Dave
Message 6 of 15

I suspected Intel all along if it was hardware.  However, it doesn't change the fact that the error handlers are faulty.

 

 

BTW--- There is one "cloaked" user out here that keeps posting issues on the HP 8200 and really doesn't fool anyone. This person must have an axe to grind against HP since each post makes HP look bad.


keke99
Author
Message 7 of 15

Hi Dave,

 

When you say "However, it doesn't change the fact that the error handlers are faulty", I'm assuming you're going back to your original comment, where you question the coding practice and not the problem itself. I didn't want to get into this too much as it wasn't my question... but since you insist, I'll take some time here.

 

I could be wrong, but I take it that you've never worked on nuclear simulators (or other similar types of simulators).  Some of the code behind them goes back more than 30 years.  Most of it is written in Fortran, to give you an idea, and there are thousands and thousands of lines of code.  An unexpected division by zero is just one of the possible floating point exceptions we may encounter.  When you do thermodynamic calculations, there are all sorts of unforeseen circumstances where, under very specific scenarios, the code can become unstable and "blow up".  The reason we enable the floating point exception handling code is precisely to capture those unforeseen circumstances and correct them when possible.  A lot of our code ALREADY validates denominators to prevent divisions by zero.

The specific case in which this bug was discovered had to do with someone using the wrong set of initial values for the simulator.  I'm not going to go into the details of that, but suffice it to say that this was "operator error" and is really not supposed to happen under normal usage.  Nevertheless, my answer to this shouldn't have been "well, this is just some freak accident which won't happen under proper usage", because what if an actual divide-by-zero DOES happen in the simulator under normal usage... how am I supposed to track down where that division by zero is and fix it, if the debugger is failing to point me to the right location to begin with??

 

That's what I meant, when I said that simulators weren't your typical application.

 

 


keke99
Author
Message 8 of 15

And I should also add that if the debugger failed to behave as expected under this specific scenario, was I supposed to trust the debugger under other circumstances?  What if my installation files or registry were messed up?  I had cloned that machine onto another identical one and we were getting ready to purchase 3 additional identical computers, onto which I would have cloned the same image.  Could I proceed with the purchase and cloning if my Studio setup was corrupted?  Turns out... as a result of my investigation, we may not even buy those computers anymore!

 

So that's why my focus was on understanding the unexpected behavior of the debugger and not that specific divide-by-zero instance.

 

Still... I do appreciate the time you took to provide some suggestions.


Big_Dave
Message 9 of 15

Hi,

 

Referring to nuclear simulations and the 8200 in the same breath is totally bogus.  LLNL would never use such a PC for anything near that process.

 

A true advanced simulator would validate data before starting a math operation to avoid "unpredictable" results.  A simple test for 0 is all that is needed.  Advanced scientific processors will do that process automatically and therefore kick the process back to the application error handler.

 

 


keke99
Author
Message 10 of 15

Dude... seems like you're also implying that I'm "that guy bashing the 8200"... relax.  I'm not.  Now that I know, it's not actually the 8200 that's the problem... it's the processor in it.  But I did not know this at the time.

 

As for your "bogus" claims about nuclear simulators using the 8200, it appears that you really have no idea what you're talking about. Even though we're using those as development workstations and not as our main servers (here at our site - others could), they would still be plenty powerful to run the real thing!  Before our upgrades a few years ago, the real simulators were running on Pentium machines which were FAR slower than the i5.  So nice try!

 

It also appears you haven't properly read my comments, because you're STILL bringing up that divide-by-zero and not focusing on EVERYTHING else I said.

 

If you wish to talk about thermodynamics, matrix solutions, simulators in general to test me out... have at it.  But you better be ready.

 

Sorry... but you started with the attitude!

† The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation