Archived This topic has been archived. Information and links in this thread may no longer be available or relevant. If you have a question create a new topic by clicking here and select the appropriate board.
HP Recommended
Z840
Microsoft Windows 10 (64-bit)

Can someone explain what the optimal performance settings should be in the BIOS?

 

There's an option to enable/disable PCIe performance. By default it's disabled. Will enabling it speed up the system?

 

I have dual E5-2620s at 2.40GHz / 128GB RAM / K5200 GPU and, to be honest, I don't find it to be very fast.

 

Thanks!

11 REPLIES 11
HP Recommended

vps_dxb,

 

We have two z420s and a z620, and the best results have come with the BIOS at factory default settings, with two exceptions: the boot order is changed to suit the drives, and the z620 has the fan setting an asterisk or two higher, as it has two hard-working 130W E5-2690s.

 

Just to eliminate it as a possibility, go to Control Panel > Power Options and ensure that the z840 power plan is set to "High Performance." This can make a remarkable difference, as it keeps the CPU at a higher speed instead of a low idle.
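
The same check can be scripted. The following is a minimal sketch, assuming Windows' `powercfg /getactivescheme` command and its usual one-line output format; the helper names (`parse_active_scheme`, `is_high_performance`, `query_active_scheme`) are made up for illustration and are not part of any HP or Microsoft tool.

```python
import re
import subprocess

# GUID of the built-in Windows "High performance" plan.
HIGH_PERFORMANCE_GUID = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

# Matches "<guid>  (<plan name>)"; the name part is optional.
_SCHEME_RE = re.compile(
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\s*(?:\((.*)\))?"
)


def parse_active_scheme(output):
    """Extract (guid, plan_name) from `powercfg /getactivescheme` output.

    Assumed format: "Power Scheme GUID: <guid>  (<plan name>)".
    """
    match = _SCHEME_RE.search(output)
    if match is None:
        raise ValueError("unrecognised powercfg output: %r" % output)
    return match.group(1).lower(), (match.group(2) or "").strip()


def is_high_performance(output):
    """True if the active plan is the built-in High performance plan."""
    guid, _name = parse_active_scheme(output)
    return guid == HIGH_PERFORMANCE_GUID


def query_active_scheme():
    """Run powercfg (Windows only) and return its raw text output."""
    result = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the workstation itself, `is_high_performance(query_active_scheme())` would report the current state. A vendor-specific or custom plan has a different GUID, so in that case the plan name returned by `parse_active_scheme` is the thing to look at.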

 

In considering possible reasons the z840 seems slow, it would help to know the full specification: the E5-2620 (is that a v3?), RAM, graphics card, and drives. Then which programs you run, and what they're doing.

 

If you would like, download, install, and run the free trial of Passmark Performance Test and let us know the results. Performance Test returns results that can show any obvious weaknesses in relation to the performance in the applications. For example, if the system were being used for large-dataset financial analysis or molecular biology, one would suspect the calculation density of the E5-2620 v3s(?) is not quite up to the task. On Passmark, a pair of E5-2620 v3s (6-core) have an average CPU rating of 15510, which is quite good for a server, but is equalled, for example, by a single E5-2658 v3, a 12-core @ 2.2/2.9GHz. This is logical, as the E5-2658 v3 has about the same clock speed but twice as many cores per processor as the E5-2620 v3. Intel lists it as costing $1,832, so it's 4X the E5-2620 v3 cost.

 

The CPU clock speed is also important experientially, as it is the main determinant of single-thread performance. The majority of programs, especially visualization programs such as 3D CAD and animation, are single-threaded. The E5-2620 v3 Passmark single-thread mark is 1692, which is reasonable, but unless coupled with a very strong graphics card it might not feel especially responsive in Revit or Solidworks.

 

The graphics card is another component to consider, as the GPU contributes to the overall computational power. We have a Dell Precision 390 from 2007 with a Xeon X3230 4-core @ 2.66GHz (Passmark = 3631) which, tested with a Quadro K4200, produced a 3D mark of 4067. In the z420 / E5-1660 v2 system the K4200 mark is 4694, so the 390's CPU holds the K4200 back, but even with that CPU the 3D CAD function in the 390 was quite usable when tested with medium-size projects.

 

Finally, the drives are important to the perception of overall system responsiveness. The z840 is capable of using M.2 NVMe drives, which include the fastest drives in the world, and that is an opportunity for noticeable improvement. The z420 / E5-1660 v2 has a Samsung SM951 M.2 256GB (the slightly slower AHCI version), and that raised the disk mark from the already fast 4794 of an Intel 730 480GB to 11559.

 

If you install Performance Test on your z840, it's very revealing to compare results with other z840 systems. The highest scores in each parameter for the z840:

 

Passmark

 

Rating = 5636 > 2X E5-2687w v3 / 64GB RAM / Quadro K6000 / Samsung XP941 M.2 256GB

CPU= 31347 /2X E5-2699 v3

2D= 966 / Quadro K1200

3D= 11155 / GTX 980 Ti  (The highest workstation card is 11077 for Quadro M6000)

Mem= 2668 / 128GB in E5-2640 v3 system

Disk= 20563 / LSI MR9270-8i  (The LSI is a RAID controller and hides the actual drives connected.)  2nd place is 15197 using an NVMe THSN51T02DU7 TO, which I believe is an OCZ RD400.  3rd is 13899 with a Samsung SM951 512GB NVMe.

 

Of course, the above results are the best in every rating.  The best E5-2620 v3 system (8 tested)  is:

 

Rating = 4723 >

CPU= 10694 /2X E5-2620 v3

2D= 640 / Quadro M5000

3D= 8874 / Quadro M5000

Mem= 1950 / 64GB

Disk= 20563  / Micron M550 256GB  > That may well be a pair in RAID 0

 

So, this is a method to put system performance into a useful context.
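
As a rough illustration of that comparison, the sketch below expresses each subsystem mark of one system as a fraction of a reference system's mark and picks out the weakest one. The numbers are the Passmark marks quoted in this post; the function names are hypothetical, and a plain ratio is only one simple way to read the figures.

```python
# Best Passmark marks quoted above for any z840 (used as the reference).
BEST_Z840 = {"CPU": 31347, "2D": 966, "3D": 11155, "Mem": 2668, "Disk": 20563}

# Best dual E5-2620 v3 z840 quoted above.
BEST_E5_2620_V3 = {"CPU": 10694, "2D": 640, "3D": 8874, "Mem": 1950, "Disk": 20563}


def relative_scores(system, reference):
    """Return each subsystem mark as a fraction of the reference mark."""
    return {
        name: system[name] / reference[name]
        for name in reference
        if name in system
    }


def weakest_subsystem(system, reference):
    """Name of the subsystem that falls furthest below the reference."""
    ratios = relative_scores(system, reference)
    return min(ratios, key=ratios.get)


# Print the subsystems from weakest to strongest relative to the reference.
ratios = relative_scores(BEST_E5_2620_V3, BEST_Z840)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:>4}: {ratio:.0%} of the best z840 mark")
```

Replacing `BEST_E5_2620_V3` with the marks from your own Performance Test run would show which subsystem is furthest from the best recorded z840 results.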

 

Cheers,

 

BambiBoom_Z

 

HP z420  (2015) > Xeon E5-1660 v2 (6-core @ 3.7 / 4.0GHz)  / 32GB DDR3 -1866 ECC RAM  / Quadro K4200 (4GB) / Samsung SM951 M.2 256GB AHCI + Intel 730 480GB (9SSDSC2BP480G4R5) + Western Digital Black WD1003FZEX  1TB> M-Audio 192 sound card > 600W PSU> > Windows 7 Professional 64-bit > Logitech z2300 speakers > 2X Dell Ultrasharp U2715H  (2560 X 1440)>
[ Passmark Rating = 5581 > CPU= 14046 / 2D= 838 / 3D= 4694 / Mem= 2777 / Disk= 11559]  [6.12.16]

 

HP Recommended

Hi - thanks for your detailed response.

 

We are running Pix4D Mapper Pro to process aerial imagery, point clouds etc. I have had to disable hyperthreading as the application performance is very slow when compared to a single i7 for example.

 

Graphics card is a Quadro K5200.

 

128GB of DDR4 RAM

 

Samsung SSD 850 drives.

 

So you think it's the application code not harnessing HT properly that's causing the issue?

Do you think swapping out the Quadro to a GeForce will make a huge difference?

 

Overall I feel somewhat shortchanged, in that I could have had a high-performance i7 setup instead of dual Xeons....

 

HP Recommended
HP Recommended

Thanks - I am already aware of the requirements... that's not the issue! I have spent many days on the Pix4D forum talking to other users with dual Xeons who have a similar issue.

 

I am wondering if Intel charges money for a hyperthreading SDK, which would deter a software company from investing?

 

 

HP Recommended

vps_dxb,

 

Pix4D Mapper Pro appears to be an application that intends to optimize application performance by maximizing both CPU and GPU utilization, which means the memory and disk systems have to be very good also.

 

A very interesting notice:

 

https://support.pix4d.com/hc/en-us/articles/210951143-Long-processing-time-with-Xeon-v3-processors#g...

 

__________________________________________________________________

 

Long processing time with Xeon v3 processors

Updated: July 25, 2016 03:08
 

Error

Long processing time with Xeon v3 processors.

 

Description

When processing projects with Xeon v3 processors such as, but not limited to, the Intel Xeon CPU E5-2630 v3 or the Intel Xeon CPU E5-2687W v3, processing times are significantly longer than expected.

 

Cause

Under investigation.

 

Workaround

The following steps can alleviate the problem:

- Update motherboard BIOS

- Update chipset driver

- Disable hyperthreading

- Update GPU driver (crucial)

Until the cause of the issue is identified, it is recommended to choose a different CPU if possible (due to longer processing times).

 

_________________________________<END

 

 

As you have disabled hyperthreading, you may well have seen this already.

 

My sense is that the problem may be poor multi-threading scaling in the application. The initial steps of the data-to-map conversion, as in all 3D modeling, are going to be highly single-threaded, but this task seems to be variably distributed by the application and, of course, the operating system. When I run various 3D modeling applications on a 6-core / 12-thread Xeon E5, I can see activity in 6 threads, and the distribution may well be Windows' doing.

 

Ordinarily, in this situation, the problem might be attributed to the single-thread performance (so important in 3D modelling) of the Xeon E5-2620 v3 (6-core @ 2.4/3.2GHz), which has a Passmark Single Thread Mark of 1670, but the notice also mentions the E5-2687W v3 (10-core @ 3.1/3.5GHz) with a rating of 1940. I have an HP z420 with an E5-1620 (4-core @ 3.6/3.8GHz) with a single-thread score of 1931, and that is adequate in AutoCAD, Revit, SketchUp, and Adobe CS6. I'm not aware of the details of the E5 v3 architecture, but given the circumstances, Intel's thoroughness, and the level of HP development, I would suspect it's an application situation.

 

However, there may be a workaround:

 

In the notes on the system requirements page for Pix4D Mapper Pro:

 

https://support.pix4d.com/hc/en-us/articles/202557289#gsc.tab=0

 

_______________________________________________

 

  • An SSD hard drive can speed up processing. 
  • The graphic card may have an improvement on the processing speed for step 1 and step 2 (if the graphic card is compatible with CUDA (NVIDIA Graphic Cards). Processing time of step 3 is not affected by the GPU. The GPU affects considerably the visualization of the rayCloud. For more information about the use of the GPU: 203405619.
  • For more information about Hardware components usage when processing with Pix4Dmapper: 202559519.
  • For recommendations for a Hardware and Software Configuration: 202559159.
  • For more information regarding: Mac / Windows XP / Linux / Remote Access - Virtual machine / Distributed - Parallel processing: 202556809.
  • For more information about processing speed: 204191535.

______________________________________________ <END

 

The interesting paper in this regard is 204191535 which includes:

 

______________________________________________

 

Hardware used

 

The software is highly parallelized and takes advantages of multi-core CPUs, as well as SSE/AVX instructions and NVIDIA GPU CUDA processing.

However the different steps of processing do not use these resources the same way, and not all parts can take advantages of multi-core or CUDA.

Note that to increase the performance it is important to have a balanced configuration without bottlenecks.

CPU

Most of the processing is done on the CPU, and a faster CPU is the first key to increase processing speed. Hexa and octo cores, latest generation i7 or Xeon CPUs are recommend . Clock speed generally impacts the full project, and number of cores greatly impacts step 2.Point Cloud and Mesh.

Dual socket CPU does not double performance, but does generally provide faster processing than single socket CPU.

RAM

The amount of RAM mostly has an impact on the number of images that can be processed in a single project. It also has an impact on the processing of step 2.Point Cloud and Mesh.

Hard Disk

Step 3. DSM, Orthomosaic and Index is the most impacted by the hard disk speed. Using octo core CPUs and a fast SSD improves the performance of step 3.

Graphic Card

Starting with version 1.3, Pix4Dmapper takes advantages of NVIDIA GPUs with CUDA to further increase processing speed.

The speed increase highly depends on the project type, i.e. the image number, the image size and the image content. For example, with a GTX 970 we observed a speedup between 10% and 75% for step 1. Initial Processing, and around 5% for step 2. Point Cloud and Mesh without the SGM option. As a rule of thumb, projects with high overlap, high image content and thus high number of keypoints benefit more from the speedup.

 

_______________________________________________<END

 

This confirms the idea that in this kind of software, where very high polygon meshes are generated, the clock speed, and therefore the single-thread performance, is highly important. However, the navigation compute and visualization can be GPU-accelerated.
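
To put the quoted GPU figures in proportion (10% to 75% for step 1, around 5% for step 2, none for step 3), here is a small sketch of the effect on total processing time. The step durations are hypothetical placeholders, not measurements, and a "speedup of 75%" is assumed to mean the step runs 1.75 times faster; Pix4D's notes do not say which reading they intend.

```python
def accelerated_time(hours, speedup):
    """Time for a step after a fractional speedup.

    Assumption: a speedup of 0.75 means 1.75x the throughput,
    so the new time is hours / 1.75.
    """
    if speedup < 0:
        raise ValueError("speedup must be non-negative")
    return hours / (1.0 + speedup)


def total_time(step_hours, step_speedups):
    """Sum the step times, applying a speedup only where one is given."""
    return sum(
        accelerated_time(hours, step_speedups.get(step, 0.0))
        for step, hours in step_hours.items()
    )


# Hypothetical durations in hours for the three processing steps.
steps = {"step1": 10.0, "step2": 30.0, "step3": 8.0}

# Low and high ends of the quoted GPU speedups; step 3 gets none.
low = {"step1": 0.10, "step2": 0.05}
high = {"step1": 0.75, "step2": 0.05}

print(f"no GPU gain : {total_time(steps, {}):.1f} h")
print(f"low estimate: {total_time(steps, low):.1f} h")
print(f"high estimate: {total_time(steps, high):.1f} h")
```

With durations like these, where step 2 dominates, even the best-case step 1 speedup trims the total only modestly, which is consistent with the emphasis above on CPU clock speed.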

 

Given your current commitments in costs, and this section of the above performance notes:

 

______________________________________________

 

Graphic Card

Starting with version 1.3, Pix4Dmapper takes advantages of NVIDIA GPUs with CUDA to further increase processing speed.

The speed increase highly depends on the project type, i.e. the image number, the image size and the image content. For example, with a GTX 970 we observed a speedup between 10% and 75% for step 1. Initial Processing, and around 5% for step 2. Point Cloud and Mesh without the SGM option. As a rule of thumb, projects with high overlap, high image content and thus high number of keypoints benefit more from the speedup.

 

______________________________________________ >END

 

The recommendation quoted earlier, "Until the cause of the issue is identified, it is recommended to choose a different CPU if possible (due to longer processing times)", is to me quite casually suggesting shelving and replacing expensive hardware (a pair of Xeon E5-2687W v3s costs more than $4,400) to mitigate a particular problem in one program, and does not inspire confidence. However, instead of changing processors, my suggestion is to consider trying an NVIDIA Maximus configuration, which would pair the Quadro K5200 with a Tesla GPU coprocessor. This will add the GPU, CUDA, and several GB of graphics RAM for acceleration. Initially, buy a used Tesla C2075, which has 6GB of memory.

 

A poster on this site, Brian1965, has been working with a Maximus configuration of a Quadro K4200 and Tesla M2090 in a z620. In the OctaneBench results there are three such configurations tested, with an average score of 90.12, where a score of 100 is the average performance of a GTX 980. Brian1965 points out that that score is better than a Quadro K6000 (average is 87), and of course a K5200 is going to improve on that. That is impressive enough, but the 6GB of memory in the Tesla seems to me a very important consideration in your use, given the very large datasets in mesh generation. Keep in mind that Teslas are made in two forms: one for servers with passive cooling, such as the M2090, requiring custom cooling solutions, and one for workstations with active cooling, such as the K20 and C2075.

 

These do require two PCIe slots, 225W, and good cooling conditions, but this is the kind of hardware the z840 should incorporate easily. With all the variables, especially the program maker's mention of the mysterious low performance of the E5-2630 v3, there are no guarantees, but the experiment might be done for a couple of hundred dollars that could be recovered if unsuccessful, or recovered and applied to a K20 or other more modern version.

 

If that approach does not accelerate the work, and given that Pix4D Mapper Pro finds Xeon E5 v3 problematic, the next step might be to consider changing to a Xeon E5-2600 v4 such as the E5-2637 v4 (4-core @ 3.5/3.7GHz), which has a Passmark single-thread rating of 1883. It seems radical, but is 1/4 the cost of changing the software.

 

The optimization of workstations today is quite difficult, as using 4 or 5 programs is common: some need high single-thread rates, others benefit from many CPU cores, and still others from GPU acceleration.

 

I would be interested to know what happens.

 

Cheers,

 

BambiBoomZ

HP Recommended

A very helpful reply - many thanks. I will look into the Tesla cost.

 

So the bottom line is probably that the code is not written to take advantage of the Xeon v3 architecture.

HP Recommended

vps_dxb,

 

My speculation is partly based on having looked into ArcGIS a few months ago, which has some similarities. There are two factors at work. First, a highly complex, high-polygon, large-dataset 3D mesh generating program is CPU-compute intense, and these components (Steps 1 and 2 in the program) appear highly single-threaded.

 

The software maker states that the program is highly parallelized, but also says that a high clock speed is desirable. So, in the first instance, the clock speed is going to dominate over core count. However, following the portions that are CPU-intensive, the visualization is GPU-accelerated.

 

In combination, my thought is that fewer, faster CPU cores together with high GPU compute might be the solution. Hence the idea of even very fast 4-cores instead of moderately paced 8-cores, coupled with a lot of memory, which you have, and then a Quadro/Tesla Maximus to generate the visualizations.

 

The other aspect, the notice regarding problems with the two particular Xeon E5-2600 v3s, is a bit cryptic. I don't think I've ever seen an indication from a software manufacturer that a series of processors encounters trouble, since processors are developed at staggering expense with the idea of extremely high backwards and forwards compatibility. So, yes, as you suggest, they may well have missed whatever changed between Xeon E5-2600 v2 and v3.

 

The combination of factors is a worry, as the E5 v3 notice suggests a situation that has hidden causes and characteristics. It sounds as though you've researched the situation, and its unconventional nature suggests not jumping into expensive experiments without knowing more. In the end, you have a highly competent GPU in the Quadro K5200, and in my view this tends to point more towards the CPU single-thread performance. But then there's the odd "our software doesn't like some E5 v3s" situation clouding the problem.

 

One other aspect to consider is that processing speed is a relative perception.  Do you have a way to confirm that your running time is longer than expectations?

 

Also, have you ever run a benchmark test on your system? If not, I can highly recommend Passmark Performance Test, which breaks the test into the component subsystems (CPU, 2D, 3D, memory, and disk) and, using a weighted algorithm, creates a system rating. It's very revealing of weak points in a system. I had below-average results for the CPUs on a z620 (19675), worked out that I should be using registered RAM, and the new CPU result is 22645. There's a free 30-day trial also.
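
For scale, the before-and-after CPU marks quoted above can be turned into a percentage with a couple of lines. This is only the arithmetic on the two figures in this post; `percent_change` is a hypothetical helper name.

```python
def percent_change(before, after):
    """Percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


# z620 CPU marks quoted above, before and after switching to registered RAM.
gain = percent_change(19675, 22645)
print(f"CPU mark change: {gain:+.1f}%")
```

That works out to a gain of about 15% in the CPU mark from the RAM change alone.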

 

 

Cheers,

 

BambiBoom

HP Recommended

Hi - yes, I and a few other Pix4D Xeon users are equally puzzled as to why they still haven't solved it. Unless the majority of users are i7-based and have smaller datasets.

 

It's very annoying because when I run the application on my Boot Camp Mac i7, it seems to be a lot more responsive. You shouldn't expect that compared to an expensive dual Xeon setup... right?

 

I was thinking that maybe there's a cost required from Intel for a Xeon optimisation SDK that's causing this to not move forward? Is that a possibility? I am not a developer so have no idea.

 

The real-world issue here is: once you start the processing sequence it can take some days to complete. If it hangs during that time, then you have to start again. So it's a vicious circle.

 

Also - I noticed a few apps go into 'Program Not Responding' mode when opening large files etc. It's only for a few moments... and then it opens.

 

Is there any way the hardware components are not installed correctly ? This is a genuine HP product - so I am guessing not.

 

I did run Passmark yesterday, but the trial version doesn't permit copying the report.

HP Recommended

vps_dxb,

 

Last evening, a search to find differences / comparisons in architecture between Xeon E5 v2, v3, and v4  did bring up an interesting aspect. 

 

What may be worth looking into further is the Pix4D recommendation to disable hyperthreading. With previous-series CPUs, in many video games a non-hyperthreading Core i5 produced better results than a 6-core hyperthreading i7. Disabling hyperthreading lowers latency on polygon positional calculations, and the faster GPU VRAM takes over more of the 3D visualization processing. As the Xeon E5s have progressed, the CAS rating of the RAM has continued to increase, but the actual latency is lower due to the higher memory clock speed. The memory bandwidth of the LGA2011-3 E5s is greater than that of the v2s as well. If Pix4D suggests disabling hyperthreading, the fault is possibly somewhere in the parallelization / memory utilization. With a large dataset, is the lower actual RAM latency of the E5 v3's DDR4-2133 overwhelming the CPU clock speed when hyperthreading, creating a CPU bottleneck? In a dual-CPU system using ECC registered RAM, the one-cycle delay that in a sense guarantees thread synchronization between the two CPUs could exacerbate the problem. This could be a naive assessment, however.

 

What is the processor in the Mac?

 

It occurred to me that servers deal with large throughput at high rates, and servers often use somewhat lower clock speed CPUs. An interesting article on optimizing BIOS settings:

 

FUJITSU Server PRIMERGY

BIOS optimizations for Xeon E5-2600 v3 based systems

 

In the abstract of the paper:

 

"Its purpose is to optimize BIOS settings according to requirements. The objectives here are to optimize PRIMERGY servers for best performance and maximum energy efficiency. In addition to optimization for maximum throughput, application scenarios are also taken into account, in which the shortest possible response time matters."

 

Of course, the settings of those systems aren't the same as for a z840, but the nature of the recommendations may apply.

 

Just to eliminate these as possibilities: have you checked that hibernation is disabled and the power scheme is set to "High Performance" on your system? It's been some time, but twice I've had those reset by Windows updates.

 

Does Pix4D offer a trial of their software and sample problems?

 

Cheers,

 

BambiBoom

† The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use (https://www8.hp.com/us/en/terms-of-use.html) and Rules of Participation.