GeForce RTX 2080 Ti in HP Z840 workstation

Fcis · ‎07-23-2020

I have an HP Z840 workstation and am upgrading its GPU to a GeForce RTX 2080 Ti. I am considering the models below, but I am not sure whether they will work and fit in my workstation. Could you please advise?
- Gigabyte Aorus GeForce RTX 2080 Ti Xtreme 11G
- Gigabyte GeForce Aorus RTX 2080 Ti
- Gigabyte GeForce RTX 2080 Ti Gaming OC

BambiBoomZ · ‎07-23-2020

Fcis,

Checking Passmark baselines, there are no z840's using the RTX 2080 Ti, but there are systems using the RTX 2080 Super and RTX 2070 Super. The GPU with the highest 3D mark is a Quadro P6000 = 14432 (2X Xeon E5-2632 v4), second is Titan Xp = 14432. The RTX 2080 Super is 4th = 13118. The Passmark average 3D for the RTX 2080 Super is 19224. The z840 3D scores are relatively modest given the typically lower CPU clock speeds of high core count, dual CPU systems. For comparison, the z620 office system, running 8 cores @ 4.3GHz with a GTX 1070 Ti, has a Passmark 3D Mark of 12629, only about 4% under the RTX 2080 Super running in a system with 2X 12-core CPUs at 3.5GHz Turbo Boost. Probably, that 24-core dual CPU system is configured for high data throughput with a CUDA-core computational boost, whereas the z620 has a 3D CAD modeling (single-thread) performance priority.
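The "about 4%" figure can be checked with a quick calculation. This is only a minimal Python sketch using the two Passmark 3D scores quoted above; the function name is made up for illustration.

```python
def percent_below(score: float, reference: float) -> float:
    """Return how far `score` sits below `reference`, as a percentage."""
    return (reference - score) / reference * 100.0


# Passmark 3D scores quoted in this post
z620_gtx_1070_ti = 12629     # z620, 8 cores @ 4.3GHz, GTX 1070 Ti
z840_rtx_2080_super = 13118  # best z840 baseline with an RTX 2080 Super

gap = percent_below(z620_gtx_1070_ti, z840_rtx_2080_super)
print(f"z620 / GTX 1070 Ti is {gap:.1f}% below the z840 / RTX 2080 Super")
```

The same function can be pointed at the Passmark average for the RTX 2080 Super (19224) to see how far the z840 result falls short of what the card typically scores.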

Given the space and power requirements, the RTX 2080 Ti should work. Consider having the latest BIOS update.

As to the selection, in a workstation such as the HP z-series, strongly consider a blower-style GPU. Gaming GPU's use multiple fans on an open enclosure that vents the heated air into the case. Workstation cases are configured to be as quiet as practicable, so the air flow is lower than in a gaming case that may have 3X 120mm or 140mm fans on the open front of the case and possibly a 360mm liquid cooling radiator on the top.

The secondary office system is a z420 running 6 cores @ 4.3GHz, and the dual-fan, open-enclosure GTX 1060 6GB (I think) adds about 8-10C to the CPU temperature. The 8-core in the z620, at the same clock speed with a liquid cooler and a blower GPU (MSI Aero), runs 8-10C cooler than the z420 having the same CPU liquid cooler.

RTX 2080 Ti blower style GPU's:

ASUS GeForce RTX 2080 Ti 11G Turbo Edition GDDR6 HDMI DP 1.4 Type-C Graphics Card

PNY GeForce RTX 2080 Ti 11GB Blower Graphics Card

ZOTAC GAMING GeForce RTX 2080 Ti Blower Graphic Card, ZT-T20810A-10P, 11GB GDDR6

PNY GeForce RTX 2080 Ti 11GB Blower 352-Bit GDDR6 PCI Express 3.0 SLI Support Video Card

PNY GeForce RTX 2080 Ti 11GB Blower GDDR6 VCG2080T11BLMPB Video Graphics Card GPU

GIGABYTE RTX 2080 TI TURBO GV-N208TTURBO-11GC Video Card

Be aware that NVIDIA Ampere GPU's are about two months or so away and performance is expected to be substantially better for less cost. As well, at the high end, RTX 20XX-series cards are likely to fall in price, both new on clearance and used. The plan for the z620 here is to buy whatever is the successor to the Quadro RTX 4000 (average 3D = 15754).

What are the z840's specifications and what are the typical uses?

BambiBoomZ

HP z620_2 (2017) (R7) > Xeon E5-1680 v2 (8C@ 4.3GHz) / z420 Liquid Cooling / 64GB (HP/Samsung 8X 8GB DDR3-1866 ECC registered) / Quadro P2000 5GB _ GTX 1070 Ti 8GB / HP Z Turbo Drive M.2 256GB AHCI + Samsung 970 EVO M.2 NVMe 500GB + HGST 7K6000 4TB + HP/HGST Enterprise 6TB / Focusrite Scarlett 2i4 sound interface + 2X Mackie MR824 / 825W PSU / Windows 7 Prof.’l 64-bit (HP OEM) > 2X Dell Ultrasharp U2715H (2560 X 1440)

[ Passmark Rating = 6280 / CPU rating = 17178 / 2D = 819 / 3D= 12629 / Mem = 3002 / Disk = 13751 / Single Thread Mark = 2368 [10.23.18]

HP z420_3: (2015) (R11) Xeon E5-1650 v2 (6C@ 4.3GHz) / z420 Liquid cooling / 32GB (HP/Samsung 4X 8GB DDR3-1866 ECC registered) / NVIDIA GeForce GTX 1060 6GB/ Samsung 860 EVO 500GB + HGST 4TB / ASUS Essence STX + Logitech z2300 2.1 / 600W PSU > Windows 7 Professional 64-bit (HP OEM ) > Samsung 40" 4K

[Passmark System Rating: = 5644 / CPU = 15293 / 2D = 847 / 3D = 10953 / Mem = 2997 Disk = 4858 /Single Thread Mark = 2384 [6.27.19]

Fcis · ‎07-23-2020

Thanks for the explanation. Kindly find below a screenshot of the specs of the machine. It currently has a GeForce GTX 970 installed, but I want to upgrade since I am working with image processing, 3D data modelling, etc. Which GPU would you suggest as the upgrade for the best, most efficient performance without waste?

Screen Shot 2020-07-23 at 12.30.07 PM.png

BambiBoomZ · ‎07-23-2020

Fcis,

Sorry, the screen shot of the system spec is not visible.

It's important to know the CPU. In general, the CPU single-thread performance is the most important component when there's 3D modeling, as can be seen in the examples of high core count / low clock speed systems having below average 3D performance on high end GPU's. Some graphics programs, and certainly rendering, can use CPU cores. In my view CPU rendering is noticeably better than GPU-based.

With GPU's, there is always the question of consumer / gaming vs. workstation cards. If there are any programs using viewports, if strong OpenGL performance is required, or if 10-bit color is desirable, then a Quadro or FirePro is advisable. In the case of SolidWorks, a Quadro is mandatory for professional use. My next GPU will be an 8GB Quadro or possibly whatever new AMD GPU goes into their imminent new professional line.

What programs are you using and how large/ complex are the 3D models? Thinking of a CPU upgrade? Is there 4K in your future?

BambiBoomZ

Fcis · ‎07-23-2020

Kindly find below the specs in text:

Computer Processor Intel® Xeon(R) CPU E5-2687W v3 @ 3.10GHz × 40
Memory 131924MB (3345MB used)

Operating System Ubuntu 16.04.6 LTS

I use it for EM programs like Relion, cryosparc, IMOD,... so I would appreciate it if you could advise me with the highest/best GPU I can get with the current specs. Thank you for your help

BambiBoomZ · ‎07-23-2020

Fcis,

A cursory survey of the programs mentioned for the subject system suggests that the most demanding in terms of hardware specification is cryoSPARC:

____________________________________________________________

https://cryosparc.com/docs/reference/install

Optimal Setup Suggestions

Disks & compression

Fast disks are a necessity for processing cryo-EM data efficiently. Fast sequential read/write throughput is needed during preprocessing stages where the volume of data is very large (10s of TB) while the amount of computation is relatively low (sequential processing for motion correction, CTF estimation, particle picking etc.)
Typically users use spinning disk arrays (RAID) to store large raw data files, and often cluster file systems are used for larger systems. As a rule of thumb, to saturate a 4-GPU machine during preprocessing, sustained sequential read of 1000MB/s is required.
Compression can greatly reduce the amount of data stored in movie files, and also greatly speeds up preprocessing because decompression is actually faster than reading uncompressed data straight from disk. Typically, counting-mode movie files are stored in LZW compressed TIFF format without gain correction, so that the gain reference file is stored separately and must be applied on-the-fly during process (which is supported by cryoSPARC). Compressing gain corrected movies can often result in much worse compression ratios than compressing pre-gain corrected (integer count) data.
cryoSPARC supports LZW compressed TIFF format and BZ2 compressed MRC format natively. In either case the gain reference must be supplied as an MRC file. Both TIFF and BZ2 compression are implemented as multicore decompression streams on-the-fly.

SSDs

For classification, refinement, and reconstruction jobs that deal with particles, having local SSDs on worker nodes can significantly speed up computation, as many algorithms rely on random-access patterns and multiple passes through the data, rather than sequentially reading the data once.
SSDs of 1TB+ are recommended to be able to store the largest particle stacks.
SSD caching can be turned off if desired, for the job types that use it.
cryoSPARC manages the SSD cache on each worker node transparently - files are cached, re-used across jobs in the same project, and deleted if more space is needed.
For more information on using SSDs in cryoSPARC, see SSD Caching in cryoSPARC

GPUs, CPUs, RAM

Only NVIDIA GPUs are supported, compute capability 3.5+ is required
The GPU RAM in each GPU limits the maximum box size allowed in several processing types
- Typically, a 12GB GPU can handle a box size up to 700^3
Older GPUs can often perform almost equally as well as the newest, fastest GPUs because most computations in cryoSPARC are not bottlenecked by GPU compute speed, but rather by memory bandwidth and disk IO speed. Many of our benchmarks are done on NVIDIA Tesla K40s which are now (2018) almost 5 years old.
Multiple CPUs are needed per GPU in each worker system, at least 2 CPUs per GPU, though more is better.
System RAM of 32GB per GPU in a system is recommended. Faster RAM (DDR4) can often speed up processing.

___________________________________________________

Of course, the hardware demands will vary according to the file sizes and processing depth, and the above guide is referring to a server or an optimal node. Still, the optimization guide in general is specifying a system of extremely high computational capabilities. Notice the mention of "at least 2 CPUs per GPU", which means core count and CUDA core count are more important than clock speed.
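To make the quoted rules of thumb concrete, here is a rough Python sketch that checks a system against them: at least 2 CPUs per GPU, 32GB of RAM per GPU, and compute capability 3.5+. The class and function names are invented for this example, I'm assuming "CPUs" can be read as the 40 logical CPUs Ubuntu reports, and 7.5 is the compute capability of Turing cards such as the RTX 2080 Ti and Quadro RTX 5000. It says nothing about slots, power, or cooling.

```python
from dataclasses import dataclass
from typing import List

# Rules of thumb taken from the cryoSPARC notes quoted above
MIN_CPUS_PER_GPU = 2
RAM_GB_PER_GPU = 32
MIN_COMPUTE_CAPABILITY = 3.5


@dataclass
class System:
    cpus: int                  # logical CPUs (assumption: threads count as CPUs)
    ram_gb: int
    gpus: int
    compute_capability: float  # CUDA compute capability of the installed GPU(s)


def guideline_shortfalls(s: System) -> List[str]:
    """Return a list of shortfalls; an empty list means none were found."""
    shortfalls = []
    if s.cpus < MIN_CPUS_PER_GPU * s.gpus:
        shortfalls.append(f"needs at least {MIN_CPUS_PER_GPU * s.gpus} CPUs")
    if s.ram_gb < RAM_GB_PER_GPU * s.gpus:
        shortfalls.append(f"needs about {RAM_GB_PER_GPU * s.gpus}GB of RAM")
    if s.compute_capability < MIN_COMPUTE_CAPABILITY:
        shortfalls.append(f"GPU compute capability below {MIN_COMPUTE_CAPABILITY}")
    return shortfalls


def max_gpus_by_guidelines(cpus: int, ram_gb: int) -> int:
    """How many GPUs the CPU and RAM guidelines alone would allow."""
    return min(cpus // MIN_CPUS_PER_GPU, ram_gb // RAM_GB_PER_GPU)


# The z840 as posted: 40 logical CPUs, about 128GB RAM, one Turing GPU assumed
z840 = System(cpus=40, ram_gb=128, gpus=1, compute_capability=7.5)
print(guideline_shortfalls(z840) or "meets the quoted rules of thumb")
print("GPUs allowed by CPU/RAM guidelines:", max_gpus_by_guidelines(40, 128))
```

By these guidelines alone, RAM rather than CPU count is what caps the number of GPUs in this configuration.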

The key comment with regards to the GPU is, "Older GPUs can often perform almost equally as well as the newest, fastest GPUs because most computations in cryoSPARC are not bottlenecked by GPU compute speed, but rather by memory bandwidth and disk IO speed. Many of our benchmarks are done on NVIDIA Tesla K40s which are now (2018) almost 5 years old."

Given the emphasis on core count, I/O bandwidth, and CUDA computation with high double precision, it appears that the minimum GPU should be an 8GB Quadro, with consideration for a future GPU co-processor. A consumer / gaming card is not advisable in high-load, analytical modeling.

Consider:

1. Quadro RTX 5000 16GB

2. Quadro RTX 4000 8GB

3. Quadro M6000 24GB

4. Quadro P5000 16GB

Of course, much depends on the budget, but my first choice is an RTX 5000, which used would cost about $500 more than a new, high-specification RTX 2080 Ti. An RTX 5000 would be a high-performing, long-term purchase that will accommodate increased workloads and system expansion. Quadro cards are built to an extremely high standard and buying used is very low risk. I have two Quadro FX580's from 2004 that work perfectly well; very good for basic setup and for server monitors. In a single processor configuration, the RTX 5000, depending on the project scaling, might be capable without an additional GPU computation card.

The alternative would be to start with the RTX 4000 and, if the workload and time considerations demand it, add a second RTX 4000. Confirm that these programs can recognize and utilize a second GPU; it appears that it's common practice.

Given the note above concerning data bottlenecks in drives and memory, "...because most computations in cryoSPARC are not bottlenecked by GPU compute speed, but rather by memory bandwidth and disk IO speed", I'd also consider a fast M.2 OS/programs drive and possibly a second as a cache drive.
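Before buying drives, it may help to measure what the current disks deliver against the quoted 1000MB/s rule of thumb for a 4-GPU machine. The following is only a rough sketch using the Python standard library; the function name and the example path are hypothetical. Note that Linux caches recently read files in RAM, so a file that was just written or read will report an unrealistically high figure. Use a file that has not been touched since boot, or one larger than system memory.

```python
import time


def sequential_read_mb_per_s(path: str, block_size: int = 8 * 1024 * 1024) -> float:
    """Read `path` from start to finish and return the average rate in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffering; the OS page cache still applies
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return total_bytes / 1_000_000 / max(elapsed, 1e-9)


# Example call (hypothetical path, substitute a large raw movie file):
# print(f"{sequential_read_mb_per_s('/data/raw/movie_0001.tiff'):.0f} MB/s")
```

A dedicated tool will give more rigorous numbers, but this is enough to tell a spinning disk from an NVMe drive.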

Let us know what happens- it's an interesting project.

BambiBoomZ

Fcis · ‎07-23-2020

Thanks a lot for your help and assistance.

- Do you think Quadro RTX 5000 is better or Quadro P5000 16GB?

- In regards to memory, would you recommend adding another 128GB to have 256GB in total?

- If yes to the previous: I currently have 8x16GB installed, and I am thinking of ordering 2x64GB CoreParts - DDR4 - 64 GB - DIMM 288-pin - 2133 MHz / PC4-17000 - 1.2 V - registered - ECC. It should work along with the 8x16GB, right?

- And if I modified the machine with this GPU and memory, then the CPU wouldn't limit them?

- Is it worth it to upgrade this machine with the mentioned GPU, memory, M.2 drives,.... or better buy a new one with similar specs?

Thanks again for your help, I really appreciate it, I've liked/voted-up your replies 🙂

BambiBoomZ · ‎07-23-2020

Fcis,

You're very welcome.

In my view, the newer RTX 5000 has significant advantages. The RTX 5000 benefits from 3,072 CUDA cores to the P5000's 2,560, GDDR6 memory to the P5000's GDDR5X, 11.2 TFLOPS to 8.9 in the P5000, and the RTX tensor cores and RT cores are machine learning capable. In Passmark benchmarks, the RTX 5000 is ranked No. 12 with an average 3D = 17288, the next rating behind the GTX 1080 Ti at 17384. That's a very good league to be in for a workstation card. The P5000 is No. 49 at 12040. The RTX uses more power than the P5000 by quite a margin. Both are very good, but in my view, the RTX is a better long term investment and could extend its usefulness to another system in three or four years.

The Quadro P5000 spec:

CUDA Cores	2560
Peak Single Precision FP32 Performance	8.9 TFLOPS
GPU Memory	16 GB GDDR5X ECC
Memory Bandwidth	432 GB/Sec
Memory Interface	256-Bit
System Interface	PCI Express 3.0 x16
Display Connectors	DisplayPort 1.4 (4) + DVI-D DL
Maximum DVI-D DL Resolution	2560 x 1600 at 60Hz
HDMI Support / Stereo Connector	Optional Accessory
Thermal Management	Ultra-Quiet Fansink
Max Power Consumption	180 W
Max Digital Resolution (2x DP Links)	5120 x 2880 x 24 bpp at 60Hz (7680 x 4320 x 24 bpp at 60Hz)

The Quadro RTX 5000 spec:

RTX 5000

CUDA Parallel-Processing Cores	3,072
NVIDIA Tensor Cores	384
NVIDIA RT Cores	48
GPU Memory	16 GB GDDR6
RTX-OPS	62T
Rays Cast	8 Giga Rays/Sec
FP32 Performance	11.2 TFLOPS
Max Power Consumption	265 W
Graphics Bus	PCI Express 3.0 x 16
Form Factor	4.4" (H) x 10.5" (L) Dual Slot
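Putting the two spec lists side by side as ratios makes the comparison easier to read. A small Python sketch using only the figures quoted in this post (CUDA cores, FP32 TFLOPS, Passmark average 3D, and max power); the dictionary keys are just labels chosen for this example.

```python
# Figures quoted in this post for the two cards
p5000 = {"cuda_cores": 2560, "fp32_tflops": 8.9, "passmark_3d": 12040, "max_power_w": 180}
rtx5000 = {"cuda_cores": 3072, "fp32_tflops": 11.2, "passmark_3d": 17288, "max_power_w": 265}


def ratios(new: dict, old: dict) -> dict:
    """Return new/old for every figure the two cards have in common."""
    return {key: new[key] / old[key] for key in old if key in new}


for key, ratio in ratios(rtx5000, p5000).items():
    print(f"{key:12s} RTX 5000 / P5000 = {ratio:.2f}x")
```

On these numbers the RTX 5000 has about 1.2x the CUDA cores, 1.26x the FP32 throughput, and 1.44x the Passmark 3D score for about 1.47x the rated power, so the gain is in absolute performance rather than performance per watt.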

As for other options: according to cryoSPARC, there can never be enough CPU cores, memory, and disk speed. As cryoSPARC is so well-threaded and most flavors of Linux are said to work well in very high bandwidth memory and disk functions, it may work to add the RTX, memory, and M.2, then evaluate the performance. I recommend testing the current system in Passmark PerformanceTest and comparing it to other z840's. Also, run a real project in the most demanding software and compare to the current performance. If results are substandard, consider adding the 2nd E5-2687W v3.

It is of course possible to start over with a higher specification system, but unless it's a new system with full control over the specification, the replacement system is bound to need something as well. It's possible to build the perfect system, which I'd estimate at about $6-$7K, but I'm very fond of proprietary workstations such as the HP z-series as they are so very well developed, nicely built, reliable, and quiet.

However, it's also worth considering that the Xeon E5-26XX v3 series is about to fall another generation into the past while newer generation processors have more cores at higher clock speeds, less power use (and heat), and prices are dropping. The Xeon E5-1680 v2 8-core @ 3.4 / 3.9GHz in the office z620 cost $1,732 new, whereas a new Ryzen 9 3950X is 16 cores at 3.5 / 4.7GHz for, at the moment, about $700. Its TDP rating is 105W. The 3950X has a CPU score = 39301 and Single Thread Mark = 2748. For the E5-1680 v2: CPU = 12365 and STM = 2054. The Xeon E5-2687W v3 10C @ 3.1 / 3.5GHz: CPU = 14441 and STM = 1970. TDP is 160W. New, the E5-2687W v3 cost $2,141. Current cost is about $450. It's a complex equation!
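One way to untangle part of that equation is to normalize the Passmark CPU scores by price and by TDP. This is just a sketch in Python with the approximate figures quoted above; TDP is a rating rather than measured power draw, prices move, and the names here are invented for the example.

```python
from typing import NamedTuple


class Cpu(NamedTuple):
    name: str
    cpu_mark: int            # Passmark CPU score quoted above
    single_thread_mark: int  # Passmark Single Thread Mark quoted above
    price_usd: float         # approximate current price quoted above
    tdp_w: int               # rated TDP, not measured draw


cpus = [
    Cpu("Ryzen 9 3950X", 39301, 2748, 700, 105),
    Cpu("Xeon E5-2687W v3", 14441, 1970, 450, 160),
]


def marks_per_dollar(cpu: Cpu) -> float:
    return cpu.cpu_mark / cpu.price_usd


def marks_per_watt(cpu: Cpu) -> float:
    return cpu.cpu_mark / cpu.tdp_w


for cpu in cpus:
    print(f"{cpu.name:18s} {marks_per_dollar(cpu):6.1f} marks/$  "
          f"{marks_per_watt(cpu):6.1f} marks/W  STM {cpu.single_thread_mark}")
```

This ignores the cost of the motherboard, memory, and everything else a platform change would need, which is exactly where the rest of the complexity comes in.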

BambiBoomZ

Fcis · ‎07-24-2020

Thanks again. What about this question: I currently have 8x16GB installed, and I am thinking of ordering 2x64GB CoreParts - DDR4 - 64 GB - DIMM 288-pin - 2133 MHz / PC4-17000 - 1.2 V - registered - ECC. It should work along with the 8x16GB, right?

BambiBoomZ · ‎07-24-2020

Fcis,

Occupying every RAM slot will work and, from the cryoSPARC notes, the more the merrier. Low CAS latency and tight timings are preferable.

Experience with quite a few workstations since 2007, including 6X HP, has demonstrated that it's very important to use exactly matching HP-labelled RAM. The RAM in the z840 should be ECC registered, allowing a second processor to be added. If necessary, buy used RAM having the same HP part number label on it. HP spends untold amounts on memory testing, verification, and probably binning. In general, workstation and especially server memory is very specific, down to the exact model of the server. With long-duration, high-load projects, don't take chances.

Both the office z420 and z620 have 64GB of DDR3-1866 ECC registered, used server RAM with the HP part number stickers. Across the other HP systems since 2014 (two other z420's and another z620), about 60-70 4GB and 8GB modules have been switched in and out, and there have been absolutely no problems with the HP-labelled RAM. I tried Lenovo-labelled RAM, but only as it was the identical model of Samsung to the HP, and that did work. Still, in the future: HP only.

See PM

BambiBoomZ
