NVIDIA A800 PCIe 80 GB – Data center accelerator GPU

16.786,00 

High-performance data center accelerator GPU based on the NVIDIA Ampere architecture with 6912 CUDA cores and 432 Tensor Cores for AI, HPC and data analytics workloads. Base clock of 1065 MHz with boost up to 1410 MHz, 80 GB HBM2e memory on a 5120-bit interface delivering up to 1.94 TB/s of bandwidth, 300 W maximum power and a PCIe 4.0 x16 system interface with NVLink support and Multi-Instance GPU (MIG) partitioning.


Description

The NVIDIA A800 PCIe 80 GB is a high-performance data center accelerator GPU designed for AI, high-performance computing (HPC) and large-scale data analytics workloads. Launched in 2022 as a regional variant of the A100 to comply with export regulations, it belongs to NVIDIA’s Ampere-based Tensor Core GPU family and targets enterprise servers and cloud environments rather than consumer gaming systems. Built on advanced 7 nm process technology, the A800 PCIe 80 GB combines massive floating‑point throughput, high‑bandwidth HBM2e memory and NVLink interconnect to power demanding training and inference workloads in modern data centers [web:62][web:65][web:66][web:70][web:73][web:74][web:69].

The A800 PCIe 80 GB features 6912 FP32 CUDA cores and 432 third‑generation Tensor Cores per GPU, delivering up to 19.5 TFLOPS of FP32 performance, around 156 TFLOPS with TF32 Tensor operations and up to 312 TFLOPS with FP16 or BF16 [web:65][web:66][web:67][web:70]. The underlying GA100-class GPU die integrates 54.2 billion transistors on an 826 mm² die optimized for parallel compute rather than graphics, and typical clocks are 1065 MHz base and up to 1410 MHz boost in the PCIe form factor, balancing performance and efficiency under sustained data center workloads [web:64][web:62][web:73].
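The headline FP32 figure follows directly from the shader count and boost clock in the spec tables below; a minimal sanity check, assuming the standard one-FMA-per-core-per-cycle (2 FLOPs) model:

```python
# Peak FP32 throughput = cores x clock x FLOPs per core per cycle.
CUDA_CORES = 6912             # shading units (spec table)
BOOST_CLOCK_GHZ = 1.410       # GPU boost clock (spec table)
FLOPS_PER_CORE_PER_CYCLE = 2  # one fused multiply-add per cycle

peak_fp32_tflops = CUDA_CORES * BOOST_CLOCK_GHZ * FLOPS_PER_CORE_PER_CYCLE / 1000
print(f"{peak_fp32_tflops:.2f} TFLOPS")  # ~19.49 TFLOPS
```

The same model, scaled by the Tensor Core throughput multipliers, reproduces the 156 TFLOPS TF32 and 312 TFLOPS FP16/BF16 figures.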

Equipped with 80 GB of HBM2e memory connected via a 5120‑bit memory interface, the A800 PCIe 80 GB provides up to 1.94 TB/s of memory bandwidth, enabling high throughput on memory‑bound AI and HPC applications such as large neural networks, graph analytics and scientific simulations [web:62][web:65][web:70][web:73][web:74][web:69]. ECC support on the HBM memory ensures data integrity for mission‑critical workloads, reducing the risk of silent errors in long‑running training and compute jobs [web:65][web:67][web:68]. The GPU also includes a large 80 MB L2 cache to further improve effective bandwidth and latency for irregular data access patterns [web:62].
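The bandwidth figure is the product of the bus width and the effective data rate from the Memory section below; a quick check:

```python
# Peak HBM2e bandwidth = bus width (bytes) x effective data rate.
BUS_WIDTH_BITS = 5120    # memory interface (spec table)
DATA_RATE_GBPS = 3.024   # 1512 MHz double data rate -> 3.024 Gbps effective

bandwidth_gb_s = BUS_WIDTH_BITS / 8 * DATA_RATE_GBPS
print(f"{bandwidth_gb_s / 1000:.2f} TB/s")  # ~1.94 TB/s
```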

With a maximum power consumption of around 250–300 W depending on the specific A800 PCIe 80 GB implementation, this accelerator is designed for standard full‑height, full‑length, dual‑slot PCIe server configurations and is validated for use in multi‑GPU systems with up to 8 GPUs per node [web:62][web:66][web:69][web:71][web:74]. It connects to the host via PCIe Gen 4 x16, providing up to 64 GB/s of bidirectional bandwidth per card, and can also leverage NVLink bridges to scale multi‑GPU performance with up to 400 GB/s of GPU‑to‑GPU bandwidth in supported platforms (the A800's NVLink rate is reduced from the A100's 600 GB/s for export compliance) [web:65][web:66][web:67][web:68][web:69][web:71].
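The 64 GB/s bidirectional figure uses the raw PCIe 4.0 signalling rate; accounting for the 128b/130b line encoding, usable bandwidth is slightly lower. A small sketch of the arithmetic:

```python
# PCIe 4.0 usable bandwidth: 16 GT/s per lane with 128b/130b
# line encoding, times 16 lanes, per direction.
GT_PER_S = 16         # PCIe 4.0 signalling rate per lane
ENCODING = 128 / 130  # 128b/130b encoding overhead
LANES = 16

per_direction_gb_s = GT_PER_S * ENCODING * LANES / 8
bidirectional_gb_s = per_direction_gb_s * 2
print(f"{per_direction_gb_s:.1f} GB/s per direction, "
      f"{bidirectional_gb_s:.1f} GB/s bidirectional")  # ~31.5 / ~63.0 GB/s
```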

A key feature of the NVIDIA A800 PCIe 80 GB is Multi‑Instance GPU (MIG) support, which allows a single physical GPU to be partitioned into up to seven isolated GPU instances, each with dedicated compute resources and roughly 10 GB of memory, enabling secure and predictable performance for multi‑tenant or mixed workloads on the same accelerator [web:65][web:66][web:68][web:69][web:70][web:74][web:76]. The card supports the full NVIDIA CUDA, cuDNN and TensorRT software stack, along with frameworks such as PyTorch and TensorFlow, making it straightforward to integrate into existing AI and data science pipelines while benefiting from enterprise‑grade drivers and management tools [web:66][web:75][web:72].

Overall, the NVIDIA A800 PCIe 80 GB is a powerful data center GPU solution for organizations that need scalable AI and HPC performance with high‑bandwidth 80 GB HBM2e memory, strong FP32 and Tensor throughput, MIG partitioning and NVLink connectivity, making it an excellent choice for training large models, accelerating inference at scale and consolidating diverse compute workloads in modern enterprise and cloud environments [web:62][web:65][web:66][web:70][web:73][web:74][web:69][web:71].

General

GPU Name GA100
Foundry TSMC
Process Size 7 nm
Die Size 826 mm²
Transistors 54,200 million
Density 65.6M / mm²
Chip Package BGA-2743

Architecture

GPU Architecture Ampere
Generation Server Ampere (Axx)
Shading Units 6912
TMUs 432
ROPs 160
SM Count 108
CUDA Compute Capability 8.0
Tensor Cores 432

Frequency

GPU Base Clock 1065 MHz
GPU Boost Clock 1410 MHz
Memory Clock 1512 MHz (3 Gbps effective)

Memory

Memory Size 80 GB
Memory Type HBM2e
Memory Bus 5120 bit
Bandwidth 1.94 TB/s

Cache

L1 Cache GPU 192 KB (per SM)
L2 Cache GPU 80 MB

Performance

Pixel Rate 225.6 GPixel/s
Texture Rate 609.1 GTexel/s
BF16 311.84 TFLOPS (16:1)
TF32 155.92 TFLOPS (8:1)
FP64 Tensor 19.49 TFLOPS (1:1)

APIs & Compatibility

OpenCL 3.0
NVENC No Support
NVDEC 4th Gen x5

Physical & Power

PCIe Interface PCIe 4.0 x16
Slot Width Dual-slot
Length 267 mm (10.5 inches)
Height 111 mm (4.4 inches)
TDP GPU 250 W
Suggested PSU 600 W
Power Connectors 8-pin EPS
Outputs No outputs