Discovery Cluster Details

Discovery is a Linux cluster that in aggregate contains 128 nodes, 6712 CPU cores, 54.7TB of memory, and more than 2.8 PB of disk space.

Node Hardware Breakdown

Cell	Vendor	CPU	Cores	Ram	GPU	Scratch	Nodes
a	Dell	AMD EPYC 75F3 (2.95GHz)	64	1TB	2 A100	5.9TB	a01-a05
p	Dell	Intel Xeon Gold 6248 (2.50GHz)	40	565GB	4 Tesla V100	1.5TB	p01-p04
q	HPE	AMD EPYC 7532 (2.4GHz)	64	512GB	None	820GB	q01-q10
r	EXXACT	AMD EPYC 7543 (2.80GHz)	64	512GB	None	290GB	r01-r21
s	Dell	AMD EPYC 7543 (2.80GHz)	64	512GB	None	718GB	s01-s44
t	Lenovo	ThinkSystem SR645 V3	64	768GB	None	719GB	t01-t10
centurion	EXXACT	AMD EPYC 7453 (2.7GHz)	56	506GB	8 A5500	7TB	centurion01-centurion09
amp	EXXACT	Intel Xeon Gold 6258R (2.70GHz)	56	506GB	10 A5000	7TB	amp01-amp06
adanova01	EXXACT	AMD EPYC 7513 (2.60GHz)	64	2TB	4 l40s	7TB	adanova01
h200	Penguin	AMD EPYC 9334 32-Core	64	1.5TB	2 H200	3.5TB	h200a-h200b
l40sx8	Penguin	INTEL XEON GOLD 6548Y+	64	1.5TB	8 L40sx	3.5TB	l40sx
l40sx4	Penguin	INTEL XEON GOLD 6548Y+	64	1TB	4 L40sx	3.5TB	l40sx

Discovery offers researchers the ability to have specialized heads nodes available inside the cluster for dedicated compute. These nodes can come equipped with up to 64 compute cores and 1.5TB of memory

Operating System :

RHEL 8 is used on Discovery, its supporting head-nodes and compute nodes.

GPU compute nodes are available to the free members of discovery using the gpuqqueue. Including other specialized GPU partitions such as:

gpuq – High-end compute nodes with A100 GPUs in MIG with 40GB slices (free members)
a100 – High-end GPU nodes with A100 GPUs – Paid tier
v100 – High-end GPU nodes with v100 GPUs – Paid tier /(preemptable)
v100_preemptable – High-end compute nodes with v100 GPUs (preemptable)
a5000 – Mid-range GPU nodes optimized for general GPU workloads (preemptable)
a5500 – Mid-range GPU nodes optimized for general GPU workloads (preemptable)
adanova01 – High-end GPU nodes optimized for general GPU workloads (preemptable)
h200 - High-end GPU node optimized for general GPU workloads **
h200_preemptable - High-end GPU nodes optimized for General GPU Workloads (preemptable)

Partitions marked (preemptable) run the risk of preemption. Use with caution and always make sure you are check pointing if possible!

Interactive nodes, named andes | polaris, are available for testing and debugging your programs interactively before submitting them to the main cluster through the scheduler.

Node Interconnects

All of the compute nodes are connected via 10GB Ethernet. The cluster itself is connected to Dartmouth’s Science DMZ; facilitating faster data transfer and stronger security

Discovery Overview

Discovery Tutorials