Investing in Discovery

Overview
    • The Discovery cluster is an exciting opportunity for researchers to participate in creating a world-class supercomputer devoted to furthering research at Dartmouth.
    • Researchers considering their own purchase of a Linux cluster are invited to consider the advantages of joining the cooperative Discovery cluster.
  • Discovery has been running since the fall of 2005 and currently contains over 3000 cores for jobs. Of these, 32 nodes (512 cores) are connected with a high-speed InfiniBand interconnect, which improves parallel performance.
Benefits of participating in the Discovery cluster
    • Access to a 3000+ core cluster.
    • Use of InfiniBand-connected nodes for improved parallel job performance.
    • Full systems support, which allows users to focus on their research without having to worry about running a system.
    • We will install your software, help get your code running and make sure the system continues to meet the needs of its user base.
    • Access to more resources than purchased. Users may utilize additional resources if these are not in use.
    • The cluster’s administrator ensures that your home directory data is backed up, that the system is secure, and that it receives the updates and system changes needed to meet your research needs.
  • All stakeholders of the Discovery cluster can help determine the direction of Discovery via the Discovery User group.
The Discovery support team provides:
    • General user support and training on cluster utilization.
    • Systems administration support as well as software installation.
    • Programming, debugging, parallelization, and optimization support.
    • Electrical power (including an uninterruptible power supply), machine room space, and cooling.
    • Network capacity.
    • System software, clustering software, setup, configuration, and some standard compilers and research applications (e.g., MATLAB, Fortran 90/95).
    • Prompt repairs.
  • Local, temporary storage for use during computational runs.
How to join the Discovery cluster
  • Discovery is a cooperative effort and all users on the cluster are encouraged to purchase enough nodes to satisfy their normal needs.
  • Each node has 16 cores, 128 GB of memory, 10G interconnects, and a 5-year warranty.
  • Currently, a node purchase is a one-time cost of approximately $4,900, which covers the 5-year life of the node. A node currently contains 16 cores, 128 GB of RAM (8 GB per core), and a 1 TB SSD.
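As a rough illustration of the figures above, the per-core cost and memory of a purchase can be worked out directly. The sketch below is only arithmetic on the approximate numbers quoted in this section; the function and constant names are hypothetical, and actual pricing should be confirmed with Research Computing.

```python
# Approximate figures quoted in this section; actual pricing may differ.
NODE_COST_USD = 4900
CORES_PER_NODE = 16
RAM_GB_PER_NODE = 128
NODE_LIFETIME_YEARS = 5


def node_summary(nodes: int) -> dict:
    """Return rough totals for a purchase of `nodes` nodes (illustrative only)."""
    cores = nodes * CORES_PER_NODE
    total_cost = nodes * NODE_COST_USD
    return {
        "cores": cores,
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "ram_gb_per_core": RAM_GB_PER_NODE / CORES_PER_NODE,
        "total_cost_usd": total_cost,
        "cost_per_core_usd": total_cost / cores,
        # One-time cost spread evenly over the 5-year life of the node.
        "cost_per_core_year_usd": total_cost / cores / NODE_LIFETIME_YEARS,
    }


if __name__ == "__main__":
    for key, value in node_summary(2).items():
        print(f"{key}: {value}")
```

For a two-node purchase this gives 32 cores and 256 GB of RAM for roughly $9,800, or about $306 per core over the life of the nodes.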
Running jobs and scheduling
    • If there are available resources, a user can run jobs on up to 4 times more cores than they purchased.
    • In general, nodes are available on the cluster, but when the cluster is fully utilized, jobs may need to wait in the queue.
    • You may contact us in advance of any deadlines so that we can ensure resources are available to run your jobs during high-use periods.
    • Scheduling priority is based on the number of nodes purchased.
    • Users who purchase more nodes will be able to run more jobs and their queued jobs will get to the run queue faster.
    • Users are allowed to log in to nodes directly to check on the status of their jobs, but all jobs need to be submitted through the queue.
  • Slurm and Torque are used to manage jobs on the cluster and users are able to check the status of their jobs.
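The 4x rule above can be sketched as a simple cap. This is a simplified model and not the scheduler's actual policy: it assumes "up to 4 times" means four times the purchased core count in total, and that a user can reach their purchased cores plus whatever is idle, up to that cap. The names `max_usable_cores` and `idle_cores` are hypothetical.

```python
CORES_PER_NODE = 16  # per-node core count quoted in this document
BURST_FACTOR = 4     # assumption: cap is 4x the purchased cores in total


def max_usable_cores(nodes_purchased: int, idle_cores: int) -> int:
    """Estimate how many cores a user could run on at once.

    Simplified model: purchased cores plus any idle cores on the
    cluster, capped at BURST_FACTOR times the purchased cores.
    """
    if nodes_purchased < 0 or idle_cores < 0:
        raise ValueError("counts must be non-negative")
    purchased = nodes_purchased * CORES_PER_NODE
    cap = purchased * BURST_FACTOR
    return min(cap, purchased + idle_cores)


if __name__ == "__main__":
    # Two purchased nodes with different amounts of idle capacity.
    for idle in (0, 40, 1000):
        print(idle, max_usable_cores(2, idle))
```

Under this model, a user with two nodes (32 cores) is limited to 128 cores even when much more of the cluster is idle, and falls back to 32 cores when nothing else is free.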
Software and Operating Environment
    • Home directories are on an NFS server that all the nodes can see.
    • Discovery is run as a production cluster at the Etna Data Center (EDC), which reduces downtime due to power interruptions.
    • Currently, a 64-bit version of CentOS 6.6 is the base OS on the nodes.
    • Portland Group, Intel, and GNU compilers for C, C++, and Fortran are installed on the system as well as multiple versions of MPI.
    • Java, Perl, and Python, as well as other open-source programs, are installed.
  • Additional software will be installed upon request.
Frequently Asked Questions
  1. How do I get help or ask questions about Discovery?
    1. For help or to ask questions, send an email to research.computing@dartmouth.edu.
  2. I’m on a deadline. How do I make sure my jobs run through quickly?
    1. We will work hard to accommodate users who have deadlines. The sooner we know about an expected high-demand period, the better able we will be to meet your needs.
    2. With 1 week’s notice, we will typically be able to provide a user with many more resources than they purchased. However, if many users have deadlines at the same time, all users will be allocated the number of nodes they purchased.
  3. My group has a special software package that we alone are allowed to run. Will other people be able to access this package when it’s installed on Discovery?
    1. No, other users will not be able to access your software. We will install licensed software that you own and will set up the system so that only users in your group will be able to use this software.