Where does the content of the terminal (standard out and standard error) go for a scheduled job (non-interactive)?
After the job starts, two files are created in the directory from which you submitted the job. The files are named STDIN.o<job-id> and STDIN.e<job-id>.
- The .e file will contain errors that the job generates (STD ERR).
- The .o file contains the output of the job, along with prologue and epilogue information (STD OUT).
- STDIN is the name of the job because qsub (mksub for DartFS) received the commands from STanDard INput.
- The prologue shows requested resources and the epilogue shows received resources.
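As a small illustration of the naming scheme above, the following sketch builds the two file names for a given job ID. The helper name job_output_files and the job ID 12345 are hypothetical and used only for illustration. The optional second argument assumes that a job named with #PBS -N would use that name in place of STDIN, which is typical scheduler behavior but worth confirming on your system.

```shell
# Print the expected stdout/stderr file names for a job.
# job_output_files is a hypothetical helper, not a scheduler command.
# Usage: job_output_files <job-id> [job-name]
job_output_files() {
    job_id=$1
    # STDIN is the job name when the commands were read from standard input.
    job_name=${2:-STDIN}
    printf '%s.o%s\n' "$job_name" "$job_id"   # standard output, prologue, epilogue
    printf '%s.e%s\n' "$job_name" "$job_id"   # standard error
}

# Example with a made-up job ID:
job_output_files 12345
```

Once the job has run, the files can be read from the submission directory with ordinary tools such as cat or less.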
How much memory do my jobs get? How do I assign more memory for a job?
Each ‘core’ comes with 4GB RAM on cells E-K and 8GB RAM on cell M.
In your PBS script, you can specify the number of nodes and cores your job will require:
#PBS -l nodes=1:ppn=4
In this example, the job will be assigned one node and 4 cores, which gives 4x4GB=16GB RAM if it runs on cells E-K and 4x8GB=32GB on cell M.
Note that even if your job does not need more than 1 core but needs more RAM, you must request the appropriate number of cores, some of which might remain unused.
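The arithmetic above can be turned around to find how many cores to request for a given memory need. The following is a minimal sketch; the helper name required_cores is hypothetical, and it assumes the per-core figures stated above (4GB on cells E-K, 8GB on cell M) and that a single node has enough cores to satisfy the request.

```shell
# required_cores is a hypothetical helper, not a scheduler command.
# Usage: required_cores <memory-needed-GB> <GB-per-core>
required_cores() {
    mem_gb=$1
    gb_per_core=$2
    # Integer ceiling division: round up so the request covers all the memory needed.
    echo $(( ($mem_gb + $gb_per_core - 1) / $gb_per_core ))
}

# A job needing 10GB on cells E-K (4GB per core):
required_cores 10 4   # prints 3, i.e. request nodes=1:ppn=3
# The same job on cell M (8GB per core):
required_cores 10 8   # prints 2, i.e. request nodes=1:ppn=2
```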
What are some of the available scheduler commands?
- qsub (mksub for DartFS) ps_script_filename — submit job
- myjobs [-rn] — view job(s) status
- qshow [-r] — view queue status
- pbsmon — view nodes & status
- checkjob [-v] jobID — view job(s) status
- qr — view your resources
- qdel jobID — remove job
- notify — notify near run end
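Several of the commands above take a job ID. On Torque-style schedulers, qsub typically prints an identifier of the form <number>.<server-name> when a job is submitted. Assuming that format (check what your own qsub or mksub prints), the sketch below keeps only the numeric part. The helper name job_number, the identifier 12345.example-server, and the script name my_job.pbs are all hypothetical.

```shell
# job_number is a hypothetical helper, not a scheduler command.
# It assumes the identifier looks like <number>.<server-name>.
job_number() {
    # Strip everything from the first dot onward.
    printf '%s\n' "${1%%.*}"
}

# Example with a made-up identifier:
job_number 12345.example-server   # prints 12345

# On the cluster this might be combined with the commands above (not run here):
#   id=$(job_number "$(qsub my_job.pbs)")
#   checkjob -v "$id"
#   qdel "$id"
```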
What is an example of a PBS script?
#!/bin/bash -l
# declare a name for this job
#PBS -N myFirstJob
# request the queue (enter the possible names, if omitted, default is the default)
# if more than 600 jobs use the largeq
#PBS -q default
# request 1 core on 1 node
# ensure you reserve enough cores for the projected memory usage
# figuring 4GB/core
#PBS -l nodes=1:ppn=1
# request 4 hours and 30 minutes of wall time
#PBS -l walltime=04:30:00
# mail is sent to you when the job begins and when it exits or aborts
# you can use all or some or none. If you don't want email leave this
# and the following (#PBS -M) out of the script.
#PBS -m bea
# specify your email address
#PBS -M John.Smith@dartmouth.edu
# By default, PBS scripts execute in your home directory, not the
# directory from which they were submitted. The following line
# places you in the directory from which the job was submitted.
cd "$PBS_O_WORKDIR"
# run the program
./program_name arg1 arg2 ...
How do I test my script/code? How do I estimate the walltime? How do I run interactive jobs?
- Testing is important: the walltime you request helps determine your place in the queue, and a test run helps you catch bugs before submitting.
- Use one of the 3 test nodes: x01, x02, x03.
- Use “tnodeload” to choose the least busy test node, then connect to it:
$ ssh x01
- Time your job:
$ time yourExecutableScript -p param1 -q param2
- Do not forget to ‘exit’ your test node SSH session.
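A measured run time can then be converted into the HH:MM:SS form used by #PBS -l walltime. The following is a minimal sketch; the helper name walltime_request is hypothetical, and the safety margin (50% by default) is an arbitrary assumption rather than a site recommendation, so choose a value that suits your job.

```shell
# walltime_request is a hypothetical helper, not a scheduler command.
# Usage: walltime_request <measured-seconds> [margin-percent]
walltime_request() {
    measured=$1
    margin=${2:-50}   # assumed default: 50% extra; pick your own value
    # Add the margin, rounding the extra seconds up.
    total=$(( $measured + ($measured * $margin + 99) / 100 ))
    printf '%02d:%02d:%02d\n' \
        $(( $total / 3600 )) $(( ($total % 3600) / 60 )) $(( $total % 60 ))
}

# A test run that took 3 hours (10800 seconds), with the default 50% margin:
walltime_request 10800   # prints 04:30:00
```

The printed value can be used directly in the script, for example as #PBS -l walltime=04:30:00.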
How do I run interactive jobs?
On Discovery, we recommend running interactive jobs only on the test nodes: x01, x02, and x03.
DO NOT RUN INTERACTIVE JOBS ON THE DISCOVERY HEADNODE!!!
You can SSH into any node to measure its load, but do not run jobs on the nodes directly, as doing so will interfere with the scheduler.
For interactive jobs, please use Andes and Polaris, which are shared resources meant to be used interactively.
What do the ‘qsub’ options mean? Where can I find ‘qsub’ documentation?
or through this link: qsub online documentation
How do I connect to HPC/Andes/Polaris?
See TDX KB article
How do I launch research software applications (MATLAB, Stata, R) on the Andes and Polaris high-performance Linux machines (HPCs)?
See TDX KB article
How can I access Discovery?
You can access Discovery by creating an account at:
with your Dartmouth NetID and password.
What Version of Python Should I Use on the HPC Systems?
There are multiple versions of Python available on the HPC systems. The system version is /usr/bin/python. This is an older version of Python (v2.7); it does not have any additional Python packages installed with it, and it does not use any high-performance libraries. If you are doing a large amount of data processing or scientific computing, we recommend that you use the Anaconda distribution of Python.
There are two versions of Python installed on the HPC systems that we recommend if you want a standard Anaconda environment and do not need to install any other Python packages. The modules are called python/anaconda3 and python/anaconda2, and you can load one by typing the command module load python/anaconda3 or module load python/anaconda2.
If you need to install additional Python packages that are not part of the base Anaconda python environment, please create your own conda environment as described here: http://dartgo.org/conda-python
Please contact Research Computing if you need help installing your own conda environment.
How to share an .ipynb (Jupyter/iPython/Colab) notebook via Dartmouth Google Drive
See KB article here: https://services.dartmouth.edu/TDClient/KB/ArticleDet?ID=74078