Quickstart Guide for Sabine/Opuntia

How to log in

The only way to connect to our clusters is by secure shell (ssh), e.g. from a Linux/UNIX system:

ssh -l your_username  sabine.cacds.uh.edu
ssh -l your_username  opuntia.cacds.uh.edu

Windows users will need an SSH client installed on their machine, e.g. PuTTY or XShell.

Allocations

Users without a project allocation cannot run jobs on Sabine or Opuntia. Existing users have been given a small allocation on Opuntia so they can continue running jobs there. To request a larger allocation, please refer to the allocation request process: a PI (supervisor) will have to submit a project proposal for Sabine/Opuntia.

Users can check the balance for their projects using the sbank command:

sbank balance statement project <projectname>

Users have to specify the allocation upon submission of the job in the batch script, e.g.:

#!/bin/bash
### Specify job parameters
#SBATCH -J test_job # name of the job
#SBATCH -t 1:00:00 # time requested
#SBATCH -N 1 -n 2 # total number of nodes and processes

### Tell SLURM which account to charge this job to
#SBATCH -A #Allocation_AWARD_ID

or specify it when submitting an interactive job, e.g.:

srun  -A #Allocation_AWARD_ID --pty  /bin/bash   -l 

Using tmux

Using tmux on the Sabine/Opuntia clusters allows you to create interactive allocations that you can detach from. Normally, if you get an interactive allocation (e.g. srun --pty) and then disconnect from the cluster, for example by putting your laptop to sleep, your allocation will be terminated and your job killed. Using tmux, you can detach gracefully and tmux will maintain your allocation. Here is how to do this correctly (a minimal command sequence is sketched after the checklist below):

  1. ssh to Sabine/Opuntia.
  2. Start tmux.
  3. Inside your tmux session, submit an interactive job with srun.
  4. Inside your job allocation (on a compute node), start your application (e.g. matlab).
  5. Detach from tmux by typing Ctrl+b then d.
  6. Later, on the same login node, reattach by running tmux attach.

Make sure to:

  • run tmux on the login node, NOT on compute nodes
  • run srun inside tmux, not the reverse.
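
For example, a typical session might look like the following; the session name "mysession", the requested time, and the application are illustrative:

ssh -l your_username sabine.cacds.uh.edu                       # connect to a login node
tmux new -s mysession                                          # start a named tmux session on the login node
srun -A #Allocation_AWARD_ID -t 1:00:00 --pty /bin/bash -l     # inside tmux: request an interactive job
matlab                                                         # on the compute node: start your application
# detach with Ctrl+b then d; later, back on the same login node:
tmux attach -t mysession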

X11 Forwarding

X11 forwarding is necessary to display editor windows (gvim, emacs, nedit, etc.) or similar on your desktop. To enable X11 forwarding, log in with the ssh -X or -Y options enabled:

ssh -XY -l your_username  sabine.cacds.uh.edu
ssh -XY -l your_username opuntia.cacds.uh.edu
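
To check that forwarding works, you can start a simple X client from the login node, assuming one is installed (xclock is used here purely as an example):

xclock &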

Windows users need an X server to handle the local display in addition to the ssh program; see this intro (from Indiana University) for PuTTY users.

Transferring Data

Basic Tools

SCP (Secure CoPy): scp uses ssh for data transfer, with the same authentication and the same security as ssh. For example, copying a file from a local system to Sabine or Opuntia:

scp myfile  your_username@sabine.cacds.uh.edu:
scp myfile your_username@opuntia.cacds.uh.edu:
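
To copy in the other direction (from the cluster back to your local system), reverse the arguments; the remote file name here is just an illustration:

scp your_username@sabine.cacds.uh.edu:myfile .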

SFTP (Secure File Transfer Protocol): sftp is a file transfer program, similar to ftp, which performs all operations over an encrypted ssh transport. For example, to put a file from a local system onto Sabine (this also works for Opuntia):

sftp your_username@sabine.cacds.uh.edu 
Password: 
Connected to sabine.cacds.uh.edu 

sftp> put myfile

For Windows users, WinSCP is a free graphical SCP and SFTP client.

Software environment

Text editors

Sabine/Opuntia have multiple editors installed including vi and nano.

Modules

Modules are a tool for managing the Unix environment on Sabine/Opuntia; they are designed to simplify login scripts. A single user command,

module add module_name

can be invoked to source the appropriate environment information within the user’s current shell. Invoking the command,

module avail

will list the available packages on Sabine/Opuntia. Finally,

module rm module_name

will remove the module from your environment.
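
Two other subcommands are often handy; these assume the standard Environment Modules/Lmod tooling found on most clusters:

module list    # show the modules currently loaded in your shell
module purge   # unload all currently loaded modules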

Running jobs

The Concept

A "job" refers to a program running on the compute nodes of the Sabine/Opuntia clusters. Jobs can be run on Sabine/Opuntia in two different ways:

  • A batch job allows you to submit a script that tells the cluster how to run your program. Your program can run for long periods of time in the background, so you don't need to be connected to Sabine/Opuntia. The output of your program is continuously written to an output file that you can view both during and after your program runs.
  • An interactive job allows you to interact with a program by typing input, using a GUI, etc. But if your connection is interrupted, the job will abort. These are best for small, short-running jobs where you need to test out a program, or where you need to use the program's GUI.

The Code

The following shows how to run an example of a parallel program (using MPI) on Sabine/Opuntia. MPI programs are executed as one or more processes; one process is typically assigned to one physical processor core. All the processes run the exact same program, but by receiving different input they can be made to do different tasks. The most common way to differentiate the processes is by their rank. Together with the total number of processes, referred to as the size, the rank forms the basic means of dividing tasks between the processes. Getting the rank of a process and the total number of processes is therefore the goal of this example. Furthermore, all MPI-related calls must be issued between MPI_Init() and MPI_Finalize(). Regular C instructions that are to be run locally by each process, e.g. some preprocessing that is the same for all processes, can be run outside the MPI context.

Below is a simple program that, when executed, will make each process print their name and rank as well as the total number of processes.

/*  Basic MPI Example - Hello World  */
#include <stdio.h>  /* printf and BUFSIZ defined there */
#include <stdlib.h> /* exit defined there */
#include "mpi.h"    /* all MPI-2 functions defined there */

int main(int argc, char *argv[])
{
    int rank, size, length;
    char name[BUFSIZ];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &length);

    printf("%s: hello world from process %d of %d\n", name, rank, size);

    MPI_Finalize();

    exit(0);
}
    • MPI_Init() is responsible for spawning processes and setting up the communication between them. The default communicator (collection of processes), MPI_COMM_WORLD, is created.
    • MPI_Finalize() ends the MPI program.
    • MPI_Comm_rank(MPI_COMM_WORLD, &rank) returns the rank of the calling process within the communicator. The rank is used to divide tasks among the processes: the process with rank 0 might get some special task, while the rank of each process might correspond to distinct columns in a matrix, effectively partitioning the matrix between the processes.
    • MPI_Comm_size(MPI_COMM_WORLD, &size) returns the total number of processes within the communicator. This is useful, e.g., to know how many columns of a matrix each process will be assigned.
    • MPI_Get_processor_name(name, &length) is more of a curiosity than a necessity in most programs; it can assure us that our MPI program is indeed running on more than one computer/node.

 

Compile & Run

Save the code in a file named helloworld.c. Load the Intel compiler and Intel MPI module files:

module load intel

Compile the program with the following command:

mpicc -o helloworld helloworld.c

Make a batch job by adding the following to a file named job.sh:

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 2 -n 10
#SBATCH -A #Allocation_AWARD_ID

module load intel
mpirun ./helloworld

 

Submit the job to the queue.

sbatch job.sh
Submitted batch job 906

Note that the sbatch command returns the job ID. This example runs quickly, so it may already have finished before squeue shows it in the queue. The job ID, together with the job name, is used to name the output file. The job name is given with the -J option in the job.sh script; in this example it is ‘my_mpi_job’. The standard output from the processes is written to a file in the working directory whose name follows the -o my_mpi_job.o%j directive, here my_mpi_job.o906. Here is the content from one batch execution of job.sh:

$ cat my_mpi_job.o906 

compute-2-13.local: hello world from process 9 of 10 
compute-2-12.local: hello world from process 1 of 10 
compute-2-12.local: hello world from process 3 of 10 
compute-2-12.local: hello world from process 5 of 10 
compute-2-12.local: hello world from process 6 of 10 
compute-2-12.local: hello world from process 7 of 10 
compute-2-12.local: hello world from process 8 of 10 
compute-2-12.local: hello world from process 0 of 10 
compute-2-12.local: hello world from process 2 of 10 
compute-2-12.local: hello world from process 4 of 10

Note that standard error from the processes is merged into the same output file unless you request a separate error file with the -e option. If the processes execute without faults, no errors are logged.
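
If you want standard error kept in its own file, request it explicitly; a minimal sketch (the file names are illustrative):

#SBATCH -o my_mpi_job.o%j    # standard output
#SBATCH -e my_mpi_job.e%j    # standard error, written to a separate file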

Batch Jobs

Below are more examples of batch jobs requesting specific resources.

Users can check the status of a job with the squeue command:

$ squeue -j <JOB_ID>
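
To list all of your own jobs rather than a single job ID, you can filter by user name instead:

$ squeue -u your_username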

Single Whole node

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 1 -n 28  
#SBATCH -A #Allocation_AWARD_ID

module load intel  
mpirun ./helloworld

Multiple Whole nodes

This example uses 4 nodes and 28 tasks or cores per node.

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 4 --tasks-per-node=28  
#SBATCH -A #Allocation_AWARD_ID

module load intel  
mpirun ./helloworld

Single core job utilizing 1 GPU (if you need only a single CPU core and one GPU)

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 1 -n 1
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH -A #Allocation_AWARD_ID
 
module load cuda intel
nvidia-smi  
mpirun ./helloworld

Single node job utilizing 1 GPU (if you need only one GPU but multiple CPUs from the same node)

#!/bin/bash 
#SBATCH -J my_mpi_job 
#SBATCH -o my_mpi_job.o%j 
#SBATCH -t 00:01:00 
#SBATCH -N 1 -n 16
#SBATCH -A #Allocation_AWARD_ID
#SBATCH -p gpu
#SBATCH --gres=gpu:1  

module load cuda intel
nvidia-smi  
mpirun ./helloworld

Single node utilizing 2 GPUs (if you need two GPUs, along with multiple CPUs, all from one node)

#!/bin/bash
#SBATCH -J my_mpi_job
#SBATCH -o my_mpi_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 28
#SBATCH -A #Allocation_AWARD_ID
#SBATCH -p gpu
#SBATCH --gres=gpu:2

module load cuda intel
nvidia-smi
mpirun ./helloworld

Multiple Whole nodes job, 2 GPUs per node (only on Sabine). This example uses 4 nodes and 28 tasks or cores per node.

#!/bin/bash
#SBATCH -J my_mpi_job
#SBATCH -o my_mpi_job.o%j
#SBATCH -A #Allocation_AWARD_ID
#SBATCH -t 00:01:00
#SBATCH -N 4 --tasks-per-node=28
#SBATCH -p gpu
#SBATCH --gres=gpu:2

module load cuda intel

nvidia-smi
mpirun ./helloworld

Interactive Jobs

To open an interactive session on a compute node, use the following:
 srun -A  #Allocation_AWARD_ID --pty /bin/bash -l
Same as above, but requesting 1 hour of wall time and X11 forwarding support
 srun -A  #Allocation_AWARD_ID  -t 1:00:00 --x11=first --pty /bin/bash -l
Same as above, but requesting 28 cores or a full node on Sabine
 srun  -A  #Allocation_AWARD_ID  -t 1:00:00 -n 28 -N 1 --pty /bin/bash -l

Same as above, but requesting 20 cores or a full node on Opuntia
 srun  -A  #Allocation_AWARD_ID  -t 1:00:00 -n 20 -N 1 --pty /bin/bash -l

Requesting 20 cores or a full node and 1 GPU on Opuntia
 srun  -A  #Allocation_AWARD_ID  -t 1:00:00 -n 20 -p gpu --gres=gpu:1 -N 1 --pty /bin/bash -l
Requesting 28 cores or a full node and 1 GPU on Sabine
 srun  -A  #Allocation_AWARD_ID  -t 1:00:00 -n 28 -p gpu --gres=gpu:1 -N 1 --pty /bin/bash -l
Requesting 28 cores or a full node and 2 GPUs on Sabine
 srun  -A  #Allocation_AWARD_ID  -t 1:00:00 -n 28 -p gpu --gres=gpu:2 -N 1 --pty /bin/bash -l

Same as above, but requesting 4 nodes and 20 cores per node (on Opuntia)

 srun  -A  #Allocation_AWARD_ID  -t 1:00:00 --tasks-per-node=20 -N 4 --pty /bin/bash -l

Same as above, but requesting 4 nodes and 28 cores per node (on Sabine)

 srun  -A  #Allocation_AWARD_ID  -t 1:00:00 --tasks-per-node=28 -N 4 --pty /bin/bash -l
Requesting 28 cores per node, 2 GPUs per node, and 4 nodes (on Sabine)
 srun   -A  #Allocation_AWARD_ID  -t 1:00:00 --tasks-per-node=28 -p gpu --gres=gpu:2 -N 4 --pty /bin/bash -l

TensorFlow Jobs

TensorFlow is available within the Anaconda3 or Anaconda2 packages. The installed versions take advantage of GPUs.

Note that the Python examples used here can be found in

/project/cacds/apps/anaconda3/5.0.1/TensorFlow-Examples/examples/ 
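
To run one of these examples from your own directory, you can copy it there first (this assumes convolutional_network.py sits directly in that examples directory):

cp /project/cacds/apps/anaconda3/5.0.1/TensorFlow-Examples/examples/convolutional_network.py .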

Batch Job Examples

Single core job utilizing 1 GPU (if you need only a single CPU core and one GPU)

#!/bin/bash 
#SBATCH -J tensorflow_job
#SBATCH -o tensorflow_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 1
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH -A #Allocation_AWARD_ID
#SBATCH --mem=32GB

module load Anaconda3
python convolutional_network.py
 

Single node dual-core job utilizing 2 GPUs and 2 CPUs (works only on Sabine)

#!/bin/bash 
#SBATCH -J tensorflow_job
#SBATCH -o tensorflow_job.o%j
#SBATCH -t 00:01:00
#SBATCH -N 1 -n 2
#SBATCH -p gpu
#SBATCH --gres=gpu:2
#SBATCH -A #Allocation_AWARD_ID
#SBATCH --mem=64GB

module load Anaconda3
python convolutional_network.py

GROMACS  Jobs

GROMACS is available as a module on the Sabine and Opuntia clusters. The installed versions can also take advantage of GPUs.

Batch GROMACS Jobs

Below are more examples of batch jobs requesting specific resources (the module names match the ones installed on Sabine; please adjust for Opuntia). Note that -maxh is set to 4 hours to match the requested walltime.

Single Whole node

#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00
#SBATCH -N 1 -n 28
#SBATCH -A #Allocation_AWARD_ID

module add GROMACS/2016.5-psxe-2018-GPU-enabled
mpirun mdrun_mpi -ntomp 1 -v -pin on -deffnm dhfr_$SLURM_JOB_ID -s dhfr -maxh 4.0

Single Whole GPU node

#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00 --exclusive
#SBATCH -N 1 -n 4 -p gpu --gres=gpu:tesla:2
#SBATCH -A #Allocation_AWARD_ID

module add GROMACS/2016.5-psxe-2018-GPU-enabled
mpirun mdrun_mpi_gpu -ntomp 7 -v -pin on -deffnm dhfr_$SLURM_JOB_ID -s dhfr -maxh 4.0

Multiple Whole nodes

#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00
#SBATCH -N 2 -n 56
#SBATCH -A #Allocation_AWARD_ID

module add GROMACS/2016.5-psxe-2018-GPU-enabled
mpirun mdrun_mpi -ntomp 1 -v -pin on -deffnm dhfr_$SLURM_JOB_ID -s dhfr -maxh 4.0

Multiple Whole GPU nodes

#!/bin/bash
#SBATCH -J my_sim_job
#SBATCH -o my_sim_job.o%j
#SBATCH -t 04:00:00 --exclusive
#SBATCH -N 2 -n 8 -p gpu --gres=gpu:tesla:2
#SBATCH -A #Allocation_AWARD_ID

module add GROMACS/2016.5-psxe-2018-GPU-enabled
mpirun mdrun_mpi_gpu -ntomp 7 -v -pin on -deffnm dhfr_$SLURM_JOB_ID -s dhfr -maxh 4.0