4. Execution on the ThetaGPU supercomputer (within a Jupyter notebook)

In this tutorial we are going to learn how to run an interactive Jupyter notebook on the ThetaGPU supercomputer at the ALCF using Ray. ThetaGPU is a 3.9-petaflops system built from NVIDIA DGX A100 nodes.

After logging in to Theta:

  1. From thetaloginX, start an interactive job, replacing $PROJECT_NAME and $QUEUE_NAME with your own values (available queues include full-node and single-gpu). The thetagpuXX node you are placed on will vary:

(thetalogin5) $ qsub -I -A $PROJECT_NAME -n 1 -q $QUEUE_NAME -t 60
Job routed to queue "full-node".
Wait for job 10003623 to start...
Opening interactive session to thetagpu21
  2. Wait for the interactive session to start. Then, from the ThetaGPU compute node (thetagpuXX), execute the following commands to initialize your DeepHyper environment (adapt to your needs):

$ . /etc/profile
$ module load conda/2021-09-22
$ conda activate $CONDA_ENV_PATH
  3. Then, start the Jupyter notebook server:

$ jupyter notebook &

Note

On a multi-GPU node, the Jupyter notebook process may lock one of the available GPUs. To avoid this, launch the notebook with the following command instead:

$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 jupyter notebook &
  4. Take note of the hostname of the current compute node (e.g. thetagpuXX):

$ echo $HOSTNAME
  5. Leave the interactive session running and open a new terminal window on your local machine.

  6. In the new terminal window, execute the following SSH command to forward the local port to the ThetaGPU compute node, after replacing $USERNAME with your username and thetagpuXX with the compute node noted earlier:

$ ssh -tt -L 8888:localhost:8888 $USERNAME@theta.alcf.anl.gov "ssh -L 8888:localhost:8888 thetagpuXX"
  7. Open the Jupyter URL (http://localhost:8888/?token=….) in a local browser. This URL was printed in the compute-node terminal when the Jupyter notebook server started.
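
The SSH command above hard-codes the port and node name in several places, which makes it easy to mistype. The following is a minimal sketch of a helper that assembles the same two-hop tunnel command from a username, a compute node, and a port. The function name build_tunnel_cmd and the example values alice and thetagpu21 are hypothetical and only serve as illustration; the helper prints the command rather than running it. The optional port argument is an assumption for the case where the Jupyter URL printed by the server shows a port other than 8888.

```shell
#!/bin/sh
# Sketch only: build_tunnel_cmd is a hypothetical helper, not part of
# DeepHyper or any ALCF tooling. It prints the command; it does not run it.

build_tunnel_cmd() {
    tunnel_user="$1"          # your ALCF username
    tunnel_node="$2"          # compute node reported by `echo $HOSTNAME`
    tunnel_port="${3:-8888}"  # port shown in the Jupyter URL (defaults to 8888)

    # First hop: local machine -> Theta login node.
    # Second hop (quoted part): login node -> ThetaGPU compute node.
    printf 'ssh -tt -L %s:localhost:%s %s@theta.alcf.anl.gov "ssh -L %s:localhost:%s %s"\n' \
        "$tunnel_port" "$tunnel_port" "$tunnel_user" \
        "$tunnel_port" "$tunnel_port" "$tunnel_node"
}

# Example with placeholder values; replace them with your own.
build_tunnel_cmd "alice" "thetagpu21" 8888
```

Copy the printed line into the terminal on your local machine to open the tunnel. If you pass a different port, use the same port in the browser URL.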