Run a Hugging Face model

Here we provide an example of how to run a Hugging Face large language model (LLM) on the NYU Greene cluster.

Prepare environment

Create project directory

After logging on to a Greene login node, make a directory for this project:

[NetID@log-1 ~]$ mkdir -p /scratch/NetID/llm_example
[NetID@log-1 ~]$ cd /scratch/NetID/llm_example
note

You'll need to replace NetID above with your own NetID.

Move to a compute node

Some of the following steps can require significant resources, so we'll move to a compute node. This way we won't overload the login node we're on.

[NetID@log-1 llm_example]$ srun --cpus-per-task=2 --mem=10GB --time=04:00:00 --pty /bin/bash

Copy appropriate overlay file to the project directory

[NetID@cm001 llm_example]$ cp -rp /scratch/work/public/overlay-fs-ext3/overlay-50G-10M.ext3.gz .
[NetID@cm001 llm_example]$ gunzip overlay-50G-10M.ext3.gz

Launch Singularity container in read/write mode

[NetID@cm001 llm_example]$ singularity exec --overlay overlay-50G-10M.ext3:rw /scratch/work/public/singularity/cuda12.1.1-cudnn8.9.0-devel-ubuntu22.04.2.sif /bin/bash

Install Miniforge in the container

Singularity> wget --no-check-certificate https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
Singularity> bash Miniforge3-Linux-x86_64.sh -b -p /ext3/miniforge3

Create environment script

Use an editor like nano or vim to create the file /ext3/env.sh. The contents should be:

#!/bin/bash

unset -f which

source /ext3/miniforge3/etc/profile.d/conda.sh
export PATH=/ext3/miniforge3/bin:$PATH
export PYTHONPATH=/ext3/miniforge3/bin:$PYTHONPATH

Activate the environment

Singularity> source /ext3/env.sh
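
To confirm that the environment is active, you can check which python is found; it should point into the Miniforge installation from the previous step:

Singularity> which python
/ext3/miniforge3/bin/python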

Install packages in environment

Singularity> conda config --remove channels defaults
Singularity> conda update -n base conda -y
Singularity> conda clean --all --yes
Singularity> conda install pip -y
Singularity> pip install torch numpy transformers
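
Before exiting, you can optionally verify that the packages import correctly. This quick check is our suggestion rather than part of the original instructions:

Singularity> python -c "import torch; import transformers; print(torch.__version__, transformers.__version__)"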

Exit from Singularity and the compute node

Singularity> exit
[NetID@cm001 llm_example]$ exit
tip

You can find more information about using Singularity and Conda on our HPC systems in our Singularity with Conda documentation.

Prepare script

Create a Python script using the code in steps 1-9 below and save it in a file called huggingface.py (a complete listing of the script follows the steps):

  1. Import necessary modules:

    import torch
    import numpy as np
    from transformers import AutoTokenizer, AutoModel
  2. Create a list of reviews:

    texts = ["How do I get a replacement Medicare card?",
    "What is the monthly premium for Medicare Part B?",
    "How do I terminate my Medicare Part B (medical insurance)?",
    "How do I sign up for Medicare?",
    "Can I sign up for Medicare Part B if I am working and have health insurance through an employer?",
    "How do I sign up for Medicare Part B if I already have Part A?"]
  3. Choose the model name from the Hugging Face model hub and instantiate the model and tokenizer objects for the given model. We set output_hidden_states to True because we want the model's output to include not only the loss but also the embeddings for the sentences.

    model_name = 'cardiffnlp/twitter-roberta-base-sentiment'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
  4. Create the ids to be used in the model using the tokenizer object. We set return_tensors to "pt" because we want the tokenizer to return PyTorch tensors of the ids:

    ids = tokenizer(texts, padding=True, return_tensors="pt")
  5. Set the device to cuda if a GPU is available, and move the model and the input ids to that device. Since we are only extracting embeddings, we only need a forward pass of the model, so we put it in evaluation mode with eval():

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    ids = ids.to(device)
    model.eval()
  6. Perform the forward pass and store the output in out:

    with torch.no_grad():
        out = model(**ids)
  7. Extract the embeddings of each review from the last layer:

    last_hidden_states = out.last_hidden_state
  8. For the purpose of classification, we extract the CLS token, which is the first embedding in the sequence for each review:

    sentence_embedding = last_hidden_states[:, 0, :]
  9. We can check the shape of the final sentence embeddings for all the reviews. The output should look like torch.Size([6, 768]), where 6 is the batch size (we input six reviews in step 2) and 768 is the embedding size of the RoBERTa model used.

    print("Shape of the batch embedding: {}".format(sentence_embedding.shape))

Prepare Sbatch file

After saving the above code in a script called huggingface.py, create a file called run.SBATCH with the following contents:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --mem=64GB
#SBATCH --gres=gpu
#SBATCH --job-name=huggingface
#SBATCH --output=huggingface.out

module purge

# Request GPU support (--nv) only if a GPU device is present on the node
if [ -e /dev/nvidia0 ]; then nv="--nv"; fi

singularity exec $nv \
--overlay /scratch/NetID/llm_example/overlay-50G-10M.ext3:ro \
/scratch/work/public/singularity/cuda12.1.1-cudnn8.9.0-devel-ubuntu22.04.2.sif \
/bin/bash -c "source /ext3/env.sh; python /scratch/NetID/llm_example/huggingface.py"
note

You'll need to change NetID in the script above to your NetID. If you're using a different directory name and/or path, you'll also need to update that in the script. Note that the overlay is mounted read-only (:ro) here: the job only needs to read the installed environment, and a read/write mount can only be used by one process at a time.

Run the run.SBATCH file

[NetID@log-1 llm_example]$ sbatch run.SBATCH
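
While the job is queued or running, you can check its status with squeue:

[NetID@log-1 llm_example]$ squeue -u $USER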

The output can be found in huggingface.out. It should look something like this:

Some weights of RobertaModel were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment and are
newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Shape of the batch embedding: torch.Size([6, 768])

Acknowledgements

Instructions developed and provided by Laiba Mehnaz, a member of AIfSR.