Run a Hugging Face model
Here we provide an example of how to run a Hugging Face large language model (LLM) on the NYU Greene cluster.
Prepare environment
Create project directory
After logging on to a Greene login node, make a directory for this project:
[NetID@log-1 ~]$ mkdir -p /scratch/NetID/llm_example
[NetID@log-1 ~]$ cd /scratch/NetID/llm_example
You'll need to replace NetID above with your own NetID.
Move to a compute node
Some of the following steps can require significant resources, so we'll move to a compute node. This way we won't overload the login node we're on.
[NetID@log-1 llm_example]$ srun --cpus-per-task=2 --mem=10GB --time=04:00:00 --pty /bin/bash
Copy appropriate overlay file to the project directory
[NetID@cm001 llm_example]$ cp -rp /scratch/work/public/overlay-fs-ext3/overlay-50G-10M.ext3.gz .
[NetID@cm001 llm_example]$ gunzip overlay-50G-10M.ext3.gz
Launch Singularity container in read/write mode
[NetID@cm001 llm_example]$ singularity exec --overlay overlay-50G-10M.ext3:rw /scratch/work/public/singularity/cuda12.1.1-cudnn8.9.0-devel-ubuntu22.04.2.sif /bin/bash
Install Miniforge in the container
Singularity> wget --no-check-certificate https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
Singularity> bash Miniforge3-Linux-x86_64.sh -b -p /ext3/miniforge3
Create environment script
Use an editor like nano or vim to create the file /ext3/env.sh. The contents should be:
#!/bin/bash
unset -f which
source /ext3/miniforge3/etc/profile.d/conda.sh
export PATH=/ext3/miniforge3/bin:$PATH
export PYTHONPATH=/ext3/miniforge3/bin:$PYTHONPATH
Activate the environment
Singularity> source /ext3/env.sh
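If the environment is active, python should now resolve to the Miniforge installation inside the overlay:
Singularity> which python
/ext3/miniforge3/bin/python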
Install packages in environment
Singularity> conda config --remove channels defaults
Singularity> conda update -n base conda -y
Singularity> conda clean --all --yes
Singularity> conda install pip -y
Singularity> pip install torch numpy transformers
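Optionally, you can verify the installation before exiting; this is a quick sanity check that the packages import and report their versions:
Singularity> python -c "import torch; import transformers; print(torch.__version__, transformers.__version__)"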
Exit from Singularity and the compute node
Singularity> exit
[NetID@cm001 llm_example]$ exit
You can find more information about using Singularity and Conda on our HPC systems in our documentation page, Singularity with Conda.
Prepare script
Create a Python script using the code in steps 1-9 below and save it in a file called huggingface.py (the complete assembled script is shown after the steps):
1. Import the necessary modules:
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel
2. Create a list of reviews:
texts = ["How do I get a replacement Medicare card?",
         "What is the monthly premium for Medicare Part B?",
         "How do I terminate my Medicare Part B (medical insurance)?",
         "How do I sign up for Medicare?",
         "Can I sign up for Medicare Part B if I am working and have health insurance through an employer?",
         "How do I sign up for Medicare Part B if I already have Part A?"]
3. Choose a model name from the Hugging Face model hub and instantiate the tokenizer and model objects for it. We set output_hidden_states to True because we want the output of the model to include not only the loss but also the embeddings for the sentences:
model_name = 'cardiffnlp/twitter-roberta-base-sentiment'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
4. Create the ids to be fed to the model using the tokenizer object. We set return_tensors to "pt" because we want the ids returned as PyTorch tensors:
ids = tokenizer(texts, padding=True, return_tensors="pt")
5. Set the device to cuda if a GPU is available, and move the model and the input ids to that device. Since we are only extracting embeddings, we will only perform a forward pass, so we put the model in evaluation mode using eval():
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
ids = ids.to(device)
model.eval()
6. Perform the forward pass and store the output in out:
with torch.no_grad():
    out = model(**ids)
7. Extract the embeddings of each review from the last layer:
last_hidden_states = out.last_hidden_state
8. For the purpose of classification, extract the CLS token, which is the first embedding in the embedding list for each review:
sentence_embedding = last_hidden_states[:, 0, :]
9. Check the shape of the final sentence embeddings for all the reviews. The output should look like torch.Size([6, 768]), where 6 is the batch size (we input 6 reviews in step 2) and 768 is the embedding size of the RoBERTa model used:
print("Shape of the batch embedding: {}".format(sentence_embedding.shape))
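Putting it all together, here is the complete huggingface.py assembled from steps 1-9 above:
# huggingface.py -- assembled from steps 1-9
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel

# Reviews to embed
texts = ["How do I get a replacement Medicare card?",
         "What is the monthly premium for Medicare Part B?",
         "How do I terminate my Medicare Part B (medical insurance)?",
         "How do I sign up for Medicare?",
         "Can I sign up for Medicare Part B if I am working and have health insurance through an employer?",
         "How do I sign up for Medicare Part B if I already have Part A?"]

# Load the tokenizer and model, keeping hidden states in the output
model_name = 'cardiffnlp/twitter-roberta-base-sentiment'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

# Tokenize the reviews into PyTorch tensors
ids = tokenizer(texts, padding=True, return_tensors="pt")

# Use the GPU if one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
ids = ids.to(device)
model.eval()

# Forward pass only; no gradients needed for embedding extraction
with torch.no_grad():
    out = model(**ids)

# CLS token embedding of each review from the last layer
last_hidden_states = out.last_hidden_state
sentence_embedding = last_hidden_states[:, 0, :]
print("Shape of the batch embedding: {}".format(sentence_embedding.shape))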
Prepare sbatch file
After saving the above code in a script called huggingface.py, create a file called run.SBATCH with the following code:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --mem=64GB
#SBATCH --gres=gpu
#SBATCH --job-name=huggingface
#SBATCH --output=huggingface.out
module purge
if [ -e /dev/nvidia0 ]; then nv="--nv"; fi
singularity exec $nv \
--overlay /scratch/NetID/llm_example/overlay-50G-10M.ext3:rw \
/scratch/work/public/singularity/cuda12.1.1-cudnn8.9.0-devel-ubuntu22.04.2.sif \
/bin/bash -c "source /ext3/env.sh; python /scratch/NetID/llm_example/huggingface.py"
You'll need to change NetID in the script above to your NetID. If you're using a different directory name and/or path, you'll also need to update that in the script. The --nv flag enables NVIDIA GPU support inside the container; the if statement sets it only when a GPU is present on the node.
Run the run.SBATCH file
[NetID@log-1 llm_example]$ sbatch run.SBATCH
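While the job is queued or running, you can check its status with squeue (replace NetID with your NetID):
[NetID@log-1 llm_example]$ squeue -u NetID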
Once the job finishes, the output can be found in huggingface.out. It should look something like:
Some weights of RobertaModel were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment and are
newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Shape of the batch embedding: torch.Size([6, 768])
Acknowledgements
Instructions were developed and provided by Laiba Mehnaz, a member of AIfSR.