HPC Storage
- High Risk data, such as data containing Personally Identifiable Information (PII), electronic Protected Health Information (ePHI), or Controlled Unclassified Information (CUI), should NOT be stored in the HPC environment. We recommend using the Secure Research Data Environments (SRDE) instead.
- The Office of Sponsored Projects (OSP) & Global Office of Information Security (GOIS) are exclusively empowered to classify the risk categories for a dataset as listed in the NYU Electronic Data and System Risk Classification Policy.
The HPC environment provides access to the file-systems listed below to serve your needs for managing research data during all stages of the research data life cycle. Reviewing the list of available file-systems and their intended uses can help you select the right file-system for your tasks. Please note that there are strict limits on the size and number of files you are allowed to keep on each file-system. To find out your current disk space and inode quota utilization, refer to the section on understanding user quota limits.
User Home Directories
You have access to a home directory at /home/$USER (accessible via the environment variable $HOME) for permanently storing code and important configuration files. Home directories provide limited storage space (50 GB) and a limited number of inodes (30,000 files). You can check your quota utilization using the myquota command as described here. Home directories are backed up daily and old files under $HOME are not purged. Home directories are available on every cluster node (login nodes, compute nodes) and the Data Transfer Node (gDTN).
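For example, the following sketch first runs myquota and then uses standard shell tools to find which directories hold the most files; the exact myquota output format may differ from what you see on the cluster:
# report disk space and inode quota utilization on the HPC file-systems
myquota

# count files per top-level directory under $HOME to locate inode hot spots
for d in "$HOME"/*/; do
    printf '%s: ' "$d"
    find "$d" -type f | wc -l
done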
Avoid changing file and directory permissions in your home directory to give other users access to your files.
User Home Directories are not ideal for sharing files and folders with other users. HPC Scratch or Research Project Space (RPS) are better file-systems for sharing data.
Inode limits: One of the common issues that users report regarding their home directories is running out of inodes (i.e., the number of files stored under their home directory exceeds the inode limit, which by default is set to 30,000 files).
- To find out the current space and inode quota utilization and the distribution of files under your home directory, please see: Understanding user quota limits and the myquota command.
- Working with conda environments: To avoid running out of inodes in your home directory, the HPC team recommends setting up conda environments inside Singularity overlay images, as described here (see the sketch after this list). Avoid creating conda environments in your $HOME directory.
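A minimal sketch of the overlay-based conda setup is shown below. The overlay image and Singularity image paths are illustrative only; check the linked documentation for the current locations and recommended images on Torch:
# copy a pre-built empty overlay image to scratch and unzip it (path is illustrative)
cp /scratch/work/public/overlay-fs-ext3/overlay-15GB-500K.ext3.gz $SCRATCH/
gunzip $SCRATCH/overlay-15GB-500K.ext3.gz

# open a shell in a Singularity container with the overlay mounted read-write
singularity exec --overlay $SCRATCH/overlay-15GB-500K.ext3:rw \
    /scratch/work/public/singularity/ubuntu-22.04.sif /bin/bash

# inside the container, install Miniconda into the overlay (/ext3) so that the
# many files a conda environment creates do not count against the $HOME quota
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /ext3/miniconda3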
HPC Scratch
The HPC scratch is an all-flash (VAST) file-system where you can store research data needed during the analysis phase of your research projects. It provides temporary storage for datasets needed for running jobs. Your scratch directory (/scratch/$USER) has a quota of 5 TB of disk space and 5,000,000 inodes (files). The scratch file-system is available on all nodes (compute, login, etc.) on Torch as well as on the Data Transfer Node (gDTN). There are no backups of this file-system, and files that are deleted accidentally or removed due to storage system failures cannot be recovered.
- Files on the /scratch file-system that have not been accessed for 60 or more days will be purged.
- It is a policy violation to use scripts to change the file access time. Any user found to be violating this policy will have their HPC account locked. A second violation may result in your access to HPC being revoked.
- There are no backups of the HPC Scratch file-system, and you should not put important source code, scripts, libraries, or executables in /scratch. These files should instead be stored in file-systems that are backed up, such as /home or Research Project Space (RPS). Code can also be stored in a git repository.
- Upon the completion of your research study, you are encouraged to archive your data in the HPC Archive file-system.
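To see which of your files are approaching the purge threshold, you can list everything under $SCRATCH that has not been accessed in 60 or more days using standard find options, for example:
# list files under $SCRATCH not accessed for 60+ days (candidates for purging)
find $SCRATCH -type f -atime +60 -printf '%AY-%Am-%Ad  %p\n'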
HPC Research Project Space
The HPC Research Project Space (RPS) provides data storage for research projects that is easily shared amongst collaborators, backed up, and not subject to the old-file purging policy. HPC RPS was introduced to ease data management in the HPC environment and to eliminate the need to frequently copy files between the Scratch and Archive file-systems by keeping all project files in one area. These benefits come at a cost, which is determined by the allocated disk space and the number of files (inodes). For detailed information about RPS see: HPC Research Project Space
HPC Work
The HPC team makes available a number of public datasets that are commonly used in analysis jobs. The datasets are available read-only under /scratch/work/public. For some datasets, users must provide a signed usage agreement before access is granted. Public datasets available on the HPC clusters can be viewed on the Datasets page.
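For example, you can browse the available datasets directly from any node:
# list the read-only public datasets provided by the HPC team
ls /scratch/work/public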
HPC Archive
Once the Analysis stage of the research data life cycle has completed, you should compress your data before moving it onto the archive (/archive/$USER). For instance, you can use the tar command to combine all your data into a single tar (or compressed tar.gz) file. The HPC Archive file-system is not accessible by running jobs; it is suitable for long-term data storage. Each user has a default disk quota of 2 TB and is limited to 20,000 inodes (files). The rather low limit on the number of inodes per user is intentional. The archive file-system is available only on the login nodes of Torch and is backed up daily.
Here is an example tar command that combines the data in a directory named my_run_dir under $SCRATCH and outputs the tar file in the user's $ARCHIVE:
# to archive `$SCRATCH/my_run_dir`
tar cvf $ARCHIVE/simulation_01.tar -C $SCRATCH my_run_dir
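To bring the archived data back to scratch later, the same tar file can be extracted in the other direction, for example:
# to restore `simulation_01.tar` into `$SCRATCH`
tar xvf $ARCHIVE/simulation_01.tar -C $SCRATCH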
NYU (Google) Drive
Google Drive (NYU Drive) is accessible from the NYU HPC environment and provides an option to users who wish to archive data or share data with external collaborators who do not have access to the NYU HPC environment.
As of December 2023, storage limits were applied to all faculty, staff, and student NYU Google accounts. Please see Google Workspace Storage for details.
There are also limits on the data transfer rate to and from Google Drive, so moving many small files to Google Drive individually is inefficient; bundle them into a single archive first, as sketched below. Please read the instructions on how to use cloud storage within the NYU HPC Environment.
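For illustration, the sketch below bundles a directory of small files into a single archive before uploading it with rclone; the directory name and the rclone remote name nyu_gdrive are assumptions, and the remote must first be configured (for example with rclone config) as described in the linked instructions:
# bundle many small files into one archive to avoid per-file transfer overhead
tar czvf $SCRATCH/results.tar.gz -C $SCRATCH my_results_dir

# upload the single archive to Google Drive (remote name `nyu_gdrive` is assumed)
rclone copy $SCRATCH/results.tar.gz nyu_gdrive:hpc-uploads/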
HPC Storage Comparison Table
| Space | Environment Variable | Purpose | Backed Up / Flushed | Quota Disk Space / # of Files |
|---|---|---|---|---|
| /home | $HOME | Personal user home space that is best for small files | YES / NO | 50 GB / 30 K |
| /scratch | $SCRATCH | Best for large files | NO / Files not accessed for 60 days | 5 TB / 5 M |
| /archive | $ARCHIVE | Long-term storage | YES / NO | 2 TB / 20 K |
| HPC Research Project Space | NA | Shared disk space for research projects | YES / NO | Payment-based (TB-year / inode-year) |
Please see the next page for best practices for data management on NYU HPC systems.