A Step-by-Step Guide to install scGPT
Docker container update
I made a docker container of scgpt and now it is super easy to setup! Simply run the following instructions:
docker pull xueerchen/scgpt:0.1.7
docker run --gpus all --rm -it xueerchen/scgpt:0.1.7 bash
What is scGPT?
In the rapidly evolving field of single-cell omics data analysis, a new player, scGPT, has emerged, showing promising capabilities in managing and interpreting complex single-cell data. As detailed in a recent preprint, scGPT is a generative pretrained transformer model that has been pretrained on over 33 million human cells to learn representations of cells and genes. Once pretrained, scGPT can be fine-tuned on smaller datasets to perform diverse downstream tasks such as cell type annotation, perturbation response prediction, batch correction, multi-omic data integration, and gene regulatory network inference. Impressively, scGPT has shown state-of-the-art performance on these tasks compared to previous methods.
However, to harness the full potential of scGPT, it’s crucial to understand how to install it correctly. While the GitHub repository simply suggests using pip install scgpt
for installation, the reality could be a bit more nuanced. You might encounter various issues during the process that the official documentation doesn’t cover. Therefore, in this guide, we aim to provide a comprehensive walkthrough of the installation process, preparing you for any potential hiccups you might face. Our goal is to ensure that you can fully leverage this powerful tool without being stalled by technical roadblocks.
Why Mamba?
Before diving into the installation steps, it’s essential to understand why we’re using Mamba, a fast and reliable package manager, as opposed to pip:
- Mamba uses a parallel solver, which makes it faster than pip and conda.
- It manages environments and package installations more efficiently, minimizing the chances of package conflicts.
- Lastly but most importantly, mamba can resolve dependencies quickly and accurately, especially when installing complex packages like
cuda-toolkit
.
Installing scGPT: A Step-by-Step Guide
Step 0: GPU required
Keep in mind that using scGPT requires an NVIDIA GPU in your computer, along with an up-to-date GPU driver. For reference, my installation and testing were performed with Driver Version 515.65.01
. To check driver version, simply open a terminal and type nvidia-smi
and driver version can be found on the top of the output table.
Step 1: Install Mamba
Follow the guidelines provided in the official Mamba documentation for a fresh and hassle-free installation.
Step 2: Create a New Mamba Environment
Once Mamba is installed, initiate a new environment specifically for scGPT:
mamba create -n scgpt python=3.10
Step 3: Activate the Environment
Activate the new environment with the following command:
mamba activate scgpt
Step 4: Attempt to Install scGPT
Start the scGPT installation process using pip:
pip install scgpt
Step 5: Install PyTorch 1.13
You might encounter an error indicating that PyTorch is not installed. As stated in scGPT’s to-do list, the package does not yet support PyTorch 2.0. Therefore, we need to install PyTorch 1.13:
pip install torch==1.13
Step 6: Attempt to Install scGPT Again
With PyTorch installed, try installing scGPT again:
pip install scgpt
Step 7: Install nvcc 11.7
If the installation fails again with a build flash_attn failed
error, it indicates that nvcc
was not found. To run scGPT, you need an nvcc
version compatible with the PyTorch compiled version. But how do you determine which CUDA version your PyTorch is compiled with? It’s simple - in Python, import torch and use torch.version.cuda
to check the CUDA version.
>>> import torch
>>> print(torch.version.cuda)
11.7
If PyTorch is compiled with CUDA 11.7, then you will need nvcc
11.7. Here’s how to do it:
mamba install -y -c "nvidia/label/cuda-11.7.0" cuda-toolkit
Step 8: Attempt to Install scGPT Again
With nvcc 11.7 installed, try installing scGPT once more:
pip install scgpt
If you encounter a “No such file or directory ‘R’” error message, proceed to the next step.
Step 9: Install R
To resolve the aforementioned error, install R using the following commands:
sudo apt update
sudo apt install r-base
Step 10: Attempt to Install scGPT Again
Now, you can try installing scGPT again:
pip install scgpt
Step 11: Update GCC
Even after successful installation, you might face an ImportError related to GLIBCXX_3.4.29
when trying to import scGPT. This issue can be resolved by updating GCC:
sudo apt-get update
sudo apt-get upgrade gcc
Step 12: Import scGPT
Finally, open a Python shell and import scGPT:
>>> import scgpt
If you see “Global seed set to 0”, it means scGPT has been successfully installed.
By following this guide, you should have successfully navigated through the common installation issues and are now ready to start exploring your single-cell omics data with the power of scGPT!