Notes on PyTorch
A record of my process of learning PyTorch.
Configuring PyTorch
Setting up and Configuring CUDA, CUDNN and PYTorch for Python Machine Learning.: explains CUDA and cuDNN, with detailed installation steps.
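Why torch.cuda.is_available() returns False even after installing pytorch with cuda?: explains the output of nvidia-smi. For example, the CUDA Version it reports is the highest CUDA version the installed driver supports, not necessarily the version of the installed toolkit.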
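Two frequent causes of torch.cuda.is_available() returning False are a CPU-only PyTorch build (torch.version.cuda is then None) and a build compiled for a CUDA version newer than the driver supports. The following is a minimal diagnostic sketch that gathers the relevant facts in one place. The function name cuda_report is my own and is not a PyTorch API.

```python
import torch


def cuda_report() -> dict:
    """Collect the facts that usually explain why CUDA is not available."""
    return {
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
        # cuDNN version bundled with the build, or None if cuDNN is unavailable
        "cudnn_version": torch.backends.cudnn.version(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Compare built_with_cuda against the "CUDA Version" in the nvidia-smi header. If the build's version is higher, the driver is the likely culprit.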
Getting your NVIDIA Virtual GPU Software Version: "The NVIDIA Virtual GPU Manager version appears in the first line of text after the date, immediately after the text NVIDIA-SMI".
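The header line of nvidia-smi carries three versions side by side: the NVIDIA-SMI version, the driver version and the maximum supported CUDA version. The sketch below extracts all three with a regular expression. It assumes the common "NVIDIA-SMI x  Driver Version: y  CUDA Version: z" layout, and the sample line is illustrative rather than output captured from these machines.

```python
import re

# Matches the nvidia-smi header, e.g.
# "| NVIDIA-SMI 515.65.01    Driver Version: 516.94       CUDA Version: 11.7     |"
_HEADER = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(text: str) -> dict:
    """Return the nvidia-smi, driver, and maximum supported CUDA versions."""
    match = _HEADER.search(text)
    if match is None:
        raise ValueError("no nvidia-smi header found")
    return match.groupdict()


sample = "| NVIDIA-SMI 515.65.01    Driver Version: 516.94       CUDA Version: 11.7     |"
print(parse_smi_header(sample))  # {'smi': '515.65.01', 'driver': '516.94', 'cuda': '11.7'}
```

Under WSL2 the NVIDIA-SMI and driver numbers can differ from each other. Reading them separately makes that visible.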
Windows
Installing CUDA
Confirm that the GPU supports CUDA
Install the CUDA Toolkit
Check the installed versions:
nvcc -V
nvidia-smi
Installing PyTorch
The official installation command may fail to download:
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
To avoid slow PyTorch downloads, download the matching package directly from Anaconda Cloud and install it locally:
conda install --use-local pytorch-1.7.1-py3.8_cuda110_cudnn8_0.tar.bz2
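Conda archive filenames follow the name-version-build pattern. The build string therefore tells you which Python (and here, which CUDA/cuDNN) a downloaded package targets. The sketch below checks a file against the running interpreter before you install it with --use-local. The helper names are hypothetical.

```python
import sys
from typing import NamedTuple


class CondaPackage(NamedTuple):
    name: str
    version: str
    build: str


def parse_package_filename(filename: str) -> CondaPackage:
    """Split a conda archive name of the form name-version-build.tar.bz2."""
    for suffix in (".tar.bz2", ".conda"):
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        raise ValueError(f"not a conda archive: {filename}")
    # Package names may themselves contain '-', so split from the right
    name, version, build = stem.rsplit("-", 2)
    return CondaPackage(name, version, build)


def matches_current_python(pkg: CondaPackage) -> bool:
    """True if the build string's first field equals this interpreter's 'pyX.Y' tag."""
    tag = f"py{sys.version_info.major}.{sys.version_info.minor}"
    return pkg.build.split("_")[0] == tag


pkg = parse_package_filename("pytorch-1.7.1-py3.8_cuda110_cudnn8_0.tar.bz2")
print(pkg)
print("matches this Python:", matches_current_python(pkg))
```

Comparing the whole "pyX.Y" field, rather than using a prefix check, keeps py3.1 from matching py3.10.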
Check the installed version:
import torch
print(torch.__version__)
Ubuntu on Windows
First Try
Spring semester 2022: setting up the school's RTX 3090.
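The CUDA Toolkit 11.7 Downloads page gives the detailed installation commands. After installing, nvcc -V reported that the command was not found and suggested installing it with sudo apt install nvidia-cuda-toolkit. Once that was done, nvcc turned out to be version 10.1, so I looked for a way to switch versions.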
The command sudo update-alternatives --display cuda, taken from How to change CUDA version, shows exactly where the installed 11.7 version lives.
Following Multiple CUDA versions on machine nvcc -V confusion, open the file with sudo vim ~/.bashrc and append the following three lines at the end:
export CUDA_HOME="/usr/local/cuda-11.7"
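Setting CUDA_HOME does not by itself change which nvcc the shell runs. nvcc -V executes the first nvcc found on PATH. The stdlib-only sketch below checks whether the two agree. It assumes the Linux layout $CUDA_HOME/bin/nvcc, and check_cuda_home is a hypothetical helper name.

```python
import os
import shutil
from pathlib import Path


def check_cuda_home() -> dict:
    """Compare the nvcc under $CUDA_HOME with the nvcc that PATH resolves first."""
    cuda_home = os.environ.get("CUDA_HOME")
    expected = Path(cuda_home, "bin", "nvcc") if cuda_home else None
    on_path = shutil.which("nvcc")  # None if nvcc is not on PATH at all
    return {
        "CUDA_HOME": cuda_home,
        "expected_nvcc": str(expected) if expected else None,
        "expected_exists": bool(expected and expected.exists()),
        "nvcc_on_path": on_path,
        # Resolve symlinks such as /usr/local/cuda -> /usr/local/cuda-11.7 before comparing
        "consistent": bool(
            expected and on_path and Path(on_path).resolve() == expected.resolve()
        ),
    }


if __name__ == "__main__":
    for key, value in check_cuda_home().items():
        print(f"{key}: {value}")
```

If consistent is False while expected_exists is True, PATH is still picking up a different toolkit (such as the 10.1 one from apt).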
PyTorch installed for CUDA 11.3 could not detect 11.7. So I first removed CUDA with sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*" (see How to remove cuda completely from ubuntu?).
Something then went wrong and I had to reinstall from scratch, but sudo apt-get -y install cuda failed with "you have held broken packages". Following Problem while installing cuda toolkit in ubuntu 18.04, the commands are:
sudo apt clean
sudo apt autoremove
sudo apt install -y cuda
WSL2- $nvidia-smi command not running
The problem was that nvidia-smi failed with:
Failed to initialize NVML: GPU access blocked by the operating system
I tried rellik's reply. nvidia-smi.exe printed normal output, but that output presumably comes from the Windows host system.
Solved: the cause was that Windows had not been updated. See Why does nvidia-smi return "GPU access blocked by the operating system" in WSL2 under Windows 10 21H2 [closed].
PyTorch not recognizing GPU on WSL - installed cudnn and cuda #73487
- To install a package with pip into a conda environment: /anaconda/envs/venv_name/bin/pip install package_name (see Using Pip to install packages to Anaconda Environment).
- To inspect the torch environment: python -m torch.utils.collect_env (see Quick Tips #1: How to obtain environment information using PyTorch).
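Running pip through a specific interpreter (python -m pip) installs into that interpreter's environment. This has the same effect as calling the environment's bin/pip by its full path. The sketch below builds both commands from sys.executable, so they always target the environment the script runs in. The helper names are hypothetical.

```python
import sys


def pip_install_command(*packages: str) -> list:
    """Build a pip command bound to the running interpreter's environment."""
    if not packages:
        raise ValueError("no packages given")
    return [sys.executable, "-m", "pip", "install", *packages]


def collect_env_command() -> list:
    """Equivalent of `python -m torch.utils.collect_env` for this interpreter."""
    return [sys.executable, "-m", "torch.utils.collect_env"]


if __name__ == "__main__":
    cmd = pip_install_command("package_name")
    # Pass cmd to subprocess.run(cmd, check=True) to actually install
    print("would run:", " ".join(cmd))
    print("env info:", " ".join(collect_env_command()))
```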
Trouble installing torch with CUDA using conda on WSL2 suggests installing PyTorch 1.8.1+cu111. Installing it in a newly created conda environment worked, and the environment output at that point was:
Collecting environment information...
Second Try
Summer vacation 2022: setting up my laptop's RTX 2070-Max.
Using PyTorch with CUDA on WSL2 - Christian Mills: points out the current issues with using the GPU under WSL2.
Enable NVIDIA CUDA on WSL 2 - Windows - Microsoft Docs: covers the overall process.
CUDA on WSL User Guide - NVIDIA Documentation Center: the official guide, covering the main steps.
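Concretely: on Windows, follow 2.1. Step 1: Install NVIDIA Driver for GPU Support. Inside WSL2, follow 3. CUDA Support for WSL 2. Then run a few checks. None of these commands work before the WSL2 installation, even if CUDA was previously installed on Windows: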
(base) carlos@LAPTOP-00000000:~/Downloads$ nvcc --version
(base) carlos@LAPTOP-00000000:~/Downloads$ nvidia-smi
nvcc --version printed nothing. Following Nvcc --version returns nothing despite correct install, open the file with vim ~/.bashrc and append at the end:
export CUDA_HOME=/usr/local/cuda-11.7
Then reload it with source ~/.bashrc and check again:
(base) carlos@LAPTOP-00000000:~/Downloads$ nvcc -V
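nvcc -V prints a line of the form "Cuda compilation tools, release X.Y, VX.Y.Z". The sketch below parses that release and compares it with the version embedded in a CUDA_HOME path such as /usr/local/cuda-11.7. It assumes that naming scheme for the install directory. The sample line is illustrative, not captured from this laptop.

```python
import re

_RELEASE = re.compile(r"release (\d+\.\d+), V[\d.]+")
_CUDA_DIR = re.compile(r"cuda-(\d+\.\d+)")


def nvcc_release(output: str) -> str:
    """Return the 'X.Y' release from `nvcc -V` output."""
    match = _RELEASE.search(output)
    if match is None:
        raise ValueError("no release line in nvcc output")
    return match.group(1)


def matches_cuda_home(output: str, cuda_home: str) -> bool:
    """True if nvcc's release equals the version in a path like /usr/local/cuda-11.7."""
    match = _CUDA_DIR.search(cuda_home)
    return match is not None and match.group(1) == nvcc_release(output)


sample = "Cuda compilation tools, release 11.7, V11.7.99"
print(nvcc_release(sample))                               # 11.7
print(matches_cuda_home(sample, "/usr/local/cuda-11.7"))  # True
print(matches_cuda_home(sample, "/usr/local/cuda-10.1"))  # False
```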
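First switch to a mirror:
- Anaconda 镜像使用帮助 - 清华大学开源软件镜像站 (Anaconda mirror help, Tsinghua University Open Source Software Mirror)
- Pypi 镜像使用帮助 - 清华大学开源软件镜像站 (PyPI mirror help, Tsinghua University Open Source Software Mirror)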
Install with conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch.
Check the environment with python -m torch.utils.collect_env:
Collecting environment information...
Then install packages with conda install -c conda-forge jupyterlab and pip install pettingzoo, the same way as on Windows.
Fix First Try
Autumn 2022: back on the 3090, trying to move the environment to the latest CUDA and PyTorch versions.
Nivida-smi shows different mesages under wsl and windows
Using the GPU
The following function prints GPU information and returns the device (see How to check if pytorch is using the GPU?):
def get_device():
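A helper along those lines might look like the sketch below. This is my own illustration, not the original function body. It uses torch.cuda.current_device and torch.cuda.get_device_name to print which GPU is in use, and it falls back to the CPU.

```python
import torch


def get_device() -> torch.device:
    """Print basic GPU information and return the device to use (sketch)."""
    if torch.cuda.is_available():
        device = torch.device("cuda")
        index = torch.cuda.current_device()
        print(f"Using GPU {index}: {torch.cuda.get_device_name(index)}")
        print(f"Visible GPUs: {torch.cuda.device_count()}")
    else:
        device = torch.device("cpu")
        print("CUDA not available, using CPU")
    return device


device = get_device()
```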
Call .to(device) on both the model and its inputs (see Porting PyTorch code from CPU to GPU).
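A minimal sketch of the pattern, using a toy nn.Linear model as a stand-in for a real network. Note the asymmetry. Module.to moves parameters in place and returns the module, while Tensor.to returns the moved tensor and leaves the original untouched, so the result must be reassigned.

```python
import torch
from torch import nn

# Pick the GPU when present, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)     # moves parameters in place, returns the module
inputs = torch.randn(8, 4).to(device)  # returns the moved tensor; reassign it

outputs = model(inputs)
print(outputs.device)  # same device as the model and inputs
```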
For multiple GPUs, see How to use multiple GPUs in pytorch?.
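The simplest option is nn.DataParallel, which replicates the module on every visible GPU and splits each batch along dimension 0. The PyTorch documentation recommends DistributedDataParallel for serious multi-GPU training, so treat this as a quick sketch. The model and batch sizes are arbitrary placeholders.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(16, 4)

if torch.cuda.device_count() > 1:
    # One replica per visible GPU; each receives a slice of the batch
    model = nn.DataParallel(model)

model = model.to(device)
batch = torch.randn(32, 16, device=device)
out = model(batch)  # outputs are gathered back onto the first device
print(out.shape)    # torch.Size([32, 4])
```

On a machine with zero or one GPU the wrapper is skipped, so the same script runs unchanged everywhere.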
Use watch -n 2 nvidia-smi to monitor all GPUs, refreshing every 2 seconds (see How to check if pytorch is using the GPU?).
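For scripting rather than watching, nvidia-smi can print selected fields as CSV with --query-gpu and --format=csv,noheader,nounits. The sketch below parses that output. The sample line only mimics the CSV shape, and its numbers are made up. Fields that a GPU does not report are mapped to None.

```python
import csv
import io
import shutil
import subprocess

# Fields understood by `nvidia-smi --query-gpu`; see `nvidia-smi --help-query-gpu`
QUERY = "index,name,utilization.gpu,memory.used,memory.total"


def _to_int(value: str):
    """Return an int, or None for non-numeric placeholders such as '[N/A]'."""
    return int(value) if value.isdigit() else None


def parse_gpu_csv(text: str) -> list:
    """Parse output of `nvidia-smi --query-gpu=QUERY --format=csv,noheader,nounits`."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        index, name, util, used, total = (f.strip() for f in fields)
        rows.append({
            "index": int(index),
            "name": name,
            "util_percent": _to_int(util),
            "mem_used_mib": _to_int(used),  # 'nounits' drops the MiB suffix
            "mem_total_mib": _to_int(total),
        })
    return rows


def query_gpus() -> list:
    """Take one snapshot via nvidia-smi; returns [] if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


sample = "0, NVIDIA GeForce RTX 3090, 37, 10240, 24576\n"
print(parse_gpu_csv(sample))
```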
Installing Multiple CUDA Versions
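Installing CUDA failed with the error "you already have a newer version of the nvidia frameview sdk installed". After I uninstalled the following software, in this order, the installation could continue: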
- PhysX
- NVIDIA GeForce Experience
- NVIDIA FrameView SDK
See Windows 10 CUDA installation failure solved and CUDA installation problem.
In-Place Operation
>>> x = torch.rand(1)
>>> y = torch.rand(1)
>>> x
tensor([0.2738])
>>> id(x)
140736259305336
>>> x = x + y    # Normal operation
>>> id(x)
140726604827672  # New location
>>> x += y
>>> id(x)
140726604827672  # Existing location used (in-place)
– Deepali Patel, What is in-place operation?
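id() identifies the Python object rather than the tensor's memory. Tensor.data_ptr() returns the address of the tensor's first element, which makes the in-place versus out-of-place difference explicit. The sketch below also shows the underscore-suffixed in-place method add_.

```python
import torch

x = torch.rand(1)
y = torch.rand(1)

ptr_before = x.data_ptr()
x = x + y   # out-of-place: allocates a new tensor, then rebinds x
ptr_after_add = x.data_ptr()

x += y      # in-place: writes into x's existing storage
ptr_after_iadd = x.data_ptr()

x.add_(y)   # trailing underscore marks an in-place method
ptr_after_add_ = x.data_ptr()

print(ptr_before != ptr_after_add)       # True: new memory
print(ptr_after_add == ptr_after_iadd)   # True: memory reused
print(ptr_after_iadd == ptr_after_add_)  # True
```

In-place updates save an allocation, but autograd raises an error if an in-place op overwrites a value it still needs for the backward pass.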
Which is faster, .expand().clone() or .repeat()?
"Keep in mind though that if you plan on changing this expanded tensor inplace, you will need to use .clone() on it before so that it actually is a full tensor (with memory for each element). But even .expand().clone() should be faster than .repeat() I think." – albanD, Torch.repeat and torch.expand which to use?
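The difference albanD describes is visible in the strides. expand returns a view whose new dimension has stride 0, so every column points at the same memory. repeat allocates a full copy. Cloning the expanded view gives an independent tensor with the same values. The sketch makes no timing claims.

```python
import torch

x = torch.tensor([[1], [2], [3]])  # shape (3, 1)

expanded = x.expand(-1, 4)       # view: no copy, stride 0 along the expanded dim
repeated = x.repeat(1, 4)        # copies data into a new (3, 4) tensor
materialized = expanded.clone()  # full tensor with memory for every element

print(expanded.stride())                    # (1, 0)
print(expanded.data_ptr() == x.data_ptr())  # True: shares x's storage
print(torch.equal(expanded, repeated))      # True: same values

materialized[0, 0] = 100  # safe: the clone owns its memory, x is unchanged
```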
.unsqueeze(dim=1).expand(-1, 2).clone().view(-1) or .repeat_interleave(2)?
a = torch.arange(3)  # tensor([0, 1, 2])
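Both expressions repeat each element in place, which differs from plain .repeat, which tiles the whole tensor. The clone() matters because the expanded view is not contiguous. Cloning it produces a contiguous tensor, so view(-1) is allowed.

```python
import torch

a = torch.arange(3)  # tensor([0, 1, 2])

via_expand = a.unsqueeze(dim=1).expand(-1, 2).clone().view(-1)
via_interleave = a.repeat_interleave(2)
via_repeat = a.repeat(2)  # tiles the whole tensor instead

print(via_expand)      # tensor([0, 0, 1, 1, 2, 2])
print(via_interleave)  # tensor([0, 0, 1, 1, 2, 2])
print(via_repeat)      # tensor([0, 1, 2, 0, 1, 2])
```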