Whisper|Quick Start

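As an AI novice, I recently needed to transcribe English and Spanish audio recordings, so I chose OpenAI's open-source Whisper model. This post records the process of getting it running.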

Hardware used for the local deployment:

References

The software environment used for the deployment:

- OS: Windows 11
- PyTorch 2.2.2, CUDA 11.6
- Python 3.9
- FFmpeg
- Git

Download the latest FFmpeg release and add it to the `PATH` environment variable.
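To confirm that the PATH change took effect, you can look the executable up from Python. This is a minimal standard-library sketch; `check_ffmpeg` is a hypothetical helper name, not part of Whisper. Whisper calls the `ffmpeg` executable to decode audio, so it needs to be discoverable this way. Terminals opened before the PATH edit keep the old value, so test in a new one.

```python
import shutil
import subprocess
from typing import Optional


def check_ffmpeg() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH.

    Hypothetical helper for checking the PATH setup; not part of Whisper.
    """
    # shutil.which honours PATHEXT on Windows, so ffmpeg.exe is found as "ffmpeg".
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    # Ask the binary for its version banner and capture it instead of printing.
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    banner = check_ffmpeg()
    print(banner if banner is not None else "ffmpeg not found on PATH")
```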

Choose the PyTorch version that matches your Python version (see Previous PyTorch Versions | PyTorch).
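Before and after installing PyTorch, it can help to print the versions that have to line up. The sketch below is a hypothetical helper (`environment_report` is not a library function); it imports torch lazily and catches `OSError` as well as `ImportError`, since a failed DLL load on Windows surfaces as an `OSError` rather than a missing-module error.

```python
import importlib
import platform
import sys


def environment_report() -> dict:
    """Collect Python/PyTorch/CUDA details relevant to version matching.

    Hypothetical helper. torch is imported lazily so the report still works
    before PyTorch is installed or when importing it fails.
    """
    report = {
        "python": platform.python_version(),
        "python_ok": sys.version_info >= (3, 8),  # this setup assumes Python 3.8+
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
        "torch_error": None,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:  # not installed, or a DLL failed to load
        report["torch_error"] = str(exc)
        return report
    report["torch"] = torch.__version__
    # CUDA version the wheel was built against (None for CPU-only builds).
    report["torch_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```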

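The latest release, 2.3, failed for reasons I could not pin down. I had already confirmed that my Python version met the 3.8+ requirement (3.9), but running Whisper still produced this error: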

`Error loading "\lib\site-packages\torch\lib\shm.dll" or one of its dependencies`

After uninstalling PyTorch 2.3 with pip and reinstalling 2.2.2, Whisper ran successfully.
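If you pin the version like this, a small check can confirm which release is actually installed. This sketch reads the version from package metadata via `importlib.metadata` instead of importing torch, so it still works when `import torch` itself fails with the DLL error. The "known good" 2.2 series is an assumption taken only from this post's experience on one Windows machine, not an official compatibility rule, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: mirrors this post's observation only (2.3 failed on this
# Windows setup, 2.2.2 worked). Not an official compatibility rule.
KNOWN_GOOD = (2, 2)


def major_minor(ver: str) -> tuple:
    """Parse a version like '2.2.2+cu118' into (2, 2); the '+cu118' suffix is ignored."""
    core = ver.split("+", 1)[0]
    parts = core.split(".")
    return int(parts[0]), int(parts[1])


def torch_matches_known_good() -> bool:
    """True if the installed torch is in the 2.2 series; False if missing or different."""
    try:
        installed = version("torch")  # reads metadata, does not import torch
    except PackageNotFoundError:
        return False
    return major_minor(installed) == KNOWN_GOOD


if __name__ == "__main__":
    print("torch 2.2.x installed:", torch_matches_known_good())
```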

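Then install Whisper from its GitHub repository. The second command is only needed later, to update an existing install to the latest commit:

```shell
pip install git+https://github.com/openai/whisper.git
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
```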
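To check that the install worked, you can ask the package which model names it accepts. This sketch relies on `whisper.available_models()`, which exists in openai-whisper; the wrapper name `whisper_model_names` is hypothetical. It uses `getattr` defensively because the PyPI package named plain `whisper` is an unrelated project (Graphite's storage library) that would import under the same name.

```python
import importlib.util


def whisper_model_names():
    """Return Whisper's built-in model names, or None if openai-whisper is unavailable.

    Hypothetical helper. Importing whisper also imports torch, so a broken
    PyTorch install will surface here as well.
    """
    if importlib.util.find_spec("whisper") is None:
        return None  # nothing importable as "whisper"
    import whisper

    # Guard against the unrelated PyPI package that is also named "whisper".
    available = getattr(whisper, "available_models", None)
    if available is None:
        return None
    return available()


if __name__ == "__main__":
    print(whisper_model_names())
```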

Whisper offers the following models:

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x
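The table can be read as a simple rule: pick the largest model whose VRAM requirement fits your GPU. The sketch below encodes the table's approximate figures; `pick_model` is a hypothetical helper, and since the VRAM numbers are rough, leaving some headroom is wise.

```python
from typing import Optional

# (name, approximate VRAM in GB) from the table above, smallest first.
MODELS = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(vram_gb: float, english_only: bool = False) -> Optional[str]:
    """Return the largest model whose approximate VRAM need fits in vram_gb.

    Hypothetical helper. English-only (".en") variants exist for every size
    except large, so large is returned unchanged even when english_only=True.
    """
    best = None
    for name, need in MODELS:
        if need <= vram_gb:
            best = name  # list is ordered, so the last fit is the largest
    if best is None:
        return None
    if english_only and best != "large":
        return best + ".en"
    return best


if __name__ == "__main__":
    print(pick_model(6))
```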

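The default is `small`. Since I had enough VRAM, I ran the `medium` model; in my tests, transcribing about 15 minutes of English audio took roughly 2 to 4 minutes.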

```shell
whisper audio.mp3 --model medium
```
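The CLI command above also has a Python-API counterpart, which is handy for batch jobs or for timing runs. This is a sketch assuming openai-whisper is installed; `whisper.load_model` and `model.transcribe` are the package's real entry points, while the wrapper `transcribe` and `realtime_speedup` are hypothetical helpers. Passing `language=None` lets Whisper auto-detect the language; passing `"es"` or `"en"` skips detection, which may help with the Spanish recordings.

```python
import time


def realtime_speedup(audio_seconds: float, elapsed_seconds: float) -> float:
    """How many times faster than real time a transcription ran."""
    return audio_seconds / elapsed_seconds


def transcribe(path: str, model_name: str = "medium", language=None):
    """Rough Python equivalent of `whisper <path> --model <model_name>`.

    Returns (text, elapsed_seconds). The timer covers transcription only,
    not model loading. language=None means auto-detect.
    """
    import whisper  # openai-whisper; imported lazily

    model = whisper.load_model(model_name)  # downloads weights on first use
    start = time.perf_counter()
    result = model.transcribe(path, language=language)
    elapsed = time.perf_counter() - start
    return result["text"], elapsed


if __name__ == "__main__":
    text, secs = transcribe("audio.mp3")
    print(text)
    print(f"transcription took {secs:.1f} s")
```

For scale, the measurement above (15 minutes of audio in 2 to 4 minutes) works out to `realtime_speedup(900, 240)` = 3.75 up to `realtime_speedup(900, 120)` = 7.5, i.e. roughly 4x to 7.5x faster than real time with `medium`.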

#Whisper #OpenAI


http://example.com/2024/05/09/Whisper-Quick-Start/

Author: Noctis64

Published: May 9, 2024
