WhisperX usage: audio processing
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
I saw that the code calls CUDA, so I matched the CUDA version on my machine (12.9) and found this install command:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu129
That produced all sorts of problems; the error output and the code are below. Searching online afterwards, I found that WhisperX needs a specific CUDA build of PyTorch, with matching torchvision and torchaudio versions:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
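After installing the pinned build, it is worth a quick sanity check that PyTorch actually sees the GPU and was built against the expected CUDA/cuDNN (a minimal sketch; the exact version strings depend on your install):

import torch

# Verify the CUDA build of PyTorch before running WhisperX.
print(torch.__version__)               # e.g. 2.5.1+cu121
print(torch.version.cuda)              # CUDA version the wheel was built against
print(torch.cuda.is_available())       # must be True for device="cuda"
print(torch.backends.cudnn.version())  # cuDNN version bundled with the wheel

If is_available() returns False, or the versions don't match the cu121 wheel, the float16 error above is likely to reappear.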
CUDA Environment Setup Guide for WhisperX on Windows 11
(speaker-splitter) D:\python\speaker-splitter>python w.py
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\ctranslate2\__init__.py:8: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\pyannote\audio\core\io.py:212: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
torchaudio.list_audio_backends()
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\speechbrain\utils\torch_audio_backend.py:57: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
available_backends = torchaudio.list_audio_backends()
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
config.json: 2.80kB [00:00, 2.80MB/s]
vocabulary.txt: 460kB [00:00, 1.96MB/s]
tokenizer.json: 2.20MB [00:00, 5.96MB/s]
model.bin: 100%|██████████████████████████████████████████████████████████████████| 3.09G/3.09G [04:09<00:00, 12.4MB/s]
Traceback (most recent call last):
File "D:\python\speaker-splitter\w.py", line 16, in
File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\whisperx\__init__.py", line 21, in load_model
return asr.load_model(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\whisperx\asr.py", line 336, in load_model
model = model or WhisperModel(whisper_arch,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\faster_whisper\transcribe.py", line 663, in __init__
self.model = ctranslate2.models.Whisper(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
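The ValueError is raised by CTranslate2, the inference backend behind faster-whisper: the float16 kernels it wants are not usable on the device as currently set up. Besides pinning the PyTorch CUDA build, you can ask CTranslate2 directly what the device supports and fall back automatically (a minimal sketch):

import ctranslate2

# Query the compute types the CUDA device actually supports,
# then fall back from float16 to int8 when necessary.
supported = ctranslate2.get_supported_compute_types("cuda")
compute_type = "float16" if "float16" in supported else "int8"
print(supported, "->", compute_type)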
The code in w.py is as follows:
import whisperx
import gc
import os
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
os.environ["HUGGINGFACE_HUB_CACHE"] = r"D:\python\speaker-splitter\models" # 可选,不影响核心功能
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com" # 国内镜像
os.environ["HUGGINGFACE_HUB_TIMEOUT"] = "60"
os.environ["HUGGINGFACE_HUB_DOWNLOAD_TIMEOUT"] = "60"
device = "cuda"
audio_file = "input.wav"
batch_size = 16 # reduce if low on GPU mem (e.g. change 16 to 8)
compute_type = "float16" # change to "int8" if low on GPU mem (may reduce accuracy)
# 1. Transcribe with original whisper (batched)
model = whisperx.load_model("large-v2", device, compute_type=compute_type)
# save model to local path (optional)
# model_dir = "/path/"
# model = whisperx.load_model("large-v2", device, compute_type=compute_type, download_root=model_dir)
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"]) # before alignment
# delete model if low on GPU resources
# import gc; import torch; gc.collect(); torch.cuda.empty_cache(); del model
# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
print(result["segments"]) # after alignment
# delete model if low on GPU resources
# import gc; import torch; gc.collect(); torch.cuda.empty_cache(); del model_a
# 3. Assign speaker labels
diarize_model = whisperx.diarize.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)  # replace YOUR_HF_TOKEN with your Hugging Face access token
# add min/max number of speakers if known
diarize_segments = diarize_model(audio)
# diarize_model(audio, min_speakers=min_speakers, max_speakers=max_speakers)
result = whisperx.assign_word_speakers(diarize_segments, result)
print(diarize_segments)
print(result["segments"]) # segments are now assigned speaker IDs