
Using WhisperX for audio processing

PHPer · 2025-08-21

ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.


I saw that the code calls CUDA, so I matched the install command to the CUDA version on my machine, 12.9:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu129

That led to all sorts of problems; the error output and code are below. After searching online, I found that

WhisperX needs a specific CUDA build of PyTorch, with matching torch, torchvision, and torchaudio versions:

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
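A side note on the `cuXYZ` tags: they name the CUDA toolkit the wheels were built against, not the exact driver version you must have. NVIDIA drivers can run runtimes built against older toolkits, which is why a machine whose driver reports CUDA 12.9 can still run the cu121 wheels above. A small helper (my own, purely for illustration) to decode the tag:

```python
def cuda_version_from_wheel_tag(tag: str) -> str:
    """Map a PyTorch wheel index tag like 'cu121' to the CUDA toolkit
    version it was built against ('12.1'). The digits after 'cu' are
    the major version followed by a single minor-version digit."""
    digits = tag.removeprefix("cu")
    return f"{digits[:-1]}.{digits[-1]}"

print(cuda_version_from_wheel_tag("cu121"))  # 12.1 — what this post installs
print(cuda_version_from_wheel_tag("cu129"))  # 12.9 — what I tried first
```

So the cu121 build was compatible with my driver all along; the point of pinning torch 2.5.1 is that it is a combination WhisperX's dependencies are known to work with.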

CUDA environment setup guide for WhisperX on Windows 11

(speaker-splitter) D:\python\speaker-splitter>
(speaker-splitter) D:\python\speaker-splitter>python w.py
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\ctranslate2\__init__.py:8: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\pyannote\audio\core\io.py:212: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
  torchaudio.list_audio_backends()
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\speechbrain\utils\torch_audio_backend.py:57: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
  available_backends = torchaudio.list_audio_backends()
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
config.json: 2.80kB [00:00, 2.80MB/s]
vocabulary.txt: 460kB [00:00, 1.96MB/s]
tokenizer.json: 2.20MB [00:00, 5.96MB/s]
model.bin: 100%|██████████████████████████████████████████████████████████████████| 3.09G/3.09G [04:09<00:00, 12.4MB/s]
Traceback (most recent call last):
  File "D:\python\speaker-splitter\w.py", line 16, in <module>
  File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\whisperx\__init__.py", line 21, in load_model
    return asr.load_model(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\whisperx\asr.py", line 336, in load_model
    model = model or WhisperModel(whisper_arch,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\faster_whisper\transcribe.py", line 663, in __init__
    self.model = ctranslate2.models.Whisper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.


The code is as follows:

import whisperx
import gc
import os
import warnings

warnings.filterwarnings("ignore", category=UserWarning)
os.environ["HUGGINGFACE_HUB_CACHE"] = r"D:\python\speaker-splitter\models"  # optional; does not affect core functionality
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # mirror for mainland China
os.environ["HUGGINGFACE_HUB_TIMEOUT"] = "60"
os.environ["HUGGINGFACE_HUB_DOWNLOAD_TIMEOUT"] = "60"

device = "cuda"
audio_file = "input.wav"
batch_size = 16  # reduce (e.g. to 8) if low on GPU memory
compute_type = "float16"  # change to "int8" if low on GPU memory (may reduce accuracy)

# 1. Transcribe with original whisper (batched)
model = whisperx.load_model("large-v2", device, compute_type=compute_type)

# save model to local path (optional)
# model_dir = "/path/"
# model = whisperx.load_model("large-v2", device, compute_type=compute_type, download_root=model_dir)

audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"])  # before alignment

# delete model if low on GPU resources
# import torch; gc.collect(); torch.cuda.empty_cache(); del model

# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
print(result["segments"])  # after alignment

# delete model if low on GPU resources
# gc.collect(); torch.cuda.empty_cache(); del model_a

# 3. Assign speaker labels (replace YOUR_HF_TOKEN with your Hugging Face token)
diarize_model = whisperx.diarize.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)
diarize_segments = diarize_model(audio)
# add min/max number of speakers if known:
# diarize_segments = diarize_model(audio, min_speakers=min_speakers, max_speakers=max_speakers)
result = whisperx.assign_word_speakers(diarize_segments, result)
print(diarize_segments)
print(result["segments"])  # segments are now assigned speaker IDs
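After `assign_word_speakers`, each segment dict carries `start`, `end`, `text`, and a `speaker` key. A small helper — my own sketch, not part of whisperx — to render those segments as a readable transcript:

```python
def format_segments(segments):
    """Render whisperx segments as '[HH:MM:SS] speaker: text' lines.

    The 'speaker' key is only present after diarization, so we fall
    back to a placeholder when it is missing.
    """
    lines = []
    for seg in segments:
        t = int(seg["start"])
        stamp = f"{t // 3600:02d}:{t % 3600 // 60:02d}:{t % 60:02d}"
        speaker = seg.get("speaker", "UNKNOWN")
        lines.append(f"[{stamp}] {speaker}: {seg['text'].strip()}")
    return lines
```

Usage: `print("\n".join(format_segments(result["segments"])))` at the end of the script gives one labeled line per segment instead of the raw dict dump.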



Updated: 2025-08-21 03:43:06