WhisperX usage: audio processing
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
I saw that the code calls CUDA, so I matched the CUDA version on my machine (12.9) and found this install command:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu129
That produced all sorts of problems; the error output and the code are below. Searching online afterwards, I found that WhisperX needs a specific CUDA build of PyTorch, with matching torchvision and torchaudio versions:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
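After installing the pinned build, it is worth a quick sanity check that PyTorch actually sees the GPU and was built against the expected CUDA/cuDNN (a minimal sketch; the exact version strings depend on your install):

import torch

# Verify the CUDA build of PyTorch before running WhisperX.
print(torch.__version__)               # e.g. 2.5.1+cu121
print(torch.version.cuda)              # CUDA version the wheel was built against
print(torch.cuda.is_available())       # must be True for device="cuda"
print(torch.backends.cudnn.version())  # cuDNN version bundled with the wheel

If is_available() returns False, or the versions don't match the cu121 wheel, the float16 error above is likely to reappear.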
CUDA Environment Setup Guide for WhisperX on Windows 11
(speaker-splitter) D:\python\speaker-splitter>python w.py
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\ctranslate2\__init__.py:8: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\pyannote\audio\core\io.py:212: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
torchaudio.list_audio_backends()
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\speechbrain\utils\torch_audio_backend.py:57: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
available_backends = torchaudio.list_audio_backends()
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
config.json: 2.80kB [00:00, 2.80MB/s]
vocabulary.txt: 460kB [00:00, 1.96MB/s]
tokenizer.json: 2.20MB [00:00, 5.96MB/s]
model.bin: 100%|██████████████████████████████████████████████████████████████████| 3.09G/3.09G [04:09<00:00, 12.4MB/s]
Traceback (most recent call last):
File "D:\python\speaker-splitter\w.py", line 16, in
File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\whisperx\__init__.py", line 21, in load_model
return asr.load_model(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\whisperx\asr.py", line 336, in load_model
model = model or WhisperModel(whisper_arch,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\faster_whisper\transcribe.py", line 663, in __init__
self.model = ctranslate2.models.Whisper(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
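The ValueError is raised by CTranslate2, the inference backend behind faster-whisper: the float16 kernels it wants are not usable on the device as currently set up. Besides pinning the PyTorch CUDA build, you can ask CTranslate2 directly what the device supports and fall back automatically (a minimal sketch):

import ctranslate2

# Query the compute types the CUDA device actually supports,
# then fall back from float16 to int8 when necessary.
supported = ctranslate2.get_supported_compute_types("cuda")
compute_type = "float16" if "float16" in supported else "int8"
print(supported, "->", compute_type)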
The code in w.py is as follows:
import whisperx
import gc
import os
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
os.environ["HUGGINGFACE_HUB_CACHE"] = r"D:\python\speaker-splitter\models" # 可选,不影响核心功能
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com" # 国内镜像
os.environ["HUGGINGFACE_HUB_TIMEOUT"] = "60"
os.environ["HUGGINGFACE_HUB_DOWNLOAD_TIMEOUT"] = "60"
device = "cuda"
audio_file = "input.wav"
batch_size = 16 # reduce if low on GPU mem (e.g. change 16 to 8)
compute_type = "float16" # change to "int8" if low on GPU mem (may reduce accuracy)
# 1. Transcribe with original whisper (batched)
model = whisperx.load_model("large-v2", device, compute_type=compute_type)
# save model to local path (optional)
# model_dir = "/path/"
# model = whisperx.load_model("large-v2", device, compute_type=compute_type, download_root=model_dir)
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"]) # before alignment
# delete model if low on GPU resources
# import gc; import torch; gc.collect(); torch.cuda.empty_cache(); del model
# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
print(result["segments"]) # after alignment
# delete model if low on GPU resources
# import gc; import torch; gc.collect(); torch.cuda.empty_cache(); del model_a
# 3. Assign speaker labels
diarize_model = whisperx.diarize.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)  # replace YOUR_HF_TOKEN with your Hugging Face access token
# add min/max number of speakers if known
diarize_segments = diarize_model(audio)
# diarize_model(audio, min_speakers=min_speakers, max_speakers=max_speakers)
result = whisperx.assign_word_speakers(diarize_segments, result)
print(diarize_segments)
print(result["segments"]) # segments are now assigned speaker IDs