[Repost] Using WhisperX for audio processing
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
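This ValueError comes from ctranslate2, the inference engine underneath faster-whisper and WhisperX. It is raised when the device the model ends up on cannot do efficient float16 math (a CPU, or a GPU/driver stack that ctranslate2 cannot actually use). A quick diagnostic sketch, using ctranslate2's own get_supported_compute_types API, shows what the installed build really supports:

import ctranslate2

# Compute types this build supports per device, e.g. ['float32', 'int8', ...]
print(ctranslate2.get_supported_compute_types("cpu"))
print(ctranslate2.get_supported_compute_types("cuda"))  # may fail if this build has no usable CUDA

If "float16" is missing from the CUDA list, loading the model with compute_type="float16" fails exactly as above.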
I saw that the code calls CUDA, so I looked up an install command matching the CUDA version on my machine, 12.9:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu129
The result was all kinds of problems; the error output and the code are below. Searching online afterwards, I found that WhisperX needs a specific CUDA version and the matching pinned PyTorch packages:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
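After installing the pinned build, it is worth confirming that the wheel actually installed is a CUDA one and that it can see the GPU. A minimal check, assuming the install above completed:

import torch

print(torch.__version__)          # expect something like '2.5.1+cu121'; a '+cpu' suffix means a CPU-only wheel
print(torch.version.cuda)         # CUDA version the wheel was built against; None on CPU-only builds
print(torch.cuda.is_available())  # must be True for device="cuda" to work in WhisperX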
Guide: configuring the CUDA environment for WhisperX on Windows 11
(speaker-splitter) D:\python\speaker-splitter>python w.py
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\ctranslate2\__init__.py:8: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\pyannote\audio\core\io.py:212: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
  torchaudio.list_audio_backends()
G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\speechbrain\utils\torch_audio_backend.py:57: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
  available_backends = torchaudio.list_audio_backends()
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
config.json: 2.80kB [00:00, 2.80MB/s]
vocabulary.txt: 460kB [00:00, 1.96MB/s]
tokenizer.json: 2.20MB [00:00, 5.96MB/s]
model.bin: 100%|██████████████████████████████████████████████████████████████████| 3.09G/3.09G [04:09<00:00, 12.4MB/s]
Traceback (most recent call last):
  File "D:\python\speaker-splitter\w.py", line 16, in <module>
  File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\whisperx\__init__.py", line 21, in load_model
    return asr.load_model(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\whisperx\asr.py", line 336, in load_model
    model = model or WhisperModel(whisper_arch,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\ProgramData\miniconda3\envs\speaker-splitter\Lib\site-packages\faster_whisper\transcribe.py", line 663, in __init__
    self.model = ctranslate2.models.Whisper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
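Note that the 3.09 GB model downloaded fine; the failure happens only when ctranslate2 constructs the model with the requested compute type. Besides repairing the CUDA stack with the pinned install above, a defensive option is to fall back to a cheaper compute type when float16 is rejected. A sketch, not part of the original script (int8 may reduce accuracy):

import whisperx

device = "cuda"
try:
    model = whisperx.load_model("large-v2", device, compute_type="float16")
except ValueError:
    # float16 not supported by this device/backend; retry with int8
    model = whisperx.load_model("large-v2", device, compute_type="int8")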
The code is as follows:
import whisperx
import gc
import os
import warnings

warnings.filterwarnings("ignore", category=UserWarning)

os.environ["HUGGINGFACE_HUB_CACHE"] = r"D:\python\speaker-splitter\models"  # optional; does not affect core functionality
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # Hugging Face mirror (for mainland China)
os.environ["HUGGINGFACE_HUB_TIMEOUT"] = "60"
os.environ["HUGGINGFACE_HUB_DOWNLOAD_TIMEOUT"] = "60"

device = "cuda"
audio_file = "input.wav"
batch_size = 16  # reduce if low on GPU mem (e.g., change 16 to 8)
compute_type = "float16"  # change to "int8" if low on GPU mem (may reduce accuracy)

# 1. Transcribe with original whisper (batched)
model = whisperx.load_model("large-v2", device, compute_type=compute_type)

# save model to local path (optional)
# model_dir = "/path/"
# model = whisperx.load_model("large-v2", device, compute_type=compute_type, download_root=model_dir)

audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"])  # before alignment

# delete model if low on GPU resources
# import gc; import torch; gc.collect(); torch.cuda.empty_cache(); del model

# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
print(result["segments"])  # after alignment

# delete model if low on GPU resources
# import gc; import torch; gc.collect(); torch.cuda.empty_cache(); del model_a

# 3. Assign speaker labels
# YOUR_HF_TOKEN is a placeholder: replace it with your Hugging Face access token
diarize_model = whisperx.diarize.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)

# add min/max number of speakers if known
diarize_segments = diarize_model(audio)
# diarize_model(audio, min_speakers=min_speakers, max_speakers=max_speakers)

result = whisperx.assign_word_speakers(diarize_segments, result)
print(diarize_segments)
print(result["segments"])  # segments are now assigned speaker IDs
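The commented-out cleanup hints in the script matter on smaller GPUs: each of the three stages loads its own model, so freeing the previous one before loading the next helps avoid running out of VRAM. Spelled out, this is a sketch of the same cleanup the comments describe:

import gc
import torch

# After transcription, before loading the alignment model:
del model
gc.collect()
torch.cuda.empty_cache()

# Likewise after alignment, before loading the diarization pipeline:
del model_a
gc.collect()
torch.cuda.empty_cache()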