
The last version of whisper (v20240930) doesn't seem to be supported ('NoneType' object has no attribute 'shape') #212

Closed
mfucci opened this issue Oct 1, 2024 · 9 comments

mfucci commented Oct 1, 2024

When I installed whisper-timestamped (from git or with pip), it crashed with:

  File "/Users/mfucci/miniconda3/lib/python3.11/site-packages/whisper_timestamped-1.15.4-py3.11.egg/whisper_timestamped/transcribe.py", line 777, in hook_attention_weights
    if w.shape[-2] > 1:
       ^^^^^^^
AttributeError: 'NoneType' object has no attribute 'shape'

I had to downgrade the whisper library to get it to work:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git@v20231117
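
For anyone who prefers pinning the published PyPI release instead of installing from git (assuming the same v20231117 release is available on PyPI, which later comments here suggest), the equivalent would be:
pip install --upgrade --no-deps --force-reinstall openai-whisper==20231117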

Jeronymous (Member) commented:

Wait, what is your version of whisper-timestamped?
I remember a bug like this being fixed quite some time ago.

Jeronymous (Member) commented:

OK, my bad: your version is visible (1.15.4) and it's the latest.

Indeed, if a new version of openai-whisper was released, whisper-timestamped probably needs to adapt.

@Jeronymous Jeronymous added the bug Something isn't working label Oct 1, 2024
villesau (Contributor) commented Oct 2, 2024

Facing the same issue with: https://huggingface.co/openai/whisper-large-v3

whisper_timestamped.load_model("openai/whisper-large-v3", device="cuda")

Edit: Apparently the error also appears when using a gibberish model name: whisper_timestamped.load_model("lasdfdsafdsa", device="cuda")

Edit: I was confused into thinking it was the model that didn't work, but it was the whisper package version. Pinning openai-whisper==20240927 in requirements.txt helped! I believe this prevents using the new turbo model, though.
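
For reference, a requirements.txt pin along these lines is what I mean (the whisper-timestamped version is just the one reported above; adjust to whatever you have installed):

# requirements.txt
openai-whisper==20240927       # last release before 20240930, where the error appears
whisper-timestamped==1.15.4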

Alptimus commented Oct 2, 2024

I had to downgrade whisper library to get it to work: pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git@v20231117

❤️

neonwatty commented:

openai-whisper==20240927

With this pinned version of openai-whisper I can use the new turbo large v3 model.

Running on Mac hardware.

neonwatty commented Oct 3, 2024

Here's the traceback I received when attempting to use the new turbo model with openai-whisper==20240930, starting from whisper_timestamped's call to openai's whisper.

Here's the release comparison of 20240930 vs 20240927.

The issue looks to be in the decoder, where serious pruning was performed for turbo.

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper_timestamped/transcribe.py:888, in _transcribe_timestamped_efficient(model, audio, remove_punctuation_from_words, compute_word_confidence, include_punctuation_in_confidence, refine_whisper_precision_nframes, alignment_heads, plot_word_alignment, word_alignement_most_top_layers, detect_disfluencies, trust_whisper_timestamps, use_timestamps_for_alignment, **whisper_options)
    885     if compute_word_confidence or no_speech_threshold is not None:
    886         all_hooks.append(model.decoder.ln.register_forward_hook(hook_output_logits))
--> 888     transcription = model.transcribe(audio, **whisper_options)
    890 finally:
    891 
    892     # Remove hooks
    893     for hook in all_hooks:

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper/transcribe.py:279, in transcribe(model, audio, verbose, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, condition_on_previous_text, initial_prompt, word_timestamps, prepend_punctuations, append_punctuations, clip_timestamps, hallucination_silence_threshold, **decode_options)
    276 mel_segment = pad_or_trim(mel_segment, N_FRAMES).to(model.device).to(dtype)
    278 decode_options["prompt"] = all_tokens[prompt_reset_since:]
--> 279 result: DecodingResult = decode_with_fallback(mel_segment)
    280 tokens = torch.tensor(result.tokens)
    282 if no_speech_threshold is not None:
    283     # no voice activity check

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper/transcribe.py:195, in transcribe.<locals>.decode_with_fallback(segment)
    192     kwargs.pop("best_of", None)
    194 options = DecodingOptions(**kwargs, temperature=t)
--> 195 decode_result = model.decode(segment, options)
    197 needs_fallback = False
    198 if (
    199     compression_ratio_threshold is not None
    200     and decode_result.compression_ratio > compression_ratio_threshold
    201 ):

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper/decoding.py:824, in decode(model, mel, options, **kwargs)
    821 if kwargs:
    822     options = replace(options, **kwargs)
--> 824 result = DecodingTask(model, options).run(mel)
    826 return result[0] if single else result

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper/decoding.py:737, in DecodingTask.run(self, mel)
    734 tokens = tokens.repeat_interleave(self.n_group, dim=0).to(audio_features.device)
    736 # call the main sampling loop
--> 737 tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
    739 # reshape the tensors to have (n_audio, n_group) as the first two dimensions
    740 audio_features = audio_features[:: self.n_group]

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper/decoding.py:687, in DecodingTask._main_loop(self, audio_features, tokens)
    685 try:
    686     for i in range(self.sample_len):
--> 687         logits = self.inference.logits(tokens, audio_features)
    689         if (
    690             i == 0 and self.tokenizer.no_speech is not None
    691         ):  # save no_speech_probs
    692             probs_at_sot = logits[:, self.sot_index].float().softmax(dim=-1)

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper/decoding.py:163, in PyTorchInference.logits(self, tokens, audio_features)
    159 if tokens.shape[-1] > self.initial_token_length:
    160     # only need to use the last token except in the first forward pass
    161     tokens = tokens[:, -1:]
--> 163 return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
   1557 # If we don't have any hooks, we want to skip the rest of the logic in
   1558 # this function, and just call forward.
   1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1560         or _global_backward_pre_hooks or _global_backward_hooks
   1561         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562     return forward_call(*args, **kwargs)
   1564 try:
   1565     result = None

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper/model.py:242, in TextDecoder.forward(self, x, xa, kv_cache)
    239 x = x.to(xa.dtype)
    241 for block in self.blocks:
--> 242     x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
    244 x = self.ln(x)
    245 logits = (
    246     x @ torch.transpose(self.token_embedding.weight.to(x.dtype), 0, 1)
    247 ).float()

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
   1557 # If we don't have any hooks, we want to skip the rest of the logic in
   1558 # this function, and just call forward.
   1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1560         or _global_backward_pre_hooks or _global_backward_hooks
   1561         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562     return forward_call(*args, **kwargs)
   1564 try:
   1565     result = None

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper/model.py:169, in ResidualAttentionBlock.forward(self, x, xa, mask, kv_cache)
    167 x = x + self.attn(self.attn_ln(x), mask=mask, kv_cache=kv_cache)[0]
    168 if self.cross_attn:
--> 169     x = x + self.cross_attn(self.cross_attn_ln(x), xa, kv_cache=kv_cache)[0]
    170 x = x + self.mlp(self.mlp_ln(x))
    171 return x

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1616, in Module._call_impl(self, *args, **kwargs)
   1614     hook_result = hook(self, args, kwargs, result)
   1615 else:
-> 1616     hook_result = hook(self, args, result)
   1618 if hook_result is not None:
   1619     result = hook_result

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper_timestamped/transcribe.py:882, in _transcribe_timestamped_efficient.<locals>.<lambda>(layer, ins, outs, index)
    878     if i < nblocks - word_alignement_most_top_layers:
    879         continue
    880     all_hooks.append(
    881         block.cross_attn.register_forward_hook(
--> 882             lambda layer, ins, outs, index=j: hook_attention_weights(layer, ins, outs, index))
    883     )
    884     j += 1
    885 if compute_word_confidence or no_speech_threshold is not None:

File ~/Desktop/speech_app/venv/lib/python3.12/site-packages/whisper_timestamped/transcribe.py:777, in _transcribe_timestamped_efficient.<locals>.hook_attention_weights(layer, ins, outs, index)
    775 w = outs[-1]
    776 # Only the last attention weights is useful
--> 777 if w.shape[-2] > 1:
    778     w = w[:, :, -1:, :]
    779 segment_attweights[index].append(w.cpu())

AttributeError: 'NoneType' object has no attribute 'shape'

jonasrenault commented:

The issue is related to patch #2359, which uses F.scaled_dot_product_attention when available. In this case, the attention weights returned by whisper are None.

A workaround is to use the disable_sdpa context manager introduced in the same patch when calling transcribe, though this will limit the performance improvements introduced by the latest version and the turbo model of whisper:

import whisper_timestamped as whisperts
from whisper.model import disable_sdpa

audio = whisperts.load_audio("AUDIO.wav")
model = whisperts.load_model("turbo")
with disable_sdpa():
    results = whisperts.transcribe(model, audio)
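
To make the failure mode concrete: with SDPA active, whisper's MultiHeadAttention.forward returns (output, qk) with qk set to None, because F.scaled_dot_product_attention never materializes the attention weights. The hook that whisper-timestamped registers on each cross-attention block then crashes on .shape. A minimal sketch of a defensive hook, not the actual patch (hook_attention_weights and segment_attweights are the names visible in the traceback above):

from collections import defaultdict

# Stand-in for the per-layer storage that transcribe.py keeps in an enclosing scope.
segment_attweights = defaultdict(list)

def hook_attention_weights(layer, ins, outs, index):
    w = outs[-1]  # attention weights (qk); None when SDPA computed the attention
    if w is None:
        # Nothing to record: word alignment cannot work on this path,
        # so callers must fall back to disable_sdpa() as shown above.
        return
    # Only the last attention weights are useful
    if w.shape[-2] > 1:
        w = w[:, :, -1:, :]
    segment_attweights[index].append(w.cpu())

Note that skipping silently only avoids the crash; the weights are what drive the word alignment, so the disable_sdpa() route above is what actually keeps the timestamps working.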

Med280 commented Oct 11, 2024

You're not supposed to face that issue if you pin the version in your requirements:
openai-whisper==20231117

@Jeronymous Jeronymous changed the title The last version of whisper (v20240930) doesn't seem to be supported The last version of whisper (v20240930) doesn't seem to be supported ('NoneType' object has no attribute 'shape') Oct 29, 2024
Jeronymous (Member) commented:


Thanks a lot @jonasrenault, I pushed a workaround based on that, to avoid people getting stuck.

(Sorry for the delay; this was broken for a month... Unfortunately, I now have much less time to be active on this whisper-timestamped project.)
