Whisper

Speech recognition with Whisper in MLX. Whisper is a set of open source speech recognition models from OpenAI, ranging from 39 million to 1.5 billion parameters.¹

Setup

First, install the dependencies:

pip install -r requirements.txt

Install ffmpeg:

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

Tip

Skip the conversion step by using pre-converted checkpoints from the Hugging Face Hub. There are a few available in the MLX Community organization; see the Run section below for how to load one.

To convert a model, first download the Whisper PyTorch checkpoint and convert the weights to the MLX format. For example, to convert the tiny model use:

python convert.py --torch-name-or-path tiny --mlx-path mlx_models/tiny

Note that you can also convert a local PyTorch checkpoint that is in the original OpenAI format.
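
For example, assuming a local checkpoint saved at path/to/tiny.pt (an illustrative path), pass the file path to the same flag:

python convert.py --torch-name-or-path path/to/tiny.pt --mlx-path mlx_models/tiny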

To generate a 4-bit quantized model, use -q. For a full list of options:

python convert.py --help

By default, the conversion script will make the directory mlx_models and save the converted weights.npz and config.json there.
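
Running the script without --mlx-path should therefore produce a layout like this (a sketch of the default output described above):

mlx_models/
├── config.json
└── weights.npz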

Each time it is run, convert.py will overwrite any model in the provided path. To save different models, make sure to set --mlx-path to a unique directory for each converted model. For example:

model="tiny"
python convert.py --torch-name-or-path ${model} --mlx-path mlx_models/${model}_fp16
python convert.py --torch-name-or-path ${model} --dtype float32 --mlx-path mlx_models/${model}_fp32
python convert.py --torch-name-or-path ${model} -q --q_bits 4 --mlx-path mlx_models/${model}_quantized_4bits

Run

Transcribe audio with:

import whisper

speech_file = "path/to/audio.mp3"  # placeholder: any audio file ffmpeg can decode

text = whisper.transcribe(speech_file)["text"]

Choose the model by setting path_or_hf_repo. For example:

result = whisper.transcribe(speech_file, path_or_hf_repo="mlx_models/large")

This will load the model contained in mlx_models/large. The path_or_hf_repo argument can also point to an MLX-style Whisper model on the Hugging Face Hub, in which case the model is downloaded automatically.
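
For example, to use a pre-converted checkpoint from the MLX Community organization (the repository name below is illustrative; check the Hub for the models actually available):

result = whisper.transcribe(speech_file, path_or_hf_repo="mlx-community/whisper-tiny")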

The transcribe function also supports word-level timestamps. You can generate these with:

output = whisper.transcribe(speech_file, word_timestamps=True)
print(output["segments"][0]["words"])
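
Each entry in the word list pairs a word with its timing. The printed output looks roughly like this (illustrative values, assuming the usual Whisper word-timestamp fields):

[{'word': ' Hello', 'start': 0.0, 'end': 0.42, 'probability': 0.99}, ...]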

To see more transcription options use:

>>> help(whisper.transcribe)

¹ Refer to the arXiv paper (https://arxiv.org/abs/2212.04356), blog post, and code (https://github.com/openai/whisper) for more details.