# Whisper

Speech recognition with Whisper in MLX. Whisper is a set of open source speech
recognition models from OpenAI, ranging from 39 million to 1.5 billion
parameters.[^1]

### Setup

Install [`ffmpeg`](https://ffmpeg.org/):

```
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
```

Install the `mlx-whisper` package with:

```
pip install mlx-whisper
```

### Run

#### CLI

At its simplest:

```
mlx_whisper audio_file.mp3
```

This writes the transcription to a text file named `audio_file.txt`.

Use `-f` to specify the output format and `--model` to specify the model. There
are many other supported command line options. To see them all, run
`mlx_whisper -h`.
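
To transcribe several files from a script, the CLI can also be driven from Python. The following is a minimal sketch that only uses the two flags mentioned above. The helper names `build_command` and `transcribe_files` are hypothetical, and the `srt` format value in the usage note is an assumption; check `mlx_whisper -h` for the values your installed version accepts.

```python
import subprocess


def build_command(audio_path, output_format=None, model=None):
    """Build an `mlx_whisper` argument list (hypothetical helper)."""
    cmd = ["mlx_whisper", audio_path]
    if output_format is not None:
        cmd += ["-f", output_format]  # output format flag from this README
    if model is not None:
        cmd += ["--model", model]  # model flag from this README
    return cmd


def transcribe_files(paths, **options):
    """Run the CLI once per file; raises CalledProcessError on failure."""
    for path in paths:
        subprocess.run(build_command(path, **options), check=True)
```

For example, `transcribe_files(["a.mp3", "b.mp3"], output_format="srt")` would run the CLI twice, assuming `srt` is a supported format.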

#### API

Transcribe audio with:

```python
import mlx_whisper

text = mlx_whisper.transcribe(speech_file)["text"]
```

The default model is `mlx-community/whisper-tiny`. To use a different model,
set `path_or_hf_repo`. For example:

```python
result = mlx_whisper.transcribe(speech_file, path_or_hf_repo="models/large")
```

This loads the model stored in `models/large`. The `path_or_hf_repo` argument
can also name an MLX-style Whisper model on the Hugging Face Hub, in which case
the model is downloaded automatically. A [collection of pre-converted Whisper
models](https://huggingface.co/collections/mlx-community/whisper-663256f9964fbb1177db93dc)
is available in the Hugging Face MLX Community.
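
Because `path_or_hf_repo` accepts either a local directory or a Hub repository name, a script can prefer a local model and fall back to the Hub when it is missing. The following is a minimal sketch of that idea. The helper names `pick_model` and `transcribe_with` are hypothetical and not part of `mlx_whisper`.

```python
from pathlib import Path

DEFAULT_REPO = "mlx-community/whisper-tiny"  # default model named in this README


def pick_model(local_dir, fallback_repo=DEFAULT_REPO):
    """Return local_dir if it is an existing directory, else the Hub repo name."""
    return str(local_dir) if Path(local_dir).is_dir() else fallback_repo


def transcribe_with(speech_file, local_dir="models/large"):
    """Transcribe with a local model when present (hypothetical helper)."""
    import mlx_whisper  # imported lazily so pick_model works without it

    return mlx_whisper.transcribe(
        speech_file, path_or_hf_repo=pick_model(local_dir)
    )
```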

The `transcribe` function also supports word-level timestamps. You can generate
these with:

```python
output = mlx_whisper.transcribe(speech_file, word_timestamps=True)
print(output["segments"][0]["words"])
```
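
To print every word rather than only the first segment, loop over all segments. The sketch below assumes each word entry is a dictionary with `word`, `start`, and `end` keys, as in the original OpenAI Whisper output; inspect the printed output above to confirm the keys in your version. The helper name `format_words` is hypothetical.

```python
def format_words(result):
    """Return one 'start end word' line per word in a transcription result."""
    lines = []
    for segment in result.get("segments", []):
        # "words" is only present when word_timestamps=True was requested
        for word in segment.get("words", []):
            lines.append(
                f"{word['start']:7.2f} {word['end']:7.2f} {word['word'].strip()}"
            )
    return lines
```

With a result from `mlx_whisper.transcribe(speech_file, word_timestamps=True)`, calling `print("\n".join(format_words(output)))` lists the start time, end time, and text of each word.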

To see more transcription options, use:

```
>>> help(mlx_whisper.transcribe)
```

### Converting models

> [!TIP]
> Skip the conversion step by using pre-converted checkpoints from the Hugging
> Face Hub. There are a few available in the [MLX
> Community](https://huggingface.co/mlx-community) organization.

To convert a model, first clone the MLX Examples repo:

```
git clone https://github.com/ml-explore/mlx-examples.git
```

Then run `convert.py` from `mlx-examples/whisper`. For example, to convert the
`tiny` model, use:

```
python convert.py --torch-name-or-path tiny --mlx-path mlx_models/tiny
```

You can also convert a local PyTorch checkpoint that is in the original OpenAI
format.

To generate a 4-bit quantized model, use `-q`. For a full list of options, run:

```
python convert.py --help
```

By default, the conversion script creates the directory `mlx_models` and saves
the converted `weights.npz` and `config.json` there.

Each time it is run, `convert.py` will overwrite any model in the provided
path. To save different models, make sure to set `--mlx-path` to a unique
directory for each converted model. For example:

```bash
model="tiny"
python convert.py --torch-name-or-path ${model} --mlx-path mlx_models/${model}_fp16
python convert.py --torch-name-or-path ${model} --dtype float32 --mlx-path mlx_models/${model}_fp32
python convert.py --torch-name-or-path ${model} -q --q_bits 4 --mlx-path mlx_models/${model}_quantized_4bits
```
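
A converted directory can then be passed to `transcribe` through `path_or_hf_repo`. The following is a minimal sketch that first checks for the two files named above. The helper names are hypothetical, and if your version of `convert.py` writes different file names, adjust `EXPECTED_FILES` accordingly.

```python
from pathlib import Path

EXPECTED_FILES = ("weights.npz", "config.json")  # file names given in this README


def missing_files(model_dir):
    """List the expected files that are absent from a converted model directory."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def transcribe_converted(speech_file, model_dir="mlx_models/tiny_fp16"):
    """Transcribe with a locally converted model (hypothetical helper)."""
    missing = missing_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    import mlx_whisper  # imported lazily so the check works without it

    return mlx_whisper.transcribe(speech_file, path_or_hf_repo=model_dir)
```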

[^1]: Refer to the [arXiv paper](https://arxiv.org/abs/2212.04356), [blog post](https://openai.com/research/whisper), and [code](https://github.com/openai/whisper) for more details.