mlx_whisper: add support for audio input from stdin (#1012)

* add support for audio and input name from stdin

* refactored to stdin - arg, and output-name template

* fix bugs, add test coverage

* fix doc to match arg rename

* some nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
This commit is contained in:
Anthony Wu
2024-11-04 14:02:13 -08:00
committed by GitHub
parent 3b526f0aa1
commit 4394633ce0
4 changed files with 53 additions and 26 deletions

View File

@@ -25,7 +25,7 @@ pip install mlx-whisper
At its simplest:
```
```sh
mlx_whisper audio_file.mp3
```
@@ -35,6 +35,15 @@ Use `-f` to specify the output format and `--model` to specify the model. There
are many other supported command line options. To see them all, run
`mlx_whisper -h`.
You can also pipe the audio content of other programs via stdin:
```sh
some-process | mlx_whisper -
```
The default output file name will be `content.*`. You can specify the name with
the `--output-name` flag.
#### API
Transcribe audio with:
@@ -103,7 +112,7 @@ python convert.py --help
```
By default, the conversion script will make the directory `mlx_models`
and save the converted `weights.npz` and `config.json` there.
and save the converted `weights.npz` and `config.json` there.
Each time it is run, `convert.py` will overwrite any model in the provided
path. To save different models, make sure to set `--mlx-path` to a unique