mlx-examples/llava
jincdream f7bbe458ae Add timeout to generate functions
Add timeout handling to various `generate` functions across multiple files.

* **cvae/main.py**
  - Add `timeout` parameter to `generate` function.
  - Implement timeout handling using `signal` module in `generate` function.

* **flux/dreambooth.py**
  - Add `timeout` parameter to `generate_progress_images` function.
  - Implement timeout handling using `signal` module in `generate_progress_images` function.

* **musicgen/generate.py**
  - Add `timeout` parameter to `main` function.
  - Implement timeout handling using `signal` module in `main` function.

* **stable_diffusion/txt2image.py**
  - Add `timeout` parameter to `main` function.
  - Implement timeout handling using `signal` module in `main` function.

* **llava/generate.py**
  - Add `timeout` parameter to `main` function.
  - Implement timeout handling using `signal` module in `main` function.

* **llms/gguf_llm/generate.py**
  - Add `timeout` parameter to `generate` function.
  - Implement timeout handling using `signal` module in `generate` function.

* **llms/mlx_lm/generate.py**
  - Add `timeout` parameter to `main` function.
  - Implement timeout handling using `signal` module in `main` function.
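The per-file changes listed above all follow the same pattern. A minimal sketch of the signal-based approach (Unix-only, main thread only; the helper name `run_with_timeout` is illustrative, not the actual code in any of these files):

```python
import signal


def run_with_timeout(fn, timeout, *args, **kwargs):
    """Run fn, raising TimeoutError if it exceeds `timeout` seconds.

    Uses SIGALRM, so this only works on Unix and in the main thread.
    """
    def _handler(signum, frame):
        raise TimeoutError(f"generation exceeded {timeout} seconds")

    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(timeout)  # arm the alarm
    try:
        return fn(*args, **kwargs)
    finally:
        signal.alarm(0)  # disarm the alarm
        signal.signal(signal.SIGALRM, old_handler)  # restore previous handler
```

Each `generate` function would wrap its main loop this way when the `timeout` parameter is set.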

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/jincdream/mlx-examples?shareId=XXXX-XXXX-XXXX-XXXX).
2024-10-22 17:06:58 +08:00
| File | Last commit | Date |
|---|---|---|
| .gitignore | LlaVA in MLX (#461) | 2024-03-01 10:28:35 -08:00 |
| generate.py | Add timeout to generate functions | 2024-10-22 17:06:58 +08:00 |
| language.py | Switch to fast RMS/LN Norm (#603) | 2024-03-23 07:13:51 -07:00 |
| llava.py | Fix llava model when using text-only prompt (#998) | 2024-09-25 07:19:41 -07:00 |
| README.md | LlaVA in MLX (#461) | 2024-03-01 10:28:35 -08:00 |
| requirements.txt | Switch to fast RMS/LN Norm (#603) | 2024-03-23 07:13:51 -07:00 |
| test.py | LlaVA in MLX (#461) | 2024-03-01 10:28:35 -08:00 |
| vision.py | LlaVA in MLX (#461) | 2024-03-01 10:28:35 -08:00 |

LLaVA

An example of LLaVA: Large Language and Vision Assistant in MLX.[1] LLaVA is a multimodal model that can generate text given combined image and text inputs.

Setup

Install the dependencies:

pip install -r requirements.txt

Run

You can use LLaVA to ask questions about images.

For example, using the command line:

python generate.py \
  --model llava-hf/llava-1.5-7b-hf \
  --image "http://images.cocodataset.org/val2017/000000039769.jpg" \
  --prompt "USER: <image>\nWhat are these?\nASSISTANT:" \
  --max-tokens 128 \
  --temp 0

This uses the following image:

[Image: two cats lying on a pink couch (COCO val2017 000000039769.jpg)]

And generates the output:

These are two cats lying on a pink couch.
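Given the timeout commit described above, `generate.py` presumably gains a `--timeout` flag alongside the flags shown in the example command. A hedged argparse sketch (only the flags used in the example command are confirmed; `--timeout` and its default are inferred from the commit message):

```python
import argparse

# Sketch of the CLI surface; check generate.py for the actual interface.
parser = argparse.ArgumentParser(
    description="Generate text from an image with LLaVA."
)
parser.add_argument("--model", default="llava-hf/llava-1.5-7b-hf")
parser.add_argument("--image", required=True, help="image URL or local path")
parser.add_argument("--prompt", required=True)
parser.add_argument("--max-tokens", type=int, default=128)
parser.add_argument("--temp", type=float, default=0.0)
parser.add_argument(
    "--timeout",
    type=int,
    default=None,
    help="abort generation after this many seconds",
)
```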

You can also use LLaVA in Python:

from generate import load_model, prepare_inputs, generate_text

processor, model = load_model("llava-hf/llava-1.5-7b-hf")

max_tokens, temperature = 128, 0.0

prompt = "USER: <image>\nWhat are these?\nASSISTANT:"
image = "http://images.cocodataset.org/val2017/000000039769.jpg"
input_ids, pixel_values = prepare_inputs(processor, image, prompt)

reply = generate_text(
    input_ids, pixel_values, model, processor, max_tokens, temperature
)

print(reply)
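When calling `generate_text` from your own Python code, a portable way to bound how long you wait (without the Unix-only `signal` module) is a worker thread. Note this only bounds the caller's wait; the worker keeps running in the background after a timeout. A sketch, not part of the example's API:

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=1)


def wait_at_most(fn, timeout, *args, **kwargs):
    # Submit fn to a worker thread and wait up to `timeout` seconds.
    # Raises a timeout error if the result is not ready in time; the
    # worker thread is not killed and keeps running regardless.
    return _pool.submit(fn, *args, **kwargs).result(timeout=timeout)
```

For example, `reply = wait_at_most(generate_text, 120, input_ids, pixel_values, model, processor, max_tokens, temperature)` gives up waiting after two minutes.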

  1. Refer to the LLaVA project webpage for more information.