Mirror of https://github.com/ml-explore/mlx.git, synced 2025-09-18 01:50:16 +08:00
Fix cross-attention (#210)
* Fix cross-attention

  With the current code, ln2 is a no-op. Its output should be passed to the cross-attention layer.

* Add name to contributors
@@ -157,7 +157,7 @@ class TransformerDecoderLayer(Module):
         x = x + y

         y = self.ln2(x)
-        y = self.cross_attention(x, memory, memory, memory_mask)
+        y = self.cross_attention(y, memory, memory, memory_mask)
         x = x + y

         y = self.ln3(x)
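For context, here is a minimal sketch of how a pre-norm decoder layer's forward pass reads with this fix applied: the output of ln2 is used as the queries for the cross-attention. Only the ln2/ln3/cross_attention lines come from the hunk above; the class name DecoderLayerSketch, the self-attention block, the mlp feed-forward block, and the constructor arguments are illustrative assumptions, not the actual contents of the patched file.

import mlx.core as mx
import mlx.nn as nn


class DecoderLayerSketch(nn.Module):
    """Illustrative pre-norm decoder layer; only the cross-attention
    block mirrors the hunk above, the rest is assumed structure."""

    def __init__(self, dims: int, num_heads: int):
        super().__init__()
        self.self_attention = nn.MultiHeadAttention(dims, num_heads)
        self.cross_attention = nn.MultiHeadAttention(dims, num_heads)
        self.ln1 = nn.LayerNorm(dims)
        self.ln2 = nn.LayerNorm(dims)
        self.ln3 = nn.LayerNorm(dims)
        self.mlp = nn.Sequential(
            nn.Linear(dims, 4 * dims), nn.ReLU(), nn.Linear(4 * dims, dims)
        )

    def __call__(self, x, memory, x_mask=None, memory_mask=None):
        # Self-attention block (assumed, not part of the hunk).
        y = self.ln1(x)
        y = self.self_attention(y, y, y, x_mask)
        x = x + y

        # Cross-attention block: the normalized activations `y` are the
        # queries. Before the fix, `x` was passed instead, so ln2 had no
        # effect on the layer's output.
        y = self.ln2(x)
        y = self.cross_attention(y, memory, memory, memory_mask)
        x = x + y

        # Feed-forward block (assumed).
        y = self.ln3(x)
        return x + self.mlp(y)


# Hypothetical usage: the target sequence attends to the encoder memory.
layer = DecoderLayerSketch(dims=64, num_heads=4)
x = mx.zeros((1, 10, 64))        # (batch, target_len, dims)
memory = mx.zeros((1, 12, 64))   # (batch, source_len, dims)
out = layer(x, memory)
print(out.shape)  # (1, 10, 64)

In the pre-norm pattern each residual branch normalizes its input before the sublayer, so passing x rather than ln2(x) to the cross-attention silently skips that normalization, which is exactly the no-op described in the commit message.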