BUG FIX: Decoding results in garbled text when multiple tokens represent a single character (e.g., Chinese). (#398)

* Decoding results in garbled text when multiple tokens represent a single character (e.g., Chinese).
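
A minimal sketch of the failure and of the hold-back approach taken in this change. It does not use the real tokenizer: `decode`, `stream_naive`, `stream_holdback`, and `TOKENS` are hypothetical stand-ins, and each "token" is assumed to be a chunk of raw UTF-8 bytes so that one Chinese character can be split across two tokens. It also assumes an incomplete tail decodes to exactly one replacement character (U+FFFD), which is what makes holding back a single character sufficient here.

```python
# Hypothetical byte-level stand-in for tokenizer.decode(): joins the token
# byte chunks and decodes them, replacing incomplete sequences with U+FFFD.
def decode(tokens):
    return b"".join(tokens).decode("utf-8", errors="replace")


def stream_naive(tokens):
    """Old behaviour: emit everything decoded so far after each token."""
    out, seen, skip = [], [], 0
    for tok in tokens:
        seen.append(tok)
        s = decode(seen)
        out.append(s[skip:])
        skip = len(s)
    return "".join(out)


def stream_holdback(tokens):
    """New behaviour: always hold back the last decoded character, since it
    may be an incomplete character that a later token will complete."""
    out, seen, skip = [], [], 0
    for tok in tokens:
        seen.append(tok)
        s = decode(seen)
        if len(s) - skip > 1:
            out.append(s[skip:-1])
            skip = len(s) - 1
    # Flush whatever was held back once generation has finished.
    out.append(decode(seen)[skip:])
    return "".join(out)


# "A你好" with each Chinese character split across two tokens.
TOKENS = [b"A", b"\xe4", b"\xbd\xa0", b"\xe5", b"\xa5\xbd"]

if __name__ == "__main__":
    print(repr(stream_naive(TOKENS)))
    print(repr(stream_holdback(TOKENS)))
```

In the naive loop, the replacement character is emitted as soon as the first half of a character arrives, and `skip` then moves past it, so the completed character is never printed. Holding back one character defers that decision by one step. This is a heuristic: if a partial sequence decoded to more than one replacement character, a larger hold-back would be needed.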
2025-12-13 00:09:06 +08:00 · 2024-02-01 11:27:29 +08:00
parent 94358219cf
commit 0340113e02
1 changed files with 3 additions and 2 deletions
--- a/lora/lora.py
+++ b/lora/lora.py
@@ -292,8 +292,9 @@ def generate(model, prompt, tokenizer, args):

        tokens.append(token.item())
        s = tokenizer.decode(tokens)
-        print(s[skip:], end="", flush=True)
-        skip = len(s)
+        if len(s) - skip > 1:
+            print(s[skip:-1], end="", flush=True)
+            skip = len(s) - 1
    print(tokenizer.decode(tokens)[skip:], flush=True)
    print("=" * 10)
    if len(tokens) == 0: