BUG FIX: Decoding results in garbled text when multiple tokens represent a single character (e.g., Chinese). (#398)

* Decoding results in garbled text when multiple tokens represent a single character (e.g., Chinese).

ZHAOKAI WANG
2024-02-01 11:27:29 +08:00
committed by GitHub
parent 94358219cf
commit 0340113e02

@@ -292,8 +292,9 @@ def generate(model, prompt, tokenizer, args):
         tokens.append(token.item())
         s = tokenizer.decode(tokens)
-        print(s[skip:], end="", flush=True)
-        skip = len(s)
+        if len(s) - skip > 1:
+            print(s[skip:-1], end="", flush=True)
+            skip = len(s) - 1
     print(tokenizer.decode(tokens)[skip:], flush=True)
     print("=" * 10)
     if len(tokens) == 0:
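
To illustrate why the hunk above holds back the last decoded character, here is a minimal, self-contained sketch. It does not use the real tokenizer: `decode` is a hypothetical stand-in in which every token is a single UTF-8 byte, and an incomplete trailing character is assumed to decode to one U+FFFD placeholder. `stream_naive` and `stream_holdback` are illustrative names that mirror the old and new loop bodies, and they collect output in a list instead of printing so the results can be compared.

```python
def decode(tokens):
    # Hypothetical stand-in for tokenizer.decode: each token is one UTF-8 byte.
    # An incomplete trailing character decodes to the placeholder U+FFFD.
    return bytes(tokens).decode("utf-8", errors="replace")


def stream_naive(token_ids):
    """Old behaviour: emit all newly decoded text after every token."""
    out, tokens, skip = [], [], 0
    for t in token_ids:
        tokens.append(t)
        s = decode(tokens)
        out.append(s[skip:])  # may emit a placeholder for a partial character
        skip = len(s)
    out.append(decode(tokens)[skip:])
    return "".join(out)


def stream_holdback(token_ids):
    """New behaviour: always hold back the last decoded character."""
    out, tokens, skip = [], [], 0
    for t in token_ids:
        tokens.append(t)
        s = decode(tokens)
        if len(s) - skip > 1:
            out.append(s[skip:-1])  # the last character may still be incomplete
            skip = len(s) - 1
    out.append(decode(tokens)[skip:])  # flush the held-back tail at the end
    return "".join(out)


text = "你好"
token_ids = list(text.encode("utf-8"))  # six one-byte "tokens"
naive = stream_naive(token_ids)
fixed = stream_holdback(token_ids)
print(repr(naive))
print(repr(fixed))
```

In this sketch the naive loop emits placeholder characters because it prints a character before all of its tokens have arrived, and it never revisits text it has already skipped. The hold-back loop only emits a character once a later one has appeared after it, so the multi-token characters come out intact. The approach depends on a partial character occupying exactly one position at the end of the decoded string, which is an assumption about the tokenizer's `decode` and may not hold for every tokenizer.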