2024-08-17 01:04
It all started when I found a small bug in Llama 3 on Hugging Face. Following the official tutorial to get the model to do a function call raises an error.
But that's not the point.
I spent most of a day debugging it and eventually traced it to a small problem in the prompt-template generation logic inside the model's configuration file. I wrote up a detailed report, with reproduction code, the root cause, and a fix, posted it in the discussion board of the Hugging Face Llama 3.1 model, and @-mentioned the people in charge.
But that's not the point either.
The point is that I asked ChatGPT to make the tone of my post lighter and more playful. Nothing too serious. It's not like I work for them!
ChatGPT really delivered. Below is the full text of my post, with the before and after versions side by side. Let's learn some casual, conversational English together!
Original: I was trying the tool use example from the tutorial page on the Meta-Llama-3.1-8B-Instruct model with code (mostly also from the tutorial page):
Revised: I was playing around with the tutorial example of tool use on the Meta-Llama-3.1-8B-Instruct model and noticed something off. Here’s what I ran (mostly lifted from the tutorial page):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

def get_current_temperature(location: str, unit: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
    Returns:
        The current temperature at the specified location in the specified units, as a float.
    """
    return 22.  # A real function should probably actually get the temperature!

def get_current_wind_speed(location: str) -> float:
    """
    Get the current wind speed in km/h at a given location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current wind speed at the given location in km/h, as a float.
    """
    return 6.  # A real function should probably actually get the wind speed!

tools = [get_current_temperature, get_current_wind_speed]

messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
Original: At this step the model generated a response of:
Revised: The model’s response at this step was:
<|python_tag|>{"name": "get_current_temperature", "parameters": {"location": "Paris, France", "unit": "celsius"}}<|eom_id|>
Original: This is different from what the tutorial page claims:
Revised: …which is a bit different from what’s documented on the tutorial page:
<tool_call>
{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
</tool_call><|im_end|>
Original: Note the response in tutorial has a key “arguments”, while the one from actual Llama model has a key “parameters”.
Then I use the response text in the following steps:
Revised: The key difference is the tutorial mentions a key “arguments,” but the actual response from the Llama model has “parameters.”
Continuing with the example, I hit a snag at this step:
import json
out_str = tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
tool_call = json.loads(out_str)
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
Original: The tokenizer.apply_chat_template() will raise a TypeError: Object of type Undefined is not JSON serializable.
Revised: When the tokenizer.apply_chat_template() ran, it threw a TypeError: Object of type Undefined is not JSON serializable.
Original: I did some investigation and found that:
The tokenizer uses Jinja2 to generate the prompt template which is defined in the tokenizer_config.json file. The template rendering logic includes this:
Revised: Diving a bit deeper, I discovered that the issue stems from how the tokenizer uses Jinja2 for generating prompt templates defined in tokenizer_config.json. Here’s the juicy part:
{%- for message in messages %}\n
...
{%- elif 'tool_calls' in message %}\n
...
{%- set tool_call = message.tool_calls[0].function %}\n
...
{%- else %}\n
{{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' -}}\n
{{- '{\"name\": \"' + tool_call.name + '\", ' }}\n
{{- '\"parameters\": ' }}\n
{{- tool_call.arguments | tojson }}\n
{{- \"}\" }}\n
{%- endif %}\n
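(For reference, the same Jinja2 template string is exposed at runtime as tokenizer.chat_template, so you can peek at it without digging through the JSON file — a quick sketch:)
# The chat template defined in tokenizer_config.json is also available
# directly on the loaded tokenizer object.
print(tokenizer.chat_template)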
Original: So it will read tool_call.arguments from the message that contains tool_calls. However, the tool_call message generated by the model doesn’t have key name “arguments”. It has “parameters” instead. So Jinja2 will raise the error during json parsing.
Revised: The crux of the problem is that it looks for tool_call.arguments in the message containing tool_calls, but Llama model spits out “parameters” instead. Hence, Jinja2 isn’t happy during the JSON parsing.
Original: The reason that the model uses “parameters” key name can also be found in the same prompt template:
Revised: And the model’s use of “parameters” instead of “arguments” is actually prompted by the very same template:
{{- '<|start_header_id|>user<|end_header_id|>\\n\\n' -}}\n
{{- \"Given the following functions, please respond with a JSON for a function call \" }}\n
{{- \"with its proper arguments that best answers the given prompt.\\n\\n\" }}\n
{{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' }}\n
{{- \"Do not use variables.\\n\\n\" }}\n
Original: It clearly asks the model to generate response in the format {"name": function name, "parameters": dictionary of argument name and its value}.
Revised: It explicitly formats the response with {"name": function name, "parameters": dictionary of argument name and its value}.
Original: The code on the tutorial page doesn’t trigger this problem because it doesn’t use the model response as tool_call, instead, it uses a mock message which contains the correct key name:
Revised: Interestingly, the tutorial doesn’t encounter this error because it uses a mock message, not a live model response, which conveniently includes the correct key:
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
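As a stopgap before the template itself gets fixed, one user-side workaround (just a sketch, I haven't tested it thoroughly) is to rename the key in the parsed tool call so it matches what the template expects, continuing from the snippet above:
# Untested sketch: the template looks up tool_call.arguments, so rename the
# "parameters" key the model emitted before appending the tool call message.
tool_call = json.loads(out_str)
if "parameters" in tool_call and "arguments" not in tool_call:
    tool_call["arguments"] = tool_call.pop("parameters")
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})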
Original: Modify the template in tokenizer_config.json by replacing “parameters” with “arguments”, or vice versa.
The tokenizer_config.json files in the model Meta-Llama-3.1-70B-Instruct and Meta-Llama-3.1-405B-Instruct also have the same problem.
Revised: Modify the template in tokenizer_config.json by replacing “parameters” with “arguments”, or vice versa.
Also, the tokenizer_config.json files in the model Meta-Llama-3.1-70B-Instruct and Meta-Llama-3.1-405B-Instruct seem to share this glitch.
Hope this helps someone out there!
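By the way, if you'd rather not edit tokenizer_config.json on disk, the same fix can probably be applied at runtime by patching the template string the tokenizer has already loaded. A rough, untested sketch, assuming the loaded template contains the exact substring quoted above:
# Untested sketch: point the template at the "parameters" key the model
# actually emits, instead of "arguments", without touching the file on disk.
tokenizer.chat_template = tokenizer.chat_template.replace(
    "tool_call.arguments | tojson",
    "tool_call.parameters | tojson",
)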