Berkeley Function-Calling Leaderboard

The Berkeley Function Calling Leaderboard V3 (also called Berkeley Tool Calling Leaderboard V3) evaluates the LLM's ability to call functions (aka tools) accurately. This leaderboard consists of real-world data and will be updated periodically. For more information on the evaluation dataset and methodology, please refer to our blogs: BFCL-v1, which introduces AST as an evaluation metric; BFCL-v2, which introduces enterprise and OSS-contributed functions; and BFCL-v3, which introduces multi-turn interactions. Check out the code and data.
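
To give a rough feel for what an AST-style check does, here is a minimal sketch. It assumes the model's call has already been parsed into a function name and an argument dictionary, and it treats every expected parameter as required. The names `AstCheck` and `Matches` and the accepted-values layout are hypothetical illustrations, not BFCL's actual evaluation code.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AstCheck
{
    // Returns true when the call names the expected function, supplies every
    // expected parameter and no unknown ones, and each supplied value is one
    // of the accepted values for that parameter.
    // Simplification (assumption): optional parameters are not modeled.
    public static bool Matches(
        string expectedName,
        IReadOnlyDictionary<string, string[]> acceptedValues,
        string actualName,
        IReadOnlyDictionary<string, string> actualArgs)
    {
        if (expectedName != actualName) return false;

        // Reject parameters the expected call does not know about.
        if (actualArgs.Keys.Any(k => !acceptedValues.ContainsKey(k))) return false;

        foreach (var (param, accepted) in acceptedValues)
        {
            if (!actualArgs.TryGetValue(param, out var value)) return false;
            if (!accepted.Contains(value)) return false;
        }
        return true;
    }
}
```

The point of comparing structure rather than raw text is that a call such as `weather_forecast(location='New York', days=3)` can be accepted regardless of argument order or spacing.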

As of 27 March 2025, the chart leaders are:

Columns, left to right: Rank, Overall Acc, Model, Cost ($), Latency Mean (s), Non-live AST Summary (single turn), Non-live Exec Summary (single turn), Live Overall Acc (AST, single turn), Multi-turn Overall Acc, Relevance (hallucination measurement), Irrelevance (hallucination measurement), Organization, License.
1 74.31 watt-tool-70B (FC) N/A 3.4 84.06 89.39 77.74 58.75 94.44 76.32 Watt AI Lab Apache-2.0
2 72.08 GPT-4o-2024-11-20 (Prompt) 13.54 0.78 88.1 89.38 79.83 47.62 83.33 83.76 OpenAI Proprietary
3 69.94 GPT-4.5-Preview-2025-02-27 (FC) 238.13 3.25 86.12 83.98 79.34 45.25 66.67 83.64 OpenAI Proprietary
4 69.58 GPT-4o-2024-11-20 (FC) 8.23 1.11 87.42 89.2 79.65 41 83.33 83.15 OpenAI Proprietary
5 68.39 ToolACE-2-8B (FC) N/A 6.95 87.58 87.11 80.05 36.88 72.22 90.11 Huawei Noah & USTC Apache-2.0
6 67.98 watt-tool-8B (FC) N/A 1.31 86.56 89.34 76.5 39.12 83.33 83.15 Watt AI Lab Apache-2.0
7 67.88 GPT-4-turbo-2024-04-09 (FC) 33.22 2.47 84.73 85.21 80.5 38.12 72.22 83.81 OpenAI Proprietary
8 67.87 o1-2024-12-17 (Prompt) 102.49 5.36 85.67 87.45 80.63 36 72.22 87.78 OpenAI Proprietary
9 67.72 BitAgent-8B N/A 26.73 86.92 89.52 76.14 38.5 83.33 82.38 Bittensor Apache-2.0
10 65.12 o3-mini-2025-01-31 (Prompt) 8.1 4.86 86.15 89.46 79.08 28.75 72.22 82.96 OpenAI Proprietary
11 64.1 CoALM-405B N/A 6.2 90.58 89.07 74.5 28.75 100 71.79 UIUC + Oumi Meta Llama 3 Community
12 64.1 GPT-4o-mini-2024-07-18 (FC) 0.51 1.49 85.21 83.57 74.41 34.12 83.33 74.75 OpenAI Proprietary
13 63.81 Command A (FC) N/A N/A 86.65 83.38 80.45 25.5 72.22 86.19 Cohere CC-BY-NC 4.0 License (w/ Acceptable Use Addendum)
14 63.53 Qwen2.5-72B-Instruct (FC) N/A 47.29 88.13 87.57 78.98 24.62 70.59 80.31 Qwen qwen
15 62.84 Qwen2.5-32B-Instruct (FC) N/A 21.0 87.21 86.77 79.69 22.25 64.71 81.92 Qwen apache-2.0
16 62.73 Functionary-Medium-v3.1 (FC) N/A 14.06 89.88 91.32 76.63 21.62 72.22 76.08 MeetKai MIT
17 62.51 GPT-4.5-Preview-2025-02-27 (Prompt) 464.51 2.51 89.98 88.5 82.99 15.88 66.67 83.81 OpenAI Proprietary
18 62.38 Gemini-2.0-Pro-Exp-02-05 (Prompt) 0.0 2.13 88.85 83.82 78.23 22.25 72.22 85 Google Proprietary
19 62.21 GPT-4o-mini-2024-07-18 (Prompt) 0.84 1.31 86.77 89.77 76.5 22 83.33 80.67 OpenAI Proprietary
20 61.83 Hammer2.1-7b (FC) N/A 2.08 88.65 85.48 75.11 23.5 82.35 78.59 MadeAgents cc-by-nc-4.0
21 61.55 Gemini-2.0-Pro-Exp-02-05 (FC) 0.0 1.79 83.94 85.11 78.5 20.75 44.44 91.83 Google Proprietary
22 61.38 Amazon-Nova-Pro-v1:0 (FC) 5.26 2.67 84.46 85.64 74.32 26.12 77.78 70.98 Amazon Proprietary
23 61.31 Qwen2.5-72B-Instruct (Prompt) N/A 3.72 90.81 92.7 75.3 18 100 72.81 Qwen qwen
24 60.42 Gemini-2.0-Flash-001 (FC) 0.62 0.7 84.9 81.23 79.12 17.88 50 94.04 Google Proprietary
25 59.67 Qwen2.5-32B-Instruct (Prompt) N/A 2.26 85.81 89.79 74.23 17.75 100 73.75 Qwen apache-2.0
26 59.57 GPT-4-turbo-2024-04-09 (Prompt) 58.87 1.24 90.88 89.45 63.84 30.25 100 35.57 OpenAI Proprietary
27 59.38 Gemini-2.0-Flash-001 (Prompt) 0.75 0.89 84.48 81.54 81.39 12.75 55.56 92.75 Google Proprietary
28 59.22 Gemma-3-27b-it (Prompt) N/A 20.08 89.02 87.2 76.37 14.5 83.33 73.33 Google gemma-terms-of-use
29 59.07 Hammer2.1-3b (FC) N/A 1.95 86.85 84.09 74.04 17.38 82.35 81.87 MadeAgents qwen-research
30 58.93 Gemini-2.0-Flash-Thinking-Exp-01-21 (Prompt) 0.0 4.81 87.4 87.07 75.97 14.5 77.78 72.75 Google Proprietary
31 58.9 Qwen2.5-14B-Instruct (FC) N/A 11.22 85.42 84.86 76.68 15.88 55.56 77.69 Qwen apache-2.0
32 58.55 DeepSeek-V3 (FC) N/A 2.62 89.17 92.32 68.41 18.62 88.89 59.36 DeepSeek DeepSeek License
33 58.45 mistral-large-2407 (FC) 12.68 3.12 86.81 84.38 69.88 23.75 72.22 52.85 Mistral AI Proprietary
34 58.42 ToolACE-8B (FC) N/A 5.24 87.54 89.21 78.59 7.75 83.33 87.88 Huawei Noah & USTC Apache-2.0
35 58.3 Claude-3.7-Sonnet-20250219 (FC) 2.56 2.78 41.29 47.07 78.41 48.38 72.22 81.4 Anthropic Proprietary
36 57.78 xLAM-8x22b-r (FC) N/A 9.26 83.69 87.88 72.59 16.25 88.89 67.81 Salesforce cc-by-nc-4.0
37 57.68 Qwen2.5-14B-Instruct (Prompt) N/A 2.02 85.69 88.84 74.14 12.25 77.78 77.06 Qwen apache-2.0
38 57.53 DeepSeek-R1 (Prompt) N/A 40.78 87.35 88.23 74.41 12.38 66.67 67.54 DeepSeek MIT
39 57.11 Haha-7B N/A 3.76 86.02 86.11 74.19 10.38 83.33 80.66 TeleAI Apache 2.0
40 56.7 Qwen2.5-7B-Instruct (FC) N/A 6.42 85.71 87.73 74.19 11.5 77.78 69.08 Qwen apache-2.0
41 56.49 Functionary-Small-v3.1 (FC) N/A 18.44 86.75 87.12 73.75 10.12 77.78 70.89 MeetKai MIT
42 56.47 Claude-3-Opus-20240229 (FC) 20.15 9.46 57.92 59.46 78.05 30.25 61.11 81.59 Anthropic Proprietary
43 56.46 Claude-3.5-Sonnet-20241022 (FC) 2.53 3.07 45.44 47.89 78.94 41 77.78 74.04 Anthropic Proprietary
44 56.12 Amazon-Nova-Lite-v1:0 (FC) 0.42 1.64 78.44 80.25 71.61 17.38 66.67 76.41 Amazon Proprietary
45 55.65 o1-2024-12-17 (FC) 68.62 4.86 40.23 46.52 78.01 41 72.22 81.97 OpenAI Proprietary
46 55.51 Gemini-2.0-Flash-Lite-001 (Prompt) 0.31 0.68 84.21 79.71 79.61 3.88 66.67 89.09 Google Proprietary
47 55.51 DeepSeek-Coder-V2 (FC) N/A 29.53 89.44 91.23 73.48 4.5 88.89 70.81 DeepSeek DeepSeek License
48 55.11 Gemini-2.0-Flash-Lite-001 (FC) 0.26 0.63 83.73 78.39 77.25 5.5 50 93.21 Google Proprietary
49 54.87 Hammer2.1-1.5b (FC) N/A 2.73 82.79 83.39 70.64 10.5 77.78 79.27 MadeAgents cc-by-nc-4.0
50 54.75 xLAM-7b-r (FC) N/A 12.74 81.06 79.88 75.22 10 94.44 77.11 Salesforce cc-by-nc-4.0
51 54.55 GoGoAgent N/A 2.66 86.23 89.86 74.01 1 77.78 83.12 BitAgent Proprietary
52 54.51 CoALM-8B N/A 3.65 85.25 86.07 72.95 4.5 83.33 85.49 UIUC + Oumi Meta Llama 3 Community
53 54.36 CoALM-70B N/A 8.95 83.08 83.7 71.92 7 66.67 85.63 UIUC + Oumi Meta Llama 3 Community
54 54.31 claude-3.5-haiku-20241022 (Prompt) 0.48 1.84 83.19 84.71 70.77 9.75 77.78 65.78 Anthropic Proprietary
55 54.19 Llama-3.1-70B-Instruct (Prompt) N/A 4.95 89.98 90.12 62.24 12.5 100 54.78 Meta Meta Llama 3 Community
56 53.91 GPT-3.5-Turbo-0125 (FC) 1.38 0.87 83.94 83.79 64.02 19.5 94.44 36.53 OpenAI Proprietary
57 53.69 Qwen2.5-7B-Instruct (Prompt) N/A 4.54 86.46 88.29 67.44 7.62 88.89 65.16 Qwen apache-2.0
58 53.44 Sky-T1-32B-Preview (Prompt) N/A 18.64 88.67 91.54 68 4.63 94.12 61.21 NovaSky-AI apache-2.0
59 53.27 claude-3.5-haiku-20241022 (FC) 0.83 2.83 40.62 50.46 72.37 40 83.33 63.68 Anthropic Proprietary
60 53.06 FireFunction-v2 (FC) N/A 2.13 88.46 87.54 65.66 8.62 94.44 53.02 Fireworks Apache 2.0
61 52.58 xLAM-8x7b-r (FC) N/A 4.96 67.65 74.05 71.08 15.5 94.44 67.15 Salesforce cc-by-nc-4.0
62 52.2 Mistral-Medium-2312 (Prompt) 11.07 4.57 73.12 81.57 77.61 0.38 66.67 85.93 Mistral AI Proprietary
63 52.15 Command R7B (FC) 0.1 1.35 81.67 84.02 69.17 5 55.56 81.02 Cohere cc-by-nc-4.0
64 51.77 Ministral-8B-Instruct-2410 (FC) N/A 12.79 83.83 79.57 64.93 11.38 70.59 55.28 Mistral AI Mistral AI Research License
65 51.76 Meta-Llama-3-70B-Instruct (Prompt) N/A 3.65 87.81 88.21 64.95 5.62 100 50.88 Meta Meta Llama 3 Community
66 51.69 Claude-3.5-Sonnet-20241022 (Prompt) 1.23 1.81 72.48 80 71.97 7.5 77.78 64.4 Anthropic Proprietary
67 51.68 MiniCPM3-4B-FC (FC) N/A 160.19 80.83 87.57 70.01 2.62 72.22 72.22 openbmb Apache-2.0
68 51.43 Llama-3.3-70B-Instruct (Prompt) N/A 6.98 85.08 90.68 62.77 6.87 100 48.71 Meta Meta Llama 3 Community
69 51.37 Amazon-Nova-Micro-v1:0 (FC) 0.23 1.49 71.12 69.23 67.04 16.12 72.22 74.2 Amazon Proprietary
70 51.37 Claude-3-Opus-20240229 (Prompt) 10.48 4.6 85.31 86.32 66.99 7.13 83.33 40.25 Anthropic Proprietary
71 51.31 Open-Mistral-Nemo-2407 (FC) 1.18 1.55 82.1 77.66 65.97 9.38 66.67 63.19 Mistral AI Proprietary
72 51.03 Gemma-3-12b-it (Prompt) N/A 7.9 83.83 82.8 67.13 4.62 88.89 61.11 Google gemma-terms-of-use
73 50.87 Llama-3.1-8B-Instruct (Prompt) N/A 10.3 84.21 86.3 61.08 9.62 77.78 48.82 Meta Meta Llama 3 Community
74 50.75 Qwen2.5-3B-Instruct (FC) N/A 4.24 78.83 78.23 69.39 6 88.89 64.26 Qwen qwen
75 50.59 Claude-3.7-Sonnet-20250219 (Prompt) 1.1 1.7 88 87.5 65.79 0.62 100 52 Anthropic Proprietary
76 50.36 Open-Mixtral-8x22b (Prompt) 12.83 1.36 88.02 87.77 66.02 0.5 83.33 55.09 Mistral AI Proprietary
77 50.2 o3-mini-2025-01-31 (FC) 4.75 4.03 42.12 43.2 77.3 26.12 77.78 80.67 OpenAI Proprietary
78 49.35 Command-R-Plus (FC) 7.8 2.58 77.02 81.21 59 13.12 72.22 53.16 Cohere For AI cc-by-nc-4.0
79 49.31 Granite-20b-FunctionCalling (FC) N/A 1.84 82.46 86.36 59.66 3.38 88.89 74.82 IBM Apache-2.0
80 48.72 Qwen2.5-1.5B-Instruct (FC) N/A 2.83 79.1 82.12 64.82 2.5 94.44 62.68 Qwen apache-2.0
81 48.32 GPT-3.5-Turbo-0125 (Prompt) 2.16 0.72 72.85 70.39 68.55 5.62 94.44 58.39 OpenAI Proprietary
82 47.3 Falcon3-10B-Instruct (FC) N/A 8.92 84.62 90.91 54.11 5 94.44 31.89 TII UAE falcon-llm-license
83 47.29 Hermes-2-Pro-Llama-3-8B (FC) N/A 3.97 76.79 76.23 64.95 2.38 44.44 60.78 NousResearch apache-2.0
84 47.27 mistral-large-2407 (Prompt) 24.91 3.32 90.54 90.12 52.82 8.38 100 4.35 Mistral AI Proprietary
85 47.09 Qwen2.5-3B-Instruct (Prompt) N/A 1.03 80.79 81.71 58.69 3.38 88.89 54.19 Qwen qwen
86 46.92 Llama-3.2-3B-Instruct (Prompt) N/A 7.79 80.56 83.7 55.8 5.25 88.89 51.69 Meta Meta Llama 3 Community
87 46.71 Qwen2.5-1.5B-Instruct (Prompt) N/A 2.51 73.37 85.61 61.08 1.12 83.33 63.04 Qwen apache-2.0
88 45.68 Falcon3-7B-Instruct (FC) N/A 11.24 82.31 86.62 54.86 3.38 88.89 33.73 TII UAE falcon-llm-license
89 45.28 Hammer2.1-0.5b (FC) N/A 1.29 69.12 70.46 62.91 2.25 77.78 73.94 MadeAgents cc-by-nc-4.0
90 44.79 Mistral-small-2402 (FC) 3.36 1.73 59.15 53.84 72.19 2.62 77.78 80.86 Mistral AI Proprietary
91 43.15 Hermes-2-Pro-Mistral-7B (FC) N/A 10.63 73.06 76 57.71 2.63 66.67 38.88 NousResearch apache-2.0
92 43.09 Open-Mixtral-8x7b (Prompt) 2.74 1.73 63.58 69.61 61.44 1.5 88.89 59.52 Mistral AI Proprietary
93 43.02 Open-Mixtral-8x22b (FC) 7.0 2.63 61.67 63.64 68.64 1.5 83.33 45.71 Mistral AI Proprietary
94 42.56 Open-Mistral-Nemo-2407 (Prompt) 1.79 1.65 86.12 89.07 49.04 0.25 88.89 6.43 Mistral AI Proprietary
95 42.3 Qwen2-7B-Instruct (Prompt) N/A 3.99 76.65 76.8 50.6 3.25 88.89 39 Qwen apache-2.0
96 41.61 Bielik-11B-v2.3-Instruct (Prompt) N/A 4.87 65.04 65.16 58.91 3.75 77.78 40.58 SpeakLeash & ACK Cyfronet AGH Apache 2.0
97 40.96 DBRX-Instruct (Prompt) 8.49 3.74 61.25 69.14 60.28 0 94.44 40.5 Databricks Databricks Open Model
98 39.98 FireFunction-v1 (FC) N/A 2.27 43 44.57 70.46 2.38 94.44 71.8 Fireworks Apache 2.0
99 39.27 xLAM-7b-fc-r (FC) N/A 6.26 72.08 60.63 53.4 0 77.78 44.95 Salesforce cc-by-nc-4.0
100 38.96 GLM-4-9b-Chat (FC) N/A 6.09 36.67 46 66.81 3.5 66.67 79.71 THUDM glm-4
101 38.59 MiniCPM3-4B (Prompt) N/A 20.78 65.88 50.59 54.46 2 50 74.43 openbmb Apache-2.0
102 37.86 Gemma-3-4b-it (Prompt) N/A 12.44 63.33 47.66 59.17 0.12 77.78 48.14 Google gemma-terms-of-use
103 36.93 Nexusflow-Raven-v2 (FC) N/A 1.13 45.88 59.11 54.2 1 61.11 78.53 Nexusflow Apache 2.0
104 36.12 Qwen2.5-0.5B-Instruct (FC) N/A 2.53 62.29 60.93 47.98 1.25 88.89 46.23 Qwen apache-2.0
105 35.43 Meta-Llama-3-8B-Instruct (Prompt) N/A 6.03 60.79 66.43 47.98 0.75 77.78 18.59 Meta Meta Llama 3 Community
106 31.26 Mistral-Small-2402 (Prompt) 3.91 1.57 26.94 30.36 58.77 0.75 44.44 69.74 Mistral AI Proprietary
107 29.98 Falcon3-3B-Instruct (FC) N/A 11.52 53.25 32.66 47.4 0.5 77.78 34.47 TII UAE falcon-llm-license
108 29.28 Qwen2-1.5B-Instruct (Prompt) N/A 3.09 54.29 52.39 39.05 0.5 94.44 21.19 Qwen apache-2.0
109 28.06 Qwen2.5-0.5B-Instruct (Prompt) N/A 0.95 53.19 61.89 31.59 0 94.44 16.44 Qwen apache-2.0
110 27.59 Llama-3.1-8B-Instruct (FC) N/A 5.79 48.21 50.18 33.5 5.38 94.44 4.86 Meta Meta Llama 3 Community
111 27.14 Llama-3.1-70B-Instruct (FC) N/A 5.44 25.29 31.62 45 4.88 100 44.85 Meta Meta Llama 3 Community
112 24.95 xLAM-1b-fc-r (FC) N/A 6.26 41.17 42.95 36.92 0.12 100 6.69 Salesforce cc-by-nc-4.0
113 22.43 DeepSeek-Coder-V2-Lite-Instruct (FC) N/A 13.9 4.88 33.18 39.4 0.12 0 96.31 DeepSeek DeepSeek License
114 20.59 Llama-3.2-1B-Instruct (Prompt) N/A 6.08 28.44 25.27 31.36 0 38.89 59.7 Meta Meta Llama 3 Community
115 17.59 QwQ-32B-Preview (Prompt) N/A 23.37 1.48 0.5 40.78 0 0 99.32 Qwen apache-2.0
116 17.49 Falcon3-1B-Instruct (FC) N/A 2.2 9.02 11.61 32.7 0 0 87.16 TII UAE falcon-llm-license
117 16.6 Gemma-3-1b-it (Prompt) N/A 2.47 21.5 20.5 30.34 0 50 30.92 Google gemma-terms-of-use
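
The rows above are separated by single spaces, and model names can themselves contain spaces, so splitting on whitespace alone is not enough. The following is a minimal sketch of one way to read a row, assuming the column order shown above. `LeaderboardRow` and `LeaderboardParser` are hypothetical names, and the organization and license are kept as one string because the boundary between them cannot be recovered from spaces.

```csharp
#nullable enable
using System.Globalization;
using System.Text.RegularExpressions;

// One leaderboard row. Cost and LatencyMean are null where the table says N/A.
public record LeaderboardRow(
    int Rank, double OverallAcc, string Model, double? Cost, double? LatencyMean,
    double NonLiveAst, double NonLiveExec, double LiveAcc, double MultiTurnAcc,
    double Relevance, double Irrelevance, string OrgAndLicense);

public static class LeaderboardParser
{
    // Rank, overall accuracy, model name (lazy, may contain spaces), cost,
    // latency, six numeric scores, then the rest of the line.
    private static readonly Regex RowPattern = new Regex(
        @"^(\d+) ([\d.]+) (.+?) (N/A|[\d.]+) (N/A|[\d.]+)" +
        @" ([\d.]+) ([\d.]+) ([\d.]+) ([\d.]+) ([\d.]+) ([\d.]+) (.+)$");

    // Returns null when the line does not look like a leaderboard row.
    public static LeaderboardRow? Parse(string line)
    {
        Match m = RowPattern.Match(line.Trim());
        if (!m.Success) return null;

        double Num(int g) => double.Parse(m.Groups[g].Value, CultureInfo.InvariantCulture);
        double? Opt(int g) => m.Groups[g].Value == "N/A" ? (double?)null : Num(g);

        return new LeaderboardRow(
            int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture),
            Num(2), m.Groups[3].Value, Opt(4), Opt(5),
            Num(6), Num(7), Num(8), Num(9), Num(10), Num(11),
            m.Groups[12].Value);
    }
}
```

With the rows parsed, it becomes straightforward to sort or filter on a single column, for example by `MultiTurnAcc`, where the spread between models is much wider than in the single-turn columns.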

 

This fantastic resource is very valuable for all AI Researchers.

Here is a simple C# code sample that runs a set of test queries against watt-tool-8B:

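// Required namespaces (place these at the top of the file):
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

/// <summary>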
/// Test the LLM with tools.
/// </summary>
private async Task TestLLMWithTools()
{

    // Init a new Tool Registry:
    ToolRegistry toolRegistry = new ToolRegistry();

    // Init a new LlmClient:
    LlmClient llmClient = new LlmClient("http://192.168.0.2:11434", "watt-tool-8B", toolRegistry);

    List<string> queries = new List<string>();

    // search_the_web questions
    queries.Add("What are the top 5 AI companies in 2025?");
    queries.Add("Search for recent breakthroughs in quantum computing");
    queries.Add("Find me information about Mars colonization plans");

    // key word extraction questions
    queries.Add("What are the key words in the sentence: Send an email to john@example.com with subject 'Meeting' and body 'Let's meet tomorrow'");
    queries.Add("What are the key words in: Can you tell me the system's public IP?");
    queries.Add("What key words can you find in: Please show me my IP address");

    // get_ip_address questions
    queries.Add("What's my current IP address?");
    queries.Add("Can you tell me the system's public IP?");
    queries.Add("Please show me my IP address");

    // send_email questions
    queries.Add("Send an email to john@example.com with subject 'Meeting' and body 'Let's meet tomorrow'");
    queries.Add("Email sarah@test.com with subject 'Report' and body 'Here is the latest report'");
    queries.Add("Please send an email to mike@work.com, subject 'Update', body 'Project is on track'");

    // financial_ratios.interest_coverage questions
    queries.Add("What's Tesla's interest coverage ratio for the past 3 years?");
    queries.Add("Calculate Apple's interest coverage ratio over 5 years");
    queries.Add("Show me Microsoft's interest coverage ratio for the last 2 years");

    // sales_growth.calculate questions
    queries.Add("What's the sales growth rate for Amazon over the past 4 years?");
    queries.Add("Calculate Google's sales growth for the last 3 years");
    queries.Add("Find the sales growth rate of Netflix for the past 5 years");

    // weather_forecast questions
    queries.Add("What's the weather forecast for New York for the next 3 days?");
    queries.Add("Tell me the 5-day weather forecast for London");
    queries.Add("Show me the weather prediction for Tokyo for the next 2 days");

    // get_current_time questions
    queries.Add("What's the current time in ISO format?");
    queries.Add("Show me the time in Unix format");
    queries.Add("What's the present time in ISO format?");


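    // Send each query to the model and show the replies.
    foreach (string query in queries)
    {
        // Post the query to the chat endpoint and get the raw JSON reply.
        string json = await llmClient.CallLLMWithToolsAsync("/api/chat", query);

        // Deserialize the reply; skip it if there is no message content.
        Response response = JsonSerializer.Deserialize<Response>(json);
        if (response?.Message?.Content == null)
        {
            continue;
        }

        // Append rather than assign, so earlier replies are not overwritten.
        textBox1.AppendText(response.Message.Content + Environment.NewLine);
    }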
}

 

This small test gives me a 100% success rate, which is fantastic for a small model like this!
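
A success rate like this can be computed automatically rather than read off the text box. The sketch below assumes the model replies with calls in a bracketed form such as `[get_current_time(format='iso')]`; that format, and the names `ToolCallScoring`, `FirstFunctionName`, and `SuccessRate`, are assumptions for illustration and are not part of the code above. It only checks that the first call names the expected tool, not that the arguments are right.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ToolCallScoring
{
    // Matches the name of the first call in a reply shaped like "[name(...)]".
    // Dots are allowed so names like financial_ratios.interest_coverage match.
    private static readonly Regex FirstCall =
        new Regex(@"^\s*\[\s*([A-Za-z_][\w.]*)\s*\(");

    // Returns the first function name in the reply, or "" if none is found.
    public static string FirstFunctionName(string content)
    {
        Match m = FirstCall.Match(content);
        return m.Success ? m.Groups[1].Value : "";
    }

    // Percentage of replies whose first call names the expected tool.
    public static double SuccessRate(
        IEnumerable<(string ExpectedTool, string Content)> results)
    {
        var list = results.ToList();
        if (list.Count == 0) return 0.0;

        int hits = list.Count(r => FirstFunctionName(r.Content) == r.ExpectedTool);
        return 100.0 * hits / list.Count;
    }
}
```

To use it, pair each query with the tool named in its comment group (for example `send_email` or `weather_forecast`), collect the reply text for each, and pass the pairs to `SuccessRate`.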

Note: The Watt models do not support Ollama's tool calls. However, they can be used like any standard chat model, and function calling works through ordinary chat messages.
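
As a rough illustration of the chat-only approach, the sketch below builds a request body for Ollama's `/api/chat` endpoint, with the tool descriptions placed in the system message instead of a `tools` field. The `model`, `messages`, and `stream` fields follow Ollama's chat API; the system prompt wording, the `toolsJson` parameter, and the `ChatRequestBuilder` name are assumptions for illustration and may differ from what `LlmClient` does internally.

```csharp
using System.Text.Json;

public static class ChatRequestBuilder
{
    // Builds a JSON body for POST /api/chat. The tools are described in the
    // system message, so the model sees them as ordinary chat text.
    public static string Build(string model, string toolsJson, string userQuery)
    {
        // Hypothetical prompt wording; adjust it to the model's documented format.
        string systemPrompt =
            "You can call the following functions, given in JSON format. " +
            "Reply only with calls in the form [func_name(param=value, ...)].\n" +
            toolsJson;

        var body = new
        {
            model,
            messages = new[]
            {
                new { role = "system", content = systemPrompt },
                new { role = "user", content = userQuery }
            },
            stream = false // ask for one complete JSON reply instead of chunks
        };

        return JsonSerializer.Serialize(body);
    }
}
```

The reply's `message.content` field then carries the function call as plain text, which the application has to parse and dispatch itself.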