The Berkeley Function Calling Leaderboard V3 (also called the Berkeley Tool Calling Leaderboard V3) evaluates an LLM's ability to call functions (also known as tools) accurately. The leaderboard is built from real-world data and is updated periodically. For more information on the evaluation dataset and methodology, please refer to the BFCL blog posts: BFCL-v1, which introduced AST as an evaluation metric; BFCL-v2, which introduced enterprise and OSS-contributed functions; and BFCL-v3, which introduced multi-turn interactions. Check out the code and data.
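To make the single-turn metrics concrete: the AST checks compare the structure of the model's generated call (function name and argument values) against a ground-truth call, rather than executing it. Below is a minimal, hypothetical C# sketch of that idea; the real BFCL harness is a Python code base, so every name here is illustrative only.

// Illustrative AST-style check: the model's proposed call is compared
// structurally (function name + argument values) against the expected call.
// FunctionCall and AstChecker are hypothetical names, not part of BFCL.
using System;
using System.Collections.Generic;
using System.Linq;

public record FunctionCall(string Name, Dictionary<string, object> Arguments);

public static class AstChecker
{
    // True when the model called the right function with the expected arguments.
    public static bool Matches(FunctionCall expected, FunctionCall actual)
    {
        if (!string.Equals(expected.Name, actual.Name, StringComparison.Ordinal))
            return false;

        // Every expected argument must be present with an equal (stringified) value.
        return expected.Arguments.All(kv =>
            actual.Arguments.TryGetValue(kv.Key, out var value) &&
            Equals(kv.Value?.ToString(), value?.ToString()));
    }
}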
As of 27 March 2025, the chart leaders are:
Rank | Overall Acc | Model | Cost ($) | Latency Mean (s) | Single Turn Non-live (AST) Summary | Single Turn Non-live (Exec) Summary | Single Turn Live (AST) Overall Acc | Multi Turn Overall Acc | Hallucination Relevance | Hallucination Irrelevance | Organization | License
---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 74.31 | watt-tool-70B (FC) | N/A | 3.4 | 84.06 | 89.39 | 77.74 | 58.75 | 94.44 | 76.32 | Watt AI Lab | Apache-2.0 |
2 | 72.08 | GPT-4o-2024-11-20 (Prompt) | 13.54 | 0.78 | 88.1 | 89.38 | 79.83 | 47.62 | 83.33 | 83.76 | OpenAI | Proprietary |
3 | 69.94 | GPT-4.5-Preview-2025-02-27 (FC) | 238.13 | 3.25 | 86.12 | 83.98 | 79.34 | 45.25 | 66.67 | 83.64 | OpenAI | Proprietary |
4 | 69.58 | GPT-4o-2024-11-20 (FC) | 8.23 | 1.11 | 87.42 | 89.2 | 79.65 | 41 | 83.33 | 83.15 | OpenAI | Proprietary |
5 | 68.39 | ToolACE-2-8B (FC) | N/A | 6.95 | 87.58 | 87.11 | 80.05 | 36.88 | 72.22 | 90.11 | Huawei Noah & USTC | Apache-2.0 |
6 | 67.98 | watt-tool-8B (FC) | N/A | 1.31 | 86.56 | 89.34 | 76.5 | 39.12 | 83.33 | 83.15 | Watt AI Lab | Apache-2.0 |
7 | 67.88 | GPT-4-turbo-2024-04-09 (FC) | 33.22 | 2.47 | 84.73 | 85.21 | 80.5 | 38.12 | 72.22 | 83.81 | OpenAI | Proprietary |
8 | 67.87 | o1-2024-12-17 (Prompt) | 102.49 | 5.36 | 85.67 | 87.45 | 80.63 | 36 | 72.22 | 87.78 | OpenAI | Proprietary |
9 | 67.72 | BitAgent-8B | N/A | 26.73 | 86.92 | 89.52 | 76.14 | 38.5 | 83.33 | 82.38 | Bittensor | Apache-2.0 |
10 | 65.12 | o3-mini-2025-01-31 (Prompt) | 8.1 | 4.86 | 86.15 | 89.46 | 79.08 | 28.75 | 72.22 | 82.96 | OpenAI | Proprietary |
11 | 64.1 | CoALM-405B | N/A | 6.2 | 90.58 | 89.07 | 74.5 | 28.75 | 100 | 71.79 | UIUC + Oumi | Meta Llama 3 Community |
12 | 64.1 | GPT-4o-mini-2024-07-18 (FC) | 0.51 | 1.49 | 85.21 | 83.57 | 74.41 | 34.12 | 83.33 | 74.75 | OpenAI | Proprietary |
13 | 63.81 | Command A (FC) | N/A | N/A | 86.65 | 83.38 | 80.45 | 25.5 | 72.22 | 86.19 | Cohere | CC-BY-NC 4.0 License (w/ Acceptable Use Addendum) |
14 | 63.53 | Qwen2.5-72B-Instruct (FC) | N/A | 47.29 | 88.13 | 87.57 | 78.98 | 24.62 | 70.59 | 80.31 | Qwen | qwen |
15 | 62.84 | Qwen2.5-32B-Instruct (FC) | N/A | 21.0 | 87.21 | 86.77 | 79.69 | 22.25 | 64.71 | 81.92 | Qwen | apache-2.0 |
16 | 62.73 | Functionary-Medium-v3.1 (FC) | N/A | 14.06 | 89.88 | 91.32 | 76.63 | 21.62 | 72.22 | 76.08 | MeetKai | MIT |
17 | 62.51 | GPT-4.5-Preview-2025-02-27 (Prompt) | 464.51 | 2.51 | 89.98 | 88.5 | 82.99 | 15.88 | 66.67 | 83.81 | OpenAI | Proprietary |
18 | 62.38 | Gemini-2.0-Pro-Exp-02-05 (Prompt) | 0.0 | 2.13 | 88.85 | 83.82 | 78.23 | 22.25 | 72.22 | 85 | Google | Proprietary |
19 | 62.21 | GPT-4o-mini-2024-07-18 (Prompt) | 0.84 | 1.31 | 86.77 | 89.77 | 76.5 | 22 | 83.33 | 80.67 | OpenAI | Proprietary |
20 | 61.83 | Hammer2.1-7b (FC) | N/A | 2.08 | 88.65 | 85.48 | 75.11 | 23.5 | 82.35 | 78.59 | MadeAgents | cc-by-nc-4.0 |
21 | 61.55 | Gemini-2.0-Pro-Exp-02-05 (FC) | 0.0 | 1.79 | 83.94 | 85.11 | 78.5 | 20.75 | 44.44 | 91.83 | Google | Proprietary |
22 | 61.38 | Amazon-Nova-Pro-v1:0 (FC) | 5.26 | 2.67 | 84.46 | 85.64 | 74.32 | 26.12 | 77.78 | 70.98 | Amazon | Proprietary |
23 | 61.31 | Qwen2.5-72B-Instruct (Prompt) | N/A | 3.72 | 90.81 | 92.7 | 75.3 | 18 | 100 | 72.81 | Qwen | qwen |
24 | 60.42 | Gemini-2.0-Flash-001 (FC) | 0.62 | 0.7 | 84.9 | 81.23 | 79.12 | 17.88 | 50 | 94.04 | Google | Proprietary |
25 | 59.67 | Qwen2.5-32B-Instruct (Prompt) | N/A | 2.26 | 85.81 | 89.79 | 74.23 | 17.75 | 100 | 73.75 | Qwen | apache-2.0 |
26 | 59.57 | GPT-4-turbo-2024-04-09 (Prompt) | 58.87 | 1.24 | 90.88 | 89.45 | 63.84 | 30.25 | 100 | 35.57 | OpenAI | Proprietary |
27 | 59.38 | Gemini-2.0-Flash-001 (Prompt) | 0.75 | 0.89 | 84.48 | 81.54 | 81.39 | 12.75 | 55.56 | 92.75 | Google | Proprietary |
28 | 59.22 | Gemma-3-27b-it (Prompt) | N/A | 20.08 | 89.02 | 87.2 | 76.37 | 14.5 | 83.33 | 73.33 | Google | gemma-terms-of-use |
29 | 59.07 | Hammer2.1-3b (FC) | N/A | 1.95 | 86.85 | 84.09 | 74.04 | 17.38 | 82.35 | 81.87 | MadeAgents | qwen-research |
30 | 58.93 | Gemini-2.0-Flash-Thinking-Exp-01-21 (Prompt) | 0.0 | 4.81 | 87.4 | 87.07 | 75.97 | 14.5 | 77.78 | 72.75 | Google | Proprietary |
31 | 58.9 | Qwen2.5-14B-Instruct (FC) | N/A | 11.22 | 85.42 | 84.86 | 76.68 | 15.88 | 55.56 | 77.69 | Qwen | apache-2.0 |
32 | 58.55 | DeepSeek-V3 (FC) | N/A | 2.62 | 89.17 | 92.32 | 68.41 | 18.62 | 88.89 | 59.36 | DeepSeek | DeepSeek License |
33 | 58.45 | mistral-large-2407 (FC) | 12.68 | 3.12 | 86.81 | 84.38 | 69.88 | 23.75 | 72.22 | 52.85 | Mistral AI | Proprietary |
34 | 58.42 | ToolACE-8B (FC) | N/A | 5.24 | 87.54 | 89.21 | 78.59 | 7.75 | 83.33 | 87.88 | Huawei Noah & USTC | Apache-2.0 |
35 | 58.3 | Claude-3.7-Sonnet-20250219 (FC) | 2.56 | 2.78 | 41.29 | 47.07 | 78.41 | 48.38 | 72.22 | 81.4 | Anthropic | Proprietary |
36 | 57.78 | xLAM-8x22b-r (FC) | N/A | 9.26 | 83.69 | 87.88 | 72.59 | 16.25 | 88.89 | 67.81 | Salesforce | cc-by-nc-4.0 |
37 | 57.68 | Qwen2.5-14B-Instruct (Prompt) | N/A | 2.02 | 85.69 | 88.84 | 74.14 | 12.25 | 77.78 | 77.06 | Qwen | apache-2.0 |
38 | 57.53 | DeepSeek-R1 (Prompt) | N/A | 40.78 | 87.35 | 88.23 | 74.41 | 12.38 | 66.67 | 67.54 | DeepSeek | MIT |
39 | 57.11 | Haha-7B | N/A | 3.76 | 86.02 | 86.11 | 74.19 | 10.38 | 83.33 | 80.66 | TeleAI | Apache 2.0 |
40 | 56.7 | Qwen2.5-7B-Instruct (FC) | N/A | 6.42 | 85.71 | 87.73 | 74.19 | 11.5 | 77.78 | 69.08 | Qwen | apache-2.0 |
41 | 56.49 | Functionary-Small-v3.1 (FC) | N/A | 18.44 | 86.75 | 87.12 | 73.75 | 10.12 | 77.78 | 70.89 | MeetKai | MIT |
42 | 56.47 | Claude-3-Opus-20240229 (FC) | 20.15 | 9.46 | 57.92 | 59.46 | 78.05 | 30.25 | 61.11 | 81.59 | Anthropic | Proprietary |
43 | 56.46 | Claude-3.5-Sonnet-20241022 (FC) | 2.53 | 3.07 | 45.44 | 47.89 | 78.94 | 41 | 77.78 | 74.04 | Anthropic | Proprietary |
44 | 56.12 | Amazon-Nova-Lite-v1:0 (FC) | 0.42 | 1.64 | 78.44 | 80.25 | 71.61 | 17.38 | 66.67 | 76.41 | Amazon | Proprietary |
45 | 55.65 | o1-2024-12-17 (FC) | 68.62 | 4.86 | 40.23 | 46.52 | 78.01 | 41 | 72.22 | 81.97 | OpenAI | Proprietary |
46 | 55.51 | Gemini-2.0-Flash-Lite-001 (Prompt) | 0.31 | 0.68 | 84.21 | 79.71 | 79.61 | 3.88 | 66.67 | 89.09 | Google | Proprietary |
47 | 55.51 | DeepSeek-Coder-V2 (FC) | N/A | 29.53 | 89.44 | 91.23 | 73.48 | 4.5 | 88.89 | 70.81 | DeepSeek | DeepSeek License |
48 | 55.11 | Gemini-2.0-Flash-Lite-001 (FC) | 0.26 | 0.63 | 83.73 | 78.39 | 77.25 | 5.5 | 50 | 93.21 | Google | Proprietary |
49 | 54.87 | Hammer2.1-1.5b (FC) | N/A | 2.73 | 82.79 | 83.39 | 70.64 | 10.5 | 77.78 | 79.27 | MadeAgents | cc-by-nc-4.0 |
50 | 54.75 | xLAM-7b-r (FC) | N/A | 12.74 | 81.06 | 79.88 | 75.22 | 10 | 94.44 | 77.11 | Salesforce | cc-by-nc-4.0 |
51 | 54.55 | GoGoAgent | N/A | 2.66 | 86.23 | 89.86 | 74.01 | 1 | 77.78 | 83.12 | BitAgent | Proprietary |
52 | 54.51 | CoALM-8B | N/A | 3.65 | 85.25 | 86.07 | 72.95 | 4.5 | 83.33 | 85.49 | UIUC + Oumi | Meta Llama 3 Community |
53 | 54.36 | CoALM-70B | N/A | 8.95 | 83.08 | 83.7 | 71.92 | 7 | 66.67 | 85.63 | UIUC + Oumi | Meta Llama 3 Community |
54 | 54.31 | claude-3.5-haiku-20241022 (Prompt) | 0.48 | 1.84 | 83.19 | 84.71 | 70.77 | 9.75 | 77.78 | 65.78 | Anthropic | Proprietary |
55 | 54.19 | Llama-3.1-70B-Instruct (Prompt) | N/A | 4.95 | 89.98 | 90.12 | 62.24 | 12.5 | 100 | 54.78 | Meta | Meta Llama 3 Community |
56 | 53.91 | GPT-3.5-Turbo-0125 (FC) | 1.38 | 0.87 | 83.94 | 83.79 | 64.02 | 19.5 | 94.44 | 36.53 | OpenAI | Proprietary |
57 | 53.69 | Qwen2.5-7B-Instruct (Prompt) | N/A | 4.54 | 86.46 | 88.29 | 67.44 | 7.62 | 88.89 | 65.16 | Qwen | apache-2.0 |
58 | 53.44 | Sky-T1-32B-Preview (Prompt) | N/A | 18.64 | 88.67 | 91.54 | 68 | 4.63 | 94.12 | 61.21 | NovaSky-AI | apache-2.0 |
59 | 53.27 | claude-3.5-haiku-20241022 (FC) | 0.83 | 2.83 | 40.62 | 50.46 | 72.37 | 40 | 83.33 | 63.68 | Anthropic | Proprietary |
60 | 53.06 | FireFunction-v2 (FC) | N/A | 2.13 | 88.46 | 87.54 | 65.66 | 8.62 | 94.44 | 53.02 | Fireworks | Apache 2.0 |
61 | 52.58 | xLAM-8x7b-r (FC) | N/A | 4.96 | 67.65 | 74.05 | 71.08 | 15.5 | 94.44 | 67.15 | Salesforce | cc-by-nc-4.0 |
62 | 52.2 | Mistral-Medium-2312 (Prompt) | 11.07 | 4.57 | 73.12 | 81.57 | 77.61 | 0.38 | 66.67 | 85.93 | Mistral AI | Proprietary |
63 | 52.15 | Command R7B (FC) | 0.1 | 1.35 | 81.67 | 84.02 | 69.17 | 5 | 55.56 | 81.02 | Cohere | cc-by-nc-4.0 |
64 | 51.77 | Ministral-8B-Instruct-2410 (FC) | N/A | 12.79 | 83.83 | 79.57 | 64.93 | 11.38 | 70.59 | 55.28 | Mistral AI | Mistral AI Research License |
65 | 51.76 | Meta-Llama-3-70B-Instruct (Prompt) | N/A | 3.65 | 87.81 | 88.21 | 64.95 | 5.62 | 100 | 50.88 | Meta | Meta Llama 3 Community |
66 | 51.69 | Claude-3.5-Sonnet-20241022 (Prompt) | 1.23 | 1.81 | 72.48 | 80 | 71.97 | 7.5 | 77.78 | 64.4 | Anthropic | Proprietary |
67 | 51.68 | MiniCPM3-4B-FC (FC) | N/A | 160.19 | 80.83 | 87.57 | 70.01 | 2.62 | 72.22 | 72.22 | openbmb | Apache-2.0 |
68 | 51.43 | Llama-3.3-70B-Instruct (Prompt) | N/A | 6.98 | 85.08 | 90.68 | 62.77 | 6.87 | 100 | 48.71 | Meta | Meta Llama 3 Community |
69 | 51.37 | Amazon-Nova-Micro-v1:0 (FC) | 0.23 | 1.49 | 71.12 | 69.23 | 67.04 | 16.12 | 72.22 | 74.2 | Amazon | Proprietary |
70 | 51.37 | Claude-3-Opus-20240229 (Prompt) | 10.48 | 4.6 | 85.31 | 86.32 | 66.99 | 7.13 | 83.33 | 40.25 | Anthropic | Proprietary |
71 | 51.31 | Open-Mistral-Nemo-2407 (FC) | 1.18 | 1.55 | 82.1 | 77.66 | 65.97 | 9.38 | 66.67 | 63.19 | Mistral AI | Proprietary |
72 | 51.03 | Gemma-3-12b-it (Prompt) | N/A | 7.9 | 83.83 | 82.8 | 67.13 | 4.62 | 88.89 | 61.11 | Google | gemma-terms-of-use |
73 | 50.87 | Llama-3.1-8B-Instruct (Prompt) | N/A | 10.3 | 84.21 | 86.3 | 61.08 | 9.62 | 77.78 | 48.82 | Meta | Meta Llama 3 Community |
74 | 50.75 | Qwen2.5-3B-Instruct (FC) | N/A | 4.24 | 78.83 | 78.23 | 69.39 | 6 | 88.89 | 64.26 | Qwen | qwen |
75 | 50.59 | Claude-3.7-Sonnet-20250219 (Prompt) | 1.1 | 1.7 | 88 | 87.5 | 65.79 | 0.62 | 100 | 52 | Anthropic | Proprietary |
76 | 50.36 | Open-Mixtral-8x22b (Prompt) | 12.83 | 1.36 | 88.02 | 87.77 | 66.02 | 0.5 | 83.33 | 55.09 | Mistral AI | Proprietary |
77 | 50.2 | o3-mini-2025-01-31 (FC) | 4.75 | 4.03 | 42.12 | 43.2 | 77.3 | 26.12 | 77.78 | 80.67 | OpenAI | Proprietary |
78 | 49.35 | Command-R-Plus (FC) | 7.8 | 2.58 | 77.02 | 81.21 | 59 | 13.12 | 72.22 | 53.16 | Cohere For AI | cc-by-nc-4.0 |
79 | 49.31 | Granite-20b-FunctionCalling (FC) | N/A | 1.84 | 82.46 | 86.36 | 59.66 | 3.38 | 88.89 | 74.82 | IBM | Apache-2.0 |
80 | 48.72 | Qwen2.5-1.5B-Instruct (FC) | N/A | 2.83 | 79.1 | 82.12 | 64.82 | 2.5 | 94.44 | 62.68 | Qwen | apache-2.0 |
81 | 48.32 | GPT-3.5-Turbo-0125 (Prompt) | 2.16 | 0.72 | 72.85 | 70.39 | 68.55 | 5.62 | 94.44 | 58.39 | OpenAI | Proprietary |
82 | 47.3 | Falcon3-10B-Instruct (FC) | N/A | 8.92 | 84.62 | 90.91 | 54.11 | 5 | 94.44 | 31.89 | TII UAE | falcon-llm-license |
83 | 47.29 | Hermes-2-Pro-Llama-3-8B (FC) | N/A | 3.97 | 76.79 | 76.23 | 64.95 | 2.38 | 44.44 | 60.78 | NousResearch | apache-2.0 |
84 | 47.27 | mistral-large-2407 (Prompt) | 24.91 | 3.32 | 90.54 | 90.12 | 52.82 | 8.38 | 100 | 4.35 | Mistral AI | Proprietary |
85 | 47.09 | Qwen2.5-3B-Instruct (Prompt) | N/A | 1.03 | 80.79 | 81.71 | 58.69 | 3.38 | 88.89 | 54.19 | Qwen | qwen |
86 | 46.92 | Llama-3.2-3B-Instruct (Prompt) | N/A | 7.79 | 80.56 | 83.7 | 55.8 | 5.25 | 88.89 | 51.69 | Meta | Meta Llama 3 Community |
87 | 46.71 | Qwen2.5-1.5B-Instruct (Prompt) | N/A | 2.51 | 73.37 | 85.61 | 61.08 | 1.12 | 83.33 | 63.04 | Qwen | apache-2.0 |
88 | 45.68 | Falcon3-7B-Instruct (FC) | N/A | 11.24 | 82.31 | 86.62 | 54.86 | 3.38 | 88.89 | 33.73 | TII UAE | falcon-llm-license |
89 | 45.28 | Hammer2.1-0.5b (FC) | N/A | 1.29 | 69.12 | 70.46 | 62.91 | 2.25 | 77.78 | 73.94 | MadeAgents | cc-by-nc-4.0 |
90 | 44.79 | Mistral-small-2402 (FC) | 3.36 | 1.73 | 59.15 | 53.84 | 72.19 | 2.62 | 77.78 | 80.86 | Mistral AI | Proprietary |
91 | 43.15 | Hermes-2-Pro-Mistral-7B (FC) | N/A | 10.63 | 73.06 | 76 | 57.71 | 2.63 | 66.67 | 38.88 | NousResearch | apache-2.0 |
92 | 43.09 | Open-Mixtral-8x7b (Prompt) | 2.74 | 1.73 | 63.58 | 69.61 | 61.44 | 1.5 | 88.89 | 59.52 | Mistral AI | Proprietary |
93 | 43.02 | Open-Mixtral-8x22b (FC) | 7.0 | 2.63 | 61.67 | 63.64 | 68.64 | 1.5 | 83.33 | 45.71 | Mistral AI | Proprietary |
94 | 42.56 | Open-Mistral-Nemo-2407 (Prompt) | 1.79 | 1.65 | 86.12 | 89.07 | 49.04 | 0.25 | 88.89 | 6.43 | Mistral AI | Proprietary |
95 | 42.3 | Qwen2-7B-Instruct (Prompt) | N/A | 3.99 | 76.65 | 76.8 | 50.6 | 3.25 | 88.89 | 39 | Qwen | apache-2.0 |
96 | 41.61 | Bielik-11B-v2.3-Instruct (Prompt) | N/A | 4.87 | 65.04 | 65.16 | 58.91 | 3.75 | 77.78 | 40.58 | SpeakLeash & ACK Cyfronet AGH | Apache 2.0 |
97 | 40.96 | DBRX-Instruct (Prompt) | 8.49 | 3.74 | 61.25 | 69.14 | 60.28 | 0 | 94.44 | 40.5 | Databricks | Databricks Open Model |
98 | 39.98 | FireFunction-v1 (FC) | N/A | 2.27 | 43 | 44.57 | 70.46 | 2.38 | 94.44 | 71.8 | Fireworks | Apache 2.0 |
99 | 39.27 | xLAM-7b-fc-r (FC) | N/A | 6.26 | 72.08 | 60.63 | 53.4 | 0 | 77.78 | 44.95 | Salesforce | cc-by-nc-4.0 |
100 | 38.96 | GLM-4-9b-Chat (FC) | N/A | 6.09 | 36.67 | 46 | 66.81 | 3.5 | 66.67 | 79.71 | THUDM | glm-4 |
101 | 38.59 | MiniCPM3-4B (Prompt) | N/A | 20.78 | 65.88 | 50.59 | 54.46 | 2 | 50 | 74.43 | openbmb | Apache-2.0 |
102 | 37.86 | Gemma-3-4b-it (Prompt) | N/A | 12.44 | 63.33 | 47.66 | 59.17 | 0.12 | 77.78 | 48.14 | Google | gemma-terms-of-use |
103 | 36.93 | Nexusflow-Raven-v2 (FC) | N/A | 1.13 | 45.88 | 59.11 | 54.2 | 1 | 61.11 | 78.53 | Nexusflow | Apache 2.0 |
104 | 36.12 | Qwen2.5-0.5B-Instruct (FC) | N/A | 2.53 | 62.29 | 60.93 | 47.98 | 1.25 | 88.89 | 46.23 | Qwen | apache-2.0 |
105 | 35.43 | Meta-Llama-3-8B-Instruct (Prompt) | N/A | 6.03 | 60.79 | 66.43 | 47.98 | 0.75 | 77.78 | 18.59 | Meta | Meta Llama 3 Community |
106 | 31.26 | Mistral-Small-2402 (Prompt) | 3.91 | 1.57 | 26.94 | 30.36 | 58.77 | 0.75 | 44.44 | 69.74 | Mistral AI | Proprietary |
107 | 29.98 | Falcon3-3B-Instruct (FC) | N/A | 11.52 | 53.25 | 32.66 | 47.4 | 0.5 | 77.78 | 34.47 | TII UAE | falcon-llm-license |
108 | 29.28 | Qwen2-1.5B-Instruct (Prompt) | N/A | 3.09 | 54.29 | 52.39 | 39.05 | 0.5 | 94.44 | 21.19 | Qwen | apache-2.0 |
109 | 28.06 | Qwen2.5-0.5B-Instruct (Prompt) | N/A | 0.95 | 53.19 | 61.89 | 31.59 | 0 | 94.44 | 16.44 | Qwen | apache-2.0 |
110 | 27.59 | Llama-3.1-8B-Instruct (FC) | N/A | 5.79 | 48.21 | 50.18 | 33.5 | 5.38 | 94.44 | 4.86 | Meta | Meta Llama 3 Community |
111 | 27.14 | Llama-3.1-70B-Instruct (FC) | N/A | 5.44 | 25.29 | 31.62 | 45 | 4.88 | 100 | 44.85 | Meta | Meta Llama 3 Community |
112 | 24.95 | xLAM-1b-fc-r (FC) | N/A | 6.26 | 41.17 | 42.95 | 36.92 | 0.12 | 100 | 6.69 | Salesforce | cc-by-nc-4.0 |
113 | 22.43 | DeepSeek-Coder-V2-Lite-Instruct (FC) | N/A | 13.9 | 4.88 | 33.18 | 39.4 | 0.12 | 0 | 96.31 | DeepSeek | DeepSeek License |
114 | 20.59 | Llama-3.2-1B-Instruct (Prompt) | N/A | 6.08 | 28.44 | 25.27 | 31.36 | 0 | 38.89 | 59.7 | Meta | Meta Llama 3 Community |
115 | 17.59 | QwQ-32B-Preview (Prompt) | N/A | 23.37 | 1.48 | 0.5 | 40.78 | 0 | 0 | 99.32 | Qwen | apache-2.0 |
116 | 17.49 | Falcon3-1B-Instruct (FC) | N/A | 2.2 | 9.02 | 11.61 | 32.7 | 0 | 0 | 87.16 | TII UAE | falcon-llm-license |
117 | 16.6 | Gemma-3-1b-it (Prompt) | N/A | 2.47 | 21.5 | 20.5 | 30.34 | 0 | 50 | 30.92 | Google | gemma-terms-of-use |
This is a very valuable resource for AI researchers.
Here is a simple C# code sample (it uses System.Text.Json for JSON deserialization):
/// <summary>
/// Test the LLM with tools.
/// </summary>
private async void TestLLMWithTools()
{
    // Init a new tool registry:
    ToolRegistry toolRegistry = new ToolRegistry();
    // Init a new LlmClient pointing at the local Ollama endpoint:
    LlmClient llmClient = new LlmClient("http://192.168.0.2:11434", "watt-tool-8B", toolRegistry);
    List<string> queries = new List<string>();
    // search_the_web questions
    queries.Add("What are the top 5 AI companies in 2025?");
    queries.Add("Search for recent breakthroughs in quantum computing");
    queries.Add("Find me information about Mars colonization plans");
    // key-word extraction questions
    queries.Add("What are the key words in the sentence: 'Send an email to john@example.com with subject 'Meeting' and body 'Let's meet tomorrow'");
    queries.Add("What are the key words in: Can you tell me the system's public IP?");
    queries.Add("What key words can you find in: Please show me my IP address");
    // get_ip_address questions
    queries.Add("What's my current IP address?");
    queries.Add("Can you tell me the system's public IP?");
    queries.Add("Please show me my IP address");
    // send_email questions
    queries.Add("Send an email to john@example.com with subject 'Meeting' and body 'Let's meet tomorrow'");
    queries.Add("Email sarah@test.com with subject 'Report' and body 'Here is the latest report'");
    queries.Add("Please send an email to mike@work.com, subject 'Update', body 'Project is on track'");
    // financial_ratios.interest_coverage questions
    queries.Add("What's Tesla's interest coverage ratio for the past 3 years?");
    queries.Add("Calculate Apple's interest coverage ratio over 5 years");
    queries.Add("Show me Microsoft's interest coverage ratio for the last 2 years");
    // sales_growth.calculate questions
    queries.Add("What's the sales growth rate for Amazon over the past 4 years?");
    queries.Add("Calculate Google's sales growth for the last 3 years");
    queries.Add("Find the sales growth rate of Netflix for the past 5 years");
    // weather_forecast questions
    queries.Add("What's the weather forecast for New York for the next 3 days?");
    queries.Add("Tell me the 5-day weather forecast for London");
    queries.Add("Show me the weather prediction for Tokyo for the next 2 days");
    // get_current_time questions
    queries.Add("What's the current time in ISO format?");
    queries.Add("Show me the time in Unix format");
    queries.Add("What's the present time in ISO format?");

    foreach (string query in queries)
    {
        // Send the query (plus the registered tools) to the model:
        string json = await llmClient.CallLLMWithToolsAsync("/api/chat", query);
        // Deserialize the raw JSON and display the assistant's reply:
        Response response = JsonSerializer.Deserialize<Response>(json);
        textBox1.Text = response?.Message?.Content;
    }
}
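The Response and Message types are not shown in the sample above. A minimal sketch of what they might look like for Ollama's /api/chat reply (only the properties the sample actually reads are modelled, and the property names are an assumption about the author's classes):

// Hypothetical DTOs for the fields the sample reads from Ollama's /api/chat reply.
using System.Text.Json.Serialization;

public class Response
{
    [JsonPropertyName("message")]
    public Message Message { get; set; }
}

public class Message
{
    [JsonPropertyName("role")]
    public string Role { get; set; }

    [JsonPropertyName("content")]
    public string Content { get; set; }
}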
This small test gives me a 100% success rate, which is fantastic for a small model like this!
Note: the Watt models do not support Ollama's native tool-call API; however, they can still be used as standard chat models, and function calling works as if you were simply chatting with the model.
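One way to achieve that, for example, is to describe the available tools in the system prompt and ask the model to answer with a JSON function call. The sketch below follows that idea; the request shape matches Ollama's /api/chat endpoint, while the class itself (PlainChatToolClient) is only an assumption about how something like LlmClient could be built, not the actual implementation used above.

// Minimal sketch: function calling through plain chat. The tool descriptions are
// embedded in the system prompt and the model is asked to answer with a JSON
// function call, which works even when the native "tools" field is not supported.
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public class PlainChatToolClient
{
    private readonly HttpClient _http = new HttpClient();
    private readonly string _baseUrl;
    private readonly string _model;

    public PlainChatToolClient(string baseUrl, string model)
    {
        _baseUrl = baseUrl;
        _model = model;
    }

    public async Task<string> AskAsync(string toolDescriptions, string userQuery)
    {
        var payload = new
        {
            model = _model,
            stream = false,
            messages = new object[]
            {
                new { role = "system", content =
                    "You can call the following tools. Reply only with JSON of the form " +
                    "{\"name\": ..., \"arguments\": {...}}.\n" + toolDescriptions },
                new { role = "user", content = userQuery }
            }
        };

        // POST to Ollama's chat endpoint and return the raw JSON reply.
        HttpResponseMessage reply = await _http.PostAsJsonAsync(_baseUrl + "/api/chat", payload);
        return await reply.Content.ReadAsStringAsync();
    }
}

Usage would then look like: string json = await client.AskAsync(toolDescriptions, query);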