Ozzie AI - Berkeley Function-Calling Leaderboard

The Berkeley Function Calling Leaderboard V3 (also called the Berkeley Tool Calling Leaderboard V3) evaluates an LLM's ability to call functions (also known as tools) accurately. The leaderboard is built from real-world data and is updated periodically. For more information on the evaluation dataset and methodology, see the BFCL blog posts: BFCL-v1, which introduced AST as an evaluation metric; BFCL-v2, which introduced enterprise and OSS-contributed functions; and BFCL-v3, which introduced multi-turn interactions. The project's code and data are publicly available.
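To make the AST metric a little more concrete: the rough idea is that the model's output is parsed into a function name plus arguments, and that structure is compared against a set of acceptable answers instead of being executed. The sketch below is a simplified illustration of that idea only, not the leaderboard's actual checker. `ExpectedCall`, `AstCheck`, and `Matches` are hypothetical names, argument values are reduced to strings, and every expected parameter is treated as required.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ground-truth record: a function name plus, for each
// parameter, the list of values that count as correct.
public sealed record ExpectedCall(
    string Name,
    IReadOnlyDictionary<string, string[]> AcceptableArgs);

public static class AstCheck
{
    // Returns true when the name matches, every expected parameter is present
    // with one of its acceptable values, and there are no extra parameters.
    public static bool Matches(
        ExpectedCall expected,
        string actualName,
        IReadOnlyDictionary<string, string> actualArgs)
    {
        if (!string.Equals(expected.Name, actualName, StringComparison.Ordinal))
            return false;

        // Reject parameters the ground truth does not know about.
        if (actualArgs.Keys.Any(k => !expected.AcceptableArgs.ContainsKey(k)))
            return false;

        foreach (var (param, allowed) in expected.AcceptableArgs)
        {
            if (!actualArgs.TryGetValue(param, out var value))
                return false;
            if (!allowed.Contains(value))
                return false;
        }
        return true;
    }
}
```

With this sketch, a call such as `get_weather(city="NYC")` would pass if the acceptable values for `city` include both "New York" and "NYC", while a call with the wrong function name or an unexpected parameter would fail.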

As of 27/03/2025, the chart leaders are:

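Column groups: Single Turn covers Non-live (AST), Non-live (Exec), and Live (AST); Hallucination Measurement covers Relevance and Irrelevance. Latency is in seconds.

Rank	Overall Acc	Model	Cost ($)	Latency Mean (s)	Latency SD (s)	Latency P95 (s)	Non-live AST Summary	Non-live AST Simple	Non-live AST Multiple	Non-live AST Parallel	Non-live AST Multiple Parallel	Non-live Exec Summary	Non-live Exec Simple	Non-live Exec Multiple	Non-live Exec Parallel	Non-live Exec Multiple Parallel	Live AST Overall Acc	Live AST Simple	Live AST Multiple	Live AST Parallel	Live AST Multiple Parallel	Multi Turn Overall Acc	Multi Turn Base	Multi Turn Miss Func	Multi Turn Miss Param	Multi Turn Long Context	Relevance	Irrelevance	Organization	License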
1	74.31	watt-tool-70B (FC)	N/A	3.4	12.61	7.7	84.06	78.75	94	85.5	78	89.39	98.57	94	90	75	77.74	86.05	83.48	81.25	62.5	58.75	67.5	57.5	48.5	61.5	94.44	76.32	Watt AI Lab	Apache-2.0
2	72.08	GPT-4o-2024-11-20 (Prompt)	13.54	0.78	0.93	1.48	88.1	79.42	95.5	94	83.5	89.38	100	94	86	77.5	79.83	84.88	79.77	87.5	75	47.62	59	41	35.5	55	83.33	83.76	OpenAI	Proprietary
3	69.94	GPT-4.5-Preview-2025-02-27 (FC)	238.13	3.25	5.8	7.36	86.12	75.5	91.5	92.5	85	83.98	79.43	86	88	82.5	79.34	80.23	78.63	68.75	70.83	45.25	57.5	37.5	43	43	66.67	83.64	OpenAI	Proprietary
4	69.58	GPT-4o-2024-11-20 (FC)	8.23	1.11	1.73	2.29	87.42	77.17	93.5	93	86	89.2	88.29	92	94	82.5	79.65	81.4	78.82	87.5	75	41	62.5	6	37.5	58	83.33	83.15	OpenAI	Proprietary
5	68.39	ToolACE-2-8B (FC)	N/A	6.95	23.41	33.5	87.58	75.33	92.5	92.5	90	87.11	95.43	92	86	75	80.05	70.93	79.01	81.25	54.17	36.88	48.5	29	28	42	72.22	90.11	Huawei Noah & USTC	Apache-2.0
6	67.98	watt-tool-8B (FC)	N/A	1.31	2.79	4.04	86.56	76.75	95	94	80.5	89.34	97.86	94	88	77.5	76.5	76.74	77.49	87.5	70.83	39.12	47.5	41.5	27	40.5	83.33	83.15	Watt AI Lab	Apache-2.0
7	67.88	GPT-4-turbo-2024-04-09 (FC)	33.22	2.47	6.27	5.08	84.73	70.42	91	90	87.5	85.21	87.36	90	86	77.5	80.5	83.72	78.63	81.25	70.83	38.12	54	13.5	35.5	49.5	72.22	83.81	OpenAI	Proprietary
8	67.87	o1-2024-12-17 (Prompt)	102.49	5.36	4.48	13.29	85.67	72.67	93.5	91.5	85	87.45	89.29	92	86	82.5	80.63	82.95	76.54	81.25	75	36	50.5	0.5	48.5	44.5	72.22	87.78	OpenAI	Proprietary
9	67.72	BitAgent-8B	N/A	26.73	23.43	74.82	86.92	76.17	95	94	82.5	89.52	98.57	94	88	77.5	76.14	77.91	77.4	87.5	70.83	38.5	48	40	26.5	39.5	83.33	82.38	Bittensor	Apache-2.0
10	65.12	o3-mini-2025-01-31 (Prompt)	8.1	4.86	8.69	11.69	86.15	75.08	92.5	90.5	86.5	89.46	97.86	96	84	80	79.08	82.95	77.11	87.5	70.83	28.75	31.5	26.5	23	34	72.22	82.96	OpenAI	Proprietary
11	64.1	CoALM-405B	N/A	6.2	9.72	18.39	90.58	82.33	97.5	95	87.5	89.07	99.29	94	88	75	74.5	84.88	83	87.5	70.83	28.75	37	29	18.5	30.5	100	71.79	UIUC + Oumi	Meta Llama 3 Community
12	64.1	GPT-4o-mini-2024-07-18 (FC)	0.51	1.49	9.88	3.01	85.21	74.83	92	90	84	83.57	83.29	92	84	75	74.41	78.68	76.16	87.5	70.83	34.12	47.5	19.5	29	40.5	83.33	74.75	OpenAI	Proprietary
13	63.81	Command A (FC)	N/A	N/A	N/A	N/A	86.65	74.08	93	93.5	86	83.38	98	90	78	67.5	80.45	84.5	77.68	81.25	62.5	25.5	35	24.5	21.5	21	72.22	86.19	Cohere	CC-BY-NC 4.0 License (w/ Acceptable Use Addendum)
14	63.53	Qwen2.5-72B-Instruct (FC)	N/A	47.29	49.13	96.58	88.13	76	95.5	92	89	87.57	98.29	92	80	80	78.98	79.84	78.25	56.25	62.5	24.62	31.5	24	22.5	20.5	70.59	80.31	Qwen	qwen
15	62.84	Qwen2.5-32B-Instruct (FC)	N/A	21.0	33.42	53.06	87.21	72.83	94	93.5	88.5	86.77	97.57	88	84	77.5	79.69	80.23	80.06	43.75	62.5	22.25	29.5	25.5	20.5	13.5	64.71	81.92	Qwen	apache-2.0
16	62.73	Functionary-Medium-v3.1 (FC)	N/A	14.06	57.4	35.06	89.88	76	97	95	91.5	91.32	99.29	94	92	80	76.63	81.78	83.29	68.75	70.83	21.62	32	21.5	26.5	6.5	72.22	76.08	MeetKai	MIT
17	62.51	GPT-4.5-Preview-2025-02-27 (Prompt)	464.51	2.51	14.41	4.49	89.98	80.42	95.5	92.5	91.5	88.5	99.5	90	82	82.5	82.99	87.98	81.86	93.75	66.67	15.88	20.5	4.5	16.5	22	66.67	83.81	OpenAI	Proprietary
18	62.38	Gemini-2.0-Pro-Exp-02-05 (Prompt)	0.0	2.13	10.7	2.15	88.85	76.92	95	90.5	93	83.82	77.79	92	88	77.5	78.23	81.01	75.12	93.75	83.33	22.25	28.5	17.5	17.5	25.5	72.22	85	Google	Proprietary
19	62.21	GPT-4o-mini-2024-07-18 (Prompt)	0.84	1.31	6.89	2.4	86.77	80.08	90.5	89.5	87	89.77	98.57	96	82	82.5	76.5	81.4	76.73	93.75	79.17	22	33	12	17	26	83.33	80.67	OpenAI	Proprietary
20	61.83	Hammer2.1-7b (FC)	N/A	2.08	4.12	5.38	88.65	78.08	95	93.5	88	85.48	86.43	92	86	77.5	75.11	76.74	77.4	81.25	70.83	23.5	35.5	25.5	19	14	82.35	78.59	MadeAgents	cc-by-nc-4.0
21	61.55	Gemini-2.0-Pro-Exp-02-05 (FC)	0.0	1.79	8.58	1.85	83.94	69.75	89	93	84	85.11	80.43	94	86	80	78.5	76.74	68.85	87.5	75	20.75	23.5	17.5	18.5	23.5	44.44	91.83	Google	Proprietary
22	61.38	Amazon-Nova-Pro-v1:0 (FC)	5.26	2.67	2.6	5.58	84.46	68.83	92.5	92	84.5	85.64	97.07	84	84	77.5	74.32	80.23	77.49	81.25	58.33	26.12	37.5	19	22	26	77.78	70.98	Amazon	Proprietary
23	61.31	Qwen2.5-72B-Instruct (Prompt)	N/A	3.72	6.88	9.64	90.81	80.25	97.5	93.5	92	92.7	99.29	94	90	87.5	75.3	85.27	82.15	62.5	75	18	24.5	20	15.5	12	100	72.81	Qwen	qwen
24	60.42	Gemini-2.0-Flash-001 (FC)	0.62	0.7	1.24	0.99	84.9	71.58	88.5	90.5	89	81.23	75.43	90	82	77.5	79.12	74.42	68.28	81.25	66.67	17.88	20	12.5	16.5	22.5	50	94.04	Google	Proprietary
25	59.67	Qwen2.5-32B-Instruct (Prompt)	N/A	2.26	4.62	5.92	85.81	70.25	94.5	90.5	88	89.79	96.64	90	90	82.5	74.23	82.95	78.54	62.5	58.33	17.75	25	20	15	11	100	73.75	Qwen	apache-2.0
26	59.57	GPT-4-turbo-2024-04-09 (Prompt)	58.87	1.24	1.33	2.58	90.88	82.5	95.5	93.5	92	89.45	99.29	96	80	82.5	63.84	87.98	84.14	100	79.17	30.25	42.5	25	20.5	33	100	35.57	OpenAI	Proprietary
27	59.38	Gemini-2.0-Flash-001 (Prompt)	0.75	0.89	1.12	1.23	84.48	74.92	89.5	86.5	87	81.54	87.14	84	80	75	81.39	75.58	73.12	81.25	83.33	12.75	15.5	9	12.5	14	55.56	92.75	Google	Proprietary
28	59.22	Gemma-3-27b-it (Prompt)	N/A	20.08	32.89	52.41	89.02	78.58	94.5	91.5	91.5	87.2	93.29	92	86	77.5	76.37	86.43	78.73	93.75	66.67	14.5	18	12	11.5	16.5	83.33	73.33	Google	gemma-terms-of-use
29	59.07	Hammer2.1-3b (FC)	N/A	1.95	4.31	5.09	86.85	81.42	95	89.5	81.5	84.09	82.86	92	84	77.5	74.04	73.26	73.31	62.5	66.67	17.38	27.5	17.5	14.5	10	82.35	81.87	MadeAgents	qwen-research
30	58.93	Gemini-2.0-Flash-Thinking-Exp-01-21 (Prompt)	0.0	4.81	21.56	8.16	87.4	75.58	95	92	87	87.07	97.29	90	86	75	75.97	86.43	81.01	87.5	87.5	14.5	17.5	19	10	11.5	77.78	72.75	Google	Proprietary
31	58.9	Qwen2.5-14B-Instruct (FC)	N/A	11.22	19.32	29.03	85.42	69.67	95	88	89	84.86	90.43	92	72	85	76.68	77.13	75.02	75	70.83	15.88	19.5	17	16.5	10.5	55.56	77.69	Qwen	apache-2.0
32	58.55	DeepSeek-V3 (FC)	N/A	2.62	5.88	4.53	89.17	78.67	95.5	91	91.5	92.32	98.29	94	92	85	68.41	83.72	82.15	81.25	62.5	18.62	21	20.5	19	14	88.89	59.36	DeepSeek	DeepSeek License
33	58.45	mistral-large-2407 (FC)	12.68	3.12	10.75	6.21	86.81	74.25	92.5	90	90.5	84.38	75	94	86	82.5	69.88	85.27	78.54	62.5	79.17	23.75	33.5	18	23.5	20	72.22	52.85	Mistral AI	Proprietary
34	58.42	ToolACE-8B (FC)	N/A	5.24	15.7	9.8	87.54	76.67	93.5	90.5	89.5	89.21	97.36	94	88	77.5	78.59	73.26	76.73	81.25	70.83	7.75	7.5	11.5	5	7	83.33	87.88	Huawei Noah & USTC	Apache-2.0
35	58.3	Claude-3.7-Sonnet-20250219 (FC)	2.56	2.78	1.73	4.49	41.29	71.67	93.5	0	0	47.07	98.29	90	0	0	78.41	82.56	76.45	0	0	48.38	54	45	46	48.5	72.22	81.4	Anthropic	Proprietary
36	57.78	xLAM-8x22b-r (FC)	N/A	9.26	11.66	21.27	83.69	77.75	94.5	86.5	76	87.88	95	94	90	72.5	72.59	80.23	79.68	81.25	70.83	16.25	25.5	16	11.5	12	88.89	67.81	Salesforce	cc-by-nc-4.0
37	57.68	Qwen2.5-14B-Instruct (Prompt)	N/A	2.02	4.99	5.0	85.69	73.25	92.5	92	85	88.84	92.36	90	88	85	74.14	74.42	75.78	62.5	66.67	12.25	19	11.5	12	6.5	77.78	77.06	Qwen	apache-2.0
38	57.53	DeepSeek-R1 (Prompt)	N/A	40.78	39.19	110.02	87.35	76.42	94.5	90.5	88	88.23	96.43	88	86	82.5	74.41	84.11	79.87	87.5	70.83	12.38	11.5	15.5	11	11.5	66.67	67.54	DeepSeek	MIT
39	57.11	Haha-7B	N/A	3.76	9.91	8.47	86.02	78.08	95.5	89.5	81	86.11	80.43	96	88	80	74.19	78.29	77.59	75	70.83	10.38	13	10	11.5	7	83.33	80.66	TeleAI	Apache 2.0
40	56.7	Qwen2.5-7B-Instruct (FC)	N/A	6.42	19.0	13.58	85.71	71.83	95	90	86	87.73	95.43	94	84	77.5	74.19	75.58	75.59	68.75	66.67	11.5	13.5	14.5	11	7	77.78	69.08	Qwen	apache-2.0
41	56.49	Functionary-Small-v3.1 (FC)	N/A	18.44	35.32	51.23	86.75	74	94.5	90.5	88	87.12	89.5	94	90	75	73.75	79.84	78.16	81.25	62.5	10.12	18	2.5	14	6	77.78	70.89	MeetKai	MIT
42	56.47	Claude-3-Opus-20240229 (FC)	20.15	9.46	9.94	17.14	57.92	67.17	93	39.5	32	59.46	80.36	88	42	27.5	78.05	79.07	75.78	31.25	37.5	30.25	41.5	14	33.5	32	61.11	81.59	Anthropic	Proprietary
43	56.46	Claude-3.5-Sonnet-20241022 (FC)	2.53	3.07	5.61	4.78	45.44	78.75	94.5	3.5	5	47.89	97.57	90	4	0	78.94	84.11	81.96	25	20.83	41	55	19	42.5	47.5	77.78	74.04	Anthropic	Proprietary
44	56.12	Amazon-Nova-Lite-v1:0 (FC)	0.42	1.64	2.57	2.52	78.44	69.75	94	84	66	80.25	92	84	80	65	71.61	72.87	70.09	75	66.67	17.38	27.5	5.5	17.5	19	66.67	76.41	Amazon	Proprietary
45	55.65	o1-2024-12-17 (FC)	68.62	4.86	5.11	13.78	40.23	67.92	93	0	0	46.52	92.07	94	0	0	78.01	81.78	79.01	0	0	41	52.5	38	30.5	43	72.22	81.97	OpenAI	Proprietary
46	55.51	Gemini-2.0-Flash-Lite-001 (Prompt)	0.31	0.68	0.41	0.98	84.21	78.83	93.5	86.5	78	79.71	91.36	88	82	57.5	79.61	79.07	74.74	75	58.33	3.88	4.5	2.5	4.5	4	66.67	89.09	Google	Proprietary
47	55.51	DeepSeek-Coder-V2 (FC)	N/A	29.53	108.9	59.61	89.44	78.75	94.5	93.5	91	91.23	96.43	94	92	82.5	73.48	80.62	77.02	43.75	70.83	4.5	7.5	3	4	3.5	88.89	70.81	DeepSeek	DeepSeek License
48	55.11	Gemini-2.0-Flash-Lite-001 (FC)	0.26	0.63	1.08	0.86	83.73	63.92	89.5	93	88.5	78.39	67.07	90	84	72.5	77.25	71.32	67.05	81.25	70.83	5.5	7	1	6.5	7.5	50	93.21	Google	Proprietary
49	54.87	Hammer2.1-1.5b (FC)	N/A	2.73	3.86	7.45	82.79	74.67	92	84.5	80	83.39	86.57	90	82	75	70.64	71.32	69.8	50	62.5	10.5	14.5	12.5	9	6	77.78	79.27	MadeAgents	cc-by-nc-4.0
50	54.75	xLAM-7b-r (FC)	N/A	12.74	25.13	24.76	81.06	74.25	95.5	81	73.5	79.88	74	96	82	67.5	75.22	72.09	74.93	50	62.5	10	16.5	8.5	7.5	7.5	94.44	77.11	Salesforce	cc-by-nc-4.0
51	54.55	GoGoAgent	N/A	2.66	3.08	5.56	86.23	75.42	93	92	84.5	89.86	95.43	96	88	80	74.01	72.87	75.4	68.75	66.67	1	1.5	2	0.5	0	77.78	83.12	BitAgent	Proprietary
52	54.51	CoALM-8B	N/A	3.65	13.86	7.78	85.25	68.5	95	89	88.5	86.07	93.29	90	86	75	72.95	71.71	66.67	56.25	54.17	4.5	5	5	4	4	83.33	85.49	UIUC + Oumi	Meta Llama 3 Community
53	54.36	CoALM-70B	N/A	8.95	30.94	27.95	83.08	68.83	92.5	88	83	83.7	83.79	90	86	75	71.92	70.16	65.34	68.75	58.33	7	7.5	11	4.5	5	66.67	85.63	UIUC + Oumi	Meta Llama 3 Community
54	54.31	claude-3.5-haiku-20241022 (Prompt)	0.48	1.84	5.21	3.57	83.19	76.25	93	84	79.5	84.71	97.86	90	76	75	70.77	84.88	75.02	87.5	54.17	9.75	16	0.5	8	14.5	77.78	65.78	Anthropic	Proprietary
55	54.19	Llama-3.1-70B-Instruct (Prompt)	N/A	4.95	14.92	14.75	89.98	77.92	96	94.5	91.5	90.12	94	98	86	82.5	62.24	78.29	76.16	87.5	66.67	12.5	16.5	13	10.5	10	100	54.78	Meta	Meta Llama 3 Community
56	53.91	GPT-3.5-Turbo-0125 (FC)	1.38	0.87	1.45	1.47	83.94	74.25	93.5	89	79	83.79	96.14	88	86	65	64.02	81.4	79.68	43.75	58.33	19.5	32.5	11.5	21.5	12.5	94.44	36.53	OpenAI	Proprietary
57	53.69	Qwen2.5-7B-Instruct (Prompt)	N/A	4.54	11.64	11.02	86.46	75.33	94.5	91.5	84.5	88.29	92.14	90	86	85	67.44	76.74	74.93	62.5	70.83	7.62	9.5	8.5	7	5.5	88.89	65.16	Qwen	apache-2.0
58	53.44	Sky-T1-32B-Preview (Prompt)	N/A	18.64	57.21	217.9	88.67	78.67	94.5	94	87.5	91.54	97.14	92	92	85	68	77.52	77.11	62.5	62.5	4.63	8	3.5	3	4	94.12	61.21	NovaSky-AI	apache-2.0
59	53.27	claude-3.5-haiku-20241022 (FC)	0.83	2.83	1.94	5.18	40.62	68	92	2.5	0	50.46	87.86	90	24	0	72.37	82.95	78.35	18.75	0	40	54.5	26.5	35	44	83.33	63.68	Anthropic	Proprietary
60	53.06	FireFunction-v2 (FC)	N/A	2.13	1.17	3.87	88.46	80.33	94	91.5	88	87.54	96.64	92	84	77.5	65.66	79.07	78.35	56.25	70.83	8.62	13.5	7	11	3	94.44	53.02	Fireworks	Apache 2.0
61	52.58	xLAM-8x7b-r (FC)	N/A	4.96	5.57	10.33	67.65	73.58	90	69	38	74.05	89.21	90	72	45	71.08	74.81	79.3	43.75	58.33	15.5	26	13	11.5	11.5	94.44	67.15	Salesforce	cc-by-nc-4.0
62	52.2	Mistral-Medium-2312 (Prompt)	11.07	4.57	18.11	9.97	73.12	69.5	88.5	69	65.5	81.57	93.29	86	72	75	77.61	75.97	74.07	81.25	54.17	0.38	1	0	0	0.5	66.67	85.93	Mistral AI	Proprietary
63	52.15	Command R7B (FC)	0.1	1.35	4.86	2.47	81.67	68.17	91.5	85.5	81.5	84.02	87.07	92	82	75	69.17	63.18	58.69	56.25	62.5	5	6.5	1.5	6.5	5.5	55.56	81.02	Cohere	cc-by-nc-4.0
64	51.77	Ministral-8B-Instruct-2410 (FC)	N/A	12.79	45.03	47.12	83.83	71.83	91.5	84.5	87.5	79.57	71.29	86	86	75	64.93	75.58	72.27	62.5	62.5	11.38	21.5	8.5	10	5.5	70.59	55.28	Mistral AI	Mistral AI Research License
65	51.76	Meta-Llama-3-70B-Instruct (Prompt)	N/A	3.65	3.82	10.68	87.81	76.75	95	92.5	87	88.21	95.86	94	78	85	64.95	81.01	78.25	75	66.67	5.62	10	4	6	2.5	100	50.88	Meta	Meta Llama 3 Community
66	51.69	Claude-3.5-Sonnet-20241022 (Prompt)	1.23	1.81	1.35	3.3	72.48	81.42	92	70.5	46	80	100	92	68	60	71.97	86.82	80.06	81.25	45.83	7.5	9	5.5	5	10.5	77.78	64.4	Anthropic	Proprietary
67	51.68	MiniCPM3-4B-FC (FC)	N/A	160.19	184.1	464.0	80.83	69.83	91.5	82.5	79.5	87.57	89.29	90	86	85	70.01	74.81	63.91	43.75	62.5	2.62	5	1	3	1.5	72.22	72.22	openbmb	Apache-2.0
68	51.43	Llama-3.3-70B-Instruct (Prompt)	N/A	6.98	25.27	23.42	85.08	74.83	94.5	84	87	90.68	95.71	98	84	85	62.77	81.78	77.11	93.75	66.67	6.87	9	8	4.5	6	100	48.71	Meta	Meta Llama 3 Community
69	51.37	Amazon-Nova-Micro-v1:0 (FC)	0.23	1.49	1.99	2.32	71.12	63.5	88	77.5	55.5	69.23	80.43	76	68	52.5	67.04	65.89	64.2	62.5	45.83	16.12	24.5	5.5	14	20.5	72.22	74.2	Amazon	Proprietary
70	51.37	Claude-3-Opus-20240229 (Prompt)	10.48	4.6	8.24	10.54	85.31	79.75	95	85.5	81	86.32	99.29	90	86	70	66.99	85.27	79.11	68.75	54.17	7.13	11.5	2.5	6	8.5	83.33	40.25	Anthropic	Proprietary
71	51.31	Open-Mistral-Nemo-2407 (FC)	1.18	1.55	1.96	3.42	82.1	64.42	93.5	85.5	85	77.66	56.14	94	88	72.5	65.97	77.13	69.61	75	70.83	9.38	15	4	9.5	9	66.67	63.19	Mistral AI	Proprietary
72	51.03	Gemma-3-12b-it (Prompt)	N/A	7.9	9.01	21.47	83.83	77.33	95	90	73	82.8	84.71	94	80	72.5	67.13	84.88	70.85	87.5	62.5	4.62	8	3.5	2.5	4.5	88.89	61.11	Google	gemma-terms-of-use
73	50.87	Llama-3.1-8B-Instruct (Prompt)	N/A	10.3	52.75	24.53	84.21	72.83	93.5	87	83.5	86.3	83.71	96	88	77.5	61.08	74.03	73.31	56.25	54.17	9.62	13	10	7.5	8	77.78	48.82	Meta	Meta Llama 3 Community
74	50.75	Qwen2.5-3B-Instruct (FC)	N/A	4.24	8.17	10.16	78.83	73.33	92	73.5	76.5	78.23	86.93	90	66	70	69.39	74.03	72.08	62.5	45.83	6	8.5	6	4.5	5	88.89	64.26	Qwen	qwen
75	50.59	Claude-3.7-Sonnet-20250219 (Prompt)	1.1	1.7	1.51	2.85	88	79	95	89.5	88.5	87.5	99	94	82	75	65.79	87.98	83.57	68.75	50	0.62	0	0	1	1.5	100	52	Anthropic	Proprietary
76	50.36	Open-Mixtral-8x22b (Prompt)	12.83	1.36	6.03	3.17	88.02	78.58	94	89.5	90	87.77	93.57	96	84	77.5	66.02	83.33	72.65	81.25	70.83	0.5	1	0	0	1	83.33	55.09	Mistral AI	Proprietary
77	50.2	o3-mini-2025-01-31 (FC)	4.75	4.03	5.43	10.14	42.12	75.5	93	0	0	43.2	76.79	96	0	0	77.3	81.4	78.63	0	0	26.12	32.5	17.5	24	30.5	77.78	80.67	OpenAI	Proprietary
78	49.35	Command-R-Plus (FC)	7.8	2.58	9.12	3.87	77.02	72.08	89.5	82.5	64	81.21	90.86	90	84	60	59	70.54	58.78	62.5	45.83	13.12	16.5	10	9	17	72.22	53.16	Cohere For AI	cc-by-nc-4.0
79	49.31	Granite-20b-FunctionCalling (FC)	N/A	1.84	1.74	5.0	82.46	72.83	91.5	84	81.5	86.36	84.93	92	86	82.5	59.66	68.22	56.32	43.75	58.33	3.38	6	1.5	4.5	1.5	88.89	74.82	IBM	Apache-2.0
80	48.72	Qwen2.5-1.5B-Instruct (FC)	N/A	2.83	10.74	5.16	79.1	72.42	87	81.5	75.5	82.12	88	90	78	72.5	64.82	74.03	66.1	50	45.83	2.5	4	1.5	3	1.5	94.44	62.68	Qwen	apache-2.0
81	48.32	GPT-3.5-Turbo-0125 (Prompt)	2.16	0.72	1.79	1.21	72.85	77.92	93.5	67	53	70.39	57.57	90	74	60	68.55	80.62	78.63	75	58.33	5.62	9	2	7	4.5	94.44	58.39	OpenAI	Proprietary
82	47.3	Falcon3-10B-Instruct (FC)	N/A	8.92	22.84	22.27	84.62	70.5	93.5	87.5	87	90.91	97.14	92	92	82.5	54.11	76.36	76.16	50	41.67	5	6	5	4.5	4.5	94.44	31.89	TII UAE	falcon-llm-license
83	47.29	Hermes-2-Pro-Llama-3-8B (FC)	N/A	3.97	3.91	8.21	76.79	64.17	89.5	80	73.5	76.23	70.43	94	78	62.5	64.95	72.09	65.81	56.25	50	2.38	4.5	1.5	2	1.5	44.44	60.78	NousResearch	apache-2.0
84	47.27	mistral-large-2407 (Prompt)	24.91	3.32	3.99	6.94	90.54	82.17	97	92.5	90.5	90.12	100	94	84	82.5	52.82	86.05	81.96	93.75	83.33	8.38	15	6	6	6.5	100	4.35	Mistral AI	Proprietary
85	47.09	Qwen2.5-3B-Instruct (Prompt)	N/A	1.03	1.43	1.78	80.79	74.17	90.5	79.5	79	81.71	80.86	86	80	80	58.69	69.77	66.48	56.25	62.5	3.38	5.5	3.5	2	2.5	88.89	54.19	Qwen	qwen
86	46.92	Llama-3.2-3B-Instruct (Prompt)	N/A	7.79	45.67	14.71	80.56	73.75	92	80.5	76	83.7	87.29	92	78	77.5	55.8	63.95	64.86	12.5	45.83	5.25	8.5	2.5	4.5	5.5	88.89	51.69	Meta	Meta Llama 3 Community
87	46.71	Qwen2.5-1.5B-Instruct (Prompt)	N/A	2.51	6.07	4.63	73.37	71	86	70	66.5	85.61	80.43	94	88	80	61.08	70.54	59.26	56.25	41.67	1.12	1.5	2.5	0.5	0	83.33	63.04	Qwen	apache-2.0
88	45.68	Falcon3-7B-Instruct (FC)	N/A	11.24	28.01	25.0	82.31	64.75	89.5	86.5	88.5	86.62	89	94	86	77.5	54.86	74.03	66.48	75	62.5	3.38	3.5	3.5	3.5	3	88.89	33.73	TII UAE	falcon-llm-license
89	45.28	Hammer2.1-0.5b (FC)	N/A	1.29	3.16	2.85	69.12	68	83	71.5	54	70.46	68.36	84	82	47.5	62.91	60.08	58.02	50	45.83	2.25	4	0.5	3	1.5	77.78	73.94	MadeAgents	cc-by-nc-4.0
90	44.79	Mistral-small-2402 (FC)	3.36	1.73	4.04	3.5	59.15	67.58	94	24.5	50.5	53.84	87.36	92	16	20	72.19	65.5	71.51	12.5	12.5	2.62	4.5	0	3	3	77.78	80.86	Mistral AI	Proprietary
91	43.15	Hermes-2-Pro-Mistral-7B (FC)	N/A	10.63	32.72	23.76	73.06	60.75	87.5	78.5	65.5	76	61	94	84	65	57.71	69.77	60.02	43.75	41.67	2.63	3.5	4	2.5	0.5	66.67	38.88	NousResearch	apache-2.0
92	43.09	Open-Mixtral-8x7b (Prompt)	2.74	1.73	4.5	3.51	63.58	64.83	86	59	44.5	69.61	77.93	86	62	52.5	61.44	63.57	66.1	68.75	50	1.5	2.5	0	1.5	2	88.89	59.52	Mistral AI	Proprietary
93	43.02	Open-Mixtral-8x22b (FC)	7.0	2.63	15.88	5.36	61.67	71.67	94	10.5	70.5	63.64	83.57	94	22	55	68.64	77.13	73.12	6.25	45.83	1.5	3.5	0	1	1.5	83.33	45.71	Mistral AI	Proprietary
94	42.56	Open-Mistral-Nemo-2407 (Prompt)	1.79	1.65	10.01	3.26	86.12	77	93.5	89.5	84.5	89.07	93.79	92	88	82.5	49.04	77.91	74.45	87.5	66.67	0.25	0.5	0	0	0.5	88.89	6.43	Mistral AI	Proprietary
95	42.3	Qwen2-7B-Instruct (Prompt)	N/A	3.99	10.26	9.78	76.65	68.08	88	75.5	75	76.8	80.21	84	78	65	50.6	56.59	62.01	37.5	66.67	3.25	4	4.5	2.5	2	88.89	39	Qwen	apache-2.0
96	41.61	Bielik-11B-v2.3-Instruct (Prompt)	N/A	4.87	13.23	12.3	65.04	71.17	93.5	46	49.5	65.16	76.64	90	44	50	58.91	72.87	69.33	43.75	54.17	3.75	7	0.5	3	4.5	77.78	40.58	SpeakLeash & ACK Cyfronet AGH	Apache 2.0
97	40.96	DBRX-Instruct (Prompt)	8.49	3.74	8.22	11.19	61.25	73.5	92	42.5	37	69.14	90.07	88	46	52.5	60.28	78.29	73.03	75	41.67	0	0	0	0	0	94.44	40.5	Databricks	Databricks Open Model
98	39.98	FireFunction-v1 (FC)	N/A	2.27	3.77	3.71	43	80	92	0	0	44.57	88.29	90	0	0	70.46	71.71	72.93	0	0	2.38	5	0	2	2.5	94.44	71.8	Fireworks	Apache 2.0
99	39.27	xLAM-7b-fc-r (FC)	N/A	6.26	4.43	13.94	72.08	76.83	93.5	77	41	60.63	84.5	92	56	10	53.4	78.68	58.02	31.25	25	0	0	0	0	0	77.78	44.95	Salesforce	cc-by-nc-4.0
100	38.96	GLM-4-9b-Chat (FC)	N/A	6.09	15.35	13.2	36.67	65.17	81.5	0	0	46	94	90	0	0	66.81	72.48	64.39	0	0	3.5	3.5	4	2.5	4	66.67	79.71	THUDM	glm-4
101	38.59	MiniCPM3-4B (Prompt)	N/A	20.78	49.16	64.58	65.88	63.5	72.5	65.5	62	50.59	40.36	34	48	80	54.46	46.51	34.76	43.75	41.67	2	3	3.5	1	0.5	50	74.43	openbmb	Apache-2.0
102	37.86	Gemma-3-4b-it (Prompt)	N/A	12.44	18.06	30.4	63.33	64.33	91.5	56.5	41	47.66	68.14	80	30	12.5	59.17	72.87	62.77	37.5	29.17	0.12	0	0	0.5	0	77.78	48.14	Google	gemma-terms-of-use
103	36.93	Nexusflow-Raven-v2 (FC)	N/A	1.13	0.55	2.27	45.88	57.5	53	34	39	59.11	47.93	86	40	62.5	54.2	41.47	38.65	56.25	41.67	1	1.5	0.5	1	1	61.11	78.53	Nexusflow	Apache 2.0
104	36.12	Qwen2.5-0.5B-Instruct (FC)	N/A	2.53	9.25	4.36	62.29	61.17	78	60	50	60.93	51.21	88	52	52.5	47.98	56.2	41.31	56.25	20.83	1.25	1	2	1	1	88.89	46.23	Qwen	apache-2.0
105	35.43	Meta-Llama-3-8B-Instruct (Prompt)	N/A	6.03	8.86	20.61	60.79	62.67	82.5	48	50	66.43	77.71	86	42	60	47.98	61.24	61.44	37.5	33.33	0.75	1.5	0	1	0.5	77.78	18.59	Meta	Meta Llama 3 Community
106	31.26	Mistral-Small-2402 (Prompt)	3.91	1.57	0.96	3.37	26.94	23.25	74	8.5	2	30.36	52.93	64	2	2.5	58.77	36.43	65.24	0	8.33	0.75	0.5	0	1.5	1	44.44	69.74	Mistral AI	Proprietary
107	29.98	Falcon3-3B-Instruct (FC)	N/A	11.52	33.0	41.43	53.25	58	69	61	25	32.66	54.64	46	20	10	47.4	55.43	56.32	31.25	37.5	0.5	0.5	0.5	0	1	77.78	34.47	TII UAE	falcon-llm-license
108	29.28	Qwen2-1.5B-Instruct (Prompt)	N/A	3.09	11.89	5.41	54.29	51.17	79	46.5	40.5	52.39	46.57	76	52	35	39.05	48.84	40.27	12.5	25	0.5	0.5	1	0	0.5	94.44	21.19	Qwen	apache-2.0
109	28.06	Qwen2.5-0.5B-Instruct (Prompt)	N/A	0.95	1.25	1.47	53.19	58.25	68	53.5	33	61.89	63.07	70	62	52.5	31.59	53.88	34.76	56.25	16.67	0	0	0	0	0	94.44	16.44	Qwen	apache-2.0
110	27.59	Llama-3.1-8B-Instruct (FC)	N/A	5.79	16.83	13.08	48.21	55.83	54	48.5	34.5	50.18	58.71	58	54	30	33.5	51.94	49	37.5	41.67	5.38	5	7.5	5	4	94.44	4.86	Meta	Meta Llama 3 Community
111	27.14	Llama-3.1-70B-Instruct (FC)	N/A	5.44	12.05	12.13	25.29	49.17	24.5	12.5	15	31.62	53	36	30	7.5	45	52.33	52.61	31.25	25	4.88	7	4	4.5	4	100	44.85	Meta	Meta Llama 3 Community
112	24.95	xLAM-1b-fc-r (FC)	N/A	6.26	14.51	13.84	41.17	71.67	86	5	2	42.95	77.79	90	4	0	36.92	63.95	53.37	6.25	0	0.12	0.5	0	0	0	100	6.69	Salesforce	cc-by-nc-4.0
113	22.43	DeepSeek-Coder-V2-Lite-Instruct (FC)	N/A	13.9	30.47	39.18	4.88	0	1.5	3.5	14.5	33.18	17.71	42	28	45	39.4	2.33	3.8	0	8.33	0.12	0.5	0	0	0	0	96.31	DeepSeek	DeepSeek License
114	20.59	Llama-3.2-1B-Instruct (Prompt)	N/A	6.08	17.77	32.86	28.44	29.25	33.5	36	15	25.27	34.07	28	34	5	31.36	31.4	7.6	12.5	4.17	0	0	0	0	0	38.89	59.7	Meta	Meta Llama 3 Community
115	17.59	QwQ-32B-Preview (Prompt)	N/A	23.37	26.08	74.38	1.48	0.92	4.5	0.5	0	0.5	0	0	2	0	40.78	7.36	2.75	0	0	0	0	0	0	0	0	99.32	Qwen	apache-2.0
116	17.49	Falcon3-1B-Instruct (FC)	N/A	2.2	5.42	5.39	9.02	3.58	6	17.5	9	11.61	9.43	4	18	15	32.7	4.65	2.37	0	12.5	0	0	0	0	0	0	87.16	TII UAE	falcon-llm-license
117	16.6	Gemma-3-1b-it (Prompt)	N/A	2.47	3.4	5.91	21.5	43.5	38.5	2	2	20.5	34	44	4	0	30.34	31.01	10.54	0	0	0	0	0	0	0	50	30.92	Google	gemma-terms-of-use

This leaderboard is a very valuable resource for AI researchers.

Here is a simple C# code sample:

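// Requires: using System.Collections.Generic; using System.Text.Json; using System.Threading.Tasks;

/// <summary>
/// Test the LLM with tools.
/// </summary>
private async Task TestLLMWithTools()
{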

    // Init a new Tool Registry:
    ToolRegistry toolRegistry = new ToolRegistry();

    // Init a new LlmClient:
    LlmClient llmClient = new LlmClient("http://192.168.0.2:11434", "watt-tool-8B", toolRegistry);

    List<string> queries = new List<string>();

    // search_the_web questions
    queries.Add("What are the top 5 AI companies in 2025?");
    queries.Add("Search for recent breakthroughs in quantum computing");
    queries.Add("Find me information about Mars colonization plans");

    // key word extraction questions
    queries.Add("What are the key words in the sentence: 'Send an email to john@example.com with subject 'Meeting' and body 'Let's meet tomorrow'");
    queries.Add("What are the key words in: Can you tell me the system's public IP?");
    queries.Add("What key words can you find in: Please show me my IP address");

    // get_ip_address questions
    queries.Add("What's my current IP address?");
    queries.Add("Can you tell me the system's public IP?");
    queries.Add("Please show me my IP address");

    // send_email questions
    queries.Add("Send an email to john@example.com with subject 'Meeting' and body 'Let's meet tomorrow'");
    queries.Add("Email sarah@test.com with subject 'Report' and body 'Here is the latest report'");
    queries.Add("Please send an email to mike@work.com, subject 'Update', body 'Project is on track'");

    // financial_ratios.interest_coverage questions
    queries.Add("What's Tesla's interest coverage ratio for the past 3 years?");
    queries.Add("Calculate Apple's interest coverage ratio over 5 years");
    queries.Add("Show me Microsoft's interest coverage ratio for the last 2 years");

    // sales_growth.calculate questions
    queries.Add("What's the sales growth rate for Amazon over the past 4 years?");
    queries.Add("Calculate Google's sales growth for the last 3 years");
    queries.Add("Find the sales growth rate of Netflix for the past 5 years");

    // weather_forecast questions
    queries.Add("What's the weather forecast for New York for the next 3 days?");
    queries.Add("Tell me the 5-day weather forecast for London");
    queries.Add("Show me the weather prediction for Tokyo for the next 2 days");

    // get_current_time questions
    queries.Add("What's the current time in ISO format?");
    queries.Add("Show me the time in Unix format");
    queries.Add("What's the present time in ISO format?");


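    // Send each query to the model and display its reply.
    foreach (string query in queries)
    {
        // Post the query to the chat endpoint and get the raw JSON reply.
        string json = await llmClient.CallLLMWithToolsAsync("/api/chat", query);

        // Deserialize the reply; skip it if the body or message is missing.
        Response response = JsonSerializer.Deserialize<Response>(json);
        if (response?.Message == null)
        {
            continue;
        }

        // Append rather than assign, so earlier replies are not overwritten.
        textBox1.AppendText(query + Environment.NewLine + response.Message.Content + Environment.NewLine + Environment.NewLine);
    }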
}
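The `Response` type used in the sample is not shown. Assuming it mirrors the JSON that Ollama's `/api/chat` endpoint returns when streaming is disabled, a minimal shape could look like the sketch below. The class and property names are guesses based on how `response.Message.Content` is read in the sample, not the original definition, and `ResponseParser` is a hypothetical helper.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTOs; only a few fields of the reply are mapped.
public sealed class ChatMessage
{
    [JsonPropertyName("role")]
    public string Role { get; set; } = "";

    [JsonPropertyName("content")]
    public string Content { get; set; } = "";
}

public sealed class Response
{
    [JsonPropertyName("model")]
    public string Model { get; set; } = "";

    [JsonPropertyName("message")]
    public ChatMessage Message { get; set; } = new ChatMessage();

    [JsonPropertyName("done")]
    public bool Done { get; set; }
}

public static class ResponseParser
{
    // Deserializes a non-streaming chat reply; unmapped JSON fields are ignored.
    public static Response Parse(string json) =>
        JsonSerializer.Deserialize<Response>(json)
        ?? throw new JsonException("Response body was empty or null.");
}
```

Mapping the property names explicitly with `JsonPropertyName` avoids relying on case-insensitive matching, which `System.Text.Json` does not enable by default.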

This small test gives me a 100% success rate, which is fantastic for a small model like this!

Note: the Watt models do not support Ollama's native tool calls. However, they can be used as standard chat models, and function calling works through the ordinary chat exchange, as if you were simply chatting with the model.
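Because the function call arrives as plain chat text, the client has to pull it out of the message content itself. The sketch below assumes the model has been prompted to answer in a bracketed form such as `[func_name(param=value, ...)]`; the exact format depends on the system prompt in use, so treat the patterns as an assumption to adapt. `ParsedCall` and `ToolCallParser` are hypothetical names, and the parser deliberately stays simple: it does not handle parentheses or quote characters inside argument values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result type: one function name with its raw string arguments.
public sealed record ParsedCall(string Name, Dictionary<string, string> Args);

public static class ToolCallParser
{
    // Matches name(args); names may contain dots, e.g. sales_growth.calculate.
    private static readonly Regex CallPattern =
        new Regex(@"(?<name>[A-Za-z_][\w.]*)\((?<args>[^()]*)\)");

    // Matches key=value where the value is double-quoted, single-quoted, or bare.
    private static readonly Regex ArgPattern =
        new Regex(@"(?<key>\w+)\s*=\s*(?:""(?<val>[^""]*)""|'(?<val>[^']*)'|(?<val>[^,]+))");

    // Extracts every call found in the chat content, in order of appearance.
    public static List<ParsedCall> Parse(string content)
    {
        var calls = new List<ParsedCall>();
        foreach (Match call in CallPattern.Matches(content))
        {
            var args = new Dictionary<string, string>();
            foreach (Match arg in ArgPattern.Matches(call.Groups["args"].Value))
            {
                args[arg.Groups["key"].Value] = arg.Groups["val"].Value.Trim();
            }
            calls.Add(new ParsedCall(call.Groups["name"].Value, args));
        }
        return calls;
    }
}
```

The parsed name can then be looked up in whatever tool registry the application uses, with the string arguments converted to the types each tool expects.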