Upload 5 files

Files changed (5)

README.md +50 -6
media/xlam-bfcl.png +0 -0
media/xlam-toolbench.png +0 -0
media/xlam-unified_toolquery.png +0 -0
media/xlam-webshop_toolquery.png +0 -0

README.md CHANGED

@@ -26,9 +26,10 @@ tags:
 <img width="500px" alt="xLAM" src="https://huggingface.co/datasets/jianguozhang/logos/resolve/main/xlam-no-background.png">
 </p>
 <p align="center">
-  <a href="">[Homepage]</a>  |
-  <a href="">[Paper]</a> |
-  <a href="https://github.com/SalesforceAIResearch/xLAM">[Github]</a>
   <a href="https://huggingface.co/spaces/Tonic/Salesforce-Xlam-7b-r">[Community Demo]</a>
 </p>
 <hr>
@@ -419,12 +420,55 @@ Output:
 {"thought": "", "tool_calls": [{"name": "get_earthquake_info", "arguments": {"location": "California"}}]}
 ````
 ## License
 The model is distributed under the CC-BY-NC-4.0 license.
-<!-- ## Citation
-If you find this repo helpful, please cite our paper:
 ```bibtex
-``` -->

 <img width="500px" alt="xLAM" src="https://huggingface.co/datasets/jianguozhang/logos/resolve/main/xlam-no-background.png">
 </p>
 <p align="center">
+  <a href="https://www.salesforceairesearch.com/projects/xlam-large-action-models">[Homepage]</a>  |
+  <a href="https://arxiv.org/abs/2409.03215">[Paper]</a> |
+  <a href="https://github.com/SalesforceAIResearch/xLAM">[Github]</a> |
+  <a href="https://blog.salesforceairesearch.com/large-action-model-ai-agent/">[Blog]</a> |
   <a href="https://huggingface.co/spaces/Tonic/Salesforce-Xlam-7b-r">[Community Demo]</a>
 </p>
 <hr>
 {"thought": "", "tool_calls": [{"name": "get_earthquake_info", "arguments": {"location": "California"}}]}
 ````
+## Benchmark Results
+Note: **Bold** and <u>Underline</u> results denote the best result and the second best result for Success Rate, respectively.
+### Berkeley Function-Calling Leaderboard (BFCL)
+![xlam-bfcl](media/xlam-bfcl.png)
+*Table 1: Performance comparison on BFCL-v2 leaderboard (cutoff date 09/03/2024). The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode in contrast to using a customized "prompt" to extract the function calls.*
+### Webshop and ToolQuery
+![xlam-webshop_toolquery](media/xlam-webshop_toolquery.png)
+*Table 2: Testing results on Webshop and ToolQuery. Bold and Underline results denote the best result and the second best result for Success Rate, respectively.*
+### Unified ToolQuery
+![xlam-unified_toolquery](media/xlam-unified_toolquery.png)
+*Table 3: Testing results on ToolQuery-Unified. Bold and Underline results denote the best result and the second best result for Success Rate, respectively. Values in brackets indicate corresponding performance on ToolQuery.*
+### ToolBench
+![xlam-toolbench](media/xlam-toolbench.png)
+*Table 4: Pass Rate on ToolBench in three distinct scenarios. Bold and Underline results denote the best result and the second best result for each setting, respectively. The results for xLAM-8x22b-r are unavailable due to the ToolBench server being down between 07/28/2024 and our evaluation cutoff date 09/03/2024.*
 ## License
 The model is distributed under the CC-BY-NC-4.0 license.
+## Citation
+If you find this repo helpful, please consider citing our papers:
+```bibtex
+@article{zhang2024xlam,
+  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
+  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and others},
+  journal={arXiv preprint arXiv:2409.03215},
+  year={2024}
+}
+```
+```bibtex
+@article{liu2024apigen,
+  title={APIGen: Automated pipeline for generating verifiable and diverse function-calling datasets},
+  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Kokane, Shirley and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and others},
+  journal={arXiv preprint arXiv:2406.18518},
+  year={2024}
+}
+```
 ```bibtex
+@article{zhang2024agentohana,
+  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
+  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
+  journal={arXiv preprint arXiv:2402.15506},
+  year={2024}
+}
+```

media/xlam-bfcl.png ADDED

media/xlam-toolbench.png ADDED

media/xlam-unified_toolquery.png ADDED

media/xlam-webshop_toolquery.png ADDED