Bill Yuchen Lin

yuchenlin

https://yuchenlin.xyz

AI & ML interests

Research @allenai: LLMs, Multimodality, and Agents

Articles

ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models

yuchenlin's activity

New activity in meta-llama/Llama-3.1-8B-Instruct 3 months ago

new tokenizer contains the cutoff date and today date by default

#74 opened 3 months ago by

commented a paper 4 months ago

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Paper • 2406.04770 • Published Jun 7 • 26 •

New activity in goodbadgreedy/GoodBadGreedy 4 months ago

Update README.md

#1 opened 4 months ago by

commented 2 papers 4 months ago

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

Paper • 2407.10457 • Published Jul 15 • 22 •

New activity in princeton-nlp/Llama-3-Base-8B-SFT-SimPO 4 months ago

no tokenizer?

#1 opened 4 months ago by

commented a paper 5 months ago

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Paper • 2406.18495 • Published Jun 26 • 12 •

New activity in allenai/WildBench 5 months ago

Is there any way for private model testing?

#9 opened 5 months ago by

Example IDs for GPT4o vs Claude3.5Sonnet

#8 opened 5 months ago by

Model to test, please

#7 opened 5 months ago by

commented 4 papers 5 months ago

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Paper • 2406.11069 • Published Jun 16 • 13 •

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 65 •

New activity in allenai/WildBench 5 months ago

[Changelog] 2024-06-13 Update the WB-scores with gpt-4o version

#6 opened 5 months ago by

commented a paper 5 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 65 •

New activity in allenai/BaseChat 5 months ago

Llama-3-8B thinks it is built by OpenAI

#1 opened 5 months ago by

Update README.md

#2 opened 5 months ago by

New activity in allenai/WildBench 5 months ago

Add paper link to connect the Space to its paper on Daily Papers page

#5 opened 5 months ago by

[Changelog] 2024-06-09 Update Elo with 0606 version and change default ranking options

#4 opened 5 months ago by