🦅 🐍 FalconMamba 7B Collection • This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 13 items • Updated 1 day ago • 25
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? Paper • 2407.16607 • Published Jul 23 • 21
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 590
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22 • 107
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis Paper • 2401.17093 • Published Jan 30 • 18
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding Paper • 2401.03003 • Published Jan 5 • 12
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation Paper • 2312.17276 • Published Dec 27, 2023 • 15
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation Paper • 2312.14187 • Published Dec 20, 2023 • 49
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 40
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 138
Orca 2: Teaching Small Language Models How to Reason Paper • 2311.11045 • Published Nov 18, 2023 • 70
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization Paper • 2311.06243 • Published Nov 10, 2023 • 17
Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems Paper • 2311.05884 • Published Nov 10, 2023 • 5
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 35
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 40
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models Paper • 2310.13127 • Published Oct 19, 2023 • 11
Tuna: Instruction Tuning using Feedback from Large Language Models Paper • 2310.13385 • Published Oct 20, 2023 • 10
Teaching Language Models to Self-Improve through Interactive Demonstrations Paper • 2310.13522 • Published Oct 20, 2023 • 11
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models Paper • 2310.13671 • Published Oct 20, 2023 • 18
Eureka: Human-Level Reward Design via Coding Large Language Models Paper • 2310.12931 • Published Oct 19, 2023 • 26
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 96
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model Paper • 2309.16058 • Published Sep 27, 2023 • 55
DreamLLM: Synergistic Multimodal Comprehension and Creation Paper • 2309.11499 • Published Sep 20, 2023 • 58
SlimPajama-DC: Understanding Data Combinations for LLM Training Paper • 2309.10818 • Published Sep 19, 2023 • 10
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch Paper • 2309.10706 • Published Sep 19, 2023 • 16
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170
Studying Large Language Model Generalization with Influence Functions Paper • 2308.03296 • Published Aug 7, 2023 • 11
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition Paper • 2308.03279 • Published Aug 7, 2023 • 21