melsiddieg (MOHAMMED ABDALLAH)

upvoted a paper about 1 month ago

Med42-v2: A Suite of Clinical LLMs

Paper • 2408.06142 • Published Aug 12 • 50

upvoted a collection about 1 month ago

🦅 🐍 FalconMamba 7B

Collection

This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 13 items • Updated 1 day ago • 25

upvoted an article about 1 month ago

Article

Welcome FalconMamba: The first strong attention-free 7B model

Aug 12 • 96

upvoted a paper about 2 months ago

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Paper • 2407.16607 • Published Jul 23 • 21

upvoted a paper 2 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 64

upvoted an article 4 months ago

Article

Introducing the Open Arabic LLM Leaderboard

May 14 • 62

upvoted a paper 6 months ago

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 103

upvoted a paper 7 months ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 182

upvoted a collection 7 months ago

💫 StarCoder2

Collection

StarCoder2 models and datasets! • 8 items • Updated Mar 1 • 79

upvoted 4 papers 7 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 590

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 107

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Paper • 2402.08609 • Published Feb 13 • 34

Model Editing with Canonical Examples

Paper • 2402.06155 • Published Feb 9 • 11

upvoted 4 papers 8 months ago

upvoted 4 papers 9 months ago

PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation

Paper • 2312.17276 • Published Dec 27, 2023 • 15

WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation

Paper • 2312.14187 • Published Dec 20, 2023 • 49

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Paper • 2312.07987 • Published Dec 13, 2023 • 40

Context Tuning for Retrieval Augmented Generation

Paper • 2312.05708 • Published Dec 9, 2023 • 16

upvoted 7 papers 10 months ago

Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 79

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138

Orca 2: Teaching Small Language Models How to Reason

Paper • 2311.11045 • Published Nov 18, 2023 • 70

Exponentially Faster Language Modelling

Paper • 2311.10770 • Published Nov 15, 2023 • 118

Contrastive Chain-of-Thought Prompting

Paper • 2311.09277 • Published Nov 15, 2023 • 33

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

Paper • 2311.06243 • Published Nov 10, 2023 • 17

Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems

Paper • 2311.05884 • Published Nov 10, 2023 • 5

upvoted 8 papers 11 months ago

FlashDecoding++: Faster Large Language Model Inference on GPUs

Paper • 2311.01282 • Published Nov 2, 2023 • 35

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 40

Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

Paper • 2310.13127 • Published Oct 19, 2023 • 11

Tuna: Instruction Tuning using Feedback from Large Language Models

Paper • 2310.13385 • Published Oct 20, 2023 • 10

Teaching Language Models to Self-Improve through Interactive Demonstrations

Paper • 2310.13522 • Published Oct 20, 2023 • 11

Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

Paper • 2310.13671 • Published Oct 20, 2023 • 18

Eureka: Human-Level Reward Design via Coding Large Language Models

Paper • 2310.12931 • Published Oct 19, 2023 • 26

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96

upvoted 6 papers 12 months ago

Large Language Models as Analogical Reasoners

Paper • 2310.01714 • Published Oct 3, 2023 • 15

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Paper • 2309.16058 • Published Sep 27, 2023 • 55

DreamLLM: Synergistic Multimodal Comprehension and Creation

Paper • 2309.11499 • Published Sep 20, 2023 • 58

SlimPajama-DC: Understanding Data Combinations for LLM Training

Paper • 2309.10818 • Published Sep 19, 2023 • 10

OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

Paper • 2309.10706 • Published Sep 19, 2023 • 16

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 82

upvoted 7 papers about 1 year ago

Retentive Network: A Successor to Transformer for Large Language Models

Paper • 2307.08621 • Published Jul 17, 2023 • 170

One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 31

Studying Large Language Model Generalization with Influence Functions

Paper • 2308.03296 • Published Aug 7, 2023 • 11

UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition

Paper • 2308.03279 • Published Aug 7, 2023 • 21

Med-Flamingo: a Multimodal Medical Few-shot Learner

Paper • 2307.15189 • Published Jul 27, 2023 • 22

Less is More: Focus Attention for Efficient DETR

Paper • 2307.12612 • Published Jul 24, 2023 • 6

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 142
