DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Abstract
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B achieves 51.7% on the competition-level MATH benchmark without relying on external toolkits or voting techniques, approaching the performance of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B reaches 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: first, we harness the significant potential of publicly available web data through a meticulously engineered data-selection pipeline; second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that enhances mathematical reasoning abilities while reducing the memory usage of PPO.
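As a reading aid, here is a minimal PyTorch sketch of the two ideas the abstract attributes to GRPO: replacing PPO's learned value (critic) model with reward normalization over a group of sampled outputs, and a clipped PPO-style surrogate with a KL penalty toward a frozen reference policy. The function names, tensor shapes, and the clip_eps/kl_coef defaults are illustrative assumptions, not the paper's reference implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages (assumed shape: [num_questions, group_size]).

    Each sampled output's reward is normalized against the mean/std of its
    own group, standing in for the learned critic that PPO would use."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_loss(logp_new, logp_old, advantages, kl_to_ref,
              clip_eps=0.2, kl_coef=0.04):
    """Clipped surrogate objective with a KL penalty to a frozen reference
    policy; all arguments are per-token tensors of the same shape."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -(torch.min(unclipped, clipped) - kl_coef * kl_to_ref).mean()

# Toy usage: one question, a group of G=4 sampled answers.
rewards = torch.tensor([[0.0, 1.0, 0.5, 1.0]])
print(grpo_advantages(rewards))
```

Dropping the critic is where the memory saving mentioned in the abstract comes from: only the policy (plus a frozen reference model for the KL term) has to be kept in memory during training.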
Community
Very nice work, it would be amazing to dive into the dataset!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence (2024)
- ReFT: Reasoning with Reinforced Fine-Tuning (2024)
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (2024)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (2024)
- Augmenting Math Word Problems via Iterative Question Composing (2024)
If you want recommendations for any paper on Hugging Face, check out this Space. You can also ask Librarian Bot for paper recommendations directly by tagging it in a comment:
@librarian-bot recommend
deepseek ain't no joke
So can we use GRPO to tune LLMs now? I mean, is GRPO supported and open source on Hugging Face?
Models citing this paper: 26
Datasets citing this paper: 0