Jaward Sesay


AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Posts 49

nanoGPT with Sigmoid Self-Attention
I couldn’t resist, I had to give it a try :)

Some observations from training on an M2:
Compared to softmax attention, SSA trained ~5-10% faster, reached similar final loss values, and used less memory, but produced slightly less coherent text with marginally higher perplexity.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/sigmoid_attn.ipynb
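For anyone curious what the swap actually looks like, here is a minimal sketch of a causal sigmoid-attention block in nanoGPT style: the softmax over attention scores is replaced by an elementwise sigmoid with a -log(T) bias (the normalization trick suggested in the sigmoid-attention paper). Class and argument names (SigmoidSelfAttention, n_embd, n_head, block_size) follow nanoGPT conventions but are not copied from the linked notebook.

```python
# Hypothetical sketch, not the notebook's exact code: sigmoid attention
# written in the style of nanoGPT's CausalSelfAttention.
import math
import torch
import torch.nn as nn

class SigmoidSelfAttention(nn.Module):
    def __init__(self, n_embd=768, n_head=12, block_size=1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)
        # causal mask, same as nanoGPT
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        # the only real change vs. softmax attention: elementwise sigmoid with a
        # -log(T) bias so each row's total attention mass stays roughly O(1)
        att = torch.sigmoid(att - math.log(T))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, 0.0)  # zero out future positions
        y = att @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)
```

Dropping something like this into nanoGPT's Block in place of CausalSelfAttention should be enough to reproduce the comparison above.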
The breakthrough in OpenAI’s o1 release goes way beyond just another family of capable models - it’s a monumental leap in LLM reasoning capabilities, one in which the limitations of pre-training become obsolete and the dream of scaling during inference becomes a reality.

Once again, reinforcement learning (when done right) proves to be the ultimate “tool” that drives reasoning in AI models. OpenAI o1 (aka strawberry 🍓) can think, and learn while thinking, before giving a response. This is how we humans approach solving difficult problems.

In technical terms, o1 is trained with an RL algorithm to think productively using its chain of thought - in other words, “the longer it thinks, the better it does on reasoning tasks”. This is similar to how AlphaGo was able to beat the world champion at Go.

Read more: https://openai.com/index/learning-to-reason-with-llms/
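OpenAI hasn’t published the training recipe, so purely as a toy illustration of the general idea (not o1’s actual setup), here is a REINFORCE-style sketch: sample a chain of thought plus answer, reward only the final answer, and push up the log-probability of reasoning traces that got it right. `model`, `tokenizer`, `question`, and `reference_answer` are hypothetical placeholders for a Hugging Face causal LM and a QA pair.

```python
# Toy REINFORCE-style sketch of "reward the chain of thought by its final answer".
# An illustration of the concept only, NOT OpenAI's o1 training setup.
import torch

def reinforce_cot_step(model, tokenizer, question, reference_answer, optimizer,
                       max_new_tokens=256):
    prompt = tokenizer(question, return_tensors="pt")
    prompt_len = prompt["input_ids"].shape[1]

    # 1) sample a reasoning trace + answer from the current policy (the LM)
    with torch.no_grad():
        sequences = model.generate(**prompt, do_sample=True,
                                   max_new_tokens=max_new_tokens)

    # 2) score only the final answer (toy binary reward)
    completion_text = tokenizer.decode(sequences[0, prompt_len:], skip_special_tokens=True)
    reward = 1.0 if reference_answer in completion_text else 0.0

    # 3) log-probability of the sampled completion under the model
    logits = model(sequences).logits[:, :-1, :]
    logprobs = torch.log_softmax(logits, dim=-1)
    token_logprobs = logprobs.gather(-1, sequences[:, 1:].unsqueeze(-1)).squeeze(-1)
    completion_logprob = token_logprobs[:, prompt_len - 1:].sum()

    # 4) REINFORCE update: make rewarded reasoning traces more likely
    loss = -reward * completion_logprob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

In practice you would want a proper verifier or reward model, a baseline for variance reduction, and a KL penalty to a reference policy, but the core loop is just this.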
