Jaward Sesay


AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Posts 49

nanoGPT with Sigmoid Self-Attention
I couldn’t resist, I had to give it a try :)

Some observations from training on an M2:
Compared to softmax attention, SSA trained ~5-10% faster, reached similar final loss values, and used less memory, but produced slightly less coherent text with marginally higher perplexity.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/sigmoid_attn.ipynb
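For anyone curious what the swap actually looks like, here is a minimal sketch of a causal sigmoid-attention block in nanoGPT style: the softmax over attention scores is replaced by an elementwise sigmoid with a -log(T) bias (the normalization trick suggested in the sigmoid-attention paper). Class and argument names (SigmoidSelfAttention, n_embd, n_head, block_size) follow nanoGPT conventions but are not copied from the linked notebook.

```python
# Hypothetical sketch, not the notebook's exact code: sigmoid attention
# written in the style of nanoGPT's CausalSelfAttention.
import math
import torch
import torch.nn as nn

class SigmoidSelfAttention(nn.Module):
    def __init__(self, n_embd=768, n_head=12, block_size=1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)
        # causal mask, same as nanoGPT
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        # the only real change vs. softmax attention: elementwise sigmoid with a
        # -log(T) bias so each row's total attention mass stays roughly O(1)
        att = torch.sigmoid(att - math.log(T))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, 0.0)  # zero out future positions
        y = att @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)
```

Dropping something like this into nanoGPT's Block in place of CausalSelfAttention should be enough to reproduce the comparison above.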
The breakthrough in OpenAI’s o1 release goes way beyond just another family of capable models - it’s a monumental leap in LLM reasoning capabilities, one in which the limitations of pre-training become obsolete and the dream of scaling during inference becomes a reality.

Once again, reinforcement learning (when done right) proves to be the ultimate “tool” that drives reasoning in AI models. OpenAI o1 (aka strawberry 🍓) can think, and learn while thinking, before giving a response. This is how we humans approach solving difficult problems.

In technical terms, o1 is trained with an RL algorithm to think productively using its chain of thought - in other words, “the longer it thinks, the better it does on reasoning tasks”. This is similar to how AlphaGo was able to beat the world champion at Go.

Read more: https://openai.com/index/learning-to-reason-with-llms/
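OpenAI hasn’t published the training recipe, so purely as a toy illustration of the general idea (not o1’s actual setup), here is a REINFORCE-style sketch: sample a chain of thought plus answer, reward only the final answer, and push up the log-probability of reasoning traces that got it right. `model`, `tokenizer`, `question`, and `reference_answer` are hypothetical placeholders for a Hugging Face causal LM and a QA pair.

```python
# Toy REINFORCE-style sketch of "reward the chain of thought by its final answer".
# An illustration of the concept only, NOT OpenAI's o1 training setup.
import torch

def reinforce_cot_step(model, tokenizer, question, reference_answer, optimizer,
                       max_new_tokens=256):
    prompt = tokenizer(question, return_tensors="pt")
    prompt_len = prompt["input_ids"].shape[1]

    # 1) sample a reasoning trace + answer from the current policy (the LM)
    with torch.no_grad():
        sequences = model.generate(**prompt, do_sample=True,
                                   max_new_tokens=max_new_tokens)

    # 2) score only the final answer (toy binary reward)
    completion_text = tokenizer.decode(sequences[0, prompt_len:], skip_special_tokens=True)
    reward = 1.0 if reference_answer in completion_text else 0.0

    # 3) log-probability of the sampled completion under the model
    logits = model(sequences).logits[:, :-1, :]
    logprobs = torch.log_softmax(logits, dim=-1)
    token_logprobs = logprobs.gather(-1, sequences[:, 1:].unsqueeze(-1)).squeeze(-1)
    completion_logprob = token_logprobs[:, prompt_len - 1:].sum()

    # 4) REINFORCE update: make rewarded reasoning traces more likely
    loss = -reward * completion_logprob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

In practice you would want a proper verifier or reward model, a baseline for variance reduction, and a KL penalty to a reference policy, but the core loop is just this.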
