Exploring Karpathy's New NanoChat Project

OK. This is Big. If you don't know who Andrej Karpathy is, please get to know him and all of his amazing writing and projects.

💡
If you're a hands-on developer, I recommend Karpathy's "Technical Track" set of YouTube videos, where he builds an LLM from absolute scratch, including tokenization. Just be warned: it's about as deep as a rabbit hole goes.

Recently, Andrej released a new repository project called NanoChat -

"A minimal, from scratch, full stack training/inference pipeline of a simple ChatGPT clone."

You can read all about it in his own words:

Now What?

My background is in software development and engineering rather than research, and I've been slowly getting into the "deep" side of LLMs and how models work under the covers. So I'll be walking through the code, running the training myself, and documenting any insights or findings here on this blog.

I'll be starting from this page, and as we speak I'm running the training on lambda.ai. My goal is to understand all the steps in a deep way. Today I understand them only at a surface level, and I've been working to change that. I'll report back as I progress.

Training in progress

By Roy Osherove