Sources

Every module in this curriculum is built on the work of researchers, engineers, bloggers, and communities who publish openly. This page collects the sources we used, organized by module.

Module 1: Get Running

Tools

Models

Hardware and Emerging Techniques

General Guides

Module 2: Choose Wisely

Training Quality vs Model Size

Quantization

Model Naming

Inference Engines

Memory Bandwidth

Mixture of Experts

Module 3A: How Models Think

Transformer Architecture

Attention Is All You Need — Vaswani et al., 2017 — The original transformer paper.
The Illustrated Transformer — Jay Alammar
The Annotated Transformer — Harvard NLP
3Blue1Brown: But what is a GPT?

Attention Mechanism

Tokenization

Embeddings

Scaling Laws

Temperature & Sampling

General

Module 3B: How Models Fit

Quantization

KV Cache & Memory

Understanding the KV Cache — Hugging Face
PagedAttention — Kwon et al., 2023 — vLLM’s KV cache memory management.
Multi-head Latent Attention — DeepSeek-V2

TurboQuant

LLM in a Flash

Alternative Architectures

Inference Optimization

Module 4: What Can You Do With This?

Chat Interfaces

Document Q&A / RAG

Coding Agents

Agent Concepts

Agent Security

OWASP Top 10 for LLM Applications — Excessive agency, insecure output handling, prompt injection.
Prompt injection and AI agent risks — Simon Willison
AI assistant security risks — Trail of Bits

Automation

Ollama API Documentation

Module 5: Agents

Foundations

Agent Frameworks

Coding Agents

Multi-Agent Orchestration

Reliability and Security

No-Code Platforms

Module 6: Build Custom Tools

APIs and SDKs

Function Calling / Tool Use

MCP (Model Context Protocol)

RAG (Retrieval-Augmented Generation)

Testing and Evaluation

Module 7: Local + Cloud

Decision Frameworks

Cost Optimization

Model Routing

Pricing References

Module 8: Supercharge Your Setup

Skills and Plugins

everything-claude-code — GitHub — 100+ skills, 28 agents, 59 slash commands
Claude Code Skills Documentation — Anthropic

MCP Ecosystem

IDE Configuration

Community

Module 9: Go Further

Image Generation

Video Generation

Audio and TTS

Music Generation

Benchmarks and Evaluation

Fine-Tuning

Module 10: What’s Next

Compression Research

On-Device Inference

Speculative Decoding

Speculative Decoding — Leviathan et al., 2023

Alternative Architectures

Model Context Protocol

Foundational Papers Referenced