Supercharge Your Setup
The Ecosystem: Why Setup Matters More Than Model Choice
You can spend hours comparing models and tweaking quantization levels, but the biggest jump in usefulness usually comes from what you plug into the model.
A 9B model connected to your files, your calendar, and your codebase is more useful than a 70B model sitting in a terminal with no context. This module covers the tools and plugins that make that happen. The specific recommendations change often, but the categories are stable.
By the end of this module, you’ll know how to extend your agent with MCP servers, configure memory across sessions, and use skills and hooks to shape how it works.
Skills and Plugins: Extending Your Agent
What Skills Are
A skill is a set of instructions that teaches your agent a repeatable workflow. Instead of typing the same detailed prompt every time you want a code review, you write those instructions once as a skill and call it with a slash command.
Without a skill:
You type a 200-word prompt explaining your review criteria
every single time you want a code review.
With a skill:
/review-component src/components/UserCard.tsx
The agent loads the instructions automatically.
Skills live as markdown files in your project. Claude Code reads them from .claude/skills/, and Kilo Code has a similar system. The format varies by agent, but the concept is the same: reusable expertise you write once.
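To make the slash-command idea concrete, here is a minimal sketch of how a command like the one above could be split into a skill name, its arguments, and a file to load. The one-file-per-skill layout (`<skills_dir>/<name>.md`) and the names `SkillCall`, `parse_skill_call`, and `build_prompt` are assumptions for illustration, not any agent's actual implementation.

```python
from pathlib import Path
from typing import NamedTuple


class SkillCall(NamedTuple):
    name: str        # skill name taken from the slash command
    args: list[str]  # everything after the command, split on whitespace
    path: Path       # where the instructions file is assumed to live


def parse_skill_call(command: str, skills_dir: Path = Path(".claude/skills")) -> SkillCall:
    """Split '/name arg1 arg2' into a skill name, its arguments, and a file path."""
    parts = command[1:].split()
    if not command.startswith("/") or not parts:
        raise ValueError("expected a command of the form '/skill-name [args]'")
    name, *args = parts
    # Hypothetical layout: one markdown file per skill, named after the command.
    return SkillCall(name, args, skills_dir / f"{name}.md")


def build_prompt(call: SkillCall, instructions: str) -> str:
    """Combine the stored instructions with the arguments for this invocation."""
    return f"{instructions.strip()}\n\nTarget: {' '.join(call.args)}"


call = parse_skill_call("/review-component src/components/UserCard.tsx")
print(call.name, call.path)
```

The point of the sketch is the separation: the instructions are written once and stored in a file, and only the target changes per invocation.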
What Makes a Good Skill
Good skills are specific and scoped. “Review this file” is too vague. “Check this React component for accessibility issues, missing error boundaries, prop type validation, and unnecessary re-renders, then output a structured review with severity levels” is a skill that gives consistent results.
Three patterns that work well:
Workflow skills automate a multi-step process. “Read the PR diff, check for security issues, verify test coverage, and write a review comment.” You save hours per week if you review code regularly.
Standard enforcement skills check code against your team’s rules. “Verify this file follows our API conventions: consistent error format, pagination pattern, authentication middleware.” These catch things linters miss because they’re about design decisions, not syntax.
You can also build generation skills that produce structured output from minimal input: “Given a database table name, generate the CRUD endpoints, types, and tests following our project patterns.” The skill contains your project’s conventions, so the output matches what you’d write by hand.
Community Skill Libraries
You don’t have to write every skill yourself. The everything-claude-code repository has over 100 skills organized by language and framework: Django, Laravel, Go, Python, TypeScript, video processing, and more. Install the ones relevant to your stack and customize from there.
Other agents are building similar ecosystems. The pattern matters more than the specific agent: reusable instructions, shared by the community, customized per project. Claude Code uses slash-command skills (installable, run on demand). Cursor uses .cursorrules files, which are always-on context rules rather than slash commands. Both give the agent standing instructions, but the mechanism differs per agent.
MCP Servers: The Plugin Store for AI
Module 6 covered how to wire up an MCP server: the .mcp.json config, the protocol, how function calling works under the hood. This section is about what’s available and how to find it.
What’s Out There
The MCP ecosystem has grown from a handful of reference servers to thousands. They fall into a few categories:
| Category | What It Connects To | Examples |
|---|---|---|
| Developer tools | Code repos, CI/CD, issue trackers | GitHub, GitLab, Linear, CircleCI |
| Data sources | Databases, analytics, logs | PostgreSQL, Datadog, BigQuery |
| Communication | Chat, email, docs | Slack, Gmail, Confluence |
| Productivity | Calendars, notes, files | Google Calendar, Notion, local filesystem |
| Browser and web | Web pages, search, screenshots | Browser control, web search, screenshot capture |
| Specialized | Domain-specific tools | Stripe (payments), Twilio (SMS), AWS services |
Not all servers are equal. Some are maintained by the tool’s creator (Anthropic’s reference servers, Datadog’s official server). Others are community-built with varying quality. Check the README, look at recent commits, and test on non-critical data before trusting a server with anything important.
How to Find Servers
Four places to look:
| Directory | What It’s Good For |
|---|---|
| mcp.so | Largest community directory. 1000+ servers, searchable by category. |
| Smithery | Curated marketplace with one-command installs. Higher average quality. |
| Glama | Quality scores and compatibility info. Good for comparing options. |
| MCP repo | Protocol spec and reference implementations. Start here if building your own. |
The site’s Ecosystem Reference page has a curated list by category.
How to Evaluate a Server
Before adding a server to your setup:
Check what permissions it needs. A filesystem server that requests write access to your entire home directory is a red flag. A Slack server that only reads channels you specify is reasonable. Principle of least privilege applies.
Look at the tool list. Every MCP server exposes a set of tools the model can call. Fewer, well-defined tools usually work better than a server that tries to do everything. Each tool adds to the model’s decision space, and too many tools slow down tool selection.
Then test it on something low-stakes. Connect it, ask the model to use it, and check whether the results make sense before relying on it for real work.
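The first two checks can be partly automated once you have a server's tool list in hand. The sketch below is a rough heuristic, not a security review: the `Tool` shape, the keyword list, and the threshold of ten tools are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical keywords that suggest a tool can change or delete data.
WRITE_HINTS = ("write", "delete", "create", "update", "send", "remove")


@dataclass
class Tool:
    name: str
    description: str


def audit_tools(tools: list[Tool], max_tools: int = 10) -> list[str]:
    """Return human-readable warnings for a server's advertised tool list."""
    warnings = []
    if len(tools) > max_tools:
        warnings.append(f"{len(tools)} tools exposed (more than {max_tools})")
    for tool in tools:
        text = f"{tool.name} {tool.description}".lower()
        if any(hint in text for hint in WRITE_HINTS):
            warnings.append(f"{tool.name}: may modify data, check its scope")
    return warnings


tools = [
    Tool("read_channel", "Read messages from a channel"),
    Tool("delete_file", "Delete a file on disk"),
]
for warning in audit_tools(tools):
    print(warning)
```

A keyword match only tells you where to look. You still need to read the README and confirm what each flagged tool is allowed to touch.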
How Many Servers Is Too Many?
Start with two or three that match your daily workflow. Every server adds tools that the model has to reason about on every request. Ten servers with five tools each means fifty possible actions. The model spends more time deciding which tool to call, and sometimes picks the wrong one.
A practical setup for a developer: GitHub + your database + Slack. For a writer or researcher: filesystem + web search + a note-taking tool. Add more when you hit a specific need, not preemptively.
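The arithmetic is simple enough to check before you connect anything. In this small sketch, the per-server counts for the focused setup are invented for illustration; real servers expose different numbers of tools.

```python
def total_tools(servers: dict[str, int]) -> int:
    """Sum the tools exposed by every connected server."""
    return sum(servers.values())


# The example from the text: ten servers with five tools each.
many = {f"server-{i}": 5 for i in range(10)}

# A smaller developer setup; these per-server counts are made up.
focused = {"github": 5, "database": 5, "slack": 5}

print(total_tools(many))     # 50
print(total_tools(focused))  # 15
```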
Memory Tools: AI That Remembers
The Problem
Most chat sessions start fresh. The model has no idea what you told it yesterday, what conventions your team uses, or what you’ve been working on. Every session, you re-explain context. The tenth time you tell an agent “we use Zod for validation, not Joi,” you start wondering why it can’t just remember.
Three Layers of Memory
Memory for AI agents works at three levels, each solving a different problem.
Static Context: Project Files
The simplest layer. CLAUDE.md, .cursorrules, .opencode.yaml. You write your project’s conventions, stack, and preferences once. Every session starts with that context. Module 5’s coding agents section covered these per-agent.
Static context works well for things that don’t change: your framework, your coding conventions, your team’s rules. The limitation is that you have to maintain it by hand, and it captures how you work, not what you’ve been working on.
Session Memory: What the Agent Learns
Some agents now extract and persist facts from your conversations automatically. Claude Code watches what you say and saves things it judges important: preferences, project context, corrections. Windsurf’s Cascade Memories do the same, automatically identifying facts worth keeping and loading relevant ones into future sessions.
This is more useful than static context because it captures things you’d never think to write down. “Last time we touched the auth module, the session token format was the problem” is the kind of context that makes the tenth session on a project feel like a continuation rather than a fresh start.
Structured Handoffs: Bridging Sessions
When a session ends, the most valuable context isn’t individual facts. It’s the state of the work: what was done, what decisions were made and why, what’s next, and what files matter. A structured handoff document captures this.
What happened: Added JWT auth endpoints, wired refresh token flow.
Key decisions: Storing tokens in httpOnly cookies, not localStorage (XSS risk).
What's next: Rate limiting on /auth/refresh, then integration tests.
Files to read: src/auth/middleware.ts, docs/auth-design.md.
This pattern works across any agent. It’s a markdown file. The next session reads it and has full context. The value is in capturing decisions and reasoning alongside the raw facts. When you record why you made a choice, not only what you chose, future context becomes more useful. “We moved to a separate module because the user wants a unified ‘your data + your model’ framing” is higher-value context than “the project uses RAG.”
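If you want handoffs to have the same shape every time, a small helper can render them. This is one possible sketch: the `Handoff` class and its field names are assumptions that mirror the four headings in the example above, and the markdown layout is a matter of taste.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    what_happened: str
    key_decisions: list[str] = field(default_factory=list)  # include the "why"
    whats_next: list[str] = field(default_factory=list)
    files_to_read: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the handoff as a small markdown document."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return (
            f"## What happened\n{self.what_happened}\n\n"
            f"## Key decisions\n{bullets(self.key_decisions)}\n\n"
            f"## What's next\n{bullets(self.whats_next)}\n\n"
            f"## Files to read\n{bullets(self.files_to_read)}\n"
        )


handoff = Handoff(
    what_happened="Added JWT auth endpoints, wired refresh token flow.",
    key_decisions=["Storing tokens in httpOnly cookies, not localStorage (XSS risk)."],
    whats_next=["Rate limiting on /auth/refresh", "Integration tests"],
    files_to_read=["src/auth/middleware.ts", "docs/auth-design.md"],
)
print(handoff.to_markdown())
```

Keeping the reason inside each decision string ("not localStorage (XSS risk)") is what makes the document useful to the next session.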
Dedicated Memory Frameworks
For applications that need memory beyond what’s built into a single agent, dedicated frameworks handle the storage, retrieval, and lifecycle of persistent context.
| Framework | Approach | Best For |
|---|---|---|
| Mem0 | Hybrid storage: vector search, key-value, and optional knowledge graph. Sits between your app and the LLM. Auto-extracts facts and injects them into new sessions. | General-purpose persistent memory for agents and apps. |
| Zep / Graphiti | Temporal knowledge graph. Every fact has a time window. “User works at Acme Corp” gets superseded by “User works at Initech” without deleting the history. | Scenarios where you need to reason about how things changed over time. |
| Letta (formerly MemGPT) | The agent manages memory itself across three tiers: core (kept in the context window), recall (conversation history), archival (long-term). The agent decides what to keep and what to forget. | Autonomous long-running agents that manage their own state. |
| Cognee | Builds a full knowledge graph from any data format before you query, then exposes it via MCP. Runs fully local with Ollama. | Document-heavy use cases where you need entity relationships, not text retrieval alone. |
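The temporal approach in the table is easier to see in code. The following is a toy illustration of the idea of superseding a fact without deleting it. It is not the Zep or Graphiti API; `Fact`, `TemporalStore`, and the method names are invented for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    relation: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still true"


class TemporalStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, relation: str, value: str, on: date) -> None:
        """Record a fact, closing (not deleting) any current fact it replaces."""
        for fact in self.facts:
            if (fact.subject, fact.relation) == (subject, relation) and fact.valid_to is None:
                fact.valid_to = on
        self.facts.append(Fact(subject, relation, value, valid_from=on))

    def value_at(self, subject: str, relation: str, when: date) -> Optional[str]:
        """Return the value that was valid on a given date, if any."""
        for fact in self.facts:
            same_key = (fact.subject, fact.relation) == (subject, relation)
            ended = fact.valid_to is not None and when >= fact.valid_to
            if same_key and fact.valid_from <= when and not ended:
                return fact.value
        return None


store = TemporalStore()
store.assert_fact("user", "works_at", "Acme Corp", date(2023, 1, 1))
store.assert_fact("user", "works_at", "Initech", date(2024, 6, 1))
print(store.value_at("user", "works_at", date(2023, 5, 1)))  # Acme Corp
print(store.value_at("user", "works_at", date(2024, 7, 1)))  # Initech
```

Both facts stay in the store, so a question about the past can still be answered.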
MCP Memory Servers
“Knowledge & Memory” is the largest category on MCP directories, with 280+ servers. A few worth knowing:
The official MCP memory server (from Anthropic) implements a simple knowledge graph: entities, relations, and observations stored locally. Composable and lightweight.
Basic Memory stores its knowledge graph as readable Markdown files on your machine. The AI reads and writes to your local knowledge base via MCP, and everything stays human-readable and version-controllable. No proprietary format.
What’s Still Unsolved
Three problems the current generation of memory tools hasn’t solved. First, cross-agent portability: memory built in Claude Code doesn’t transfer to Cursor or Windsurf. No standard format exists, so switching agents means starting over.
Second, staleness. Memory systems accumulate facts but rarely know when a fact has become outdated. “The auth module uses JWT” might have been true six months ago. Unless something explicitly contradicts it, the memory persists.
The third is strategic context. Most memory tools capture facts about the codebase but not the reasoning behind decisions: why a choice was made, what alternatives were considered, what the broader goal is. Handoff documents do this, but they’re manual.
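For the staleness problem, a crude mitigation is to track when each fact was last confirmed and flag old ones for review. The sketch below assumes you can attach such a date to each memory; the `Memory` class and the 90-day limit are illustrative choices. It flags candidates only and cannot tell whether a fact is actually wrong.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Memory:
    text: str
    last_confirmed: date  # last time the fact was stated or re-verified


def stale_memories(memories: list[Memory], today: date, max_age_days: int = 90) -> list[Memory]:
    """Return memories that have not been confirmed within the age limit."""
    cutoff = today - timedelta(days=max_age_days)
    return [m for m in memories if m.last_confirmed < cutoff]


memories = [
    Memory("The auth module uses JWT", date(2024, 1, 10)),
    Memory("We use Zod for validation, not Joi", date(2024, 6, 20)),
]
for memory in stale_memories(memories, today=date(2024, 7, 1)):
    print("Review:", memory.text)
```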
IDE Power-Ups: Getting More From Your Editor
Configuration Files
Every AI-enabled editor reads project-level configuration that shapes how the AI behaves. This is the highest-leverage thing most developers underuse.
| Editor | Config File | What You Put In It |
|---|---|---|
| Claude Code | CLAUDE.md | Project conventions, do/don’t rules, architecture notes |
| Cursor | .cursorrules | Coding style, framework preferences, project patterns |
| GitHub Copilot | .github/copilot-instructions.md | Repository-level instructions |
| OpenCode | .opencode.yaml | Provider config, model preferences, context |
| Windsurf | .windsurfrules | Project rules similar to Cursor’s format |
A well-written config file is worth more than upgrading your model. A 9B model with good project context produces more useful code than a larger model guessing at your conventions.
What to Put in Your Config File
Focus on things the model can’t infer from reading the code. Architectural decisions (“We use server components by default. Client components only for interactivity.”), naming conventions (“API routes go in src/app/api/. Tests go in tests/ next to the source.”), do-not-touch rules (“Don’t modify migration files directly. Don’t add dependencies without asking.”), and project-specific knowledge the code doesn’t make obvious (“The auth module wraps a custom ORM in src/db/orm.ts.”).
Skip things the model can see: don’t list every file in your project, don’t describe your language or framework (it can tell), and don’t paste your entire README in.
Hooks: Event-Driven Automation
Some agents support hooks: shell commands that fire automatically when the agent takes specific actions. You configure them by mapping event names to commands. A pre-edit hook runs your linter on every file before the agent saves changes. A post-commit hook runs your test suite after every commit.
The main value of hooks is that the agent sees the output. A pre-commit hook that runs a linter and fails will output the linter errors. The agent reads that output and adjusts the code before retrying. This creates a feedback loop: the agent gets the same error messages a developer would see, and can correct its own work without you intervening. It’s how you build guardrails without micromanaging the agent.
Claude Code’s hook system is the most mature, with events for file edits, commits, errors, and tool calls. Other agents are adding similar capabilities.
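The feedback loop comes down to a script that runs a check, surfaces its output, and signals failure through its exit code. Here is a minimal sketch of such a script. How a given agent invokes hooks and interprets exit codes varies and is not covered here, so the non-zero-means-failure convention and the function names are assumptions.

```python
import subprocess
import sys


def run_check(command: list[str]) -> tuple[bool, str]:
    """Run a check command and return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, (result.stdout + result.stderr).strip()


def hook(command: list[str]) -> int:
    """Print the check's output on failure and return an exit code for the hook."""
    passed, output = run_check(command)
    if not passed:
        # This output is what the agent would read before retrying.
        print(output, file=sys.stderr)
        return 1
    return 0


# Stand-in for a real linter: prints one error and exits with status 1.
fake_linter = [sys.executable, "-c", "import sys; print('E501 line too long'); sys.exit(1)"]
exit_code = hook(fake_linter)
```

In a real setup you would replace `fake_linter` with your project's lint or test command and end the script with `sys.exit(exit_code)`.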
Staying Current: Where the Community Lives
Tools in this space change quickly. Here’s where people track what’s happening.
Where to Watch
r/LocalLLaMA is the most active community for local AI. New model releases, hardware benchmarks, tool recommendations, and real-world experience reports. If something works or doesn’t, someone has posted about it here.
HuggingFace is where models, datasets, and tools get published. The trending page shows what people are using right now. The forums have technical discussions about model quality and deployment.
GitHub trending and tool-specific repositories track the ecosystem’s direction. Star counts are noisy, but release activity and issue discussions tell you whether a project is maintained.
YouTube and blogs from practitioners like Simon Willison, Matt Shumer, and channels covering local AI do a good job of filtering signal from noise. They test things and report what actually works.
How to Stay Current Without Drowning
Pick one or two sources and check them weekly, not daily. The pace of releases can feel overwhelming, but most changes don’t affect your setup. A new model release matters if it fits your hardware better. A new MCP server matters if it connects to something you use. Everything else is noise you can catch up on later.
What you’ve set up won’t stop working because something new came out. Upgrade when you actually hit a limitation.
Next Steps
- Module 9: Go Further: Fine-tuning, media generation, evaluation, and where this is all heading.
- Module 6: Build Custom Tools: If you skipped ahead, this covers MCP wiring, function calling, and RAG pipelines.
- Module 5: Agents: Agent architectures, frameworks, and multi-agent orchestration.
- Model Reference: Current model recommendations for every use case.