Created by Shaunak Ghosh
Run capable LLMs locally with Ollama or LM Studio, then build a private RAG workflow over your own documents with grounding and citations. Finally, connect your local-first agent to MCP tools safely, and package it for a reproducible, one-command startup so that "works on my machine" is a guarantee rather than an excuse.
8 modules • Each builds on the previous one
Build a correct mental model of what runs locally: the generative LLM (text output) versus the embedding model (vectorization for retrieval), and why going local-first mainly changes data boundaries and cost, not model capability.
Compare Ollama and LM Studio as local inference toolchains: installation paths, model management, serving APIs, updates, and how to keep setups offline/low-cost and repeatable.
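Both toolchains expose a plain HTTP API on localhost, which is what makes them interchangeable serving backends. A minimal sketch of talking to them from Python, using only the standard library; the ports and paths are the tools' documented defaults (Ollama's native API on 11434, LM Studio's OpenAI-compatible server on 1234), and the model name in any call would be whatever you have pulled locally:

```python
import json
import urllib.request

# Default local endpoints: no API keys, no cloud round-trips.
OLLAMA_URL = "http://localhost:11434/api/generate"          # Ollama's native API
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"  # LM Studio, OpenAI-compatible

def ollama_payload(model: str, prompt: str) -> dict:
    """Request body for a single, non-streaming Ollama generation."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(model: str, prompt: str) -> str:
    """POST to the local Ollama server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(ollama_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

Because the client is just HTTP against localhost, the same code works fully offline and can be pinned in version control for repeatable setups.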
Understand local LLM constraints: RAM/VRAM sizing, context window memory cost, tokens/sec throughput, CPU vs GPU behavior, and practical monitoring for "works on my machine" reliability.
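The sizing arithmetic above can be sketched as a back-of-envelope estimator. This is a rough approximation, not a measurement: real runtimes add framework overhead, and the Llama-style shape numbers in the example (32 layers, 8 KV heads, head dimension 128) are illustrative assumptions:

```python
def weights_gb(params_billion: float, bytes_per_weight: float) -> float:
    """Approximate resident size of the model weights alone."""
    return params_billion * bytes_per_weight  # e.g. 7B at fp16 (2 bytes) ~ 14 GB

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_value: int = 2) -> float:
    """KV cache grows linearly with context: 2 tensors (K and V), per layer, per token."""
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_value / 1e9

# Illustrative: a 7B model at ~4-bit quantization (~0.5 bytes/weight)
print(weights_gb(7, 0.5))             # ~3.5 GB of weights
print(kv_cache_gb(32, 8, 128, 8192))  # ~1.07 GB of KV cache at an 8k context
```

The key takeaway is the linear term: doubling the context window doubles the KV cache, which is why long contexts blow past VRAM budgets even when the weights fit comfortably.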
Learn what quantization changes (weight precision), why it reduces RAM/VRAM footprint, and how formats like GGUF and EXL2 trade quality, speed, and compatibility across backends.
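A toy per-block symmetric quantizer makes the core idea concrete: store small integer codes plus one shared scale per block. This is a sketch of the principle only, not any real codec — GGUF and EXL2 add per-block offsets, finer layouts, and mixed precisions on top of it:

```python
def quantize_block(weights: list[float], bits: int = 4):
    """Map floats to small signed integers plus one shared scale per block."""
    qmax = 2 ** (bits - 1) - 1                        # 7 for signed 4-bit codes
    scale = max(abs(w) for w in weights) / qmax or 1.0  # guard the all-zero block
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale                                # ints + one float: ~4x smaller than fp16

def dequantize_block(codes: list[int], scale: float) -> list[float]:
    """Reconstruct approximate weights; per-weight error is bounded by the scale."""
    return [c * scale for c in codes]
```

The quality/size trade-off is visible right here: fewer bits means a coarser grid of representable values, so reconstruction error grows as the footprint shrinks.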
Pick models strategically for reasoning vs writing vs code vs small-footprint use, using lightweight evaluation: latency, context needs, tool-use ability, and task-specific benchmarks.
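Lightweight evaluation can be as simple as timing generations on your own prompts. A minimal harness, where `generate` and `count_tokens` are placeholders for whatever client and tokenizer you actually use (a whitespace split is a crude stand-in for a real tokenizer):

```python
import time

def tokens_per_second(generate, count_tokens, prompt: str) -> float:
    """Time one full generation and report rough decode throughput."""
    start = time.perf_counter()
    output = generate(prompt)
    elapsed = time.perf_counter() - start
    return count_tokens(output) / elapsed

def compare(models: dict, prompt: str) -> dict:
    """models maps name -> generate function; returns name -> tokens/sec.
    Running every candidate on the same prompts keeps the comparison task-specific."""
    return {name: tokens_per_second(gen, lambda s: len(s.split()), prompt)
            for name, gen in models.items()}
```

Numbers like these are only comparable on the same hardware and prompt set, which is exactly why measuring locally beats quoting published benchmarks.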
Understand embeddings as vectors, how similarity search works (cosine/dot), and how local vector indexes support private semantic retrieval for RAG over personal docs.
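The math behind similarity search fits in a few lines. Here a brute-force scan stands in for a real vector index, which layers approximate-nearest-neighbor structures on top of the same similarity function for speed:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def top_k(query: list[float], index: list[tuple[str, list[float]]], k: int = 3):
    """Brute-force semantic retrieval: rank every stored vector against the query."""
    return sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)[:k]
```

If all stored vectors are pre-normalized to unit length, cosine reduces to a plain dot product, which is why many indexes store normalized embeddings and offer both metrics.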
Build a private RAG pipeline over local documents (PDFs, notes, repos): chunking strategies, embeddings, retrieval tuning, and grounding checks (citations, quote verification) to reduce hallucinations.
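Two of the pieces above can be sketched directly: a character-based sliding-window chunker, and the simplest grounding check, verifying that a quoted span actually appears verbatim in retrieved text. The window sizes are illustrative defaults, and real pipelines often chunk on tokens or sentence boundaries instead of characters:

```python
def chunk(text: str, size: int = 400, overlap: int = 80) -> list[str]:
    """Sliding-window chunking: adjacent chunks share `overlap` characters,
    so a sentence split at a boundary survives intact in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def quote_is_grounded(quote: str, retrieved_chunks: list[str]) -> bool:
    """Citation check: the quoted string must appear verbatim in some retrieved chunk."""
    return any(quote in c for c in retrieved_chunks)
```

A failed grounding check is a strong hallucination signal: if the model "quotes" text that no retrieved chunk contains, the answer should be rejected or regenerated.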
Learn MCP fundamentals and connect local-first agents to tools (filesystem, git, task manager) with safe defaults (read-only access, directory scoping). Then harden privacy boundaries against prompt injection, and package a reproducible "one-command" setup.
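The safety pattern behind those defaults is worth seeing in isolation. This is not the MCP wire protocol itself, just the containment check a filesystem tool should enforce before it is ever exposed to a model; `safe_read` is a hypothetical helper:

```python
from pathlib import Path

def safe_read(root: Path, relative_path: str) -> str:
    """Read-only, directory-scoped file access for a local tool server.
    Resolves symlinks and `..` *before* checking containment, so
    `../../etc/passwd`-style traversal is rejected rather than served."""
    root = root.resolve()
    target = (root / relative_path).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to: Python 3.9+
        raise PermissionError(f"path escapes tool sandbox: {relative_path}")
    return target.read_text()
```

The same boundary is your prompt-injection backstop: a malicious document that tells the agent to read a key file outside the scoped directory fails at the tool layer, regardless of what the model decides to attempt.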
Begin your learning journey
In-video quizzes and scaffolded content to maximize retention.