Reference Reference

🛠️ Tools & UIs Reference

Tools & UIs Reference

Last verified: April 2026. The tools used to run and interact with local LLMs evolve constantly. Below is the current functional state of top engines and User Interfaces (UIs).

The Core Engines (The Backend)

Engines do the heavy lifting: loading models into memory, determining GPU offloading, and exposing an API (usually OpenAI-compatible) so other apps can chat with them. You rarely interact with the engine directly unless you’re using the terminal.

1. Ollama

Best for: Most users starting out. The standard starting point.
Key Features: One-command installs (ollama pull), MLX hardware acceleration for Macs, native tool calling, and simple setup.
New for 2026: Integrates ollama launch for instant sandboxed configuration of coding agents.
Platforms: Mac, Windows, Linux.

Quick Setup

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2

The model name (llama3.2) comes from the Models reference page. Swap in any model listed there.

2. llama.cpp / llama-server

Best for: Hardware tinkerers, low-resource machines, and those wanting support for the latest quantization formats (like TurboQuant 3-bit).
Key Features: A C/C++ backend optimized for low-level hardware access. If your hardware is unusual, llama.cpp probably supports it. It powers almost every other GUI behind the scenes.
Platforms: Everything.

Quick Setup

# Mac
brew install llama.cpp

# Linux: build from source. See https://github.com/ggerganov/llama.cpp
llama-cli -m path/to/model.gguf

3. MLX Server (Apple Silicon Only)

Best for: Mac users wanting maximum tokens-per-second out of Apple Unified Memory. Faster than llama.cpp on M-series for many models.
Platforms: macOS with Apple Silicon (M1 and later).

Quick Setup

pip install mlx-lm
mlx_lm.server --model mlx-community/model-name

Replace model-name with any model available at mlx-community on HuggingFace.

4. vLLM

Best for: Serving models at scale on NVIDIA GPUs. Not intended for local single-user use.
Key Features: Production inference server with an OpenAI-compatible API. Supports tensor parallelism across multiple GPUs via a single flag. PagedAttention gives it higher throughput than naive serving.
Platforms: Linux with NVIDIA GPUs.

Quick Setup

pip install vllm
python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3

Graphical Interfaces (The Desktop Apps)

If you don’t want to use the terminal, you want a GUI. These apps bundle an engine (usually llama.cpp) inside a chat interface.

1. LM Studio

Best for: Individuals wanting a polished, out-of-the-box desktop experience.
Key Features: Visual model downloading via HuggingFace, system specification checks (warns you if a model is too big), and LM Link for secure remote GPU usage.
Platforms: Mac, Windows, Linux.
Download: lmstudio.ai

Quick Setup

Download from lmstudio.ai. Open the app, go to the model browser, search for a model, and click Download. No terminal required.

2. Open WebUI

Best for: Power users, organizations, and developers.
Key Features: Runs in a browser via Docker. Granular Role-Based Access Control (RBAC), multi-user logins, integrated document RAG, and custom modular “Pipelines” for routing and prompt policies.
Platforms: Self-hosted (Docker).
Download: openwebui.com

Quick Setup

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main

This connects to a running Ollama instance on the same machine. Open http://localhost:3000 in your browser.

3. Unsloth Studio

Best for: Users who want to fine-tune models directly from their GUI.
Key Features: Merges the chat experience with LoRA fine-tuning workflows, letting you adjust model behavior visually.

4. AnythingLLM

Best for: Users who want to chat with their own documents without setting up a full RAG pipeline manually.
Key Features: All-in-one desktop app for document Q&A (RAG). Supports multiple LLM backends including Ollama. No Docker required. Point it at a folder of PDFs or text files and start asking questions.
Download: anythingllm.com

Low-Code Workflow UIs

If you want to orchestrate RAG pipelines or agents visually without writing Python:

Dify

Best for: Production deployments and teams that want a managed platform.
How it works: Cloud-hosted option available, also self-hostable. Built-in LLM orchestration, prompt management, and API publishing. Lets you deploy a working app from a visual flow without writing server code.
Site: dify.ai

Langflow

Best for: Rapid prototyping on your own machine.
How it works: Self-hosted only. Python-based drag-and-drop flow builder. Lets you wire together document loaders, vector databases, LLMs, and tool-callers into a pipeline you can test locally. Not designed for production traffic.
Site: langflow.org

Specialized Assistants

OpenClaw

Best for: Users who want AI integrated into their operating system as a background personal assistant.
How it works: Connects to services like WhatsApp and Slack, monitors for triggers, and can execute actions on your behalf. Generates and stores reusable “skills” locally so it doesn’t repeat the same reasoning twice.

Summary Comparison

Tool	Type	Multi-User	GPU Tier	Target Use
Ollama	Backend Engine	No	Consumer	Developers / most users
LM Studio	Desktop App	No	Consumer	Solo users / researchers
Open WebUI	Web UI	Yes	Consumer	Teams / organizations
Unsloth Studio	Web / Desktop	No	Consumer	Fine-tuners
AnythingLLM	Desktop App	No	Consumer	Document Q&A
vLLM	Backend Engine	Yes	NVIDIA (multi-GPU)	Production inference
MLX Server	Backend Engine	No	Apple Silicon	Mac inference
llama.cpp	Backend Engine	No	Any	Tinkerers / embedded

Module 1: Getting Started: Ollama installation walkthrough.
Module 4: Chat Interfaces and Document Q&A: Setting up Open WebUI and AnythingLLM for everyday use.
Module 8: Ecosystem Setup: Connecting engines, UIs, and tools into a working local stack.

🛠️ Tools & UIs Reference

Tools & UIs Reference

The Core Engines (The Backend)

1. Ollama

2. llama.cpp / llama-server

3. MLX Server (Apple Silicon Only)

4. vLLM

Graphical Interfaces (The Desktop Apps)

1. LM Studio

2. Open WebUI

3. Unsloth Studio

4. AnythingLLM

Low-Code Workflow UIs

Dify

Langflow

Specialized Assistants

OpenClaw

Summary Comparison

Related Pages