Local LLM (GGUF) Inference Viewer

This demo downloads a GGUF model from a fixed URL, stores it in IndexedDB for reuse, and runs real inference in the browser. It can also expose Ollama-compatible /api/tags, /api/generate, and /api/chat endpoints.
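The download-then-reuse behavior described above can be sketched as a cache-first lookup. This is only an illustration and not the demo's actual code: the `ModelStore` interface, the `MemoryStore` stand-in, the `loadModelBytes` function, and the choice to key the cache by URL are all assumptions. In a browser the store would be backed by IndexedDB. The periodic refresh is not modeled here.

```typescript
// Hypothetical key-value interface; in a browser this could wrap IndexedDB.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, bytes: Uint8Array): Promise<void>;
}

// In-memory stand-in so the sketch runs outside a browser.
class MemoryStore implements ModelStore {
  private readonly items = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.items.get(key);
  }

  async put(key: string, bytes: Uint8Array): Promise<void> {
    this.items.set(key, bytes);
  }
}

// Return cached bytes when present; otherwise download once and cache them.
// Keying the cache by URL is an assumption made for this sketch.
async function loadModelBytes(
  store: ModelStore,
  url: string,
  download: (url: string) => Promise<Uint8Array>
): Promise<{ bytes: Uint8Array; fromCache: boolean }> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    return { bytes: cached, fromCache: true };
  }
  const bytes = await download(url);
  await store.put(url, bytes);
  return { bytes, fromCache: false };
}
```

Passing the downloader as a parameter keeps the cache logic independent of how the bytes are fetched, so the same function works with a real network call or a test double.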

* The first run and periodic refreshes involve about 1 GB of download traffic.

Model not loaded yet (press the button to download it or reuse the cache).
GET /api/tags
POST /api/generate {"model":"default","prompt":"hello","stream":false}
POST /api/chat {"model":"default","messages":[{"role":"user","content":"hello"}],"stream":false}
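The three example requests can be built with small helpers like the ones below. This is a sketch under stated assumptions: the `ApiRequest` shape and the helper names `tagsRequest`, `generateRequest`, and `chatRequest` are hypothetical, and the `Content-Type` header is assumed rather than documented by the demo. The paths and JSON bodies match the examples above.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical plain description of a request, ready to hand to fetch().
interface ApiRequest {
  method: "GET" | "POST";
  path: string;
  headers: Record<string, string>;
  body?: string;
}

// Assumed header; the demo does not state which headers it requires.
const JSON_HEADERS: Record<string, string> = { "Content-Type": "application/json" };

// GET /api/tags takes no body.
function tagsRequest(): ApiRequest {
  return { method: "GET", path: "/api/tags", headers: {} };
}

// POST /api/generate with a single prompt; stream is set explicitly to false.
function generateRequest(prompt: string, model = "default", stream = false): ApiRequest {
  return {
    method: "POST",
    path: "/api/generate",
    headers: JSON_HEADERS,
    body: JSON.stringify({ model, prompt, stream }),
  };
}

// POST /api/chat with a list of role/content messages.
function chatRequest(messages: ChatMessage[], model = "default", stream = false): ApiRequest {
  return {
    method: "POST",
    path: "/api/chat",
    headers: JSON_HEADERS,
    body: JSON.stringify({ model, messages, stream }),
  };
}
```

A request built this way could be sent with `fetch(baseUrl + req.path, { method: req.method, headers: req.headers, body: req.body })`, where `baseUrl` is whatever origin serves the demo's endpoints.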

* If stream is omitted, it defaults to true and the response is an NDJSON stream. The API returns 503 when no model is loaded.
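A client therefore has to handle two things: a 503 status when no model is loaded, and a newline-delimited JSON body whenever stream is true or omitted. The sketch below shows one way to do that. The helper names are hypothetical, and the `response` and `done` fields are assumed to follow Ollama's /api/generate streaming format, which the demo's output may or may not match exactly.

```typescript
// Assumed chunk shape, modeled on Ollama's /api/generate streaming output.
interface GenerateChunk {
  response?: string;
  done?: boolean;
}

// Streaming is the default when the request body omits "stream".
function isStreaming(body: { stream?: boolean }): boolean {
  return body.stream ?? true;
}

// Parse an NDJSON payload: one JSON object per non-empty line.
function parseNdjson(text: string): GenerateChunk[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as GenerateChunk);
}

// Concatenate the partial text carried by each chunk.
function joinResponse(chunks: GenerateChunk[]): string {
  return chunks.map((chunk) => chunk.response ?? "").join("");
}

// Turn the documented 503 into a readable error; reject other non-2xx codes.
function checkStatus(status: number): void {
  if (status === 503) {
    throw new Error("No model is loaded; load the model and retry.");
  }
  if (status < 200 || status >= 300) {
    throw new Error(`Unexpected HTTP status ${status}`);
  }
}
```

For simplicity this parses a fully buffered body. A real streaming client would read the response incrementally and keep any trailing partial line until the next chunk arrives.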