Local LLM (GGUF) Inference Viewer
This demo downloads a GGUF model from a fixed URL, stores it in IndexedDB for reuse, and runs real inference in the browser. It can also expose Ollama-compatible /api/tags, /api/generate, and /api/chat endpoints.
Initializing inference engine...
* The first run, and each periodic model refresh, downloads about 1 GB of data.
Model not loaded yet (press the button to download it or reuse the cache).
API status: unchecked
The response will appear here.
GET /api/tags
POST /api/generate {"model":"default","prompt":"hello","stream":false}
POST /api/chat {"model":"default","messages":[{"role":"user","content":"hello"}],"stream":false}
* If "stream" is omitted, it defaults to true and the response is an NDJSON stream. The API returns 503 when no model is loaded.
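
As a rough client-side sketch of the requests above, the TypeScript below calls /api/generate with "stream" left out, treats 503 as "no model loaded", and joins the NDJSON lines into one string. The helper names (parseNdjson, joinGenerateText, generate) and the baseUrl parameter are hypothetical, and the "response" and "done" fields are assumed to follow Ollama's streaming format; the demo itself may differ.

```typescript
// Hypothetical helpers; names and the base URL are assumptions, not part of the demo.

interface GenerateChunk {
  response?: string; // text fragment (assumed, as in Ollama's /api/generate stream)
  done?: boolean;    // true on the final chunk (assumed)
}

// Split an NDJSON payload into parsed objects, skipping blank lines.
function parseNdjson(text: string): GenerateChunk[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as GenerateChunk);
}

// Concatenate the text fragments of a streamed /api/generate reply.
function joinGenerateText(chunks: GenerateChunk[]): string {
  return chunks.map((c) => c.response ?? "").join("");
}

// POST to /api/generate without "stream", so the NDJSON default applies.
async function generate(baseUrl: string, prompt: string): Promise<string> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "default", prompt }),
  });
  if (res.status === 503) {
    throw new Error("No model is loaded yet; load it in the viewer first.");
  }
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  // For brevity the whole body is read before parsing, not parsed incrementally.
  return joinGenerateText(parseNdjson(await res.text()));
}
```

Reading the full body before parsing keeps the sketch short. A client that wants tokens as they arrive would read the response body incrementally and parse each complete line as it comes in.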