Android / llama.cpp / GGUF
LLM tester with llama.cpp
LLM tester with llama.cpp is an Android local-LLM testing app that lets you load GGUF models, tune inference and prompt-template settings, manage shared MCP / Function Definitions settings, inspect logs, and expose an Ollama/OpenAI-compatible API plus the bundled WebUI from one app.
Docs: User Manual | Technical Specification | llama.cpp / JNI / CMake Deep Dive | Privacy Policy
- Supports both model downloads from a URL and importing local
.gguffiles from the device. - Lets you combine generation settings, Think behavior, custom chat templates, shared MCP settings, and Function Definitions JSON.
- Can start an on-device Ollama/OpenAI-compatible API and WebUI on the same port, including endpoints such as
/api/chatand/v1/chat/completions.
Screenshots
Key Features
- On-device local inference: Runs GGUF models directly on Android with llama.cpp.
- Flexible model loading: Supports downloadable model URLs and local
.ggufimports. - Deep inference controls: Adjust
n_ctx,n_threads, GPU Offload Layers, Top-p, Top-k, penalties, Mirostat, DRY, Think behavior, and custom chat templates. - Shared MCP / function-calling settings: Save MCP Config JSON and Function Definitions JSON separately from model profiles, then optionally enable them for the main prompt input,
/api/chat,/api/generate, and/v1/chat/completions. - Built-in Ollama/OpenAI-compatible API and WebUI: Provides
/api/chat,/api/generate,/api/tags,/v1/chat/completions,/v1/models,/props, and/slotson the same port; only one generation runs at a time, with a queue of up to 10 requests for up to 60 seconds. - Multimodal API inputs:
/api/chatand/v1/chat/completionscan acceptimage_urlandinput_audiowhen the loaded model supports vision/audio.
Operational Notes
- Model downloads can be several gigabytes. Wi‑Fi is strongly recommended.
- The local API server is intended for the same device or a local network. Android 13+ may require notification permission.
- If you configure MCP servers for the main prompt input, API integrations, or the WebUI, parts of conversation content or tool inputs may be sent to those MCP servers.