Tactics·May 27, 2026

Securing llama.cpp's Shell Execution for Web RAG

By Maya · Tactics desk·Human-reviewed·✓ Verified May 27, 2026·5 min read·1 source

DevelopmentBorn3978 engineered a multi-layered sandboxing solution for llama.cpp's exec_shell_command using firejail and smolmachines. This approach enables secure web RAG within a local LLM environment.

Enabling exec_shell_command in llama.cpp's server is a significant security risk, because it gives the language model direct access to the host system's shell. A Reddit user, DevelopmentBorn3978, addressed this risk with a multi-layer sandboxing workflow. The setup uses firejail and smolmachines to isolate shell commands, so that web RAG (Retrieval-Augmented Generation) can run directly from the llama-server web UI with far less exposure of the host.

The core of the strategy is to create a dedicated, isolated user account and to run commands inside a lightweight virtual machine, with the call into that VM wrapped in a firejail sandbox. This limits the scope for privilege escalation or unintended system modifications, a critical consideration when an LLM can initiate arbitrary shell commands. The process, which the original post details across seven distinct steps, establishes a hardened execution path for external tools like wget.

Enabling llama.cpp's Native Tools

The initial step involves configuring llama-server to expose its native tool capabilities. DevelopmentBorn3978 used specific command-line arguments to enable get_datetime and exec_shell_command, alongside other model and server parameters. The following command illustrates this setup, with the --tools argument explicitly activating exec_shell_command:

    llama-server --model Qwen3.6-35B-A3B_MTP-UD-Q8_K_XL.gguf --flash-attn on --no-mmap --jinja --threads-http 4 --prio 2 --tools get_datetime,exec_shell_command --temp 0.6 --top-p 0.95 --top-k 20 --presence-penalty 1.5 --min-p 0.00 --chat-template-kwargs '{"preserve_thinking":true}' --spec-type draft-mtp --spec-draft-n-max 1
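For readability, the same invocation can be laid out one flag per line in a small launcher. This is a minimal sketch that assumes the flags behave as reported in the post; the MODEL_PATH and DRY_RUN variables and the launch_llama_server function name are illustrative and do not come from the original setup.

```shell
#!/bin/sh
# Sketch of a launcher for the command reported in the post.
# MODEL_PATH, DRY_RUN and launch_llama_server are illustrative names (assumptions).
MODEL_PATH="${MODEL_PATH:-Qwen3.6-35B-A3B_MTP-UD-Q8_K_XL.gguf}"

launch_llama_server() {
    # Build the argument list exactly as quoted in the article.
    set -- llama-server \
        --model "$MODEL_PATH" \
        --flash-attn on \
        --no-mmap \
        --jinja \
        --threads-http 4 \
        --prio 2 \
        --tools get_datetime,exec_shell_command \
        --temp 0.6 --top-p 0.95 --top-k 20 \
        --presence-penalty 1.5 --min-p 0.00 \
        --chat-template-kwargs '{"preserve_thinking":true}' \
        --spec-type draft-mtp --spec-draft-n-max 1

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print one argument per line instead of starting the server.
        printf '%s\n' "$@"
    else
        exec "$@"
    fi
}
```

Calling launch_llama_server with DRY_RUN=1 prints the argument list for inspection; without it, the function replaces the current shell with the server process.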

Establishing the Isolated User Environment

To contain potential breaches, a new Linux user, vmagents, was created. This user acts as a dedicated execution context for the sandboxed commands, so that LLM-initiated actions run under a separate account rather than with the primary user's home directory and privileges. The commands sudo useradd -m vmagents and sudo passwd vmagents create this account and set its password.

firejail was also installed system-wide. firejail is a SUID program that reduces the risk of security breaches by restricting the running environment of untrusted applications. Its presence provides an additional layer of isolation around the executed shell commands.
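The account and firejail prerequisites can be checked before running anything else. The sketch below is one hedged way to make the step repeatable; the ensure_user and require_cmd helpers and the DRY_RUN switch are hypothetical names, and only the useradd and passwd commands come from the post.

```shell
#!/bin/sh
# Sketch: repeatable prerequisite checks for the sandbox user and firejail.
# ensure_user, require_cmd and DRY_RUN are illustrative (not from the post).

ensure_user() {
    user="$1"
    if id -u "$user" >/dev/null 2>&1; then
        echo "user $user already exists"
    elif [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: sudo useradd -m $user"
    else
        # Same commands as in the post: create the account, then set a password.
        sudo useradd -m "$user" && sudo passwd "$user"
    fi
}

require_cmd() {
    # Succeeds only if the named command is on PATH.
    command -v "$1" >/dev/null 2>&1 || {
        echo "missing required command: $1" >&2
        return 1
    }
}
```

A setup script could call ensure_user vmagents followed by require_cmd firejail and stop early if either step fails.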

Configuring Smolmachines for VM Creation

Within the newly created vmagents user environment, smolmachines was installed. smolmachines is an OCI (Open Container Initiative) virtual machine harness, designed for creating lightweight virtual machines. The installation was performed by switching to the vmagents user (sudo su - vmagents) and executing curl -sSL https://smolmachines.com/install.sh | bash.

Following installation, a minimal virtual machine named minivm was created using an Alpine Linux OCI image. This minivm provides a bare-bones operating system with essential BusyBox commands, further reducing the attack surface. The commands smolvm machine create minivm --image alpine --net and smolvm machine start --name minivm set up and initiate this isolated VM.

Scripting the Multi-Layered Sandbox Execution

A critical component of the workflow is the minivm-exec script, located at /home/vmagents/.local/bin/minivm-exec. This script orchestrates the multi-layered sandboxing. It first ensures minivm is running, then runs the given command inside minivm through a firejail-wrapped smolvm call, and finally stops the VM. The script's content is:

    #!/bin/sh
    smolvm machine start --name minivm >/dev/null
    firejail smolvm machine exec --name minivm -- $* 2>/dev/null
    smolvm machine stop --name minivm >/dev/null

The firejail command within minivm-exec is key. It wraps the smolvm machine exec call, so the host-side process that hands the command to minivm runs under firejail's restrictions, while the command itself runs inside the VM. This layered isolation significantly limits the potential impact of malicious or erroneous commands.
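One detail of the script worth illustrating is argument forwarding: an unquoted $* re-splits arguments on whitespace, whereas "$@" passes each argument through unchanged. The sketch below demonstrates the difference with a stand-in function, then shows a hypothetical variant of minivm-exec that uses "$@" and returns the exit status of the wrapped command. The count_args, forward_star, forward_at and minivm_exec names are illustrative; the smolvm and firejail invocations are copied from the post and are assumed to behave as described there.

```shell
#!/bin/sh
# count_args stands in for `smolvm machine exec` and reports how many
# arguments it received.
count_args() { echo "$#"; }

forward_star() { count_args $*; }     # unquoted $*: re-split on whitespace
forward_at()   { count_args "$@"; }   # "$@": each argument preserved

forward_star -U "Mozilla Firefox" url   # prints 4
forward_at   -U "Mozilla Firefox" url   # prints 3

# Hypothetical variant of minivm-exec (an assumption, not the author's script):
# same commands as the post, but with "$@" and the command's own exit status.
minivm_exec() {
    smolvm machine start --name minivm >/dev/null || return 1
    firejail smolvm machine exec --name minivm -- "$@" 2>/dev/null
    status=$?
    # Stop the VM even when the wrapped command failed.
    smolvm machine stop --name minivm >/dev/null
    return "$status"
}
```

Whether the difference matters in practice depends on how smolvm joins the arguments it receives, which the post does not specify.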

Invoking the Sandbox from llama.cpp

To allow the primary user (and thus llama.cpp) to invoke the sandboxed execution, a wrapper script named vm-exec was created in the primary user's executable path (/home/<MYUSER>/.local/bin/vm-exec). This script's sole purpose is to execute minivm-exec under the vmagents user's credentials. The script's content is:

    #!/bin/sh
    sudo su - vmagents -c "minivm-exec $*"
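Because vm-exec interpolates $* into a string that su -c hands to a shell, any shell metacharacters in an argument (for example a semicolon inside a quoted URL) are interpreted by the vmagents login shell on the host before minivm-exec ever runs. One possible mitigation, sketched here under that assumption, is to single-quote each argument before building the command string. The shell_quote and build_su_command helpers are hypothetical and are not part of the original workflow.

```shell
#!/bin/sh
# Sketch: quote each argument so the shell started by `su -c` sees it as a
# single literal word. shell_quote and build_su_command are illustrative names.

shell_quote() {
    # Wrap the argument in single quotes, rewriting each embedded ' as '\''.
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

build_su_command() {
    cmd="minivm-exec"
    for arg in "$@"; do
        cmd="$cmd $(shell_quote "$arg")"
    done
    printf '%s\n' "$cmd"
}

# Example: the semicolon stays inside one quoted word.
build_su_command wget -U Mozilla 'http://example.com/; id'
```

A vm-exec built on this sketch would end with sudo su - vmagents -c "$(build_su_command "$@")" in place of the direct $* interpolation. Arguments ending in newlines are not preserved exactly, because command substitution strips trailing newlines.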

When llama.cpp needs to execute a shell command, it is instructed to prepend the command with vm-exec. For example, a prompt like "retrieve the latest news for today from the https://www.servizitelevideo.rai.it/televideo/pub/solotesto.jsp website... Prepend any command to be executed with the sandboxing wrapper vm-exec. Use wget to fetch web content adding the option "-U Mozilla" as browser user agent string" directs the LLM to use vm-exec for its web fetching operations, ensuring all shell commands are funneled through the secure, multi-layered sandbox.

WHAT WE'D CHANGE

The multi-sandboxing approach by DevelopmentBorn3978 demonstrates a functional method for securing llama.cpp's exec_shell_command for local web RAG. However, this solution carries significant operational overhead and is primarily suited for experimental or highly controlled environments. Its complexity makes it less viable for production deployments or for users without deep Linux system administration expertise.

First, the manual setup of a dedicated user, firejail, smolmachines, and custom scripts introduces a high barrier to entry. Each component requires specific configuration and understanding, making the solution difficult to replicate consistently across different systems or to scale. Automation tools could streamline this, but the underlying complexity remains. For broader adoption, a more integrated, less manual sandboxing mechanism would be necessary.

Second, the maintenance burden is substantial. Updates to llama.cpp, firejail, smolmachines, or Alpine Linux could introduce breaking changes to the custom scripts or the interaction between the layers. Debugging issues within such a nested environment would be challenging, requiring expertise across multiple distinct technologies.

Finally, while firejail and smolmachines enhance security, the fundamental risk of exposing a shell to an LLM persists. Even with sandboxing, the attack surface is broader than a purpose-built API for web fetching. For robust web RAG, direct integration with a controlled HTTP client or a dedicated, hardened web scraping service is generally more secure and efficient than relying on an LLM to generate and execute shell commands, even in a sandbox. This solution is a testament to ingenuity in a local context, not a blueprint for enterprise-grade security.
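As a middle ground between an open shell and a dedicated service, the tool exposed to the model could be narrowed to a single fetch operation. The following is a minimal sketch of that idea, not something the original post implements: is_http_url and vm_fetch are hypothetical names, the allowed character set is a deliberately conservative assumption (it rejects & and ; because the original vm-exec re-parses its arguments through su -c), and vm-exec is assumed to be the wrapper described earlier.

```shell
#!/bin/sh
# Sketch: accept only plain http(s) URLs, then run one fixed wget command
# through the sandbox wrapper. Names and the character allowlist are assumptions.

is_http_url() {
    case "$1" in
        http://?*|https://?*) ;;
        *) return 1 ;;
    esac
    # Delete every allowed character; only the "|" sentinel should remain.
    rest=$(printf '%s|' "$1" | tr -d 'A-Za-z0-9:/?#@!=%+,._~-')
    [ "$rest" = "|" ]
}

vm_fetch() {
    if ! is_http_url "$1"; then
        echo "vm-fetch: refusing URL outside the allowed form" >&2
        return 2
    fi
    # Fixed command line: the model controls only the URL.
    vm-exec wget -U Mozilla -q -O - "$1"
}
```

With this shape the model supplies a URL rather than a command line, which removes its ability to choose the program or its options, at the cost of rejecting some legitimate URLs such as those with multi-parameter query strings.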

LANDING

The ability to execute shell commands directly from an LLM introduces powerful capabilities, but also profound security implications. DevelopmentBorn3978's multi-sandboxing technique illustrates a practical method for mitigating these risks in a local llama.cpp environment. By isolating the exec_shell_command within a dedicated user, a virtual machine, and a firejail sandbox, the approach transforms a high-risk feature into a usable tool for web RAG. This engineering effort highlights the trade-offs between system flexibility and security, demonstrating that local LLM experimentation can proceed with caution through deliberate architectural choices.

Sources · how we verified

How I do use the recent llama.cpp native tools to do web rag a.k.a. web_fetch (or anything else for the matter) directly from inside the llama-server's webui

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.
