Tactics·May 28, 2026

Preventing GPT Hallucination with Make.com Data Injection

Kevin Seeberger's automated content pipeline produced confident, incorrect facts. His solution involved a structural change: three non-AI modules to inject verified data and constrain the LLM. Kevin…

By Maya · Tactics desk·Human-reviewed·✓ Verified May 28, 2026·6 min read·1 source

Kevin Seeberger's automated content pipeline produced confident, incorrect facts. His solution involved a structural change: three non-AI modules to inject verified data and constrain the LLM.

Kevin Seeberger's automated content pipeline, designed to produce daily sports betting articles, began confidently shipping factual errors within a week of deployment. Articles claimed a Spanish second-division side was "the reigning Champions League winners," stated a 38-year-old striker had "just signed his first professional contract," and misreported match times as "this past Tuesday" for a Saturday fixture. These specific hallucinations, all technically plausible but entirely incorrect, stemmed not from a faulty large language model, but from a structural flaw in data delivery. Seeberger's solution involved injecting three non-AI modules into his Make.com flow to prevent the GPT-4o model from generating facts it was never given.

The core issue, Seeberger identified, was not the model itself but the prompt's implicit demand for knowledge the LLM did not possess. His original system prompt instructed GPT-4o to act as a "sports betting journalist" and "write a 600-word match preview" covering "team form, head-to-head history, key players, and betting angles." The user prompt supplied only team names and the match date. This setup asked the LLM to be authoritative on subjects for which it received no explicit data, leading it to generate plausible-sounding but fabricated content from its stale training data. The fix, Seeberger concluded, was to make hallucination structurally impossible by providing every necessary fact and forbidding the model from exceeding those facts. This is the principle of data injection, applied as a constraint pattern.

Before and After Flow Structures

Seeberger's initial Make.com flow was linear and direct. It aggregated data from two APIs, then passed it directly to GPT-4o for content generation. The structure was: Odds API → API-Football → Aggregator → GPT-4o → Google Docs. This configuration meant the Aggregator module delivered a clean object, but the LLM then operated as a black box, filling informational gaps with generated text between the input of team names and the output of a 600-word article. The system relied on the LLM to infer or create facts that were not explicitly provided.

The revised flow introduced three additional modules, none of which use AI. These modules are designed to ensure GPT-4o only processes verified facts and is structurally prevented from inventing information. The updated pipeline is: Odds API → API-Football → Aggregator → Data Validator → Structured Fact Block Builder → GPT-4o (constrained) → Output Validator → Google Docs (or Error Queue). Each new module serves a specific purpose in pre-processing, structuring, or validating data, thereby enforcing factual accuracy before and after the LLM's involvement. The goal is to eliminate the conditions under which hallucination can occur.

The Data Validator: Catching Missing Data Early

The first critical addition is the Data Validator, implemented as a Make.com router. This module addresses a common source of hallucination: null or incomplete data fields. If an API returns no head-to-head data for teams that have not played recently, and the Aggregator passes h2h: null to GPT-4o, the original prompt would still instruct the model to "cover head-to-head history." The LLM would then invent this history. The Data Validator inspects every field that the article is expected to reference.

If any required field is missing, the flow branches. Options include skipping the article entirely, fetching fallback data, or proceeding with a specific flag marking the field as "data unavailable." For instance, if the system requires the last five matches for each team and an API returns fewer, the flow does not fail. Instead, it sets a form_data_complete: false flag. A downstream prompt builder then uses this flag to constrain the LLM, preventing it from generating content about non-existent or incomplete data. This proactive validation ensures the LLM never encounters a prompt asking it to know something it cannot.

Structured Fact Block Builder and Output Validator

Following the Data Validator, the Structured Fact Block Builder module takes the now-verified and complete data. While the specific mechanics of this module are not detailed in the source, its function is implied: to format all available facts into a structured block. This block is then explicitly injected into the prompt for the GPT-4o (constrained) module. This direct injection of a 'fact block' is central to the data injection strategy, ensuring the LLM receives all necessary information and is explicitly forbidden from generating content beyond those provided facts. The constraint on GPT-4o is therefore not merely a prompt instruction, but a structural limitation enforced by the preceding modules.

The final non-AI module in the pipeline is the Output Validator. This module, also not detailed in its internal workings, serves as a gatekeeper for the LLM's output. Its purpose is to check the generated article for adherence to the provided facts and potentially other structural or factual constraints. If the output fails validation, the article is routed to an Error Queue rather than being published to Google Docs. This final check acts as a safety net, ensuring that any residual hallucinations or deviations from the fact block are caught before public dissemination. The combination of pre-processing validation and post-processing verification creates a robust defense against LLM hallucination.

What We'd Change

Seeberger's approach effectively mitigates hallucination by imposing structural constraints, but its direct applicability depends on the content domain and data availability. The method relies heavily on the ability to define and validate all necessary facts upfront. For domains where factual completeness is inherently difficult or requires subjective interpretation, the Data Validator's binary logic (present/absent) might be too rigid. For instance, an article requiring nuanced analysis or speculative commentary, rather than pure factual reporting, would struggle under such strict data injection. The form_data_complete: false flag, while useful for factual reporting, would severely limit creative or analytical depth.

Furthermore, the reliance on Make.com, while practical for a solo founder, introduces potential scalability and cost considerations for larger operations. Each additional module adds latency and execution costs. While the principle of data injection is sound, implementing it with custom microservices or a more integrated data orchestration platform could offer greater control, efficiency, and scalability for high-volume content generation. The Structured Fact Block Builder and Output Validator are described only by their function, not their implementation. Replicating their effectiveness would require careful engineering to define robust validation rules and fact-block generation logic, which could be complex for less structured data types or more intricate content requirements. The specific examples of hallucination (sports scores, player ages, match times) are highly objective and easily verifiable; applying this playbook to more abstract or qualitative content would demand a re-evaluation of what constitutes a

Pull quote: “”

Sources · how we verified

Preventing GPT hallucination in automated content pipelines: how I structure Make.com flows with data injection ↗

Every claim ties to a primary source. See our methodology.

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place. How we work · About · File a correction.

Maya

The Maya desk covers tactics: concrete playbooks, growth experiments, and operating decisions indie founders are running now. Every claim is sourced and linked. Operated by Founderr (RIKHATH LLC) See the desk →

Before and After Flow Structures

The Data Validator: Catching Missing Data Early

Structured Fact Block Builder and Output Validator

What We'd Change

Developer details Iceberg partition overwrite for atomic data corrections in pipelines

Developer traces inconsistent AI output to floating-point rounding noise

Engineer details config-driven pipeline for unifying CSVs via EAV model