Why generic LLMs hallucinate tender estimates (and what to use instead)
Generic large language models like ChatGPT hallucinate on tender and estimating work because they predict plausible-sounding text, not verified facts. They have no connection to your priced history or the document in front of them, so when a number or a compliance answer is missing, they generate the most likely-looking one instead of stopping. For a bid, that is dangerous: a confident wrong price or a fabricated compliance claim costs real money. The fix is not a better prompt; it is an architecture that reads your documents, cites every value to a source, and refuses to invent numbers.
Why does ChatGPT make up numbers in a quote?
A general LLM is a text predictor, not a calculator or a database. It produces the most statistically likely continuation of your prompt, which is why its prose reads so well, and why its numbers are untrustworthy. It has no live connection to your priced history, your supplier quotes or the specification in front of it, so when a figure is missing it generates a plausible one rather than stopping. The model has no internal concept of "I don't have this price" unless the system around it is built to enforce one.
Where does this actually bite in a bid?
The damage shows up in three predictable places. First, invented unit rates that look entirely reasonable and sail through a quick read. Second, compliance statements asserted as "comply" with no clause behind them. Third, scope items that are silently dropped or merged when the model summarises a long document. Each one is a confident, fluently written error, the hardest kind to catch in the rush before a deadline, and the kind that turns into an absorbed variation or a mispriced job after award.
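The three failure modes above can each be checked mechanically before a draft leaves the building. The following is a minimal sketch, not a description of any particular product: `DraftLine`, `audit_draft` and the field names are hypothetical, and it assumes the draft has already been parsed into structured rows with item identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftLine:
    """One row of a drafted response (hypothetical structure for illustration)."""
    item_id: str
    unit_rate: Optional[float]
    rate_source: Optional[str]   # e.g. "quote_A.pdf p.2"; None if no source is known
    compliance: Optional[str]    # e.g. "comply"; None if no claim is made
    clause_ref: Optional[str]    # clause the compliance claim relies on


def audit_draft(scope_ids: list[str], draft: list[DraftLine]) -> list[tuple[str, str]]:
    """Return (item_id, problem) pairs for the three failure modes."""
    findings = []
    drafted = {line.item_id for line in draft}
    # Scope items silently dropped or merged away
    for item_id in scope_ids:
        if item_id not in drafted:
            findings.append((item_id, "scope item missing from draft"))
    for line in draft:
        # Unit rates with nothing behind them
        if line.unit_rate is not None and not line.rate_source:
            findings.append((line.item_id, "unit rate has no source"))
        # Compliance asserted without a clause
        if line.compliance and not line.clause_ref:
            findings.append((line.item_id, "compliance claim has no clause"))
    return findings


scope_ids = ["1.1", "1.2", "1.3"]
draft = [
    DraftLine("1.1", 42.0, "quote_A.pdf p.2", "comply", "cl. 4.2"),
    DraftLine("1.3", 18.5, None, "comply", None),
]
findings = audit_draft(scope_ids, draft)
for item_id, problem in findings:
    print(item_id, problem)
```

A check like this only proves that a source or clause reference is present, not that it is the right one; a person still has to open the cited page.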
Why won't a better prompt fix it?
Prompt engineering lowers the rate of obvious mistakes but cannot remove the root cause. Asking a model to "only use real figures" or "say if you're unsure" still leaves it guessing, because generating an answer and guessing are the same operation to it: there is no separate verification step happening inside the model. Reliability for estimating has to come from the system wrapped around the model, not from the wording of the request.
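To make "the system wrapped around the model" concrete, here is one deliberately crude sketch of a check that runs outside the model: every number in a generated answer must also appear in the source text, otherwise it is flagged. The function name `ungrounded_numbers` and the number pattern are assumptions for illustration; a real system would also need to handle units, rounding and values that were legitimately calculated.

```python
import re

# Matches integers and decimals, with optional thousands separators (e.g. 5,460.00)
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def normalise(token: str) -> str:
    """Drop thousands separators so '5,460' and '5460' compare equal."""
    return token.replace(",", "")


def ungrounded_numbers(answer: str, source_text: str) -> list[str]:
    """Return numbers in the answer that appear nowhere in the source text."""
    source_numbers = {normalise(n) for n in NUMBER.findall(source_text)}
    return [n for n in NUMBER.findall(answer) if normalise(n) not in source_numbers]


source_text = "Item 3: 120 m cable tray. Rate: see supplier quote."
answer = "Item 3: 120 m cable tray at 38.75 per m."
flagged = ungrounded_numbers(answer, source_text)
print(flagged)  # ['38.75']
```

The point of the design is where the check lives: the model is never asked whether it is sure, and the surrounding code decides whether a figure is allowed through. Passing this check does not prove a number is used correctly, only that it was not conjured from nowhere.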
What does a cited, deterministic assistant do instead?
A purpose-built quoting assistant uses the language model only for the genuinely hard reading (interpreting messy RFQs, scopes of work and drawings) and then does the matching, arithmetic and writing deterministically. Every value it outputs is traced to a source document and page; a missing price is never invented but flagged for a human; and an item it cannot match is surfaced for review rather than filled with a guess. Put simply: the generic model answers from memory, while the assistant answers only from your documents, and tells you where each answer came from.
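The split described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not any vendor's implementation: `PricedRecord`, `QuoteLine` and `price_lines` are hypothetical names, items are matched by exact item code (real matching is usually fuzzier), and the list of extracted items stands in for whatever the model read out of the RFQ.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class PricedRecord:
    """A rate from your own priced history, with where it came from."""
    item_code: str
    unit_rate: Decimal
    document: str
    page: int


@dataclass(frozen=True)
class QuoteLine:
    item_code: str
    quantity: Decimal
    unit_rate: Optional[Decimal]
    total: Optional[Decimal]
    source: Optional[str]
    needs_review: bool


def price_lines(extracted, history):
    """Price extracted (item_code, quantity) pairs from history only; never guess."""
    index = {record.item_code: record for record in history}
    lines = []
    for item_code, quantity in extracted:
        record = index.get(item_code)
        if record is None:
            # No match: leave the price empty and surface the line for a human
            lines.append(QuoteLine(item_code, quantity, None, None, None, True))
        else:
            # Match: plain arithmetic, with the source document and page attached
            source = f"{record.document} p.{record.page}"
            total = quantity * record.unit_rate
            lines.append(QuoteLine(item_code, quantity, record.unit_rate, total, source, False))
    return lines


history = [PricedRecord("CT-100", Decimal("12.50"), "priced_history.pdf", 4)]
extracted = [("CT-100", Decimal("40")), ("CT-999", Decimal("5"))]
lines = price_lines(extracted, history)
for line in lines:
    print(line.item_code, line.total, line.source, line.needs_review)
```

Nothing in `price_lines` calls a model, so the same inputs always produce the same priced output, and an unmatched item can only come out as an empty cell marked for review.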
How can you tell the difference when evaluating a tool?
Ask any AI quoting tool three questions. Can it show the source (document and page) behind every number it produces? Does it refuse to invent a price, or does it cheerfully fill the cell? Will it stop and ask when it cannot match a line item? If the honest answer is "it just generates a response," you are looking at a generic LLM with a new coat of paint, and it will hallucinate on your bids the same way the chatbot does.
Common questions
Can ChatGPT write a tender response?
It can draft prose, like a cover letter, but it should not be trusted to produce priced schedules, compliance statements or returnables. It has no access to your priced history and will invent figures and assert compliance it cannot support. Use it for first-draft language only, never for the numbers or claims that carry commercial risk.
Are all AI quoting tools the same as ChatGPT underneath?
No. Many wrap a general model and inherit its hallucination problem; others use the model only to read documents and keep the pricing and writing deterministic and cited. The test is whether the tool can show a source for every value and refuse to invent the rest, or whether it simply generates an answer.
Isn't hallucination being solved as models get better?
Newer models hallucinate less but still cannot guarantee a cited, verifiable number, because stating a fact and generating text are the same operation to them. For work where a wrong figure costs money, you need verification built into the system around the model, not just a more fluent model.
So what is the safe way to use AI in estimating?
Let it do the document-heavy reading (extracting clauses, matching line items, flagging gaps) with every output cited to a source and pricing left to your team. Keep the judgment and the numbers with people; use the AI to make the reading and drafting faster, not to invent answers.
Send a real tender. Get the output back.
Hand Elora Grid one real task and judge the result yourself.