When a Local Model Can Breach You, Design Is the Only Defense Left

Capable AI is the starting point. The harness is what turns it into results.

Attackers no longer need a frontier model. With the right harness, and increasingly with uncensored local models, they can get most of the practical capability offensive work requires. That changes the question defenders should be asking. It is no longer "How powerful are the latest models?" It is "How do we build systems that hold up when capable AI is already widely available?"

We have been learning this the practical way, by building harnesses for our own assessment work. Most of the attention on AI in security still goes to the model: which one is smartest, which one tops the benchmarks, which lab shipped the newest frontier system this month. That is the wrong thing to watch. The model is the engine. The thing that decides whether it does real work is the harness around it, and that is where the more important story is.

The other reason local models matter to attackers

There is a second reason local models matter here, apart from the harness. Frontier providers ship their models with safety layers: refusals, guardrails, and terms of service that someone actually enforces. Once the weights are on your own hardware, those provider-enforced safeguards largely disappear. The model still reflects whatever alignment it learned in training, but no provider is enforcing usage policies, rate limits, or account controls. Attackers can take advantage of this. They can run uncensored community builds, fine-tune open-weight models on their own data, or modify the system prompts and surrounding software to weaken or bypass behavioral constraints, and none of it needs anyone's permission. That is what makes local models attractive to an attacker before the harness even enters the picture: no provider to say no, no account to suspend, no logs sitting on someone else's server.

What a harness actually is

A large language model (LLM) does one thing: it takes some text and predicts what comes next. That is genuinely useful, but by itself a model cannot run a command, read a file, remember what it did five steps ago, or check whether its own answer was right. On its own, it is a sophisticated autocomplete.

A harness is the engineering wrapped around the model that turns that autocomplete into work. It usually includes a few things:

Tools. The model can call out to the world. Run a scanner, query a database, read source, send a request, parse the response. The model decides what to do; the harness lets it actually do it.
A control loop. Instead of one answer, the harness runs the model in a cycle: plan, act, observe the result, adjust, repeat. This is the difference between a chatbot and an agent.
Context management. Deciding what information goes in front of the model at each step, pulling in the right reference material, and keeping track of what has already happened so it does not lose the thread.
Verification. Checking the model's output against ground truth, catching mistakes, and retrying instead of trusting the first pass. Left to itself, the model has no dependable way to doubt its own output; the harness is what supplies the doubt.
Domain structure. The scaffolding, prompts, and guardrails that point a general capability at a specific job and keep it on task. Without it, a capable model will happily solve the wrong problem, skip your process, or drift past the scope it was given.

These ideas are not new. The plan, act, observe loop was formalized in research like ReAct, and work such as SWE-agent has shown that the interface between a model and its tools can matter as much as the model itself.
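
To make those five pieces concrete, here is a minimal Python sketch of a harness. It is an illustration, not the one we use. It assumes a hypothetical stub_model function standing in for a real LLM call. It also assumes two hypothetical tools that read a toy in-memory asset inventory. The loop works as follows:

- It plans, acts, and observes.
- It trims the history it shows the model.
- It refuses any tool outside an allowed scope.
- It only accepts a final answer that passes a check against ground truth.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory asset inventory standing in for a real environment.
HOSTS = {
    "web01": {"os": "linux", "role": "frontend"},
    "db01": {"os": "linux", "role": "database"},
}

# Tools: ordinary functions the harness lets the model call by name.
def list_hosts(_: str) -> str:
    return ",".join(sorted(HOSTS))

def describe_host(name: str) -> str:
    host = HOSTS.get(name)
    return f"{name}: {host}" if host else f"error: unknown host {name!r}"

TOOLS: dict[str, Callable[[str], str]] = {
    "list_hosts": list_hosts,
    "describe_host": describe_host,
}

@dataclass
class Step:
    tool: str
    arg: str
    observation: str

def stub_model(goal: str, context: list[Step]) -> tuple[str, str]:
    """Hypothetical stand-in for an LLM call: returns (tool, arg) or ("finish", answer)."""
    listed = next((s.observation for s in context if s.tool == "list_hosts"), None)
    if listed is None:
        return "list_hosts", ""
    done = {s.arg for s in context
            if s.tool == "describe_host" and not s.observation.startswith("error")}
    for host in listed.split(","):
        if host not in done:
            return "describe_host", host
    return "finish", f"described {len(done)} hosts"

def all_hosts_described(answer: str, history: list[Step]) -> bool:
    """Verification against ground truth, not against the model's own claim."""
    done = {s.arg for s in history
            if s.tool == "describe_host" and not s.observation.startswith("error")}
    return done == set(HOSTS)

def run_harness(goal, model, tools, allowed, verify, max_steps=10, window=6):
    history: list[Step] = []
    for _ in range(max_steps):
        # Context management: the model sees the goal plus only the most recent steps.
        tool, arg = model(goal, history[-window:])
        if tool == "finish":
            if verify(arg, history):
                return arg, history
            # Verification failed: record the rejection and keep looping.
            history.append(Step("verify", arg, "error: answer failed verification"))
            continue
        # Domain structure: refuse anything outside the allowed scope.
        if tool in allowed and tool in tools:
            observation = tools[tool](arg)  # act
        else:
            observation = f"error: tool {tool!r} is out of scope"
        history.append(Step(tool, arg, observation))  # observe, then plan again
    raise RuntimeError("step budget exhausted without a verified answer")

answer, steps = run_harness("inventory every host", stub_model, TOOLS,
                            allowed={"list_hosts", "describe_host"},
                            verify=all_hosts_described)
print(answer)
```

To use a real model in place of stub_model, you would also need a parser that turns the model's text into a (tool, argument) pair. The loop, the scope check, and the verification would stay the same.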

If the model is the engine, the harness is the transmission, the steering, the instruments, and the driver's hands. A powerful engine with none of that is a powerful engine sitting in a field.

Local models and frontier models, briefly

A quick definition, because the rest of this turns on it. A model's size is usually measured in parameters, the internal values it learns during training. More parameters used to track fairly closely with more raw capability. That relationship has gotten noisier as training data quality, architecture, and fine-tuning have come to matter as much as raw scale, so size alone is a weaker signal than it used to be. Either way, more parameters still cost more in hardware and compute. The frontier models are the largest systems on the market. Most labs no longer publish exact parameter counts, but the working assumption is somewhere from the hundreds of billions to over a trillion parameters. They are too big to run on your own equipment, and their weights are generally not released. So you reach them as a service over the internet, which means your data leaves your environment to be processed on someone else's infrastructure.

A local model is one small enough to run on hardware you control. The weights are downloaded and the model runs on your own machines, so your data never leaves your environment. To fit, it has fewer parameters, often in the tens of billions. That is small enough to run on a single GPU, or on a machine with enough unified memory, rather than on frontier-scale infrastructure. The trade is real: less raw horsepower in exchange for privacy, control, and the ability to run with no outside connection at all. That trade is exactly the one our work demands, and it is why the harness matters so much.
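
The hardware point comes down to simple arithmetic. Weights take roughly the parameter count times the bytes per parameter. Quantizing weights to fewer bits (4-bit is common for local use) shrinks that proportionally.

This sketch counts weights only. The KV cache, activations, and runtime overhead come on top. The 1,000-billion row is an assumption standing in for the unpublished frontier sizes mentioned above, not a published figure.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    Ignores the KV cache, activations, and runtime overhead, which add more.
    """
    return params_billions * bits_per_param / 8

# Illustrative sizes only; the 1000B row is an assumed frontier-scale size.
for params, bits in [(30, 16), (30, 4), (70, 4), (1000, 16)]:
    print(f"{params:>5}B at {bits:>2}-bit: {weight_memory_gb(params, bits):>7.1f} GB")
```

The two extremes look like this:

- A 30-billion-parameter model at 4-bit needs about 15 GB for its weights, which is within reach of a single 24 GB GPU.
- A trillion parameters at 16-bit needs about 2,000 GB, which is data-center territory.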

Why a local model can keep up with a frontier one, for scoped work

Here is the part that surprises people. Once the harness is good, the gap between a local model and a frontier model narrows dramatically for constrained, tool-driven workflows, where the harness carries the structure and the model only has to execute each step well. That qualifier matters. Hand either model an open-ended problem with no scaffolding, and the frontier model's raw reasoning starts to matter again.

A frontier model can carry a weak harness on its own strength. A local model has no margin to spare, which is why the real engineering goes into the harness.

The frontier models are the stronger reasoners. But most security work is not a single heroic act of reasoning. It is many small, well-defined steps, run in sequence, with the results of each one feeding the next. A good harness breaks the work down so that each step is something even a modest model handles reliably, then it does the part the model is bad at: remembering, checking, and staying organized across hundreds of steps. The harness supplies the discipline. The model supplies the reasoning.
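
Here is a minimal Python sketch of that division of labor, under a few assumptions:

- Each Task's run function stands in for a model call and is allowed to be wrong.
- Its check function is the harness verifying the result against ground truth before the result can feed the next step.
- The inventory text and the MINIMUM versions are hypothetical.
- The first extraction attempt deliberately drops a line to simulate a model slip.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Task:
    name: str
    run: Callable[[dict, int], Any]     # stand-in for a model call: (state, attempt) -> proposal
    check: Callable[[dict, Any], bool]  # harness-side check against ground truth

def run_pipeline(tasks: list[Task], max_attempts: int = 3):
    state: dict[str, Any] = {}
    log: list[str] = []
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            proposal = task.run(state, attempt)
            if task.check(state, proposal):
                state[task.name] = proposal  # only verified results feed later steps
                log.append(f"{task.name}: ok on attempt {attempt}")
                break
            log.append(f"{task.name}: rejected attempt {attempt}")
        else:
            raise RuntimeError(f"{task.name} failed verification {max_attempts} times")
    return state, log

# Hypothetical inputs; the minimum versions are made up for illustration.
SOURCE = "nginx 1.18.0\nopenssl 3.0.13\n"
MINIMUM = {"nginx": (1, 24, 0), "openssl": (3, 0, 13)}

def extract(state, attempt):
    lines = SOURCE.strip().splitlines()
    if attempt == 1:
        lines = lines[:1]  # simulated model slip: a line gets dropped
    return dict(line.split() for line in lines)

def extract_ok(state, result):
    return set(result) == {line.split()[0] for line in SOURCE.strip().splitlines()}

def flag_outdated(state, attempt):
    versions = state["extract"]  # verified output of the previous step
    return sorted(name for name, v in versions.items()
                  if tuple(map(int, v.split("."))) < MINIMUM[name])

def flag_ok(state, result):
    return set(result) <= set(state["extract"])

state, log = run_pipeline([Task("extract", extract, extract_ok),
                           Task("flag_outdated", flag_outdated, flag_ok)])
print(state["flag_outdated"])
print(log)
```

The retry budget and the checks are where the discipline lives. A model that is only right most of the time on any single step can still produce a reliable pipeline, provided every step can be checked cheaply.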

We do most of this with local models running on hardware we control. That choice is not about saving money. Our clients operate sensitive networks and platforms, many under restrictions that make sending their data to an outside service a non-starter. Local models are frequently the only option even on the table. A well-designed harness around a local model is already enough to do serious scoped work. Running entirely on our own hardware therefore gives our clients the privacy they require without giving up the capability the work needs.

What it does to the work

With that setup in place, we have watched these tools move through networks. They get past the obvious findings quickly and surface the architectural weaknesses that used to take several specialists from different disciplines to piece together. Work that once required deep expertise across Windows Active Directory, embedded systems, bus protocols, radio, and cloud infrastructure all at once now needs something different. It needs someone who understands the problem, knows the right questions to ask, and can steer the model in the right direction.

That is a real shift in the economics of offensive security. The time to a serious result drops, and the breadth of expertise you have to assemble in one place drops with it. We are not alone in seeing it. Published work like PentestGPT found that better task structure alone produced a large jump in automated penetration-testing performance. The UK's National Cyber Security Centre assesses that AI will almost certainly increase the volume and heighten the impact of cyberattacks over the next two years.

The reason this should worry defenders

If a local model in a good harness can do this for us, working under scope and with permission, it can do it for an adversary too. This is what people mean when they talk about machine-speed cyber operations and the leading edge of fully autonomous, Mythos-class attacks: tools that find vulnerabilities, write the exploit, and chain one flaw into the next with little human involvement. The far end of that is genuinely alarming. The quieter point is that you do not need the top end to feel the effect. The capability is already here, running on modest hardware, available to anyone willing to build the harness.

Most security programs were designed for a human-paced attacker. Perimeter defenses, monitoring, detection, and response all impose cost mainly at the moment of initial access. Once an attacker is inside, those defenses fall quiet and little stands in the way of what comes next. Against an attacker that moves at machine speed and never tires, detect-and-respond on its own is a losing position. The economics that made it work no longer hold.

A better alarm is not the answer. The answer is structure built into the architecture, so that an intrusion stays expensive at every phase, not just the first one. Separate the domains so that a foothold in one does not become the run of the whole environment. Give every element only the access and the function it actually needs. Make the boundaries explicit so they do not erode over time. These are the same kinds of design principles NIST sets out in SP 800-160. Properties like these impose cost on the attacker regardless of how skilled, well-resourced, or fast they are, because the only way through is a direct assault on each part of the design. The failure mode in most breaches is not inadequate technology. It is inadequate engineering.
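
One way to make those properties concrete is a default-deny policy. Here is a minimal Python sketch; it is not the configuration language of any specific product, and all domain names, element names, and functions are hypothetical. It works in three parts:

- Cross-domain flows must be listed explicitly.
- Each element is entitled only to the functions it needs.
- An audit function compares observed flows against the written policy, so drift shows up instead of quietly becoming the new boundary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    src_domain: str
    dst_domain: str
    function: str

# Explicit boundaries: every permitted flow between domains is written down.
# Anything not listed is denied, so a foothold in one domain does not extend to the rest.
ALLOWED_FLOWS = frozenset({
    Flow("corporate", "dmz", "https"),
    Flow("dmz", "operations", "telemetry-read"),
})

# Least privilege: each element gets only the functions its job requires.
ELEMENT_FUNCTIONS = {
    "web-frontend": frozenset({"https"}),
    "historian": frozenset({"telemetry-read"}),
}

def is_permitted(element: str, flow: Flow) -> bool:
    """Default deny: the element's entitlement and the domain boundary must both allow it."""
    entitled = flow.function in ELEMENT_FUNCTIONS.get(element, frozenset())
    return entitled and flow in ALLOWED_FLOWS

def audit(observed: list[tuple[str, Flow]]) -> list[tuple[str, Flow]]:
    """Return observed flows the written policy does not permit: boundary erosion to investigate."""
    return [(el, f) for el, f in observed if not is_permitted(el, f)]

observed = [
    ("web-frontend", Flow("corporate", "dmz", "https")),
    ("historian", Flow("dmz", "operations", "telemetry-read")),
    ("web-frontend", Flow("dmz", "operations", "ssh")),          # element not entitled
    ("web-frontend", Flow("corporate", "operations", "https")),  # boundary not listed
]
for element, flow in audit(observed):
    print("unexpected:", element, flow)
```

In this sketch, the last two observed flows are reported. The web frontend is not entitled to ssh. No corporate-to-operations flow is listed at all, so even a permitted function cannot cross that boundary.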

Where this leads

The same capability that is reshaping how we run an assessment is reshaping the threat our clients face. Both halves point to the same conclusion: systems have to be designed to hold up against attackers who now have AI on their side.

That is the subject of our Defending Complex Systems in the AI Era workshop, which is built for engineers and security teams who have to integrate AI into high-stakes systems and defend those systems against the same class of tools. If you are thinking about what machine-speed offense means for the way you build, that is where to start.

References

Yao, S. et al. "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023. arXiv:2210.03629.
Yang, J. et al. "SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering." NeurIPS 2024.
Deng, G. et al. "PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing." USENIX Security 2024.
UK National Cyber Security Centre. "The near-term impact of AI on the cyber threat." January 2024.
NIST. "SP 800-160 Vol. 1 Rev. 1: Engineering Trustworthy Secure Systems." November 2022.