<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Harness on XEDCZQ Blog</title><link>https://xedczq.cn/en/tags/harness/</link><description>Recent content in Harness on XEDCZQ Blog</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Tue, 19 May 2026 11:29:42 +0800</lastBuildDate><atom:link href="https://xedczq.cn/en/tags/harness/index.xml" rel="self" type="application/rss+xml"/><item><title>Agent_Harness Engineering</title><link>https://xedczq.cn/en/post/agent_harness%E5%B7%A5%E7%A8%8B/</link><pubDate>Tue, 19 May 2026 11:29:42 +0800</pubDate><guid>https://xedczq.cn/en/post/agent_harness%E5%B7%A5%E7%A8%8B/</guid><description>&lt;h1 id="what-harness-engineering-actually-is"&gt;&lt;a href="#what-harness-engineering-actually-is" class="header-anchor"&gt;&lt;/a&gt;What Harness Engineering Actually Is
&lt;/h1&gt;&lt;p&gt;My conclusion after reading these articles side by side:&lt;/p&gt;
&lt;p&gt;Harness Engineering is not just about writing better prompts. It is about engineering all the capabilities around the model into an iterative system, so an agent can produce stable and verifiable outcomes during long-running tasks.&lt;/p&gt;
&lt;p&gt;One-line summary:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Agent = Model + Harness
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Harness = State management + Tooling + Constraints + Feedback loops + Execution orchestration
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The model provides intelligence. The harness makes that intelligence usable, controllable, and repeatable.&lt;/p&gt;
&lt;h2 id="shared-takeaways-across-the-articles"&gt;&lt;a href="#shared-takeaways-across-the-articles" class="header-anchor"&gt;&lt;/a&gt;Shared Takeaways Across the Articles
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Theme&lt;/th&gt;
 &lt;th&gt;Common Ground&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Definition of harness&lt;/td&gt;
 &lt;td&gt;Not the model itself, but surrounding code, configuration, process, tools, and validation mechanisms&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Goal&lt;/td&gt;
 &lt;td&gt;Reduce supervision cost, improve first-pass correctness, and support long-running execution&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Core method&lt;/td&gt;
 &lt;td&gt;Turn repeated failure modes into engineered assets: rules, tools, tests, and loops&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Main long-task challenge&lt;/td&gt;
 &lt;td&gt;Limited context windows, session interruption, state drift, and premature “done” claims&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Solution direction&lt;/td&gt;
 &lt;td&gt;Incremental task decomposition, state handoff, automated checks, observability, and continuous correction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="5-core-components-my-practical-view"&gt;&lt;a href="#5-core-components-my-practical-view" class="header-anchor"&gt;&lt;/a&gt;5 Core Components (My Practical View)
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;Task scaffolding&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Clear decomposition strategy (one feature at a time)&lt;/li&gt;
&lt;li&gt;Clear Definition of Done (DoD) to avoid “looks finished” outputs&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;State and memory&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Recoverable state: progress files, commit notes, change logs&lt;/li&gt;
&lt;li&gt;Reliable handoff between sessions instead of relying on model guessing&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Tools and environment&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Fast deterministic tools for agents (tests, lint, screenshots, logs)&lt;/li&gt;
&lt;li&gt;Self-serve context access instead of manual copy/paste&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Feedback and sensors&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Computational sensors: lint/typecheck/unit/e2e (fast, deterministic)&lt;/li&gt;
&lt;li&gt;Reasoning sensors: LLM review/semantic QA (slower, costlier, but useful for semantics)&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Scheduling and governance&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;After a failure, do not just retry; improve the harness so the same failure is less likely next time&lt;/li&gt;
&lt;li&gt;Accumulate reusable rules in templates (&lt;code&gt;AGENTS.md&lt;/code&gt;, docs, checklists)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="practical-harness-workflow-for-normal-webcoding-users"&gt;&lt;a href="#practical-harness-workflow-for-normal-webcoding-users" class="header-anchor"&gt;&lt;/a&gt;Practical Harness Workflow for Normal WebCoding Users
&lt;/h2&gt;&lt;p&gt;This is my compressed version for individual developers. You do not need multi-agent orchestration to start.&lt;/p&gt;
&lt;h3 id="step-0-define-done-first"&gt;&lt;a href="#step-0-define-done-first" class="header-anchor"&gt;&lt;/a&gt;Step 0: Define “Done” First
&lt;/h3&gt;&lt;p&gt;Create a one-page &lt;code&gt;SPEC.md&lt;/code&gt; for each feature:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User scenario&lt;/li&gt;
&lt;li&gt;Input and output&lt;/li&gt;
&lt;li&gt;Acceptance criteria&lt;/li&gt;
&lt;li&gt;Failure scenarios&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Without this, agents tend to produce “confident but misaligned” output.&lt;/p&gt;
&lt;h3 id="step-1-create-minimal-harness-files"&gt;&lt;a href="#step-1-create-minimal-harness-files" class="header-anchor"&gt;&lt;/a&gt;Step 1: Create Minimal Harness Files
&lt;/h3&gt;&lt;p&gt;Create at least these four files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;: repository rules (commands, directory conventions, no-touch zones, commit style)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TASKS.md&lt;/code&gt;: feature backlog with &lt;code&gt;todo/doing/done&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PROGRESS.md&lt;/code&gt;: what was done, what is unfinished, next step&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CHECKLIST.md&lt;/code&gt;: unified acceptance checks (build, test, UI, performance, security)&lt;/li&gt;
&lt;/ul&gt;
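&lt;p&gt;A minimal sketch of how these four files could be scaffolded. The file names come from the list above; the headings written into each file and the function name &lt;code&gt;scaffold_harness&lt;/code&gt; are illustrative assumptions, not a required layout.&lt;/p&gt;

```python
from pathlib import Path

# Hypothetical starter contents. Only the four file names come from the post;
# the section headings inside each file are assumptions for illustration.
HARNESS_FILES = {
    "AGENTS.md": "# Repository rules\n\n## Commands\n\n## Directory conventions\n\n## No-touch zones\n\n## Commit style\n",
    "TASKS.md": "# Tasks\n\n## todo\n\n## doing\n\n## done\n",
    "PROGRESS.md": "# Progress\n\n## Done\n\n## Unfinished\n\n## Next step\n",
    "CHECKLIST.md": "# Acceptance checklist\n\n- [ ] Build\n- [ ] Test\n- [ ] UI\n- [ ] Performance\n- [ ] Security\n",
}


def scaffold_harness(repo_root):
    """Create any missing harness file and return the names that were created."""
    root = Path(repo_root)
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name, body in HARNESS_FILES.items():
        target = root / name
        if target.exists():
            continue  # never overwrite rules or progress that already exist
        target.write_text(body, encoding="utf-8")
        created.append(name)
    return created
```

&lt;p&gt;Running it a second time creates nothing, so it is safe to call at the start of every session.&lt;/p&gt;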
&lt;h3 id="step-2-one-feature-per-iteration"&gt;&lt;a href="#step-2-one-feature-per-iteration" class="header-anchor"&gt;&lt;/a&gt;Step 2: One Feature Per Iteration
&lt;/h3&gt;&lt;p&gt;Execution pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pick one item from &lt;code&gt;TASKS.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Give the agent a bounded task&lt;/li&gt;
&lt;li&gt;Avoid “build the entire site in one go” requests&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A bounded task keeps the agent's context focused on one change and makes any regression easier to trace back to it.&lt;/p&gt;
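&lt;p&gt;The "pick one item" step can be made mechanical. The sketch below assumes each backlog line in &lt;code&gt;TASKS.md&lt;/code&gt; looks like &lt;code&gt;- [todo] Short task title&lt;/code&gt;; the post does not specify a line format, so that format and the name &lt;code&gt;next_task&lt;/code&gt; are assumptions.&lt;/p&gt;

```python
import re

# Assumed line format (not specified in the post): "- [todo] Short task title"
TASK_LINE = re.compile(r"^- \[(todo|doing|done)\]\s+(.+)$")


def next_task(tasks_markdown):
    """Return the single task to work on, or None when the backlog is empty.

    A task already marked "doing" wins over any "todo" item, so an
    interrupted iteration is resumed before a new feature is started.
    """
    first_todo = None
    for line in tasks_markdown.splitlines():
        match = TASK_LINE.match(line.strip())
        if match is None:
            continue  # headings, blank lines, and free text are ignored
        status, title = match.groups()
        if status == "doing":
            return title
        if status == "todo" and first_todo is None:
            first_todo = title
    return first_todo
```

&lt;p&gt;Returning exactly one title is the point: the prompt for the iteration is built from that one task rather than from the whole backlog.&lt;/p&gt;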
&lt;h3 id="step-3-let-the-agent-change-then-prove"&gt;&lt;a href="#step-3-let-the-agent-change-then-prove" class="header-anchor"&gt;&lt;/a&gt;Step 3: Let the Agent Change, Then Prove
&lt;/h3&gt;&lt;p&gt;Require the agent to report the following every round:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Files changed&lt;/li&gt;
&lt;li&gt;Why each change was made&lt;/li&gt;
&lt;li&gt;Commands executed&lt;/li&gt;
&lt;li&gt;Passed/failed checks&lt;/li&gt;
&lt;li&gt;Risk and rollback points&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This converts hidden reasoning into auditable execution traces.&lt;/p&gt;
&lt;h3 id="step-4-two-layer-validation-computational-first"&gt;&lt;a href="#step-4-two-layer-validation-computational-first" class="header-anchor"&gt;&lt;/a&gt;Step 4: Two-Layer Validation (Computational First)
&lt;/h3&gt;&lt;p&gt;Run at least:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;npm run lint
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;npm run &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;npm run build
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For frontend UI changes, also add:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Key path screenshot checks&lt;/li&gt;
&lt;li&gt;Manual critical interaction checklist&lt;/li&gt;
&lt;li&gt;Responsive checks on main breakpoints&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Rule: pass deterministic checks first, then do semantic review.&lt;/p&gt;
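&lt;p&gt;One way to enforce the "deterministic first" rule is a small gate script. This is a sketch under assumptions: it runs the three npm commands shown above in order, and the names &lt;code&gt;run_checks&lt;/code&gt; and &lt;code&gt;ready_for_semantic_review&lt;/code&gt; are hypothetical. The &lt;code&gt;runner&lt;/code&gt; parameter exists only so the gate can be exercised without npm installed.&lt;/p&gt;

```python
import subprocess

# The three commands from the block above, run in this order.
CHECKS = [
    ["npm", "run", "lint"],
    ["npm", "run", "test"],
    ["npm", "run", "build"],
]


def run_command(command):
    """Run one command and return its exit code (0 means success)."""
    return subprocess.run(command, check=False).returncode


def run_checks(checks=CHECKS, runner=run_command):
    """Run checks in order and stop at the first failure.

    Returns a (passed, failed_command) tuple; failed_command is None when
    every check exits with code 0.
    """
    for command in checks:
        if runner(command) != 0:
            return False, command
    return True, None


def ready_for_semantic_review(checks=CHECKS, runner=run_command):
    """Report whether the deterministic gate is green, so review can start."""
    passed, _ = run_checks(checks, runner)
    return passed
```

&lt;p&gt;Stopping at the first failure keeps the feedback short: the agent sees one failing command to fix instead of a wall of downstream errors.&lt;/p&gt;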
&lt;h3 id="step-5-convert-every-failure-into-harness-assets"&gt;&lt;a href="#step-5-convert-every-failure-into-harness-assets" class="header-anchor"&gt;&lt;/a&gt;Step 5: Convert Every Failure into Harness Assets
&lt;/h3&gt;&lt;p&gt;When agent output fails, do not only patch the immediate bug:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If it is a rule issue, add it to &lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If it is a manual step that keeps being repeated, script it&lt;/li&gt;
&lt;li&gt;If it is quality drift, add it to &lt;code&gt;CHECKLIST.md&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Goal: prevent the same class of errors from recurring.&lt;/p&gt;
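&lt;p&gt;The three bullets above amount to a routing table from failure type to harness asset. A minimal sketch, assuming the category labels &lt;code&gt;rule&lt;/code&gt;, &lt;code&gt;repeated_step&lt;/code&gt;, and &lt;code&gt;quality_drift&lt;/code&gt; (hypothetical names introduced here) and a one-bullet-per-lesson file format:&lt;/p&gt;

```python
from pathlib import Path

# Mapping taken from the bullets above; the category names themselves are
# illustrative labels, not terms from the post.
FAILURE_TARGETS = {
    "rule": "AGENTS.md",
    "quality_drift": "CHECKLIST.md",
}


def record_failure(repo_root, category, lesson):
    """Append a lesson to the matching harness file and return what was done."""
    if category == "repeated_step":
        # Scripts are project-specific, so this sketch only flags the need.
        return "write a script for: " + lesson
    if category not in FAILURE_TARGETS:
        raise ValueError("unknown failure category: " + category)
    target = Path(repo_root) / FAILURE_TARGETS[category]
    with target.open("a", encoding="utf-8") as handle:
        handle.write("- " + lesson + "\n")
    return "appended to " + FAILURE_TARGETS[category]
```

&lt;p&gt;For example, &lt;code&gt;record_failure(".", "rule", "Never edit generated files")&lt;/code&gt; appends one bullet to &lt;code&gt;AGENTS.md&lt;/code&gt;, which the agent then reads at the start of the next session.&lt;/p&gt;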
&lt;h3 id="step-6-force-handoff-for-long-tasks"&gt;&lt;a href="#step-6-force-handoff-for-long-tasks" class="header-anchor"&gt;&lt;/a&gt;Step 6: Force Handoff for Long Tasks
&lt;/h3&gt;&lt;p&gt;If work spans more than one context window, have the agent write a handoff note containing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Current goal&lt;/li&gt;
&lt;li&gt;Completed work&lt;/li&gt;
&lt;li&gt;Remaining work&lt;/li&gt;
&lt;li&gt;Blockers&lt;/li&gt;
&lt;li&gt;First step for next round&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Store it in &lt;code&gt;PROGRESS.md&lt;/code&gt; or planning files, not only in chat history.&lt;/p&gt;
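&lt;p&gt;A fixed structure makes it harder for a handoff to silently omit a field. The sketch below models the five items above as a dataclass and renders them as Markdown; the class name &lt;code&gt;Handoff&lt;/code&gt;, the field names, and the heading layout are assumptions for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """The five handoff fields listed above."""
    current_goal: str
    completed: list = field(default_factory=list)
    remaining: list = field(default_factory=list)
    blockers: list = field(default_factory=list)
    first_step: str = ""


def _section(title, items):
    """Render one heading plus bullets; an empty list is shown as 'none'."""
    lines = ["### " + title]
    lines.extend("- " + item for item in (items or ["none"]))
    return lines


def render_handoff(handoff):
    """Return the handoff as Markdown text ready to append to PROGRESS.md."""
    lines = ["## Handoff", "", "### Current goal", handoff.current_goal]
    lines += _section("Completed work", handoff.completed)
    lines += _section("Remaining work", handoff.remaining)
    lines += _section("Blockers", handoff.blockers)
    lines += ["### First step for next round", handoff.first_step]
    return "\n".join(lines) + "\n"
```

&lt;p&gt;Writing "none" for an empty section is deliberate: the next session can tell "no blockers" apart from "blockers were never recorded".&lt;/p&gt;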
&lt;h3 id="step-7-run-a-release-grade-loop-before-merge"&gt;&lt;a href="#step-7-run-a-release-grade-loop-before-merge" class="header-anchor"&gt;&lt;/a&gt;Step 7: Run a Release-Grade Loop Before Merge
&lt;/h3&gt;&lt;p&gt;Before merge, run one unified cycle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Regression checks&lt;/li&gt;
&lt;li&gt;Critical user-path smoke tests&lt;/li&gt;
&lt;li&gt;Quick performance and error-log scan&lt;/li&gt;
&lt;li&gt;Agent self-review plus human spot-check&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This prevents “local pass, system-level failure.”&lt;/p&gt;
&lt;h3 id="step-8-weekly-harness-cleanup"&gt;&lt;a href="#step-8-weekly-harness-cleanup" class="header-anchor"&gt;&lt;/a&gt;Step 8: Weekly Harness Cleanup
&lt;/h3&gt;&lt;p&gt;Weekly maintenance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Remove stale rules&lt;/li&gt;
&lt;li&gt;Fix broken scripts&lt;/li&gt;
&lt;li&gt;Merge duplicate constraints&lt;/li&gt;
&lt;li&gt;Refresh docs index&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The harness is also code. Without maintenance, it decays.&lt;/p&gt;
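&lt;p&gt;Part of this cleanup can be automated. As one example, the sketch below flags rules that appear more than once in &lt;code&gt;AGENTS.md&lt;/code&gt;, assuming rules are written as &lt;code&gt;- &lt;/code&gt; bullets; it only catches literal repeats (ignoring case and spacing), so semantically overlapping rules still need a human read. The function name is hypothetical.&lt;/p&gt;

```python
def find_duplicate_rules(agents_markdown):
    """Return bullet rules that appear more than once, ignoring case and spacing."""
    seen = set()
    duplicates = []
    for line in agents_markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- "):
            continue  # only bullet lines are treated as rules
        # Normalize: drop the bullet marker, lowercase, collapse whitespace.
        key = " ".join(stripped[2:].lower().split())
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```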
&lt;h2 id="minimum-viable-harness-mvp-for-individuals"&gt;&lt;a href="#minimum-viable-harness-mvp-for-individuals" class="header-anchor"&gt;&lt;/a&gt;Minimum Viable Harness (MVP) for Individuals
&lt;/h2&gt;&lt;p&gt;If you want the fastest starting point, do this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Write 20-50 lines of hard rules in &lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Ask the agent to do only one feature per iteration&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;lint/test/build&lt;/code&gt; every round&lt;/li&gt;
&lt;li&gt;Update &lt;code&gt;PROGRESS.md&lt;/code&gt; each round&lt;/li&gt;
&lt;li&gt;Convert repeated failures into rules or scripts&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These five actions are usually enough to move from “using agents by feel” to “compounding engineering productivity.”&lt;/p&gt;
&lt;h2 id="my-practical-conclusion"&gt;&lt;a href="#my-practical-conclusion" class="header-anchor"&gt;&lt;/a&gt;My Practical Conclusion
&lt;/h2&gt;&lt;p&gt;Harness Engineering answers one core question:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;When an agent fails, do you keep supervising it, or do you convert that failure into a system capability?&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;The first consumes human time. The second compounds.&lt;/p&gt;
&lt;p&gt;For normal webcoding users, the key is not the fanciest model, but:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Do you have executable rules?&lt;/li&gt;
&lt;li&gt;Do you have automated feedback?&lt;/li&gt;
&lt;li&gt;Do you convert failures into deterministic advantages for the next run?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is the real value of harness engineering.&lt;/p&gt;
&lt;h2 id="references"&gt;&lt;a href="#references" class="header-anchor"&gt;&lt;/a&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;OpenAI: &lt;a class="link" href="https://openai.com/index/harness-engineering/" target="_blank" rel="noopener"
 &gt;Harness engineering: leveraging Codex in an agent-first world&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Anthropic: &lt;a class="link" href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents" target="_blank" rel="noopener"
 &gt;Effective harnesses for long-running agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Anthropic: &lt;a class="link" href="https://www.anthropic.com/engineering/harness-design-long-running-apps" target="_blank" rel="noopener"
 &gt;Harness design for long-running application development&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LangChain: &lt;a class="link" href="https://www.langchain.com/blog/the-anatomy-of-an-agent-harness" target="_blank" rel="noopener"
 &gt;The Anatomy of an Agent Harness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Mitchell Hashimoto: &lt;a class="link" href="https://mitchellh.com/writing/my-ai-adoption-journey" target="_blank" rel="noopener"
 &gt;My AI Adoption Journey&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Martin Fowler: &lt;a class="link" href="https://martinfowler.com/articles/exploring-gen-ai/harness-engineering-memo.html" target="_blank" rel="noopener"
 &gt;Harness Engineering - first thoughts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Martin Fowler: &lt;a class="link" href="https://martinfowler.com/articles/harness-engineering.html" target="_blank" rel="noopener"
 &gt;Harness engineering for coding agent users&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>