<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Context on XEDCZQ Blog</title><link>https://xedczq.cn/en/tags/context/</link><description>Recent content in Context on XEDCZQ Blog</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Tue, 19 May 2026 16:35:00 +0800</lastBuildDate><atom:link href="https://xedczq.cn/en/tags/context/index.xml" rel="self" type="application/rss+xml"/><item><title>Agent_Context Engineering</title><link>https://xedczq.cn/en/post/agent_%E4%B8%8A%E4%B8%8B%E6%96%87%E5%B7%A5%E7%A8%8B/</link><pubDate>Tue, 19 May 2026 16:35:00 +0800</pubDate><guid>https://xedczq.cn/en/post/agent_%E4%B8%8A%E4%B8%8B%E6%96%87%E5%B7%A5%E7%A8%8B/</guid><description>&lt;h1 id="what-context-engineering-is"&gt;&lt;a href="#what-context-engineering-is" class="header-anchor"&gt;&lt;/a&gt;What Context Engineering Is
&lt;/h1&gt;&lt;p&gt;Context engineering can be defined as:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Injecting &amp;ldquo;just enough, highly relevant&amp;rdquo; information at every agent step while continuously managing the lifecycle of that information.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If prompt engineering focuses on &amp;ldquo;how to phrase the task,&amp;rdquo; context engineering focuses on &amp;ldquo;what information to provide, in what order, and when to prune or rebuild it.&amp;rdquo;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="phase-1-passive-truncation-and-sliding-window-20202022--every-token-counts"&gt;&lt;a href="#phase-1-passive-truncation-and-sliding-window-20202022--every-token-counts" class="header-anchor"&gt;&lt;/a&gt;Phase 1: Passive Truncation and Sliding Window (2020–2022) — &amp;ldquo;Every Token Counts&amp;rdquo;
&lt;/h2&gt;&lt;h3 id="typical-characteristics"&gt;&lt;a href="#typical-characteristics" class="header-anchor"&gt;&lt;/a&gt;Typical Characteristics
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Context windows were generally small, and tokens were highly constrained.&lt;/li&gt;
&lt;li&gt;The default strategy was &amp;ldquo;truncate when over limit.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;A common implementation was sliding window (keep only the latest N turns).&lt;/li&gt;
&lt;/ul&gt;
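&lt;p&gt;The sliding-window strategy can be sketched as follows. This is a minimal illustration, not a reference implementation: the &lt;code&gt;ChatMessage&lt;/code&gt; shape and the &lt;code&gt;slidingWindow&lt;/code&gt; helper are hypothetical names, and a &amp;ldquo;turn&amp;rdquo; is assumed to start at each user message.&lt;/p&gt;

```ts
// Hypothetical message shape for illustration; not tied to any specific SDK.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Keep all system messages plus the last `maxTurns` turns,
// where a turn is assumed to begin at a user message.
function slidingWindow(messages: ChatMessage[], maxTurns: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  if (1 > maxTurns) return system;

  const dialogue = messages.filter((m) => m.role !== 'system');

  // Walk backwards to find where the N-th most recent user turn starts.
  let turnsSeen = 0;
  let start = dialogue.length;
  for (let i = dialogue.length - 1; i >= 0; i--) {
    if (dialogue[i].role === 'user') {
      turnsSeen += 1;
      start = i;
      if (turnsSeen === maxTurns) break;
    }
  }

  // Everything before `start` is dropped, including early critical details.
  return [...system, ...dialogue.slice(start)];
}
```

&lt;p&gt;Note that anything older than the window is simply discarded, which is exactly how early critical information gets lost in long tasks.&lt;/p&gt;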
&lt;h3 id="what-it-solved"&gt;&lt;a href="#what-it-solved" class="header-anchor"&gt;&lt;/a&gt;What It Solved
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Prevented immediate failure from overlong input.&lt;/li&gt;
&lt;li&gt;Preserved recent interaction and basic multi-turn continuity.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="core-problems"&gt;&lt;a href="#core-problems" class="header-anchor"&gt;&lt;/a&gt;Core Problems
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Early critical information was often dropped.&lt;/li&gt;
&lt;li&gt;Goal drift was severe in long tasks.&lt;/li&gt;
&lt;li&gt;Historical state could not be inherited reliably.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="phase-2-external-topology-introduction-20212023--the-birth-of-an-external-brain-rag"&gt;&lt;a href="#phase-2-external-topology-introduction-20212023--the-birth-of-an-external-brain-rag" class="header-anchor"&gt;&lt;/a&gt;Phase 2: External Topology Introduction (2021–2023) — &amp;ldquo;The Birth of an External Brain (RAG)&amp;rdquo;
&lt;/h2&gt;&lt;h3 id="typical-characteristics-1"&gt;&lt;a href="#typical-characteristics-1" class="header-anchor"&gt;&lt;/a&gt;Typical Characteristics
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;The paradigm shifted from &amp;ldquo;stuff everything into context&amp;rdquo; to &amp;ldquo;retrieve on demand then inject.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Vector retrieval and semantic recall became mainstream.&lt;/li&gt;
&lt;li&gt;RAG decoupled parametric knowledge from external knowledge.&lt;/li&gt;
&lt;/ul&gt;
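&lt;p&gt;A minimal sketch of &amp;ldquo;retrieve on demand then inject&amp;rdquo; might look like the following. It assumes chunk embeddings were precomputed by some embedding model and uses plain cosine similarity; the &lt;code&gt;Chunk&lt;/code&gt; shape, &lt;code&gt;retrieveTopK&lt;/code&gt;, and &lt;code&gt;buildPrompt&lt;/code&gt; are hypothetical names for illustration, and a production system would typically use a vector database instead of a linear scan.&lt;/p&gt;

```ts
// Hypothetical chunk shape; embeddings are assumed to be precomputed elsewhere.
interface Chunk {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0; // treat missing dimensions as zero
    dot += x * y;
    normA += x * x;
  });
  b.forEach((y) => {
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Linear scan over all chunks, returning the k most similar to the query.
function retrieveTopK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(0, k))
    .map((scored) => scored.chunk);
}

// Inject only the retrieved evidence, numbered so the model can cite it.
function buildPrompt(question: string, evidence: Chunk[]): string {
  const context = evidence.map((c, i) => `[${i + 1}] ${c.text}`).join('\n');
  return `Context:\n${context}\n\nQuestion: ${question}`;
}
```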
&lt;h3 id="what-it-solved-1"&gt;&lt;a href="#what-it-solved-1" class="header-anchor"&gt;&lt;/a&gt;What It Solved
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Broke through the memory ceiling of single-window context.&lt;/li&gt;
&lt;li&gt;Reduced hallucinations by grounding responses with retrievable evidence.&lt;/li&gt;
&lt;li&gt;Enabled knowledge updates without retraining the model.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="core-problems-1"&gt;&lt;a href="#core-problems-1" class="header-anchor"&gt;&lt;/a&gt;Core Problems
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Retrieval quality remained unstable (missed recall, wrong recall).&lt;/li&gt;
&lt;li&gt;Attention dilution still occurred after retrieval chunks were merged.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Retrieved&amp;rdquo; did not necessarily mean &amp;ldquo;used correctly by the model.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="phase-3-fine-grained-compression-and-reordering-20232024--addressing-the-lost-in-the-middle-problem"&gt;&lt;a href="#phase-3-fine-grained-compression-and-reordering-20232024--addressing-the-lost-in-the-middle-problem" class="header-anchor"&gt;&lt;/a&gt;Phase 3: Fine-Grained Compression and Reordering (2023–2024) — &amp;ldquo;Addressing the Lost-in-the-Middle Problem&amp;rdquo;
&lt;/h2&gt;&lt;h3 id="typical-characteristics-2"&gt;&lt;a href="#typical-characteristics-2" class="header-anchor"&gt;&lt;/a&gt;Typical Characteristics
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;The community began to systematically focus on long-context utilization.&lt;/li&gt;
&lt;li&gt;Research and engineering attention increased around the Lost-in-the-Middle effect.&lt;/li&gt;
&lt;li&gt;Strategy evolved from &amp;ldquo;adding more context&amp;rdquo; to &amp;ldquo;compressing, reordering, and layering memory.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="common-methods"&gt;&lt;a href="#common-methods" class="header-anchor"&gt;&lt;/a&gt;Common Methods
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;History summarization (state snapshot / handoff summary)&lt;/li&gt;
&lt;li&gt;Tool-output pruning (keep recent critical rounds)&lt;/li&gt;
&lt;li&gt;Information reordering (place highest-priority evidence near strong attention zones)&lt;/li&gt;
&lt;li&gt;Task segmentation and stage-wise handoff&lt;/li&gt;
&lt;/ul&gt;
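&lt;p&gt;Information reordering can be sketched as below. The sketch assumes the &amp;ldquo;strong attention zones&amp;rdquo; are the beginning and end of the context, in line with the Lost-in-the-Middle findings, and that each item already carries a relevance score. &lt;code&gt;ScoredItem&lt;/code&gt; and &lt;code&gt;reorderForEdges&lt;/code&gt; are hypothetical names, and the alternating placement is one simple heuristic among many.&lt;/p&gt;

```ts
// Hypothetical scored evidence item; how `score` is computed is out of scope here.
interface ScoredItem {
  text: string;
  score: number;
}

// Place the highest-scoring items at the edges of the context and
// push the lowest-scoring ones toward the middle.
function reorderForEdges(items: ScoredItem[]): ScoredItem[] {
  const ranked = [...items].sort((a, b) => b.score - a.score);
  const front: ScoredItem[] = [];
  const back: ScoredItem[] = [];

  ranked.forEach((item, rank) => {
    if (rank % 2 === 0) {
      front.push(item); // ranks 0, 2, 4, ... fill from the start
    } else {
      back.unshift(item); // ranks 1, 3, 5, ... fill from the end
    }
  });

  return [...front, ...back];
}
```

&lt;p&gt;For five items scored 5 through 1, the best item lands first, the second best lands last, and the weakest sits in the middle.&lt;/p&gt;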
&lt;h3 id="what-it-solved-2"&gt;&lt;a href="#what-it-solved-2" class="header-anchor"&gt;&lt;/a&gt;What It Solved
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Reduced middle-section information neglect.&lt;/li&gt;
&lt;li&gt;Improved long-task state continuity.&lt;/li&gt;
&lt;li&gt;Made cross-window agent execution more controllable.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="core-problems-2"&gt;&lt;a href="#core-problems-2" class="header-anchor"&gt;&lt;/a&gt;Core Problems
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Compression summaries could introduce information loss.&lt;/li&gt;
&lt;li&gt;Reordering rules were task-dependent and hard to generalize.&lt;/li&gt;
&lt;li&gt;Compressed context had to be evaluated to confirm the agent could still execute the task from it.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="phase-4-ultra-long-context-and-infrastructure-caching-20242026-current--kv-cache-and-intelligent-memory"&gt;&lt;a href="#phase-4-ultra-long-context-and-infrastructure-caching-20242026-current--kv-cache-and-intelligent-memory" class="header-anchor"&gt;&lt;/a&gt;Phase 4: Ultra-Long Context and Infrastructure Caching (2024–2026, Current) — &amp;ldquo;KV Cache and Intelligent Memory&amp;rdquo;
&lt;/h2&gt;&lt;h3 id="typical-characteristics-3"&gt;&lt;a href="#typical-characteristics-3" class="header-anchor"&gt;&lt;/a&gt;Typical Characteristics
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Context windows continued to expand.&lt;/li&gt;
&lt;li&gt;Vendors and frameworks introduced stronger cache/reuse mechanisms.&lt;/li&gt;
&lt;li&gt;Agent systems moved from &amp;ldquo;context management&amp;rdquo; to &amp;ldquo;context infrastructure.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="common-capabilities"&gt;&lt;a href="#common-capabilities" class="header-anchor"&gt;&lt;/a&gt;Common Capabilities
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Prompt/prefix caching (reducing repeated token cost)&lt;/li&gt;
&lt;li&gt;Session state snapshots and resume&lt;/li&gt;
&lt;li&gt;Multi-layer memory architecture (short-term working memory + long-term external memory)&lt;/li&gt;
&lt;li&gt;Policy-based dynamic context construction&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="what-it-solved-3"&gt;&lt;a href="#what-it-solved-3" class="header-anchor"&gt;&lt;/a&gt;What It Solved
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Lowered cost and latency for long agent chains.&lt;/li&gt;
&lt;li&gt;Improved continuity in long-running tasks.&lt;/li&gt;
&lt;li&gt;Made memory management governable as an engineering subsystem.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="core-problems-3"&gt;&lt;a href="#core-problems-3" class="header-anchor"&gt;&lt;/a&gt;Core Problems
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Cost and system complexity increased.&lt;/li&gt;
&lt;li&gt;Memory contamination and stale-information governance became harder.&lt;/li&gt;
&lt;li&gt;Strong observability was required to diagnose context failure points.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="representative-industry-articles-and-references"&gt;&lt;a href="#representative-industry-articles-and-references" class="header-anchor"&gt;&lt;/a&gt;Representative Industry Articles and References
&lt;/h2&gt;&lt;p&gt;Below are high-value public references for context engineering:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Anthropic: &lt;a class="link" href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" target="_blank" rel="noopener"
 &gt;Effective context engineering for AI agents&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Clearly positions context engineering as the natural extension of prompt engineering.&lt;/li&gt;
&lt;li&gt;Emphasizes that reliability bottlenecks in agents are often in context construction, not single prompts.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Anthropic: &lt;a class="link" href="https://www.anthropic.com/research/prompting-long-context" target="_blank" rel="noopener"
 &gt;Prompt engineering for Claude&amp;rsquo;s long context window&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Early long-context practice guidance with concrete input-structuring patterns.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Anthropic Docs: &lt;a class="link" href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips" target="_blank" rel="noopener"
 &gt;Long context prompting tips&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Practical, checklist-style implementation guidance.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="4"&gt;
&lt;li&gt;LangChain Docs: &lt;a class="link" href="https://docs.langchain.com/oss/python/langchain/context-engineering" target="_blank" rel="noopener"
 &gt;Context engineering in agents&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Implementation-oriented strategies for what to inject at each agent step.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Paper: &lt;a class="link" href="https://arxiv.org/abs/2307.03172" target="_blank" rel="noopener"
 &gt;Lost in the Middle: How Language Models Use Long Contexts&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Provides systematic evidence for degraded utilization of middle context.&lt;/li&gt;
&lt;li&gt;Directly influenced later compression/reordering practices.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="6"&gt;
&lt;li&gt;Foundational RAG Paper: &lt;a class="link" href="https://arxiv.org/abs/2005.11401" target="_blank" rel="noopener"
 &gt;Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Established the mainstream retrieval+generation paradigm.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="what-problems-context-engineering-solves"&gt;&lt;a href="#what-problems-context-engineering-solves" class="header-anchor"&gt;&lt;/a&gt;What Problems Context Engineering Solves
&lt;/h2&gt;&lt;p&gt;These can be summarized as six core problem classes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Information selection&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Not all data should be provided, only the context relevant to the current step.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Memory continuity&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Keep long tasks continuous across turns, windows, and sessions.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Cost and performance&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Control token spend, latency, and throughput by reducing low-value context.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Reliability&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Reduce missed evidence, state misreads, and repeated failed attempts.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Governance&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Make context policies (compression/retrieval/reordering) configurable, measurable, and easy to iterate on.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="6"&gt;
&lt;li&gt;Toolchain coordination&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Integrate context with RAG, caching, state machines, and orchestration systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One-line summary:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Context engineering is not about whether a model can answer once; it is about whether it can keep answering correctly, consistently, and cost-effectively in complex workflows.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id="my-practical-conclusion"&gt;&lt;a href="#my-practical-conclusion" class="header-anchor"&gt;&lt;/a&gt;My Practical Conclusion
&lt;/h2&gt;&lt;p&gt;For agent projects, a pragmatic build order is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start with prompt engineering (clear task contract)&lt;/li&gt;
&lt;li&gt;Then add context engineering (information lifecycle management)&lt;/li&gt;
&lt;li&gt;Finally implement harness engineering (end-to-end execution loop)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you only do prompt engineering, long tasks remain fragile. If you skip context engineering and jump directly to harness engineering, complexity increases quickly and debugging becomes expensive.&lt;/p&gt;</description></item><item><title>Agent_Context Compression Prompt</title><link>https://xedczq.cn/en/post/agent_contextcompression/</link><pubDate>Fri, 15 May 2026 17:58:59 +0800</pubDate><guid>https://xedczq.cn/en/post/agent_contextcompression/</guid><description>&lt;h1 id="notes-on-agent-context-compression-design"&gt;&lt;a href="#notes-on-agent-context-compression-design" class="header-anchor"&gt;&lt;/a&gt;Notes on Agent Context Compression Design
&lt;/h1&gt;
 &lt;blockquote&gt;
 &lt;p&gt;Reference: &lt;a class="link" href="https://wakeup-jin.github.io/Practical-Guide-to-Context-Engineering/%E4%B8%8A%E4%B8%8B%E6%96%87%E7%AE%A1%E7%90%86/%E4%B8%8A%E4%B8%8B%E6%96%87%E5%8E%8B%E7%BC%A9%E6%8C%87%E4%BB%A4%EF%BC%9AClaudeCode%E4%B8%8EGemini%E7%9A%84%E5%8E%8B%E7%BC%A9%E6%8F%90%E7%A4%BA%E8%AF%8D%E8%A7%A3%E6%9E%90.html" target="_blank" rel="noopener"
 &gt;Context Compression Instruction: Prompt Analysis of Claude Code and Gemini&lt;/a&gt;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h2 id="what-problem-does-context-compression-solve"&gt;&lt;a href="#what-problem-does-context-compression-solve" class="header-anchor"&gt;&lt;/a&gt;What Problem Does Context Compression Solve?
&lt;/h2&gt;&lt;p&gt;An agent’s context window is not infinite. As multi-turn conversations, tool calls, file reads, error logs, and code diffs accumulate, the model gradually approaches the token limit. The goal of context compression is not simply to “make it shorter,” but to preserve task continuity while reorganizing history into a state that the next agent turn can continue from.&lt;/p&gt;
&lt;p&gt;I treat context compression as a work handoff:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep what the user is actually trying to accomplish&lt;/li&gt;
&lt;li&gt;Keep project constraints, tech stack, and key decisions&lt;/li&gt;
&lt;li&gt;Keep file states that were read, modified, or created&lt;/li&gt;
&lt;li&gt;Keep errors, fixes, and unresolved issues&lt;/li&gt;
&lt;li&gt;Drop repetitive, outdated, and noisy tool outputs&lt;/li&gt;
&lt;li&gt;Let the next context window continue execution instead of re-exploring&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A good compression system should answer three questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When to compress: scheduling strategy based on token thresholds, message length, tool output size, etc.&lt;/li&gt;
&lt;li&gt;What to compress: user messages, system constraints, tool results, file states, or plans&lt;/li&gt;
&lt;li&gt;How to compress: LLM summarization, rule-based trimming, retrieval reconstruction, or a hybrid approach&lt;/li&gt;
&lt;/ul&gt;
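&lt;p&gt;The &amp;ldquo;when to compress&amp;rdquo; question can be sketched as a small scheduling function. Everything here is an assumption for illustration: the &lt;code&gt;CompressionPolicy&lt;/code&gt; fields, the action names, and the rough four-characters-per-token estimate, which a real system would replace with the model&amp;rsquo;s own tokenizer.&lt;/p&gt;

```ts
// Hypothetical message and policy shapes for illustration only.
interface HistoryMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

interface CompressionPolicy {
  contextLimit: number;       // model context window, in tokens
  thresholdRatio: number;     // summarize once usage exceeds this fraction
  maxToolOutputChars: number; // trim any single tool output longer than this
}

type CompressionAction = 'none' | 'trim_tool_output' | 'summarize_history';

// Rough heuristic (about 4 characters per token for English text).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function decideCompression(
  history: HistoryMessage[],
  policy: CompressionPolicy
): CompressionAction {
  const usedTokens = history.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  // Close to the window limit: fall back to a full LLM summary.
  if (usedTokens > policy.contextLimit * policy.thresholdRatio) {
    return 'summarize_history';
  }

  // Otherwise, cheap rule-based trimming is enough if one tool output is oversized.
  const hasOversizedToolOutput = history.some((m) =>
    m.role === 'tool' ? m.content.length > policy.maxToolOutputChars : false
  );
  return hasOversizedToolOutput ? 'trim_tool_output' : 'none';
}
```

&lt;p&gt;Checking the cheap rule-based path separately from the expensive summarization path is a design choice that keeps LLM summarization as a last resort.&lt;/p&gt;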
&lt;h2 id="classic-approach-1-llm-summarization-compression"&gt;&lt;a href="#classic-approach-1-llm-summarization-compression" class="header-anchor"&gt;&lt;/a&gt;Classic Approach 1: LLM Summarization Compression
&lt;/h2&gt;&lt;p&gt;Claude Code and Gemini CLI share a core idea: when the context grows too long, pass the history to a model and let it output a structured summary. This summary becomes the core memory in the next context window.&lt;/p&gt;
&lt;p&gt;The advantage is strong semantic retention: goals, constraints, errors, and plans scattered across long history can be reorganized. The downside is that quality depends on prompt design. A weak prompt may lose file paths, snippets, user preferences, or unfinished tasks.&lt;/p&gt;
&lt;h3 id="claude-code-style-detailed-structured-handoff"&gt;&lt;a href="#claude-code-style-detailed-structured-handoff" class="header-anchor"&gt;&lt;/a&gt;Claude Code Style: Detailed Structured Handoff
&lt;/h3&gt;&lt;p&gt;Claude Code-style compression is closer to a full handoff document. It emphasizes chronological analysis and focuses on user requests, technical details, file changes, error handling, and next steps.&lt;/p&gt;
&lt;p&gt;Suggested fields:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Field&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Primary requests and intent&lt;/td&gt;
 &lt;td&gt;Preserve the initial user goal and later intent shifts&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Key technical concepts&lt;/td&gt;
 &lt;td&gt;Record stack, frameworks, architecture patterns, dependencies&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Files and code sections&lt;/td&gt;
 &lt;td&gt;Track read/modified/created files and key snippets&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Errors and fixes&lt;/td&gt;
 &lt;td&gt;Prevent repeating the same mistakes after compression&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Problem-solving status&lt;/td&gt;
 &lt;td&gt;Separate resolved issues from ongoing debugging&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;User messages&lt;/td&gt;
 &lt;td&gt;Preserve original feedback to reduce intent distortion&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Pending tasks&lt;/td&gt;
 &lt;td&gt;Make remaining work explicit&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Current work state&lt;/td&gt;
 &lt;td&gt;Capture what was in progress before compression&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Optional next steps&lt;/td&gt;
 &lt;td&gt;Keep only directly relevant follow-up actions&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The point is not “a pretty summary,” but “a handoff that can keep coding.” In coding-agent workflows, file paths, function names, test commands, failed logs, and user corrections are critical.&lt;/p&gt;
&lt;p&gt;Compression template:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Please compress the conversation history into a handoff summary that can continue execution.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Must keep:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;1. User’s primary goals and explicit requests
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2. Tech stack, architecture constraints, and key decisions
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;3. Files read/modified/created/deleted and why
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;4. Key code snippets, function signatures, config items
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;5. Encountered errors, failure logs, and fixes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;6. Important user feedback and preferences
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;7. Completed items, pending items, and current pause point
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;8. Next-step suggestions directly related to the current task only
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Must remove:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;1. Repetitive explanations
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2. Outdated tool outputs
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;3. Intermediate attempts that no longer help
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;4. Irrelevant small talk
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="gemini-cli-style-state-snapshot"&gt;&lt;a href="#gemini-cli-style-state-snapshot" class="header-anchor"&gt;&lt;/a&gt;Gemini CLI Style: State Snapshot
&lt;/h3&gt;&lt;p&gt;Gemini CLI-style compression is more like generating a compact &lt;code&gt;state_snapshot&lt;/code&gt;. It uses fewer fields but packs information more densely.&lt;/p&gt;
&lt;p&gt;Typical fields:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Field&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;overall_goal&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;One-line high-level user objective&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;key_knowledge&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Facts, constraints, and conventions that must be remembered&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;file_system_state&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Created/read/modified/deleted file state&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;recent_actions&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Recent key actions and outcomes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;current_plan&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Current plan and progress&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This style works well as a runtime snapshot, especially for recovery after interruption. It is shorter than the Claude-style handoff, but with fewer fields, each one must retain critical details more strictly.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-xml" data-lang="xml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;&amp;lt;state_snapshot&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;lt;overall_goal&amp;gt;&lt;/span&gt;User&amp;#39;s current high-level goal&lt;span class="nt"&gt;&amp;lt;/overall_goal&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;lt;key_knowledge&amp;gt;&lt;/span&gt;Critical facts, constraints, preferences, technical decisions&lt;span class="nt"&gt;&amp;lt;/key_knowledge&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;lt;file_system_state&amp;gt;&lt;/span&gt;File read/modify/create/delete state&lt;span class="nt"&gt;&amp;lt;/file_system_state&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;lt;recent_actions&amp;gt;&lt;/span&gt;Recent important actions and outcomes&lt;span class="nt"&gt;&amp;lt;/recent_actions&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;lt;current_plan&amp;gt;&lt;/span&gt;Current plan, completed steps, pending steps&lt;span class="nt"&gt;&amp;lt;/current_plan&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;&amp;lt;/state_snapshot&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="classic-approach-2-tool-message-trimming"&gt;&lt;a href="#classic-approach-2-tool-message-trimming" class="header-anchor"&gt;&lt;/a&gt;Classic Approach 2: Tool Message Trimming
&lt;/h2&gt;&lt;p&gt;In real agent systems, the biggest token consumer is often tool output, not user text or assistant replies. File reads, code searches, test runs, and logs can quickly inflate token usage.&lt;/p&gt;
&lt;p&gt;So tool-message trimming is highly practical:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep system messages&lt;/li&gt;
&lt;li&gt;Keep normal user and assistant messages&lt;/li&gt;
&lt;li&gt;Remove outdated tool calls and tool outputs&lt;/li&gt;
&lt;li&gt;Keep only the last N tool rounds&lt;/li&gt;
&lt;li&gt;Summarize key tool outputs before deleting raw long outputs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A common policy: identify all tool rounds, keep only the last &lt;code&gt;N&lt;/code&gt;, and remove older tool-related messages.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-ts" data-lang="ts"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kr"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;MessageRole&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;system&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;user&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;assistant&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;tool&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;role&lt;/span&gt;: &lt;span class="kt"&gt;MessageRole&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;content&lt;/span&gt;: &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;tool_calls?&lt;/span&gt;: &lt;span class="kt"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;tool_call_id?&lt;/span&gt;: &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;CompressionOptions&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;enabled&lt;/span&gt;: &lt;span class="kt"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;keepLastToolRounds&lt;/span&gt;: &lt;span class="kt"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;compressToolMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;: &lt;span class="kt"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;options&lt;/span&gt;: &lt;span class="kt"&gt;CompressionOptions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="kr"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolRounds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;identifyToolRounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="kr"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;roundsToKeep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toolRounds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;keepLastToolRounds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="kr"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;keepIndexes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;roundsToKeep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;flatMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;round&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;indexes&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;system&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;keepIndexes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="kr"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isToolRelated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;tool&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;assistant&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tool_calls&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;isToolRelated&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The key question is whether a tool output will still inform future decisions. If it has already been absorbed into a conclusion, or was only exploratory noise, remove it. If it is a fresh test result, a key error log, or important file content, keep it verbatim or replace it with a summary before trimming.&lt;/p&gt;
&lt;h2 id="classic-approach-3-middle-drop-oldest-drop-and-hybrid-strategy"&gt;&lt;a href="#classic-approach-3-middle-drop-oldest-drop-and-hybrid-strategy" class="header-anchor"&gt;&lt;/a&gt;Classic Approach 3: Middle Drop, Oldest Drop, and Hybrid Strategy
&lt;/h2&gt;&lt;p&gt;In addition to LLM summarization, rule-based algorithms can trim messages directly. They are cheaper and more predictable, but weaker at judging which content matters semantically.&lt;/p&gt;
&lt;p&gt;Three common methods:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Strategy&lt;/th&gt;
 &lt;th&gt;Method&lt;/th&gt;
 &lt;th&gt;Best for&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Middle drop&lt;/td&gt;
 &lt;td&gt;Keep head and tail, remove middle&lt;/td&gt;
 &lt;td&gt;Head has constraints, tail has current work&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Oldest drop&lt;/td&gt;
 &lt;td&gt;Remove earliest messages first&lt;/td&gt;
 &lt;td&gt;Long-running sessions where recent context matters most&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Hybrid&lt;/td&gt;
 &lt;td&gt;Choose dynamically by conversation shape&lt;/td&gt;
 &lt;td&gt;Mixed workloads and different model limits&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="middle-drop"&gt;&lt;a href="#middle-drop" class="header-anchor"&gt;&lt;/a&gt;Middle Drop
&lt;/h3&gt;&lt;p&gt;Middle drop works well when the history has this structure:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Head: system prompt, project rules, user goals
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Middle: heavy tool usage, search process, trial-and-error
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Tail: current issue, latest code, latest errors
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Advantage: keeps task framing and current working context. Risk: key decisions may be lost if the middle is removed without summarization.&lt;/p&gt;
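&lt;p&gt;A minimal TypeScript sketch of middle drop, assuming a simplified &lt;code&gt;Message&lt;/code&gt; shape and hypothetical &lt;code&gt;headCount&lt;/code&gt; and &lt;code&gt;tailCount&lt;/code&gt; parameters (none of these names come from a specific SDK). It keeps both ends and leaves a marker where the middle was removed:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-ts" data-lang="ts"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Hypothetical message shape, not taken from any specific SDK.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;type Message = { role: string; content: string };
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Keep the first headCount and last tailCount messages; drop everything between.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;function middleDrop(messages: Message[], headCount: number, tailCount: number): Message[] {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  if (messages.length &amp;lt;= headCount + tailCount) return messages;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  const head = messages.slice(0, headCount);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  const tail = messages.slice(messages.length - tailCount);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  const dropped = messages.length - headCount - tailCount;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  // Leave a marker so the model can tell that history was removed here.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  const marker: Message = { role: &amp;#39;system&amp;#39;, content: `[${dropped} earlier messages omitted]` };
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  return [...head, marker, ...tail];
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In a real agent the cut points should fall on round boundaries so that a tool call is never separated from its result, and the marker could carry a summary of the dropped span instead of a bare count.&lt;/p&gt;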
&lt;h3 id="oldest-drop"&gt;&lt;a href="#oldest-drop" class="header-anchor"&gt;&lt;/a&gt;Oldest Drop
&lt;/h3&gt;&lt;p&gt;This is a sliding-window style approach. It assumes the newest messages are most relevant.&lt;/p&gt;
&lt;p&gt;Advantage: simple and effective for continuity in long sessions. Risk: early constraints, architecture decisions, or initial goals may be dropped.&lt;/p&gt;
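&lt;p&gt;A rough sketch of oldest drop against a token budget. The &lt;code&gt;Message&lt;/code&gt; shape, the &lt;code&gt;maxTokens&lt;/code&gt; parameter, and the four-characters-per-token estimate are all assumptions for illustration; a real implementation would use the model's tokenizer. Pinning system messages is one way to soften the risk described above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-ts" data-lang="ts"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;type Message = { role: string; content: string };
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Rough token estimate (assumption: about 4 characters per token).
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;const estimateTokens = (m: Message): number =&amp;gt; Math.ceil(m.content.length / 4);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Drop the oldest non-system messages until the history fits the budget.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;function oldestDrop(messages: Message[], maxTokens: number): Message[] {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  const kept = [...messages];
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  let total = kept.reduce((sum, m) =&amp;gt; sum + estimateTokens(m), 0);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  while (total &amp;gt; maxTokens) {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    const i = kept.findIndex(m =&amp;gt; m.role !== &amp;#39;system&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    if (i === -1) break; // only system messages remain
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    total -= estimateTokens(kept[i]);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    kept.splice(i, 1);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  }
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  return kept;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This only protects constraints that live in system messages. Goals or architecture decisions stated in early user turns would still be dropped unless they are first moved into a summary.&lt;/p&gt;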
&lt;h3 id="hybrid-strategy"&gt;&lt;a href="#hybrid-strategy" class="header-anchor"&gt;&lt;/a&gt;Hybrid Strategy
&lt;/h3&gt;&lt;p&gt;Dynamic selection can draw on signals such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Compression ratio target (current tokens vs target)&lt;/li&gt;
&lt;li&gt;Total message count&lt;/li&gt;
&lt;li&gt;Share of recent-message tokens&lt;/li&gt;
&lt;li&gt;Presence of long messages&lt;/li&gt;
&lt;li&gt;Presence of system messages&lt;/li&gt;
&lt;li&gt;Tool-message density&lt;/li&gt;
&lt;li&gt;Model context window size&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A practical decision table:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Condition&lt;/th&gt;
 &lt;th&gt;Recommended strategy&lt;/th&gt;
 &lt;th&gt;Why&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Light compression + short dialogue&lt;/td&gt;
 &lt;td&gt;Middle drop&lt;/td&gt;
 &lt;td&gt;Head and tail are often most important&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Heavy compression + very long dialogue&lt;/td&gt;
 &lt;td&gt;Oldest drop&lt;/td&gt;
 &lt;td&gt;Recent context usually has higher priority&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Recent messages dominate tokens&lt;/td&gt;
 &lt;td&gt;Middle drop&lt;/td&gt;
 &lt;td&gt;Protect the current working context&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;System/tool-heavy history&lt;/td&gt;
 &lt;td&gt;Middle drop&lt;/td&gt;
 &lt;td&gt;Keep opening rules and latest state&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Uncertain&lt;/td&gt;
 &lt;td&gt;Try both and score&lt;/td&gt;
 &lt;td&gt;Data-driven selection&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
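&lt;p&gt;The decision table could be encoded roughly as follows. This is a sketch, not a tuned policy: the &lt;code&gt;HistoryStats&lt;/code&gt; fields and every numeric threshold are placeholders I chose for illustration, and the order of the checks is one reasonable reading of the table.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-ts" data-lang="ts"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;type Strategy = &amp;#39;middle-drop&amp;#39; | &amp;#39;oldest-drop&amp;#39; | &amp;#39;try-both&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Hypothetical statistics gathered before compression.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;interface HistoryStats {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  currentTokens: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  targetTokens: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  messageCount: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  recentTokenShare: number; // tokens in the last few messages / total tokens
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  toolMessageShare: number; // tool messages / all messages
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  hasSystemMessage: boolean;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;function chooseStrategy(s: HistoryStats): Strategy {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  const ratio = s.targetTokens / s.currentTokens; // near 1 = light, near 0 = heavy
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  if (s.recentTokenShare &amp;gt; 0.6) return &amp;#39;middle-drop&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  if (s.hasSystemMessage &amp;amp;&amp;amp; s.toolMessageShare &amp;gt; 0.5) return &amp;#39;middle-drop&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  if (ratio &amp;gt;= 0.7 &amp;amp;&amp;amp; s.messageCount &amp;lt;= 20) return &amp;#39;middle-drop&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  if (ratio &amp;lt; 0.4 &amp;amp;&amp;amp; s.messageCount &amp;gt; 100) return &amp;#39;oldest-drop&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  return &amp;#39;try-both&amp;#39;; // uncertain: run both and compare scores
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;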
&lt;p&gt;A simple score:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;efficiency_score = token_reduction_ratio * 0.6 + message_retention_ratio * 0.4
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If the system prioritizes staying under the token target, increase the token-reduction weight. If it prioritizes context continuity, increase the retention weight.&lt;/p&gt;
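&lt;p&gt;To make the score concrete, here is a small sketch. It assumes that &lt;code&gt;token_reduction_ratio&lt;/code&gt; means the fraction of tokens removed and &lt;code&gt;message_retention_ratio&lt;/code&gt; means the fraction of messages kept; the &lt;code&gt;CompressionResult&lt;/code&gt; type is hypothetical.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-ts" data-lang="ts"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Hypothetical before/after counts for one compression run.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;interface CompressionResult {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  tokensBefore: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  tokensAfter: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  messagesBefore: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  messagesAfter: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Weighted sum of tokens removed and messages kept; weights default to 0.6 / 0.4.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;function efficiencyScore(r: CompressionResult, tokenWeight = 0.6, retentionWeight = 0.4): number {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  const tokenReductionRatio = 1 - r.tokensAfter / r.tokensBefore;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  const messageRetentionRatio = r.messagesAfter / r.messagesBefore;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  return tokenReductionRatio * tokenWeight + messageRetentionRatio * retentionWeight;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For the uncertain case, one option is to run both strategies on the same history, score each result, and keep the higher-scoring one. Note that this score only counts messages; it cannot tell whether the retained messages are the important ones.&lt;/p&gt;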
&lt;h2 id="recommended-hybrid-compression-architecture"&gt;&lt;a href="#recommended-hybrid-compression-architecture" class="header-anchor"&gt;&lt;/a&gt;Recommended Hybrid Compression Architecture
&lt;/h2&gt;&lt;p&gt;A single method is usually not robust enough. For coding agents, I prefer a combined pipeline:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Raw history
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Token and structure statistics
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Compression threshold check
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Trim outdated tool messages
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;LLM structured summary for key history
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Generate state snapshot / handoff summary
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Rebuild next context window
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I usually preserve four layers:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Layer&lt;/th&gt;
 &lt;th&gt;Content&lt;/th&gt;
 &lt;th&gt;Storage&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Stable rules layer&lt;/td&gt;
 &lt;td&gt;System prompt, project rules, security constraints&lt;/td&gt;
 &lt;td&gt;Persistent prompt/rule files&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Working memory layer&lt;/td&gt;
 &lt;td&gt;Current goal, plan, TODOs, user preferences&lt;/td&gt;
 &lt;td&gt;Structured summary&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Evidence layer&lt;/td&gt;
 &lt;td&gt;Latest tool results, key errors, key snippets&lt;/td&gt;
 &lt;td&gt;Last N tool rounds or summarized evidence&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;External knowledge layer&lt;/td&gt;
 &lt;td&gt;Docs, codebase, history&lt;/td&gt;
 &lt;td&gt;RAG / file retrieval&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Rebuilt context layout:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;System prompt
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Project rules
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Compression preface
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Structured summary
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Recent full conversation rounds
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Recent key tool results
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Current user request
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The “recent full conversation rounds” part is important. Summaries keep the big picture, but recent raw turns often carry subtle intent, tone, corrections, and boundary conditions that a summary flattens.&lt;/p&gt;
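&lt;p&gt;A sketch of how that layout might be assembled. The &lt;code&gt;RebuildInput&lt;/code&gt; fields, the preface wording, and the choice of roles are assumptions for illustration rather than the format of any particular agent:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-ts" data-lang="ts"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;type Message = { role: string; content: string };
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Hypothetical inputs, one field per slot in the layout.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;interface RebuildInput {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  systemPrompt: string;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  projectRules: string;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  summary: string;              // structured summary from the compressor
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  recentRounds: Message[];      // last N rounds, kept verbatim
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  recentToolResults: Message[]; // key tool results worth keeping
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  userRequest: string;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;function rebuildContext(input: RebuildInput): Message[] {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  const preface = &amp;#39;Earlier history was compressed. The summary below replaces it.&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  return [
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    { role: &amp;#39;system&amp;#39;, content: input.systemPrompt },
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    { role: &amp;#39;system&amp;#39;, content: input.projectRules },
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    { role: &amp;#39;user&amp;#39;, content: `${preface}\n\n${input.summary}` },
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    ...input.recentRounds,
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    ...input.recentToolResults,
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    { role: &amp;#39;user&amp;#39;, content: input.userRequest },
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  ];
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Many chat APIs require each tool result to follow the assistant message that requested it, so in practice the kept tool results may need to stay inside their original rounds rather than being appended as a separate group.&lt;/p&gt;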
&lt;h2 id="compression-prompt-design-principles"&gt;&lt;a href="#compression-prompt-design-principles" class="header-anchor"&gt;&lt;/a&gt;Compression Prompt Design Principles
&lt;/h2&gt;&lt;p&gt;The goal is not to let the model freestyle. It is to enforce a stable handoff format.&lt;/p&gt;
&lt;p&gt;Recommended prompt constraints:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explicit role: you are a context compressor, not an executor&lt;/li&gt;
&lt;li&gt;Explicit goal: generate a state that the next agent can continue from&lt;/li&gt;
&lt;li&gt;Explicit retention: goals, constraints, files, code, errors, plan, user feedback&lt;/li&gt;
&lt;li&gt;Explicit deletion: repetition, irrelevant tool output, small talk, intermediate noise&lt;/li&gt;
&lt;li&gt;Explicit output format: Markdown, XML, JSON, or custom tags&lt;/li&gt;
&lt;li&gt;Explicit prohibition: do not fabricate file states, do not invent decisions, do not execute next steps&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Practical prompt template:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;You are the context compressor for an agent.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Please compress the conversation history into a Chinese handoff summary.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;This summary will be the primary context for continuing execution in the next context window.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Must keep:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- User goals, explicit requests, and important feedback
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Tech stack, project constraints, architecture decisions, tool preferences
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- File paths read/modified/created/deleted
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Key code snippets, function names, config items, commands
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Encountered errors, failed tests, and fixes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Completed tasks, pending tasks, and current pause point
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Next-step suggestions directly relevant to the current task
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Must remove:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Repetitive explanations
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Irrelevant small talk
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Tool output with no further value
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Intermediate attempts that do not affect final decisions
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Do not fabricate information not present in history.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Do not execute tasks. Only output the compressed summary.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="engineering-implementation-notes"&gt;&lt;a href="#engineering-implementation-notes" class="header-anchor"&gt;&lt;/a&gt;Engineering Implementation Notes
&lt;/h2&gt;&lt;h3 id="trigger-timing"&gt;&lt;a href="#trigger-timing" class="header-anchor"&gt;&lt;/a&gt;Trigger Timing
&lt;/h3&gt;&lt;p&gt;Compression can be triggered when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Token usage exceeds 70% to 85% of the model's context limit&lt;/li&gt;
&lt;li&gt;A single tool output exceeds a size threshold&lt;/li&gt;
&lt;li&gt;The number of tool-call rounds exceeds a threshold&lt;/li&gt;
&lt;li&gt;A task phase ends and a handoff is needed&lt;/li&gt;
&lt;li&gt;The user explicitly requests &lt;code&gt;/compact&lt;/code&gt; or an equivalent command&lt;/li&gt;
&lt;/ul&gt;
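&lt;p&gt;These triggers can be combined into a single check. The sketch below assumes a hypothetical &lt;code&gt;TriggerState&lt;/code&gt; record; the default thresholds are illustrative values, not recommendations from any specific tool.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-ts" data-lang="ts"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Hypothetical snapshot of the session, updated after every step.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;interface TriggerState {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  usedTokens: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  contextLimit: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  largestToolOutputTokens: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  toolRounds: number;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  phaseEnded: boolean;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  userRequestedCompact: boolean;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Returns true when any one of the trigger conditions holds.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;function shouldCompress(
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  s: TriggerState,
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  usageThreshold = 0.8,
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  maxToolOutputTokens = 8000,
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  maxToolRounds = 20,
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;): boolean {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  return (
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    s.userRequestedCompact ||
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    s.phaseEnded ||
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    s.usedTokens / s.contextLimit &amp;gt;= usageThreshold ||
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    s.largestToolOutputTokens &amp;gt; maxToolOutputTokens ||
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    s.toolRounds &amp;gt; maxToolRounds
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;  );
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Triggering below the hard limit leaves headroom for the compression call itself, which also has to fit the history it is summarizing.&lt;/p&gt;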
&lt;h3 id="compression-order"&gt;&lt;a href="#compression-order" class="header-anchor"&gt;&lt;/a&gt;Compression Order
&lt;/h3&gt;&lt;p&gt;Recommended order:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Remove obviously low-value tool output&lt;/li&gt;
&lt;li&gt;Keep the last N complete conversation rounds&lt;/li&gt;
&lt;li&gt;Generate structured summaries for older messages&lt;/li&gt;
&lt;li&gt;Rebuild context with summary + rules + recent rounds&lt;/li&gt;
&lt;li&gt;Record metrics: pre/post token count, dropped message count, kept tool rounds&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="risk-control"&gt;&lt;a href="#risk-control" class="header-anchor"&gt;&lt;/a&gt;Risk Control
&lt;/h3&gt;&lt;p&gt;The most common failure is not “insufficient compression,” but “loss of critical facts.”&lt;/p&gt;
&lt;p&gt;Especially avoid:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Losing explicit user constraints&lt;/li&gt;
&lt;li&gt;Losing file paths&lt;/li&gt;
&lt;li&gt;Losing the latest error message&lt;/li&gt;
&lt;li&gt;Losing failed attempts that should not be repeated&lt;/li&gt;
&lt;li&gt;Turning assumptions into facts&lt;/li&gt;
&lt;li&gt;Mixing completed tasks with pending tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I prefer to keep explicit state labels in summaries:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;[Done] Fixed login form validation
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;[Failed attempt] Direct schema change breaks legacy API
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;[Pending confirmation] Whether to keep legacy export format
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;[Next] Run pnpm test for auth module verification
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="my-takeaway"&gt;&lt;a href="#my-takeaway" class="header-anchor"&gt;&lt;/a&gt;My Takeaway
&lt;/h2&gt;&lt;p&gt;Context compression is fundamentally an agent memory-management and handoff system. Claude Code-style compression is better for full development-context retention. Gemini CLI-style compression is better for high-density state snapshots. Tool-message trimming is the most direct way to reduce token noise.&lt;/p&gt;
&lt;p&gt;If I were implementing a stable agent compression module, I would prioritize this combination:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Keep recent conversation rounds intact
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+ Trim outdated tool messages
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+ LLM structured summary
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+ File state snapshot
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+ Current plan and TODO list
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+ Compression metrics and observability logs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The final objective is not the shortest context. It is that after compression, the agent still knows: what the user wants, what the project is, what has been done, what has failed, where it stopped, and what should happen next.&lt;/p&gt;</description></item></channel></rss>