Zerostack – A Unix-inspired coding agent written in pure Rust

(crates.io)

547 points | by gidellav 1 day ago

70 comments

parhamn 1 day ago
I (somewhat jokingly) wrote one recently too... https://github.com/pnegahdar/nano in under 200 lines. Repl, sessions, non-interactive, approvals, etc
The smarter the models get the less the harnesses matter (outside of devx).
Maybe one day I'll run it through swebech.
[-]
- freakynit 1 day ago
  So freaking cool..in just 200 (190 actually) lines.
  I also wrote one by myself last week (just for fun and learning). It works, including integration with configured mcpServers (like you do in most coding agents). Wrote about the whole step-by-step process and what is needed at what step and why: https://nb1t.sh/building-a-real-agent-step-by-step/
- tasuki 17 hours ago
  Ok, I know it's a joke. And also, are you daily-driving it?
  [-]
  - parhamn 13 hours ago
    Not daily driver, but have used it as a utility a few times.
    For my daily work I like letting different harnesses compete and look over each others work (while subsidized with the subscriptions) so I use OpenADE.
- mgfist 1 day ago
  I like it
rullopat 22 hours ago
I understand the need for memory footprint in some situations, but what's the point of seeking performance for a software that mostly calls LLMs and waits?
[-]
- tjoff 22 hours ago
  Before I tried coding agents my guess would have been: none.
  But seeing how slow claude code and copilot cli are and how much ram they use I'm flabbergasted. If you have long running sessions they can both take tens pf gigabytes of ram and feel quite sluggish.
  [-]
  - i_am_a_peasant 21 hours ago
    huh. my evidence with codex hasn’t been so bad. and tbh why would i discourage anyone from coding. hack away mr hacker. your solution will either sink or swim
    [-]
    - krzyk 20 hours ago
      codex is in rust and not in power and memory hungry js/ts.
      [-]
      - i_am_a_peasant 19 hours ago
        oh sweet I had no idea. funny that i mostly use it to write rust
        [-]
        dorian-graph 14 hours ago
        It was previously JS/TS, but they rewrote it in Rust, sometime in the past 12 months.
        manmal 14 hours ago
        Check out its app-server, IMO it’s a decent foundation to the codex clients.
  - crabmusket 19 hours ago
    I've been playing with running Claude Code inside a Vagrant VM. I can't be certain it was getting OOM killed when I allowed the VM 4GB of RAM, but when I went to 16 it did seem to be more stable...
    [-]
    - yjftsjthsd-h 18 hours ago
      > I can't be certain it was getting OOM killed when I allowed the VM 4GB of RAM
      Of it's actually getting OOMed (and not backing off by itself), I'm pretty sure that's logged in dmesg. Or earlyoom or systemd-oomd if userspace is in play and getting there first.
      [-]
      - crabmusket 10 hours ago
        Thanks for the tip, I will probably try shrinking it back to 4 to see, as that seems like it should be enough RAM for anybody (:
  - Mjarvis 20 hours ago
    Yes...exactly. Its frustrating and inefficient.
  - mpalmer 15 hours ago
    The appetite for Rust is the appetite for higher guardrails. Automatic memory management in safe Rust makes it less likely your app bloats even as its source balloons.
    The people "writing" agents are not themselves experts in how to write performant code. Claude Code is so massive and ugly it can only be realistically maintained by continuing to throw LLMs at it. But that's not a replacement for good software design.
  - adabsurdo 20 hours ago
    [dead]
- mapcars 22 hours ago
  I see spreading Rust as an overall good thing, because it changes benchmark on how software should feel in terms of performance, stability, memory footprint.
  So even if it doesn't create tangible advantage in a particular use case - its still good for the whole industry.
  [-]
  - GodelNumbering 21 hours ago
    I haven't used Rust extensively but my feeling is, if you change the design (which inevitably happens in many early stage projects), the refactoring takes more time due to borrow-checker semantics. Although I am far from a representative sample and could well have been using it wrong
    [-]
    - ijustlovemath 20 hours ago
      When you write Rust long enough you settle on certain architectures (message passing, event loops) that go well with the borrow checker, and don't end up thinking about it too much. Plus you can always throw an agent at the first set of errors from the refactor and let the compiler guide the annoying parts.
      [-]
      - bheadmaster 19 hours ago
        > When you write Rust long enough you settle on certain architectures (message passing, event loops) that go well with the borrow checker
        So basically Go?
        [-]
        flossly 19 hours ago
        Go only provides one concurrency paradigm. Rust support many (if not all).
        The type system of Go is very weak. I'd say that'd be my main reason to pass on Go, even when the concurrency paradigm fits the project perfectly.
        [-]
        jen20 16 hours ago
        The biggest reason to pass on Go right now (if your software can tolerate a runtime) is the lack of algebraic data types when doing interesting domain modeling. It makes such a huge difference it’s worth tolerating the pain points of Rust (or Swift, or F#) just to have them.
        ijustlovemath 16 hours ago
        Traits, Enums, and Typestate allow much richer paradigms at much lower cost
    - eldenring 21 hours ago
      Its just not a thing to consider and doesn't happen often.
  - amelius 21 hours ago
    No because it means people will use Rust for the wrong reasons.
    Systems programming is only a tiny fraction of code out there.
    Approaching every problem as a systems programming problem is a massive waste of resources and intellect.
    [-]
    - angusturner 21 hours ago
      For small to medium projects, an LLM can write functional (if not well crafted) Rust.
      Considering how easy this is now, why choose a heavier, slower and less typesafe language?
      [-]
      - zingar 45 minutes ago
        Edit: I lost the context that this is about building devtools where you can’t just throw more hardware at the problem. But perhaps my answer still explains the reality: anthropic builds Claude with Claude so Claude needs to be easy to build with Claude.
        Easier to read for humans is easy to read for LLMs. A more expressive language will bring about fewer misunderstandings when you apply stochastic tools like LLMs.
        Just be sure you don’t choose something heavier/slower that is not more expressive.
      - amelius 18 hours ago
        Ok, so write your app in the garbage collected language, and then tell the LLM to translate it to Rust :)
      - Wowfunhappy 17 hours ago
        I find it kind of shocking that Anthropic doesn't see it this way.
        [-]
        pojzon 13 hours ago
        Claude Code has whole game engine built into it. God knows why.
        [-]
        attentive 7 hours ago
        Tell us more.
        [-]
        pojzon 7 minutes ago
        [dead]
      - singpolyma3 20 hours ago
        Could choose a similar weight, similar speed, equal or more typesafe language though :)
        [-]
        galangalalgol 17 hours ago
        Ada? Other than c and c++ everything else benchmarks 2-4 times slower than rust for compute bound tasks, even after jit warmup. I'm up for ada though, especially with an llm where I don't have to type all that verbose syntax.
        [-]
        singpolyma3 17 hours ago
        OCaml? Haskell? Idris?
        Lots of options with no jit or warmup
        [-]
        galangalalgol 17 hours ago
        I'm not against jit or warmup, just saying it doesn't actually catch up for compute bound tasks in my experience. Haskell and ocaml would definitely be next on my list, but they do take a very good hit in performance over ada or rust. I wouldn't say they were similar in performance, certainly. There is a pretty big cliff between the systems languages and everything else performance-wise. For a lot of things it doesn't matter I know, but none of those things are domains I've ever worked in. I've never had a project in my professional career where we didn't descope requirements to fit the available compute.
    - tcfhgj 21 hours ago
      it saves a lot of resources - for instance my devices would probably use less than half of the memory it uses now and I wouldn't hear the fan.
      [-]
      - amelius 18 hours ago
        You won't hear the fan because you're still building it.
        The resources I was talking about are developers × time.
        [-]
        tcfhgj 18 hours ago
        I am talking about using software - if software is used by many people, that's the more relevant resource usage.
        [-]
        lobocinza 11 hours ago
        It is a common trend for companies to optimize for visible CapEx at the cost of increased but invisible OpEx for consumers.
  - gf000 21 hours ago
    How is it any faster than something written in say, Java?
    [-]
    - tcfhgj 21 hours ago
      latency and throughput (when with Java the system is crying for more memory while it's chilling in the Rust case)
      [-]
      - gf000 20 hours ago
        What's the latency difference between a long running process issuing a network call in Java vs rust? This is such a short time that it is completely overshadowed by noise (OS doing something else, what other software is running etc)
        As for throughput: you have 1-2 requests going at a time, the next one waiting for the reply. What throughput are we talking about?
        That's like speeding to the post office and expecting your letter to get to the recipient faster.
        [-]
        tcfhgj 20 hours ago
        you seem to specifically aim at the current example, but mine wasn't
        Anyways, consider how higher memory usage can affect the systems performance dramatically once the system needs to start swapping memory to disk signficantly
        [-]
        hnlmorg 19 hours ago
        If you cannot write a simple Java agent without consuming so much RAM that your system is swapping then that really says more about the developer than anything.
        Java is used in plenty of embedded systems and other memory constrained environments. Yes, it’s not going to perform well compared with Rust, but that doesn’t mean it’s an Electron-equivalent bloated clusterfuck of an ecosystem that’s going to eat all your system resources.
        [-]
        tcfhgj 18 hours ago
        > so much
        1) the agent is probably not the only thing running on the system, so more is just worse generally
        2) I am fine if a developer needs Rust or similar to write a resource efficient app. I wonder what the developer could achieve when he put the optimization effort into the Rust app instead.
        [-]
        hnlmorg 17 hours ago
        My point is that Java isn’t going to be the application that sends your machine into swap hell.
        People are so narrow minded about programming on this forum. They talk as if only Rust fills the void between unsafe C and node.js behemoths. But the reality is there are a plethora of other good languages out there too.
        gf000 17 hours ago
        Of course, what would be a point of talking about an overly specific statement that has no relevance here?
        mejutoco 13 hours ago
        > That's like speeding to the post office and expecting your letter to get to the recipient faster.
        I mean, the post office is not a magic box. Actual people will take your letter somewhere, sometimes batching sends. So running to the post office might actually get your letter in an earlier batch, same as ordering on amazon or your online supermarket in the morning or in the evening might change the delivery time.
        Pedantic, I know, but interesting example.
      - ink-splatters 20 hours ago
        You can tune java runtime in many ways, achieving impressive throughput/latency for your type of workload.
        Next to none of them will get you nearly as good cold start times as of native app, if using free java.
        There was GraalVM and its ecosystem which included Java Native Image - first thing I’d evaluate if thought about non-server side, performant Java application.
        But it all had been sadly swept away by Oracle from free tier.
        [-]
        flossly 19 hours ago
        I use GraalVM and Native Image now and while the project --a small CLI tool-- is tiny (2kLOC with mainly AWS-SDK deps) the compile times are huge (~3 minutes), the OS-dependencies many (so much I use a build container to ease the burden of installing all) and the resulting binary is huge (~60MB).
        But then it distributes as one binary and starts in milliseconds.
        Rust would have been a better fit (cargo-and-done, smaller binary, quicker to compile); but I wanted to use Kotlin as we use in all other projects.
        gf000 17 hours ago
        It hasn't been swept away by Oracle, far from it. It's development is just no longer coupled to the OpenJDK release cycle, which benefits both projects.
- tornikeo 22 hours ago
  Simplest explanation I could come up with: Just for hype and fun.
  Rewriting things in rust is "cool". Bun did it, other projects did it. Therefore, writing a coding agent in one should be cool too.
  And apparently enough HN crowd agrees with it to take the #1 spot on the board.
  [-]
  - GodelNumbering 21 hours ago
    For the most part, doing things right in the given language matters more than change of language. A lot of refactors in Rust (in the coding agent space) I see jump straight to Rust without considering what inefficiencies can be addressed before changing the language.
    Having said that, I considered a Go/Rust rewrite of Dirac (https://github.com/dirac-run/dirac) for some modules to support cases when someone wants to run like 30 agents, but it quickly became obvious that, a) while the node event loop is a bottleneck, it is not the sole bottleneck and b) if you have a VSCode extension, you can't totally get rid of TypeScript, so it just becomes the case of bi-lingual project and the maintenance burden that comes with it
  - flossly 19 hours ago
    Rust is just another language. Sure it's cooler than some langs, to some ppl. Sure.
    The author made the choice. Open sourced it (thanks!). So now we all enjoy more options. Saying author did so because "cool" does not sit well with me. It's feels like you get a no-strings attached gift of significant value and then going saying the giver gave it to be seen as cool.
- joelthelion 22 hours ago
  Opencode can be surprisingly hard on the CPU (could be an issue when coding on battery or a weak remote VM), and uses a lot of RAM. A little competition is always welcome.
- wint3rmute 22 hours ago
  Even a simple coding agent TUI should work instantenously, which I sadly cannot say is true about typescript-based applications like Claude Code or Gemini.
  After switching away from GNOME Terminal + Zsh to Ghostty + Nushell, I started to appreciate how instant everything feels. Why not make everything just as fast?
  [-]
  - itsdavesanders 21 hours ago
    I have to say this is one of my favorite things about local Qwen and Qwen code, it seems a heck of a lot faster that Claude and feels better to work with.
    Problem is it is nowhere near as smart, so what speed I get in conversation gets killed by iteration.
- jwxz 21 hours ago
  I didn't see anyone mention this, but I think having a single binary is much nicer than having a JS (or Python) program sprawled all over your system.
  [-]
  - ink-splatters 20 hours ago
    Having single binary output is completely different problem and is solved for both Python and typescript (bun supports the later).
    [-]
    - jwxz 1 hour ago
      That's true, but it's not quite the same thing. The single binary you're referring to is the interpreter and source code packaged together (at least for TS/JS).
      If you install too many of these "single binaries" then at some point you would be better off just having a single interpreter and using npm/pip.
      By contrast the Rust binary only contains the machine code for this program and can be directly executed.
    - crabmusket 19 hours ago
      Node and Deno can also bundle apps into a single executable.
- flossly 19 hours ago
  Over time software grows. Once big rewriting it in another language is hard and gets harder as the project grows in size.
  Starting with a resource-saving attitude may be a very good long term strategy.
  Also: with Rust there are many features of high-level, modern, type-safe, FP-inspired languages that you do not have to miss.
  [-]
  - amelius 16 hours ago
    Most FP languages cannot work without GC unless you're willing to give up idiomatic FP programming. There is a reason Haskell has a garbage collector.
    [-]
    - flossly 10 hours ago
      Hence I used FP-inspired (to point at languages like Rust, Kotlin, Ruby, Swift)
- rbalicki 15 hours ago
  That's exactly the tradeoff I made with Barnum (https://barnum-circus.github.io/). It's just not important to optimize the performance of the rust side for the reason you stated. So instead, all focus goes into making it easy for an LLM to build a reliable pipeline (from which LLMs are invoked).
- throwa356262 22 hours ago
  While we are not there yet, people are looking into running agents in esp32 and alike.
  See projects such as picoclaw, nullclaw and more.
  https://github.com/sipeed/picoclaw
  https://github.com/nullclaw/nullclaw
- krzyk 20 hours ago
  e.g. opencode right now uses ~80% of my CPU.
  At first I also thought that it would be just call and wait, but a lot of work is done locally (any tool calls).
  [-]
  - tacone 17 hours ago
    It's also dealing with memory issues (see: Memory Megathread https://github.com/anomalyco/opencode/issues/20695).
    And in my experience is not that much faster to start than more complex software like Visual Studio Code.
- faangguyindia 18 hours ago
  If you write in Go, you get faster compile time, more likely your code will compile fine after long time.
- tcfhgj 21 hours ago
  - Reduce the footprint on the planet
  - prolonged life of hardware
  - less electricity
  - less expensive hardware
  [-]
  - sdevonoes 21 hours ago
    Compared to what LLMs actually consume, your agent makes zero difference
    [-]
    - krzyk 20 hours ago
      Why would anyone compare a cloud LLMs power usage when one doesn't pay for it? Local power consumption is important for those.
      [-]
      - afavour 20 hours ago
        OP specifically cited “reduce the footprint on the planet”
    - tcfhgj 21 hours ago
      very wrong - especially on the local machine, see https://news.ycombinator.com/item?id=48164613
- iddan 21 hours ago
  Running many of those in scale.
- phplovesong 21 hours ago
  I recall back in the mid 2000s when i saw many "rewrite in rails" apps. Its just hype, and it will die out in a few years when something new comes out.
- cpa 21 hours ago
  [dead]
frio 1 day ago
Thanks, I've been tooling away in my spare time on my own version of this -- both to get a deeper understanding of agents (everyone suggests writing your own) and to help learn Rust. I'd like to retain `pi`'s configurability though, the ability to self-mutate and generate new tools is incredibly useful, particularly because I don't think any of these things should have access to arbitrary code execution through `bash` (of course, if they have access to, say, `edit` and `cargo run` they still have arbitrary code exec, but...) (so I tend to generate tools on the fly when I encounter something the no-bash agent needs to do).
[-]
- gidellav 1 day ago
  I actually though about this issue, but while Pi can have this script-like environment thanks to the fact that it's based on an interpreted language (TypeScript), Rust has its own limitation as a compiled language.
  I decided to allow for customization in a different way:
  1. The prompt library (~/.config/hypernova/prompts/) acts as a simpler alternative to Skills, with the built-in prompts that should replace superpowers + Claude's frontend-design
  2. Compile-time features; things that might make the agent more bloated can be disabled when you decide to compile zerostack
  3. Clean code; code that's short and easy to read, you can just throw zerostack on its own source code in order to build a custom fork if your necessity can't be satisfied. Good features could also be adopted by the main version.
  4. Permission mode; as you can see in the README, there was lots of concern around the permission model, and I landed on a 4-mode system that goes from "Restrictive" (no commands) to "YOLO" (whatever the agent wants to do" + custom regex patterns for allow/ask/deny permission on 'bash' calls. In your case, you just need to run `zerostack -R` to force all tools to ask for permission.
  (Also, there is a work-in-progress features for programmable agents, but that's yet to be announced)
  [-]
  - aerzen 1 day ago
    Ok, what about having tools be discoverable from the environment, similar to how $PATH works in POSIX?
    There could be an env var $AGENT_TOOLS, a string of paths delimited by `:` and tools would be discovered as some specific format of file. Maybe a JSON that contains tool name, list of parameters and the command to run it.
    This is essentially decoupling tools from the agent, allowing more customization and per-project environments. It does require shipping and installing more binaries, one for each tool probably.
    [-]
    - threecheese 16 hours ago
      The Hermes agent (Python) follows something similar; it defines a HOME dir and enumerates plugins and memory extensions present there.
      https://github.com/nousresearch/hermes-agent
      Functionally, it fits more in the openclaw space than pi-agent.
    - zrg 20 hours ago
      This is one of the approaches im considering for my own, Roder.
      The approach mostly being communicating over json rpc which has become the standard for MCP so it makes it more approachable to agent developers.
      Obviously its very much NOT mcp, its a low level events based rpc system for registering capabilities and extending low level primitives of the agnet itself not the model
    - gidellav 1 day ago
      I understand the concept, but I don't get what's the advantage over adding in the prompt instructions to use a specific bash command for a specific task, acting as a "custom tool".
      [-]
      - frio 10 hours ago
        The harness clamps what the agent can do. `bash` allows full code execution; a dedicated `mvn` tool might only allow `mvn compile` but not `mvn spring-boot:run`. You could probably implement this with an `allow` list attached to your `bash` tool, but by doing it this way, you can enhance the outputs or perform mandatory checks too.
        For instance, Claude likes to run little Python scripts; reviewing them is tedious. Removing `bash` and adding a `python` tool would allow the harness to pre-review and grep for common harmful patterns, or run the `python` script in a `krunvm` or `muvm` to isolate it, etc. This review/isolation would be handled programatically as it's part of the harness; leaving the agent to choose what to do as a skill means the agent can conveniently forget to enforce its own checks.
      - aerzen 21 hours ago
        Good point. There might be a small advantage if one does not want to give bash access. But general answer to "how do add custom tools like we can in pi" is "you don't". Keep it simple.
  - frio 1 day ago
    I've been trying to use `Deno` underneath `Rust` so that the tools can still be written in Typescript and thus self-mutated without the compilation step (but I can still try to do clever things with V8 Isolates or similar). It's been an ugly experiment so far; I'm vaguely thinking a simpler model would be to just define a binary "API" and run tools by exec-ing binaries.
    [-]
    - gidellav 1 day ago
      I have to be honest and tell you that try to load such an heavy runtime as a scripting layer is not a great idea; at the same time I can tell you that I am working on another Rust project where I also needed scripting, and after three attempts I landed on rhai (https://rhai.rs/) (https://rhai.rs/book).
      You might find it nice for pretty much all use cases except for high-performance scripting (so, if you are not try to build the entire logic entirely in rhai, you are going to be fine).
      [-]
      - frio 1 day ago
        Yeah, it's been a bit of a dead end. I didn't want the heavy runtime but felt it was worth disproving after experimenting rather than ruling out off the bat. Even before getting it running, the dependency list alone was pretty discouraging, especially given the storm of supply chain attacks these days.
        Rhai looks nice, I'll take a look, thanks! And good luck with Zerostack.
        [-]
        aschar 1 day ago
        [dead]
      - slopinthebag 1 day ago
        I was just going to suggest rhai. It's simple enough LLMs can easily write it with a little context, and you control the entire API so you can sandbox effectively without needing to resort to hacks with a JS interpreter etc.
    - slowhorse 1 day ago
      I agree v8 and Deno seems very heavy handed and complex to integrate for scripting capabilities.
      Have you considered Lua? It is tailor made for use cases like this. Creating an embedded host in Rust is trivial, the work lies in creating built-in functions for the script runtime so that the user scripts can do useful things to the environment.
    - BillStrong 1 day ago
      Have you thought about Zig? If you limit it to CompTime, isn't that just a scripting language that happens to be compiled to binary?
      [-]
      - brabel 21 hours ago
        That’s not how it works. Comptime Zig is Zig, not an embedded scripting language. You can’t run comptime code separately, it only runs as part of compiling a Zig program. Think of it like Rust macros.
      - frio 1 day ago
        Possibly, I'm not really interested in learning Zig though (or learning to embed it in Rust). I'm sure that'd be a cool project for someone else to try :).
    - jswny 1 day ago
      Why not WASM?
      [-]
      - frio 1 day ago
        Unfamiliarity and I believe it requires a compile step. I’m at least familiar with Typescript and Deno so being able to embed them was an appealing idea :)
  - kristjansson 1 day ago
    > simpler alternative to Skills
    this concerns me. Skills are already just about the simplest possible thing; they're just prompts, in a directory!
    [-]
    - lunar_mycroft 1 day ago
      Skills are notably more complex than that. They require metadata (which the model is given and uses to determine whether or not to load the main file), are intended to be loaded via a tool call, contain extra resources (also loaded by tool calls), etc. In contrast, with this system the harness doesn't need a tool to load the stored prompts, the prompts don't need to include metadata to allow for runtime discovery, etc.
      [-]
      - cobolcomesback 20 hours ago
        Runtime discovery is the entire point of skills. Without it, this is just a templating prompt system that the user has to remember to use… except because this one changes your system prompt, it also busts your cache and costs you extra money when you use a prompt.
        Skills are already dead-simple and this prompt system doesn’t at all tackle the same problem.
        [-]
        lunar_mycroft 18 hours ago
        "{Feature} is the whole point of {more complex technology}" is an objection that can very often be raised. That doesn't mean that giving up features in exchange for simplicity is always the wrong call. And there's also advantages to having the user drive what instructions go into the prompt instead of the harness/model.
        [-]
        cobolcomesback 18 hours ago
        This is tangential to the point. It’s often great to have a simpler version of a solution, even if it eschews some features. But this isn’t that. OP claims that the prompt system is an “alternative” to skills, but it isn’t. It isn’t solving the same problem that skills solve at all. It’s like saying that a bicycle is a simpler alternative to a lawnmower because they both have wheels.
        Prompts are a feature that are simpler than skills, sure, but they’re a completely different feature entirely.
        [-]
        lunar_mycroft 18 hours ago
        It's an alternative in the same way e.g. plain markdown is an alternative to HTML, even though plain markdown lacks some of the features of HTML. "X is an alternative to Y" in this sense doesn't mean "X all the same features of Y", it means "you might reasonably choose to use X instead of Y, depending on your exact usecase"
      - gidellav 1 day ago
        Exactly, this was my thought process when deciding if we should have Skills or not.
        In the end, I think that this prompt-only design, with the integrated tools that come with zerostack, is more than enough.
    - backscratches 1 day ago
      So are these lol
- praveer13 1 day ago
  I’ve been doing the same thing in zig haha.
throwa356262 1 day ago
"RAM footprint: ~8MB on an empty session, ~12MB when working"
I like this, Claude Code is using multiple gigabytes, which is really annoying on lowend laptops
[-]
- all2 1 day ago
  I'm building an agent framework in golang and it is extremely light weight. Startup time is under 1/2 second, and RAM usage is really low. I have a 12 year old laptop and it happily runs without slowing down.
  There's no reason what is essentially a string concat engine should be slow on any hardware, including old hardware.
  [-]
  - gidellav 1 day ago
    Isn't 2 second startup time a lot? With zerostack, I managed to get it down to ~90ms
    [-]
    - NewJazz 1 day ago
      They said 1/2 as in 0.5 seconds as in 500 ms.
  - throwa356262 22 hours ago
    Sounds interesting, would you like to share any more information about your project?
    [-]
    - all2 12 hours ago
      Link is here [0]. The idea is to model cognitive states (how to think), and workflows (what to think about) as statecharts. The charts will be defined in YAML (version-able, hot-reloading). Context payloads are defined in an agent YAML file. Think of it as a map, like a drive map for a computer's HDD/SSD. You spec the order of context chunks, what goes into them, and then when the inference payload is built, it uses the context map definition (comprised of the chunks you defined), the agent definition (including model params like context length, temp, etc), cognitive state, and workflow state to build out the inference payload.
      Agent cognitive states may add chunks to the system prompt. Workflows may add chunks to the system prompt. Tool access may vary by agent/workflow state (policy is last-defined-wins overlays to keep it simple to reason about).
      Agents may run by themselves or be 'bound' to a workflow. Agents can detach from a workflow before it is finished, and either re-bind, or another agent may bind to the workflow (one implements, another reviews, for example).
      Conceptually, this is all very simple, which is why I'm hand rolling it.
      The goal is a minimal runtime that can support long-running agents in a 'zero human company' setting.
      On top of the runtime will be a minimal change control workflow (if you've spent time in hardware engineering, these are standard processes governed by a company's quality system).
      I've yet to wire in the economic pieces (token spend, power consumption, rollups that show performance of various agents based on inputs and outputs).
      It is a bit far fetched, but I'd like to get this thing ISO9001 certified, and maybe AS9100 certified.
      This is all to scratch my own itch, tbh. Most agentic systems are hard to reason about, bloated, lack visibility in the appropriate places, lack economic data of sufficient granularity, and so on. So I'm building this.
      [0] https://github.com/zerohumancompany2/maelstrom-code
- rel 1 day ago
  I've been trying to migrate over the zed and think they're Agent Client Protocol[1] is pretty neat, I wonder how much memory pressure Claude Code exerts if it is going through that mechanism instead
  1: https://zed.dev/acp
  [-]
  - johntash 2 hours ago
    My understanding is that zed wrote a wrapper around 'claude -p' for ACP support so I imagine there isn't too much of a memory savings there since it's still actually running claude code.
  - threecheese 16 hours ago
    Not answering your question, but I just realized the new Anthropic billing changes are affecting ACP clients like Zed :(
    https://zed.dev/blog/anthropic-subscription-changes
- messh 1 day ago
  The memory footprint is great, it allows finally running these coding agents in extra small instances -- say x1 on shellbox.dev
  [-]
  - chrisweekly 1 day ago
    Hmm, if they're this small something like smolmachines (like shellbox, but free and local) might be a great fit.
- tecoholic 1 day ago
  Yes. Just this fact is going to make a lot of people try it out.
- rane 22 hours ago
  I have 29 Claude Codes open, using 6.3 GiB RSS total
- esperent 1 day ago
  Are you sure you don't have an LSP plugin or something running?
- marknutter 1 day ago
  Isn't that because of the context window size?
  [-]
  - gidellav 1 day ago
    Hi, I'm the developer of zerostack! No, the memory footprint is not beacuse of the context window size: on my benchmarks, with a 128k context loaded, and it jumped from 8MB (without any chat/context loaded) to 11MB.
    The reasons why the memory footprint of zerostack are:
    - Rust, and not JS/Python, so no interpreters/VMs on top
    - Load-as-needed, so we only allocate things like LLM connectors when needed
    - `smallvec` used for most of the array usage of the tool (up to N items are stored in stack)
    - `compactstring` used for most of the string usage of the tool (up to N chars are stored in stack)
    - `opt-level=z` to force LLVM to optimize for binary size and not for performance (even tho we still beat both in TTFT and in tool use time opencode)
    - heavy usage of [LTO](https://en.wikipedia.org/wiki/Interprocedural_optimization#W...)
  - SatvikBeri 1 day ago
    The context window has nothing to do with RAM usage and even if it did, a million tokens of context is maybe 5mb.
    [-]
    - bluegatty 1 day ago
      'A million tokens of context' is literally Terrabytes of KV cache VRAM on very expensive Nvidia silicon - on the model.
      On the Agent, yes, the context window does relate to RAM, because the 'entire conversational history' is generally kept in memory. So ballpark 1M 'words' across a bunch of strings. It's not that-that much.
      Claude Code is not inneficient because 'it's not Rust' - it's just probably not very efficiently designed.
      Rust does not bestow magical properties that make memory more efficient really.
      A bit more, but it's not going to change this situation.
      'Dong it in Rust' might yield amazing returns just because the very nature of the activity is 'optimization'.
      [-]
      - rixed 1 day ago
        Rust "denialism" is as annoying as rust evangelism.
        Of course any seemingly idiomatic rust is going to run circles around TS transpiled into JIT-compiled JS.
        [-]
        bluegatty 1 day ago
        Lamenting any 'not even criticism' of Rust as 'denialism' is just evidence of the insane cult that is Rust.
        Rebuilding Claude Code in Rust will make almost no difference in terms of real world performance. V8 is 'relatively fast', and there wouldn't be any noticeable improvements there, and probably not memory footprint either.
        The source for Claude Code was leaked and it's a vibe-coded mess, there's not much thought given to clean architecture, it's unlikely they've just cleaned up a bit and given thought to memory consumption etc, if they did, they'd get by far most of the way there and likely abnegate and real want to 'do it in rust', unless there are other architectural considerations.
        [-]
        imtringued 14 hours ago
        You're the delusional one for bringing up the memory usage of the inference server that clearly isn't running inside the coding agent.
        The problem with your comments is that you're showing off a fundamental lack of understanding between managed languages and unmanaged languages.
        The vast majority of GCs are optimized for throughput and allocate big chunks of memory. They also tend to never release it if there was a temporary memory spike. The most advanced GCs also tend to have either read or write barriers, which slow down basic object accesses.
        Just in time compilation and managed languages in general need to retain a runtime representation of the source code to perform JIT compilation and then they have to store the compiled code in memory as well.
        JavaScript uses references against dynamic objects, which means you have to pay the indirection cost of a pointer but you also need to store type information as well to monomorphize the object literals and classes at runtime and fall back to a regular hashmap when fields are added dynamically.
        All of these things will add up and increase the amount of memory the application uses and how slow it runs.
        Sure Claude Code has severe architectural issues causing it to leak hundreds of gigabytes of RAM, but if those were not there you could easily build a C++ based alternative that runs circles around a hypothetical JavaScript based Claude Code that got its act together.
        [-]
        bluegatty 5 hours ago
        1) I'm not 'delusional' for bringing up 'What Memory is Used Where' - I'm clarifying for the people who seem a bit confused (see above) as to 'where the context lives' - and trying to provide a simple mental model for that.
        That's the opposite of delusional.
        It's just information.
        Attacking people for anything 'Rust related' however - is the quintessential reason why everyone hates the Rust community.
        2) 'The problem with your comment' is that it's presumptive and arrogant - as if I 'don't know the difference between GC and managed languages'.
        I've been writing software since 1990.
        Embedded (on custom Silicon), UI, SaaS, backend, some embedded work I've done is still in production today from almost 30 years ago.
        I've written a scripting languages (for production), and cyclic ref-count gc (didn't make it to production).
        Your comments about GC etc. are fine - but they but they don't really offer any insight into the actual problem.
        There's one critical detail aka 'memory not released after spikes', yes, this is observed behaviour, but it's usually accommodated with a little bit of decent Engineering.
        If you're going to make the comparative basis an an 'Idiomatic Rust' solution (aka good patterns), the we should make the assumption of an 'Idiomatic Node' solution for Claude Code.
        3) 'The other problem with your comment' is that your conclusion is wrong - by your own hand.
        Right here: "Claude Code has severe architectural issues causing it to leak hundreds of gigabytes of RAM," - the implication being that Claude Claude does not inherently have to 'leak all that RAM' - and would run just as fine with some basic work.
        An 'Idiomatic Node' implementation of Claude Code wouldn't exhibit those problems, and would perform pragmatically just as well as an Idiomatic Rust implementation.
        From a memory management situation, Rust might use significantly less memory, but a 150Mb footprint vs 350Mb foot print for an average session is 'pragmatically immaterial'.
        The difference in 'perceived performance' would be negligible - if any.
        The 'cost' of writing a the 'kind of program that Claude code is' in a systems-level language would be quite a lot, for not really much benefit.
        The 'Rust or C++' solution would not 'run circles' around the 'node' implementation in anything but some 'preformative', inward looking benchmarks, aka 'the worst kind of Engineering'.
        Consider pondering why almost nobody writes such applications in Rust or C++.
      - regexorcist 20 hours ago
        You have a point but it's definitely not TBs for 1M. Should be more like 100G.
    - vlovich123 1 day ago
      It has nothing to do with local RAM usage. But a million tokens of LLM context is decidedly not 5mb.
      The rough estimate is 2 * L * H_kv * D * bytes per element
      Where:
      * L = number of layers * H_kv = # of KV heads * D = head dimension * factor of 2 = keys + values
      The dominant factor here is typically 2 * H_kv * D since it’s usually at least 2048 bytes. Per token.
      For Llama3 7B youre looking at 128gib if you’re context is really 1M (not that that particular model supports a context so big). DeepSeek4 uses something called sparse attention so the above calculus is improved - 1M of context would use 5-10GiB.
      But regardless of the details, you’re off by several orders of magnitude.
      [-]
      - tujux 1 day ago
        Pretty sure we're talking about the output text, not the tensors.
        [-]
        m00x 1 day ago
        These LLM replies are really getting annoying.
        [-]
        vlovich123 17 hours ago
        Mine? I literally wrote what I wrote because “context window” as a term of art refers to the LLM’s context window.
        I guess get better at detecting LLMs instead of accusing everything of being an LLM reply?
  - SwellJoe 1 day ago
    The context window is not on your system. It's on the server with the model. There may be some local prompt caching, of some sort, but you're not locally hosting the context unless you're also locally hosting the model.
    [-]
    - bluegatty 1 day ago
      Chat history is kept locally, generally you have to send the 'whole history' to the model 'each turn'.
      [-]
      - SwellJoe 1 day ago
        That's just the plain text (or whatever files), that's not the context the model is directly working with on the server, which is tokenized, embedded, vectorized and has attention run against those vectors. The local history is generally quite small, the context generally quite a bit larger. A text conversation of a few hundred kilobytes in plain text will be gigabytes in context.
        [-]
        bluegatty 1 day ago
        KV for a sota model is into terrabytes
      - rixed 1 day ago
        Only "generally"? I'm curious what API has moved away from this protocol that seems mode adapted to conversaions with humans than agentic loops.
        [-]
        _flux 1 day ago
        To me it would certainly make sense if the protocol just said "append this text to context window id/sha256", in particular as the data is cached in tensor level in the provider side, so they need to first do that lookup anyway. So I would be surprised if they don't have that.
        In addition, this protocol could make it more transparent to say "oh we cannot proceed as we dropped the this cache, are you sure you want to proceed and consume a whole lot of expensive uncached tokens?". Oh, maybe that's a reason not to do it..
        bluegatty 1 day ago
        So the standard API you pass it all along but I think there are some odd open ai apis that are different.
arjie 1 day ago
I had Claude Code build me one of these as well, though I added Dirac's line hashing for edits etc. Also used Rust, and I had this idea that I should use plugins so it can self-edit by implementing in hooks but in the end, I just have it create exhaust information about improvements into a separate file and just update the source code and recompile. The source code is in a fixed place so it can just rewrite and build the agent itself. I use it with DeepSeek 4 Flash running on 2x RTX 6000 Pros which I get some 138 tok/s on.
To be honest, I just plagiarized Pi, Dirac, OpenCode. Any new tricks in this one that I can steal?
[-]
- joshka 1 day ago
  Take a look at OpenAI blogs about codex: https://openai.com/index/unrolling-the-codex-agent-loop/ https://openai.com/index/harness-engineering/ https://openai.com/index/unlocking-the-codex-harness/
- GodelNumbering 21 hours ago
  Creator of Dirac here. Glad to see it mentioned and even more glad that you found it useful.
  I am currently in deep refactor mode to introduce modular tooling to Dirac since the concept of 'fixed' set of tools is starting to feel antiquated, adding tools on demand would be super convenient and a likely replacement for MCP (I understand not all use-cases of it)
  [-]
  - karagenit 20 hours ago
    Curious how you’re handling prompt caching, as I understand it most LLM providers essentially inject tool definitions in the system prompt, so changing tools dynamically breaks the cache. This has been a big annoyance for me in a separate project; I currently just implemented my own tool-ish system that defines schemas in user messages and instructs the LLM to return matching JSON, but it’s less reliable than using the native tool calling + structured outputs available in the API.
    [-]
    - GodelNumbering 10 hours ago
      Native tool calling indeed. By modular, I meant the tool defs are loaded dynamically per task and stay the same during the task
- gidellav 1 day ago
  Some interesting features I add on top of being lightweight are the prompts library, Git worktrees integration and Ralph Wiggum loops integrations.
  [-]
  - arjie 1 day ago
    Very cool. Thank you! I will look.
- teo-mateo 1 day ago
  Is it public on github?
  [-]
  - arjie 15 hours ago
    Mine? No. It’s super idiosyncratic and I haven’t validated that it has not leaked secrets into the codebase.
  - normie3000 1 day ago
    Yes.
wkcheng 1 day ago
This is nice! I tried it for a bit and it was indeed quite fast. Are you looking for contributors, or are you building this as a personal tool? I ran into some issues when attempting to use different models, though: gpt-5.5 on Azure doesn't work, even with the OpenAI compatible endpoint, because "max_tokens" has been replaced with "max_completion_tokens". And it doesn't appear possible to pass through custom headers, so I wasn't able to specify reasoning_effort for deepseek models.
[-]
- gidellav 1 day ago
  Yes, I am open for PRs.
  What you showed is a clear bug in my codebase, if you can, open a Github issue with each of your bugs.
  Thanks!
zbyforgotp 1 day ago
We don’t trust llm execution- so we add user approvals. But task decomposition calls for co-recursion between code and prompts. This means that the approvals should be evocable at any depth. I think we need some kind of protocol for that (à la the Cubes OS protocols for cut and paste between vms).
Maybe a workaround could be to use bubblewrap of the scripts ther recursively call the llm (and run the agent in yolo inside the wrap).
[-]
- frabcus 1 day ago
  Well, or not spawn any external commands, and actually have tools made of code written by someone who thought about what the agents at each level should be limited to doing.
  [-]
  - zbyforgotp 1 day ago
    In the limit we want the llm to write the code (like in RLMs).
  - alfiedotwtf 1 day ago
    Or just run agents in a container…
- hashmal 1 day ago
  Currently, having LLM feeding on its own output repeatedly is the fastest way to get it hallucinate.
- agumonkey 23 hours ago
  Transactional recursive agents ?
  Nothing is committed until the final top-level transaction is accepted.
- zbyforgotp 18 hours ago
  Too late for fixing it - but of course I meant https://www.qubes-os.org/
- gidellav 20 hours ago
  zerostack contains --sandbox flags that forces bwrap usage on all shell tool usage
360MustangScope 1 day ago
Funny this comes out today. I was just about to start to write one in rust. It's amazing having opencode slowly leak memory and end up becoming 6gbs on a large project and then get slower and slower.
Will check this out! Seems cool!
[-]
- gidellav 1 day ago
  Yes! This project derived from an OOM killer activation that happened on my old laptop beacuse i had more than 2 opencode instances open together with Firefox...
hiAndrewQuinn 1 day ago
The codebase was small enough that I handed it over to DeepSeek v4 Flash in Pi to skim through for any risky business, and I didn't find anything concerning. Nice work.
[-]
- koito17 1 day ago
  Since the OP stated they used DeepSeek V4 Flash for generating a lot of the code, I decided to check whether there were any outdated dependencies. In my experience, with Rust projects, if you do not instruct models (even Claude 4.7 Opus) to use `cargo add` instead of manually editing the Cargo.toml, you will almost certainly get out-of-date dependencies added to your project.
  Manually checking the dependencies used by this project, I was pleased to see they are all the latest version. That doesn't mean there are no issues lurking in transitive dependencies, of course.
  As for getting an LLM to review the code, I think we can get all opinionated very fast. For instance, when I was eyeballing the code, some of the enum methods converting to/from strings made me think "this could've been a single #[derive] with strum." That would make the code in provider.rs a lot more concise, at the cost of importing one crate (with no dependencies!)
  Lastly, for fun, I decided to get DeepSeek V4 Pro (with Max thinking) to "audit" the codebase. The output mentioned no obvious signs of hidden telemetry, but it did note that the project sets the panic handler to "abort", which I have strong opinions on... Presumably the OP wanted to avoid linking against libunwind to save a few kilobytes of binary size, but now you have a binary that immediately aborts and doesn't give the user a stacktrace of what just crashed. I would rather have a ~50 KiB larger binary if it means getting useful debug info during a panic. Additionally, if there are async tasks that panic, they can't be recovered to display a generic error message; instead the whole process just aborts.
  [-]
  - gidellav 1 day ago
    Hi, nice comment!
    1. I had experience not only with wrong versions selected by the agents, but also weird crates (ex. choosing a crate with 10 github stars when a more complete and more supported one was available), reason why now I always choose the dependencies and then I let the agent work.
    2. Yes, some of the provider code could be made using macros, I am just lazy... But thanks for the tip! I will save it for later.
    3. No telemetry, and it can be checked thanks to the fact that there are no HTTP calls outside of the MCP implementation (via rmcp) and LLM connectors (via rig)
    4. Yes, i set panic handler to 'abort', thinking that I would've get a nice size decrease: i yet have to experience a panic on this project, but I will revert it to default behavior if the binary size saving is really so small
    5. While it is async, the entire project runs on one thread (as expressed in the main.rs with ```#[tokio::main(flavor = "current_thread")]```), as it allows for a nice ~8MB memory saving (so, 50% off) and no real performance loss, being such a simple tool.
    ---
    P.S. Just switched back to default settings for panic handler
  - hiAndrewQuinn 1 day ago
    Hidden telemetry was my big concern, yes; the abort thing wasn't caught as a security thing by DeepSeek V4 Flash but it was mentioned by Claude 4.7 Opus (I wanted to compare and contrast here), and Flash brought it up later when I asked it about performance tuning.
    `cargo add` tip is very helpful, I had a hunch this happened in my own Rust project and I think you just filled in the missing piece for me there.
    [-]
    - vlovich123 1 day ago
      To me panic=abort is much safer security as it means you’re unlikely to enter weird states due to incorrectly handled unwinding. The only attack vector is a DOS attack which is a short term thing that’s easily rectified.
- gidellav 1 day ago
  Thanks! Funny enough, a good chunk of the coding was done by Deepseek v4 Flash, while I hand-wrote a couple of the TUI logic, as deepseek kept failing on certain cursor-moving logic, and I fully managed the memory optimization process (as you can read on another comment I left, it both a set of compiler optimizations and usage of certain Rust crates in order to leverage more efficient data structures).
  [-]
  - hiAndrewQuinn 1 day ago
    Taking notes and comparing this against my own (non coding agent) Rust TUI project, thank you! I'm new to Rust so this is a helpful baseline.
    [-]
    - gidellav 1 day ago
      No problem, happy to help!
- kadoban 1 day ago
  > I handed it over to DeepSeek v4 Flash in Pi to skim through for any risky business
  Doesn't prompt injection make that a rather flimsy investigation?
wolttam 13 hours ago
The way I see this going is there will be 10s of thousands of model harness projects out there, because the tools make it so easy to make a harness that suites your workflows exactly the way you like (as someone who made their own harness)
I also used bwrap for sandboxing. I'm looking at layering slirp4netns, because I found out that models will happily break out of the sandbox via the the host network interface.
khimaros 1 day ago
i built something with a similar philosophy here: https://github.com/khimaros/airun -- it is intended to be piped and redirected. it discovers skills, AGENTS and prompt templates from Claude Code, Pi.dev, OpenCode and others. no TUI, but does have a basic tool calling loop
$ airun -q -p 'output a shell command for linux to display the current time. output only the command with no other code fencing or prose' | airun -q -s 'review the provided shell command, determine if it is safe, run it only if it is safe, and then summarize the output from the command' --permissions-allow='bash:date *'
[-]
- gidellav 1 day ago
  While I think that the core philosohpy is the same, i'd like to ask: why adding features like Skills and prompt templates?
  I personally decided to not implement Skills and instead using a prompt library approach, where certain .md are used to fully replace the system prompt, in order to allow for an approach similar to Skills with ~100 LoC dedicated to this system.
  [-]
  - afzalive 1 day ago
    Isn't the key thing with skills that the description is used to match them from a prompt that doesn't mention them?
    Would a prompt library do that too?
  - khimaros 16 hours ago
    i wanted airun to be drop-in useful in existing Claude/OpenCode/etc projects and skills are common.
  - c-hendricks 1 day ago
    Aren't skills fairly easy to share, and can contain more than one file?
    [-]
    - desireco42 1 day ago
      Prompts as well... he might be on to something here, can't say as I didn't try it yet
      Skills are just prompts
      [-]
      - c-hendricks 18 hours ago
        Skills are _like_ prompts, yes, they're extra info added to the context. A prompt is just a prompt though, an agent like Claude could use multiple skills in one go, which seems impossible to do with Zerostack.
      - hedgehog 1 day ago
        Most of mine have code in them. That's most of the value.
      - cobolcomesback 20 hours ago
        Skills are not just prompts.. the entire problem that skills solve is runtime discoverability via a skill description. Agents can self-recognize that a skill would be useful in a situation, and then load+use.
        Prompts are just text templates entered by the user, and the user must specifically know when to and remember to invoke them. If you’re just using skills as if they are the same as prompts, you’re totally missing out on the entire benefit that skills provide!
whazor 21 hours ago
It says inspired by Pi, but I don't see any extension/plugin possibilities. The best feature of Pi is that an extension can hook anywhere and completely change the behavior. It also allows two extensions to stack on the same hook where there are no conflicts.
I believe Pi extensibility is the most important feature, exactly as how it was important for WordPress. WordPress won because anyone could install it and add the plugins they needed. WordPress also has the same hook system where multiple plugins can build on the same hook.
Companies will want to completely customize their agent harness so it optimally works for their situation.
[-]
- zrg 20 hours ago
  I'm actually very close to being ready to release exactly that also in rust. I completely agree with your statement, extensibility is the most importnat feature.
  https://x.com/PandelisZ/status/2055633346831548902
  The two things I want to get right before actually releasing it is properly eval it againt other harnesses and make sure its better.
  And the licence. I don't think a GPL licence will yield addoption so I would like to MIT Roder or figure out the right licence
- gidellav 20 hours ago
  Check https://news.ycombinator.com/item?id=48164948
- krzyk 20 hours ago
  The most important feature of Pi is that it is small, and has small system prompt, making it great for locall LLMs.
tontinton 14 hours ago
Yo that's really similar to my very own https://github.com/tontinton/maki only I'm MIT and you're GPL, cool
goyozi 1 day ago
Really neat, I’ll have to try it when I’m at home. Lean, fast tools really make a difference in the coding experience.
I’m curious how the prompts idea performs in practice compared to typical skills and subagents. I frequently combine the two to get otherwise tricky workflows done. Say I have a failing build. I invoke my /fix-ci skill (sometimes in the same context I made the code change in), it launches a subagent to extract an error message / stack traces / relevant logs, and works through the problem. Say an integration test ran into a db query issue. Sometimes the agent itself, sometimes with a slight nudge from me, will load the readonly db access skill and start investigating. If I expect long, deep shenanigans, I’ll often say something like „use a sonnet subagent and instruct it to use the db query skill to debug the behavior we’re seeing”. And it can keep going like that: skills give extra capabilities on the fly, subagents isolate context to prevent bloat. Intuitively, it seems that by the agent running itself via bash with different prompts _might_ come close but a bit less streamlined? I’d have to check and see.
[-]
- gidellav 1 day ago
  Well... for the most part, you use it like skills, but instead of "commands" you can think of "environments": so '/prompt debug', which is one of the integrated prompts, allows for a debug-focused agent, you can then talk to it as a normal agent, and then '/prompt code' to go back to the standard coding agent.
  About subagents: as of right now, the entire agent runs on one context buffer, so it doesn't support subagents in order to keep it lean; but there is a great chance that subagents will be added, as explore-heavy tasks often bloat the context window
  [-]
  - post_below 1 day ago
    It sounds like you're saying that /prompt changes the system message part of the session. Doesn't that cause a cache break and result in higher usage/cost?
    [-]
    - post_below 23 hours ago
      I took a quick look at the source code and it looks like, yes, using /prompt during a session will rebuild the session with a new preamble/system prompt, causing a full cache miss on the next turn.
      So in that way it's not like skills at all, neither of those result in paying full read price on the entire session, just the skill prompt itself.
      Something else I noticed... In the Anthropic implementation it doesn't seem to be using 'cache_control' in the body. Assuming my understanding is current, without that the Anthropic API won't do any caching at all (unlike most other APIs that do some level of automatic caching without it being requested). So that would result in paying full read price on every turn.
      Of course I could be missing something, it was a quick look. Can you clarify?
halcyonblue 13 hours ago
https://forgecode.dev/ https://github.com/tailcallhq/forgecode is written in Rust too and seems surprisingly capable. How does Zerostack compare to forgecode?
GTonehour 21 hours ago
I tried to list the competing open-source AI coding agents to compare their popularity over time — opencode wins for now.
https://www.star-history.com/?repos=anthropics%2Fclaude-code...
nextaccountic 20 hours ago
> Bash execution ... optional sandboxing for isolation
Sandboxing should be the default. Rather than routinely allowing unsandboxed access, one should be able to configure the sandbox to allow exactly what is needed
That's hard. For example, I've been unable to give wayland access to agents inside the sandbox (there's a special flag in bubblewrap to mount /dev/dri in a way you can make use of it, but you also must give access to the wayland socket, and maybe other things). So I think that maybe harnesses should invest in more sandboxing resources
[-]
- gidellav 20 hours ago
  This is actually a topic of current interest, and I think that I will switch to a sandbox-by-default once the bwrap implementation inside of zerostack is well tested and highly configurable.
sinansaka 1 day ago
Love it! I think the minimal approach you took is the right path forward. As others mentioned, small harnesses make it possible to run many agents in parallel and in small cloud instances. working on a minimal agent in Go myself for this use case.
martingxx 22 hours ago
I wonder how this compares to tau https://tau-agent.dev/ ?
Both are in Rust and both mention Unix in their descriptions.
[-]
- coalstartprob 22 hours ago
  [dead]
mohsen1 1 day ago
This is much needed!
Compared to Codex CLI, Claude Code is insanely slow.
```
    $  time claude --version
    2.1.143 (Claude Code)

    ________________________________________________________

    Executed in    4.39 secs      fish           external
    usr time   29.68 millis    0.26 millis   29.41 millis
    sys time   71.30 millis    1.30 millis   70.00 millis
```
5 seconds to show me the version number!
I'm guessing Claude Code also needs a rewrite in Rust. But from what I saw in the leaked TypeScript code, a line-to-line port will be pretty bad. It requires a new architecture that matches Rust idioms
[-]
- nomel 1 day ago
  Note that includes network requests to check latest version.
  I suspect we'll soon see someone make a persistent Claude shell mode, with the reverse of a !, where you work in shell and send a message to Claude, and Claude sees all the context.
- marcosscriven 1 day ago
  What version of time is giving you that kind of output?
  [-]
  - pramodbiligiri 21 hours ago
    Looks like that time command was invoked from "fish" shell: https://fishshell.com/docs/current/cmds/time.html
zoobab 19 hours ago
I tried to install opencode on my x200 laptop, it would segfault as Bun wants some specific intel processor extensions (SIMD).
Now I tried to install zerostack, but the compilation freezes at a certain package.
Is there a static binary available for linux?
[-]
- zoobab 14 hours ago
  I finally managed to compile it, quite happy with the usage.
  Will try to rebuild it with static flag.
tsiao1999 23 hours ago
I’m also playing around with Rust for building agents—my setup ends up looking a lot like ZeroStack’s approach. If anyone’s curious, my project is here: https://github.com/7df-lab/devo
[-]
- Fuzzwah 21 hours ago
  The screenshots in your readme all 404
Phlogi 1 day ago
Looks interesting, how would you use skills with that? Would I need to migrate them into prompts? Which I think is not the same.
E.g. how to use official, vendor provided skills with zerostack? https://github.com/elestio/elestio-skill
[-]
- ffsm8 1 day ago
  Technically, a skill is equivalent to adding
  '"The skill description": if this applies, read /path/to/skill/definition.md'
  To your agents.md
  At least currently skills don't let you set the model (to my knowledge), so that's not a distinction either here (it would be with agent definitions)
inciampati 1 day ago
> Integrated Ralph Wiggum loops: looping capabilities for long-horizon tasks
Imo, this shouldn't be embedded in the executor layer. Orchestration should handle this.
[-]
- gidellav 1 day ago
  I get you, but when I decided to follow a no-skills approach (as in, no agent's Skills used), I had to decide what:
  1. Couldn't be built only using prompts
  2. Couldn't be built only using MCP servers
  3. Would have improved my UX experience (as i hope, your UX experience).
  From those three conditions, I chose integrated git worktrees and loops
- qsera 1 day ago
  Is AI is the new Waterfall/Agile methodology with all the lingo/terminology/names that make no damn sense?
  Appears so, because I am so turned off by it...
nopurpose 18 hours ago
How would one create custom tools for it? opencode offers TS SDK for it, but with rust it will be something more heavyweight like gRPC bridge (similar to how terrafoem providers work).
noodletheworld 1 day ago
Are agent harnesses the new web framework?
Everyone wants to write one, building a new one is easy to start with, but tough to get to “prod ready” and the landscape is littered with failed attempts?
Certainly feels like it.
This is really good though; works well and at least has a clearly articulated raison d'être.
ianberdin 14 hours ago
Don’t get me wrong, but 7K LoCs means it is still an early attempt to make a coding agent. It starts easy “ah it can edit and read files!”, but it requires a lot of extra effort to make properly for many edge cases, especially caching, price optimizations, etc.
I’ve been implementing custom coding agent in https://playcode.io for 3 years already. Far beyond of 7K LoCs.
So when you compare to “shitty slow” Claude code - I don’t agree.
[-]
- gidellav 14 hours ago
  Check what tools we already implemented, check your "slow" accusation, check the prompt system, check the provider integration (via Rig, so caching is already enabled), check the MCP support and other integrations that you don't even find on some major agents (git worktrees + loops).
  For 3 years, your Lovable clone is something that Claude Code could make in a couple of days, but good luck shitting on other project I guess.
spectaclepiece 1 day ago
The key thing with pi is that it can extend itself. How does that work when it’s written in rust?
[-]
- adastra22 1 day ago
  That's a bit like saying "the key thing with Lisp is that it can extend itself." Yes, that is a core feature and a lot of people use it for that reason. But not everyone. Other use pi just because it is a small agent harness, but don't need (or don't want) the self-extensibility.
- nextaccountic 20 hours ago
  The usual way to make a Rust program extensible is to embed a wasm interpreter. Then the agent can extend it by writing an extension in Rust or any other language that compiles to wasm. Zed does it for example
sergiotapia 1 day ago
Given agent harnesses affect so much of the performance of models, it would be great to see some kind of benchmark on how this tool performs compared to claude/codex/opencode/pi etc.
[-]
- gidellav 1 day ago
  Hi! While I didn't try any agent benchmark, I already though of this possible issue, and I tried to approach it on two different levels:
  1. The tools that are given to the agent are almost the same to the one defined in Opencode, except for Skills and Subagents (both features not implemented in zerostack)
  2. Zerostack is prompt-based, so that it ships with a set of .md files, stored in ~/.config/zerostack/prompt, and that can be selected from the TUI in order to activate different 'agents': as you can see from the README, it is designed to contain the most important feautres of superpower + Claude's front-end design + git worktree support and Ralph Wiggum loops (both as integrated features)
  [-]
  - esafak 1 day ago
    It's been said before, but it is important to prospective users, so it bears repeating: screenshots and benchmarks, please; it helps users decide whether to invest time in it. The ability to transfer settings from other agents would be great too.
    [-]
    - gidellav 1 day ago
      1. I will add some screenshots tomorrow
      2. As said before, there are no benchmarks right now, but it is good enough for me, so I hope it's good enough for y'all :)
      3. Transfering settings from other agents is out-of-scope for a minimalstic coding agent, but the idea is that, apart from MCP server, the rest might just force you to learn how zerostack works, because of design choices such as not having Skills or having certain specialized tools integrated (worktrees and loops).
tedshark 1 day ago
New to this. but whats the benefit over models like Claude code ?
[-]
- frabcus 1 day ago
  Make harness independent of model, so when pricing or quality changes you can switch.
  Avoid lock in to stack from one provider (things like a harness that only works with models from one provider and so on).
  Use local models (a couple of them do work a bit now, if you have 20Gb video RAM), which saves money and is more private, and works offline.
  Can improve the harness, fix bugs in it, make it compatible with different systems and techniques.
  This game happens every time in new cycles of developer technology. The good bet historically has always been to use open source - there's a reason most developer tooling just pre-AI revolution was open source (even things like Java and .NET which used to be proprietary).
  [-]
  - DeathArrow 21 hours ago
    >Make harness independent of model
    You can use Claude Code with almost any model.
    >Use local models (a couple of them do work a bit now, if you have 20Gb video RAM), which saves money and is more private, and works offline.
    You can do that with Claude Code.
- timwis 1 day ago
  Different harness (pi), but this blog post may partially answer your question: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/
perlgeek 15 hours ago
Are there any pre-built Linux binaries for this? I tried to install it with cargo, but got "feature `edition2024` is required" (which is the newest cargo available from my current Ubuntu distro).
Also, can I configure zerostack to always require a sandbox? I don't want to accidentally forget to call it with --sandbox.
theusus 1 day ago
I absolutely like this. Pi becomes sluggish after installing a couple of extensions. I myself was trying to port Pi to Rust but it was consuming too much tokens.
Is there any API like Pi so that I can create extensions.
[-]
- esperent 1 day ago
  It absolutely doesn't. It must be the extensions you're using.
  I've found is that nearly every extension on the official pi.dev/packages is vibe coded trash, like for example the most popular subagents extension.
  Instead of just giving you a basic subagent, it's a whole kitchen sink of recursion, teams, chains, confusingly named agents like "oracle" etc. Basically feels like someone kept prompting "what else could we add here?".
  They're all like that. It's no wonder these slow down pi.
  What I've done is just have the agent write my own.
  Get a local copy of e.g. that kitchen sink subagents extension. Have the agent list all the features, then I give back a much smaller list of the features I want and say "write me a new extension with just these new features" and every time it one shots it (using GPT 5.3 usually), then 20-30 minutes later I have a working, lightweight extension tuned to my exact workflow.
  I've done this for I guess about 8 extensions now (subagents, a lightweight typescript LSP, web search, background processes, Claude style hooks, plan mode are the main ones) and it's very fast and snappy.
  [-]
  - theusus 1 day ago
    Still they are maintained by those developers. I cannot spend my time developing extensions. I'd rather do that in Rust.
    [-]
    - esperent 1 day ago
      Then pi is probably not for you, as doing this is pretty much the whole selling point. You could try oh-my-pi or OpenCode instead.
0xAstro 1 day ago
These simple harnesses perform the best in my day to day experience but I sitll can't figure out why that's the case.
[-]
- jwpapi 1 day ago
  Because they don’t have an incentive to maximize your usage, but rather focus on solving probabilistic solvable problems for you.
  Bigger harnesses need to balance upping your token usage and being helpful.
eddy-sekorti 19 hours ago
How is it any faster than something written in anyother programming languages?
2001zhaozhao 1 day ago
Hmm, Claude Code and Opencode work fine for me.
It's a bit amusing that coding agents rely on drawing 1000W+ and using 2TB+ of memory in a datacenter to run, yet people really focus on the last few watts and few hundred megabytes of memory on their laptop (which get dwarfed by the energy cost of compiling their code anyways). But I suppose making them a bit faster and lighter wouldn't hurt.
[-]
- kvdveer 1 day ago
  The data centre runs on a dedicated power line. My laptop runs on battery. Using coding agents currently drains battery quite fast, which is surprising, given that the vast majority of the work does not take place on my laptop.
  Making the client side coding agent more efficient isn't about saving the climate. It is about extending the workday (which might actually make the climate worse)
- remus 1 day ago
  I think this is overly reductive. For sure the models are behemoths and consume a lot of resources, but the harness can have a big impact on how much the model is used. For example, having a strong set of tools available in the harness means the model can work much more efficiently.
  [-]
  - NewJazz 1 day ago
    It is also just an indicator of the planning and polish that a particular harness may have.
- huflungdung 1 day ago
  [dead]
teiferer 23 hours ago
Could we finally put the whole "written in pure Rust" thing as if it is a certificate of quality to rest? You can write crap in Rust, you can write excellent software in Rust, and both goes for all other languages too. I don't care what language you used for a project from the quality POV. Slop is slop, no matter Rust or JS or C.
born-jre 1 day ago
Sorry, it looks like we were not able to load the page. Please make sure your network connection works and you are using an up-to-date browser. If the issue persists, please visit our issue tracker to report the problem
Got this on iPhone firefox
[-]
- gidellav 1 day ago
  Retry from Safari, sometimes it works better
slopinthebag 1 day ago
I love these. Coding agents aren't very difficult to build, it's a TUI + tools + getting a nice agent loop working. The hardest part seems to be supporting all of the different providers and model quirks. What is interesting is seeing the experimentation: some provide tons of tools, others provide a single python interpreter and have the agent use tools via sandboxed python scripts, others use minimal tools and lean on bash. Personally I want a harness that gives a ton of control to the user to let them steer the LLM, less agent and more augmentation. Maybe I'll have to build it myself. If anyone has ideas, let me know.
[-]
- inhumantsar 1 day ago
  I'm working on one right now where nearly everything can be expressed as a combination of workflows. There will be some built-in agent types out of the box but all the Lego pieces are there if you want to put together something different.
  [-]
  - michalsustr 1 day ago
    What language are you building this in? I’m interested but trying to stay away from js world for security reasons.
    [-]
    - inhumantsar 16 hours ago
      The system and plugins are Rust. Workflows can be defined in a plugin with Rust or externally with YAML.
      Might add support for custom WASM plugins down the road, but everything shipped with the system will be Rust.
- afzalive 1 day ago
  Pi.dev is pretty good in giving tons of control to the use and has extensions that you can easily build.
  Although people are complaining about its RAM usage in this thread, I haven't bothered to check how much RAM it uses.
  [-]
  - slopinthebag 8 hours ago
    I refuse to run npm slop on my hardware
usernametaken29 1 day ago
Now make it into an IntelliJ plugin which has proper access to the search index. I’ll pay for it. For Christs sake it’s insane JetBrains hasn’t figured this out yet
[-]
- gidellav 1 day ago
  I am currently deciding on adding ACP support or not (and ACP support should allow connections to JetBrains's IDEs)
  [-]
  - upcoming-sesame 22 hours ago
    Yes please.
    TUIs are cool but sometimes people prefer staying in the IDE
- nullorempty 1 day ago
  I think this is such an opportunity for JetBrains. I talked to them about this at AWS Re-Invent, strangely, they could really see how strong of a position they are in if only they paid attention to the right thing!
  [-]
  - usernametaken29 1 day ago
    They even have this already, Junie, but of course the plugin version cannot use BYOK….
- kirtivr 1 day ago
  Jetbrains does not have their own IDE-integrated coding agent?
  What do Jetbrains users use then? Amp?
  [-]
  - krzyk 19 hours ago
    What is the use case for integrating coding agent in IDE?
    I use run agents outside of my IDE, while they work I can look at the code they created, or I can us IDE to do different work.
  - sgarman 1 day ago
    https://www.jetbrains.com/junie/
    [-]
    - usernametaken29 1 day ago
      Junie does not support BYOK inside the IDE
      [-]
      - leonsmith 1 day ago
        Has this position recently changed? It states this on the marketing page?
        > Use a JetBrains AI subscription or connect your preferred provider with Bring Your Own Key (BYOK).
        [-]
        Ardren 22 hours ago
        It seem confusing. My understanding is the AI assistant part (i.e. chat) is configurable. But Junie IDE is only via credits through Jetbrains.
        https://youtrack.jetbrains.com/articles/SUPPORT-A-1833/What-...
        (To make it more confusing, Junie CLI seems to say it will any provider)
        PythonLuvr 21 hours ago
        [flagged]
      - Mashimo 1 day ago
        What does the k stand for? Key?
        You can add any open Ai api endpoint you want, no?
        [-]
        usernametaken29 21 hours ago
        No, you have to buy their subscription within the IDE
        [-]
        Mashimo 18 hours ago
        The JetBrains AI Assistant plugins says:
        > Choose how AI runs by selecting built-in AI models from top-tier providers, bringing your own API keys or connecting local models.
        And the AI Assistant in turn can use Junie.
        At least that is what the plugin overview says, I have not tested it.
- dtauzell 1 day ago
  Does the IntelliJ mcp server do that? It has find tools
rw_panic0_0 23 hours ago
what "unix-inspired" here means?
deagle50 1 day ago
Looks promising, is OpenAI subscription support planned?
hparadiz 1 day ago
this is what I've been waiting for
a low level language. please no more scripting language TUIs!
[-]
- nine_k 1 day ago
  Rust, a language with affine types, generics, lifetimes, deep static analysis, hygienic macros, etc is not low-level. It's nearly as high-level as Haskell (without HKTs though).
  It just does not rely on GC and allows to manage resources efficiently. This efficiency is partly due to its being so high-level.
  [-]
  - gidellav 1 day ago
    While I agree on the fact that it allows to manage resources efficiently, I don't agree on the fact the efficency derives from it being high-level; from a purely tecnical standpoint, i could skim off 2-3MB from the memory footprint by writing the code in pure C, as there are some unused parts of Rust's std that cannot be removed without recompiling std.
    This is obv only a technical talk, as writing an AI TUI in pure C would be rather... ehhh
    [-]
    - nine_k 1 day ago
      That's why I said "part of its efficiency". Rust can do RAII, can optimize things more aggressively because of no aliasing ever in safe code, and because of known lifetimes, it can offer fearless concurrency™. Rust can also support highly optimized data representations (see how Optional works, or other ADTs, etc) which languages like Haskell, to say nothing of Python, cannot offer because of GC and boxing.
      Lower-level languages like Zig or even Go, to say nothing of C, lack many of the high-level language features that power this efficiency.
  - onlyrealcuzzo 1 day ago
    Agreed, Rust is way more expressive than people give it credit for.
- schaefer 1 day ago
  There has been no reason to wait... Codex is written in rust.
  -- So is deepseek-tui.
  [-]
  - hparadiz 1 day ago
    Forgot to add an open source qualifier. I use codex lol
    [-]
    - andxor 1 day ago
      Codex is also opensource.
      [-]
      - hparadiz 1 day ago
        I don't really want something owned by a company for my local stuff. I'd prefer it be small and minimalistic. Maybe in the future I'll change my mind and it will be more like a browser but for now I wanna keep it small and local.
        [-]
        gidellav 1 day ago
        Thanks! I don't think that the only advantages are being open and lightweight, but you can actually find some more interesting features such as Ollama support, integrated Prompts (in order to compete with superpowers), git worktrees integration, and so on
- iknowstuff 1 day ago
  Isn’t codex in rust?
  [-]
  - rvz 1 day ago
    yes.
    [-]
    - cyberpunk 1 day ago
      How come the official codex install instructions say use npm install?
      (I just rebuilt my sandbox vm a few days ago….)
      Or are there two separate codex clients?
      https://developers.openai.com/codex/cli
      [-]
      - nicoritschel 15 hours ago
        The one from npm is signed by OpenAI, which means computer use from the CLI. The brew distribution requires using the Codex app for computer use.
        Thanks Apple.
      - krzyk 19 hours ago
        Because people are crazy, usage of npm for installing binaries is quite common unfortunately.
        [-]
        cyberpunk 18 hours ago
        So …. do I understand it right? openai, one of the hottest companies on the planet right now, with very deep pockets, distribute their official rust cli via the … public npm repo?
        [-]
        krzyk 18 hours ago
        Yes.
        There is also homebrew install.
choopachups 1 day ago
dude, im actually in disbelief how long we put up with the pile of shit that is claude code.
icase 20 hours ago
omfg stop
nobody actually cares about rust, let alone likes it
tencentshill 1 day ago
This may be the most HN post I have ever seen.
NamlchakKhandro 11 hours ago
No extensions? I think you've missed the point
DeathArrow 1 day ago
IMO, the problem with Claude Code, OpenCode, Pi is the harness quality and convincing the agents to do the exact things you need, to define workflows and make the agents stick to it. I didn't experience performance issues.
For example I have an agent in Claude Code that has strict rules to do something before implementing every phase in the plan. Sometimes it decides not to do it. "But, wait the feature is simple enough so I can proceed straight to implementation..."
Just because this is written in Rust won't solve the biggest issues most users have with coding agents.
[-]
- bhaak 23 hours ago
  But that‘s not an issue with the coding agent. It’s the model that doesn’t follow the instructions.
  Given how an LLM works, you can never be sure it will always work. LLMs are not deterministic.
  [-]
  - DeathArrow 21 hours ago
    Isn't a harness supposed to guide and steer yhe coding agent?
    [-]
    - bhaak 20 hours ago
      While the harness can block certain actions (e.g., tool usage), it can’t enforce perfect adherence to instructions because the model itself is probabilistic. The harness can reduce deviations, but it can’t eliminate the fundamental unpredictability of LLMs.
      The rules that are fed into the AI are not unbreakable laws to the AI. We should always remember that.
DeathArrow 1 day ago
How does this do in SWE-Bench Pro and Terminal Bench?
phplovesong 1 day ago
Does anyone use claude with custom agents? IIRC they banned the use, and only allow claudes own agent.
[-]
- shepherdjerred 1 day ago
  You can use Claude with other harnesses at API costs, but you cannot use it with your Claude Code sub. That's changing next month though, I guess https://support.claude.com/en/articles/15036540-use-the-clau...
- DeathArrow 1 day ago
  I use Claude Code with GLM 5.1, MiniMax M2.7, Kimi K2.6 and Xiaomi MiMo V2.5 Pro.
rvz 1 day ago
As you can see, writing a coding agent in a compiled language makes a ton of sense and gives the benefits of running multiple agents efficiently instead of running into leaks and tools consuming gigabytes of RAM.
[-]
- _user_account 22 hours ago
  That makes no sense, coding harness are just subprocess wrappers + http calls. What is the benefit if at the end of the day it will spawn make,cmake,python,node.js, or whatever the developer is working on? With the enormous downside of loosing native/easy extensibility, JavaScript Object Notation (JSON) is derived from JavaScript, it seamlessly parses and dumps.
anuis258 18 hours ago
hmm
joeyguerra 1 day ago
the war of the coding agents has begun.
kapija 22 hours ago
woo hoo, more ai slop...
obaid 1 day ago
Worth noting the "Unix-inspired" framing is the HN title, not the README — the project itself pitches "minimalistic" and "optimized for memory footprint." Curious what the author means by Unix-inspired specifically, since a single-binary TUI running a multi-tool agent loop doesn't immediately read as do-one-thing-well-and-compose.
Sim-In-Silico 23 hours ago
[flagged]
sarim 1 day ago
[flagged]
amys94fr 22 hours ago
[flagged]
LuminaNAO 16 hours ago
[dead]
shrmarahul 19 hours ago
[flagged]
kuanghs 1 day ago
[dead]
edgardurand 1 day ago
[flagged]
phoebe_builds 1 day ago
[flagged]
artem_am 1 day ago
[flagged]
IndianAISupport 11 hours ago
Another one. Cool, cool.
/s
nimchimpsky 1 day ago
[dead]
andrew_kwak 1 day ago
[flagged]
brcmthrowaway 1 day ago
!RemindMe 6 months
kuberwastaken 22 hours ago
This is awesome! can't wait to see where it goes as it continues development
Always funny how Hacker News works with traction, posted about a rust based TUI agent I'm working on a couple days ago too :P
https://github.com/Kuberwastaken/claurst
zby 23 hours ago
There is also https://github.com/Dicklesworthstone/pi_agent_rust
I vibed a comparison/review of these two systems using my llm wiki: https://zby.github.io/commonplace/work/pi-agent-zerostack-co...
(the prompt is in https://zby.github.io/commonplace/work/pi-agent-zerostack-co...)
[-]
- cassianoleal 23 hours ago
  Your bot seems to think that `pi_agent_rust` is the same as upstream Pi.
  [-]
  - zby 23 hours ago
    I think I fixed this in a later revision. Does that persist?