I love Hermes Agent. I really do. But there was one thing slowly driving me insane: it was painfully slow the moment I mounted S3 or R2 as the volume backing its state. Here is the story of how I fixed it — and why the naive solution beat every "proper" one I tried.
The Problem Nobody Warns You About
Hermes loads its brain at the start of each session. It assembles the system prompt from SOUL.md, memory snapshots, skills, and context files. That part is fine.
The issue is what happens after startup. Every task involves a cascade of file operations: reading skills, accessing memories, loading context, persisting new data, writing logs. In normal operation, this is invisible. On a local disk, it is free.
Now put ~/.hermes/ on a remote S3/R2 mount.
Suddenly every file read is a network request. Every write is a PUT. The VM drowns in continuous sync traffic from routine agent churn — creating skills, updating memories, writing logs — even when you are not actively using the agent. It is death by a thousand tiny file operations.
[Diagram: Hermes Agent · VM → S3 / R2 mount — 1 file op = 1 HTTP request, constant sync churn]
With ~/.hermes/ on a remote mount, every file read or write becomes a separate HTTP request.
Why I Did Not Take the Easy Way Out
The obvious answer is to just use an EBS-like block volume and move on with life. But I had requirements that ruled that out:
- Unlimited storage capacity
- No provider lock-in
- The ability to spin up fresh sandboxes anywhere
So I looked at distributed POSIX filesystems. JuiceFS was the frontrunner. It would have worked — but the setup complexity was wildly disproportionate to what I actually needed. All I wanted was:
- Fast data availability at startup
- Persistence across container recreations
That is it. I did not need a full distributed filesystem. I needed a checkpoint.
The Squashfs Snapshot Solution
So I went naive. Embarrassingly naive. And it worked. The whole system fits in a few bullet points:
- Every 15 minutes: create a compressed Squashfs snapshot of ~/.hermes/
- On container recreation: mount the latest snapshot
- On SIGTERM: capture one final snapshot before the VM dies
Squashfs compresses the entire Hermes home directory into a single .sqsh file. One curl uploads it to R2 or S3. No per-file sync overhead. No thousands of individual HTTP requests. Just one efficient bulk transfer, on a schedule.
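The periodic job fits in a few lines of shell. This is a sketch, not my exact script: the `UPLOAD_URL` presigned endpoint, the zstd compression choice, and the timestamped naming scheme are all assumptions for illustration.

```shell
#!/usr/bin/env sh
# Periodic snapshot sketch (run from cron, e.g. */15 * * * *).
# UPLOAD_URL is a hypothetical presigned R2/S3 endpoint.

snap_name() {
  # UTC timestamp makes names unique and lexicographically sortable.
  printf 'hermes-%s.sqsh' "$(date -u +%Y%m%dT%H%M%SZ)"
}

snapshot() {
  name=$(snap_name)
  # One compressed image of the whole state directory...
  mksquashfs "$HOME/.hermes" "/tmp/$name" -comp zstd -noappend
  # ...and one bulk PUT instead of thousands of per-file requests.
  curl -fsS -T "/tmp/$name" "$UPLOAD_URL/$name"
}
```

Sortable names matter later: picking "the latest snapshot" at boot reduces to taking the last entry of a sorted listing.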
[Diagram: three triggers feeding R2 / object storage, one .sqsh file per snapshot — every 15 min: create snapshot (compress ~/.hermes/ to .sqsh, curl upload to R2); on container boot: mount latest (fetch newest .sqsh, mount into VM filesystem); on SIGTERM: final snapshot (capture state before VM dies, push to R2)]
Three triggers drive the snapshot system. Between them, everything runs against a local filesystem.
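The boot-time restore can be sketched the same way. Two caveats on this sketch: `LATEST_URL` is a hypothetical endpoint (the post does not specify how the newest snapshot is located), and since squashfs mounts are read-only, an overlayfs upper layer is assumed here to keep the session writable at local-disk speed.

```shell
#!/usr/bin/env sh
# Boot-time restore sketch. LATEST_URL and the overlayfs layout
# are assumptions, not details from the original setup.

latest_of() {
  # Timestamped names sort lexicographically, so the newest is last.
  printf '%s\n' "$@" | sort | tail -n 1
}

restore() {
  curl -fsS -o /tmp/latest.sqsh "$LATEST_URL"
  mkdir -p /mnt/snap /var/hermes/upper /var/hermes/work "$HOME/.hermes"
  # Read-only squashfs as the lower layer...
  mount -t squashfs -o loop,ro /tmp/latest.sqsh /mnt/snap
  # ...with a writable overlay on top, so session writes stay local.
  mount -t overlay overlay \
    -o "lowerdir=/mnt/snap,upperdir=/var/hermes/upper,workdir=/var/hermes/work" \
    "$HOME/.hermes"
}
```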
The Key Insight
What actually makes this fast is not the compression or the snapshot cadence. It is this: I mount the snapshot directly into the VM's filesystem.
The data becomes locally available the instant the container boots. Every file operation during the session runs at local disk speed. Zero network latency. The agent has no idea its state came from object storage — as far as it is concerned, everything is on local disk.
Changes made during the session get captured in the next periodic snapshot. If the container crashes, at worst you lose 15 minutes. SIGTERM catches the graceful shutdown case.
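The graceful-shutdown case is just a signal trap around one more snapshot. Again a sketch: `UPLOAD_URL` and the `final.sqsh` name are placeholders, and in practice this would reuse the same helper as the 15-minute job.

```shell
#!/usr/bin/env sh
# SIGTERM hook sketch: push one last snapshot before the VM dies.
# UPLOAD_URL is a hypothetical presigned R2/S3 endpoint.

final_snapshot() {
  mksquashfs "$HOME/.hermes" /tmp/final.sqsh -comp zstd -noappend
  curl -fsS -T /tmp/final.sqsh "$UPLOAD_URL/final.sqsh"
}

# On graceful shutdown, capture state one last time, then exit.
trap 'final_snapshot; exit 0' TERM
```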
[Diagram: before — thousands of small network requests per session; after — local-disk speed plus one bulk upload every 15 minutes]
Left: every file op hits the network. Right: a local mount absorbs session churn; only the periodic bulk transfer leaves the VM.
The Results
- Fast: local filesystem performance during sessions
- Resilient: snapshots survive container crashes and recreations
- Portable: works with any cloud provider that supports object storage
- Simple: no distributed filesystem to operate, debug, or pay for
The Moral
Sometimes the naive solution is the best one.
Complex distributed filesystems exist for good reasons. But not every problem requires one. I had a checkpointing problem dressed up as a filesystem problem, and once I saw it that way, a periodic snapshot was obviously enough.
When the "proper" solution feels like overkill, it usually is. Start with the dumbest thing that could work. Upgrade only when you have to.
Postscript
Since I shipped this, Hermes has added a built-in hermes backup command with quick snapshot capabilities. For general use, you should probably reach for that first. But for my specific case — remote storage optimization with direct filesystem mounting and single-file transfers — the Squashfs approach still holds up.
Cloudflare's Artifacts feature would likely solve this too. I have not had access to test it yet, but it is on my list.