
Running a Chess Engine in the Browser Isn’t the Hard Part

Why continuous analysis needs promotion rules to feel trustworthy.


Getting Stockfish to run in the browser was straightforward.

Making its output feel trustworthy was the hard part.

Once continuous analysis was working, the interface started behaving in ways that felt subtly wrong. Nothing was broken: evaluations were valid, depths were increasing, and the data was objectively improving. But the experience felt unreliable.

The evaluation bar would move a beat too late. Charts would pause, then jump. A deeper result would arrive for a position the user had already left. Move labels could change after the user had moved on.

Each update was technically correct. Collectively, they weakened trust.

Where trust broke down

Three issues kept surfacing.

1) Timing drift

Stronger evaluations arrive later in the search. If a deeper result lands just after the user navigates to another move, the UI reacts to information that is no longer contextually relevant.

That mismatch only has to happen a few times before users stop trusting the interface.

2) Visual instability

The chart and evaluation bar were reflecting real improvements, but they looked indecisive. To users, "refining analysis" and "changing its mind" can look identical.

Correctness alone is not enough. Presentation has to preserve confidence.

3) Context collisions

A chess position has at least two useful evaluation contexts:

  • Pre-move: how good the position was before a move
  • Post-move: how good the position became after a move

Different surfaces need different contexts:

  • Move classification depends on pre-move context
  • The evaluation bar reflects the current position
  • Trend charts are usually clearer as post-move progression

Mixing these contexts creates interfaces that are numerically valid but semantically inconsistent.
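One way to keep the contexts from colliding is to model them as distinct fields, so every surface has to pick one explicitly. A minimal sketch in TypeScript; the names and the centipawn threshold are illustrative, not from the actual product:

```typescript
// Hypothetical model: both evaluation contexts live side by side,
// and each UI surface reads exactly one of them.
interface MoveEvaluation {
  fen: string;       // position identity
  preMoveCp: number; // centipawn eval before the move was played
  postMoveCp: number; // centipawn eval after the move was played
}

// Classification compares the post-move result against the pre-move baseline.
const classifyMove = (e: MoveEvaluation): string =>
  e.postMoveCp - e.preMoveCp < -150 ? "blunder" : "ok";

// The evaluation bar only cares about the current (post-move) position.
const evalBarValue = (e: MoveEvaluation): number => e.postMoveCp;
```

Because each function names the context it consumes, a surface can no longer mix contexts by accident.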

The naive rule that failed

My first rule was simple: always show the newest engine output.

That sounds right because newer usually means deeper, and deeper usually means better. In a live interface, though, deeper is only better if it is still relevant, meaningfully better than what we already know, and timed so users can interpret it correctly.

I had treated engine output as a stream. It worked better as a pipeline.

The rule that fixed it: promotion, not passthrough

Engine callbacks stopped going straight to the UI. Instead, results had to earn promotion.

To be visible, an update had to be:

  • Relevant: still tied to the active position
  • Meaningful: a real improvement, not just another intermediate tick
  • Stable: unlikely to flip again immediately
  • Surface-appropriate: suitable for the specific UI element consuming it

That single framing shift changed the architecture.
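The four criteria above can be sketched as a single promotion gate that sits between the engine and the UI. This is a minimal illustration, assuming engine output has already been parsed into `{ fen, depth, cp }`; the names and thresholds are mine, not the product's actual code:

```typescript
interface EngineUpdate { fen: string; depth: number; cp: number; }
type VisibleState = EngineUpdate;

const MIN_DEPTH_GAIN = 2; // "meaningful": require a real depth improvement...
const MIN_CP_DELTA = 15;  // ...or a non-trivial eval change (centipawns)

// Returns the state the UI should show after seeing this update.
function promote(
  current: VisibleState | null,
  u: EngineUpdate,
  activeFen: string
): VisibleState | null {
  if (u.fen !== activeFen) return current;                 // relevant: active position only
  if (current === null || current.fen !== u.fen) return u; // first result for this position
  const deeper = u.depth - current.depth >= MIN_DEPTH_GAIN;
  const moved = Math.abs(u.cp - current.cp) >= MIN_CP_DELTA;
  if (!deeper && !moved) return current;                   // intermediate tick: keep shown state
  return u;                                                // earned promotion
}
```

Late results for abandoned positions fall through the first check, and small intermediate ticks never reach the UI at all.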

Continuous analysis became a refinement pipeline

Once I adopted promotion rules, four implementation decisions followed naturally.

  1. Bind analysis to position identity. Continuous runs are keyed by FEN. If the user navigates away, late results from the old run are ignored for visible UI state.

  2. Resume from the best known depth. New analysis does not restart from weaker depths. It begins from the strongest known baseline to avoid apparent regression.

  3. Buffer intermediate callbacks. Not every engine message deserves a rerender. Intermediate updates are coalesced and promoted only on meaningful improvement.

  4. Let expensive interpretations lag. Cheap surfaces (raw eval) can update earlier. Expensive, interpretive surfaces (classification, derived summaries) wait for a more stable signal.
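The buffering decision in particular is easy to sketch. Engine messages land in a one-slot buffer that keeps only the deepest result, and a periodic flush (in the browser, typically driven by `requestAnimationFrame`) offers at most one update per frame. The shape below is an illustration under those assumptions, not the actual implementation:

```typescript
type Update = { fen: string; depth: number; cp: number };

// Coalesce bursts of engine callbacks into at most one applied update
// per flush. `apply` would hand the survivor to the promotion gate.
function makeCoalescer(apply: (u: Update) => void) {
  let pending: Update | null = null;
  return {
    push(u: Update) {
      // keep only the deepest buffered update from the current burst
      if (pending === null || u.depth >= pending.depth) pending = u;
    },
    flush() {
      // call once per frame; a flush with nothing pending is a no-op
      if (pending) apply(pending);
      pending = null;
    },
  };
}
```

Rendering cost then scales with the frame rate rather than with the engine's message rate.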

What changed

The engine did not do less work. The UI simply became more selective about when work became visible.

After that, the product felt calmer: the evaluation bar moved with more confidence, charts changed with clearer intent, and labels stayed stable long enough to be trusted.

The broader lesson

I started this as an engine integration problem. It turned out to be a real-time product design problem.

Any live system has to answer the same question: when does new information deserve to interrupt what the user currently believes?

More data does not automatically improve the experience. Without explicit rules for relevance, timing, and stability, live updates create noise faster than insight.

Continuous analysis taught me that real-time systems need more than a stream of outputs.

They need promotion rules that decide when better information becomes visible.