Conducting Visuals With Your Hands — Ghost Arcade v1.7.1 Adds MediaPipe Gesture Input
Hand-tracking is now a first-class input in Ghost Arcade. Bind your right palm's X coordinate to a macro, pinch to spray particles, spread your fingers to crank a shader's speed. Built on Google's MediaPipe Hand Landmarker, with adaptive smoothing + forward prediction for tracking that feels immediate instead of laggy.
Hand-tracking, by way of a webcam
Ghost Arcade v1.7.1 ships a MediaPipe-backed gesture system: any webcam
becomes a controller. Move your right hand left-to-right and a slider
follows. Spread your fingers and a shader's speed climbs. Pinch your
thumb and index together and particles spray from the pinch point.
Bring up the floating Learn modal, pick a signal, tap any orange-
outlined param in the UI — bound. Tap more — same gesture controls
them all.
This post is a tour of what shipped and the small handful of
engineering moves that made it feel responsive enough to actually
perform with.
What's available as a signal
Every detected hand exposes a small bundle of 0..1 normalised values
per frame:
centre position in image space (X = horizontal, Y = vertical, Z =
depth-relative).
hand-size normalised so depth doesn't fake a pinch.
pinky tip). Closed fist ≈ 0.25, fully splayed ≈ 1.0. This is the
"openness" knob.
Pointing_Up, Thumb_Up, Thumb_Down, Victory, ILoveYou. Use these as
triggers (one-shot pulse) or latches (toggle on each fire).
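The trigger/latch distinction is easy to picture in code. Below is a minimal sketch, not Ghost Arcade's actual implementation: `GestureEdge` and its `update` method are hypothetical names, and the input is assumed to be a per-frame boolean saying whether the canned gesture is currently detected.

```typescript
// Hypothetical helper: turns a per-frame "gesture detected" boolean into
// either a one-shot pulse (trigger) or a toggled state (latch).
type GestureMode = "trigger" | "latch";

class GestureEdge {
  private readonly mode: GestureMode;
  private wasActive = false;
  private latched = false;

  constructor(mode: GestureMode) {
    this.mode = mode;
  }

  // Call once per tracking frame; returns the value to send to the bound param.
  update(active: boolean): boolean {
    const fired = active && !this.wasActive; // rising edge only
    this.wasActive = active;
    if (this.mode === "trigger") {
      return fired; // true for exactly one frame per gesture
    }
    if (fired) {
      this.latched = !this.latched; // flip state each time the gesture fires
    }
    return this.latched;
  }
}
```

Holding a Thumb_Up for several frames fires a trigger once, while a latch stays on until the gesture is released and made again.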
Bindings target any param the rest of the app already exposes via
MIDI Learn — every effect param, every shader uniform, every plugin
param, every GPU shader layer slider, the wet/dry knob on each macro,
the crossfader, layer opacities, blend mode dropdowns. The dispatch
path is shared with MIDI and OSC, so anything one of them reaches the
others reach too.
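As a rough illustration of one signal fanning out to many params, here is a sketch under stated assumptions: the `Binding` and `ParamTarget` shapes, the `dispatchSignal` function, and the param ids are all hypothetical, since the post does not show the real binding schema. Only the signal name `palm.right.x` comes from the app.

```typescript
// Hypothetical types; the real binding schema is not shown in this post.
interface ParamTarget {
  id: string;
  min: number;
  max: number;
}

interface Binding {
  signal: string; // e.g. "palm.right.x"
  target: ParamTarget;
}

type SetParam = (paramId: string, value: number) => void;

// Scale a normalised 0..1 signal into the range of every param bound to it.
// Returns how many params were updated.
function dispatchSignal(
  bindings: Binding[],
  signal: string,
  value01: number,
  setParam: SetParam,
): number {
  const v = Math.min(1, Math.max(0, value01)); // clamp defensively
  let updated = 0;
  for (const b of bindings) {
    if (b.signal !== signal) continue;
    setParam(b.target.id, b.target.min + v * (b.target.max - b.target.min));
    updated++;
  }
  return updated;
}
```

Because `setParam` is just a callback, the same function could sit behind MIDI, OSC, and gesture input alike, which is the point of sharing the dispatch path.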
The Learn UX
Open Settings → MediaPipe → + Add binding. The Settings panel
closes (your params live behind it; the panel was in the way) and a
floating modal lands bottom-right. Every bindable element in the app
gets an orange outline. Pick a signal in the modal, click an
orange param — bound. The modal stays open and the signal stays
selected, so you can keep clicking to bind the same gesture to many
params at once. Mid-session: change the signal, keep clicking, build
out the rest of your rig.
Already-bound params get a yellow tag showing which signal owns
them, so you can see your wiring at a glance.
For the speedrun: there's a Load defaults button that maps four
signals to Macros 1-4 (right spread, right palm Y, right pinch, hands
distance). Project loads, hit Load defaults, route each macro's
effect chain wherever you want — a hand-driven post-fx mixer in
twenty seconds.
The latency problem (and how it got solved)
The first cut felt unusable: roughly 80 ms of perceived lag between
hand and slider. Three changes stacked to bring that down to something
a working VJ will actually use:
Drop the gesture model when nobody needs it. MediaPipe's Hand
Landmarker is fast on the GPU delegate (~10 ms inference). The Gesture
Recognizer is a second model that roughly doubles that cost per frame. The HandFX
visualizer uses raw landmark distances (pinch, spread) instead of the
gesture model's output, so we default gesture-detection OFF. The
MediaPipe panel's "Canned gestures" checkbox keeps it available for
anyone who explicitly wants categorical gestures bound — but the
default path runs the cheap one.
Strict request-response frame pacing. Naively, the camera pump
sends a frame every 16 ms regardless of whether the worker is done
with the previous frame. Under contention this builds a queue — your
hand has already moved past where the worker thinks it is. We track
"frames in flight" and only ever allow one at a time: send → infer →
result → send again. Worst-case latency stops growing because a queue can never form.
Forward prediction in the smoothing layer. Smoothed velocity is
captured per landmark, then the rendered position extrapolates ~18 ms
along that vector (one display frame ahead of the actual landmark).
Capped at 1.5× the current motion so it doesn't fling past the hand
on a sudden reverse. Net effect: the cursor leads the hand by one
frame instead of trailing it.
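The one-frame-in-flight pacing described above can be sketched as a tiny gate between the camera pump and the tracking worker. This is an illustration under assumptions, not the shipped code: `FrameGate`, `trySend`, `onResult`, and the `simulate` helper are hypothetical names, and the 40 ms inference time in the usage note is an invented stress case.

```typescript
// Hypothetical gate: at most one camera frame is being inferred at any time.
class FrameGate {
  private inFlight = false;
  skipped = 0;

  // Camera tick: returns true if the caller should post this frame to the worker.
  trySend(): boolean {
    if (this.inFlight) {
      this.skipped++; // worker is busy: drop the frame instead of queueing it
      return false;
    }
    this.inFlight = true;
    return true;
  }

  // The worker replied with landmarks for the outstanding frame.
  onResult(): void {
    this.inFlight = false;
  }
}

// Toy simulation: camera ticks every `tickMs`, inference takes `inferMs`.
function simulate(ticks: number, tickMs: number, inferMs: number): { sent: number; skipped: number } {
  const gate = new FrameGate();
  let doneAt = Infinity;
  let sent = 0;
  for (let i = 0; i < ticks; i++) {
    const now = i * tickMs;
    if (now >= doneAt) {
      gate.onResult(); // result arrived before this tick
      doneAt = Infinity;
    }
    if (gate.trySend()) {
      sent++;
      doneAt = now + inferMs;
    }
  }
  return { sent, skipped: gate.skipped };
}
```

With a 16 ms camera tick and a slow 40 ms inference, `simulate(7, 16, 40)` sends 3 frames and skips 4. The frame that does get sent is always the freshest one, so staleness is bounded by a single inference instead of by queue depth.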
Knobs in the panel: Smoothing (raise if jittery, lower for snap)
and Predict Ahead (ms) (raise for snappier, lower if you see
overshoot). The defaults, 0.15 and 18 ms, read as "responsive without
feeling floaty."
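To make the two knobs concrete, here is a one-axis sketch of smoothing plus capped forward prediction. Several things are assumptions on our part for illustration: the class name `PredictiveSmoother`, treating Smoothing as the fraction of the previous value that is kept, using a fixed exponential filter where the app uses adaptive smoothing, and reading "capped at 1.5× the current motion" as a cap relative to this frame's raw displacement.

```typescript
// One-axis sketch. Run one instance per landmark coordinate.
class PredictiveSmoother {
  private pos: number | null = null;
  private vel = 0; // smoothed velocity, units per millisecond
  private lastRaw = 0;
  private readonly smoothing: number;
  private readonly predictAheadMs: number;
  private readonly maxLeadFactor: number;

  constructor(smoothing = 0.15, predictAheadMs = 18, maxLeadFactor = 1.5) {
    this.smoothing = smoothing;
    this.predictAheadMs = predictAheadMs;
    this.maxLeadFactor = maxLeadFactor;
  }

  // raw: latest landmark coordinate; dtMs: time since the previous sample.
  step(raw: number, dtMs: number): number {
    if (this.pos === null || dtMs <= 0) {
      this.pos = raw; // first sample: nothing to smooth or predict yet
      this.vel = 0;
      this.lastRaw = raw;
      return raw;
    }
    const keep = this.smoothing; // assumed meaning: weight of the old value
    const pos = keep * this.pos + (1 - keep) * raw;
    const instVel = (pos - this.pos) / dtMs;
    this.vel = keep * this.vel + (1 - keep) * instVel;

    // Extrapolate along the smoothed velocity, but never lead by more than
    // maxLeadFactor times this frame's raw motion (limits overshoot on reversals).
    const cap = this.maxLeadFactor * Math.abs(raw - this.lastRaw);
    const lead = Math.max(-cap, Math.min(cap, this.vel * this.predictAheadMs));

    this.pos = pos;
    this.lastRaw = raw;
    return pos + lead;
  }
}
```

In this sketch a still hand gets zero lead because the cap collapses to zero, and a sudden small reversal clamps the lead to a sliver instead of flinging the cursor past the hand.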
The handedness gotcha
Selfie mirror is the natural default — your right hand on screen is
on the right of the image, like looking in a mirror. But MediaPipe
determines hand identity from the visual appearance of the hand, and
a mirrored right hand has the same shape signature as a left hand.
So with mirror on, the model reports your physical right hand as
"Left", and `palm.right.x` ends up tracking the wrong hand.
We post-process the handedness labels: when mirror is on, swap Left
↔ Right before deriving signals. Now selfie mirror gives you both
natural visuals AND signal names that match physical reality. (If
you ever turn mirror off explicitly, no swap — the model's labels
are already correct in raw-camera space.)
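The swap itself is tiny. A sketch, assuming the model's label arrives as the string "Left" or "Right"; the function names and the `palm.<hand>.x` naming helper are hypothetical, extrapolated from the `palm.right.x` signal mentioned above.

```typescript
type Handedness = "Left" | "Right";

// Map the model's label to the performer's physical hand.
// With selfie mirror on, the model sees a flipped image, so its labels are flipped too.
function physicalHandedness(modelLabel: Handedness, mirrored: boolean): Handedness {
  if (!mirrored) return modelLabel; // raw-camera space: labels used as reported
  return modelLabel === "Left" ? "Right" : "Left";
}

// Hypothetical helper: build a signal name such as "palm.right.x".
function palmSignalName(modelLabel: Handedness, mirrored: boolean, axis: "x" | "y" | "z"): string {
  const hand = physicalHandedness(modelLabel, mirrored).toLowerCase();
  return `palm.${hand}.${axis}`;
}
```

Doing the swap before signals are derived means bindings never need to know whether mirror mode is on.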
The HandFX visualizer
Five modes that lean on the hand tracking to actually paint something
back to the screen:
velocity, sparks shed along the stroke and drift through a
curl-noise flow field, the backbuffer fades slowly so paint
lingers. Crank the Linger slider to 0.998 and wave slowly — you
get ribbons hanging in the air.
with additive blend. Coloured smoke pouring off the hands.
MediaPipe is actually tracking" mode.
particles continuously spray from the pinch point in the direction
away from your palm. Tighter pinch = stronger spray. Released =
silent. Great for punctuating drops without the gesture-detector
latency.
layer's Difference blend mode for the "spread hands → invert the
world behind you" trick.
Adaptive per-landmark smoothing kills jitter without smearing motion.
Optional camera-feed background at user-set opacity so the performer
can show their hands behind their own paint. There's no on-canvas
text — the visualizer is what hits your projection output, so we
moved every "no hands yet" hint into the panel UI instead.
Project-scoped bindings
The bindings save with the .gha project file, not in localStorage.
Each project has its own gesture rig — opening a different show
loads its own mappings. We did consider auto-persisting to local
storage; it would be convenient for casual experimentation but the
moment you start working across multiple projects it gets confusing
fast. Project file is the right home.
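For a sense of what "saves with the project" means in practice, here is a sketch that assumes, purely for illustration, that a project serialises to JSON. The `.gha` format is not documented in this post, so the `ProjectFile` shape, its field names, and both functions are hypothetical.

```typescript
// Hypothetical project shape; the real .gha layout may differ.
interface GestureBinding {
  signal: string;
  paramId: string;
}

interface ProjectFile {
  name: string;
  macros: unknown[];
  gestureBindings: GestureBinding[];
}

function saveProject(project: ProjectFile): string {
  return JSON.stringify(project);
}

// Projects saved without gesture bindings simply load with an empty rig.
function loadProject(text: string): ProjectFile {
  const raw = JSON.parse(text) as Partial<ProjectFile>;
  return {
    name: raw.name ?? "untitled",
    macros: raw.macros ?? [],
    gestureBindings: raw.gestureBindings ?? [],
  };
}
```

Nothing is written to browser storage here, so two shows can never leak mappings into each other.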
What to try first
light up in the macro bar.
watch it ride your finger spread.
slider. Move your right hand laterally; the layer fades in and
out.
Brush Thickness. Slow waves, big strokes; fist for a fine line.
Free, open source under AGPL-3.0. Source on GitHub at
github.com/riskcapital/ghost-arcade.
Download the signed v1.7.1 build at /download.
Frequently Asked Questions
Do I need a special camera?
No — any webcam works. We capture at 640x480 by default; MediaPipe downsamples internally so resolution isn't the bottleneck. Built-in laptop cameras, USB webcams, virtual cameras (OBS, etc.) all work. The MediaPipe panel's device picker lists everything the OS sees.
Will it run on my machine?
If your machine can run Ghost Arcade, it can run MediaPipe. The Hand Landmarker uses the GPU delegate (~10 ms per frame on Apple Silicon and modern integrated GPUs). Older Intel iGPUs may dip to ~30 Hz inference; you'll still get usable tracking but with more visible lag.
Can I bind the same gesture to multiple things at once?
Yes — that's the headline use case. In the Learn modal, pick a signal once and keep tapping orange-outlined params. Every tap adds a new binding to the same signal. The gesture then drives all of them in sync.
Where do the bindings live?
In the .gha project file. They serialize alongside macros, snapshots, OSC config, etc. Open a different project — that project's bindings load. Close without saving — unsaved bindings are lost. (No localStorage; we want gesture rigs to travel with the show file.)
Is it actually low-latency enough to perform with?
Depends what you're doing. For knob-style control (palm Y → bloom intensity), absolutely: the predict-ahead masks most of the pipeline lag. For percussive triggers (a gesture fires a clip), there's ~40 ms of total round-trip and you'll notice it on the offbeat. Use Pinch Spray instead of categorical gestures for momentary effects: pinch state is read from landmarks directly, so there is no second-model lag.
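Reading pinch straight from landmarks looks roughly like the sketch below. The landmark indices follow MediaPipe's 21-point hand model (0 = wrist, 4 = thumb tip, 8 = index fingertip, 9 = middle-finger MCP). Everything else is an assumption for illustration: the `pinchAmount` name, using wrist-to-middle-MCP distance as the hand-size reference, the `openRatio` parameter, and the 0 = open, 1 = pinched convention.

```typescript
interface Landmark {
  x: number;
  y: number;
  z: number;
}

const WRIST = 0;
const THUMB_TIP = 4;
const INDEX_TIP = 8;
const MIDDLE_MCP = 9;

function dist(a: Landmark, b: Landmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Returns 0 (thumb and index apart) .. 1 (touching).
// Dividing by hand size keeps the value stable as the hand moves toward
// or away from the camera.
function pinchAmount(hand: Landmark[], openRatio = 1.0): number {
  const wrist = hand[WRIST];
  const mcp = hand[MIDDLE_MCP];
  const thumb = hand[THUMB_TIP];
  const index = hand[INDEX_TIP];
  if (!wrist || !mcp || !thumb || !index) return 0; // incomplete landmark set
  const handSize = dist(wrist, mcp);
  if (handSize === 0) return 0;
  const ratio = dist(thumb, index) / handSize;
  return Math.min(1, Math.max(0, 1 - ratio / openRatio));
}
```

Because this is a handful of arithmetic operations on data the landmarker already produced, it adds no inference cost on top of the tracking frame.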