How We Built a 3D Rock Paper Scissors Game with Room-Code Multiplayer
A complete tour of building a polished browser game with Three.js, Web Audio synth sounds, and HTTP-polling multiplayer rooms: the geometry choices for the rock, paper, and scissors pads, the performance budget on retina screens, why we skipped WebSockets, and the room-code architecture designed for a future tournament hub.
Rock Paper Scissors is the simplest possible game: three options, one round, one winner. Making it feel good in 3D required more thought than you'd expect. This post walks through every meaningful decision we made building the version at yalikit.com/games/rock-paper-scissors, including the ones that look obvious in hindsight but weren't at the time. If you're building a small browser game with Three.js or thinking about how to do room-based multiplayer without WebSocket infrastructure, this is the post for you.
Why 3D for such a simple game?
2D RPS exists everywhere. A 3D version isn't about better gameplay; it's about presence. When you hover over the rock pad and it lifts toward you with a soft glow, your brain registers that something is responding. When you click and the rock shoots forward while the other two pads fade back, the choice feels committed. These are micro-feedback loops that 2D games can fake but never quite match.
The bet was that a 3D RPS would feel more like 'opening an app' and less like 'using a webpage,' even though under the hood it's HTML + WebGL like any other site. We expected that difference to matter for retention: players who get a 'wow' moment in the first five seconds come back, while players who get a flat 2D experience may or may not. For a free game with no signup, first-impression delight is the only retention lever.
The Three.js setup
Three.js is the canonical WebGL abstraction for the web. It's not the only choice (Babylon.js is heavier but full-featured, PlayCanvas is editor-driven), but Three.js is the most widely used and best documented. Our scene is built with vanilla Three.js (no React Three Fiber wrapper) because the scene state needs to update on every frame, and reconciling that through React's render cycle adds overhead and bugs.
Renderer configuration
```typescript
import * as THREE from "three";

const renderer = new THREE.WebGLRenderer({
  antialias: true,
  powerPreference: "high-performance",
});
// Never render above 2x, even on 3x displays.
renderer.setPixelRatio(Math.min(window.devicePixelRatio, 2));
renderer.shadowMap.enabled = true;
renderer.shadowMap.type = THREE.PCFSoftShadowMap;
renderer.toneMapping = THREE.ACESFilmicToneMapping;
renderer.toneMappingExposure = 1.15;
```

Two things matter here. First, pixelRatio is capped at 2: even on 3x retina screens, we render at 2x. At this scene complexity the difference between 2x and 3x is imperceptible, but 3x shades 2.25 times as many pixels (9 per CSS pixel instead of 4). Second, antialiasing is on. It costs roughly 10% performance but eliminates the jagged edges that scream 'amateur 3D.' Worth it.
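The 2.25x figure is plain pixel counting: the drawing buffer is the canvas's CSS size multiplied by the pixel ratio on each axis, so the cost grows with the square of that ratio. A small sketch of the arithmetic, using a 1440×900 CSS-pixel canvas purely as an example size (an illustrative assumption, not a measurement from the game):

```typescript
// Pixels in the WebGL drawing buffer for a canvas of cssWidth x cssHeight CSS pixels,
// with the device pixel ratio capped the same way as renderer.setPixelRatio above.
function drawingBufferPixels(cssWidth: number, cssHeight: number, devicePixelRatio: number, cap = 2): number {
  const ratio = Math.min(devicePixelRatio, cap);
  return Math.floor(cssWidth * ratio) * Math.floor(cssHeight * ratio);
}

const uncapped = drawingBufferPixels(1440, 900, 3, Infinity); // 4320 x 2700
const capped = drawingBufferPixels(1440, 900, 3); // 2880 x 1800
console.log(uncapped, capped, uncapped / capped); // 11664000 5184000 2.25
```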
ACESFilmicToneMapping is a widely used filmic tone-mapping curve: it makes lights look more cinematic and compresses bright highlights instead of letting them clip to flat, washed-out white, which is what you get with the renderer's default (NoToneMapping). Combined with toneMappingExposure: 1.15, our scene has the slightly warm, slightly rich color that feels premium.
Lighting setup
Three-light setup: key, fill, rim. The key light is the bright directional light that casts shadows, usually positioned high and to the side. The fill is a softer light from the opposite direction that prevents the unlit side from going black. The rim is a colored point light from behind the subject that creates an edge highlight. A faint indigo ambient light sits underneath all three as a base.
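```typescript
import * as THREE from "three";

const scene = new THREE.Scene();

// Faint indigo ambient so no surface is ever fully black.
scene.add(new THREE.AmbientLight(0x6366f1, 0.4));

// Key: bright, high, to the side; the only shadow caster.
const key = new THREE.DirectionalLight(0xffffff, 1.4);
key.position.set(5, 10, 5);
key.castShadow = true;
key.shadow.mapSize.set(1024, 1024);
scene.add(key);

// Fill: softer and cooler, from the opposite side.
const fill = new THREE.DirectionalLight(0x818cf8, 0.6);
fill.position.set(-6, 4, -3);
scene.add(fill);

// Rim: purple point light behind the pads for an edge highlight (range 16 units).
const rim = new THREE.PointLight(0xa855f7, 2, 16);
rim.position.set(0, 3, -5);
scene.add(rim);
```

Only the key light casts shadows. Shadow casting is expensive: every shadow-casting light renders the shadow casters again into its shadow map each frame (a point light does it six times, once per cube face), which is close to another full scene render. One shadow source is plenty for this scene; two would noticeably hurt frame rate on mid-range laptops.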
Geometry choices for each piece
Each shape needs to be instantly recognizable in 3D. We went with stylized rather than realistic representations.
Rock: IcosahedronGeometry with vertex noise
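```typescript
import * as THREE from "three";

const geo = new THREE.IcosahedronGeometry(0.7, 1);
const positions = geo.attributes.position;
// Non-indexed geometry: each corner is stored once per face that uses it.
// Pick one scale per corner position so neighbouring faces stay joined.
const scaleByCorner = new Map<string, number>();
const v = new THREE.Vector3();
for (let i = 0; i < positions.count; i++) {
  v.fromBufferAttribute(positions, i);
  const key = [v.x, v.y, v.z].map((c) => Math.round(c * 1e4)).join(",");
  if (!scaleByCorner.has(key)) scaleByCorner.set(key, 1 + (Math.random() - 0.5) * 0.12);
  v.multiplyScalar(scaleByCorner.get(key)!); // scale in [0.94, 1.06)
  positions.setXYZ(i, v.x, v.y, v.z);
}
geo.computeVertexNormals(); // flat normals on non-indexed geometry: a faceted rock
```

An icosahedron has 12 vertices and 20 faces; at detail level 1, Three.js splits each face into four, giving 80 faces and 42 distinct corners. Scaling each corner by a random factor between 0.94 and 1.06 turns the smooth shape into a rough boulder without needing a texture. Because IcosahedronGeometry is non-indexed (every face stores its own copy of its three corners), the scale is keyed on the corner's position so adjacent faces move together instead of opening cracks. computeVertexNormals then recalculates the lighting normals so the new geometry shades correctly; on non-indexed geometry they come out flat, which gives the faceted low-poly look. Total cost: a few kilobytes of vertex data.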
Paper: thin BoxGeometry with crease detail
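```typescript
import * as THREE from "three";

// Placeholder material (assumption); in the game this is the shared pad material.
const mat = new THREE.MeshStandardMaterial({ color: 0xf1f5f9, roughness: 0.8 });
const geo = new THREE.BoxGeometry(1.0, 0.04, 1.3);
const mesh = new THREE.Mesh(geo, mat);
// A ridge across the middle suggests folded paper. At 0.05 tall vs. the
// sheet's 0.04, it stands just proud of both faces.
const crease = new THREE.Mesh(new THREE.BoxGeometry(1.0, 0.05, 0.05), mat);
mesh.add(crease); // child at the local origin = centre of the sheet
```

A thin box with a slightly thicker ridge across the middle reads as 'folded paper' from any angle. We could have used more complex geometry (a paper-fold deformation, for example), but the simple ridge is enough and renders in microseconds.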
Scissors: two cylinders + two torus handles
The trickiest of the three. Real scissors have a complex shape with curved blades, a pivoting joint, and handle loops. Faithfully modeling that would take hundreds of vertices and a long time to author. Instead: two CylinderGeometry blades rotated 30° apart, plus two TorusGeometry rings positioned at the handle end. It reads instantly as scissors even though it's geometrically nothing like real scissors. Sometimes simpler is better.
Performance budget
On a mid-range laptop running at 60 FPS, the frame budget is 1000 / 60 ≈ 16.7ms. Into that we have to fit animation updates, raycasting for mouse hover, lighting and shadow rendering, particle updates, and any DOM work React is doing. We targeted under 8ms per frame in total, leaving headroom for slower devices and for other work competing for the same CPU.
- PixelRatio capped at 2x: the biggest single perf win on retina screens.
- 1024×1024 shadow map (not 2048×2048): small visual difference, big perf win.
- MeshStandardMaterial for visible objects, MeshBasicMaterial for the win beam: basic materials skip lighting calculations entirely.
- OrbitControls deliberately omitted: it would have meant always-on damping calculations and extra mouse-tracking code.
- Particle burst uses one shared BufferGeometry with attribute updates rather than creating new geometries every frame.
- Win beam (the cylinder connecting winning cells) is one MeshBasicMaterial with additive blending: basically free to render.
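The particle bullet above is the one most worth sketching. A minimal, library-free version of the idea, assuming a fixed pool size, a fixed lifetime, and simple gravity (all illustrative choices, not the game's values): every particle position lives in one Float32Array allocated once and overwritten in place each frame. In Three.js that array would back a single BufferAttribute on the shared BufferGeometry, with the attribute's needsUpdate set to true after each update, so no geometry is created after startup.

```typescript
// Hypothetical particle pool: `max` particles, 3 floats (x, y, z) each, allocated once.
class ParticleBurst {
  readonly positions: Float32Array;
  private readonly velocities: Float32Array;
  private readonly max: number;
  private readonly lifetime: number; // seconds
  private age = Infinity; // seconds since the last burst; Infinity = idle

  constructor(max = 64, lifetime = 0.8) {
    this.max = max;
    this.lifetime = lifetime;
    this.positions = new Float32Array(max * 3);
    this.velocities = new Float32Array(max * 3);
  }

  /** Restart every particle at (x, y, z) with a random, mostly upward velocity. */
  fire(x: number, y: number, z: number, speed = 3): void {
    for (let i = 0; i < this.max * 3; i += 3) {
      this.positions[i] = x;
      this.positions[i + 1] = y;
      this.positions[i + 2] = z;
      this.velocities[i] = (Math.random() - 0.5) * speed;
      this.velocities[i + 1] = Math.random() * speed;
      this.velocities[i + 2] = (Math.random() - 0.5) * speed;
    }
    this.age = 0;
  }

  /** Advance by dt seconds, writing into the same arrays. Returns false once expired. */
  update(dt: number, gravity = -9.8): boolean {
    if (this.age >= this.lifetime) return false;
    this.age += dt;
    for (let i = 0; i < this.max * 3; i += 3) {
      this.velocities[i + 1] += gravity * dt;
      this.positions[i] += this.velocities[i] * dt;
      this.positions[i + 1] += this.velocities[i + 1] * dt;
      this.positions[i + 2] += this.velocities[i + 2] * dt;
    }
    return true;
  }
}
```

Call fire() on a win and update(dt) from the animation loop; when update returns false the burst can simply stop being drawn.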
Animation loop architecture
All animations are time-based, not frame-based. We track elapsed time via THREE.Clock, and every easing computation uses delta time. This means animations run at the same wall-clock speed whether the player is at 30, 60, 90, or 144 FPS.
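```typescript
// Assumes THREE, renderer, scene, camera, pad and targetY from the setup code.
const clock = new THREE.Clock();
let raf = 0; // handle of the pending requestAnimationFrame

const animate = () => {
  raf = requestAnimationFrame(animate);
  const dt = Math.min(clock.getDelta(), 0.05); // seconds, capped at 50ms
  // Total seconds, used by the camera drift. Call after getDelta(): it advances the clock too.
  const t = clock.getElapsedTime();
  // Ease toward the target height. A bare "* 0.15" closes 15% of the gap per
  // frame, i.e. faster at 144 FPS than at 30; scale the factor by dt instead.
  pad.position.y += (targetY - pad.position.y) * (1 - Math.pow(0.85, dt * 60));
  // Idle rotation: 0.45 radians per second at any frame rate
  pad.rotation.y += dt * 0.45;
  renderer.render(scene, camera);
};
animate();
```

Capping dt at 0.05 (50ms) prevents huge jumps if the tab was hidden for a long time. Without the cap, a tab that was inactive for 10 seconds would resume with dt = 10, and the next frame's animations would snap to wherever they would have been 10 seconds later. Looks bad.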
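A per-frame lerp such as `* 0.15` moves a fixed fraction of the remaining gap every frame, so it runs faster at higher frame rates. Making it time-based only requires raising the "fraction left over" to the power of the frame's length in 60ths of a second. A standalone sketch (easeFactor and simulate are illustrative names) that checks the result really is frame-rate independent:

```typescript
// Fraction of the remaining gap to close this frame, chosen so that
// `perFrameAt60` of the gap is closed every 1/60 s at any frame rate.
function easeFactor(perFrameAt60: number, dt: number): number {
  return 1 - Math.pow(1 - perFrameAt60, dt * 60);
}

// Ease a value from 0 toward 1 for `seconds` at a fixed frame rate.
function simulate(fps: number, seconds = 0.5): number {
  let y = 0;
  const dt = 1 / fps;
  for (let i = 0; i < Math.round(seconds * fps); i++) {
    y += (1 - y) * easeFactor(0.15, dt);
  }
  return y;
}

console.log(simulate(30), simulate(60), simulate(144));
```

After half a second every run has closed the same share of the gap, 1 - 0.85^30, whereas a fixed 0.15 per frame would close noticeably more at 144 FPS than at 30.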
Pause when tab is hidden
When the user switches tabs, we cancel the requestAnimationFrame loop. Browsers already pause or throttle requestAnimationFrame in background tabs, but cancelling it ourselves guarantees the scene does no work while hidden, which matters for laptop battery and for being a good web citizen. When the tab comes back, we reset the clock and resume.
```typescript
const onVisibility = () => {
  if (document.hidden) {
    cancelAnimationFrame(raf);
  } else {
    clock.getDelta(); // reset so the next dt is small
    animate();
  }
};
document.addEventListener("visibilitychange", onVisibility);
```

Mouse parallax and idle drift
The camera doesn't sit perfectly still โ it gently drifts and follows the cursor. Both are intentional and both are very subtle, because anything more would feel jarring.
```typescript
// Inside animate(): t is elapsed seconds, dt the capped frame delta, and
// parallax the cursor position (assumed normalised to about -1..1 per axis).
const targetCamX = baseCam.x + parallax.x * 0.35 + Math.sin(t * 0.3) * 0.1;
const targetCamY = baseCam.y - parallax.y * 0.2 + Math.cos(t * 0.4) * 0.08;
// Frame-rate-independent easing: 5% of the gap per 1/60 s.
const camEase = 1 - Math.pow(0.95, dt * 60);
camera.position.x += (targetCamX - camera.position.x) * camEase;
camera.position.y += (targetCamY - camera.position.y) * camEase;
camera.lookAt(0, 0.5, 0);
```

The parallax multipliers (0.35 horizontal, 0.2 vertical) are small. Going higher made the scene feel like it was sliding around when the cursor moved; going lower made it feel completely static. We tuned these numbers by trial and error over a few iterations.
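The camera code reads a `parallax` vector without showing where it comes from. One plausible way to fill it, offered as an assumption rather than the game's actual handler: map the pointer position to -1..1 on each axis relative to the canvas, so the 0.35 and 0.2 multipliers mean the same thing on any screen size.

```typescript
interface Vec2 { x: number; y: number; }
interface Rect { left: number; top: number; width: number; height: number; }

/** Map a pointer position inside `rect` to -1..1 on both axes (0,0 = centre, +y = down). */
function pointerToParallax(clientX: number, clientY: number, rect: Rect): Vec2 {
  const clamp = (v: number) => Math.max(-1, Math.min(1, v));
  return {
    x: clamp(((clientX - rect.left) / rect.width) * 2 - 1),
    y: clamp(((clientY - rect.top) / rect.height) * 2 - 1),
  };
}
```

In the browser this would run inside a pointermove listener, passing the canvas's getBoundingClientRect() and storing the result as `parallax`; clamping keeps the camera from swinging wildly when the pointer leaves the canvas.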
Synth sound effects: zero audio files
Every sound in the game is generated on the fly with the Web Audio API. There are no .mp3 or .wav files in the bundle. Total audio payload: 0 KB. The cost is some code to define each sound; the benefit is a smaller download and no licensing concerns.
Lazy AudioContext creation
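Browser autoplay policies block audio until the page has received a user gesture. We create the AudioContext lazily, on the first sound the user triggers. One catch: a hover is not a user gesture. If the page hasn't yet had a click, tap, or key press, a context created on hover typically starts in the "suspended" state and stays silent, so ensure() also resumes a suspended context, and the first click is what actually unlocks audio.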
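```typescript
private ensure() {
  if (!this.ctx) {
    const Ctx = window.AudioContext || (window as any).webkitAudioContext;
    if (Ctx) this.ctx = new Ctx();
  }
  // A context created before any click or key press starts "suspended";
  // resume() only takes effect once the page has had a user gesture.
  if (this.ctx?.state === "suspended") void this.ctx.resume();
}
```

Oscillator + gain envelope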
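Each sound is one or more sine or triangle oscillators with an attack-decay envelope on a GainNode. The envelope is what makes it sound like a 'pluck' rather than a sustained tone. Exponential ramps (rather than linear) sound more natural because human loudness perception is roughly logarithmic. That is also why the envelope starts and ends at 0.0001 instead of 0: an exponential ramp can't start from or reach zero.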
```typescript
// ctx is the AudioContext created by ensure().
const osc = ctx.createOscillator();
const gain = ctx.createGain();
osc.type = "triangle";
osc.frequency.value = 660;

const now = ctx.currentTime;
gain.gain.setValueAtTime(0.0001, now); // about -80 dB: effectively silent
gain.gain.exponentialRampToValueAtTime(0.2, now + 0.012); // 12ms attack
gain.gain.exponentialRampToValueAtTime(0.0001, now + 0.1); // decay to silence by 100ms

osc.connect(gain).connect(ctx.destination);
osc.start(now);
osc.stop(now + 0.12); // stop just after the envelope finishes
```

Sound design choices
- Hover: 1200Hz sine, 40ms, very quiet. It only plays when you move onto a new pad, never on continuous mouse movement.
- Pick: 660Hz triangle + 990Hz sine a perfect fifth above, ~80ms total, giving a satisfying 'whoosh' quality.
- Reveal: 220Hz triangle bass + 440Hz triangle mid, layered. It feels weighty and marks the moment of truth.
- Win: C5 → E5 → G5 → C6 arpeggio over 300ms, a major-key resolution that triggers a reward feeling.
- Lose: G4 → C4, a falling fifth that lands low and feels appropriately disappointing.
- Draw: two equal 440Hz notes: neutral, neither winning nor losing.
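The win and lose jingles are specified by note names. A minimal sketch of turning those names into oscillator frequencies (standard 12-tone equal temperament with A4 = 440 Hz, via MIDI note numbers) and spacing the win arpeggio across its 300ms window. noteToFreq and arpeggio are illustrative helpers, not the game's code, and the even spacing is our assumption; each scheduled entry would drive one oscillator with an envelope like the pick sound's.

```typescript
// Semitone offsets of the natural notes within an octave, starting from C.
const SEMITONES: Record<string, number> = { C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11 };

/** "C5" or "F#4" -> frequency in Hz (equal temperament, A4 = 440 Hz, C4 = MIDI 60). */
function noteToFreq(note: string): number {
  const m = /^([A-G])(#?)(-?\d)$/.exec(note);
  if (!m) throw new Error(`Bad note name: ${note}`);
  const midi = 12 * (Number(m[3]) + 1) + SEMITONES[m[1]] + (m[2] ? 1 : 0);
  return 440 * Math.pow(2, (midi - 69) / 12);
}

interface ScheduledNote {
  freq: number; // Hz
  start: number; // seconds after the jingle begins
}

/** Spread the notes evenly over totalSeconds: note i starts at i * totalSeconds / n. */
function arpeggio(notes: string[], totalSeconds: number): ScheduledNote[] {
  const step = totalSeconds / notes.length;
  return notes.map((n, i) => ({ freq: noteToFreq(n), start: i * step }));
}

const win = arpeggio(["C5", "E5", "G5", "C6"], 0.3);
console.log(win.map((n) => `${n.freq.toFixed(2)}Hz at ${(n.start * 1000).toFixed(0)}ms`).join(", "));
```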
Multiplayer architecture: polling, not WebSockets
Real-time multiplayer usually means WebSockets: persistent bidirectional connections between client and server. WebSockets are great for high-frequency updates (a 60-player FPS), but they come with deployment complexity: a separate server process, connection state, keepalive logic, reconnect handling, and sharding once you need to scale.
For a turn-based game like RPS, WebSockets are massive overkill. Players wait for the opponent's input; nothing is happening between throws. We use HTTP polling against a Next.js API route: each player sends a heartbeat about once a second, and the server responds with the current room state. An opponent's move therefore shows up within a second or two, which is plenty responsive in a game where you're waiting for them to pick anyway.
The benefit: zero new infrastructure. The same Next.js server that serves the static site handles the multiplayer API. No WebSocket server, no Pusher account, no Redis pub/sub. Deployment is one Vercel push.
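A client-side sketch of that heartbeat loop, assuming a fetchState function that calls the room API and returns the parsed room state (the endpoint and payload aren't shown in this post, so they're left abstract). It schedules the next request only after the previous one settles, using setTimeout rather than setInterval, so a slow response can't pile up overlapping requests.

```typescript
/**
 * Poll roughly every intervalMs. fetchState is a hypothetical wrapper around
 * the room API; onState receives each fresh snapshot. Returns a stop function.
 */
function startPolling<T>(
  fetchState: () => Promise<T>,
  onState: (state: T) => void,
  intervalMs = 1000,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    try {
      const state = await fetchState();
      if (!stopped) onState(state);
    } catch {
      // Transient network error: keep polling; the next heartbeat may succeed.
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };

  void tick(); // first request goes out immediately
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

The returned stop function is what you'd call when the player leaves the lobby or the component unmounts.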
Room code format
Five uppercase alphanumeric characters: 36^5 = 60,466,176 combinations. Even with 1,000 rooms live at once, a freshly generated code matches an existing one with a probability of about 1 in 60,000, and the server can simply check for a clash and draw again. Deliberate guessing is also impractical because rooms expire after a few minutes of inactivity. Five characters is also short enough to dictate over voice without typos, which matters because the most common multiplayer scenario is 'open the link on your phone, I'll create a room and tell you the code.'
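A sketch of generating such a code on the server, assuming Node's crypto.randomInt (a cryptographically secure source, so codes aren't predictable) and a set of codes currently in use. The alphabet is the 36 characters described above; the retry loop handles the rare collision with a live room.

```typescript
import { randomInt } from "node:crypto";

const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; // 36 symbols
const CODE_LENGTH = 5; // 36 ** 5 = 60,466,176 possible codes

/** Draw random 5-character codes until one isn't already in use. */
function newRoomCode(inUse: ReadonlySet<string>): string {
  for (;;) {
    let code = "";
    for (let i = 0; i < CODE_LENGTH; i++) code += ALPHABET[randomInt(ALPHABET.length)];
    if (!inUse.has(code)) return code; // otherwise it collided with a live room: retry
  }
}
```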
Codes auto-uppercase as the user types. We also accept the code via URL parameter (?room=ABCDE), so when the host clicks 'Copy link', the friend pastes the URL and the lobby auto-opens with the code pre-filled. This shaved roughly half the friction out of the join flow.
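The auto-uppercase input and the ?room= parameter can share one normalizer. A minimal sketch with illustrative function names; besides uppercasing, it also drops anything outside A-Z and 0-9 (for example a space or dash typed while copying), which is our assumption rather than documented behaviour, and only accepts exactly five characters.

```typescript
const ROOM_CODE = /^[A-Z0-9]{5}$/;

/** Uppercase and strip separators; returns null unless a valid 5-character code remains. */
function normalizeRoomCode(raw: string): string | null {
  const code = raw.toUpperCase().replace(/[^A-Z0-9]/g, "");
  return ROOM_CODE.test(code) ? code : null;
}

/** Read ?room=ABCDE from a page URL, such as a link shared via 'Copy link'. */
function roomCodeFromUrl(href: string): string | null {
  const raw = new URL(href).searchParams.get("room");
  return raw === null ? null : normalizeRoomCode(raw);
}
```

If roomCodeFromUrl returns a code on page load, the lobby can open straight into the join step with the field pre-filled.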
Server-side state model
A room is a JavaScript object stored in server memory: { id, gameType, mode, players, gameState, status }. Players is an array; each player has { id, name, score, progress, stats }. Progress is the current round number; stats is a game-specific bag (for RPS, it holds the player's current choice).
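The state model above, written out as TypeScript types, plus a sketch of how an RPS round could be resolved once both players have a choice in stats. The field names come from the description above; the exact types, the shape of stats, and the resolveRound helper are assumptions for illustration.

```typescript
type Choice = "rock" | "paper" | "scissors";

interface Player {
  id: string;
  name: string;
  score: number;
  progress: number; // current round number
  stats: { choice?: Choice }; // game-specific bag; for RPS, the current choice
}

interface Room {
  id: string;
  gameType: string; // e.g. "rock-paper-scissors"
  mode: string;
  players: Player[];
  gameState: unknown;
  status: string;
}

const BEATS: Record<Choice, Choice> = { rock: "scissors", paper: "rock", scissors: "paper" };

/**
 * If both players have chosen, award the point, clear the choices and advance
 * the round. Returns the winner's id, "draw", or null while still waiting.
 */
function resolveRound(room: Room): string | null {
  const [a, b] = room.players;
  const ca = a?.stats.choice;
  const cb = b?.stats.choice;
  if (!ca || !cb) return null; // simultaneous reveal: wait for both picks
  const result = ca === cb ? "draw" : BEATS[ca] === cb ? a.id : b.id;
  if (result === a.id) a.score++;
  if (result === b.id) b.score++;
  for (const p of [a, b]) {
    p.stats.choice = undefined;
    p.progress++;
  }
  return result;
}
```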
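The state is in memory only, with no database. Rooms vanish on server restart. This is acceptable for a casual game where matches last minutes; players just rejoin if it happens. It does assume every request for a room reaches the same long-lived server process; on a serverless deployment where requests can land on different instances, rooms would need a shared store. For higher-stakes games we'd persist to Redis or Postgres, but the simplicity win here is enormous.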
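With rooms in a plain in-memory Map, expiry is just a periodic sweep. A sketch assuming each room records a lastSeen timestamp refreshed on every heartbeat; the five-minute TTL is an illustrative value standing in for "a few minutes of inactivity".

```typescript
interface StoredRoom {
  id: string;
  lastSeen: number; // ms timestamp of the most recent heartbeat from any player
}

const ROOM_TTL_MS = 5 * 60 * 1000; // assumed value

const rooms = new Map<string, StoredRoom>();

/** Record activity for a room; called from the heartbeat handler. */
function touch(id: string, now = Date.now()): void {
  const room = rooms.get(id);
  if (room) room.lastSeen = now;
}

/** Delete rooms idle for longer than the TTL; returns how many were removed. */
function sweepExpired(now = Date.now(), ttl = ROOM_TTL_MS): number {
  let removed = 0;
  for (const [id, room] of rooms) {
    if (now - room.lastSeen > ttl) {
      rooms.delete(id); // deleting during Map iteration is safe in JavaScript
      removed++;
    }
  }
  return removed;
}
```

In a long-lived server process this could run on a timer, for example once a minute.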
Designed for a future tournament hub
The MultiplayerHub component that powers RPS multiplayer was deliberately designed to be game-agnostic. gameType is a prop. The same component works for Connect Four, 2048, or any other game we add later. This means a future tournament hub โ where players queue up, get matched to a game, and compete across multiple games โ is just an additional component on top of what already exists. No backend rework.
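One way to keep a hub game-agnostic is to have it depend only on a small per-game rules object looked up by gameType. The interface and registry below are hypothetical, a sketch of the shape rather than the actual MultiplayerHub code; supporting Connect Four would mean registering another entry, with no change to the generic logic.

```typescript
interface HubPlayer {
  id: string;
  progress: number;
  stats: Record<string, unknown>; // game-specific bag
}

/** Everything the generic hub needs to know about a game (hypothetical shape). */
interface GameRules {
  minPlayers: number;
  maxPlayers: number;
  /** True once every player has submitted what this round needs. */
  roundReady(players: HubPlayer[]): boolean;
}

const GAMES: Record<string, GameRules> = {
  "rock-paper-scissors": {
    minPlayers: 2,
    maxPlayers: 2,
    roundReady: (players) => players.length === 2 && players.every((p) => p.stats.choice !== undefined),
  },
};

/** Generic hub logic: it knows nothing about RPS beyond the gameType string. */
function canStart(gameType: string, players: HubPlayer[]): boolean {
  const rules = GAMES[gameType];
  if (!rules) throw new Error(`Unknown gameType: ${gameType}`);
  return players.length >= rules.minPlayers && players.length <= rules.maxPlayers;
}
```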
Lessons learned
- Small details compound. The hover tick sound is 40ms long and adds maybe 0.5% to the overall game weight. But the difference between 'with hover sound' and 'without' in subjective feel is huge. Players notice.
- Test on a mid-range laptop, not your dev machine. We caught a perf regression where a 2048×2048 shadow map ran fine on M2 Macs and killed the frame rate on a 5-year-old Intel laptop. Always test on the floor, not the ceiling.
- Browser autoplay policies are non-negotiable. Plan for lazy audio context creation from day one rather than bolting it on later.
- HTTP polling is underrated. For turn-based games and any game where the player is the bottleneck (waiting for choices), it's a far better fit than WebSockets.
- Three.js's documentation has improved enormously since 2020. The tutorials at threejs.org are now the single best starting point. The community examples are gold.
- Capping pixelRatio at 2 is the single highest-leverage perf tweak for retina screens. We've never seen a case where 3x rendering was perceptibly better than 2x for our type of scene.
What we'd do differently
Two things. First, we'd build the MultiplayerHub before the first game, not alongside it. We had to refactor it once after RPS revealed shortcomings; that work could have been avoided by designing the abstraction first. Second, we'd write sound effects in a more declarative format (a JSON spec describing each sound) rather than hard-coded function calls. That would make iterating on sound design much faster.
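A sketch of what such a spec could look like, using the hover and pick sounds from the list earlier. The SoundSpec shape and field names are assumptions, the peak gains are placeholders, and the layer durations only approximate the figures given above. The point is that tweaking a sound becomes a data edit, with one generic player turning each layer into an oscillator plus an exponential attack/decay envelope like the one shown earlier.

```typescript
type Wave = "sine" | "triangle" | "square" | "sawtooth";

interface SoundLayer {
  wave: Wave;
  freq: number; // Hz
  delay?: number; // seconds after the sound starts (default 0)
  duration: number; // seconds from start of attack to end of decay
  peak: number; // peak gain (placeholder values below)
}

type SoundSpec = Record<string, SoundLayer[]>;

// Could live in a sounds.json file and be loaded with JSON.parse.
const SOUNDS: SoundSpec = {
  hover: [{ wave: "sine", freq: 1200, duration: 0.04, peak: 0.05 }],
  pick: [
    { wave: "triangle", freq: 660, duration: 0.08, peak: 0.2 },
    { wave: "sine", freq: 990, duration: 0.06, peak: 0.1 },
  ],
};

/** Total length of a sound in seconds: when its last layer finishes. */
function soundLength(layers: SoundLayer[]): number {
  return Math.max(0, ...layers.map((l) => (l.delay ?? 0) + l.duration));
}
```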
Try it yourself
The game is live at yalikit.com/games/rock-paper-scissors. Open it in two browser windows (one in incognito), create a room in one, join from the other. You'll see the lobby sync in real time, then the simultaneous reveal in 3D. Source code for the Three.js scene and multiplayer hub is at github.com/Muralivvrsn/yalikit. If you spot something we could do better, send a PR.
Founder of YaliKit. Builds developer tools full-time and ships every tool you see on the site. Previously worked on data platforms at scale. Writes about JSON, CSV, regex, performance, and the small details that make browser tools feel native.