Interaction: eye tracking, hand gestures, look-and-pinch, voice commands, hover effects
General XR: XR (extended reality), passthrough, optic flow, hand mesh, spatial audio, field of view (FOV)
1 / 5
An Apple developer explains visionOS concepts at a conference: "visionOS is Apple's operating system for spatial computing. Apps have three presentation styles. A window is a 2D panel floating in space — like a regular app. A volume is a 3D bounded space — a virtual chess board or a 3D model viewer. An immersive space is full immersion — it can hide or show the real world. Passthrough lets you see the real world through the cameras while virtual content is blended in." What is the difference between a volume and an immersive space in visionOS?
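visionOS presentation styles:
Window: a 2D panel floating in the user's environment, built from standard SwiftUI views. Multiple apps can show windows simultaneously.
Volume: a bounded 3D region that displays 3D content, typically rendered with RealityKit, and can be viewed from any angle. The app sets its default size, and the volume sits in the environment alongside other apps' windows and volumes. Good for 3D model viewers, games, and product visualisation.
Immersive space: content that is not confined to a bounded region and can surround the user. Immersion styles: Mixed (virtual objects blended with the real world, as in AR), Progressive (a partial virtual environment whose extent the user adjusts with the Digital Crown), and Full (completely replaces the view of the real world). Only one app can have an immersive space open at a time.
In short, a volume is a bounded 3D container that coexists with other apps, while an immersive space is unbounded and held by one app at a time.
visionOS vocabulary:
Passthrough: using the headset's cameras to show the real world on its internal displays. Apple Vision Pro uses this video passthrough approach (as does Meta Quest 3), unlike optical see-through headsets such as HoloLens, where the user sees the real world directly through transparent lenses.
Ornament: a small UI element attached to a window and positioned outside its bounds (e.g., a toolbar).
Spatial Audio: 3D positional audio; sounds appear to come from specific locations in space.
Optic flow: the pattern of visual motion across the field of view as you move through space; it must be comfortable to avoid motion sickness.
In conversation: "We built the anatomy app as a volume, so doctors can have it open next to their other windows while they review patient notes."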
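The sharing rules above can be modelled in a few lines. This is a hypothetical pure-Swift sketch, not Apple's API: a real app declares WindowGroup and ImmersiveSpace scenes in SwiftUI and the system enforces the rules. The names SpatialEnvironment, PresentationKind, and ImmersionLevel are invented for illustration, and the model captures only two rules: windows and volumes coexist freely, only one immersive space can be open at a time, and only a Full immersive space hides the real world.

```swift
// Hypothetical model of visionOS sharing rules; not Apple's API.
enum ImmersionLevel { case mixed, progressive, full }

enum PresentationKind: Equatable {
    case window                          // 2D panel; coexists with other apps
    case volume                          // bounded 3D region; coexists with other apps
    case immersiveSpace(ImmersionLevel)  // unbounded; one at a time, system-wide

    var isImmersive: Bool {
        if case .immersiveSpace = self { return true }
        return false
    }
}

struct OpenPresentation {
    let app: String
    let kind: PresentationKind
}

struct SpatialEnvironment {
    private(set) var openPresentations: [OpenPresentation] = []

    /// Opens a presentation. Windows and volumes always succeed; an immersive
    /// space is refused while any immersive space is already open.
    @discardableResult
    mutating func present(_ kind: PresentationKind, for app: String) -> Bool {
        if kind.isImmersive && openPresentations.contains(where: { $0.kind.isImmersive }) {
            return false
        }
        openPresentations.append(OpenPresentation(app: app, kind: kind))
        return true
    }

    mutating func dismissImmersiveSpace(for app: String) {
        openPresentations.removeAll { $0.app == app && $0.kind.isImmersive }
    }

    /// Passthrough stays visible unless a Full immersive space replaces the real world.
    var realWorldVisible: Bool {
        !openPresentations.contains { $0.kind == .immersiveSpace(.full) }
    }
}

// Example: a notes window and an anatomy volume share the space; then two apps ask for immersion.
var environment = SpatialEnvironment()
environment.present(.window, for: "Notes")
environment.present(.volume, for: "Anatomy")
let anatomyImmersed = environment.present(.immersiveSpace(.full), for: "Anatomy")
let gameImmersed = environment.present(.immersiveSpace(.mixed), for: "Game")  // refused
```

The window and volume coexist, but the game's request for an immersive space is refused until the anatomy app dismisses its own.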
2 / 5
An ARKit developer explains world tracking: "ARKit provides world tracking — the device understands the 3D structure of the environment. Plane detection finds horizontal and vertical surfaces like tables and walls. Scene understanding builds a mesh of the room geometry. We use anchoring to attach virtual objects to real-world positions — when the user moves, the virtual object stays in place relative to the room." What is scene understanding in ARKit?
Scene understanding: ARKit's set of capabilities for understanding the physical environment. Components:
Plane detection: identifies horizontal (floor, table, desk) and vertical (wall, door) planes, so virtual objects can be placed on real surfaces.
Scene mesh: a real-time polygon mesh of the room geometry, reconstructed from LiDAR data (LiDAR-equipped iPad Pro and iPhone Pro models). Enables occlusion (virtual objects hidden behind real objects) and physics against real surfaces.
Object detection: recognises specific physical objects by comparing them to previously scanned 3D reference objects.
Image tracking: tracks known 2D images (posters, book covers) and anchors content to them.
Body tracking: tracks the pose of a human body as a skeleton.
ARKit vocabulary:
ARAnchor: a fixed position and orientation in the real world to which virtual content is attached. Besides plain world anchors (ARAnchor itself), ARKit provides specialised subclasses such as ARPlaneAnchor, ARImageAnchor, and ARBodyAnchor.
ARSession: the core ARKit object; it manages motion tracking, camera capture, and scene analysis, and supplies frames to whichever renderer draws the content (RealityKit, SceneKit, or Metal).
RealityKit: Apple's framework for rendering 3D content in AR; handles lighting, shadows, physics, and occlusion. Works with ARKit for tracking.
LiDAR scanner: a time-of-flight depth sensor on Pro iPhones and iPads; enables near-instant plane detection and scene mesh reconstruction.
In conversation: "With LiDAR scene mesh, we can do real occlusion: the virtual character walks behind the sofa, not through it."
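Anchoring can be illustrated numerically: an anchor stores a fixed world-space position, and each frame the renderer re-expresses that position relative to the current camera pose, so the object appears to stay put as the device moves. In ARKit both an anchor's transform and the camera's transform are 4×4 simd matrices; this sketch instead assumes a simplified pose (position plus yaw about the vertical axis) and hypothetical types Vec3, Pose, and PlacedAnchor so it runs with plain Swift.

```swift
import Foundation

/// A 3D point in metres. ARKit convention: +x right, +y up, -z forward (away from the camera).
struct Vec3: Equatable { var x, y, z: Double }

func isClose(_ a: Vec3, _ b: Vec3, tolerance: Double = 1e-9) -> Bool {
    abs(a.x - b.x) < tolerance && abs(a.y - b.y) < tolerance && abs(a.z - b.z) < tolerance
}

/// Simplified camera pose: world position plus yaw (rotation about the vertical y-axis).
struct Pose {
    var position: Vec3
    var yaw: Double   // radians; positive turns the camera to the left

    /// Re-expresses a world-space point relative to this camera
    /// by undoing the camera's translation, then its rotation.
    func worldToCamera(_ p: Vec3) -> Vec3 {
        let dx = p.x - position.x, dy = p.y - position.y, dz = p.z - position.z
        let c = cos(-yaw), s = sin(-yaw)   // rotate by -yaw about the y-axis
        return Vec3(x: c * dx + s * dz, y: dy, z: -s * dx + c * dz)
    }
}

/// Hypothetical anchor: its world position is fixed when placed and never changes.
struct PlacedAnchor {
    let name: String
    let worldPosition: Vec3
}

// Example: a virtual vase anchored 2 m in front of where the session started.
let vase = PlacedAnchor(name: "vase", worldPosition: Vec3(x: 0, y: 0, z: -2))
let cameraPoses = [
    Pose(position: Vec3(x: 0, y: 0, z: 0), yaw: 0),        // looking straight at it
    Pose(position: Vec3(x: 1, y: 0, z: 0), yaw: 0),        // stepped 1 m to the right
    Pose(position: Vec3(x: 0, y: 0, z: 0), yaw: .pi / 2),  // turned 90° to the left
]
// Where the vase appears relative to the camera in each case.
let viewPositions = cameraPoses.map { $0.worldToCamera(vase.worldPosition) }
```

When the camera steps right, the vase shifts left in view; when the camera turns left, the vase ends up on the camera's right (+x). Its world position never changes, which is exactly what "staying in place relative to the room" means.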
3 / 5
A visionOS developer explains interaction design: "In visionOS, the primary input is eyes and hands. The user looks at a button — that's the selection. Then they pinch their fingers — that's the tap. We call it look-and-pinch. We add hover effects to interactive elements so users know what they're looking at. For precision tasks, we also support keyboard and trackpad via Bluetooth." How does eye tracking work as input in visionOS?
Eye tracking as input in visionOS: the headset tracks where the user is looking (gaze direction) to determine which element they intend to interact with.
Privacy design: there is no visible gaze cursor, and hover effects are drawn by the system outside the app's process, so gaze data stays on-device and is never exposed to apps. When the user pinches, the app learns only that an element was selected, not where the user was looking.
Interaction model:
Look: the user's gaze lands on an interactive element and a hover effect appears.
Pinch: the user taps index finger and thumb together to confirm the selection (equivalent to a tap).
Direct touch: for elements within arm's reach, the user can touch virtual UI panels directly.
Voice: Siri and dictation work throughout the system.
Interaction vocabulary:
Hover effect: visual feedback (brightness, scale, highlight) when an element is looked at; without it, users cannot tell what is interactive. Standard SwiftUI controls get one automatically; custom views add it with the .hoverEffect() modifier.
Hover phase: .active (eye on element), .inactive (eye elsewhere).
Spatial gesture: a gesture performed in 3D space (e.g., pinching and dragging to scroll or to move an object).
Room-scale tracking: tracking the user's full body movement around a room.
Comfort zone: a UI design guideline to keep content within the 45° comfortable gaze zone in front of the user.
In conversation: "We spent a week getting the hover effects right: if the element doesn't respond visually when the user looks at it, they don't know it's interactive."
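The look-and-pinch flow, including its privacy boundary, can be sketched as a small state machine. This is a hypothetical model, not visionOS code: the system side tracks gaze and decides which element shows a hover effect, while the app only ever receives the element that was pinched. The names InputEvent and LookAndPinchModel are invented for illustration.

```swift
/// Raw input the (hypothetical) system pipeline receives.
enum InputEvent {
    case gaze(at: String?)  // element under the user's gaze (nil = empty space); seen only by the system
    case pinch              // index finger and thumb tapped together
}

struct LookAndPinchModel {
    let interactiveElements: Set<String>
    // System-side state: which element currently shows a hover effect.
    private(set) var highlighted: String? = nil

    init(interactiveElements: Set<String>) {
        self.interactiveElements = interactiveElements
    }

    /// Returns the element the app should treat as tapped, or nil.
    @discardableResult
    mutating func handle(_ event: InputEvent) -> String? {
        switch event {
        case .gaze(let element):
            // Only interactive elements get a hover effect.
            if let element = element, interactiveElements.contains(element) {
                highlighted = element
            } else {
                highlighted = nil
            }
            return nil             // gaze alone never reaches the app
        case .pinch:
            return highlighted     // look + pinch = tap on the hovered element
        }
    }
}

// Example: looking at static text, then at a button.
var model = LookAndPinchModel(interactiveElements: ["play", "stop"])
model.handle(.gaze(at: "title"))         // static text: no hover effect
let pinchOnTitle = model.handle(.pinch)  // nothing highlighted, so the app is told nothing
model.handle(.gaze(at: "play"))          // hover effect appears on "play"
let pinchOnPlay = model.handle(.pinch)   // app is told "play" was selected
```

Note that the gaze event returns nil unconditionally: in this model the only information crossing into the app is the result of a pinch.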
4 / 5
An XR developer compares AR and VR terminology: "The spectrum goes from fully real to fully virtual. AR (Augmented Reality) overlays virtual content on the real world. MR (Mixed Reality) goes further — virtual objects interact with the real world (occlusion, physics). VR (Virtual Reality) replaces the real world entirely. XR or Extended Reality is the umbrella term for all of them. Apple calls visionOS 'spatial computing' — they want to avoid the VR stigma." What is the reality-virtuality continuum in XR vocabulary?
Reality-virtuality continuum (Milgram & Kishino, 1994): the conceptual spectrum from fully real to fully virtual environments. Positions:
Real environment: the physical world.
Augmented Reality (AR): virtual content overlaid on the real world, with no interaction between the two. Examples: Pokémon Go, IKEA Place.
Mixed Reality (MR): in industry usage, virtual objects that interact with the real world through occlusion (virtual objects hidden behind real ones) and physics (a virtual ball bounces off a real table). Examples: HoloLens, visionOS mixed immersion. In Milgram & Kishino's original paper, Mixed Reality names the entire span between the two extremes, covering both AR and AV.
Augmented Virtuality (AV): a mostly virtual environment with real elements inserted.
Virtual Reality (VR): a completely immersive virtual environment. Example: Meta Quest in full VR mode.
XR vocabulary:
XR (Extended Reality): umbrella term for AR, MR, and VR.
Field of View (FOV): the angle of the visible display area; a wider FOV feels more immersive. Human FOV: about 200° horizontally; headset FOV: typically 90–120°.
Presence: the subjective feeling of "being there" in a virtual environment.
Motion sickness / cybersickness: discomfort from a mismatch between visual motion and vestibular (inner ear) signals. Minimised by low latency, high frame rate, and avoiding artificial locomotion.
6DoF (Six Degrees of Freedom): tracking both position (x, y, z) and rotation (pitch, yaw, roll). Full 6DoF enables natural movement; 3DoF tracks rotation only, so leaning or walking does not move the view.
In conversation: "We chose mixed mode for the architecture review app: clients want to see how the building fits in their actual space, not replace reality with a virtual room."
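Two of these ideas can be made concrete: the ordering of Milgram & Kishino's continuum (with Mixed Reality as everything strictly between the extremes) and the difference between 3DoF and 6DoF tracking. This is an illustrative sketch with hypothetical types (ContinuumPosition, HeadPose, TrackingMode), not part of any XR SDK.

```swift
/// Positions on Milgram & Kishino's continuum, ordered from fully real to fully virtual.
enum ContinuumPosition: Int, Comparable {
    case realEnvironment, augmentedReality, augmentedVirtuality, virtualReality

    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }

    /// In the original paper, Mixed Reality is everything strictly between the two extremes.
    var isMixedReality: Bool { self != .realEnvironment && self != .virtualReality }
}

enum TrackingMode { case threeDoF, sixDoF }

struct HeadPose: Equatable {
    var x = 0.0, y = 0.0, z = 0.0            // position in metres
    var pitch = 0.0, yaw = 0.0, roll = 0.0   // orientation in radians
}

/// The pose the renderer receives for a given real head pose.
func renderedPose(for head: HeadPose, mode: TrackingMode) -> HeadPose {
    switch mode {
    case .sixDoF:
        return head   // position and orientation both tracked
    case .threeDoF:
        // Orientation only: leaning or stepping does not move the view,
        // a visual/vestibular mismatch that can contribute to cybersickness.
        return HeadPose(pitch: head.pitch, yaw: head.yaw, roll: head.roll)
    }
}

// Example: the user leans 30 cm forward while turning slightly.
let leanForward = HeadPose(z: -0.3, yaw: 0.2)
let sixDoFView = renderedPose(for: leanForward, mode: .sixDoF)
let threeDoFView = renderedPose(for: leanForward, mode: .threeDoF)
```

Under 3DoF the 30 cm lean is simply lost: the view turns but does not move.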
5 / 5
A developer describes building spatial experiences with RealityKit: "RealityKit is Apple's rendering framework for spatial content. You define a scene as a hierarchy of entities and components: an Entity-Component architecture. An entity is a node in the scene graph. Components add behaviour: ModelComponent for 3D meshes, PhysicsBodyComponent for physics simulation, CollisionComponent for interaction. Reality Composer Pro is the visual editor for building these scenes." What is the Entity-Component architecture pattern in game/spatial computing engines?
Entity-Component (EC) / Entity-Component-System (ECS) architecture: a composition-over-inheritance pattern used in game engines and spatial computing frameworks.
Entity: a bare identifier or container in the scene, with no behaviour or data of its own.
Component: a bag of data attached to an entity that gives it a capability, e.g. ModelComponent (a 3D mesh), PhysicsBodyComponent (mass and other physics properties), CollisionComponent (a collision shape), InputTargetComponent (can receive user input).
System: logic that runs each frame over all entities that have a specific set of components. Example: a physics system steps every entity that has a PhysicsBodyComponent. In RealityKit, custom systems conform to the System protocol.
Why better than inheritance: a monster in a game might need rendering, physics, AI, and animation. With inheritance you'd need a complex class hierarchy; with EC, you just add the components it needs.
RealityKit vocabulary:
Entity: the scene node. Base class: `Entity`. Subclasses include `ModelEntity` and `AnchorEntity`.
Scene graph: the tree of entities and their parent-child relationships.
Reality Composer Pro: visual editor for building and previewing RealityKit scenes.
Reality file (.reality): compiled scene file for RealityKit.
USD (Universal Scene Description): Pixar's open 3D scene format; used extensively in Apple's spatial computing stack.
In conversation: "ECS makes it easy to add behaviour without touching existing code: we added a HighlightComponent to every interactive entity without changing any existing logic."
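The pattern itself does not depend on RealityKit. Below is a minimal plain-Swift ECS sketch: entities are integer IDs, components are value types stored per type, and a system queries the entities that have the components it needs. All names (World, Position, PhysicsBody, Highlight, GravitySystem, and the local System protocol) are hypothetical; RealityKit's own Entity, Component, and System types differ in detail (for example, its systems receive a scene update context each frame).

```swift
/// Marker protocol: any value type can be a component.
protocol Component {}

struct Position: Component { var y: Double }                  // height in metres
struct PhysicsBody: Component { var velocityY: Double = 0 }   // vertical velocity in m/s
struct Highlight: Component {}                                // cf. the HighlightComponent above

typealias EntityID = Int

/// Owns entities and their components. Entities are just IDs with no data of their own.
final class World {
    private var nextID = 0
    private var storage: [ObjectIdentifier: [EntityID: any Component]] = [:]

    func createEntity() -> EntityID {
        nextID += 1
        return nextID
    }

    func set<C: Component>(_ component: C, on entity: EntityID) {
        storage[ObjectIdentifier(C.self), default: [:]][entity] = component
    }

    func get<C: Component>(_ type: C.Type, of entity: EntityID) -> C? {
        storage[ObjectIdentifier(type)]?[entity] as? C
    }

    /// All entities that have a component of the given type, in creation order.
    func entities<C: Component>(with type: C.Type) -> [EntityID] {
        storage[ObjectIdentifier(type)]?.keys.sorted() ?? []
    }
}

protocol System {
    func update(_ world: World, deltaTime: Double)
}

/// Applies gravity to every entity that has both a PhysicsBody and a Position.
struct GravitySystem: System {
    let gravity = -9.81   // m/s²

    func update(_ world: World, deltaTime: Double) {
        for entity in world.entities(with: PhysicsBody.self) {
            guard var body = world.get(PhysicsBody.self, of: entity),
                  var position = world.get(Position.self, of: entity) else { continue }
            body.velocityY += gravity * deltaTime
            position.y += body.velocityY * deltaTime
            // Components are value types, so write the updated copies back.
            world.set(body, on: entity)
            world.set(position, on: entity)
        }
    }
}

// Example: a falling ball and a static, highlighted poster.
let world = World()
let ball = world.createEntity()
world.set(Position(y: 10), on: ball)
world.set(PhysicsBody(), on: ball)
let poster = world.createEntity()
world.set(Position(y: 2), on: poster)   // no PhysicsBody, so GravitySystem skips it
world.set(Highlight(), on: poster)      // new behaviour added without touching GravitySystem
GravitySystem().update(world, deltaTime: 1.0)
```

The design choice mirrors the conversation example: adding Highlight to the poster required no change to GravitySystem or to any existing type, because behaviour is selected by which components an entity has rather than by its class.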