Director AI Devlog

🧠 Development Log

(Prototype & Research Track)
Overall Goal:
  • Build a playable FPS-survivor-style prototype with the aid of an AI Director
  • Explore a Director / Mastermind AI for dynamic enemy placement and difficulty control
  • Lay technical foundations for future Reinforcement Learning / Offline RL / Decision Transformer experiments

📅 Weeks 1–2: Direction Exploration & Paradigm Choice

🎯 Core Questions

  • What kind of small game can realistically reach gameplay validation within one month?
  • How can the project retain research value, rather than becoming a simple demo?

🧠 Key Decisions

  • Visual quality is intentionally ignored; focus is placed on systems and mechanics
  • Unreal Engine is chosen because:
    • No need to learn a new engine
    • Full source access enables deep debugging
    • Closer to real-world industry and research environments

🎮 Gameplay Direction Exploration

  • Compared:
    • Vampire Survivors–like designs
    • FPS Survivors (e.g. Bloodshed)
  • Conclusion:
    • Bloodshed essentially forces a survivor loop onto FPS mechanics
    • Auto-shooting + enemy swarms reduce player input to camera movement and positioning
    • Manual shooting degenerates into repetitive endurance input
โžก๏ธ Key Insight:
For an FPS survivor to work, aiming and decision-making must matter again.

📅 Weeks 3–4: Combat Philosophy & System Decomposition

🔥 Combat Design Shift

Inspired by DOOM Eternal and DOOM: The Dark Ages:
  • Enemies must have explicit weaknesses
  • Weapon choice must matter against different enemy types
  • Combat should reward:
    • Aggression
    • Spatial movement
    • Risk–reward decisions

🧩 System-Level Thinking

  • Rogue-like depth should not come from random enemies alone
  • Instead, it should emerge from:
    • Weapon upgrade combinations
    • Randomized effect bindings
    • Meta-progression influencing starting loadouts
โš ๏ธ Important Realization:
Complexity should emerge from system interaction, not from mechanical fatigue.

📅 Weeks 5–6: ECS-Oriented Thinking & Core Gameplay Systems

🧱 ECS-Style Modeling (within Unreal constraints)

  • HealthComponent
  • WeaponComponent
  • Damage handling as a conceptual “system”
  • Decoupling Player and Enemy logic

💥 Key Discussion

  • Does ApplyDamage violate ECS principles?
  • Why does Unreal place it at the Actor / Component level?
โžก๏ธ Conclusion:
  • Unreal is not a pure ECS
  • But ECS-style data flow can be achieved through:
    • Components
    • Delegates
    • Event-driven logic

📅 Weeks 7–8: AI, Spawning, and World Lifecycle Issues

🤖 AI Initialization Problems

  • Spawned enemies did not execute Behavior Trees
  • Root causes:
    • Missing AIController
    • Auto Possess AI not configured

๐ŸŒ World Initialization Pitfalls (Critical)

  • Enemies were destroyed immediately after spawning (KillZ)
  • Even though the ground mesh was visible
โžก๏ธ Core Discovery:
  • At BeginPlay, the following are not guaranteed to be ready:
    • Collision
    • Physics
    • NavMesh
  • Procedural spawning ≠ manually placed actors

📅 Weeks 9–10: Director AI & Unreal–Python Communication

🧠 Director Goal Clarification (Very Important)

The objective was explicitly defined as:
โŒ Training an AI to play the game
โœ… Training a Mastermind AI that places enemies to create fair and meaningful challenges
This distinction significantly shaped the research direction.

🔌 Unreal ↔ Python TCP Bridge

  • Python server implemented manually
  • Unreal acts as a client:
    • Sends state vectors
    • Receives director actions
  • Issues encountered and solved (a framing sketch follows below):
    • A blocking socket Recv call freezing Unreal
    • Partial packets requiring buffering before parsing
    • Invalid JSON caused by message fragmentation
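Since the log only records that these framing problems were solved (not the exact wire format), here is a minimal sketch of the Python side assuming newline-delimited JSON framing; the host, port, and `handle_state` callback are hypothetical placeholders rather than the project's actual API.

```python
# Minimal sketch of the Python bridge server, assuming newline-delimited JSON
# framing. HOST, PORT, and handle_state are illustrative, not the project's API.
import json
import socket

HOST, PORT = "127.0.0.1", 7777   # hypothetical bind address


def serve_forever(handle_state):
    """Accept one Unreal client and exchange newline-delimited JSON messages."""
    with socket.create_server((HOST, PORT)) as server:
        conn, _ = server.accept()
        conn.settimeout(0.05)        # short timeout instead of a blocking recv()
        buffer = b""
        while True:
            try:
                chunk = conn.recv(4096)
                if not chunk:        # client disconnected
                    break
                buffer += chunk
            except socket.timeout:
                continue             # nothing new yet; poll again

            # One recv() may deliver half a message or several messages at once,
            # so split on the delimiter and keep the incomplete tail buffered.
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                if not line.strip():
                    continue
                message = json.loads(line)            # parse complete JSON only
                action = handle_state(message)        # map state/event -> action
                if action is not None:
                    conn.sendall(json.dumps(action).encode() + b"\n")
```

The same buffer-and-split idea applies on the Unreal client: poll the socket without blocking the game thread and only hand complete, delimiter-terminated payloads to the JSON parser.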

📅 Weeks 11–12: Reinforcement Learning Direction Alignment

🧠 RL Methods Explored

  • Q-learning
  • Offline RL
  • Decision Transformers
  • DeepMind's StarCraft II research

Key Takeaways

  • Long episodes do not prevent training
  • Practical strategies include:
    • Temporal slicing
    • State aggregation
    • Wave-level decisions

Important Insight:

The Director AI operates as a strategic scheduling layer,
not as a low-level continuous control agent.

📅 Weeks 15–16: Event-Driven AI Director Architecture

🧠 System Design Shift

  • Moved from time-based (temporal polling) logic to a fully event-driven control model
  • Reframed the Director from a continuous real-time updater into a decision-making agent triggered by semantic game events

Key Changes Implemented

  • UE now emits raw gameplay events (e.g. EnemyKilled, PlayerHit, WaveStart)
  • Python server maintains a belief state reconstructed from event streams
  • Director decisions are made only at meaningful decision points, not every frame
  • Introduced event aggregation / batching to avoid frame-level noise
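A rough sketch of that event-driven shape is below; the event names come from this log, while the belief variables, update rules, and decision-point set are illustrative assumptions rather than the actual implementation.

```python
# Illustrative event-driven Director skeleton. Belief fields, weights, and the
# decision-point set are assumptions; only the event names appear in the log.
from collections import defaultdict


class EventDrivenDirector:
    DECISION_EVENTS = {"WaveStart", "WaveEnd"}    # assumed semantic decision points

    def __init__(self):
        self.belief = {"stress": 0.0, "skill": 0.5, "pressure": 0.0}
        self.pending = defaultdict(int)           # aggregated event counters

    def on_event(self, event):
        """Fold one gameplay event into the aggregation buffer; only emit a
        director action at decision points, never per frame."""
        kind = event["type"]
        if kind == "EnemyKilled":
            self.pending["kills"] += 1
        elif kind == "PlayerHit":
            self.pending["hits_taken"] += 1
        if kind in self.DECISION_EVENTS:
            self._aggregate()
            return self.decide()                  # high-level action sent back to UE
        return None

    def _aggregate(self):
        # Batch per-event noise into slower-moving belief variables.
        kills, hits = self.pending["kills"], self.pending["hits_taken"]
        self.belief["skill"] = 0.9 * self.belief["skill"] + 0.1 * (kills / max(1, kills + hits))
        self.belief["stress"] = min(1.0, 0.2 * hits)
        self.pending.clear()

    def decide(self):
        # Placeholder policy: scale the next encounter with inferred skill.
        return {"action": "SpawnWave", "intensity": round(0.5 + self.belief["skill"], 2)}
```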

RL-Ready Infrastructure

  • State is now:
    • Implicit and inferred (belief variables like stress, skill, pressure)
    • Not hard-coded or synchronized from UE
  • Actions operate at the wave / encounter level
  • System now naturally supports:
    • Tabular Q-learning
    • Episode-based training
    • Offline replay from logs
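One way to keep that implicit state compatible with tabular Q-learning is to bucket the belief variables into a small discrete key. The sketch below assumes four variables and four buckets each; the bucket counts are illustrative choices, not the project's final representation.

```python
# Sketch: compress continuous belief variables into a hashable key for a Q-table.
# Variable names follow the log (HP ratio, stress, skill, pressure); bucket
# counts are assumptions.
def discretize(value, buckets=4):
    """Map a value in [0, 1] to one of `buckets` integer bins."""
    v = min(max(value, 0.0), 1.0)
    return min(int(v * buckets), buckets - 1)


def belief_to_state(belief):
    """Belief dict -> compact tuple usable as a tabular state key."""
    return (
        discretize(belief["hp_ratio"]),
        discretize(belief["stress"]),
        discretize(belief["skill"]),
        discretize(belief["pressure"]),
    )


# Example: four variables with four buckets each give at most 4**4 = 256 states.
state = belief_to_state({"hp_ratio": 0.8, "stress": 0.3, "skill": 0.6, "pressure": 0.1})
```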

Important Insight:

The Director AI should operate on semantic events and belief states,
not on raw frame time or low-level signals.
This transforms the problem from real-time control
into a strategic scheduling and resource allocation task,
which is fundamentally more learnable and scalable for RL.

Recent Milestone: Event-Driven AI Director Loop Established

✅ Achievements

  • Unreal client now emits semantic gameplay events (e.g. WaveStart, EnemyKilled, PlayerDamageBatch)
  • Python server maintains a persistent belief state reconstructed from event streams
  • Director decisions are triggered by event-based decision points, not time polling
  • UE successfully executes returned high-level director actions

Architectural Progress

  • Replaced temporal state syncing with event-driven telemetry protocol
  • Established a clean UE โ†’ Python โ†’ UE control loop
  • Introduced event aggregation to avoid frame-level noise and over-sampling
  • Director now operates at wave / encounter timescale
โžก๏ธ First learning-ready loop achieved:
Events โ†’ Belief State โ†’ Director Policy โ†’ Actions โ†’ Gameplay
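To make the loop concrete, the messages on each leg might look roughly like the following. Only the event names are taken from this log; every field shown is a hypothetical example of the telemetry protocol, not its actual schema.

```python
# Hypothetical message shapes for the event-driven telemetry protocol.
ue_event = {                      # UE -> Python: aggregated gameplay telemetry
    "type": "PlayerDamageBatch",
    "wave": 3,
    "total_damage": 42.0,
    "hits": 5,
}

decision_point = {                # UE -> Python: semantic trigger for a decision
    "type": "WaveStart",
    "wave": 4,
}

director_action = {               # Python -> UE: high-level action to execute
    "action": "SpawnEncounter",
    "wave": 4,
    "composition": {"melee": 6, "ranged": 2},
    "spawn_zone": "flank_left",
}
```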

Next Week Plan: Minimal Learning Director (v0)

Objective

Turn the current rule-based Director into a trainable learning agent that can adapt its behavior across encounters.

Core Tasks

1. Belief State Formalization
  • Finalize belief state variables (e.g. HP ratio, stress, skill, pressure)
  • Discretize or normalize them into a compact state representation
2. Reward Function Design
  • Define wave-level reward signals:
    • Player survival / death
    • Near-death penalty
    • Over-easy penalty (low engagement)
3. Learning Layer Integration
  • Implement simple tabular Q-learning (or an equivalent method); see the sketch after this list
  • Add:
    • Exploration strategy (ε-greedy)
    • Q-value updates at WaveEnd
4. Logging & Replay
  • Log (state, action, reward, next_state)
  • Support offline replay from recorded event streams
  • Enable reproducible training runs without UE
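As referenced in task 3, here is a combined sketch of tasks 2–4: a wave-level reward, ε-greedy tabular Q-learning updated at WaveEnd, and transition logging for offline replay. The action set, reward weights, learning constants, and the zero-damage proxy for "over-easy" are all assumptions for illustration.

```python
# Sketch of tasks 2-4. Action names, reward weights, epsilon/alpha/gamma, and
# the "over-easy" proxy (zero damage taken) are assumptions, not final design.
import json
import random
from collections import defaultdict

ACTIONS = ["easy_wave", "standard_wave", "pressure_wave"]   # hypothetical action set


def wave_reward(outcome):
    """Wave-level reward built from the signals listed above."""
    r = 1.0 if outcome["player_survived"] else -1.0
    if outcome["near_death"]:
        r -= 0.5                      # near-death penalty
    if outcome["damage_taken"] == 0:
        r -= 0.25                     # over-easy penalty (low engagement)
    return r


class TabularDirector:
    def __init__(self, epsilon=0.2, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.transitions = []         # (state, action, reward, next_state) log

    def act(self, state):
        if random.random() < self.epsilon:            # epsilon-greedy exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update_on_wave_end(self, s, a, r, s_next):
        """One Q-learning update per WaveEnd, plus logging for offline replay."""
        best_next = max(self.q[(s_next, a2)] for a2 in ACTIONS)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])
        self.transitions.append((s, a, r, s_next))

    def save_log(self, path):
        # Persisted transitions can be replayed to retrain without running UE.
        with open(path, "w") as f:
            json.dump([{"s": list(s), "a": a, "r": r, "s_next": list(sn)}
                       for s, a, r, sn in self.transitions], f)
```

Replaying a saved log then amounts to iterating the stored tuples and calling `update_on_wave_end` again, which is what makes training runs reproducible without UE in the loop.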