Development Log
(Prototype & Research Track)
Overall Goal:
- Build a playable FPS-survivor-style prototype with the aid of an AI Director
- Explore a Director / Mastermind AI for dynamic enemy placement and difficulty control
- Lay technical foundations for future Reinforcement Learning / Offline RL / Decision Transformer experiments
Weeks 1–2: Direction Exploration & Paradigm Choice
Core Questions
- What kind of small game can realistically reach gameplay validation within one month?
- How can the project retain research value, rather than becoming a simple demo?
Key Decisions
- Visual quality is intentionally ignored; focus is placed on systems and mechanics
- Unreal Engine is chosen because:
- No need to learn a new engine
- Full source access enables deep debugging
- Closer to real-world industry and research environments
Gameplay Direction Exploration
- Compared:
- Vampire Survivors-like designs
- FPS Survivors (e.g. Bloodshed)
- Conclusion:
- Bloodshed essentially forces a survivor loop onto FPS mechanics
- Auto-shooting + enemy swarms reduce player input to camera movement and positioning
- Manual shooting degenerates into repetitive endurance input
Key Insight:
For an FPS survivor to work, aiming and decision-making must matter again.
Weeks 3–4: Combat Philosophy & System Decomposition
Combat Design Shift
Inspired by DOOM Eternal and DOOM: The Dark Ages:
- Enemies must have explicit weaknesses
- Weapon choice must matter against different enemy types
- Combat should reward:
- Aggression
- Spatial movement
- Risk–reward decisions
System-Level Thinking
- Rogue-like depth should not come from random enemies alone
- Instead, it should emerge from:
- Weapon upgrade combinations
- Randomized effect bindings
- Meta-progression influencing starting loadouts
Important Realization:
Complexity should emerge from system interaction, not from mechanical fatigue.
Weeks 5–6: ECS-Oriented Thinking & Core Gameplay Systems
ECS-Style Modeling (within Unreal constraints)
- HealthComponent
- WeaponComponent
- Damage handling as a conceptual "system"
- Decoupling Player and Enemy logic
Key Discussion
- Does ApplyDamage violate ECS principles?
- Why does Unreal place it at the Actor / Component level?
Conclusion:
- Unreal is not a pure ECS
- But ECS-style data flow can be achieved through:
- Components
- Delegates
- Event-driven logic
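To make the conclusion above concrete, here is a small, engine-agnostic sketch of the intended data flow, using plain Python callbacks in place of Unreal delegates. The names (HealthComponent, apply_damage, on_death) are illustrative stand-ins, not the project's actual classes.

```python
# Engine-agnostic sketch of the component + delegate pattern described above.
# Unreal delegates are approximated with plain Python callbacks; all names
# (HealthComponent, on_death, apply_damage) are illustrative, not project API.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HealthComponent:
    max_health: float
    health: float = None
    on_damaged: List[Callable[[float, float], None]] = field(default_factory=list)
    on_death: List[Callable[[], None]] = field(default_factory=list)

    def __post_init__(self):
        if self.health is None:
            self.health = self.max_health

    def apply_damage(self, amount: float) -> None:
        """Pure data mutation; reactions are delegated to bound listeners."""
        self.health = max(0.0, self.health - amount)
        for cb in self.on_damaged:
            cb(amount, self.health)
        if self.health == 0.0:
            for cb in self.on_death:
                cb()


# Usage: player and enemy can share the same component; only the bound
# listeners differ, which keeps their logic decoupled.
enemy_health = HealthComponent(max_health=100.0)
enemy_health.on_death.append(lambda: print("enemy died -> notify Director"))
enemy_health.apply_damage(120.0)
```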
Weeks 7–8: AI, Spawning, and World Lifecycle Issues
AI Initialization Problems
- Spawned enemies did not execute Behavior Trees
- Root causes:
- Missing AIController
- Auto Possess AI not configured
World Initialization Pitfalls (Critical)
- Enemies were destroyed immediately after spawning (KillZ)
- Even though the ground mesh was visible
Core Discovery:
- At BeginPlay, the following are not guaranteed to be ready:
- Collision
- Physics
- NavMesh
- Procedural spawning ≠ manually placed actors (spawned actors cannot assume the world is ready)
Weeks 9–10: Director AI & Unreal–Python Communication
Director Goal Clarification (Very Important)
The objective was explicitly defined as:
- Not: training an AI to play the game
- Instead: training a Mastermind AI that places enemies to create fair and meaningful challenges
This distinction significantly shaped the research direction.
Unreal ↔ Python TCP Bridge
- Python server implemented manually
- Unreal acts as a client:
- Sends state vectors
- Receives director actions
- Issues encountered and solved:
- Blocking socket Recv freezing Unreal
- Partial packet buffering
- Invalid JSON due to message fragmentation (see the framing sketch below)
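To illustrate how the fragmentation problems can be handled on the Python side, here is a minimal server sketch assuming newline-delimited JSON as the wire format (the real protocol, and the fix for Unreal's blocking Recv on the client side, may differ). It buffers partial packets and only parses complete frames, which is what keeps fragmented sends from producing invalid JSON.

```python
# Minimal sketch of the Python side of the bridge, assuming newline-delimited
# JSON framing (an assumption, not the project's confirmed protocol). It shows
# the two fixes that matter here: buffering partial packets and parsing only
# complete frames.

import json
import socket

HOST, PORT = "127.0.0.1", 9999  # illustrative address


def decide(state: dict) -> dict:
    # Placeholder director decision; the real policy lives elsewhere.
    return {"action": "spawn_wave", "difficulty": 1}


def handle_client(conn: socket.socket) -> None:
    buffer = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:            # client disconnected
            break
        buffer += chunk          # accumulate partial packets
        while b"\n" in buffer:   # parse only complete, newline-terminated frames
            frame, buffer = buffer.split(b"\n", 1)
            state = json.loads(frame.decode("utf-8"))
            action = decide(state)
            conn.sendall((json.dumps(action) + "\n").encode("utf-8"))


with socket.create_server((HOST, PORT)) as server:
    conn, _ = server.accept()
    with conn:
        handle_client(conn)
```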
Weeks 11–12: Reinforcement Learning Direction Alignment
RL Methods Explored
- Q-learning
- Offline RL
- Decision Transformers
- DeepMind's StarCraft II research
Key Takeaways
- Long episodes do not prevent training
- Practical strategies include:
- Temporal slicing
- State aggregation
- Wave-level decisions
Important Insight:
The Director AI operates as a strategic scheduling layer, not as a low-level continuous control agent.
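As a hedged illustration of "state aggregation and wave-level decisions": per-frame telemetry is collapsed into a single feature vector per wave, so the Director only has to decide once per wave rather than once per frame. The field names below are illustrative, not the project's actual schema.

```python
# Sketch of temporal slicing / state aggregation at wave granularity.
# FrameSample fields and the aggregated features are placeholder assumptions.

from dataclasses import dataclass
from typing import List


@dataclass
class FrameSample:
    player_hp: float
    enemies_alive: int
    damage_taken: float


def aggregate_wave(frames: List[FrameSample]) -> dict:
    """Collapse a whole wave of frame samples into one decision-time state."""
    return {
        "min_hp": min(f.player_hp for f in frames),
        "avg_enemies": sum(f.enemies_alive for f in frames) / len(frames),
        "total_damage_taken": sum(f.damage_taken for f in frames),
    }


# One Director decision per wave, driven by the aggregated state:
wave_state = aggregate_wave([FrameSample(0.9, 5, 0.0), FrameSample(0.4, 3, 35.0)])
```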
Weeks 15–16: Event-Driven AI Director Architecture
System Design Shift
- Moved from time-based (temporal polling) logic to a fully event-driven control model
- Reframed the Director from:
- a continuous real-time updater
→ to a decision-making agent triggered by semantic game events
Key Changes Implemented
- UE now emits raw gameplay events (e.g. EnemyKilled, PlayerHit, WaveStart)
- Python server maintains a belief state reconstructed from event streams
- Director decisions are made only at meaningful decision points, not every frame
- Introduced event aggregation / batching to avoid frame-level noise
RL-Ready Infrastructure
- State is now:
- Implicit and inferred (belief variables like stress, skill, pressure)
- Not hard-coded or synchronized from UE
- Actions operate at the wave / encounter level
- System now naturally supports:
- Tabular Q-learning
- Episode-based training
- Offline replay from logs
Important Insight:
The Director AI should operate on semantic events and belief states, not on raw frame time or low-level signals. This transforms the problem from real-time control into a strategic scheduling and resource allocation task, which is fundamentally more learnable and scalable for RL.
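As an illustration of the belief-state idea, the sketch below reconstructs a few belief variables purely from incoming semantic events. The event schema, variable names (stress, skill, pressure), and update rules are placeholder heuristics rather than the project's actual model.

```python
# Sketch of a belief state reconstructed purely from semantic events, assuming
# an event schema like {"type": "PlayerHit", ...}. All update rules are
# illustrative heuristics.

from dataclasses import dataclass


@dataclass
class BeliefState:
    stress: float = 0.0    # rises when the player takes damage
    skill: float = 0.5     # rises with clean kills, decays on hits
    pressure: float = 0.0  # how much simultaneous threat is active

    def update(self, event: dict) -> None:
        etype = event["type"]
        if etype == "PlayerHit":
            self.stress = min(1.0, self.stress + 0.1)
            self.skill = max(0.0, self.skill - 0.02)
        elif etype == "EnemyKilled":
            self.stress = max(0.0, self.stress - 0.05)
            self.skill = min(1.0, self.skill + 0.01)
            self.pressure = max(0.0, self.pressure - 0.1)
        elif etype == "WaveStart":
            self.pressure = min(1.0, self.pressure + 0.1 * event.get("enemy_count", 1))


belief = BeliefState()
for ev in [{"type": "WaveStart", "enemy_count": 6}, {"type": "PlayerHit"}]:
    belief.update(ev)   # the Director reads `belief` only at decision points
```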
Recent Milestone: Event-Driven AI Director Loop Established
Achievements
- Unreal client now emits semantic gameplay events (e.g. WaveStart, EnemyKilled, PlayerDamageBatch)
- Python server maintains a persistent belief state reconstructed from event streams
- Director decisions are triggered by event-based decision points, not time polling
- UE successfully executes returned high-level director actions
Architectural Progress
- Replaced temporal state syncing with event-driven telemetry protocol
- Established a clean UE → Python → UE control loop
- Introduced event aggregation to avoid frame-level noise and over-sampling
- Director now operates at wave / encounter timescale
First learning-ready loop achieved:
Events → Belief State → Director Policy → Actions → Gameplay
Next Week Plan: Minimal Learning Director (v0)
Objective
Turn the current rule-based Director into a trainable learning agent that can adapt its behavior across encounters.
Core Tasks
1. Belief State Formalization
- Finalize belief state variables (e.g. HP ratio, stress, skill, pressure)
- Discretize or normalize them into a compact state representation
2. Reward Function Design
- Define wave-level reward signals:
- Player survival / death
- Near-death penalty
- Penalty for overly easy waves (low engagement)
3. Learning Layer Integration
- Implement a simple tabular Q-learning (or equivalent)
- Add:
- Exploration strategy (ε-greedy)
- Q-value updates at WaveEnd
4. Logging & Replay
- Log (state, action, reward, next_state)
- Support offline replay from recorded event streams
- Enable reproducible training runs without UE
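A minimal sketch of what this v0 learning layer could look like, combining the tasks above: a discretized belief state, ε-greedy action selection, a tabular Q-update at WaveEnd, and transition logging for offline replay. Action names, bucketing, and hyperparameters are placeholder assumptions, not finalized design.

```python
# Hedged sketch of the planned "Learning Director v0": tabular Q-learning with
# an ε-greedy policy, updated once per wave, logging (s, a, r, s') transitions.

import json
import random
from collections import defaultdict

ACTIONS = ["easy_wave", "mixed_wave", "pressure_wave"]   # illustrative action set
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2                    # placeholder hyperparameters


def discretize(belief: dict) -> tuple:
    """Bucket continuous belief variables into a compact, hashable state."""
    return tuple(int(belief[k] * 4) for k in ("hp_ratio", "stress", "skill"))


class TabularDirector:
    def __init__(self):
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
        self.log = []  # (state, action, reward, next_state) transitions

    def act(self, state: tuple) -> str:
        if random.random() < EPSILON:                        # explore
            return random.choice(ACTIONS)
        return max(self.q[state], key=self.q[state].get)     # exploit

    def on_wave_end(self, state, action, reward, next_state) -> None:
        """Single Q-learning update, triggered by the WaveEnd event."""
        best_next = max(self.q[next_state].values())
        self.q[state][action] += ALPHA * (reward + GAMMA * best_next - self.q[state][action])
        self.log.append((state, action, reward, next_state))

    def save_log(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(self.log, f)  # enables offline replay without UE
```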