Development Log
(Prototype & Research Track)
Overall Goal:
- Build a playable FPS-survivor-style prototype with the aid of an AI Director
- Explore a Director / Mastermind AI for dynamic enemy placement and difficulty control
- Lay technical foundations for future Reinforcement Learning / Offline RL / Decision Transformer experiments
Weeks 1–2: Direction Exploration & Paradigm Choice
Core Questions
- What kind of small game can realistically reach gameplay validation within one month?
- How can the project retain research value, rather than becoming a simple demo?
Key Decisions
- Visual quality is intentionally ignored; focus is placed on systems and mechanics
- Unreal Engine is chosen because:
- No need to learn a new engine
- Full source access enables deep debugging
- Closer to real-world industry and research environments
Gameplay Direction Exploration
- Compared:
- Vampire Survivors-like designs
- FPS Survivors (e.g. Bloodshed)
- Conclusion:
- Bloodshed essentially forces a survivor loop onto FPS mechanics
- Auto-shooting + enemy swarms reduce player input to camera movement and positioning
- Manual shooting degenerates into repetitive endurance input
Key Insight:
For an FPS survivor to work, aiming and decision-making must matter again.
Weeks 3–4: Combat Philosophy & System Decomposition
Combat Design Shift
Inspired by DOOM Eternal and DOOM: The Dark Ages:
- Enemies must have explicit weaknesses
- Weapon choice must matter against different enemy types
- Combat should reward:
- Aggression
- Spatial movement
- Risk–reward decisions
System-Level Thinking
- Rogue-like depth should not come from random enemies alone
- Instead, it should emerge from:
- Weapon upgrade combinations
- Randomized effect bindings
- Meta-progression influencing starting loadouts
Important Realization:
Complexity should emerge from system interaction, not from mechanical fatigue.
Weeks 5–6: ECS-Oriented Thinking & Core Gameplay Systems
ECS-Style Modeling (within Unreal constraints)
- HealthComponent
- WeaponComponent
- Damage handling as a conceptual "system"
- Decoupling Player and Enemy logic
Key Discussion
- Does ApplyDamage violate ECS principles?
- Why does Unreal place it at the Actor / Component level?
Conclusion:
- Unreal is not a pure ECS
- But ECS-style data flow can be achieved through:
- Components
- Delegates
- Event-driven logic
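To make the conclusion above concrete, here is a small, engine-agnostic sketch of the intended data flow, using plain Python callbacks in place of Unreal delegates. The names (HealthComponent, apply_damage, on_death) are illustrative stand-ins, not the project's actual classes.

```python
# Engine-agnostic sketch of the component + delegate pattern described above.
# Unreal delegates are approximated with plain Python callbacks; all names
# (HealthComponent, on_death, apply_damage) are illustrative, not project API.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HealthComponent:
    max_health: float
    health: float = None
    on_damaged: List[Callable[[float, float], None]] = field(default_factory=list)
    on_death: List[Callable[[], None]] = field(default_factory=list)

    def __post_init__(self):
        if self.health is None:
            self.health = self.max_health

    def apply_damage(self, amount: float) -> None:
        """Pure data mutation; reactions are delegated to bound listeners."""
        self.health = max(0.0, self.health - amount)
        for cb in self.on_damaged:
            cb(amount, self.health)
        if self.health == 0.0:
            for cb in self.on_death:
                cb()


# Usage: player and enemy can share the same component; only the bound
# listeners differ, which keeps their logic decoupled.
enemy_health = HealthComponent(max_health=100.0)
enemy_health.on_death.append(lambda: print("enemy died -> notify Director"))
enemy_health.apply_damage(120.0)
```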
Weeks 7–8: AI, Spawning, and World Lifecycle Issues
AI Initialization Problems
- Spawned enemies did not execute Behavior Trees
- Root causes:
- Missing AIController
- Auto Possess AI not configured
World Initialization Pitfalls (Critical)
- Enemies were destroyed immediately after spawning (KillZ)
- Even though the ground mesh was visible
Core Discovery:
- At BeginPlay, the following are not guaranteed to be ready:
- Collision
- Physics
- NavMesh
- Procedural spawning ≠ manually placed actors (spawned actors cannot assume the world is ready)
Weeks 9–10: Director AI & Unreal–Python Communication
Director Goal Clarification (Very Important)
The objective was explicitly defined as:
- Not: training an AI to play the game
- Instead: training a Mastermind AI that places enemies to create fair and meaningful challenges
This distinction significantly shaped the research direction.
Unreal ↔ Python TCP Bridge
- Python server implemented manually
- Unreal acts as a client:
- Sends state vectors
- Receives director actions
- Issues encountered and solved:
- Blocking socket Recv freezing Unreal
- Partial packet buffering
- Invalid JSON due to message fragmentation (see the framing sketch below)
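To illustrate how the fragmentation problems can be handled on the Python side, here is a minimal server sketch assuming newline-delimited JSON as the wire format (the real protocol, and the fix for Unreal's blocking Recv on the client side, may differ). It buffers partial packets and only parses complete frames, which is what keeps fragmented sends from producing invalid JSON.

```python
# Minimal sketch of the Python side of the bridge, assuming newline-delimited
# JSON framing (an assumption, not the project's confirmed protocol). It shows
# the two fixes that matter here: buffering partial packets and parsing only
# complete frames.

import json
import socket

HOST, PORT = "127.0.0.1", 9999  # illustrative address


def decide(state: dict) -> dict:
    # Placeholder director decision; the real policy lives elsewhere.
    return {"action": "spawn_wave", "difficulty": 1}


def handle_client(conn: socket.socket) -> None:
    buffer = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:            # client disconnected
            break
        buffer += chunk          # accumulate partial packets
        while b"\n" in buffer:   # parse only complete, newline-terminated frames
            frame, buffer = buffer.split(b"\n", 1)
            state = json.loads(frame.decode("utf-8"))
            action = decide(state)
            conn.sendall((json.dumps(action) + "\n").encode("utf-8"))


with socket.create_server((HOST, PORT)) as server:
    conn, _ = server.accept()
    with conn:
        handle_client(conn)
```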
Weeks 11–12: Reinforcement Learning Direction Alignment
RL Methods Explored
- Q-learning
- Offline RL
- Decision Transformers
- DeepMind's StarCraft II research
Key Takeaways
- Long episodes do not prevent training
- Practical strategies include:
- Temporal slicing
- State aggregation
- Wave-level decisions
Important Insight:
The Director AI operates as a strategic scheduling layer, not as a low-level continuous control agent.
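As a hedged illustration of "state aggregation and wave-level decisions": per-frame telemetry is collapsed into a single feature vector per wave, so the Director only has to decide once per wave rather than once per frame. The field names below are illustrative, not the project's actual schema.

```python
# Sketch of temporal slicing / state aggregation at wave granularity.
# FrameSample fields and the aggregated features are placeholder assumptions.

from dataclasses import dataclass
from typing import List


@dataclass
class FrameSample:
    player_hp: float
    enemies_alive: int
    damage_taken: float


def aggregate_wave(frames: List[FrameSample]) -> dict:
    """Collapse a whole wave of frame samples into one decision-time state."""
    return {
        "min_hp": min(f.player_hp for f in frames),
        "avg_enemies": sum(f.enemies_alive for f in frames) / len(frames),
        "total_damage_taken": sum(f.damage_taken for f in frames),
    }


# One Director decision per wave, driven by the aggregated state:
wave_state = aggregate_wave([FrameSample(0.9, 5, 0.0), FrameSample(0.4, 3, 35.0)])
```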
Weeks 15–16: Event-Driven AI Director Architecture
System Design Shift
- Moved from time-based (temporal polling) logic to a fully event-driven control model
- Reframed the Director from:
- a continuous real-time updater
→ to a decision-making agent triggered by semantic game events
Key Changes Implemented
- UE now emits raw gameplay events (e.g. EnemyKilled, PlayerHit, WaveStart)
- Python server maintains a belief state reconstructed from event streams
- Director decisions are made only at meaningful decision points, not every frame
- Introduced event aggregation / batching to avoid frame-level noise
RL-Ready Infrastructure
- State is now:
- Implicit and inferred (belief variables like stress, skill, pressure)
- Not hard-coded or synchronized from UE
- Actions operate at the wave / encounter level
- System now naturally supports:
- Tabular Q-learning
- Episode-based training
- Offline replay from logs
Important Insight:
The Director AI should operate on semantic events and belief states, not on raw frame time or low-level signals. This transforms the problem from real-time control into a strategic scheduling and resource allocation task, which is fundamentally more learnable and scalable for RL.
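As an illustration of the belief-state idea, the sketch below reconstructs a few belief variables purely from incoming semantic events. The event schema, variable names (stress, skill, pressure), and update rules are placeholder heuristics rather than the project's actual model.

```python
# Sketch of a belief state reconstructed purely from semantic events, assuming
# an event schema like {"type": "PlayerHit", ...}. All update rules are
# illustrative heuristics.

from dataclasses import dataclass


@dataclass
class BeliefState:
    stress: float = 0.0    # rises when the player takes damage
    skill: float = 0.5     # rises with clean kills, decays on hits
    pressure: float = 0.0  # how much simultaneous threat is active

    def update(self, event: dict) -> None:
        etype = event["type"]
        if etype == "PlayerHit":
            self.stress = min(1.0, self.stress + 0.1)
            self.skill = max(0.0, self.skill - 0.02)
        elif etype == "EnemyKilled":
            self.stress = max(0.0, self.stress - 0.05)
            self.skill = min(1.0, self.skill + 0.01)
            self.pressure = max(0.0, self.pressure - 0.1)
        elif etype == "WaveStart":
            self.pressure = min(1.0, self.pressure + 0.1 * event.get("enemy_count", 1))


belief = BeliefState()
for ev in [{"type": "WaveStart", "enemy_count": 6}, {"type": "PlayerHit"}]:
    belief.update(ev)   # the Director reads `belief` only at decision points
```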
Recent Milestone: Event-Driven AI Director Loop Established
Achievements
- Unreal client now emits semantic gameplay events (e.g. WaveStart, EnemyKilled, PlayerDamageBatch)
- Python server maintains a persistent belief state reconstructed from event streams
- Director decisions are triggered by event-based decision points, not time polling
- UE successfully executes returned high-level director actions
Architectural Progress
- Replaced temporal state syncing with event-driven telemetry protocol
- Established a clean UE → Python → UE control loop
- Introduced event aggregation to avoid frame-level noise and over-sampling
- Director now operates at wave / encounter timescale
First learning-ready loop achieved:
Events → Belief State → Director Policy → Actions → Gameplay
Next Week Plan: Minimal Learning Director (v0)
Objective
Turn the current rule-based Director into a trainable learning agent that can adapt its behavior across encounters.
Core Tasks
1. Belief State Formalization
- Finalize belief state variables (e.g. HP ratio, stress, skill, pressure)
- Discretize or normalize them into a compact state representation
2. Reward Function Design
- Define wave-level reward signals:
- Player survival / death
- Near-death penalty
- Penalty for overly easy waves (low engagement)
3. Learning Layer Integration
- Implement a simple tabular Q-learning (or equivalent)
- Add:
- Exploration strategy (ε-greedy)
- Q-value updates at WaveEnd
4. Logging & Replay
- Log (state, action, reward, next_state)
- Support offline replay from recorded event streams
- Enable reproducible training runs without UE
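A minimal sketch of what this v0 learning layer could look like, combining the tasks above: a discretized belief state, ε-greedy action selection, a tabular Q-update at WaveEnd, and transition logging for offline replay. Action names, bucketing, and hyperparameters are placeholder assumptions, not finalized design.

```python
# Hedged sketch of the planned "Learning Director v0": tabular Q-learning with
# an ε-greedy policy, updated once per wave, logging (s, a, r, s') transitions.

import json
import random
from collections import defaultdict

ACTIONS = ["easy_wave", "mixed_wave", "pressure_wave"]   # illustrative action set
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2                    # placeholder hyperparameters


def discretize(belief: dict) -> tuple:
    """Bucket continuous belief variables into a compact, hashable state."""
    return tuple(int(belief[k] * 4) for k in ("hp_ratio", "stress", "skill"))


class TabularDirector:
    def __init__(self):
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
        self.log = []  # (state, action, reward, next_state) transitions

    def act(self, state: tuple) -> str:
        if random.random() < EPSILON:                        # explore
            return random.choice(ACTIONS)
        return max(self.q[state], key=self.q[state].get)     # exploit

    def on_wave_end(self, state, action, reward, next_state) -> None:
        """Single Q-learning update, triggered by the WaveEnd event."""
        best_next = max(self.q[next_state].values())
        self.q[state][action] += ALPHA * (reward + GAMMA * best_next - self.q[state][action])
        self.log.append((state, action, reward, next_state))

    def save_log(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(self.log, f)  # enables offline replay without UE
```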