This work began as a personal challenge to bridge the gap between a childhood love for strategy games and a fascination with deep learning. Developed over the course of a year within the context of a Master’s in Computer Science, the talk presents a methodological case study on creating a sophisticated game bot from scratch. Strategy games like Antiyoy present a unique hurdle for standard Reinforcement Learning: they aren't just about quick reflexes; they require long-term economic planning and spatial reasoning on a grid where every cell has six neighbors instead of four. The talk answers the question of how to move from a raw game engine to a standardized Python environment that stable learning algorithms can actually use.

The presentation is divided into three thematic explorations:

Part 1: How do we translate a world of hexagons into a language of tensors? The first challenge of any RL project is the interface. The talk describes the process of wrapping game logic into a Gymnasium-compatible environment. You will hear about the design decisions behind a 91-channel observation space—essentially "images" that encode everything from unit positions to economic health. We will also address the "spiral" mathematics of hexagonal grids: how do we efficiently map 2D coordinates to a discrete action space that a model can output?

Part 2: How can a model learn to navigate 17,000 possible choices? With an action space significantly larger than that of chess, a random agent will almost never find a winning move by accident. The talk explores how we use "action masking" to prevent the model from wasting time on illegal moves and how the architecture of the neural network—splitting its focus between predicting the "value" of a board and the "policy" of a move—allows it to develop a sense of strategy. We will discuss the trade-offs between lightweight models and deep residual networks when working with limited hardware.
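The "spiral" indexing mentioned in Part 1 can be sketched roughly as follows. This is a common textbook scheme (center cell first, then concentric rings walked with the six axial directions); the choice of axial coordinates and the walk order are assumptions for illustration, not necessarily the talk's exact mapping.

```python
# Assumed axial (q, r) coordinates; the six neighbor directions of a hex cell.
AXIAL_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def spiral_cells(radius):
    """Enumerate (q, r) coordinates in spiral order: center, then ring 1, ring 2, ..."""
    cells = [(0, 0)]
    for ring in range(1, radius + 1):
        # Each ring starts at direction 4 scaled by the ring number,
        # then walks `ring` steps along each of the six sides.
        q, r = AXIAL_DIRS[4][0] * ring, AXIAL_DIRS[4][1] * ring
        for side in range(6):
            for _ in range(ring):
                cells.append((q, r))
                q, r = q + AXIAL_DIRS[side][0], r + AXIAL_DIRS[side][1]
    return cells

cells = spiral_cells(2)
index_of = {c: i for i, c in enumerate(cells)}  # (q, r) -> flat action index
# A board of radius R holds 1 + 3*R*(R+1) cells: 19 for R = 2.
```

Once every cell has a stable flat index, per-cell actions can be laid out as contiguous blocks of a single discrete action space, which is what a policy head ultimately outputs.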
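The action-masking idea from Part 2 reduces, mechanically, to forcing illegal actions to zero probability before sampling. A minimal sketch (the ~17,000 figure is taken from the talk; the stand-in logits and the legal mask below are made-up illustrations, not the project's actual network output):

```python
import math

def masked_policy(logits, legal_mask):
    """Send illegal actions' logits to -inf so softmax assigns them zero mass."""
    masked = [x if ok else float("-inf") for x, ok in zip(logits, legal_mask)]
    m = max(masked)                                 # stabilize the softmax
    exps = [math.exp(x - m) for x in masked]        # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

n_actions = 17_000
logits = [math.sin(i) for i in range(n_actions)]    # stand-in for network output
legal = [i < 5 for i in range(n_actions)]           # pretend only 5 moves are legal
probs = masked_policy(logits, legal)
# Illegal moves get exactly zero probability; the legal five sum to 1.
```

The same trick applies during training: masked logits keep the policy gradient from rewarding moves the engine would reject anyway.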
Part 3: What does emergent strategy look like in a trained agent? The final part of the talk focuses on the training process itself. How do we provide feedback to an agent when the only thing that truly matters is winning or losing thirty minutes later? You will hear about "hybrid reward shaping"—a way to give the agent small hints about territory and economy without distracting it from the ultimate goal. The talk concludes with an analysis of the results: identifying the moment the agent stopped making random moves and started demonstrating recognizable human-like tactics, as well as the limitations we discovered along the way.
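The "hybrid reward shaping" idea from Part 3 can be sketched as a sparse terminal win/loss signal plus small per-step hints. The field names and the 0.01 / 0.005 weights below are assumptions for illustration, not the talk's actual coefficients:

```python
def step_reward(prev, curr, done, won):
    """Sparse terminal outcome plus small shaped hints about territory and economy."""
    shaped = (0.01 * (curr["territory"] - prev["territory"])
              + 0.005 * (curr["income"] - prev["income"]))
    terminal = (1.0 if won else -1.0) if done else 0.0  # the signal that truly matters
    return terminal + shaped

# Gaining 2 tiles and 1 income mid-game yields only a small hint...
r_hint = step_reward({"territory": 10, "income": 3},
                     {"territory": 12, "income": 4}, done=False, won=False)
# ...while the terminal outcome dominates the return.
r_win = step_reward({"territory": 12, "income": 4},
                    {"territory": 12, "income": 4}, done=True, won=True)
```

Keeping the shaping weights small relative to the terminal reward is what stops the hints from "distracting" the agent from the ultimate goal of winning.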