Abstract: This licentiate thesis explores the application of deep reinforcement learning (DRL) to flow control around bluff bodies, focusing on reducing the drag force on infinite cylinders. The research spans a range of flow conditions, from laminar to fully turbulent, and aims to advance the state of the art in DRL by exploring scenarios not yet covered in the fluid-mechanics literature. We study the flow around cylinders in two and three dimensions over a range of Reynolds numbers Re_D, based on the freestream velocity U and the cylinder diameter D. We first consider a single-agent reinforcement learning (SARL) approach using the proximal policy optimization (PPO) algorithm, coupled with the Alya numerical solver. In a two-dimensional (2D) setting, this approach led to drag reductions of 20% and 17.7% for Re_D = 1000 and 2000, respectively. The framework was designed for deployment on high-performance computers, enabling large-scale training with synchronized numerical simulations.
Next, we focused on three-dimensional (3D) cylinders, where spanwise instabilities emerge for Re_D > 250. Drawing on studies such as Williamson (1996) and the findings of Tang et al. (2020), we explored control strategies for Re_D = 100 to 400 with a multi-agent reinforcement learning (MARL) framework. This approach focuses on local invariants, using multiple jets across the top and bottom surfaces of the cylinder. The MARL framework reduced drag by 21% and 16.5% for Re_D = 300 and 400, respectively, outperforming periodic-control strategies by 10 percentage points and doubling efficiency.
Finally, the framework was tested in a fully turbulent environment at Re_D = 3900, a well-established case in the literature. Despite the significant computational challenges and the complex flow structures, the MARL approach achieved an 8.3% drag reduction while using two orders of magnitude less actuation mass flow than Kim & Choi (2005).
Across these studies, the drag-reduction mechanisms learned by the agents alter the wake topology so as to attenuate the maximum Reynolds stresses and move their location upstream, with a focus on enlarging the recirculation bubble and reducing pressure drag.