They reference ESA's research on "Guidance and Control Nets". ESA's page for its Advanced Concepts Team [0] in turn references [1] the University of Zurich's research on RL for drone control, specifically this 2023 paper: "Champion-level drone racing using deep reinforcement learning" [2]. The control policy there is a 2x128 MLP (two hidden layers of 128 units each).
[0] https://www.esa.int/gsp/ACT/
[1] https://www.esa.int/gsp/ACT/projects/rl_vs_imitation_learnin...
[2] https://www.zora.uzh.ch/id/eprint/257405/
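
To give a feel for how small a 2x128 MLP is, here is a minimal sketch of one in plain Python. It is not the paper's code: the input and output sizes (OBS_DIM, ACT_DIM), the leaky-ReLU and tanh activations, and the random initialization are all placeholder assumptions for illustration, and the weights are untrained.

```python
import math
import random

OBS_DIM = 31   # hypothetical observation size (assumption, not from the comment)
ACT_DIM = 4    # hypothetical action size (assumption, not from the comment)
HIDDEN = 128   # "2x128": two hidden layers with 128 units each


def init_layer(n_in, n_out, rng):
    """Return (weights, biases): small random weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Affine map: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def leaky_relu(v, slope=0.2):
    """Hidden activation (an assumed choice for this sketch)."""
    return [a if a > 0 else slope * a for a in v]


def make_policy(seed=0):
    """Build the layer stack OBS_DIM -> 128 -> 128 -> ACT_DIM."""
    rng = random.Random(seed)
    sizes = [OBS_DIM, HIDDEN, HIDDEN, ACT_DIM]
    return [init_layer(n_in, n_out, rng)
            for n_in, n_out in zip(sizes, sizes[1:])]


def policy_forward(policy, obs):
    """Map one observation vector to an action vector in [-1, 1]."""
    h = obs
    for layer in policy[:-1]:
        h = leaky_relu(linear(layer, h))
    # tanh squashes the output; rescaling to actuator limits is left out.
    return [math.tanh(a) for a in linear(policy[-1], h)]


def count_params(policy):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in policy)


policy = make_policy()
action = policy_forward(policy, [0.0] * OBS_DIM)
print(len(action), count_params(policy))
```

With these placeholder sizes the network comes to 21,124 parameters, most of them in the 128x128 hidden-to-hidden layer.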