memorax.algorithms
Reinforcement learning algorithms for training agents.
PPO
PPO - Proximal Policy Optimization for discrete and continuous action spaces.
PPOConfig - Configuration dataclass for PPO.
PPOState - Training state for PPO.
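For orientation, the clipped surrogate objective that gives PPO its name can be sketched in a few lines of plain Python. This is a minimal illustration of the general technique, not memorax's implementation; the function name `ppo_clip_objective` and the `clip_eps` parameter are hypothetical and do not correspond to any documented `PPOConfig` field.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    ratios: new_policy_prob / old_policy_prob for each sampled action.
    advantages: advantage estimate for each sampled action.
    clip_eps: hypothetical name for the clipping range.
    """
    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


# A ratio of 1.5 with a positive advantage is capped at 1.2, so the
# policy gains nothing from moving further away from the old policy.
objective = ppo_clip_objective([1.5, 0.5], [1.0, -1.0])
```

Taking the minimum of the clipped and unclipped terms makes the objective a lower bound on the unclipped surrogate, which is what discourages overly large policy updates.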
MAPPO
MAPPO - Multi-Agent PPO for multi-agent environments.
MAPPOConfig - Configuration dataclass for MAPPO.
MAPPOState - Training state for MAPPO.
DQN
DQN - Deep Q-Network with double Q-learning.
DQNConfig - Configuration dataclass for DQN.
DQNState - Training state for DQN.
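As a reminder of what "double Q-learning" refers to, the sketch below computes a single bootstrapped target in plain Python. It illustrates the general technique only and is not taken from memorax; `double_q_target` and its parameters are hypothetical names.

```python
def double_q_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double Q-learning target for one transition.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    # The online network selects the greedy next action...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...and the target network evaluates it. Terminal states do not bootstrap.
    bootstrap = 0.0 if done else next_q_target[best_action]
    return reward + gamma * bootstrap


# The online network prefers action 1, so the target network's value for
# action 1 (2.0) is used, even though its own maximum is 5.0.
target = double_q_target(1.0, False, [0.1, 0.9], [5.0, 2.0], gamma=0.5)
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of the plain max-based target.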
R2D2
R2D2 - Recurrent Replay Distributed DQN (recurrent experience replay in distributed RL).
R2D2Config - Configuration dataclass for R2D2.
R2D2State - Training state for R2D2.
SAC
SAC - Soft Actor-Critic for continuous control.
SACConfig - Configuration dataclass for SAC.
SACState - Training state for SAC.
PQN
PQN - Parallelised Q-Network (on-policy Q-learning).
PQNConfig - Configuration dataclass for PQN.
PQNState - Training state for PQN.
StreamAC
StreamAC - Actor-Critic with eligibility traces.
StreamACConfig - Configuration dataclass for StreamAC.
StreamACState - Training state for StreamAC.
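The idea behind eligibility traces can be sketched with a tabular TD(λ) value update in plain Python. This is a generic illustration under simplifying assumptions (tabular values, accumulating traces, critic only); it does not reproduce how StreamAC is implemented in memorax, and `td_lambda_step` and its parameters are hypothetical names.

```python
def td_lambda_step(values, traces, state, reward, next_state, done,
                   alpha=0.1, gamma=0.99, lam=0.9):
    """One online TD(lambda) update with accumulating traces (in place)."""
    # Decay every trace, then mark the state that was just visited.
    for s in range(len(traces)):
        traces[s] *= gamma * lam
    traces[state] += 1.0

    # One-step TD error; terminal transitions do not bootstrap.
    next_value = 0.0 if done else values[next_state]
    td_error = reward + gamma * next_value - values[state]

    # Every state is updated in proportion to its remaining eligibility.
    for s in range(len(values)):
        values[s] += alpha * td_error * traces[s]
    return td_error


values, traces = [0.0, 0.0], [0.0, 0.0]
td_lambda_step(values, traces, 0, 1.0, 1, False, alpha=0.5, gamma=1.0, lam=1.0)
# The second update also credits state 0, which is still eligible.
td_lambda_step(values, traces, 1, 1.0, 0, True, alpha=0.5, gamma=1.0, lam=1.0)
```

Because the trace vector carries credit backwards in time, each transition can be processed as it arrives and then discarded, without a replay buffer.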
GradientPPO
GradientPPO - PPO with gradient eligibility traces.
GradientPPOConfig - Configuration dataclass for GradientPPO.
GradientPPOState - Training state for GradientPPO.