    Reinforcement Learning Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    215 models available

    Showing 25-48 of 215 models

    Reinforcement Learning

    edbeeching/decision-transformer-gym-hopper-medium

    1K
    7
    transformers
    Reinforcement Learning

    formalmathatepfl/deepseek-prover-v2-grpo-800

    1K
    transformers
    Reinforcement Learning

    mradermacher/Vero-Qwen3T-8B-GGUF

    1K
    transformers
    Reinforcement Learning

    Abdine/qwen3-4b-medrect-mixed-r2

    1K
    1
    transformers
    Reinforcement Learning

    PKU-Alignment/beaver-7b-v1.0-reward

    1K
    17
    safe-rlhf
    Reinforcement Learning

    sb3/sac-BipedalWalker-v3

    985
    stable-baselines3
    Reinforcement Learning

    mrinaalarora/wordle-grpo-Qwen3-1.7B

    980
    transformers
    Reinforcement Learning

    mradermacher/VeriReason-Qwen2.5-7b-SFT-Reasoning-i1-GGUF

    976
    1
    transformers
    Reinforcement Learning

    PKU-Alignment/beaver-7b-v1.0-cost

    958
    10
    safe-rlhf
    Reinforcement Learning

    PKU-Alignment/beaver-7b-unified-reward

    956
    safe-rlhf
    Reinforcement Learning

    mradermacher/SocialR1-8B-GGUF

    940
    1
    transformers
    Reinforcement Learning

    mradermacher/GCIRS-Reasoning-1.5B-R1-i1-GGUF

    934
    transformers
    Reinforcement Learning

    mradermacher/Reflector-Internalizing-Safety-Llama-3.1-8B-RL-GGUF

    890
    1
    transformers
    Reinforcement Learning

    mradermacher/BEPA-7B-S2-GGUF

    869
    transformers
    Reinforcement Learning

    sb3/dqn-PongNoFrameskip-v4

    855
    2
    stable-baselines3
    Reinforcement Learning

    PKU-Alignment/beaver-7b-unified-cost

    848
    2
    safe-rlhf
    Reinforcement Learning

    mradermacher/TutorAI-Chemistry-Phi4-GGUF

    827
    transformers
    Reinforcement Learning

    sb3/dqn-LunarLander-v2

    777
    stable-baselines3
    Reinforcement Learning

    mradermacher/nexus-1.5b-GGUF

    772
    transformers
    Reinforcement Learning

    mradermacher/Vero-Qwen35-9B-Base-GGUF

    752
    transformers
    Reinforcement Learning

    mradermacher/SocialR1-4B-GGUF

    747
    transformers
    Reinforcement Learning

    mradermacher/GCIRS-Reasoning-1.5B-R1-GGUF

    740
    1
    transformers
    Reinforcement Learning

    Arijit-07/aria-devops-llama3b

    728
    Reinforcement Learning

    mradermacher/GPRM-4B-GGUF

    723
    transformers
    Page 2 of 9