    Reinforcement Learning Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    106 models available

    Showing 25–48 of 106 models

    Reinforcement Learning

    mradermacher/MediX-R1-8B-i1-GGUF

    2K
    1
    transformers
    Reinforcement Learning

    ValueFX9507/Tifa-DeepsexV3-14b-GGUF-Q6

    1K
    42
    transformers
    Reinforcement Learning

    mradermacher/KnowRL-Nemotron-1.5B-i1-GGUF

    1K
    transformers
    Reinforcement Learning

    ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q4

    1K
    224
    transformers
    Reinforcement Learning

    mradermacher/LiteResearcher-4B-i1-GGUF

    1K
    transformers
    Reinforcement Learning

    mradermacher/Vero-Qwen3T-8B-GGUF

    1K
    transformers
    Reinforcement Learning

    mradermacher/TutorAI-Chemistry-Phi4-GGUF

    1K
    transformers
    Reinforcement Learning

    mradermacher/Agent-STAR-RL-7B-i1-GGUF

    989
    1
    transformers
    Reinforcement Learning

    mradermacher/Vero-MiMo-7B-GGUF

    985
    1
    transformers
    Reinforcement Learning

    mradermacher/Miner-4B-GGUF

    985
    transformers
    Reinforcement Learning

    mrinaalarora/wordle-grpo-Qwen3-1.7B

    980
    transformers
    Reinforcement Learning

    sb3/sac-BipedalWalker-v3

    977
    stable-baselines3
    Reinforcement Learning

    PKU-Alignment/beaver-7b-v1.0-reward

    958
    17
    safe-rlhf
    Reinforcement Learning

    mradermacher/VeriReason-Qwen2.5-7b-SFT-Reasoning-i1-GGUF

    948
    1
    transformers
    Reinforcement Learning

    mradermacher/GCIRS-Reasoning-1.5B-R1-i1-GGUF

    936
    transformers
    Reinforcement Learning

    mradermacher/AutoBM-Seed-Coder-8B-R-GGUF

    912
    transformers
    Reinforcement Learning

    mradermacher/SEOcrate-4B_grpo_new_01-i1-GGUF

    900
    transformers
    Reinforcement Learning

    igpaub/ppo-CarRacing-v2

    895
    stable-baselines3
    Reinforcement Learning

    mradermacher/Metis-8B-RL-GGUF

    828
    1
    transformers
    Reinforcement Learning

    mradermacher/ToolOmni-Qwen3-4B-GGUF

    796
    1
    transformers
    Reinforcement Learning

    mradermacher/Vero-Qwen25-7B-GGUF

    764
    transformers
    Reinforcement Learning

    PKU-Alignment/beaver-7b-unified-cost

    756
    2
    safe-rlhf
    Reinforcement Learning

    RLinf/RLinf-OpenVLAOFT-LIBERO-130

    750
    3
    Reinforcement Learning

    mradermacher/GCIRS-Reasoning-1.5B-R1-GGUF

    748
    1
    transformers
    Page 2 of 5