    Reinforcement Learning Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    106 models available

    Showing 4972 of 106 models

    Reinforcement Learning

    mradermacher/Dynamical-30B-A3B-GGUF

    687
    transformers
    Reinforcement Learning

    mradermacher/MINT-empathy-Qwen3-4B-GGUF

    674
    1
    transformers
    Reinforcement Learning

    Arijit-07/aria-devops-llama3b

    671
    Reinforcement Learning

    mradermacher/Tifa-Deepsex-14b-CoT-GGUF

    667
    23
    transformers
    Reinforcement Learning

    mradermacher/Vero-Qwen3I-8B-GGUF

    663
    transformers
    Reinforcement Learning

    mradermacher/Aryabhata-1.0-GGUF

    646
    1
    transformers
    Reinforcement Learning

    mradermacher/GPRM-4B-GGUF

    635
    transformers
    Reinforcement Learning

    mradermacher/LongWriter-Zero-32B-GGUF

    625
    3
    transformers
    Reinforcement Learning

    mradermacher/AReaL-SEA-235B-A22B-GGUF

    611
    transformers
    Reinforcement Learning

    Abc8264/TutorAI-Chemistry-Phi4

    588
    1
    Reinforcement Learning

    mradermacher/DeepHermes-Egregore-8B-131K-i1-GGUF

    564
    1
    transformers
    Reinforcement Learning

    THU-KEG/LLaDA-8B-BGPO-countdown

    551
    1
    Reinforcement Learning

    mradermacher/LiteResearcher-4B-GGUF

    549
    transformers
    Reinforcement Learning

    mradermacher/Agent-STAR-RL-3B-i1-GGUF

    534
    transformers
    Reinforcement Learning

    mradermacher/PRIMO-COT-SFT-7B-GGUF

    529
    transformers
    Reinforcement Learning

    mradermacher/Agent-STAR-RL-1.5B-i1-GGUF

    526
    transformers
    Reinforcement Learning

    mradermacher/PRIMO-R1-7B-GGUF

    511
    transformers
    Reinforcement Learning

    ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-F16

    504
    91
    transformers
    Reinforcement Learning

    PKU-Alignment/beaver-7b-unified-reward

    502
    safe-rlhf
    Reinforcement Learning

    mradermacher/KnowRL-Nemotron-1.5B-GGUF

    501
    transformers
    Reinforcement Learning

    mradermacher/VeriReason-Qwen2.5-7b-RTLCoder-Verilog-GRPO-reasoning-tb-i1-GGUF

    494
    4
    transformers
    Reinforcement Learning

    Ding-Qiang/ppo-CarRacing-v2

    481
    stable-baselines3
    Reinforcement Learning

    mradermacher/RLinf-math-7B-i1-GGUF

    478
    1
    transformers
    Reinforcement Learning

    mradermacher/LongWriter-Zero-32B-i1-GGUF

    475
    2
    transformers