Identifier
Model ID: PKU-Alignment/beaver-7b-unified-reward
Tags: safe-rlhf, safetensors, llama, reinforcement-learning-from-human-feedback, reinforcement-learning, beaver, safety, ai-safety, deepspeed, rlhf, alpaca, en, dataset:PKU-Alignment/PKU-SafeRLHF, arxiv:2302.13971, arxiv:2307.04657, arxiv:2310.12773, region:us
Use beaver-7b-unified-reward on Mixpeek
Build multimodal processing pipelines with this model and others. Extract features, run inference, and set up retrieval, all through the Mixpeek pipeline builder.
Specification
Organization: PKU-Alignment
Task: Reinforcement Learning
Library: safe-rlhf
Downloads/mo: 502
View on HuggingFace
See model card, files, and community discussion