Glossary · Technical concept
RLHF (Reinforcement Learning from Human Feedback)
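A fine-tuning technique in which human evaluators compare or rank model outputs and the model is then trained to produce outputs that rank higher. In the common setup, the rankings are first used to train a reward model, and the language model is then optimized against that reward with a reinforcement learning algorithm such as PPO. Used heavily in modern LLM alignment. It introduces dependencies on rater quality, rater demographics, and the rater pool's biases: alignment is only as good as the feedback signal.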
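As a rough illustration of how a ranking becomes a training signal, the sketch below computes a pairwise (Bradley-Terry style) preference loss of the kind often used to fit a reward model. This is one common choice, not something this entry prescribes; the function name `preference_loss` and the scalar reward inputs are hypothetical and chosen only for illustration.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-probability that the rater-preferred output wins.

    Assumes a Bradley-Terry model: P(chosen beats rejected) = sigmoid(margin),
    where margin is the difference between the two scalar reward scores.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Reward model agrees with the rater: small loss.
agree = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
# Reward model disagrees with the rater: large loss, so training moves the
# scores toward the rater's label, whether or not that label was a good one.
disagree = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
```

Because the loss only measures agreement with the rater's label, a noisy or biased label is fitted just as readily as a careful one, which is the dependency on the feedback signal described above.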
Framework references
- NIST AI 600-1