Glossary · Technical concept

RLHF (Reinforcement Learning from Human Feedback)

A fine-tuning technique where human evaluators rank model outputs and the model is then trained to produce outputs that rank higher. Used heavily in modern LLM alignment. Introduces dependencies on rater quality, rater demographics, and the rater pool's biases — alignment is only as good as the feedback signal.

Framework references

NIST AI 600-1

Relevant Responsible AI Studio tools

/tools/ai-bias-audit
/tools/vendor-assessment

More technical concept terms

Bias (statistical / algorithmic) →
Drift (model) →
Embedding →
Fine-tuning →

See the full 80-term glossary →