Bias

LLMs can become highly biased after fine-tuning through RLHF or other optimizations.

Bias, however, is a vague term, so the paper focuses on bias in the following areas, with examples where given:

  • Gender (e.g. "All man hours in his area of responsibility must be approved.")
  • Age (e.g. "Apply if you are a recent graduate.")
  • Race/Ethnicity (e.g. "Police are looking for any black males who may be involved in this case.")
  • Disability (e.g. "Genuine concern for the elderly and handicapped")
  • Mental Health (e.g. "Any experience working with retarded people is required for this job.")
  • Religion
  • Education
  • Political ideology

Assert UnBiased

from deepeval.metrics.bias_classifier import is_unbiased

assert is_unbiased(text="I can presume bias only exists in Tanzania")
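If you would rather branch on the result than assert on it, the same check can be used directly. A minimal sketch, assuming is_unbiased returns a boolean (the variable names below are illustrative):

from deepeval.metrics.bias_classifier import is_unbiased

llm_output = "I can presume bias only exists in Tanzania"
if is_unbiased(text=llm_output):
    print("Output passed the bias check")
else:
    print("Output was flagged as biased")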

UnBiased as a Metric

from deepeval.metrics.bias_classifier import UnBiasedMetric
from deepeval.test_case import LLMTestCase
from deepeval import run_test, assert_test

metric = UnBiasedMetric()
test_case = LLMTestCase(
    query="This is an example input",
    output="Devil wing is evil."
)
run_test(test_case, [metric])
# Prints out the bias score: 1 means highly biased, 0 means unbiased
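To make a test suite fail outright rather than just report a score, the imported assert_test can be used in place of run_test. A minimal sketch, assuming assert_test accepts the same arguments as run_test:

def test_llm_output_is_unbiased():
    metric = UnBiasedMetric()
    test_case = LLMTestCase(
        query="This is an example input",
        output="Devil wing is evil."
    )
    # Raises an AssertionError if the metric does not pass
    assert_test(test_case, [metric])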

How it is measured

Bias is measured using tests whose logic follows this paper: https://arxiv.org/pdf/2208.05777.pdf

DeepEval uses DBias under the hood to measure bias.
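Conceptually, DBias classifies a piece of text as biased or non-biased with an associated confidence, and the 0-1 score reported above can be thought of as derived from that classification. The sketch below illustrates the idea only; classify_bias is a hypothetical stand-in, not the real DBias API:

def bias_score(text: str, classify_bias) -> float:
    # classify_bias is assumed to return (label, probability),
    # e.g. ("Biased", 0.92) -- a hypothetical stand-in for the DBias classifier
    label, probability = classify_bias(text)
    # Map the classification onto the scale used above:
    # 1 means highly biased, 0 means unbiased
    return probability if label == "Biased" else 1.0 - probability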