
LLM - Evaluation

Evaluation

overall

Sample

LlamaIndex

from deepeval.integrations.llama_index import (
    DeepEvalAnswerRelevancyEvaluator,
    DeepEvalFaithfulnessEvaluator,
    DeepEvalContextualRelevancyEvaluator,
    DeepEvalSummarizationEvaluator,
    DeepEvalBiasEvaluator,
    DeepEvalToxicityEvaluator,
)


Evaluating Response Faithfulness (i.e. Hallucination)

  • The FaithfulnessEvaluator evaluates whether the answer is faithful to the retrieved contexts (in other words, whether there is any hallucination).
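To make the idea concrete, here is a toy sketch of a faithfulness check. It is not the LlamaIndex implementation: the real evaluator asks an LLM to judge, whereas this sketch swaps in a crude word-containment heuristic so that it runs without any model. All names here (`naive_judge`, `check_faithfulness`, `FaithfulnessResult`) are hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


def _tokens(text: str) -> set:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def naive_judge(answer: str, context: str) -> bool:
    """Hypothetical stand-in for an LLM judge.

    Treats the answer as supported only if every word in it
    also appears in the context. A real judge would be an LLM.
    """
    answer_tokens = _tokens(answer)
    return bool(answer_tokens) and answer_tokens <= _tokens(context)


@dataclass
class FaithfulnessResult:
    passing: bool                    # True if at least one context supports the answer
    supporting_contexts: List[int]   # indices of the contexts that support it


def check_faithfulness(
    answer: str,
    contexts: List[str],
    judge: Callable[[str, str], bool] = naive_judge,
) -> FaithfulnessResult:
    """Ask the judge about each retrieved context and collect the supporting ones."""
    supporting = [i for i, ctx in enumerate(contexts) if judge(answer, ctx)]
    return FaithfulnessResult(passing=bool(supporting), supporting_contexts=supporting)


contexts = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
]

grounded = check_faithfulness("The capital of France is Paris.", contexts)
hallucinated = check_faithfulness("The capital of France is Lyon.", contexts)

print(grounded)      # supported by the first context
print(hallucinated)  # "Lyon" appears in no context, so the check fails
```

The `judge` parameter is pluggable so that the heuristic can be replaced by a call to an actual model; the surrounding pass/fail logic stays the same.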


Evaluating Query + Response Relevancy

  • The RelevancyEvaluator evaluates whether the retrieved context and the answer are relevant to, and consistent with, the given query.
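The relevancy check can be illustrated in the same toy fashion. This sketch is an assumption-laden stand-in, not the LlamaIndex evaluator: instead of an LLM judgement it measures how many of the query's content words show up in the answer and in each retrieved context. The names `content_words`, `check_relevancy`, `STOPWORDS`, and the `min_overlap` threshold are all hypothetical.

```python
import re
from typing import Dict, List

# Small illustrative stopword list; a real system would not rely on this.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "to",
    "what", "which", "who", "when", "where", "how", "did", "does", "do",
}


def content_words(text: str) -> set:
    """Lowercase word tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def check_relevancy(
    query: str,
    answer: str,
    contexts: List[str],
    min_overlap: float = 0.5,
) -> Dict:
    """Score the answer and each context by the share of query words they cover.

    The check passes only if the answer AND at least one context
    reach the (hypothetical) min_overlap threshold.
    """
    query_words = content_words(query)

    def overlap(text: str) -> float:
        if not query_words:
            return 0.0
        return len(query_words & content_words(text)) / len(query_words)

    context_scores = [overlap(ctx) for ctx in contexts]
    answer_score = overlap(answer)
    passing = answer_score >= min_overlap and any(
        score >= min_overlap for score in context_scores
    )
    return {
        "passing": passing,
        "answer_score": answer_score,
        "context_scores": context_scores,
    }


query = "What is the capital of France?"
contexts = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
]

on_topic = check_relevancy(query, "The capital of France is Paris.", contexts)
off_topic = check_relevancy(query, "The Eiffel Tower was completed in 1889.", contexts)

print(on_topic)
print(off_topic)
```

Scoring each context separately also shows which retrieved passages actually relate to the query, which is useful when debugging the retrieval step rather than the generation step.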


This post is licensed under CC BY 4.0 by the author.
