
Evaluator

The Evaluator class is the central component that runs the evaluation of a model on a dataset. It uses a ModelInterface to score the options within a set of answers.

To create an Evaluator for a given model, use the Evaluator.from_model method; the appropriate ModelInterface subclass is then chosen automatically.

evaluator = Evaluator.from_model("gpt", model_type="CLM")
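As an illustration of how such a factory method can dispatch on the model type, here is a minimal self-contained sketch. The class names (CausalLMInterface, MaskedLMInterface) and the dispatch table are assumptions made for this example, not the library's actual implementation:

```python
from typing import Dict, Type

class ModelInterface:
    """Minimal stand-in for the library's ModelInterface base class."""
    def __init__(self, model_name: str):
        self.model_name = model_name

class CausalLMInterface(ModelInterface):
    """Hypothetical interface for causal ("CLM") models."""

class MaskedLMInterface(ModelInterface):
    """Hypothetical interface for masked ("MLM") models."""

def from_model(model_name: str, model_type: str) -> ModelInterface:
    """Select an interface class based on the model_type string."""
    interfaces: Dict[str, Type[ModelInterface]] = {
        "CLM": CausalLMInterface,
        "MLM": MaskedLMInterface,
    }
    if model_type not in interfaces:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return interfaces[model_type](model_name)
```

Dispatching through a lookup table keeps the factory open to new interface types without touching the selection logic.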

Evaluator

Evaluator(*, model_interface: ModelInterface, templater: Optional[Templater] = None)

Methods:

| Name | Description |
| --- | --- |
| evaluate_dataset | Evaluate the model on all relations in the dataset. |
| evaluate_item | Return the scores for each of the answer options. |

evaluate_dataset

evaluate_dataset(dataset: Dataset, template_index: Union[int, Sequence[int], None] = None, *, subsample: Optional[int] = None, save_path: Optional[PathLike] = None, fmt: InstanceTableFileFormat = None, create_instance_table: bool = True, metric: Optional[MultiMetricSpecification] = None, **kw) -> DatasetResults

Evaluate the model on all relations in the dataset.
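The broad flow can be sketched as follows. The data model below, a mapping of relation names to item lists plus a per-item scoring callback, is a simplification for illustration, not the library's actual Dataset or Item types:

```python
from typing import Callable, Dict, List, Optional, Sequence

# Simplified stand-ins: an item holds its answer options, and a relation is
# a named list of items (assumed shapes, not the library's actual classes).
Item = Dict[str, Sequence[str]]

def evaluate_dataset(
    relations: Dict[str, List[Item]],
    score_item: Callable[[Item], List[float]],
    subsample: Optional[int] = None,
) -> Dict[str, List[List[float]]]:
    """Score every item of every relation, optionally on a subsample."""
    results: Dict[str, List[List[float]]] = {}
    for name, items in relations.items():
        if subsample is not None:
            items = items[:subsample]  # naive subsampling for illustration
        results[name] = [score_item(item) for item in items]
    return results
```

In the real method, the per-item scores would additionally be aggregated into a DatasetResults object and optionally written to save_path.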

evaluate_item

evaluate_item(item: Item, *, template: Literal[None] = None, answers: Literal[None] = None, subject: Literal[None] = None, print_ranking: bool = False, **kw) -> Union[ItemScores, ItemTokenScoresAndRoles]
evaluate_item(item: Iterable[Item], *, template: Literal[None] = None, answers: Literal[None] = None, subject: Literal[None] = None, print_ranking: Literal[False] = False, **kw) -> Union[Iterator[ItemScores], Iterator[ItemTokenScoresAndRoles]]
evaluate_item(item: Literal[None] = None, *, template: str, answers: Sequence[str], subject: Optional[str] = None, print_ranking: bool = False, **kw) -> Union[ItemScores, ItemTokenScoresAndRoles]
evaluate_item(item: Union[None, Item, Iterator[Item]] = None, *, template: Optional[str] = None, answers: Optional[Sequence[str]] = None, subject: Optional[str] = None, print_ranking: bool = False, **kw) -> Union[ItemScores, ItemTokenScoresAndRoles, Iterable[ItemScores], Iterable[ItemTokenScoresAndRoles]]

Return the scores for each of the answer options.

This method must be implemented by each of the concrete Evaluator subclasses.
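To illustrate the template/answers calling convention, the sketch below fills each answer option into a template, scores the resulting text, and optionally prints a ranking. The [X]/[Y] placeholder scheme and the score_fn callback are assumptions made for this example; the real method delegates scoring to the ModelInterface:

```python
from typing import Callable, List, Optional, Sequence, Tuple

def evaluate_item(
    template: str,
    answers: Sequence[str],
    score_fn: Callable[[str], float],
    subject: Optional[str] = None,
    print_ranking: bool = False,
) -> List[Tuple[str, float]]:
    """Score each answer option after filling it into the template."""
    filled = template.replace("[X]", subject or "")
    scores = [(ans, score_fn(filled.replace("[Y]", ans))) for ans in answers]
    if print_ranking:
        # Higher score first, as a readable ranking of the options.
        ranked = sorted(scores, key=lambda pair: -pair[1])
        for rank, (ans, score) in enumerate(ranked, start=1):
            print(f"{rank}. {ans}: {score:.3f}")
    return scores
```

The returned list keeps the answers in their original order, one score per option, which matches the "scores for each of the answer options" contract described above.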