Evaluator Classes#

The (pseudo) log-likelihood-based approaches derive from the Evaluator class, which implements most of the shared functionality. To create an evaluator instance, use Evaluator.from_model.
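
For example, a minimal sketch of creating an evaluator (the import path, the model identifier, and the model_type value are illustrative assumptions; see from_model below for the accepted arguments):

from lm_pub_quiz import Evaluator  # import path assumed

# Passing model_type explicitly (e.g. "MLM" for masked or "CLM" for causal
# models; both values are assumptions) avoids relying on auto-detection.
evaluator = Evaluator.from_model("bert-base-cased", model_type="MLM")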

Evaluator #

Evaluator(*, conditional_score: bool = False, **kwargs)

Base class for PLL-based evaluation classes.

Use Evaluator.from_model to create a suitable model-specific Evaluator instance.

Methods:

Name                  Description
encode                Encode the statements using the tokenizer and create an appropriate scoring mask.
evaluate_dataset      Evaluate the model on all relations in the dataset.
from_model            Create an evaluator instance for the given model.
replace_placeholders  Replace all placeholders in the template with the respective values.
score_answers         Calculate sequence scores using the Causal Language Model.
score_statements      Compute the PLL score for the tokens (determined by the scoring mask) in the statements.

encode abstractmethod #

encode(
    statements: Sequence[str],
    span_roles: Sequence[SpanRoles],
) -> tuple[BatchEncoding, Sequence[ScoringMask]]

Encode the statements using the tokenizer and create an appropriate scoring mask.

In case the conditional scores need to be created, set the scoring mask accordingly.

evaluate_dataset #

evaluate_dataset(
    dataset: Dataset,
    template_index: int = 0,
    *,
    batch_size: int = 1,
    subsample: Optional[int] = None,
    save_path: Optional[PathLike] = None,
    fmt: InstanceTableFileFormat = None,
    reduction: Optional[str] = "default",
    create_instance_table: bool = True,
    metric: Optional[MultiMetricSpecification] = None
) -> DatasetResults

Evaluate the model on all relations in the dataset.
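
A hedged sketch of a typical call, continuing the example above; how the Dataset object is obtained is not part of this class, and Dataset.from_name as well as the dataset name are assumptions:

from lm_pub_quiz import Dataset  # import path assumed

dataset = Dataset.from_name("BEAR")  # loading helper and name are assumptions

# Evaluate on all relations; subsample limits the instances per relation, and
# save_path/fmt control whether and how per-instance tables are written out.
results = evaluator.evaluate_dataset(
    dataset,
    template_index=0,
    batch_size=32,
    subsample=100,
    save_path="results/",
)
# results is a DatasetResults object.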

from_model classmethod #

from_model(
    model: Union[str, PreTrainedModel],
    model_type: Optional[str] = None,
    **kw,
) -> Evaluator

Create an evaluator instance for the given model.

In some cases, the model type can be derived from the model itself. To ensure the right type is chosen, it's recommended to set model_type manually.

Parameters:

Name        Type                   Description                                                     Default
model       str | PreTrainedModel  The model to evaluate.                                          required
model_type  str | None             The type of model (determines the scoring scheme to be used).  None

Returns:

Name       Type       Description
Evaluator  Evaluator  The evaluator instance suitable for the model.

replace_placeholders #

replace_placeholders(
    *,
    template: str,
    subject: Optional[str],
    answer: Optional[str]
) -> tuple[str, SpanRoles]

Replace all placeholders in the template with the respective values.

Parameters:

Name      Type           Description                                                   Default
template  str            The template string with appropriate placeholders.           required
subject   Optional[str]  The subject label to fill in at the respective placeholder.  required
answer    Optional[str]  The answer span to fill in.                                   required

Returns:

Type                   Description
tuple[str, SpanRoles]  The final string as well as the spans of the respective elements in the final string.
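
A sketch of the substitution behaviour; the [Y] answer marker is the one mentioned under score_answers, while [X] as the subject marker is an assumption:

# evaluator created via Evaluator.from_model as in the first sketch
statement, spans = evaluator.replace_placeholders(
    template="[X] was born in [Y].",  # hypothetical template
    subject="Marie Curie",
    answer="Warsaw",
)
# statement == "Marie Curie was born in Warsaw."
# spans records where the subject and answer landed in the final string.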

score_answers #

score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: None,
    subject: Optional[str] = None,
    batch_size: int = 1
) -> EachTokenReturnFormat
score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: str,
    subject: Optional[str] = None,
    batch_size: int = 1
) -> ReducedReturnFormat
score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: Optional[str],
    subject: Optional[str] = None,
    batch_size: int = 1
) -> Union[EachTokenReturnFormat, ReducedReturnFormat]

Calculate sequence scores using the Causal Language Model.

Parameters:

Name      Type       Description                                         Default
template  str        The template to use (should contain a [Y] marker).  required
answers   list[str]  List of answers to calculate the score for.         required

Returns:

Type                                               Description
Union[EachTokenReturnFormat, ReducedReturnFormat]  list[float]: List of surprisal scores per sequence.
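
A sketch of scoring several candidate answers for one template; "sum" as a reduction value is an assumption, and passing reduction=None returns per-token scores instead:

# evaluator created via Evaluator.from_model as in the first sketch
scores = evaluator.score_answers(
    template="The capital of France is [Y].",
    answers=["Paris", "London", "Rome"],
    reduction="sum",  # assumed value; None yields per-token scores
    batch_size=8,
)
# scores contains one value per candidate answer (see the return description).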

score_statements abstractmethod #

score_statements(
    batched_statements: BatchEncoding,
    *,
    scoring_masks: Optional[Sequence[ScoringMask]],
    batch_size: int = 1
) -> list[list[float]]

Compute the PLL score for the tokens (determined by the scoring mask) in the statements.

This function must be implemented by child classes for each model type.
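
Purely as an illustration of what such an implementation computes, the sketch below sums the token log-probabilities selected by each scoring mask for a causal model. It is not the library's code: padding handling is omitted and every name is an assumption.

import torch

def score_statements_sketch(model, batch, scoring_masks):
    """Illustrative only: per-statement log-probabilities of the scored tokens."""
    with torch.no_grad():
        logits = model(**batch).logits            # (num_statements, seq_len, vocab)
        log_probs = torch.log_softmax(logits, dim=-1)
    scores = []
    for i, mask in enumerate(scoring_masks):
        token_ids = batch["input_ids"][i]
        # For a causal LM, the probability of token t is read off position t - 1.
        per_token = log_probs[i, :-1].gather(1, token_ids[1:, None]).squeeze(1)
        scores.append(
            [per_token[t - 1].item() for t, selected in enumerate(mask) if selected and t > 0]
        )
    return scores  # list[list[float]], one inner list per statement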

MaskedLMEvaluator #

MaskedLMEvaluator(
    *, pll_metric: str = "within_word_l2r", **kw
)
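
Assuming the keyword arguments of from_model are forwarded to the constructor (suggested by its **kw parameter, but an assumption here), the PLL variant can be selected like this:

# Evaluator imported as in the first sketch; the model id is illustrative.
mlm_evaluator = Evaluator.from_model(
    "bert-base-cased",
    model_type="MLM",              # assumed value
    pll_metric="within_word_l2r",  # documented default, shown explicitly
)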

Methods:

Name                  Description
create_masked_batch   Extend the existing batch and mask the relevant tokens based on the scoring mask.
encode                Encode the statements using the tokenizer and create an appropriate scoring mask.
evaluate_dataset      Evaluate the model on all relations in the dataset.
from_model            Create an evaluator instance for the given model.
mask_to_indices       Transform the scoring mask to a list of indices.
replace_placeholders  Replace all placeholders in the template with the respective values.
score_answers         Calculate sequence scores using the Causal Language Model.

Attributes:

Name        Type  Description
mask_token  int   Return the mask token id used by the tokenizer.

mask_token property #

mask_token: int

Return the mask token id used by the tokenizer.

create_masked_batch #

create_masked_batch(
    batch: BatchEncoding,
    scoring_masks: Sequence[ScoringMask],
) -> BatchEncoding

Extend the existing batch and mask the relevant tokens based on the scoring mask.

encode #

encode(
    statements: Sequence[str],
    span_roles: Sequence[SpanRoles],
) -> tuple[BatchEncoding, Sequence[ScoringMask]]

Encode the statements using the tokenizer and create an appropriate scoring mask.

In case the conditional scores need to be created, set the scoring mask accordingly.

evaluate_dataset #

evaluate_dataset(
    dataset: Dataset,
    template_index: int = 0,
    *,
    batch_size: int = 1,
    subsample: Optional[int] = None,
    save_path: Optional[PathLike] = None,
    fmt: InstanceTableFileFormat = None,
    reduction: Optional[str] = "default",
    create_instance_table: bool = True,
    metric: Optional[MultiMetricSpecification] = None
) -> DatasetResults

Evaluate the model on all relations in the dataset.

from_model classmethod #

from_model(
    model: Union[str, PreTrainedModel],
    model_type: Optional[str] = None,
    **kw,
) -> Evaluator

Create an evaluator instance for the given model.

In some cases, the model type can be derived from the model itself. To ensure the right type is chosen, it's recommended to set model_type manually.

Parameters:

Name        Type                   Description                                                     Default
model       str | PreTrainedModel  The model to evaluate.                                          required
model_type  str | None             The type of model (determines the scoring scheme to be used).  None

Returns:

Name       Type       Description
Evaluator  Evaluator  The evaluator instance suitable for the model.

mask_to_indices #

mask_to_indices(
    scoring_masks: Sequence[ScoringMask],
) -> list[Tensor]

Transform the scoring mask to a list of indices.
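
Conceptually this is a boolean-mask-to-index conversion; a minimal sketch of the equivalent operation (not the method's actual implementation):

import torch

scoring_mask = torch.tensor([False, True, True, False, True])
indices = scoring_mask.nonzero(as_tuple=True)[0]  # tensor([1, 2, 4])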

replace_placeholders #

replace_placeholders(
    *,
    template: str,
    subject: Optional[str],
    answer: Optional[str]
) -> tuple[str, SpanRoles]

Replace all placeholders in the template with the respective values.

Parameters:

Name      Type           Description                                                   Default
template  str            The template string with appropriate placeholders.           required
subject   Optional[str]  The subject label to fill in at the respective placeholder.  required
answer    Optional[str]  The answer span to fill in.                                   required

Returns:

Type                   Description
tuple[str, SpanRoles]  The final string as well as the spans of the respective elements in the final string.

score_answers #

score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: None,
    subject: Optional[str] = None,
    batch_size: int = 1
) -> EachTokenReturnFormat
score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: str,
    subject: Optional[str] = None,
    batch_size: int = 1
) -> ReducedReturnFormat
score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: Optional[str],
    subject: Optional[str] = None,
    batch_size: int = 1
) -> Union[EachTokenReturnFormat, ReducedReturnFormat]

Calculate sequence scores using the Causal Language Model.

Parameters:

Name      Type       Description                                         Default
template  str        The template to use (should contain a [Y] marker).  required
answers   list[str]  List of answers to calculate the score for.         required

Returns:

Type                                               Description
Union[EachTokenReturnFormat, ReducedReturnFormat]  list[float]: List of surprisal scores per sequence.

CausalLMEvaluator #

CausalLMEvaluator(
    *, conditional_score: bool = False, **kwargs
)
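
Assuming from_model forwards keyword arguments to the constructor, conditional scoring (which, judging by the encode docstring, restricts the scoring mask) could presumably be enabled like this; the model id and model_type value are illustrative:

# Evaluator imported as in the first sketch.
clm_evaluator = Evaluator.from_model(
    "gpt2",
    model_type="CLM",        # assumed value
    conditional_score=True,  # restrict scoring to the conditional part
)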

Methods:

Name                  Description
encode                Encode the statements using the tokenizer and create an appropriate scoring mask.
evaluate_dataset      Evaluate the model on all relations in the dataset.
from_model            Create an evaluator instance for the given model.
replace_placeholders  Replace all placeholders in the template with the respective values.
score_answers         Calculate sequence scores using the Causal Language Model.

encode #

encode(
    statements: Sequence[str],
    span_roles: Sequence[SpanRoles],
) -> tuple[BatchEncoding, Sequence[ScoringMask]]

Encode the statements using the tokenizer and create an appropriate scoring mask.

In case the conditional scores need to be created, set the scoring mask accordingly.

evaluate_dataset #

evaluate_dataset(
    dataset: Dataset,
    template_index: int = 0,
    *,
    batch_size: int = 1,
    subsample: Optional[int] = None,
    save_path: Optional[PathLike] = None,
    fmt: InstanceTableFileFormat = None,
    reduction: Optional[str] = "default",
    create_instance_table: bool = True,
    metric: Optional[MultiMetricSpecification] = None
) -> DatasetResults

Evaluate the model on all relations in the dataset.

from_model classmethod #

from_model(
    model: Union[str, PreTrainedModel],
    model_type: Optional[str] = None,
    **kw,
) -> Evaluator

Create an evaluator instance for the given model.

In some cases, the model type can be derived from the model itself. To ensure the right type is chosen, it's recommended to set model_type manually.

Parameters:

Name        Type                   Description                                                     Default
model       str | PreTrainedModel  The model to evaluate.                                          required
model_type  str | None             The type of model (determines the scoring scheme to be used).  None

Returns:

Name       Type       Description
Evaluator  Evaluator  The evaluator instance suitable for the model.

replace_placeholders #

replace_placeholders(
    *,
    template: str,
    subject: Optional[str],
    answer: Optional[str]
) -> tuple[str, SpanRoles]

Replace all placeholders in the template with the respective values.

Parameters:

Name      Type           Description                                                   Default
template  str            The template string with appropriate placeholders.           required
subject   Optional[str]  The subject label to fill in at the respective placeholder.  required
answer    Optional[str]  The answer span to fill in.                                   required

Returns:

Type                   Description
tuple[str, SpanRoles]  The final string as well as the spans of the respective elements in the final string.

score_answers #

score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: None,
    subject: Optional[str] = None,
    batch_size: int = 1
) -> EachTokenReturnFormat
score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: str,
    subject: Optional[str] = None,
    batch_size: int = 1
) -> ReducedReturnFormat
score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: Optional[str],
    subject: Optional[str] = None,
    batch_size: int = 1
) -> Union[EachTokenReturnFormat, ReducedReturnFormat]

Calculate sequence scores using the Causal Language Model.

Parameters:

Name      Type       Description                                         Default
template  str        The template to use (should contain a [Y] marker).  required
answers   list[str]  List of answers to calculate the score for.         required

Returns:

Type                                               Description
Union[EachTokenReturnFormat, ReducedReturnFormat]  list[float]: List of surprisal scores per sequence.