
Evaluator

The Evaluator class is the central component that runs the evaluation of a model on a dataset. It uses a ModelInterface to score the options within a set of answers.

To create an Evaluator for a given model, use the Evaluator.from_model method; the appropriate ModelInterface subclass is then chosen automatically.

evaluator = Evaluator.from_model("gpt", model_type="CLM")
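As an illustration of how such a factory method can dispatch on the model type, here is a minimal self-contained sketch. The class names (CausalLMInterface, MaskedLMInterface) and the dispatch table are assumptions made for this example, not the library's actual implementation:

```python
from typing import Dict, Type

class ModelInterface:
    """Minimal stand-in for the library's ModelInterface base class."""
    def __init__(self, model_name: str):
        self.model_name = model_name

class CausalLMInterface(ModelInterface):
    """Hypothetical interface for causal ("CLM") models."""

class MaskedLMInterface(ModelInterface):
    """Hypothetical interface for masked ("MLM") models."""

def from_model(model_name: str, model_type: str) -> ModelInterface:
    """Select an interface class based on the model_type string."""
    interfaces: Dict[str, Type[ModelInterface]] = {
        "CLM": CausalLMInterface,
        "MLM": MaskedLMInterface,
    }
    if model_type not in interfaces:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return interfaces[model_type](model_name)
```

Dispatching through a lookup table keeps the factory open to new interface types without touching the selection logic.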

Evaluator

Evaluator(*, model_interface: ModelInterface, templater: Optional[Templater] = None)

Methods:

| Name | Description |
| --- | --- |
| evaluate_dataset | Evaluate the model on all relations in the dataset. |
| evaluate_item | Return the scores for each of the answer options. |

evaluate_dataset

evaluate_dataset(dataset: Dataset, template_index: Union[int, Sequence[int], None] = None, *, subsample: Optional[int] = None, save_path: Optional[PathLike] = None, fmt: InstanceTableFileFormat = None, create_instance_table: bool = True, metric: Optional[MultiMetricSpecification] = None, **kw) -> DatasetResults

Evaluate the model on all relations in the dataset.
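The broad flow can be sketched as follows. The data model below, a mapping of relation names to item lists plus a per-item scoring callback, is a simplification for illustration, not the library's actual Dataset or Item types:

```python
from typing import Callable, Dict, List, Optional, Sequence

# Simplified stand-ins: an item holds its answer options, and a relation is
# a named list of items (assumed shapes, not the library's actual classes).
Item = Dict[str, Sequence[str]]

def evaluate_dataset(
    relations: Dict[str, List[Item]],
    score_item: Callable[[Item], List[float]],
    subsample: Optional[int] = None,
) -> Dict[str, List[List[float]]]:
    """Score every item of every relation, optionally on a subsample."""
    results: Dict[str, List[List[float]]] = {}
    for name, items in relations.items():
        if subsample is not None:
            items = items[:subsample]  # naive subsampling for illustration
        results[name] = [score_item(item) for item in items]
    return results
```

In the real method, the per-item scores would additionally be aggregated into a DatasetResults object and optionally written to save_path.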

evaluate_item

evaluate_item(item: Item, *, template: Literal[None] = None, answers: Literal[None] = None, subject: Literal[None] = None, print_ranking: bool = False, **kw) -> Union[ItemScores, ItemTokenScoresAndRoles]
evaluate_item(item: Iterable[Item], *, template: Literal[None] = None, answers: Literal[None] = None, subject: Literal[None] = None, print_ranking: Literal[False] = False, **kw) -> Union[Iterator[ItemScores], Iterator[ItemTokenScoresAndRoles]]
evaluate_item(item: Literal[None] = None, *, template: str, answers: Sequence[str], subject: Optional[str] = None, print_ranking: bool = False, **kw) -> Union[ItemScores, ItemTokenScoresAndRoles]
evaluate_item(item: Union[None, Item, Iterator[Item]] = None, *, template: Optional[str] = None, answers: Optional[Sequence[str]] = None, subject: Optional[str] = None, print_ranking: bool = False, **kw) -> Union[ItemScores, ItemTokenScoresAndRoles, Iterable[ItemScores], Iterable[ItemTokenScoresAndRoles]]

Return the scores for each of the answer options.

This method must be implemented by each of the concrete Evaluator subclasses.
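To illustrate the template/answers calling convention, the sketch below fills each answer option into a template, scores the resulting text, and optionally prints a ranking. The [X]/[Y] placeholder scheme and the score_fn callback are assumptions made for this example; the real method delegates scoring to the ModelInterface:

```python
from typing import Callable, List, Optional, Sequence, Tuple

def evaluate_item(
    template: str,
    answers: Sequence[str],
    score_fn: Callable[[str], float],
    subject: Optional[str] = None,
    print_ranking: bool = False,
) -> List[Tuple[str, float]]:
    """Score each answer option after filling it into the template."""
    filled = template.replace("[X]", subject or "")
    scores = [(ans, score_fn(filled.replace("[Y]", ans))) for ans in answers]
    if print_ranking:
        # Higher score first, as a readable ranking of the options.
        ranked = sorted(scores, key=lambda pair: -pair[1])
        for rank, (ans, score) in enumerate(ranked, start=1):
            print(f"{rank}. {ans}: {score:.3f}")
    return scores
```

The returned list keeps the answers in their original order, one score per option, which matches the "scores for each of the answer options" contract described above.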