# Evaluator Classes
The (pseudo) log-likelihood-based approaches derive from the `Evaluator` class, which implements the shared core functionality. To create an evaluator instance, use `Evaluator.from_model`.
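For orientation, a minimal usage sketch (the import path and model name below are assumptions, not taken from this reference):

```python
# Sketch only: the import path and model name are assumptions.
from lm_pub_quiz import Evaluator

# from_model returns a model-specific subclass (e.g. a masked-LM
# or causal-LM evaluator) based on the given model.
evaluator = Evaluator.from_model("bert-base-cased")
```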
## Evaluator
Base class for PLL-based evaluation classes. Use `Evaluator.from_model` to create a suitable model-specific `Evaluator` instance.
Methods:

| Name | Description |
|---|---|
| `encode` | Encode the statements using the tokenizer and create an appropriate scoring mask. |
| `evaluate_dataset` | Evaluate the model on all relations in the dataset. |
| `from_model` | Create an evaluator instance for the given model. |
| `replace_placeholders` | Replace all placeholders in the template with the respective values. |
| `score_answers` | Calculate sequence scores using the Causal Language Model. |
| `score_statements` | Compute the PLL score for the tokens (determined by the scoring mask) in the statements. |
### encode (abstractmethod)
```python
encode(
    statements: Sequence[str],
    span_roles: Sequence[SpanRoles],
) -> tuple[BatchEncoding, Sequence[ScoringMask]]
```
Encode the statements using the tokenizer and create an appropriate scoring mask. If conditional scores are required, the scoring mask must be set accordingly.
### evaluate_dataset
```python
evaluate_dataset(
    dataset: Dataset,
    template_index: int = 0,
    *,
    batch_size: int = 1,
    subsample: Optional[int] = None,
    save_path: Optional[PathLike] = None,
    fmt: InstanceTableFileFormat = None,
    reduction: Optional[str] = "default",
    create_instance_table: bool = True,
    metric: Optional[MultiMetricSpecification] = None
) -> DatasetResults
```
Evaluate the model on all relations in the dataset.
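Given an `evaluator` created via `from_model`, a hedged usage sketch (the `Dataset` loader name and paths are assumptions):

```python
# Sketch: the import path, `Dataset.from_path`, and paths are assumptions.
from lm_pub_quiz import Dataset

dataset = Dataset.from_path("path/to/dataset")

results = evaluator.evaluate_dataset(
    dataset,
    template_index=0,       # which template of each relation to use
    batch_size=32,          # statements scored per forward pass
    save_path="results/",   # optionally persist per-instance tables
)
```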
### from_model (classmethod)
```python
from_model(
    model: Union[str, PreTrainedModel],
    model_type: Optional[str] = None,
    **kw,
) -> Evaluator
```
Create an evaluator instance for the given model.
In some cases, the model type can be derived from the model itself. To ensure the right type is chosen, it is recommended to set `model_type` manually.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str \| PreTrainedModel` | The model to evaluate. | *required* |
| `model_type` | `str \| None` | The type of model (determines the scoring scheme to be used). | `None` |
Returns:

| Name | Type | Description |
|---|---|---|
| Evaluator | `Evaluator` | The evaluator instance suitable for the model. |
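Because automatic detection can be ambiguous, passing `model_type` explicitly is the safer option. The string values below ("MLM" and "CLM") are assumptions based on the two evaluator classes on this page; check the library for the exact accepted identifiers:

```python
# "MLM" / "CLM" are assumed identifiers for the masked and causal
# scoring schemes; verify the accepted values in the library.
mlm_evaluator = Evaluator.from_model("bert-base-cased", model_type="MLM")
clm_evaluator = Evaluator.from_model("gpt2", model_type="CLM")
```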
### replace_placeholders
```python
replace_placeholders(
    *,
    template: str,
    subject: Optional[str],
    answer: Optional[str]
) -> tuple[str, SpanRoles]
```
Replace all placeholders in the template with the respective values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `template` | `str` | The template string with appropriate placeholders. | *required* |
| `subject` | `Optional[str]` | The subject label to fill in at the respective placeholder. | *required* |
| `answer` | `Optional[str]` | The answer span to fill in. | *required* |
Returns:

| Type | Description |
|---|---|
| `tuple[str, SpanRoles]` | The final string as well as the spans of the respective elements in the final string. |
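An illustration of the intended behavior; the placeholder markers `[X]` and `[Y]` are assumptions, so consult the dataset templates for the actual syntax:

```python
# Hypothetical template; the placeholder markers are assumptions.
statement, span_roles = evaluator.replace_placeholders(
    template="[X] is the capital of [Y].",
    subject="Paris",
    answer="France",
)
# statement == "Paris is the capital of France."
# span_roles records where subject and answer ended up in the string.
```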
### score_answers
```python
score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: Optional[str],
    subject: Optional[str] = None,
    batch_size: int = 1
) -> Union[EachTokenReturnFormat, ReducedReturnFormat]
```
Calculate sequence scores using the Causal Language Model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `template` | `str` | The template to use (should contain an answer placeholder). | *required* |
| `answers` | `list[str]` | List of answers to calculate scores for. | *required* |
Returns:

| Type | Description |
|---|---|
| `Union[EachTokenReturnFormat, ReducedReturnFormat]` | `list[float]`: List of surprisal scores per sequence. |
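A usage sketch (the template placeholder syntax and the reduction value are assumptions):

```python
# Sketch: placeholder syntax and reduction value are assumptions.
scores = evaluator.score_answers(
    template="Paris is the capital of [Y].",
    answers=["France", "Germany", "Italy"],
    reduction="sum",  # one score per answer; None yields per-token scores
)
```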
### score_statements (abstractmethod)
```python
score_statements(
    batched_statements: BatchEncoding,
    *,
    scoring_masks: Optional[Sequence[ScoringMask]],
    batch_size: int = 1
) -> list[list[float]]
```
Compute the PLL score for the tokens (determined by the scoring mask) in the statements.
This function must be implemented by child classes for each model type.
## MaskedLMEvaluator
Methods:

| Name | Description |
|---|---|
| `create_masked_batch` | Extend the existing batch and mask the relevant tokens based on the scoring mask. |
| `encode` | Encode the statements using the tokenizer and create an appropriate scoring mask. |
| `evaluate_dataset` | Evaluate the model on all relations in the dataset. |
| `from_model` | Create an evaluator instance for the given model. |
| `mask_to_indices` | Transform the scoring mask to a list of indices. |
| `replace_placeholders` | Replace all placeholders in the template with the respective values. |
| `score_answers` | Calculate sequence scores using the Causal Language Model. |

Attributes:

| Name | Type | Description |
|---|---|---|
| `mask_token` | `int` | Return the mask token id used by the tokenizer. |
### create_masked_batch
```python
create_masked_batch(
    batch: BatchEncoding,
    scoring_masks: Sequence[ScoringMask],
) -> BatchEncoding
```
Extend the existing batch and mask the relevant tokens based on the scoring mask.
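Conceptually, each statement is repeated once per scored token, and in each copy exactly one of those tokens is replaced by the mask token. A minimal sketch of this expansion in plain PyTorch (independent of this class's actual implementation):

```python
import torch

def expand_and_mask(input_ids: torch.Tensor, scoring_mask: torch.Tensor,
                    mask_token_id: int) -> torch.Tensor:
    """Repeat one sequence once per scored position, masking that position.

    input_ids: (seq_len,) token ids of a single statement.
    scoring_mask: (seq_len,) boolean mask marking tokens to be scored.
    """
    positions = scoring_mask.nonzero(as_tuple=True)[0]  # positions to score
    batch = input_ids.repeat(len(positions), 1)         # one row per position
    batch[torch.arange(len(positions)), positions] = mask_token_id
    return batch
```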
### encode
```python
encode(
    statements: Sequence[str],
    span_roles: Sequence[SpanRoles],
) -> tuple[BatchEncoding, Sequence[ScoringMask]]
```
Encode the statements using the tokenizer and create an appropriate scoring mask. If conditional scores are required, the scoring mask must be set accordingly.
### evaluate_dataset
```python
evaluate_dataset(
    dataset: Dataset,
    template_index: int = 0,
    *,
    batch_size: int = 1,
    subsample: Optional[int] = None,
    save_path: Optional[PathLike] = None,
    fmt: InstanceTableFileFormat = None,
    reduction: Optional[str] = "default",
    create_instance_table: bool = True,
    metric: Optional[MultiMetricSpecification] = None
) -> DatasetResults
```
Evaluate the model on all relations in the dataset.
### from_model (classmethod)
```python
from_model(
    model: Union[str, PreTrainedModel],
    model_type: Optional[str] = None,
    **kw,
) -> Evaluator
```
Create an evaluator instance for the given model.
In some cases, the model type can be derived from the model itself. To ensure the right type is chosen, it is recommended to set `model_type` manually.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str \| PreTrainedModel` | The model to evaluate. | *required* |
| `model_type` | `str \| None` | The type of model (determines the scoring scheme to be used). | `None` |
Returns:

| Name | Type | Description |
|---|---|---|
| Evaluator | `Evaluator` | The evaluator instance suitable for the model. |
### mask_to_indices
Transform the scoring mask to a list of indices.
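In essence, this converts a boolean scoring mask into the positions of the scored tokens, e.g.:

```python
# Plain-Python illustration of the transformation:
mask = [False, True, True, False, True]
indices = [i for i, m in enumerate(mask) if m]
assert indices == [1, 2, 4]
```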
### replace_placeholders
```python
replace_placeholders(
    *,
    template: str,
    subject: Optional[str],
    answer: Optional[str]
) -> tuple[str, SpanRoles]
```
Replace all placeholders in the template with the respective values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `template` | `str` | The template string with appropriate placeholders. | *required* |
| `subject` | `Optional[str]` | The subject label to fill in at the respective placeholder. | *required* |
| `answer` | `Optional[str]` | The answer span to fill in. | *required* |
Returns:

| Type | Description |
|---|---|
| `tuple[str, SpanRoles]` | The final string as well as the spans of the respective elements in the final string. |
### score_answers
```python
score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: Optional[str],
    subject: Optional[str] = None,
    batch_size: int = 1
) -> Union[EachTokenReturnFormat, ReducedReturnFormat]
```
Calculate sequence scores using the Causal Language Model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `template` | `str` | The template to use (should contain an answer placeholder). | *required* |
| `answers` | `list[str]` | List of answers to calculate scores for. | *required* |
Returns:

| Type | Description |
|---|---|
| `Union[EachTokenReturnFormat, ReducedReturnFormat]` | `list[float]`: List of surprisal scores per sequence. |
## CausalLMEvaluator
Methods:

| Name | Description |
|---|---|
| `encode` | Encode the statements using the tokenizer and create an appropriate scoring mask. |
| `evaluate_dataset` | Evaluate the model on all relations in the dataset. |
| `from_model` | Create an evaluator instance for the given model. |
| `replace_placeholders` | Replace all placeholders in the template with the respective values. |
| `score_answers` | Calculate sequence scores using the Causal Language Model. |
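For a causal model, no masking is needed: the score of the answer tokens is the sum of their log-probabilities under the left-to-right factorization. A minimal sketch of that computation with plain `transformers`/PyTorch (not this class's actual code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_log_prob(text: str, model, tokenizer) -> float:
    """Sum of log-probabilities of every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position t predict the token at position t + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return log_probs[torch.arange(targets.numel()), targets].sum().item()

# Usage sketch (the model name is an assumption):
# model = AutoModelForCausalLM.from_pretrained("gpt2")
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
# sequence_log_prob("Paris is the capital of France.", model, tokenizer)
```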
### encode
```python
encode(
    statements: Sequence[str],
    span_roles: Sequence[SpanRoles],
) -> tuple[BatchEncoding, Sequence[ScoringMask]]
```
Encode the statements using the tokenizer and create an appropriate scoring mask. If conditional scores are required, the scoring mask must be set accordingly.
### evaluate_dataset
```python
evaluate_dataset(
    dataset: Dataset,
    template_index: int = 0,
    *,
    batch_size: int = 1,
    subsample: Optional[int] = None,
    save_path: Optional[PathLike] = None,
    fmt: InstanceTableFileFormat = None,
    reduction: Optional[str] = "default",
    create_instance_table: bool = True,
    metric: Optional[MultiMetricSpecification] = None
) -> DatasetResults
```
Evaluate the model on all relations in the dataset.
### from_model (classmethod)
```python
from_model(
    model: Union[str, PreTrainedModel],
    model_type: Optional[str] = None,
    **kw,
) -> Evaluator
```
Create an evaluator instance for the given model.
In some cases, the model type can be derived from the model itself. To ensure the right type is chosen, it is recommended to set `model_type` manually.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str \| PreTrainedModel` | The model to evaluate. | *required* |
| `model_type` | `str \| None` | The type of model (determines the scoring scheme to be used). | `None` |
Returns:

| Name | Type | Description |
|---|---|---|
| Evaluator | `Evaluator` | The evaluator instance suitable for the model. |
### replace_placeholders
```python
replace_placeholders(
    *,
    template: str,
    subject: Optional[str],
    answer: Optional[str]
) -> tuple[str, SpanRoles]
```
Replace all placeholders in the template with the respective values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `template` | `str` | The template string with appropriate placeholders. | *required* |
| `subject` | `Optional[str]` | The subject label to fill in at the respective placeholder. | *required* |
| `answer` | `Optional[str]` | The answer span to fill in. | *required* |
Returns:

| Type | Description |
|---|---|
| `tuple[str, SpanRoles]` | The final string as well as the spans of the respective elements in the final string. |
### score_answers
```python
score_answers(
    *,
    template: str,
    answers: Sequence[str],
    reduction: Optional[str],
    subject: Optional[str] = None,
    batch_size: int = 1
) -> Union[EachTokenReturnFormat, ReducedReturnFormat]
```
Calculate sequence scores using the Causal Language Model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `template` | `str` | The template to use (should contain an answer placeholder). | *required* |
| `answers` | `list[str]` | List of answers to calculate scores for. | *required* |
Returns:

| Type | Description |
|---|---|
| `Union[EachTokenReturnFormat, ReducedReturnFormat]` | `list[float]`: List of surprisal scores per sequence. |