Model Interfaces#
A ModelInterface implements the interaction with the model: given a set of statements, it scores each of the options.
Depending on the type of the model (causal vs. masked language models), different ModelInterfaces are required.
Base Interface#
ModelInterface#
Shared interface for methods that process each answer separately (PLL scoring) or sets of answers (like TYQ).
Methods:
| Name | Description |
|---|---|
| from_model | Create a ModelInterface from the given parameters. |
| score_statement_options | Score sets of text options. |
from_model (abstractmethod, classmethod)#
from_model(model: Any, **kw) -> ModelInterface
Create a ModelInterface from the given parameters.
Load the model, tokenizer, etc. and instantiate a ModelInterface.
score_statement_options (abstractmethod)#
score_statement_options(statement_options: Iterable[Sequence[str]], *, text_roles: Optional[Iterable[Sequence[TextRoles]]] = None, **kw) -> Union[Iterable[ItemTokenScoresAndRoles], Iterable[ItemScores]]
Score sets of text options.
The ModelInterface itself is responsible for batching the requests.
The PLLModelInterfaceMixin can be used to join the statement sets into one iterable, so that the interface only needs to score individual texts.
PLLModelInterfaceMixin#
Interface for methods that assign a score per statement.
Methods:
| Name | Description |
|---|---|
| score_statement_options | Join the sets of statements, process each statement, and order the scores according to the inputs. |
| score_statements | Score individual texts (independent of the other options) using the Causal/Masked Language Model. |
score_statement_options#
score_statement_options(statement_options: Iterable[Sequence[str]], *, text_roles: Optional[Iterable[Sequence[TextRoles]]] = None, **kw) -> Union[Iterable[ItemScores], Iterable[ItemTokenScoresAndRoles]]
Join the sets of statements, process each statement, and order the scores according to the inputs.
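The join-and-regroup step can be sketched as follows. This is a pure-Python illustration of the idea, not the library's actual implementation; the toy scorer stands in for a concrete `score_statements`:

```python
from itertools import islice
from typing import Callable, Iterable, Sequence


def score_statement_options(
    statement_options: Iterable[Sequence[str]],
    score_statements: Callable[[list[str]], Iterable[float]],
) -> list[list[float]]:
    """Flatten the option sets, score all statements in one pass,
    and regroup the flat scores back into per-item lists."""
    options = [list(opts) for opts in statement_options]
    flat = [s for opts in options for s in opts]      # join into one iterable
    scores = iter(score_statements(flat))             # single pass over the model
    return [list(islice(scores, len(opts))) for opts in options]


# toy scorer: longer statements get lower "log-likelihood"
toy = lambda texts: [-len(t) for t in texts]
print(score_statement_options([["a", "bbb"], ["cc"]], toy))
# [[-1, -3], [-2]]
```

Flattening first lets the scorer batch freely across option sets, while the regrouping step preserves the original input order.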
score_statements (abstractmethod)#
score_statements(statements: Iterable[str], *, text_roles: Optional[Iterable[TextRoles]] = None, **kw) -> Union[Iterable[StatementScore], Iterable[TokenScoresAndRoles]]
Score individual texts (independent of the other options) using the Causal/Masked Language Model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| statements | Iterable[str] | The statements to score. | required |
| text_roles | Optional[Iterable[TextRoles]] | Which parts of the statement are the answer, template, and subject. | None |
Returns:
| Type | Description |
|---|---|
| Union[Iterable[StatementScore], Iterable[TokenScoresAndRoles]] | Scores (or scores and roles) per statement. |
Sentence-Loglikelihood-based Interfaces#
The following interfaces implement (pseudo) loglikelihood scoring for the text options.
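The core idea behind (pseudo) log-likelihood scoring can be stated in a few lines: the score of a statement is the sum of the log-probabilities of its tokens (for a causal LM, each token conditioned on its prefix; for a masked LM, a pseudo-log-likelihood over masked positions). This minimal sketch assumes per-token probabilities are already available:

```python
import math


def sentence_loglikelihood(token_probs: list[float]) -> float:
    """Sum of per-token log probabilities; higher (less negative)
    means the model finds the statement more likely."""
    return sum(math.log(p) for p in token_probs)


# two candidate completions; the option with the higher score wins
good = sentence_loglikelihood([0.9, 0.8, 0.7])
bad = sentence_loglikelihood([0.9, 0.1, 0.2])
assert good > bad
```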
CLMInterface#
Methods:
| Name | Description |
|---|---|
| encode | Encode the statements using the tokenizer and create an appropriate scoring mask. |
| from_model | Create an interface for the given model. |
| score_statement_options | Join the sets of statements, process each statement, and order the scores according to the inputs. |
| score_statements | Score individual texts (independent of the other options) using the Causal Language Model. |
encode#
encode(statements: Sequence[str], roles: Optional[Sequence[TextRoles]]) -> tuple[BatchEncoding, Sequence[ScoringMask]]
Encode the statements using the tokenizer and create an appropriate scoring mask.
In case the conditional scores need to be created, set the scoring mask accordingly.
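The conditional case can be illustrated with a toy scoring mask (names hypothetical, not the library's internals): with conditional scoring, only the answer span contributes to the score, so the mask is restricted to those token positions.

```python
def scoring_mask(n_tokens: int, answer_span: tuple[int, int], conditional: bool) -> list[bool]:
    """True marks a token position that contributes to the score.
    Unconditional: score every token. Conditional: score only the answer span."""
    if not conditional:
        return [True] * n_tokens
    start, end = answer_span
    return [start <= i < end for i in range(n_tokens)]


# a 6-token statement whose answer occupies token positions 4..5
print(scoring_mask(6, (4, 6), conditional=True))   # [False, False, False, False, True, True]
print(scoring_mask(6, (4, 6), conditional=False))  # [True, True, True, True, True, True]
```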
from_model (classmethod)#
from_model(model: Union[str, PreTrainedModel], model_type: Optional[str] = None, **kw) -> Self
Create an interface for the given model.
In some cases, the model type can be derived from the model itself. To ensure
the right type is chosen, it's recommended to set model_type manually.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Union[str, PreTrainedModel] | The model to evaluate. | required |
| model_type | Optional[str] | The type of model (determines the scoring scheme to be used). | None |
Returns:
| Name | Type | Description |
|---|---|---|
| HFPLLModelInterface | Self | The evaluator instance suitable for the model. |
score_statement_options#
score_statement_options(statement_options: Iterable[Sequence[str]], *, text_roles: Optional[Iterable[Sequence[TextRoles]]] = None, **kw) -> Union[Iterable[ItemScores], Iterable[ItemTokenScoresAndRoles]]
Join the sets of statements, process each statement, and order the scores according to the inputs.
score_statements#
score_statements(statements: Iterable[str], *, text_roles: Optional[Iterable[TextRoles]] = None, batch_size: Optional[int] = None, **kw) -> Union[Iterable[TokenScoresAndRoles], Iterable[StatementScore]]
Score individual texts (independent of the other options) using the Causal Language Model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| statements | Iterable[str] | The statements to score. | required |
| text_roles | Optional[Iterable[TextRoles]] | Which parts of the statement are the answer, template, and subject. | None |
Returns:
| Type | Description |
|---|---|
| Union[Iterable[TokenScoresAndRoles], Iterable[StatementScore]] | Scores (or scores and roles) per statement. |
MLMInterface#
MLMInterface(*args, pll_metric: str = 'within_word_l2r', conditional_score: bool = False, preprocessing_batch_size: int = 1000, **kw)
Methods:
| Name | Description |
|---|---|
| create_masked_requests | Extend the existing batch and mask the relevant tokens based on the scoring mask. |
| from_model | Create an interface for the given model. |
| preprocess_statements | Tokenize statements, translate text roles (char level) to token roles, and determine which tokens to score. |
| process_extended_statements | Process a stream of inputs in batches. |
| score_statement_options | Join the sets of statements, process each statement, and order the scores according to the inputs. |
| score_statements | Score individual texts (independent of the other options) using the Masked Language Model. |
Attributes:
| Name | Type | Description |
|---|---|---|
| mask_token | int | Return the mask token id used by the tokenizer. |
create_masked_requests#
create_masked_requests(batch: BatchEncoding) -> BatchEncoding
Extend the existing batch and mask the relevant tokens based on the scoring mask.
from_model (classmethod)#
from_model(model: Union[str, PreTrainedModel], model_type: Optional[str] = None, **kw) -> Self
Create an interface for the given model.
In some cases, the model type can be derived from the model itself. To ensure
the right type is chosen, it's recommended to set model_type manually.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Union[str, PreTrainedModel] | The model to evaluate. | required |
| model_type | Optional[str] | The type of model (determines the scoring scheme to be used). | None |
Returns:
| Name | Type | Description |
|---|---|---|
| HFPLLModelInterface | Self | The evaluator instance suitable for the model. |
preprocess_statements#
preprocess_statements(statements: list[str], *, text_roles: Optional[list[TextRoles]] = None) -> BatchEncoding
Tokenize statements, translate text roles (char level) to token roles, and determine which tokens to score.
process_extended_statements#
process_extended_statements(large_batch: BatchEncoding, *, batch_size: Optional[int] = None) -> Iterable[tuple[int, ScoredToken]]
Process a stream of inputs in batches.
Each input statement typically requires multiple model inputs. Since the exact number cannot be determined prior to tokenization, statements must be tokenized (and then extended) before being processed by the model.
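The extension step can be sketched in pure Python (an illustration, not the library's implementation): a statement with *n* scorable tokens expands into *n* model inputs, one copy per masked position.

```python
MASK = -1  # stand-in for tokenizer.mask_token_id


def create_masked_requests(token_ids: list[int], scoring_mask: list[bool]) -> list[list[int]]:
    """One copy of the input per scored token, with that token masked."""
    return [
        token_ids[:i] + [MASK] + token_ids[i + 1:]
        for i, scored in enumerate(scoring_mask)
        if scored
    ]


# a 4-token statement where only the two middle tokens are scored
reqs = create_masked_requests([101, 7, 8, 102], [False, True, True, False])
print(reqs)  # [[101, -1, 8, 102], [101, 7, -1, 102]]
```

This also shows why the number of model inputs is only known after tokenization: it depends on how many tokens the scoring mask selects.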
score_statement_options#
score_statement_options(statement_options: Iterable[Sequence[str]], *, text_roles: Optional[Iterable[Sequence[TextRoles]]] = None, **kw) -> Union[Iterable[ItemScores], Iterable[ItemTokenScoresAndRoles]]
Join the sets of statements, process each statement, and order the scores according to the inputs.
score_statements#
score_statements(statements: Iterable[str], *, text_roles: Optional[Iterable[TextRoles]] = None, batch_size: Optional[int] = None, **kw) -> Union[Iterable[TokenScoresAndRoles], Iterable[StatementScore]]
Score individual texts (independent of the other options) using the Masked Language Model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| statements | Iterable[str] | The statements to score. | required |
| text_roles | Optional[Iterable[TextRoles]] | Which parts of the statement are the answer, template, and subject. | None |
Returns:
| Type | Description |
|---|---|
| Union[Iterable[TokenScoresAndRoles], Iterable[StatementScore]] | Scores (or scores and roles) per statement. |
Other Interfaces#
TyQModelInterface#
TyQModelInterface(*, model: PreTrainedModel, model_type: Optional[str] = None, model_name: Optional[str] = None, model_kw: Optional[dict[str, Any]] = None, tokenizer: Optional[PreTrainedTokenizerFast] = None, device: Union[device, int, str, None] = None, batch_size: int = 1)
Methods:
| Name | Description |
|---|---|
| from_model | Create an interface for the given model. |
Attributes:
| Name | Type | Description |
|---|---|---|
| mask_token | int | Return the mask token id used by the tokenizer. |
from_model (classmethod)#
from_model(model: Union[str, PreTrainedModel], model_type: Optional[str] = None, **kw) -> Self
Create an interface for the given model.
In some cases, the model type can be derived from the model itself. To ensure
the right type is chosen, it's recommended to set model_type manually.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Union[str, PreTrainedModel] | The model to evaluate. | required |
| model_type | Optional[str] | The type of model (determines the scoring scheme to be used). | None |
Returns:
| Name | Type | Description |
|---|---|---|
| HFPLLModelInterface | Self | The evaluator instance suitable for the model. |