
Model Interfaces#

A ModelInterface implements the interaction with the model: given a set of statements, it scores each of the options. Depending on the model type (causal vs. masked language model), a different ModelInterface is required.
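To illustrate the idea (a toy sketch, not this library's API): an interface scores every option of every item, and a caller picks the highest-scoring option per item. The length-based "score" below is a hypothetical stand-in for a real (pseudo) log-likelihood.

```python
# Minimal illustration (NOT the library's implementation): a scorer that
# assigns one score per answer option, and a caller that picks the best
# option per item. Real interfaces would run a language model here.

def score_statement_options(statement_options):
    """Yield one list of scores per item; the toy score is -len(text)."""
    for options in statement_options:
        yield [-len(text) for text in options]

items = [
    ["Paris is the capital of France.", "Rome is the capital of France."],
    ["2 + 2 = 4", "2 + 2 = 5 because reasons"],
]
# Index of the highest-scoring option per item.
predictions = [max(range(len(scores)), key=scores.__getitem__)
               for scores in score_statement_options(items)]
print(predictions)  # [1, 0]
```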

Base Interface#

ModelInterface #

Shared interface for methods that process each answer separately (PLL scoring) or sets of answers (like TYQ).

Methods:

| Name | Description |
| --- | --- |
| `from_model` | Create a ModelInterface from the given parameters. |
| `score_statement_options` | Score sets of text options. |

from_model abstractmethod classmethod #

from_model(model: Any, **kw) -> ModelInterface

Create a ModelInterface from the given parameters.

Load the model, tokenizer, etc. and instantiate a ModelInterface.

score_statement_options abstractmethod #

score_statement_options(statement_options: Iterable[Sequence[str]], *, text_roles: Optional[Iterable[Sequence[TextRoles]]] = None, **kw) -> Union[Iterable[ItemTokenScoresAndRoles], Iterable[ItemScores]]

Score sets of text options.

The ModelInterface itself is responsible for batching the requests.

The PLLModelInterfaceMixin can be used to join the statement sets into one iterable, requiring the interface to just score individual texts.
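The mixin's behavior can be sketched in plain Python (a simplified stand-in, not the actual implementation): flatten the option sets, score every statement once, then regroup the flat scores to match the input structure.

```python
def score_statements(statements):
    # Stand-in for the abstract per-statement scorer (e.g. a CLM/MLM PLL).
    return [float(-len(s)) for s in statements]

def score_statement_options(statement_options):
    """Flatten the option sets, score each statement once, regroup per item."""
    statement_options = [list(opts) for opts in statement_options]
    flat = [s for opts in statement_options for s in opts]
    scores = iter(score_statements(flat))
    # Consume the flat scores in input order to restore the grouping.
    return [[next(scores) for _ in opts] for opts in statement_options]

grouped = score_statement_options([["a", "bbb"], ["cc"]])
print(grouped)  # [[-1.0, -3.0], [-2.0]]
```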

PLLModelInterfaceMixin #

Interface for methods that assign a score per statement.

Methods:

| Name | Description |
| --- | --- |
| `score_statement_options` | Join the sets of statements, process each statement, and order the scores according to the inputs. |
| `score_statements` | Score individual texts (independent of the other options) using the Causal/Masked Language Model. |

score_statement_options #

score_statement_options(statement_options: Iterable[Sequence[str]], *, text_roles: Optional[Iterable[Sequence[TextRoles]]] = None, **kw) -> Union[Iterable[ItemScores], Iterable[ItemTokenScoresAndRoles]]

Join the sets of statements, process each statement, and order the scores according to the inputs.

score_statements abstractmethod #

score_statements(statements: Iterable[str], *, text_roles: Optional[Iterable[TextRoles]] = None, **kw) -> Union[Iterable[StatementScore], Iterable[TokenScoresAndRoles]]

Score individual texts (independent of the other options) using the Causal/Masked Language Model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `statements` | `Iterable[str]` | The statements to score. | *required* |
| `text_roles` | `Optional[Iterable[TextRoles]]` | Which parts of the statement are the answer, template, and subject. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Union[Iterable[StatementScore], Iterable[TokenScoresAndRoles]]` | Scores (or scores and roles) per statement. |

Sentence-Loglikelihood-based Interfaces#

The following interfaces implement (pseudo) loglikelihood scoring for the text options.
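As a refresher on the quantity these interfaces compute: a sentence's (pseudo) log-likelihood is the sum of its per-token log-probabilities. The sketch below uses hand-picked probabilities rather than real model outputs.

```python
import math

# Hypothetical per-token probabilities for a three-token sentence
# under some model (made-up numbers, for illustration only).
token_probs = [0.2, 0.05, 0.1]

# Sentence log-likelihood: sum of per-token log-probabilities,
# equivalent to log of the product of the probabilities.
loglik = sum(math.log(p) for p in token_probs)
print(round(loglik, 4))  # -6.9078
```

Comparing options by summed log-probability (rather than raw probability) keeps the arithmetic numerically stable for long sentences.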

CLMInterface #

CLMInterface(*args, ensure_bos_token_added: bool = True, conditional_score: bool = False, **kw)

Methods:

| Name | Description |
| --- | --- |
| `encode` | Encode the statements using the tokenizer and create an appropriate scoring mask. |
| `from_model` | Create an interface for the given model. |
| `score_statement_options` | Join the sets of statements, process each statement, and order the scores according to the inputs. |
| `score_statements` | Score individual texts (independent of the other options) using the Causal Language Model. |

encode #

encode(statements: Sequence[str], roles: Optional[Sequence[TextRoles]]) -> tuple[BatchEncoding, Sequence[ScoringMask]]

Encode the statements using the tokenizer and create an appropriate scoring mask.

If conditional scores are requested, the scoring mask is restricted accordingly.
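The effect of conditional scoring can be sketched with a boolean mask over token positions (a hypothetical layout, not the library's ScoringMask type): unconditional scoring covers all tokens, while conditional scoring covers only the answer span, so the rest of the statement acts as context.

```python
# Toy tokenization of "The capital of France is Paris", where the answer
# role covers only the last token. (Illustrative example, not library code.)
tokens = ["The", "capital", "of", "France", "is", "Paris"]
answer_span = (5, 6)  # half-open token-index range of the answer

full_mask = [True] * len(tokens)  # unconditional: score every token
conditional_mask = [answer_span[0] <= i < answer_span[1]
                    for i in range(len(tokens))]
print(conditional_mask)  # only "Paris" is scored
```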

from_model classmethod #

from_model(model: Union[str, PreTrainedModel], model_type: Optional[str] = None, **kw) -> Self

Create an interface for the given model.

In some cases, the model type can be derived from the model itself. To ensure the right type is chosen, it's recommended to set model_type manually.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `str \| PreTrainedModel` | The model to evaluate. | *required* |
| `model_type` | `str \| None` | The type of model (determines the scoring scheme to be used). | `None` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `HFPLLModelInterface` | `Self` | The evaluator instance suitable for the model. |

score_statement_options #

score_statement_options(statement_options: Iterable[Sequence[str]], *, text_roles: Optional[Iterable[Sequence[TextRoles]]] = None, **kw) -> Union[Iterable[ItemScores], Iterable[ItemTokenScoresAndRoles]]

Join the sets of statements, process each statement, and order the scores according to the inputs.

score_statements #

score_statements(statements: Iterable[str], *, text_roles: Optional[Iterable[TextRoles]] = None, batch_size: Optional[int] = None, **kw) -> Union[Iterable[TokenScoresAndRoles], Iterable[StatementScore]]

Score individual texts (independent of the other options) using the Causal Language Model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `statements` | `Iterable[str]` | The statements to score. | *required* |
| `text_roles` | `Optional[Iterable[TextRoles]]` | Which parts of the statement are the answer, template, and subject. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Union[Iterable[TokenScoresAndRoles], Iterable[StatementScore]]` | Scores (or scores and roles) per statement. |

MLMInterface #

MLMInterface(*args, pll_metric: str = 'within_word_l2r', conditional_score: bool = False, preprocessing_batch_size: int = 1000, **kw)

Methods:

| Name | Description |
| --- | --- |
| `create_masked_requests` | Extend the existing batch and mask the relevant tokens based on the scoring mask. |
| `from_model` | Create an interface for the given model. |
| `preprocess_statements` | Tokenize statements, translate text roles (char level) to token roles, and determine which tokens to score. |
| `process_extended_statements` | Process a stream of inputs in batches. |
| `score_statement_options` | Join the sets of statements, process each statement, and order the scores according to the inputs. |
| `score_statements` | Score individual texts (independent of the other options) using the Masked Language Model. |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `mask_token` | `int` | Return the mask token id used by the tokenizer. |

mask_token property #

mask_token: int

Return the mask token id used by the tokenizer.

create_masked_requests #

create_masked_requests(batch: BatchEncoding) -> BatchEncoding

Extend the existing batch and mask the relevant tokens based on the scoring mask.

from_model classmethod #

from_model(model: Union[str, PreTrainedModel], model_type: Optional[str] = None, **kw) -> Self

Create an interface for the given model.

In some cases, the model type can be derived from the model itself. To ensure the right type is chosen, it's recommended to set model_type manually.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `str \| PreTrainedModel` | The model to evaluate. | *required* |
| `model_type` | `str \| None` | The type of model (determines the scoring scheme to be used). | `None` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `HFPLLModelInterface` | `Self` | The evaluator instance suitable for the model. |

preprocess_statements #

preprocess_statements(statements: list[str], *, text_roles: Optional[list[TextRoles]] = None) -> BatchEncoding

Tokenize statements, translate text roles (char level) to token roles and determine which tokens to score.
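The char-to-token role translation can be sketched with tokenizer offset mappings (the `(start, end)` character spans that Hugging Face fast tokenizers return with `return_offsets_mapping=True`); the offsets below are hand-written for illustration.

```python
# Hand-written offsets for the statement "Paris is nice": each pair is the
# (start, end) character span of one token, as a fast tokenizer would
# report. (Illustrative example, not the library's implementation.)
text = "Paris is nice"
offsets = [(0, 5), (6, 8), (9, 13)]
answer_chars = (0, 5)  # char span of the answer role ("Paris")

# A token carries the answer role if its char span overlaps the answer span.
token_is_answer = [start < answer_chars[1] and end > answer_chars[0]
                   for start, end in offsets]
print(token_is_answer)  # [True, False, False]
```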

process_extended_statements #

process_extended_statements(large_batch: BatchEncoding, *, batch_size: Optional[int] = None) -> Iterable[tuple[int, ScoredToken]]

Process a stream of inputs in batches.

Each input statement typically requires multiple inputs to the model. Since the exact number cannot be determined before tokenization, statements must be tokenized (and then extended) before being processed by the model.
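The expansion step can be sketched in plain Python: a statement with n scoreable tokens yields n masked copies, one per target position, which is why the total number of model inputs is only known after tokenization.

```python
MASK = "[MASK]"

def expand(tokens, scoring_mask):
    """Yield one masked copy per scored position (plain-MLM PLL; variants
    such as within_word_l2r additionally mask the rest of the current
    word, which this sketch omits)."""
    for i, scored in enumerate(scoring_mask):
        if scored:
            yield tokens[:i] + [MASK] + tokens[i + 1:], i

tokens = ["The", "cat", "sat"]
requests = list(expand(tokens, [True, True, True]))
print(len(requests))  # 3 model inputs for one 3-token statement
```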

score_statement_options #

score_statement_options(statement_options: Iterable[Sequence[str]], *, text_roles: Optional[Iterable[Sequence[TextRoles]]] = None, **kw) -> Union[Iterable[ItemScores], Iterable[ItemTokenScoresAndRoles]]

Join the sets of statements, process each statement, and order the scores according to the inputs.

score_statements #

score_statements(statements: Iterable[str], *, text_roles: Optional[Iterable[TextRoles]] = None, batch_size: Optional[int] = None, **kw) -> Union[Iterable[TokenScoresAndRoles], Iterable[StatementScore]]

Score individual texts (independent of the other options) using the Masked Language Model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `statements` | `Iterable[str]` | The statements to score. | *required* |
| `text_roles` | `Optional[Iterable[TextRoles]]` | Which parts of the statement are the answer, template, and subject. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Union[Iterable[TokenScoresAndRoles], Iterable[StatementScore]]` | Scores (or scores and roles) per statement. |

Other Interfaces#

TyQModelInterface #

TyQModelInterface(*, model: PreTrainedModel, model_type: Optional[str] = None, model_name: Optional[str] = None, model_kw: Optional[dict[str, Any]] = None, tokenizer: Optional[PreTrainedTokenizerFast] = None, device: Union[device, int, str, None] = None, batch_size: int = 1)

Methods:

| Name | Description |
| --- | --- |
| `from_model` | Create an interface for the given model. |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `mask_token` | `int` | Return the mask token id used by the tokenizer. |

mask_token property #

mask_token: int

Return the mask token id used by the tokenizer.

from_model classmethod #

from_model(model: Union[str, PreTrainedModel], model_type: Optional[str] = None, **kw) -> Self

Create an interface for the given model.

In some cases, the model type can be derived from the model itself. To ensure the right type is chosen, it's recommended to set model_type manually.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `str \| PreTrainedModel` | The model to evaluate. | *required* |
| `model_type` | `str \| None` | The type of model (determines the scoring scheme to be used). | `None` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `HFPLLModelInterface` | `Self` | The evaluator instance suitable for the model. |