Dataset Representation#

Two classes represent a dataset: Relation and Dataset. A Dataset is essentially a container for a number of relations.

Relation #

Relation(
    relation_code: str,
    *,
    templates: list[str],
    answer_space: Optional[Series],
    instance_table: Optional[DataFrame],
    lazy_options: Optional[dict[str, Any]],
    relation_info: Optional[dict[str, Any]] = None
)

Represents a relation within a dataset, including its code, answer space, templates, and an instance table.

Methods:

Name	Description
`activated`	Return self or a copy of self with the instance_table loaded (lazy loading disabled).
`from_path`	Loads a relation from a JSONL file and associated metadata.
`relation_info`	Get or set additional relation information.
`save`	Save results to a file and export the metadata.
`save_instance_table`	Save instance table with the format determined by the path suffix.
`search_path`	Search path for instance files.
`subsample`	Returns a subsample of the dataset of size n.

Attributes:

Name	Type	Description
`answer_space`	`Series`	The answer space of the relation.
`instance_table`	`DataFrame`	A `pandas.DataFrame` containing all items in the relation.
`relation_code`	`str`	The identifier of the relation.

answer_space `property` #

answer_space: Series

The answer space of the relation.

instance_table `property` #

instance_table: DataFrame

A pandas.DataFrame containing all items in the relation.

relation_code `property` #

relation_code: str

The identifier of the relation.

activated #

activated() -> Self

Return self or a copy of self with the instance_table loaded (lazy loading disabled).
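
The contract described here ("self if already loaded, otherwise a loaded copy") can be pictured with a small stand-in class. This is only a sketch, not the library's implementation: `LazyTable`, its `loader` argument, and the row layout are hypothetical names introduced for illustration.

```python
import copy
from typing import Callable, List, Optional


class LazyTable:
    """Hypothetical stand-in for a relation whose table is loaded on demand."""

    def __init__(self, loader: Callable[[], List[dict]], rows: Optional[List[dict]] = None):
        self._loader = loader  # called only when the rows are actually needed
        self._rows = rows      # None means "not loaded yet"

    @property
    def is_loaded(self) -> bool:
        return self._rows is not None

    def activated(self) -> "LazyTable":
        """Return self if the rows are in memory, else a copy with them loaded."""
        if self.is_loaded:
            return self
        loaded = copy.copy(self)       # shallow copy; the original stays lazy
        loaded._rows = self._loader()  # load the rows into the copy only
        return loaded


# The row content below is made up for the example.
lazy = LazyTable(loader=lambda: [{"sub_label": "Paris", "answer_idx": 0}])
active = lazy.activated()
```

In this sketch the original object stays lazy, so calling `activated()` on an already loaded object is a cheap no-op that returns the same object.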

from_path `classmethod` #

from_path(
    path: PathLike,
    *,
    relation_code: Optional[str] = None,
    lazy: bool = True,
    fmt: InstanceTableFileFormat = None
) -> Self

Loads a relation from a JSONL file and associated metadata.

Parameters:

Name	Type	Description	Default
`path` #	`PathLike`	The path to the dataset directory.	required
`relation_code` #	`Optional[str]`	The specific code of the relation to load.	`None`
`lazy` #	`bool`	If False, the instance table is loaded directly into memory.	`True`

Returns:

Name	Type	Description
`Relation`	`Self`	An instance of the Relation class populated with data from the file.

Raises:

Type	Description
`Exception`	If there is an error in loading the file or processing the data.

relation_info #

relation_info(**kw) -> dict[str, Any]

relation_info(key: str) -> Any

relation_info(
    key: Optional[str] = None, /, **kw
) -> Union[None, Any, dict[str, Any]]

Get or set additional relation information.

Use relation.relation_info(<field name>=<new value>) to set fields in the relation info dictionary. If a single field is selected, the respective value is returned. Otherwise the complete dictionary is returned.

Parameters:

Name	Type	Description	Default
`key` #	`Optional[str]`	The field to retrieve.	`None`
`**kw` #		The fields to modify.	`{}`

Returns:

Type	Description
`Union[None, Any, dict[str, Any]]`	If a field is selected, the respective value is returned; otherwise, the complete info dictionary is returned.
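
The combined get/set call pattern may be easier to follow with a toy version. The class below is a hypothetical stand-in (`InfoHolder` is not part of the library), and the handling of a missing field is an assumption of this sketch rather than documented behaviour.

```python
from typing import Any, Optional


class InfoHolder:
    """Hypothetical stand-in that mimics the get/set call pattern described above."""

    def __init__(self) -> None:
        self._info: dict = {}

    def relation_info(self, key: Optional[str] = None, /, **kw: Any) -> Any:
        self._info.update(kw)  # set any fields passed as keyword arguments
        if key is not None:
            # Assumption: a missing field yields None here; the library may differ.
            return self._info.get(key)
        return dict(self._info)  # no field selected: return the complete dictionary


holder = InfoHolder()
# The field names and values are made up for the example.
holder.relation_info(domains=["geography"], cardinality="1:N")
single_value = holder.relation_info("cardinality")
everything = holder.relation_info()
```

Making `key` positional-only (the `/` in the signature) keeps it from clashing with an info field that happens to be called `key`.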

save #

save(
    path: PathLike, fmt: InstanceTableFileFormat = None
) -> Optional[Path]

Save results to a file and export the metadata.

save_instance_table `classmethod` #

save_instance_table(
    instance_table: DataFrame,
    path: Path,
    fmt: InstanceTableFileFormat = None,
)

Save instance table with the format determined by the path suffix.

Parameters:

Name	Type	Description	Default
`instance_table` #	`DataFrame`	The instances to save.	required
`path` #	`Path`	Where to save the instance table. If the format is not specified, the suffix is used to determine the format.	required
`fmt` #	`str`	The format in which to save the instances.	`None`
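
The rule "explicit format first, path suffix otherwise" can be sketched as follows. The helper name `resolve_format` and the suffix table are hypothetical; the set of formats the library actually supports is not listed here and may differ.

```python
from pathlib import Path
from typing import Optional

# Hypothetical suffix-to-format table, for illustration only.
SUFFIX_TO_FORMAT = {".jsonl": "jsonl", ".parquet": "parquet"}


def resolve_format(path: Path, fmt: Optional[str] = None) -> str:
    """Return the explicit format if given, else infer it from the path suffix."""
    if fmt is not None:
        return fmt
    try:
        return SUFFIX_TO_FORMAT[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Cannot infer a format from suffix {path.suffix!r}") from None


inferred = resolve_format(Path("P19.jsonl"))
explicit = resolve_format(Path("P19.data"), fmt="parquet")
```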

search_path `classmethod` #

search_path(
    path: Path,
    relation_code: None = None,
    fmt: InstanceTableFileFormat = None,
) -> list[Path]

search_path(
    path: Path,
    relation_code: str,
    fmt: InstanceTableFileFormat = None,
) -> Path

search_path(
    path: Path,
    relation_code: Optional[str] = None,
    fmt: InstanceTableFileFormat = None,
) -> Union[list[Path], Path, None]

Search path for instance files.
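
The three signatures above imply two call modes: without a relation code a list of paths comes back, with a code a single path (or nothing). The sketch below mirrors that shape with a hypothetical helper, `find_instance_files`, and assumes one file per relation named `<relation_code>.jsonl`; the library's actual file layout may differ.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Union


def find_instance_files(
    path: Path, relation_code: Optional[str] = None, suffix: str = ".jsonl"
) -> Union[List[Path], Path, None]:
    """List all instance files, or return the one matching a relation code."""
    if relation_code is None:
        return sorted(path.glob(f"*{suffix}"))   # no code given: every instance file
    candidate = path / f"{relation_code}{suffix}"
    return candidate if candidate.exists() else None  # one file, or None if absent


# Build a throwaway directory with two empty instance files to search.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for code in ("P19", "P36"):
        (root / f"{code}.jsonl").touch()
    all_files = [p.name for p in find_instance_files(root)]
    one_file = find_instance_files(root, "P36").name
    missing = find_instance_files(root, "P999")
```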

subsample #

subsample(n: int = 10) -> DataFrame

Returns a subsample of the dataset of size n.

Parameters:

Name	Type	Description	Default
`n` #	`int`	Size of the subsampled dataset	`10`

Returns:

Type	Description
`DataFrame`	Subsampled version of the dataset.
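
As a rough picture of what subsampling means, the sketch below draws `n` rows without replacement from a plain list of rows. It uses only the standard library rather than a DataFrame, and the `seed` argument and the "return everything if n is too large" rule are assumptions of this example, not documented behaviour of `subsample`.

```python
import random
from typing import List, Optional


def subsample_rows(rows: List[dict], n: int = 10, seed: Optional[int] = None) -> List[dict]:
    """Return at most n rows drawn without replacement."""
    if n >= len(rows):
        return list(rows)          # assumption: small tables are returned whole
    rng = random.Random(seed)      # local generator, so global state is untouched
    return rng.sample(rows, n)


# Twenty made-up rows to draw from.
rows = [{"sub_label": f"item-{i}", "answer_idx": i % 4} for i in range(20)]
sample = subsample_rows(rows, n=3, seed=0)
```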

Dataset #

Dataset(
    relations: list[Relation],
    path: PathLike,
    name: Optional[str] = None,
)

A collection of relations forming a multiple choice dataset.

Usage

The preferred way to load the BEAR knowledge probe is to load it by name:

from lm_pub_quiz import Dataset
dataset = Dataset.from_name("BEAR")

Methods:

Name	Description
`from_name`	Loads a dataset from the cache (if available) or from the URL specified in the internal dataset table.
`from_path`	Loads a multiple choice dataset from a specified directory path.

from_name `classmethod` #

from_name(
    name: str,
    *,
    lazy: bool = True,
    base_path: Optional[Path] = None,
    chunk_size: int = 10 * 1024,
    relation_info: Optional[PathLike] = None,
    **kwargs
) -> Self

Loads a dataset from the cache (if available) or from the URL specified in the internal dataset table.

Parameters:

Name	Type	Description	Default
`name` #	`str`	The name of the dataset.	required
`lazy` #	`bool`	If False, the instance tables of all relations are directly loaded into memory.	`True`

Returns:

Name	Type	Description
`Dataset`	`Self`	An instance of Dataset loaded with the relations of the named dataset.

Raises:

Type	Description
`Exception`	If there is an error in loading the dataset.

Usage

Loading the BEAR-dataset.

>>> from lm_pub_quiz import Dataset
>>> dataset = Dataset.from_name("BEAR")

from_path `classmethod` #

from_path(
    path: PathLike,
    *,
    lazy: bool = True,
    fmt: InstanceTableFileFormat = None,
    relation_info: Optional[PathLike] = None,
    **kwargs
) -> Self

Loads a multiple choice dataset from a specified directory path.

This method scans the directory for relation files and assembles them into a Dataset.

Parameters:

Name	Type	Description	Default
`path` #	`PathLike`	The directory path where the dataset is stored.	required
`lazy` #	`bool`	If False, the instance tables of all relations are directly loaded into memory.	`True`

Returns:

Name	Type	Description
`Dataset`	`Self`	An instance of Dataset loaded with the relations from the directory.

Raises:

Type	Description
`Exception`	If there is an error in loading the dataset.

Usage

Loading the BEAR dataset from a local directory.

>>> from lm_pub_quiz import Dataset
>>> dataset = Dataset.from_path("/path/to/dataset/BEAR")
