torchref.io.datasets.collection module
Dataset collection for handling multiple crystallographic datasets.
This module provides the DatasetCollection class for managing multiple related ReflectionData objects, useful for joint refinement, MAD phasing, and time-series crystallography.
- class torchref.io.datasets.collection.DatasetCollection(hkl=None, F=None, F_sigma=None, I=None, I_sigma=None, rfree_flags=None, resolution=None, bin_indices=None, outlier_flags=None, phase=None, fom=None, _centric_flags=None, E=None, E_squared=None, F_squared_corrected=None, U_aniso=None, radial_shell_indices=None, cell=None, spacegroup=None, device=<factory>, verbose=1, rfree_source=None, amplitude_source=None, intensity_source=None, phase_source=None, wilson_b=None, wilson_b_structure=None, wilson_b_solvent=None, wilson_k_sol=None, outlier_detection_params=None, _datasets=<factory>, _dataset_order=<factory>, _reference_dataset=None, _common_hkl=None, _cell=None, _spacegroup=None, _resolution=None, _scale_factors=<factory>)[source]
Bases: CrystalDataset
Container for multiple related crystal datasets.
All datasets share a common HKL set for efficient computation. Datasets are aligned using the first dataset as a reference, with missing reflections in subsequent datasets masked out.
- Parameters:
- hkl
Common HKL set for all datasets.
- Type:
Examples
    from torchref.io import DatasetCollection, ReflectionData

    collection = DatasetCollection(device='cuda')
    native = ReflectionData().load_mtz('native.mtz')
    derivative = ReflectionData().load_mtz('derivative.mtz')

    collection.add_dataset('native', native, set_as_reference=True)
    collection.add_dataset('derivative', derivative)

    for name, dataset in collection:
        print(f"{name}: {len(dataset)} reflections")

    # Access by name
    native_F = collection['native'].F
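The alignment described for this class (one shared HKL list, with reflections missing from later datasets masked out) can be illustrated with a small standalone sketch. It is not torchref's implementation: `align_to_reference` is a hypothetical helper, plain Python lists stand in for tensors, and dropping reflections that are absent from the reference is an assumption.

```python
from typing import Dict, List, Optional, Tuple

HKL = Tuple[int, int, int]


def align_to_reference(
    reference_hkl: List[HKL],
    dataset: Dict[HKL, float],
) -> Tuple[List[Optional[float]], List[bool]]:
    """Reorder `dataset` onto the reference HKL list (hypothetical helper).

    Returns the values in reference order plus a validity mask that is
    False wherever the dataset has no measurement for that reflection.
    """
    values = [dataset.get(hkl) for hkl in reference_hkl]
    mask = [value is not None for value in values]
    return values, mask


# Reference HKL list, e.g. taken from the first dataset added.
reference_hkl = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]

# A second dataset measured in a different order, with gaps and one
# extra reflection (2, 0, 0) that the reference does not contain.
derivative = {(0, 0, 1): 42.0, (1, 0, 0): 10.5, (2, 0, 0): 7.0}

values, mask = align_to_reference(reference_hkl, derivative)
print(values)  # [10.5, None, 42.0, None]
print(mask)    # [True, False, True, False]
```

Keeping every dataset in the same reflection order means that element-wise comparisons between datasets only need the mask, not a fresh index lookup.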
- add_dataset(name, dataset, set_as_reference=False)[source]
Add a dataset to the collection.
- Parameters:
name (str) – Identifier for this dataset.
dataset (ReflectionData) – The dataset to add.
set_as_reference (bool, optional) – If True, use this dataset’s HKL as the reference. Default is False, but the first dataset added automatically becomes the reference.
- Returns:
Self, for method chaining.
- Return type:
- Raises:
ValueError – If a dataset with the same name already exists.
Examples
    collection = DatasetCollection()
    collection.add_dataset('native', native_data, set_as_reference=True)
    collection.add_dataset('derivative', derivative_data)
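The documented contract of add_dataset (returns self, rejects duplicate names, and promotes the first dataset to reference) can be mimicked with a small stand-in class. `MiniCollection` below is hypothetical and only borrows the field names `_datasets`, `_dataset_order`, and `_reference_dataset` from the signature above. Storing the reference as a name rather than an object is an assumption.

```python
class MiniCollection:
    """Hypothetical stand-in that mimics the documented add_dataset contract."""

    def __init__(self):
        self._datasets = {}
        self._dataset_order = []
        self._reference_dataset = None  # assumption: stored as a name

    def add_dataset(self, name, dataset, set_as_reference=False):
        # Duplicate names are rejected, as the Raises section documents.
        if name in self._datasets:
            raise ValueError(f"Dataset '{name}' already exists")
        self._datasets[name] = dataset
        self._dataset_order.append(name)
        # The first dataset added becomes the reference automatically.
        if set_as_reference or self._reference_dataset is None:
            self._reference_dataset = name
        return self  # enables method chaining


# Chained calls work because each call returns the collection itself.
collection = (
    MiniCollection()
    .add_dataset("native", [1.0, 2.0])
    .add_dataset("derivative", [1.1, 2.1])
)

try:
    collection.add_dataset("native", [0.0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(collection._reference_dataset)  # native
print(duplicate_rejected)             # True
```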
- property datasets: Dict[str, ReflectionData]
Access all datasets as a dictionary.
- __getitem__(name)[source]
Get dataset by name.
- __iter__()[source]
Iterate over (name, dataset) pairs in order of addition.
- Yields:
tuple of (str, ReflectionData) – Name and dataset for each dataset in collection.
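Unlike a plain dict, which iterates over keys, the collection is documented as yielding (name, dataset) pairs. A minimal sketch of that protocol follows. `PairIterable` is a hypothetical class, not the library's code, and the string payloads are placeholders for ReflectionData objects.

```python
class PairIterable:
    """Hypothetical container that yields (name, dataset) pairs in insertion order."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)
        self._dataset_order = list(self._datasets)

    def __getitem__(self, name):
        # Lookup by name; a missing name raises KeyError in this sketch.
        return self._datasets[name]

    def __iter__(self):
        for name in self._dataset_order:
            yield name, self._datasets[name]


pairs = PairIterable([("native", "native-data"), ("derivative", "derivative-data")])

# Tuple unpacking works directly in a for loop or comprehension.
names = [name for name, _ in pairs]

# Because iteration yields pairs, dict() can rebuild a plain mapping here.
as_dict = dict(pairs)

print(names)            # ['native', 'derivative']
print(pairs["native"])  # native-data
```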
- scale()[source]
Scale all datasets to a common reference scale.
This method optimizes the scaling parameters of all non-reference datasets to minimize the mean squared error between their structure factors and those of the reference dataset. The optimization corrects for both overall scale differences and anisotropy.
The method uses the L-BFGS optimizer with strong Wolfe line search to iteratively refine the scaling parameters over multiple optimization steps.
- Returns:
The collection instance, allowing for method chaining.
- Raises:
ValueError – If no reference dataset has been set before calling this method, or if the collection contains only the reference dataset. At least two datasets are required.
Notes
The reference dataset must be set before calling this method, for example with add_dataset(..., set_as_reference=True). All datasets except the reference will have their scaling parameters optimized.
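To make the scaling objective concrete, the sketch below fits a single overall scale factor k by minimizing the squared difference between reference and scaled amplitudes. It is a deliberate simplification, not the method scale() uses: it omits the anisotropy correction and replaces the L-BFGS optimization with the closed-form solution that exists for one linear parameter. `least_squares_scale` is a hypothetical helper, and the amplitudes are made-up numbers.

```python
def least_squares_scale(f_ref, f_obs):
    """Return k minimizing sum((f_ref - k * f_obs) ** 2) (hypothetical helper).

    Setting the derivative with respect to k to zero gives
    k = sum(f_ref * f_obs) / sum(f_obs ** 2).
    """
    numerator = sum(r * o for r, o in zip(f_ref, f_obs))
    denominator = sum(o * o for o in f_obs)
    if denominator == 0.0:
        raise ValueError("Cannot scale a dataset whose amplitudes are all zero")
    return numerator / denominator


# Illustrative amplitudes on a shared HKL order; the second dataset is
# exactly half the reference, so the fitted scale should be 2.
f_ref = [10.0, 20.0, 30.0, 40.0]
f_obs = [5.0, 10.0, 15.0, 20.0]

k = least_squares_scale(f_ref, f_obs)
scaled = [k * o for o in f_obs]
mse = sum((r - s) ** 2 for r, s in zip(f_ref, scaled)) / len(f_ref)

print(k)    # 2.0
print(mse)  # 0.0
```

Once anisotropic terms enter the model the problem is no longer linear in its parameters, which is presumably why an iterative optimizer such as L-BFGS is used in the documented method.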
- __init__(hkl=None, F=None, F_sigma=None, I=None, I_sigma=None, rfree_flags=None, resolution=None, bin_indices=None, outlier_flags=None, phase=None, fom=None, _centric_flags=None, E=None, E_squared=None, F_squared_corrected=None, U_aniso=None, radial_shell_indices=None, cell=None, spacegroup=None, device=<factory>, verbose=1, rfree_source=None, amplitude_source=None, intensity_source=None, phase_source=None, wilson_b=None, wilson_b_structure=None, wilson_b_solvent=None, wilson_k_sol=None, outlier_detection_params=None, _datasets=<factory>, _dataset_order=<factory>, _reference_dataset=None, _common_hkl=None, _cell=None, _spacegroup=None, _resolution=None, _scale_factors=<factory>)