torchref.io.cif_readers module
Four CIF readers for the different data types used in crystallographic refinement.
This module provides four main classes:
- CIFReader: general-purpose reader for CIF/mmCIF files
- ReflectionCIFReader: reads structure factor (reflection) data
- ModelCIFReader: reads atomic coordinate data (model structures)
- RestraintCIFReader: reads chemical restraint dictionaries
Space groups are returned as gemmi.SpaceGroup objects for consistency throughout torchref.
The specialized reader classes are type-safe and are intended to handle most edge cases found in CIF files.
- class torchref.io.cif_readers.CIFReader(filepath=None, data_block=None, parse_all_blocks=False)
Bases: object
A dictionary-like reader for CIF/mmCIF files.
Loops are stored as pandas DataFrames. Other data is stored in a hierarchical dictionary structure.
- Parameters:
filepath (str, optional) – Path to CIF file to load immediately.
data_block (str, optional) – Specific data block name to read (e.g., ‘r1vlmsf’). If None and parse_all_blocks=False, reads the first data block. If None and parse_all_blocks=True, reads all data blocks.
parse_all_blocks (bool, default False) – If True, parse all data blocks and merge them into a single dictionary (useful for restraint files). If False, parse only the specified block or the first block.
- filepath
Path to the loaded CIF file.
- Type:
Path or None
- classmethod from_string(content, **kwargs)
Create CIFReader from string content instead of a file.
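The storage model described for CIFReader (loops as tables, all other items in a nested dictionary) and the data_block / parse_all_blocks selection rules can be pictured with a toy, standard-library-only parser. This is a sketch under stated assumptions, not torchref's implementation: parse_cif and read_cif are hypothetical names, loops become dicts of column lists instead of pandas DataFrames, nesting by category (`_cell.length_a` to `{'cell': {'length_a': ...}}`) is assumed, and the merge rule for parse_all_blocks (later blocks overwrite earlier ones) is a guess.

```python
import shlex


def _split_tag(tag):
    # "_cell.length_a" -> ("cell", "length_a"); assumes mmCIF-style dotted tags
    category, _, item = tag[1:].partition(".")
    return category, item


def _is_reserved(token):
    # Tags, block headers and loop_ end the value list of a loop
    return token.startswith(("_", "data_")) or token == "loop_"


def parse_cif(text):
    """Parse every data block into {block: {category: {item: value(s)}}}.

    Toy tokenizer: shlex handles quoted strings and '#' comments, but not
    semicolon-delimited text fields or quoted values that begin with '_'.
    """
    tokens = shlex.split(text, comments=True)
    blocks, block, i = {}, None, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("data_"):
            block = blocks.setdefault(tok[len("data_"):], {})
            i += 1
        elif tok == "loop_":
            i += 1
            tags = []
            while i < len(tokens) and tokens[i].startswith("_"):
                tags.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not _is_reserved(tokens[i]):
                values.append(tokens[i])
                i += 1
            if not tags or len(values) % len(tags):
                raise ValueError("malformed loop_")
            # Loop -> columns (a dict of lists standing in for a DataFrame)
            block[_split_tag(tags[0])[0]] = {
                _split_tag(tag)[1]: values[j::len(tags)] for j, tag in enumerate(tags)
            }
        elif tok.startswith("_"):
            category, item = _split_tag(tok)
            block.setdefault(category, {})[item] = tokens[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return blocks


def read_cif(text, data_block=None, parse_all_blocks=False):
    """Select blocks the way the CIFReader parameters describe (sketch)."""
    blocks = parse_cif(text)
    if data_block is not None:
        return blocks[data_block]
    if parse_all_blocks:
        merged = {}
        for content in blocks.values():
            merged.update(content)  # on a category clash, the later block wins
        return merged
    return next(iter(blocks.values()))  # default: first data block only
```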
- class torchref.io.cif_readers.ReflectionCIFReader(filepath, verbose=0, data_block=None)
Bases: object
Reader for structure factor CIF files (e.g., *-sf.cif from PDB).
Handles extraction of:
- Miller indices (h, k, l)
- Structure factor amplitudes (F) and uncertainties (σF)
- Intensities (I) and uncertainties (σI)
- Phases and figures of merit
- R-free flags
- Unit cell and space group metadata
- Compatible with the legacy MTZ reader interface:
    reader = ReflectionCIFReader('7JI4-sf.cif').read()
    data_dict, cell, spacegroup = reader()
- Example:
    reader = ReflectionCIFReader('7JI4-sf.cif')
    refln_data = reader.get_reflection_data()
    h, k, l = refln_data['h'], refln_data['k'], refln_data['l']
    F_obs = refln_data['F_obs']
- __init__(filepath, verbose=0, data_block=None)
Initialize and load structure factor CIF file.
- read(filepath=None)
Read a CIF file (for compatibility with legacy interface).
- Parameters:
filepath (str, optional) – Path to CIF file. Uses initialization path if not provided.
- Returns:
self for method chaining
- __call__()
Get data in legacy MTZ-compatible format.
- Returns:
data (dict) – Dictionary with extracted data arrays:
- 'h', 'k', 'l': Miller indices
- 'F', 'SIGF': Amplitudes and sigmas (if available)
- 'I', 'SIGI': Intensities and sigmas (if available)
- 'R-free-flags': R-free test set flags (if available)
cell (numpy.ndarray) – Cell parameters [a, b, c, alpha, beta, gamma].
spacegroup (gemmi.SpaceGroup) – Space group object.
- Return type:
tuple of (dict, numpy.ndarray, gemmi.SpaceGroup)
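The legacy keys of __call__ and the standardized names of get_reflection_data() describe the same columns. A minimal sketch of that correspondence, assuming a hypothetical to_legacy helper that receives plain lists per column and treats "if available" as "present and not entirely NaN"; torchref's own conversion may differ.

```python
import math

# Hypothetical correspondence between the standardized names documented for
# get_reflection_data() and the legacy MTZ-style keys documented for __call__().
LEGACY_KEYS = {
    "h": "h",
    "k": "k",
    "l": "l",
    "F_obs": "F",
    "sigma_F_obs": "SIGF",
    "I_obs": "I",
    "sigma_I_obs": "SIGI",
    "free_flag": "R-free-flags",
}


def _all_nan(values):
    # True when every entry is a float NaN (also True for an empty list)
    return all(isinstance(v, float) and math.isnan(v) for v in values)


def to_legacy(columns):
    """columns: dict mapping standardized names to lists of values."""
    legacy = {}
    for std_name, legacy_name in LEGACY_KEYS.items():
        values = columns.get(std_name)
        # Mirror "(if available)": drop columns that are absent or entirely NaN
        if values is None or _all_nan(values):
            continue
        legacy[legacy_name] = values
    return legacy
```

Phase and figure of merit have no legacy key in the documented dictionary, so the sketch drops them.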
- get_reflection_data()
Extract reflection data with standardized column names.
- Returns:
DataFrame with columns:
- h, k, l: Miller indices
- F_obs, sigma_F_obs: Observed amplitudes (if available)
- I_obs, sigma_I_obs: Observed intensities (if available)
- phase, fom: Phase and figure of merit (if available)
- free_flag: R-free flags (if available)
- Return type:
pandas.DataFrame
Notes
Missing columns will be filled with NaN or appropriate defaults.
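Standardizing names and NaN-filling absent columns might look like the sketch below. The candidate lists use real PDBx/mmCIF `_refln` item names, but which items ReflectionCIFReader actually consults, and in what order, is an assumption; standardize_refln is a hypothetical helper working on plain lists rather than a DataFrame.

```python
import math

# Assumed mmCIF _refln items tried, in order, for each standardized column.
REFLN_CANDIDATES = {
    "h": ["index_h"],
    "k": ["index_k"],
    "l": ["index_l"],
    "F_obs": ["F_meas_au", "F_meas"],
    "sigma_F_obs": ["F_meas_sigma_au", "F_meas_sigma"],
    "I_obs": ["intensity_meas"],
    "sigma_I_obs": ["intensity_sigma"],
    "phase": ["phase_meas"],
    "fom": ["fom"],
    "free_flag": ["pdbx_r_free_flag"],
}


def _to_float(token):
    # mmCIF writes "?" for unknown and "." for inapplicable values
    return math.nan if token in ("?", ".") else float(token)


def standardize_refln(refln):
    """refln: dict of _refln item -> list of string tokens (one loop)."""
    n = len(refln["index_h"])
    out = {}
    for name, candidates in REFLN_CANDIDATES.items():
        item = next((c for c in candidates if c in refln), None)
        if item is None:
            out[name] = [math.nan] * n  # absent column: fill with NaN
        elif name in ("h", "k", "l"):
            out[name] = [int(v) for v in refln[item]]
        else:
            out[name] = [_to_float(v) for v in refln[item]]
    return out
```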
- get_miller_indices()
Get Miller indices as Nx3 array.
- Returns:
Array of shape (N, 3) with h, k, l indices
- get_amplitudes()
Get structure factor amplitudes and uncertainties.
- Returns:
Dict with keys ‘F’ and ‘sigma_F’, or None if not available
- get_intensities()
Get intensities and uncertainties.
- Returns:
Dict with keys ‘I’ and ‘sigma_I’, or None if not available
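Because get_amplitudes() and get_intensities() return None when the data are absent, calling code can branch on availability. A minimal sketch, assuming a hypothetical pick_observations helper and an assumed policy of preferring amplitudes over intensities; it relies only on the documented return contract, so any object with those two methods works.

```python
def pick_observations(reader):
    """Return ("F", values, sigmas) or ("I", values, sigmas).

    `reader` can be a ReflectionCIFReader or any object whose get_amplitudes()
    and get_intensities() follow the documented contract: a dict, or None when
    the file lacks that data. Preferring amplitudes is an assumed policy.
    """
    amplitudes = reader.get_amplitudes()
    if amplitudes is not None:
        return "F", amplitudes["F"], amplitudes["sigma_F"]
    intensities = reader.get_intensities()
    if intensities is not None:
        return "I", intensities["I"], intensities["sigma_I"]
    raise ValueError("no amplitudes or intensities in this file")
```

Typical use would be `kind, obs, sigma = pick_observations(ReflectionCIFReader('7JI4-sf.cif'))`.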
- class torchref.io.cif_readers.ModelCIFReader(filepath, verbose=0)
Bases: object
Reader for model/structure CIF files (e.g., *.cif from PDB).
Handles extraction of:
- Atomic coordinates and properties
- Alternative conformations
- Anisotropic displacement parameters
- Unit cell and space group
- Compatible with the legacy PDB reader interface:
    reader = ModelCIFReader('3E98.cif').read()
    dataframe, cell, spacegroup = reader()
- Example:
    reader = ModelCIFReader('3E98.cif')
    atom_df = reader.get_atom_data()
    cell = reader.get_cell_parameters()
- read(filepath=None)
Read a CIF file (for compatibility with legacy interface).
- Parameters:
filepath (str, optional) – Path to CIF file. Uses initialization path if not provided.
- Returns:
Self for method chaining.
- Return type:
ModelCIFReader
- __call__()
Get data in legacy PDB-compatible format.
- Returns:
dataframe (pandas.DataFrame) – Atom data with columns: ATOM, serial, name, altloc, resname, chainid, resseq, icode, x, y, z, occupancy, tempfactor, element, charge, anisou_flag, u11, u22, u33, u12, u13, u23.
cell (list) – Cell parameters [a, b, c, alpha, beta, gamma].
spacegroup (gemmi.SpaceGroup) – Space group object.
- Return type:
tuple of (pandas.DataFrame, list, gemmi.SpaceGroup)
- get_atom_data()
Extract atomic coordinate data in PDB-compatible format.
- Returns:
DataFrame with columns matching PDB format:
- ATOM, serial, name, altloc, resname, chainid, resseq, icode
- x, y, z, occupancy, tempfactor
- element, charge
- anisou_flag, u11, u22, u33, u12, u13, u23
- Return type:
pandas.DataFrame
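The PDB-style columns correspond to mmCIF `_atom_site` items. The sketch below shows one plausible mapping using real PDBx/mmCIF item names; whether torchref takes author (`auth_*`) or label (`label_*`) identifiers is not stated here and is an assumption, the anisotropic columns are omitted, and atom_site_rows is a hypothetical helper that returns plain dicts instead of a DataFrame.

```python
# Assumed correspondence between mmCIF _atom_site items and the PDB-style
# columns of get_atom_data(); auth_* vs label_* choice is an assumption.
ATOM_SITE_TO_PDB = {
    "group_PDB": "ATOM",
    "id": "serial",
    "auth_atom_id": "name",
    "label_alt_id": "altloc",
    "auth_comp_id": "resname",
    "auth_asym_id": "chainid",
    "auth_seq_id": "resseq",
    "pdbx_PDB_ins_code": "icode",
    "Cartn_x": "x",
    "Cartn_y": "y",
    "Cartn_z": "z",
    "occupancy": "occupancy",
    "B_iso_or_equiv": "tempfactor",
    "type_symbol": "element",
    "pdbx_formal_charge": "charge",
}


def atom_site_rows(atom_site):
    """atom_site: dict of _atom_site item -> list of string tokens."""
    n = len(atom_site["id"])
    rows = []
    for i in range(n):
        row = {}
        for item, column in ATOM_SITE_TO_PDB.items():
            token = atom_site.get(item, ["?"] * n)[i]
            # "?" (unknown) and "." (inapplicable) become empty fields, as in PDB files
            row[column] = "" if token in ("?", ".") else token
        for column in ("x", "y", "z"):
            row[column] = float(row[column])  # coordinates are required here
        rows.append(row)
    return rows
```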
- get_atom_data_by_model()
Split atom data by pdbx_PDB_model_num.
For single-model files, returns {1: dataframe}. For multi-model files, returns one DataFrame per model number.
- Returns:
Mapping of model number to atom DataFrame.
- Return type:
dict of int -> pandas.DataFrame
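The per-model split amounts to grouping `_atom_site` rows on their model number. A minimal sketch with plain dicts standing in for DataFrame rows; split_by_model is a hypothetical name, and assigning rows without a model number to model 1 is an assumption chosen to reproduce the documented single-model result.

```python
from collections import defaultdict


def split_by_model(rows):
    """Group _atom_site rows (dicts) by pdbx_PDB_model_num.

    Rows with no model number go to model 1, so a file without the item
    yields {1: rows}, matching the documented single-model case.
    """
    models = defaultdict(list)
    for row in rows:
        models[int(row.get("pdbx_PDB_model_num", 1))].append(row)
    return dict(models)
```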
- get_space_group()
Extract space group name.
- Returns:
Space group name string. Returns “P 1” if not found.
- Return type:
str
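The "P 1" fallback can be sketched as a lookup over candidate items. Both `_symmetry.space_group_name_H-M` and `_space_group.name_H-M_alt` are real mmCIF items, but the order in which torchref checks them (or whether it checks others) is an assumption; space_group_name is a hypothetical helper over a nested {category: {item: value}} dict.

```python
# Candidate (category, item) pairs, tried in order; the order is an assumption.
SPACE_GROUP_ITEMS = [
    ("symmetry", "space_group_name_H-M"),
    ("space_group", "name_H-M_alt"),
]


def space_group_name(block):
    """block: nested {category: {item: value}} dict for one data block."""
    for category, item in SPACE_GROUP_ITEMS:
        name = block.get(category, {}).get(item)
        if name not in (None, "?", "."):
            return name
    return "P 1"  # documented fallback when no space group is recorded
```

Since the module otherwise works with gemmi.SpaceGroup objects, a returned name can be converted with gemmi.SpaceGroup(name), which accepts Hermann-Mauguin symbols.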
- get_coordinates()
Extract atomic coordinates as numpy array.
- Returns:
Nx3 array of [x, y, z] coordinates, or None if not available.
- Return type:
numpy.ndarray or None
- class torchref.io.cif_readers.RestraintCIFReader(filepath)
Bases: object
Reader for chemical restraint dictionary CIF files (e.g., from monomer library).
Handles extraction of:
- Bond restraints (ideal lengths and ESDs)
- Angle restraints
- Torsion/dihedral restraints
- Planarity restraints
- Chirality definitions
Validates that the file contains proper restraint parameters (not just structure definitions).
- Example:
    reader = RestraintCIFReader('external_monomer_library/a/ALA.cif')
    comp_data = reader.get_all_restraints()
    bond_df = comp_data['ALA']['bonds']
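The validation that a file carries restraint parameters, not just structure definitions, could be approximated by checking the bond loop. The sketch assumes the distinction is between CCP4 monomer-library files, whose `_chem_comp_bond` loop has `value_dist` and `value_dist_esd`, and PDB Chemical Component Dictionary entries, which describe bonds only by `value_order`. The helper has_restraint_parameters is hypothetical, and torchref's actual check may look at other categories.

```python
# Assumed required items for a bond loop that carries restraint targets.
REQUIRED_BOND_ITEMS = ("atom_id_1", "atom_id_2", "value_dist", "value_dist_esd")


def has_restraint_parameters(block):
    """block: nested {category: {item: values}} dict for one data block."""
    bonds = block.get("chem_comp_bond")
    if not bonds:
        return False  # no bond loop at all
    # Topology-only bond loops (value_order) lack the ideal-distance items
    return all(item in bonds for item in REQUIRED_BOND_ITEMS)
```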
- __init__(filepath)
Initialize and load restraint CIF file.
- Parameters:
filepath (str) – Path to restraint dictionary CIF file.
- get_all_restraints()
Extract all restraint data for all compounds with standardized column names.
- Returns:
Dictionary mapping compound ID to dict of restraint types:
{
    'ALA': {
        'bonds': DataFrame(atom1, atom2, value, sigma),
        'angles': DataFrame(atom1, atom2, atom3, value, sigma),
        'torsions': DataFrame(atom1, atom2, atom3, atom4, value, sigma, periodicity),
        'planes': DataFrame(atom, plane_id),
        'chirals': DataFrame(atom_centre, atom1, atom2, atom3, volume_sign),
    },
    ...
}
- Return type:
dict
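Producing the standardized columns amounts to grouping each restraint loop by compound and renaming its items. The source item names below are the ones used in CCP4 monomer-library files; that torchref maps them exactly this way is an assumption. standardize_restraints is a hypothetical helper that returns dicts of column lists in place of DataFrames and keeps values as strings.

```python
# Assumed mapping from monomer-library loop items to the standardized columns.
RESTRAINT_LOOPS = {
    "bonds": ("chem_comp_bond", {
        "atom_id_1": "atom1", "atom_id_2": "atom2",
        "value_dist": "value", "value_dist_esd": "sigma"}),
    "angles": ("chem_comp_angle", {
        "atom_id_1": "atom1", "atom_id_2": "atom2", "atom_id_3": "atom3",
        "value_angle": "value", "value_angle_esd": "sigma"}),
    "torsions": ("chem_comp_tor", {
        "atom_id_1": "atom1", "atom_id_2": "atom2", "atom_id_3": "atom3",
        "atom_id_4": "atom4", "value_angle": "value",
        "value_angle_esd": "sigma", "period": "periodicity"}),
    "planes": ("chem_comp_plane_atom", {"atom_id": "atom", "plane_id": "plane_id"}),
    "chirals": ("chem_comp_chir", {
        "atom_id_centre": "atom_centre", "atom_id_1": "atom1",
        "atom_id_2": "atom2", "atom_id_3": "atom3", "volume_sign": "volume_sign"}),
}


def standardize_restraints(block):
    """Group restraint loops by comp_id and rename their columns.

    block: {category: {item: [values]}}; returns {comp_id: {kind: {column: [values]}}}.
    Items missing from a loop are filled with None.
    """
    result = {}
    for kind, (category, renames) in RESTRAINT_LOOPS.items():
        loop = block.get(category)
        if loop is None:
            continue  # this restraint type is absent from the file
        for i, comp_id in enumerate(loop["comp_id"]):
            table = result.setdefault(comp_id, {}).setdefault(
                kind, {column: [] for column in renames.values()})
            for item, column in renames.items():
                table[column].append(loop[item][i] if item in loop else None)
    return result
```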
- get_compound_restraints(comp_id)
Extract restraints for a specific compound with standardized column names.
- Parameters:
comp_id (str) – Compound identifier (e.g., ‘ALA’).
- Returns:
Dictionary of restraint DataFrames with standardized columns:
{
    'bonds': DataFrame(atom1, atom2, value, sigma),
    'angles': DataFrame(atom1, atom2, atom3, value, sigma),
    'torsions': DataFrame(atom1, atom2, atom3, atom4, value, sigma, periodicity),
    'planes': DataFrame(atom, plane_id),
    'chirals': DataFrame(atom_centre, atom1, atom2, atom3, volume_sign),
    'atoms': DataFrame(atom_id, type_symbol, charge, etc.),
}
- Return type:
dict
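A typical use of the per-compound result is looking up the ideal length and ESD for one bond regardless of atom order. A minimal sketch, assuming a hypothetical ideal_bond helper and the documented bonds columns (atom1, atom2, value, sigma). Because zip() iterates over the values of a pandas Series, the same function should also accept the DataFrames returned by get_compound_restraints(), provided the column names match.

```python
def ideal_bond(restraints, atom_a, atom_b):
    """Return (value, sigma) for the bond between two atoms, or None.

    `restraints` follows the documented get_compound_restraints() layout;
    plain dicts of column lists work as well as DataFrames.
    """
    bonds = restraints["bonds"]
    for a1, a2, value, sigma in zip(bonds["atom1"], bonds["atom2"],
                                    bonds["value"], bonds["sigma"]):
        if {a1, a2} == {atom_a, atom_b}:  # atom order within a bond is irrelevant
            return float(value), float(sigma)
    return None
```

With the real reader this would read `ideal_bond(reader.get_compound_restraints('ALA'), 'CA', 'N')`.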