torchref.io.cif_readers module

CIF readers for different data types in crystallographic refinement.

This module provides 4 main classes:

  • CIFReader: Base class for reading CIF/mmCIF files

  • ReflectionCIFReader: For reading structure factor data (reflection data)

  • ModelCIFReader: For reading atomic coordinate data (model structures)

  • RestraintCIFReader: For reading chemical restraint dictionaries

Space groups are returned as gemmi.SpaceGroup objects for consistency throughout torchref.

The specialized classes are type-safe and should handle most edge cases in CIF files.

class torchref.io.cif_readers.CIFReader(filepath=None, data_block=None, parse_all_blocks=False)[source]

Bases: object

A dictionary-like reader for CIF/mmCIF files.

Loops are stored as pandas DataFrames. Other data is stored in a hierarchical dictionary structure.

Parameters:
  • filepath (str, optional) – Path to CIF file to load immediately.

  • data_block (str, optional) – Specific data block name to read (e.g., ‘r1vlmsf’). If None and parse_all_blocks=False, reads the first data block. If None and parse_all_blocks=True, reads all data blocks.

  • parse_all_blocks (bool, default False) – If True, parse all data blocks and merge them into a single dictionary (useful for restraint files). If False, parse only the specified block or the first block.
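The block-selection rule described above can be illustrated without torchref. The following is a minimal sketch, not the library's implementation: `list_data_blocks` and `pick_block` are hypothetical helpers, the second block name is made up for the example, and the scan is deliberately naive (it would misread a `data_` line inside a multi-line text field).

```python
def list_data_blocks(cif_text):
    """Return the names of all data blocks in a CIF string, in file order."""
    names = []
    for line in cif_text.splitlines():
        stripped = line.strip()
        # A data block header is a line of the form "data_<name>".
        if stripped.lower().startswith("data_"):
            names.append(stripped[len("data_"):])
    return names


def pick_block(cif_text, data_block=None):
    """Use the named block if given, otherwise fall back to the first block."""
    names = list_data_blocks(cif_text)
    if not names:
        raise ValueError("no data blocks found")
    if data_block is None:
        return names[0]
    if data_block not in names:
        raise KeyError(data_block)
    return data_block


# Illustrative two-block file; the block names are placeholders.
example = """data_r1vlmsf
_cell.length_a 50.0
data_r1vlmAsf
_cell.length_a 50.0
"""

print(list_data_blocks(example))
print(pick_block(example))
```

In a file with several datasets, listing the block names first makes it easier to pass an explicit `data_block` instead of relying on the first-block default.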

data

Dictionary storing parsed CIF data.

Type:

dict

filepath

Path to the loaded CIF file.

Type:

Path or None

available_blocks

List of data block names found in the file.

Type:

list

__init__(filepath=None, data_block=None, parse_all_blocks=False)[source]

Initialize CIF reader.

Parameters:
  • filepath (str, optional) – Path to CIF file to load immediately.

  • data_block (str, optional) – Specific data block name to read.

  • parse_all_blocks (bool, default False) – If True, parse all data blocks and merge.

classmethod from_string(content, **kwargs)[source]

Create CIFReader from string content instead of a file.

load(filepath)[source]

Load and parse a CIF file.

Parameters:

filepath (str) – Path to CIF file.

write(filepath)[source]

Write the CIF data back to a file.

Parameters:

filepath (str) – Output file path.

__getitem__(key)[source]

Get item by key.

__setitem__(key, value)[source]

Set item by key.

__contains__(key)[source]

Check if key exists.

__len__()[source]

Return number of top-level categories.

keys()[source]

Return dictionary keys.

values()[source]

Return dictionary values.

items()[source]

Return dictionary items.

get(key, default=None)[source]

Get item with default value.

__repr__()[source]

String representation.

summary()[source]

Print a summary of the CIF contents.
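The dictionary-like methods listed for CIFReader follow the standard Python mapping protocol. As a rough illustration of what that protocol looks like, here is a sketch with a hypothetical stand-in class, `DictLikeStore`. It is not CIFReader itself, and the category names used as keys are assumptions made for the example.

```python
class DictLikeStore:
    """Hypothetical stand-in that exposes an inner dict through the mapping protocol."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def __contains__(self, key):
        return key in self.data

    def __len__(self):
        # Number of top-level entries (categories in the CIF case).
        return len(self.data)

    def keys(self):
        return self.data.keys()

    def get(self, key, default=None):
        return self.data.get(key, default)


# Key names below are illustrative, not torchref's actual category keys.
store = DictLikeStore({"cell": {"length_a": "50.0"}})
store["symmetry"] = {"space_group_name_H-M": "P 1"}

print("cell" in store, len(store), store.get("missing", "n/a"))
```

Because lookups go through `__getitem__` and `get`, calling code can treat such an object like a plain dict while the reader keeps control over how the data is stored.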

class torchref.io.cif_readers.ReflectionCIFReader(filepath, verbose=0, data_block=None)[source]

Bases: object

Reader for structure factor CIF files (e.g., *-sf.cif from PDB).

Handles extraction of:

  • Miller indices (h, k, l)

  • Structure factor amplitudes (F) and uncertainties (σF)

  • Intensities (I) and uncertainties (σI)

  • Phases and figures of merit

  • R-free flags

  • Unit cell and space group metadata

Compatible with legacy MTZ reader interface:

    reader = ReflectionCIFReader('7JI4-sf.cif').read()
    data_dict, cell, spacegroup = reader()

Example:

    reader = ReflectionCIFReader('7JI4-sf.cif')
    refln_data = reader.get_reflection_data()
    h, k, l = refln_data['h'], refln_data['k'], refln_data['l']
    F_obs = refln_data['F_obs']

__init__(filepath, verbose=0, data_block=None)[source]

Initialize and load structure factor CIF file.

Parameters:
  • filepath (str) – Path to structure factor CIF file.

  • verbose (int, default 0) – Verbosity level (0=silent, 1=info, 2=debug).

  • data_block (str, optional) – Specific data block name to read (e.g., ‘r1vlmsf’). If None, reads the first data block. Useful for files with multiple datasets.

read(filepath=None)[source]

Read a CIF file (for compatibility with legacy interface).

Parameters:

filepath (str, optional) – Path to CIF file. Uses the initialization path if not provided.

Returns:

Self for method chaining.

__call__()[source]

Get data in legacy MTZ-compatible format.

Returns:

  • data (dict) – Dictionary with extracted data arrays:

      - ‘h’, ‘k’, ‘l’: Miller indices
      - ‘F’, ‘SIGF’: Amplitudes and sigmas (if available)
      - ‘I’, ‘SIGI’: Intensities and sigmas (if available)
      - ‘R-free-flags’: R-free test set flags (if available)

  • cell (numpy.ndarray) – Cell parameters [a, b, c, alpha, beta, gamma].

  • spacegroup (gemmi.SpaceGroup) – Space group object.

Return type:

Tuple[Dict[str, ndarray], ndarray, SpaceGroup]

get_reflection_data()[source]

Extract reflection data with standardized column names.

Returns:

DataFrame with columns:
  • h, k, l: Miller indices

  • F_obs, sigma_F_obs: Observed amplitudes (if available)

  • I_obs, sigma_I_obs: Observed intensities (if available)

  • phase, fom: Phase and figure of merit (if available)

  • free_flag: R-free flags (if available)

Return type:

pandas.DataFrame

Notes

Missing columns will be filled with NaN or appropriate defaults.
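To make the idea of standardized column names concrete, the sketch below renames a handful of standard mmCIF `_refln` items to the column names listed above. The item names come from the mmCIF dictionary, but which tags torchref actually recognizes, and how it treats unknown ones, is an assumption here. `REFLN_COLUMN_MAP` and `standardize_columns` are hypothetical names, and plain dicts stand in for the DataFrame.

```python
# mmCIF _refln item -> standardized column name.
# The exact set of tags torchref maps is an assumption of this sketch.
REFLN_COLUMN_MAP = {
    "index_h": "h",
    "index_k": "k",
    "index_l": "l",
    "F_meas_au": "F_obs",
    "F_meas_sigma_au": "sigma_F_obs",
    "intensity_meas": "I_obs",
    "intensity_sigma": "sigma_I_obs",
    "phase_meas": "phase",
    "fom": "fom",
    "pdbx_r_free_flag": "free_flag",
}


def standardize_columns(loop):
    """Rename known _refln items and drop anything not in the map."""
    return {
        REFLN_COLUMN_MAP[tag]: values
        for tag, values in loop.items()
        if tag in REFLN_COLUMN_MAP
    }


# Made-up reflection loop with one column the map does not know about.
raw = {
    "index_h": [1, 0],
    "index_k": [0, 2],
    "index_l": [0, 0],
    "F_meas_au": [120.5, 88.1],
    "F_meas_sigma_au": [2.1, 1.7],
    "status": ["o", "f"],
}
table = standardize_columns(raw)
print(sorted(table))
```

A fixed mapping of this kind lets downstream code ask for `F_obs` regardless of which tag the depositor used, at the cost of ignoring columns outside the map.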

has_miller_indices()[source]

Check if file contains Miller indices.

has_amplitudes()[source]

Check if file contains structure factor amplitudes.

has_intensities()[source]

Check if file contains intensity measurements.

has_phases()[source]

Check if file contains phase information.

has_rfree_flags()[source]

Check if file contains R-free flags.

get_miller_indices()[source]

Get Miller indices as Nx3 array.

Returns:

Array of shape (N, 3) with h, k, l indices

get_amplitudes()[source]

Get structure factor amplitudes and uncertainties.

Returns:

Dict with keys ‘F’ and ‘sigma_F’, or None if not available

get_intensities()[source]

Get intensities and uncertainties.

Returns:

Dict with keys ‘I’ and ‘sigma_I’, or None if not available

get_cell_parameters()[source]

Extract unit cell parameters [a, b, c, alpha, beta, gamma].

Returns:

List of 6 floats, or None if not found
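A common first use of the six cell parameters is computing the unit cell volume. The sketch below applies the general triclinic volume formula and assumes lengths in Å and angles in degrees, as is conventional in CIF files. `unit_cell_volume` is a hypothetical helper, not part of torchref.

```python
import math


def unit_cell_volume(cell):
    """Volume of a unit cell given [a, b, c, alpha, beta, gamma] (Å, degrees)."""
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General triclinic formula; reduces to a*b*c when all angles are 90 degrees.
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


cubic = unit_cell_volume([10.0, 10.0, 10.0, 90.0, 90.0, 90.0])
hexagonal = unit_cell_volume([10.0, 10.0, 10.0, 90.0, 90.0, 120.0])
print(round(cubic, 3), round(hexagonal, 3))
```

Since get_cell_parameters() may return None, a caller would check for that before passing the result to a helper like this one.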

get_space_group()[source]

Extract space group name.

Returns:

Space group name string. Returns “P 1” if not found.

Return type:

str

class torchref.io.cif_readers.ModelCIFReader(filepath, verbose=0)[source]

Bases: object

Reader for model/structure CIF files (e.g., *.cif from PDB).

Handles extraction of:

  • Atomic coordinates and properties

  • Alternative conformations

  • Anisotropic displacement parameters

  • Unit cell and space group

Compatible with legacy PDB reader interface:

    reader = ModelCIFReader('3E98.cif').read()
    dataframe, cell, spacegroup = reader()

Example:

    reader = ModelCIFReader('3E98.cif')
    atom_df = reader.get_atom_data()
    cell = reader.get_cell_parameters()

__init__(filepath, verbose=0)[source]

Initialize and load model CIF file.

Parameters:
  • filepath (str) – Path to model CIF file.

  • verbose (int, default 0) – Verbosity level (0=silent, 1=info, 2=debug).

read(filepath=None)[source]

Read a CIF file (for compatibility with legacy interface).

Parameters:

filepath (str, optional) – Path to CIF file. Uses initialization path if not provided.

Returns:

Self for method chaining.

Return type:

ModelCIFReader

__call__()[source]

Get data in legacy PDB-compatible format.

Returns:

  • dataframe (pandas.DataFrame) – Atom data with columns: ATOM, serial, name, altloc, resname, chainid, resseq, icode, x, y, z, occupancy, tempfactor, element, charge, anisou_flag, u11, u22, u33, u12, u13, u23.

  • cell (list) – Cell parameters [a, b, c, alpha, beta, gamma].

  • spacegroup (gemmi.SpaceGroup) – Space group object.

Return type:

Tuple[DataFrame, List[float], SpaceGroup]

get_atom_data()[source]

Extract atomic coordinate data in PDB-compatible format.

Returns:

DataFrame with columns matching PDB format:
  • ATOM, serial, name, altloc, resname, chainid, resseq, icode

  • x, y, z, occupancy, tempfactor

  • element, charge

  • anisou_flag, u11, u22, u33, u12, u13, u23

Return type:

pandas.DataFrame

get_atom_data_by_model()[source]

Split atom data by pdbx_PDB_model_num.

For single-model files, returns {1: dataframe}. For multi-model files, returns one DataFrame per model number.

Returns:

Mapping of model number to atom DataFrame.

Return type:

dict of int -> pandas.DataFrame
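The per-model split can be pictured as a simple group-by on the model number. The following sketch uses plain dictionaries in place of DataFrame rows. `split_by_model` is a hypothetical helper, and treating a missing model number as model 1 is an assumption chosen to mirror the single-model behaviour described above.

```python
from collections import defaultdict


def split_by_model(rows, model_key="pdbx_PDB_model_num"):
    """Group atom records by model number; rows without one go to model 1."""
    models = defaultdict(list)
    for row in rows:
        # CIF values arrive as strings, so normalize to int for the dict key.
        models[int(row.get(model_key, 1))].append(row)
    return dict(models)


# Made-up atom records for a two-model file.
rows = [
    {"name": "CA", "pdbx_PDB_model_num": "1"},
    {"name": "CA", "pdbx_PDB_model_num": "2"},
    {"name": "CB", "pdbx_PDB_model_num": "1"},
]
by_model = split_by_model(rows)
print({model: len(atoms) for model, atoms in by_model.items()})
```

Keying the result by integer model number keeps single-model and multi-model files on the same code path: callers iterate over the mapping either way.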

get_cell_parameters()[source]

Extract unit cell parameters [a, b, c, alpha, beta, gamma].

get_space_group()[source]

Extract space group name.

Returns:

Space group name string. Returns “P 1” if not found.

Return type:

str

has_coordinates()[source]

Check if atomic coordinates are available.

has_cell_parameters()[source]

Check if unit cell parameters are available.

has_space_group()[source]

Check if space group information is available.

has_occupancy()[source]

Check if occupancy data is available.

has_bfactor()[source]

Check if B-factor/temperature factor data is available.

has_anisotropic_data()[source]

Check if anisotropic displacement parameters are available.

get_coordinates()[source]

Extract atomic coordinates as numpy array.

Returns:

Nx3 array of [x, y, z] coordinates, or None if not available.

Return type:

numpy.ndarray or None

get_atom_info()[source]

Extract atom information (without coordinates).

Returns:

DataFrame with atom names, residue info, elements, etc.

Return type:

pandas.DataFrame

class torchref.io.cif_readers.RestraintCIFReader(filepath)[source]

Bases: object

Reader for chemical restraint dictionary CIF files (e.g., from monomer library).

Handles extraction of:

  • Bond restraints (ideal lengths and ESDs)

  • Angle restraints

  • Torsion/dihedral restraints

  • Planarity restraints

  • Chirality definitions

Validates that the file contains proper restraint parameters (not just structure definitions).

Example:

    reader = RestraintCIFReader('external_monomer_library/a/ALA.cif')
    comp_data = reader.get_all_restraints()
    bond_df = comp_data['ALA']['bonds']

__init__(filepath)[source]

Initialize and load restraint CIF file.

Parameters:

filepath (str) – Path to restraint dictionary CIF file.

get_all_restraints()[source]

Extract all restraint data for all compounds with standardized column names.

Returns:

Dictionary mapping compound ID to dict of restraint types:

{
    'ALA': {
        'bonds': DataFrame(atom1, atom2, value, sigma),
        'angles': DataFrame(atom1, atom2, atom3, value, sigma),
        'torsions': DataFrame(atom1, atom2, atom3, atom4, value, sigma, periodicity),
        'planes': DataFrame(atom, plane_id),
        'chirals': DataFrame(atom_centre, atom1, atom2, atom3, volume_sign)
    },
    ...
}

Return type:

dict

get_compound_restraints(comp_id)[source]

Extract restraints for a specific compound with standardized column names.

Parameters:

comp_id (str) – Compound identifier (e.g., ‘ALA’).

Returns:

Dictionary of restraint DataFrames with standardized columns:

{
    'bonds': DataFrame(atom1, atom2, value, sigma),
    'angles': DataFrame(atom1, atom2, atom3, value, sigma),
    'torsions': DataFrame(atom1, atom2, atom3, atom4, value, sigma, periodicity),
    'planes': DataFrame(atom, plane_id),
    'chirals': DataFrame(atom_centre, atom1, atom2, atom3, volume_sign),
    'atoms': DataFrame(atom_id, type_symbol, charge, etc.)
}

Return type:

dict
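The `volume_sign` column of the chirality table refers to the sign of the chiral volume spanned by the three substituent atoms around the centre. As a sketch of that quantity, the code below computes the signed triple product for one centre. `chiral_volume` is a hypothetical helper, the coordinates are made up, and how torchref compares the result with `volume_sign` is not specified in this documentation.

```python
def chiral_volume(centre, a1, a2, a3):
    """Signed chiral volume: (a1 - c) . ((a2 - c) x (a3 - c))."""
    v1 = [p - c for p, c in zip(a1, centre)]
    v2 = [p - c for p, c in zip(a2, centre)]
    v3 = [p - c for p, c in zip(a3, centre)]
    # Cross product v2 x v3.
    cross = (
        v2[1] * v3[2] - v2[2] * v3[1],
        v2[2] * v3[0] - v2[0] * v3[2],
        v2[0] * v3[1] - v2[1] * v3[0],
    )
    # Dot product with v1 gives the signed volume.
    return sum(x * y for x, y in zip(v1, cross))


# Made-up coordinates: three substituents along the axes around the origin.
origin = (0.0, 0.0, 0.0)
x_axis, y_axis, z_axis = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

right_handed = chiral_volume(origin, x_axis, y_axis, z_axis)
left_handed = chiral_volume(origin, x_axis, z_axis, y_axis)
print(right_handed, left_handed)
```

Swapping any two substituents flips the sign, which is why the atom order in the `chirals` table matters when checking a model against its restraint definition.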

get_bond_restraints(comp_id)[source]

Get bond restraints with standardized column names.

Returns:
DataFrame with columns:
  • atom1, atom2: Atom names

  • value: Ideal bond length (Å)

  • sigma: Estimated standard deviation (Å)
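The `value` and `sigma` columns are the usual ingredients of a harmonic bond restraint, in which each bond contributes ((d - value) / sigma)² to a target. The sketch below evaluates that sum for plain tuples. `bond_residual` is a hypothetical helper, the numbers are made up rather than taken from a monomer library, and whether torchref uses exactly this functional form is an assumption.

```python
import math


def bond_residual(coords, bonds):
    """Sum of squared, sigma-weighted deviations from ideal bond lengths."""
    total = 0.0
    for atom1, atom2, value, sigma in bonds:
        # Observed distance between the two bonded atoms (Å).
        d = math.dist(coords[atom1], coords[atom2])
        total += ((d - value) / sigma) ** 2
    return total


# Illustrative coordinates and restraint; not real library values.
coords = {"N": (0.0, 0.0, 0.0), "CA": (1.478, 0.0, 0.0)}
bonds = [("N", "CA", 1.458, 0.02)]  # (atom1, atom2, value, sigma)

residual = bond_residual(coords, bonds)
print(round(residual, 6))
```

In this example the bond is one sigma longer than its ideal value, so it contributes about 1 to the residual. Dividing by sigma puts tightly and loosely defined bonds on a common scale.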

get_compound_id()[source]

Get the primary compound ID from this file.

has_bond_restraints()[source]

Check if bond restraints are available.

has_angle_restraints()[source]

Check if angle restraints are available.

has_torsion_restraints()[source]

Check if torsion restraints are available.

has_plane_restraints()[source]

Check if plane restraints are available.

has_chirality_restraints()[source]

Check if chirality definitions are available.