torchref.model.segmented_internal_coordinates module

Segmented internal coordinate parametrization for atomic structures.

This module provides the SegmentedInternalCoordinateTensor class which addresses the “lever arm problem” in internal coordinate parametrization by breaking the molecular chain into independent segments, each with its own rigid body parameters.

Key features:

- Segments the molecule into groups of N amino acids (default: 3 per segment)
- Each segment has independent internal coordinates (bonds, angles, torsions)
- Each segment has rigid body parameters (position + orientation)
- Shallow spanning trees within segments (depth ~15-30 instead of ~1000)
- Changes in one segment don’t propagate to distant segments
- Fully differentiable reconstruction from internal coordinates
- Parallelized construction for fast initialization
- Fused ring systems (indole in TRP, purines, etc.) treated as single rigid groups

This approach solves the lever arm problem, in which small torsion changes near the root of a deep tree cause large displacements at distant atoms.
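The size of the lever arm effect can be estimated with simple geometry. The sketch below is not part of torchref; the helper name and the two distances are illustrative assumptions. It computes how far an atom moves when a torsion upstream of it changes by one degree, for an atom far from the rotation axis (deep tree) and one close to it (short segment).

```python
import math

def lever_arm_displacement(distance, delta_torsion):
    """Chord length travelled by a point `distance` Angstroms from the
    rotation axis when it is rotated by `delta_torsion` radians."""
    return 2.0 * distance * abs(math.sin(delta_torsion / 2.0))

delta = math.radians(1.0)  # a 1-degree torsion change

# Illustrative distances (assumptions, not values taken from torchref):
far = lever_arm_displacement(50.0, delta)   # atom 50 Angstroms from the axis
near = lever_arm_displacement(5.0, delta)   # atom 5 Angstroms from the axis

print(f"far atom moves {far:.3f} A, near atom moves {near:.3f} A")
```

The displacement grows linearly with the distance from the rotation axis, so limiting each spanning tree to a single short segment bounds how far any one torsion can move an atom.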

class torchref.model.segmented_internal_coordinates.SegmentedInternalCoordinateTensor(initial_xyz, pdb, n_aa_per_segment=3, bond_cutoff=2.0, cif_dict=None, requires_grad=True, dtype=None, device=None)

Bases: DeviceMixin, CachedForwardMixin, Module

Parameter wrapper using segmented internal coordinates.

Stores: per-segment bond_lengths, angles, torsions, segment_positions, segment_orientations

Reconstructs: Cartesian xyz on forward()

This provides a physically meaningful parametrization that avoids the lever arm problem by breaking the molecule into independent segments, each with shallow spanning trees and rigid body parameters.
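The grouping of residues into segments can be pictured as follows. This is a minimal sketch under stated assumptions, not the library's implementation: the helper `assign_segments` is hypothetical, residues are assumed to arrive in chain order, and segments are assumed never to span two chains.

```python
from itertools import groupby

def assign_segments(residues, n_aa_per_segment=3):
    """Map each (chainid, resseq) key to a segment index.

    Assumptions: `residues` is ordered by chain, and a segment
    never crosses a chain boundary.
    """
    segment_of = {}
    next_segment = 0
    for _, chain_residues in groupby(residues, key=lambda r: r[0]):
        chain_residues = list(chain_residues)
        # Walk the chain in blocks of n_aa_per_segment residues.
        for start in range(0, len(chain_residues), n_aa_per_segment):
            for key in chain_residues[start:start + n_aa_per_segment]:
                segment_of[key] = next_segment
            next_segment += 1
    return segment_of

# Chain A has 7 residues, chain B has 2.
residues = [("A", i) for i in range(1, 8)] + [("B", i) for i in range(1, 3)]
segments = assign_segments(residues)
n_segments = len(set(segments.values()))
```

With the default of three residues per segment, chain A yields two full segments plus a one-residue remainder, and chain B forms a segment of its own.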

Parameters:

initial_xyz (torch.Tensor) – Initial Cartesian coordinates of shape (N, 3).
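pdb (pd.DataFrame) – PDB DataFrame with columns ‘chainid’, ‘resseq’, ‘name’, ‘index’, ‘resname’.
n_aa_per_segment (int, optional) – Number of amino acids per segment. Default is 3.
bond_cutoff (float, optional) – Distance cutoff for bond detection in Angstroms (used as fallback). Default is 2.0.
cif_dict (dict, optional) – CIF dictionary containing bond definitions per residue type. If provided, bonds are determined from chemical definitions rather than distances. Default is None.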
requires_grad (bool, optional) – Whether parameters should have gradients. Default is True.
dtype (torch.dtype, optional) – Data type for tensors. Default is same as initial_xyz.
device (torch.device, optional) – Device for tensors. Default is same as initial_xyz.

n_atoms

Number of atoms.

Type: int

n_segments

Number of segments.

Type: int

max_depth

Maximum depth in any segment’s spanning tree.

Type: int

bond_lengths

Bond length parameters in Angstroms.

Type: nn.Parameter

angles

Angle parameters in radians.

Type: nn.Parameter

torsions

Torsion angle parameters in radians.

Type: nn.Parameter

segment_positions

Absolute positions of segment root atoms.

Type: nn.Parameter

segment_orientations

ZYZ Euler angle orientations for each segment.

Type: nn.Parameter
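The rigid body parameters can be read as "rotate the segment's local coordinates, then translate the root atom to its absolute position". The sketch below illustrates that idea in plain Python. The composition order R = Rz(alpha) Ry(beta) Rz(gamma), the assumption that the root atom sits at the local origin, and the helper names are all assumptions made for illustration; the convention used inside torchref may differ.

```python
import math

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def zyz_matrix(alpha, beta, gamma):
    # Assumed convention: R = Rz(alpha) @ Ry(beta) @ Rz(gamma)
    return matmul3(rot_z(alpha), matmul3(rot_y(beta), rot_z(gamma)))

def place_segment(local_xyz, orientation, position):
    """Rotate local coordinates by ZYZ Euler angles, then translate."""
    rot = zyz_matrix(*orientation)
    return [[sum(rot[i][k] * p[k] for k in range(3)) + position[i]
             for i in range(3)]
            for p in local_xyz]

# Root atom at the local origin, second atom 1.5 Angstroms along local z.
local = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.5]]
placed = place_segment(local, (0.0, math.pi / 2, 0.0), [10.0, 0.0, 0.0])
```

Under this convention, a rotation of 90 degrees about y turns the local z axis into the global x axis, so the second atom ends up 1.5 Angstroms from the root along x.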

AA_NAMES = frozenset({'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'MSE', 'PHE', 'PRO', 'SEC', 'SER', 'THR', 'TRP', 'TYR', 'VAL'})

__init__(initial_xyz, pdb, n_aa_per_segment=3, bond_cutoff=2.0, cif_dict=None, requires_grad=True, dtype=None, device=None)

Initialize SegmentedInternalCoordinateTensor.

Parameters:

initial_xyz (torch.Tensor) – Initial Cartesian coordinates of shape (N, 3).
pdb (pd.DataFrame) – PDB DataFrame with columns ‘chainid’, ‘resseq’, ‘name’, ‘index’, ‘resname’.
n_aa_per_segment (int, optional) – Number of amino acids per segment. Default is 3.
bond_cutoff (float, optional) – Distance cutoff for bond detection in Angstroms (used as fallback). Default is 2.0.
cif_dict (dict, optional) – CIF dictionary containing bond definitions per residue type. If provided, bonds are determined from chemical definitions rather than distances, which is more robust for structures with poor geometry. Expected format: cif_dict[resname][‘bonds’] is a DataFrame with ‘atom1’ and ‘atom2’ columns.
requires_grad (bool, optional) – Whether parameters should have gradients. Default is True.
dtype (torch.dtype, optional) – Data type for tensors. Default is same as initial_xyz.
device (torch.device, optional) – Device for tensors. Default is same as initial_xyz.
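The distance-based fallback for bond detection can be sketched as a pairwise distance test against bond_cutoff. This is an illustration only: the helper `detect_bonds` is hypothetical, the strict `<` comparison is an assumption, and the brute-force loop over all pairs is chosen for clarity rather than speed. The dictionary-based path via cif_dict is not shown.

```python
import math
from itertools import combinations

def detect_bonds(xyz, bond_cutoff=2.0):
    """Return index pairs (i, j), i < j, whose distance is below the cutoff."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(xyz), 2)
        if math.dist(a, b) < bond_cutoff
    ]

# Illustrative backbone-like coordinates in Angstroms (made up for this sketch).
xyz = [
    (0.00, 0.00, 0.00),  # N
    (1.46, 0.00, 0.00),  # CA
    (2.00, 1.42, 0.00),  # C
    (3.20, 1.60, 0.00),  # O
]
bonds = detect_bonds(xyz)
```

A pure distance criterion misassigns bonds when atoms are too close or too far apart, which is why chemical definitions from cif_dict are described as more robust for structures with poor geometry.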

property dtype: Return the dtype of tensors.

property device: Return the device of tensors.

forward()

Reconstruct Cartesian xyz from internal coordinates.

Uses fully vectorized operations for maximum performance.

Returns: Reconstructed Cartesian coordinates of shape (N, 3).
Return type: torch.Tensor
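The core step in reconstructing Cartesian coordinates from internal coordinates is placing one atom from three already placed atoms plus a bond length, an angle, and a torsion. The sketch below shows the standard single-atom construction in plain Python, with a dihedral function to check the result. It is not the vectorized torchref code, and all helper names are hypothetical.

```python
import math

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def unit(u):
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def place_atom(a, b, c, bond_length, angle, torsion):
    """Place atom d so that |c-d| = bond_length, angle(b, c, d) = angle
    and dihedral(a, b, c, d) = torsion (angles in radians)."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))  # normal of the a-b-c plane
    m = cross(n, bc)                # completes the local frame
    x = -bond_length * math.cos(angle)
    y = bond_length * math.sin(angle) * math.cos(torsion)
    z = bond_length * math.sin(angle) * math.sin(torsion)
    return [c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3)]

def dihedral(a, b, c, d):
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(dot(cross(n1, n2), unit(b2)), dot(n1, n2))

a, b, c = [0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
d = place_atom(a, b, c, 1.5, math.radians(110.0), math.radians(60.0))
```

Applying this step atom by atom down a spanning tree rebuilds a whole segment, so an error or change at one atom affects only its descendants within that tree.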

shake(magnitude=0.1)

Add Gaussian noise to internal parameters.

Parameters: magnitude (float, optional) – Standard deviation of Gaussian noise. Default is 0.1.
Returns: New Cartesian coordinates after perturbation.
Return type: torch.Tensor

fix(selection=None, freeze_at_current=True)

Fix (freeze) atoms to use fixed xyz coordinates.

Parameters:

selection (torch.Tensor, slice, or None) – Boolean mask or indices of atoms to fix.
freeze_at_current (bool, optional) – If True, store current coordinates for selected atoms.

freeze(selection=None, freeze_at_current=True)

Alias for fix().

refine(selection=None, rebuild=True)

Make atoms refinable.

Parameters:

selection (torch.Tensor, slice, or None) – Boolean mask or indices of atoms to make refinable.
rebuild (bool, optional) – If True, rebuild internal coordinates from fixed_xyz.

unfreeze(selection=None, rebuild=True)

Alias for refine().

fix_all(freeze_at_current=True)

Fix all atoms.

freeze_all(freeze_at_current=True)

Alias for fix_all().

refine_all(rebuild=True)

Make all atoms refinable.

unfreeze_all(rebuild=True)

Alias for refine_all().

property n_refinable: int: Return the number of refinable atoms.

property n_fixed: int: Return the number of fixed atoms.
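One way to picture the fix/refine bookkeeping is as a boolean mask over atoms: refinable atoms take their coordinates from the reconstruction, fixed atoms keep stored coordinates. The sketch below is a guess at that idea in plain Python, not the library's code. The helper `merge_coordinates` is hypothetical, and the assumption that every atom is either fixed or refinable is ours.

```python
def merge_coordinates(reconstructed, fixed_xyz, refinable):
    """Use reconstructed coordinates for refinable atoms and the
    stored fixed coordinates for all others."""
    return [new if free else old
            for new, old, free in zip(reconstructed, fixed_xyz, refinable)]

# Three atoms; the middle one is fixed.
refinable = [True, False, True]
fixed_xyz = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
reconstructed = [[0.1, 0.0, 0.0], [1.1, 0.0, 0.0], [2.1, 0.0, 0.0]]

merged = merge_coordinates(reconstructed, fixed_xyz, refinable)

# Counts analogous to the n_refinable and n_fixed properties.
n_refinable = sum(refinable)
n_fixed = len(refinable) - n_refinable
```

In this picture, fixing an atom flips its mask entry to False and, when freeze_at_current is True, copies its current coordinates into the stored array first.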