torchref.io.pdb module

PDB file format reading and writing.

This module provides functions for reading and writing PDB files containing atomic coordinate data.

Functions

read: Read a PDB file and return a reader object.
write: Write atomic coordinates to a PDB file.
find_header_length: Find the number of header lines in a PDB file.
load_as_dataframe: Load a PDB file into a pandas DataFrame.
read_crystallographic_info: Extract unit cell and space group from a PDB file.

Classes

PDBReader: Reader class for PDB files.

Examples

from torchref.io import pdb

# Reading
reader = pdb.read('structure.pdb', verbose=1)
df, cell, spacegroup = reader()

# Writing
pdb.write(df, 'output.pdb')

torchref.io.pdb.find_header_length(filepath, max_header_length=100000)

Find the number of header lines in a PDB file.

Scans the file line by line until an ATOM or HETATM record is found.

Parameters:

filepath (str) – Path to the PDB file.
max_header_length (int, optional) – Maximum number of header lines to scan. Default is 100000.

Returns:

Number of header lines before the first ATOM/HETATM record.

Return type:

int

Raises:

ValueError – If header length exceeds max_header_length.
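The scan described above can be sketched in a few lines of standard-library Python. This is an illustration only, not torchref's implementation: the helper name `count_header_lines` is hypothetical, and returning the total line count when no coordinate record exists is an assumption.

```python
import io


def count_header_lines(handle, max_header_length=100000):
    """Count the lines that precede the first ATOM/HETATM record.

    Hypothetical helper for illustration; not part of torchref.
    """
    n_header = 0
    for line in handle:
        if line.startswith(("ATOM", "HETATM")):
            return n_header
        n_header += 1
        if n_header > max_header_length:
            raise ValueError(
                f"header is longer than max_header_length={max_header_length}"
            )
    # Assumption: a file without coordinate records counts as all header.
    return n_header


sample_text = "HEADER    EXAMPLE\nREMARK   1 TEST\nATOM      1  N   ALA A   1\n"
n_header = count_header_lines(io.StringIO(sample_text))
```

Here `n_header` counts the HEADER and REMARK lines, and the same input raises ValueError when `max_header_length=1` is passed.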

torchref.io.pdb.read_crystallographic_info(filepath)

Extract crystallographic information from a PDB file.

Reads the CRYST1 record to obtain unit cell parameters and space group.

Parameters:

filepath (str) – Path to the PDB file.

Returns:

cell (list of float or None) – Unit cell parameters [a, b, c, alpha, beta, gamma]; lengths in Å, angles in degrees.
spacegroup (str or None) – Space group symbol.
z (str or None) – Z value (number of molecules per unit cell), returned as a string.

Return type:

Tuple[List[float] | None, str | None, str | None]
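For orientation, the CRYST1 record is a fixed-column line in the PDB v3.3 format, so the three return values can be sliced out directly. The sketch below is not torchref's code: `parse_cryst1` is a hypothetical name, and returning three None values for a non-CRYST1 line is an assumption made to mirror the documented return type.

```python
def parse_cryst1(line):
    """Split one CRYST1 record into (cell, spacegroup, z) by fixed columns.

    Hypothetical helper for illustration; not part of torchref.
    """
    if not line.startswith("CRYST1"):
        return None, None, None
    # a, b, c occupy columns 7-33 (9 characters each).
    lengths = [float(line[6 + 9 * i:15 + 9 * i]) for i in range(3)]
    # alpha, beta, gamma occupy columns 34-54 (7 characters each).
    angles = [float(line[33 + 7 * i:40 + 7 * i]) for i in range(3)]
    spacegroup = line[55:66].strip() or None  # columns 56-66
    z = line[66:70].strip() or None           # columns 67-70, kept as a string
    return lengths + angles, spacegroup, z


# Build a sample record with a format string so the columns line up exactly.
sample = "CRYST1{:9.3f}{:9.3f}{:9.3f}{:7.2f}{:7.2f}{:7.2f} {:<11s}{:4d}".format(
    52.0, 58.6, 61.9, 90.0, 90.0, 90.0, "P 21 21 21", 8
)
cell, spacegroup, z = parse_cryst1(sample)
```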

torchref.io.pdb.load_as_dataframe(filepath, skipheader=0, skipfooter=1)

Load a PDB file into a pandas DataFrame.

Parses ATOM, HETATM, and ANISOU records from a PDB file and returns a structured DataFrame with all atomic properties.

Parameters:

filepath (str) – Path to the PDB file.
skipheader (int, optional) – Number of header lines to skip. If 0, the header length is detected automatically. Default is 0.
skipfooter (int, optional) – Number of footer lines to skip. Default is 1.

Returns:

DataFrame with columns: ATOM, serial, name, altloc, resname, chainid, resseq, icode, x, y, z, occupancy, tempfactor, element, charge, anisou_flag, u11, u22, u33, u12, u13, u23, index. DataFrame attributes include ‘cell’, ‘spacegroup’, and ‘z’.

Return type:

pd.DataFrame
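The per-atom columns listed above correspond to the fixed-width fields of an ATOM/HETATM record in the PDB v3.3 format. As a rough sketch of that mapping (not torchref's parser: `parse_atom_line` is a hypothetical name, the result is a plain dict rather than a DataFrame row, and ANISOU handling is left out):

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM record into named fields by column position.

    Hypothetical helper for illustration; not part of torchref.
    """
    return {
        "ATOM": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "altloc": line[16].strip(),
        "resname": line[17:20].strip(),
        "chainid": line[21].strip(),
        "resseq": int(line[22:26]),
        "icode": line[26].strip(),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "tempfactor": float(line[60:66]),
        "element": line[76:78].strip(),
        "charge": line[78:80].strip(),
    }


# A sample record assembled piece by piece so each column range is visible.
sample = "".join([
    "ATOM  ",                            # columns 1-6: record name
    "    1",                             # 7-11: serial
    " ",                                 # 12: unused
    " CA ",                              # 13-16: atom name
    " ",                                 # 17: altloc
    "ALA",                               # 18-20: residue name
    " ",                                 # 21: unused
    "A",                                 # 22: chain ID
    "   1",                              # 23-26: residue sequence number
    " ",                                 # 27: insertion code
    "   ",                               # 28-30: unused
    "  11.104", "   6.134", "  -6.504",  # 31-54: x, y, z
    "  1.00", " 20.00",                  # 55-66: occupancy, temperature factor
    " " * 10,                            # 67-76: unused
    " C", "  ",                          # 77-80: element, charge
])
atom = parse_atom_line(sample)
```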

class torchref.io.pdb.PDBReader(verbose=0)

Bases: object

Reader for PDB files containing atomic coordinate data.

This class reads PDB files and extracts atomic coordinates, properties, and crystallographic metadata.

verbose

Verbosity level for logging.

Type: int

dataframe

DataFrame containing atomic data.

Type: pd.DataFrame

cell

Unit cell parameters [a, b, c, alpha, beta, gamma].

Type: list or None

spacegroup

Space group symbol.

Type: str or None

Examples

from torchref.io import pdb

# Read the file, then call the reader to unpack the extracted data.
reader = pdb.read('structure.pdb', verbose=1)
df, cell, spacegroup = reader()
print(f"Loaded {len(df)} atoms")

__init__(verbose=0)

Initialize PDB reader.

Parameters: verbose (int, optional) – Verbosity level (0=silent, 1=normal, 2=debug). Default is 0.

read(filepath)

Read a PDB file and extract atomic data.

Parameters: filepath (str) – Path to the PDB file.
Returns: Self, for method chaining.
Return type: PDBReader

__call__()

Return extracted data in a standardized format.

Returns:

dataframe (pd.DataFrame) – DataFrame with atomic data.
cell (np.ndarray or None) – Unit cell parameters [a, b, c, alpha, beta, gamma].
spacegroup (str or None) – Space group symbol.

Return type:

Tuple[DataFrame, ndarray | None, str | None]

torchref.io.pdb.read(filepath, verbose=0)

Read a PDB file.

Parameters:

filepath (str) – Path to the PDB file.
verbose (int, optional) – Verbosity level. Default is 0.

Returns:

Reader object with data loaded.

Return type:

PDBReader

torchref.io.pdb.extract_pdb_headers(filepath)

Read all header lines (before first ATOM/HETATM) from a PDB file.

Parameters: filepath (str) – Path to the PDB file.
Returns: Header lines (without trailing newlines).
Return type: list of str

torchref.io.pdb.extract_link_records(filepath, verbose=0)

Parse LINK records from a PDB file (PDB v3.3 format).

Symmetry-mate links (sym1 or sym2 not blank/1555) are skipped with a warning, since the asymmetric unit holds no copy of the symmetry mate that the bond can attach to.

Parameters:

filepath (str) – Path to the PDB file.
verbose (int, optional) – If > 0, prints a one-line summary; if > 1, also warns about skipped symmetry-mate or malformed records.

Returns:

One row per accepted LINK record with columns name1, altloc1, resname1, chainid1, resseq1, icode1 (and the matching *2 set), plus length (NaN if blank). Empty DataFrame if none.

Return type:

pd.DataFrame
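The symmetry-mate rule above comes down to inspecting the two symmetry-operator fields of each LINK record, which sit in columns 60-65 (sym1) and 67-72 (sym2) in the PDB v3.3 format. A minimal sketch of that check follows; it is not torchref's code, and both helper names are hypothetical.

```python
def is_symmetry_mate_link(line):
    """Return True if either symmetry operator is something other than blank or 1555.

    Hypothetical helper for illustration; not part of torchref.
    """
    sym1 = line[59:65].strip()  # columns 60-65
    sym2 = line[66:72].strip()  # columns 67-72
    return any(sym not in ("", "1555") for sym in (sym1, sym2))


def make_link_line(sym1, sym2):
    """Build a synthetic LINK line with only the symmetry fields filled in."""
    return "LINK".ljust(59) + sym1.rjust(6) + " " + sym2.rjust(6)


within_asu = is_symmetry_mate_link(make_link_line("1555", "1555"))
across_symmetry = is_symmetry_mate_link(make_link_line("1555", "2656"))
```

Under this rule `within_asu` is False (the record would be kept) and `across_symmetry` is True (the record would be skipped).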

torchref.io.pdb.write(df, filepath, template=None, metadata=None)

Write a DataFrame to a PDB file.

Parameters:

df (pandas.DataFrame) – DataFrame containing atom data with columns: ATOM, serial, name, altloc, resname, chainid, resseq, icode, x, y, z, occupancy, tempfactor, element, charge.
filepath (str) – Output PDB filename.
template (str, optional) – PDB template file whose header is copied to the output. Deprecated; use metadata instead.
metadata (RefinementMetadata, optional) – Metadata to render as PDB header (REMARK 3, TITLE, etc.).
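To show how the listed columns map onto an output record, the sketch below formats one row as a fixed-width ATOM line following the PDB v3.3 column layout. It is an assumption-laden illustration rather than torchref's writer: `ATOM_FORMAT` and `format_atom_line` are hypothetical names, the row is a plain dict, and the atom-name padding rule only suits single-letter elements.

```python
# Column layout of an ATOM/HETATM record, expressed as a format string.
ATOM_FORMAT = (
    "{ATOM:<6s}{serial:>5d} {name:4s}{altloc:1s}{resname:>3s} {chainid:1s}"
    "{resseq:>4d}{icode:1s}   {x:8.3f}{y:8.3f}{z:8.3f}"
    "{occupancy:6.2f}{tempfactor:6.2f}          {element:>2s}{charge:2s}"
)


def format_atom_line(row):
    """Render one atom (a dict keyed by column name) as an 80-character record.

    Hypothetical helper for illustration; not part of torchref.
    """
    name = row["name"]
    # Assumption: names shorter than 4 characters start in column 14,
    # which is the usual convention for single-letter elements only.
    padded = name if len(name) == 4 else " " + name.ljust(3)
    return ATOM_FORMAT.format(**dict(row, name=padded))


row = {
    "ATOM": "ATOM", "serial": 1, "name": "CA", "altloc": "", "resname": "ALA",
    "chainid": "A", "resseq": 1, "icode": "", "x": 11.104, "y": 6.134,
    "z": -6.504, "occupancy": 1.0, "tempfactor": 20.0, "element": "C",
    "charge": "",
}
line = format_atom_line(row)
```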

torchref.io.pdb.write_multi_model(dataframes, filepath, model_names=None)

Write multiple models to a single PDB file with MODEL/ENDMDL records.

Each DataFrame is wrapped in a MODEL/ENDMDL pair, producing a multi-model PDB file suitable for ensemble or time-resolved data.

Parameters:

dataframes (list of pandas.DataFrame) – List of atom DataFrames (same format as write() expects).
filepath (str) – Output PDB filename.
model_names (list of str, optional) – Names for each model (written as REMARK before each MODEL record). If None, models are numbered sequentially.
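The MODEL/ENDMDL wrapping can be pictured with a small sketch that works on already-formatted atom lines. It is not torchref's implementation: `wrap_models` is a hypothetical name, and the exact REMARK layout used for model names is an assumption, since the documentation above does not specify it.

```python
def wrap_models(models, model_names=None):
    """Wrap each list of atom lines in a MODEL/ENDMDL pair.

    Hypothetical helper for illustration; not part of torchref.
    """
    lines = []
    for number, atom_lines in enumerate(models, start=1):
        if model_names is not None:
            # Assumption: the REMARK layout here is illustrative only.
            lines.append(f"REMARK     {model_names[number - 1]}")
        # The model serial number is right-justified in columns 11-14.
        lines.append(f"MODEL     {number:4d}")
        lines.extend(atom_lines)
        lines.append("ENDMDL")
    return lines


numbered = wrap_models([["ATOM a"], ["ATOM b"]])
named = wrap_models([["ATOM a"]], model_names=["dark state"])
```

With `model_names=None` the models are simply numbered 1, 2, and so on, matching the default described above.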

torchref.io.pdb.PDB: alias of PDBReader

torchref.io.pdb.find_header_length_pdb_file(filepath, max_header_length=100000)

Find the number of header lines in a PDB file.

Scans the file line by line until an ATOM or HETATM record is found.

Parameters:

filepath (str) – Path to the PDB file.
max_header_length (int, optional) – Maximum number of header lines to scan. Default is 100000.

Returns:

Number of header lines before the first ATOM/HETATM record.

Return type:

int

Raises:

ValueError – If header length exceeds max_header_length.

torchref.io.pdb.load_pdb_as_pd(filepath, skipheader=0, skipfooter=1)

Load a PDB file into a pandas DataFrame.

Parses ATOM, HETATM, and ANISOU records from a PDB file and returns a structured DataFrame with all atomic properties.

Parameters:

filepath (str) – Path to the PDB file.
skipheader (int, optional) – Number of header lines to skip. If 0, the header length is detected automatically. Default is 0.
skipfooter (int, optional) – Number of footer lines to skip. Default is 1.

Returns:

DataFrame with columns: ATOM, serial, name, altloc, resname, chainid, resseq, icode, x, y, z, occupancy, tempfactor, element, charge, anisou_flag, u11, u22, u33, u12, u13, u23, index. DataFrame attributes include ‘cell’, ‘spacegroup’, and ‘z’.

Return type:

pd.DataFrame