torchref.io.pdb module
PDB file format reading and writing.
This module provides functions for reading and writing PDB files containing atomic coordinate data.
Functions
- read
Read a PDB file and return a reader object.
- write
Write atomic coordinates to a PDB file.
- find_header_length
Find the number of header lines in a PDB file.
- load_as_dataframe
Load a PDB file into a pandas DataFrame.
- read_crystallographic_info
Extract unit cell and space group from a PDB file.
Classes
- PDBReader
Reader class for PDB files.
Examples
from torchref.io import pdb
# Reading
reader = pdb.read('structure.pdb', verbose=1)
df, cell, spacegroup = reader()
# Writing
pdb.write(df, 'output.pdb')
- torchref.io.pdb.find_header_length(filepath, max_header_length=100000)[source]
Find the number of header lines in a PDB file.
Scans the file line by line until an ATOM or HETATM record is found.
- Parameters:
- Returns:
Number of header lines before the first ATOM/HETATM record.
- Return type:
- Raises:
ValueError – If header length exceeds max_header_length.
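The scan described above can be sketched in plain Python. This is a minimal illustration, not torchref's implementation: the name `count_header_lines` is hypothetical, and both the exact point at which the limit triggers the error and the behaviour for files without atom records are assumptions.

```python
import io


def count_header_lines(lines, max_header_length=100000):
    """Count the lines that precede the first ATOM/HETATM record.

    Hypothetical helper for illustration; not part of torchref.
    """
    count = 0
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            return count
        count += 1
        # Assumption: fail as soon as the header grows past the limit.
        if count > max_header_length:
            raise ValueError(
                f"Header length exceeds max_header_length={max_header_length}"
            )
    # Assumption: a file with no atom records is treated as all header.
    return count


sample = io.StringIO(
    "HEADER    EXAMPLE\n"
    "REMARK   1 ILLUSTRATION ONLY\n"
    "ATOM      1  CA  ALA A   1\n"
)
n_header = count_header_lines(sample)  # 2
```

Any iterable of lines works here, so an open file handle can be passed in place of the `io.StringIO` object.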
- torchref.io.pdb.read_crystallographic_info(filepath)[source]
Extract crystallographic information from a PDB file.
Reads the CRYST1 record to obtain unit cell parameters and space group.
- Parameters:
filepath (str) – Path to the PDB file.
- Returns:
cell (list of float or None) – Unit cell parameters [a, b, c, alpha, beta, gamma] in Å and degrees.
spacegroup (str or None) – Space group symbol.
z (str or None) – Number of molecules per unit cell.
- Return type:
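To show where these values sit in the file, here is a minimal sketch that slices a CRYST1 record at the fixed columns defined by the PDB v3.3 format. The helper `parse_cryst1` is hypothetical and is not claimed to be how torchref parses the record. Returning `z` as a string follows the return description above.

```python
def parse_cryst1(line):
    """Parse one CRYST1 record by its fixed columns.

    Hypothetical helper for illustration; not part of torchref.
    """
    if not line.startswith("CRYST1"):
        return None, None, None
    cell = [
        float(line[6:15]),   # a, columns 7-15
        float(line[15:24]),  # b, columns 16-24
        float(line[24:33]),  # c, columns 25-33
        float(line[33:40]),  # alpha, columns 34-40
        float(line[40:47]),  # beta, columns 41-47
        float(line[47:54]),  # gamma, columns 48-54
    ]
    spacegroup = line[55:66].strip()  # columns 56-66
    z = line[66:70].strip()           # columns 67-70
    return cell, spacegroup, z


# Build an example record with the same column widths.
record = (
    "CRYST1"
    + "".join(f"{v:9.3f}" for v in (52.0, 58.6, 61.9))
    + "".join(f"{v:7.2f}" for v in (90.0, 90.0, 90.0))
    + " "
    + f"{'P 21 21 21':<11}"
    + f"{8:>4}"
)
cell, spacegroup, z = parse_cryst1(record)
```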
- torchref.io.pdb.load_as_dataframe(filepath, skipheader=0, skipfooter=1)[source]
Load a PDB file into a pandas DataFrame.
Parses ATOM, HETATM, and ANISOU records from a PDB file and returns a structured DataFrame with all atomic properties.
- Parameters:
- Returns:
DataFrame with columns: ATOM, serial, name, altloc, resname, chainid, resseq, icode, x, y, z, occupancy, tempfactor, element, charge, anisou_flag, u11, u22, u33, u12, u13, u23, index. DataFrame attributes include ‘cell’, ‘spacegroup’, and ‘z’.
- Return type:
pd.DataFrame
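As an illustration of how one ATOM/HETATM record maps onto the columns listed above, the following sketch slices a line at the fixed columns of the PDB v3.3 format. The helper `parse_atom_line` is hypothetical; the data types and whitespace handling used by torchref may differ.

```python
def parse_atom_line(line):
    """Split one ATOM/HETATM record into named fields.

    Hypothetical helper for illustration; not part of torchref.
    """
    line = line.rstrip("\n").ljust(80)  # pad short lines to 80 columns
    return {
        "ATOM": line[0:6].strip(),          # record name
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "altloc": line[16].strip(),
        "resname": line[17:20].strip(),
        "chainid": line[21].strip(),
        "resseq": int(line[22:26]),
        "icode": line[26].strip(),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "tempfactor": float(line[60:66]),
        "element": line[76:78].strip(),
        "charge": line[78:80].strip(),
    }


# Build an example record with the same column widths.
example = (
    f"{'ATOM':<6}{1:>5} {'CA':^4} {'ALA':>3} {'A'}{1:>4} "
    f"   {11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}{1.0:6.2f}{20.0:6.2f}"
    f"{'':10}{'C':>2}{'':2}"
)
atom = parse_atom_line(example)
```

A list of such dictionaries can be handed to `pandas.DataFrame` to obtain a table with one row per atom.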
- class torchref.io.pdb.PDBReader(verbose=0)[source]
Bases: object
Reader for PDB files containing atomic coordinate data.
This class reads PDB files and extracts atomic coordinates, properties, and crystallographic metadata.
- dataframe
DataFrame containing atomic data.
- Type:
pd.DataFrame
Examples
reader = pdb.read('structure.pdb', verbose=1)
df, cell, spacegroup = reader()
print(f"Loaded {len(df)} atoms")
- __init__(verbose=0)[source]
Initialize PDB reader.
- Parameters:
verbose (int, optional) – Verbosity level (0=silent, 1=normal, 2=debug). Default is 0.
- torchref.io.pdb.extract_pdb_headers(filepath)[source]
Read all header lines (before first ATOM/HETATM) from a PDB file.
- torchref.io.pdb.extract_link_records(filepath, verbose=0)[source]
Parse LINK records from a PDB file (PDB v3.3 format).
Symmetry-mate links (sym1 or sym2 is neither blank nor 1555) are skipped with a warning, since the asymmetric unit holds no copy of the symmetry mate for the bond to attach to.
- Parameters:
- Returns:
One row per accepted LINK record with columns name1, altloc1, resname1, chainid1, resseq1, icode1 (and the matching *2 set), plus length (NaN if blank). Empty DataFrame if none.
- Return type:
pd.DataFrame
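The symmetry filter described above can be sketched as a check on the two symmetry-operator fields of a LINK record, which the PDB v3.3 format places in columns 60-65 and 67-72. The helper `is_symmetry_link` is hypothetical and only illustrates the rule; it is not torchref's code.

```python
def is_symmetry_link(line):
    """Return True if a LINK record points at a symmetry mate.

    Hypothetical helper for illustration; not part of torchref.
    """
    line = line.rstrip("\n").ljust(80)  # pad short lines to 80 columns
    sym1 = line[59:65].strip()  # columns 60-65
    sym2 = line[66:72].strip()  # columns 67-72
    # Blank and 1555 both mean the identity operation.
    return any(sym not in ("", "1555") for sym in (sym1, sym2))


def make_link(sym1, sym2):
    """Build a LINK line that carries only the symmetry fields."""
    return "LINK".ljust(59) + f"{sym1:>6}" + " " + f"{sym2:>6}"


same_unit = is_symmetry_link(make_link("1555", "1555"))  # False
blank = is_symmetry_link(make_link("", ""))              # False
mate = is_symmetry_link(make_link("1555", "2656"))       # True
```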
- torchref.io.pdb.write(df, filepath, template=None, metadata=None)[source]
Write a DataFrame to a PDB file.
- Parameters:
df (pandas.DataFrame) – DataFrame containing atom data with columns: ATOM, serial, name, altloc, resname, chainid, resseq, icode, x, y, z, occupancy, tempfactor, element, charge.
filepath (str) – Output PDB filename.
template (str, optional) – PDB template file to copy header from (deprecated, use metadata).
metadata (RefinementMetadata, optional) – Metadata to render as PDB header (REMARK 3, TITLE, etc.).
- torchref.io.pdb.write_multi_model(dataframes, filepath, model_names=None)[source]
Write multiple models to a single PDB file with MODEL/ENDMDL records.
Each DataFrame is wrapped in a MODEL/ENDMDL pair, producing a multi-model PDB file suitable for ensemble or time-resolved data.
- Parameters:
dataframes (list of pandas.DataFrame) – List of atom DataFrames (same format as write() expects).
filepath (str) – Output PDB filename.
model_names (list of str, optional) – Names for each model (written as REMARK before each MODEL record). If None, models are numbered sequentially.
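The MODEL/ENDMDL layout can be sketched with plain strings. This is only an illustration of the file structure under stated assumptions: the helper `wrap_models` is hypothetical, it takes already formatted atom lines rather than DataFrames, and the exact text of the REMARK line is an assumption.

```python
def wrap_models(models, model_names=None):
    """Wrap each list of atom lines in a MODEL/ENDMDL pair.

    Hypothetical helper for illustration; not part of torchref.
    """
    lines = []
    for number, atom_lines in enumerate(models, start=1):
        if model_names is not None:
            # Assumption: the name goes in a free-text REMARK line.
            lines.append(f"REMARK     {model_names[number - 1]}")
        # MODEL serial number is right-justified in columns 11-14.
        lines.append(f"MODEL     {number:>4}")
        lines.extend(atom_lines)
        lines.append("ENDMDL")
    return lines


out = wrap_models(
    [["ATOM      1  CA  ALA A   1"], ["ATOM      1  CA  ALA A   1"]],
    model_names=["dark", "light"],
)
```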
- torchref.io.pdb.find_header_length_pdb_file(filepath, max_header_length=100000)
Find the number of header lines in a PDB file.
Scans the file line by line until an ATOM or HETATM record is found.
- Parameters:
- Returns:
Number of header lines before the first ATOM/HETATM record.
- Return type:
- Raises:
ValueError – If header length exceeds max_header_length.
- torchref.io.pdb.load_pdb_as_pd(filepath, skipheader=0, skipfooter=1)
Load a PDB file into a pandas DataFrame.
Parses ATOM, HETATM, and ANISOU records from a PDB file and returns a structured DataFrame with all atomic properties.
- Parameters:
- Returns:
DataFrame with columns: ATOM, serial, name, altloc, resname, chainid, resseq, icode, x, y, z, occupancy, tempfactor, element, charge, anisou_flag, u11, u22, u33, u12, u13, u23, index. DataFrame attributes include ‘cell’, ‘spacegroup’, and ‘z’.
- Return type:
pd.DataFrame