torchref.base.kernels.optimized_ops module

Optimized versions of the map-building functions that use kernel fusion to reduce CPU-GPU synchronization overhead.

torchref.base.kernels.optimized_ops.warmup_cuda_operations(device='cuda')

Warm up CUDA kernels to avoid lazy loading overhead.

This function runs dummy operations to trigger CUDA kernel compilation and loading, so subsequent operations don’t incur this overhead.

Call this once after moving the model to the GPU.

Parameters:

device (str) – Device to warm up. Default is 'cuda'.
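The warm-up idea can be illustrated with a minimal sketch. The function name warmup_sketch and the particular dummy operations are assumptions made for illustration; the operations that warmup_cuda_operations actually runs are not documented here.

```python
import torch


def warmup_sketch(device="cuda"):
    """Hypothetical stand-in for warmup_cuda_operations; not the library's code."""
    dev = torch.device(device)
    if dev.type == "cuda" and not torch.cuda.is_available():
        return False  # nothing to warm up on a CPU-only machine

    # Touch a few common kernel families: matmul, elementwise math, scatter-add.
    x = torch.randn(64, 64, device=dev)
    y = torch.exp(-(x @ x).abs())
    target = torch.zeros(64, device=dev)
    idx = torch.randint(0, 64, (64 * 64,), device=dev)
    target.index_add_(0, idx, y.reshape(-1))

    if dev.type == "cuda":
        # Kernel launches are asynchronous; wait until the dummy work has run.
        torch.cuda.synchronize(dev)
    return True
```

A routine like this is typically called once, right after model.to(device) and before any timing measurements, so that first-call loading costs do not distort benchmarks.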

class torchref.base.kernels.optimized_ops.CachedRadiusMask

Bases: object

Cache the radius mask computation to avoid recomputing for every atom batch.

This eliminates redundant computation when processing multiple atoms with the same voxel size and radius.

Usage

>>> cache = CachedRadiusMask()
>>> offsets = cache.get_offsets(voxel_size, radius_angstrom, device)

_cache

Internal cache storing computed offsets.

Type:

dict

__init__()

get_offsets(voxel_size, radius_angstrom, device)

Get cached offset grid for given parameters.

Parameters:
  • voxel_size (torch.Tensor) – Voxel dimensions, shape (3,).

  • radius_angstrom (float) – Radius in Angstroms.

  • device (torch.device) – Device for the output tensor.

Returns:

Voxel offsets within radius, shape (N_voxels, 3).

Return type:

torch.Tensor
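One way such a cache could work is sketched below. The class name RadiusOffsetCacheSketch, the cache key layout, and the assumption of an orthogonal grid (so that voxel_size holds per-axis edge lengths in Angstroms) are illustrative choices, not the library's implementation.

```python
import math
import torch


class RadiusOffsetCacheSketch:
    """Hypothetical illustration of the caching idea; not the library's code."""

    def __init__(self):
        self._cache = {}

    def get_offsets(self, voxel_size, radius_angstrom, device):
        device = torch.device(device)
        # Tensors hash by identity, so build the key from plain Python floats.
        sizes = tuple(round(float(v), 6) for v in voxel_size.tolist())
        key = (sizes, float(radius_angstrom), str(device))
        if key not in self._cache:
            # Half-width of the bounding box in voxels along each axis.
            half = [math.ceil(radius_angstrom / s) for s in sizes]
            axes = [torch.arange(-h, h + 1) for h in half]
            mesh = torch.meshgrid(*axes, indexing="ij")
            grid = torch.stack(mesh, dim=-1).reshape(-1, 3)
            # Keep only offsets whose Cartesian distance lies within the radius
            # (assumes an orthogonal grid).
            scale = torch.tensor(sizes, dtype=torch.float64)
            dist = (grid.to(torch.float64) * scale).norm(dim=-1)
            self._cache[key] = grid[dist <= radius_angstrom].to(device)
        return self._cache[key]
```

Note that reading voxel_size back with tolist() forces a device-to-host copy when the tensor lives on the GPU, so a real implementation may prefer to build the key from values that are already on the host.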

torchref.base.kernels.optimized_ops.get_cached_radius_offsets(voxel_size, radius_angstrom, device)

Get cached radius offsets to avoid recomputation.

This eliminates redundant computation when processing multiple atoms with the same voxel size and radius.

Parameters:
  • voxel_size (torch.Tensor) – Voxel dimensions, shape (3,).

  • radius_angstrom (float) – Radius in Angstroms.

  • device (torch.device) – Device for the output tensor.

Returns:

Voxel offsets within radius, shape (N_voxels, 3).

Return type:

torch.Tensor

torchref.base.kernels.optimized_ops.vectorized_add_to_map_optimized(surrounding_coords, voxel_indices, map, xyz, b, inv_frac_matrix, frac_matrix, A, B, occ)

Optimized version of vectorized_add_to_map using fused Gaussian calculation.

It is a drop-in replacement for vectorized_add_to_map that calls the fused_gaussian_density function to reduce the number of kernel launches.

Parameters:
  • surrounding_coords (torch.Tensor) – Cartesian coordinates of voxels, shape (N_atoms, N_voxels, 3).

  • voxel_indices (torch.Tensor) – Indices of voxels in the map, shape (N_atoms, N_voxels, 3).

  • map (torch.Tensor) – Electron density map to update, shape (nx, ny, nz).

  • xyz (torch.Tensor) – Atom positions in Cartesian coordinates, shape (N_atoms, 3).

  • b (torch.Tensor) – Isotropic B-factors, shape (N_atoms,).

  • inv_frac_matrix (torch.Tensor) – Inverse fractionalization matrix, shape (3, 3).

  • frac_matrix (torch.Tensor) – Fractionalization matrix, shape (3, 3).

  • A (torch.Tensor) – ITC92 amplitude coefficients, shape (N_atoms, 5).

  • B (torch.Tensor) – ITC92 width coefficients, shape (N_atoms, 5).

  • occ (torch.Tensor) – Atomic occupancies, shape (N_atoms,).

Returns:

Updated electron density map.

Return type:

torch.Tensor
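The computation described by these parameters can be sketched as follows. This is an assumption-laden illustration, not the library's kernel: the name add_atoms_sketch is hypothetical, the density uses the standard real-space transform of a Gaussian form factor, rho(r) = occ * sum_i A_i * (4*pi / (B_i + b))^(3/2) * exp(-4*pi^2 * r^2 / (B_i + b)), and periodic wrapping (the role frac_matrix and inv_frac_matrix presumably play) is ignored.

```python
import math
import torch


def add_atoms_sketch(surrounding_coords, voxel_indices, density_map, xyz, b, A, B, occ):
    """Hypothetical sketch; ignores periodic wrapping and the frac matrices."""
    # Squared distance from each atom to each of its voxels: (N_atoms, N_voxels).
    r2 = ((surrounding_coords - xyz[:, None, :]) ** 2).sum(dim=-1)
    # Each Gaussian is broadened by the atom's B-factor: (N_atoms, 5).
    width = B + b[:, None]
    amp = A * (4.0 * math.pi / width) ** 1.5
    # Evaluate and sum the Gaussians per voxel: (N_atoms, N_voxels).
    expo = -4.0 * math.pi**2 * r2[:, :, None] / width[:, None, :]
    rho = (amp[:, None, :] * torch.exp(expo)).sum(dim=-1) * occ[:, None]
    # accumulate=True sums contributions that land on the same voxel.
    i, j, k = voxel_indices.reshape(-1, 3).unbind(dim=-1)
    return density_map.index_put((i, j, k), rho.reshape(-1), accumulate=True)
```

The sketch returns a new tensor rather than modifying density_map in place; whether the library function updates its map argument in place is not stated in this reference.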