torchref.refinement.optimizers package

Optimizers for crystallographic refinement.

This module provides custom optimizers and optimization functions:

  • AdamWithAdaptiveNoise: Adam with scale-invariant noise injection

  • optimize_simulated_annealing: Simulated annealing optimization

  • optimize_stochastic_sa: Stochastic SA for internal coordinates (per-parameter)

  • optimize_stochastic_sa_batch: Stochastic SA for internal coordinates (batch)

  • optimize_internal_coord_sa: Universal SA for internal coordinates with auto-calibration

  • optimize_gradient_sa: Gradient-based SA with per-parameter acceptance

  • refine_sa_lbfgs: Combined Metropolis SA + LBFGS pipeline

  • optimize_momentum_sa: Phenix-style SA (gradient descent + momentum + noise)

  • refine_momentum_sa_lbfgs: Combined Phenix-style SA + LBFGS pipeline

  • ExploratoryLBFGS: LBFGS with automatic landscape exploration via Lanczos

  • LangevinSA: BAOAB Langevin dynamics with simulated annealing

class torchref.refinement.optimizers.AdamWithAdaptiveNoise(params, lr=0.001, alpha=0.1, eps=1e-08, update_weight=0.05, **kwargs)[source]

Bases: Adam

Drop-in replacement for torch.optim.Adam with adaptive, scale-invariant noise injection.

Injects Gaussian noise into the gradients to counteract overfitting. The noise magnitude is scaled by the overfitting ratio, that is, the ratio of test NLL to training NLL.

Parameters:
  • params (iterable) – Model parameters to optimize.

  • lr (float, optional) – Learning rate. Default is 1e-3.

  • alpha (float, optional) – Scaling factor for how much noise to inject per unit overfitting ratio. Default is 0.1.

  • eps (float, optional) – Small constant for numerical stability. Default is 1e-8.

  • update_weight (float, optional) – Weight for exponential moving average of noise scale. Default is 0.05.

  • **kwargs – Additional arguments passed to Adam optimizer.

Attributes:
  • alpha (float) – Noise scaling factor.

  • eps (float) – Numerical stability constant.

  • noise_scale (float) – Current noise scale (dynamically updated).

  • update_weight (float) – EMA weight for noise scale updates.

__init__(params, lr=0.001, alpha=0.1, eps=1e-08, update_weight=0.05, **kwargs)[source]

Initialize AdamWithAdaptiveNoise.


inject_noise()[source]

Inject scale-invariant Gaussian noise into gradients.

The noise standard deviation is proportional to the gradient and parameter norms, scaled by the current noise_scale and alpha.

step()[source]

Perform a single optimization step with optional noise injection.

Injects noise into gradients before the Adam update if noise_scale > 0.

update_noise_scale(train_nll, test_nll)[source]

Update the noise scale based on the ratio of test to training NLL.

If ratio > 1, the model is overfitting and noise is increased.

Parameters:
  • train_nll (torch.Tensor) – Training set negative log-likelihood.

  • test_nll (torch.Tensor) – Test set negative log-likelihood.
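
The exact update rule is not given here. As a minimal sketch, assuming the target noise level is the excess of the test/train ratio over 1 and that update_weight is the weight given to the newest value in the moving average, the update might look like the following. The helper ema_noise_scale is hypothetical (it is not part of torchref), and plain floats stand in for tensors.

```python
def ema_noise_scale(noise_scale, train_nll, test_nll, update_weight=0.05, eps=1e-8):
    """Hypothetical EMA update of the noise scale from the overfitting ratio."""
    # Ratio > 1 means the test NLL is worse than the training NLL (overfitting).
    ratio = test_nll / (train_nll + eps)
    # Assumption: only the excess over 1 drives noise; no overfitting means target 0.
    target = max(ratio - 1.0, 0.0)
    # Exponential moving average: the old value decays, the new target is blended in.
    return (1.0 - update_weight) * noise_scale + update_weight * target


# Overfitting case: test NLL is 20% above training NLL, so the scale grows.
grown = ema_noise_scale(0.0, train_nll=1.0, test_nll=1.2)
# No overfitting: the scale only decays toward zero.
decayed = ema_noise_scale(0.2, train_nll=1.0, test_nll=0.9)
```

With update_weight=0.05 the scale reacts slowly, so a single noisy NLL estimate cannot switch noise injection fully on or off.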

class torchref.refinement.optimizers.ExploratoryLBFGS(params, lr=1.0, max_iter=20, history_size=100, m_modes=10, m_lanczos_iter=None, eigenvalue_threshold=0.01, participation_threshold=0.05, scan_points=20, scan_step_size=0.1, max_exploration_cycles=5, hvp_epsilon=0.0001, convergence_grad_threshold=1e-05, convergence_loss_threshold=1e-07, convergence_param_threshold=1e-06, n_stable=3, verbose=1)[source]

Bases: Optimizer

LBFGS optimizer with automatic landscape exploration via Lanczos analysis.

Composes with (rather than subclasses) torch.optim.LBFGS. After the internal LBFGS converges, performs eigenanalysis of the Hessian to find degenerate/flat directions, scans along them, and hops to better basins if found.
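
The eigenanalysis needs Hessian-vector products, which the hvp_epsilon parameter suggests are obtained by finite differences of the gradient. The sketch below assumes a forward difference; the stencil actually used by the library is not documented here. The names grad_toy and hvp_forward_difference are illustrative only.

```python
def grad_toy(x):
    """Gradient of the toy loss f(x) = x0**2 + 0.0005 * x1**2 (one stiff, one flat direction)."""
    return [2.0 * x[0], 0.001 * x[1]]


def hvp_forward_difference(grad_fn, x, v, epsilon=1e-4):
    """Approximate H @ v as (grad(x + epsilon * v) - grad(x)) / epsilon."""
    g0 = grad_fn(x)
    shifted = [xi + epsilon * vi for xi, vi in zip(x, v)]
    g1 = grad_fn(shifted)
    return [(b - a) / epsilon for a, b in zip(g0, g1)]


# Probing along the flat direction recovers the small curvature 0.001.
hv = hvp_forward_difference(grad_toy, [1.0, -3.0], [0.0, 1.0])
```

A Lanczos iteration only ever needs such products, so the full Hessian is never formed.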

Parameters:
  • params (iterable) – Parameters to optimize.

  • lr (float) – LBFGS learning rate. Default: 1.0.

  • max_iter (int) – LBFGS max line search iterations per step. Default: 20.

  • history_size (int) – LBFGS Hessian approximation memory. Default: 100.

  • m_modes (int) – Number of lowest eigenmodes to compute. Default: 10.

  • m_lanczos_iter (int, optional) – Lanczos iterations. Default: 2*m_modes + 10.

  • eigenvalue_threshold (float) – Mode is degenerate if eigenvalue < threshold * median(positive). Default: 0.01.

  • participation_threshold (float) – Parameter participates if |component| > threshold * ||mode||. Default: 0.05.

  • scan_points (int) – Evaluation points per scan direction. Default: 20.

  • scan_step_size (float) – Step size in parameter space units. Default: 0.1.

  • max_exploration_cycles (int) – Cap on explore-hop cycles. Default: 5.

  • hvp_epsilon (float) – Finite-difference epsilon for Hessian-vector products. Default: 1e-4.

  • convergence_grad_threshold (float) – Gradient norm convergence threshold. Default: 1e-5.

  • convergence_loss_threshold (float) – Loss change convergence threshold. Default: 1e-7.

  • convergence_param_threshold (float) – Parameter change convergence threshold. Default: 1e-6.

  • n_stable (int) – Consecutive converged steps required. Default: 3.

  • verbose (int) – Verbosity level: 0=silent, 1=summary, 2=detailed. Default: 1.
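
The two threshold rules and the default Lanczos iteration count stated above can be written out directly. This is a sketch with hypothetical helper names, using plain lists rather than tensors; treating negative eigenvalues as below the cutoff follows from the stated inequality but is otherwise an assumption.

```python
import math
from statistics import median


def default_lanczos_iterations(m_modes=10):
    """Default for m_lanczos_iter when it is left as None."""
    return 2 * m_modes + 10


def degenerate_modes(eigenvalues, eigenvalue_threshold=0.01):
    """Indices of modes with eigenvalue < threshold * median(positive eigenvalues)."""
    positive = [ev for ev in eigenvalues if ev > 0]
    if not positive:
        return []
    cutoff = eigenvalue_threshold * median(positive)
    return [i for i, ev in enumerate(eigenvalues) if ev < cutoff]


def participating_indices(mode, participation_threshold=0.05):
    """Indices of parameters with |component| > threshold * ||mode||."""
    norm = math.sqrt(sum(c * c for c in mode))
    return [i for i, c in enumerate(mode) if abs(c) > participation_threshold * norm]


eigenvalues = [-0.1, 1e-4, 1.0, 2.0, 4.0]
flat = degenerate_modes(eigenvalues)                 # cutoff is 0.01 * 1.5 = 0.015
active = participating_indices([0.6, 0.8, 0.01])     # the third component is negligible
```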

__init__(params, lr=1.0, max_iter=20, history_size=100, m_modes=10, m_lanczos_iter=None, eigenvalue_threshold=0.01, participation_threshold=0.05, scan_points=20, scan_step_size=0.1, max_exploration_cycles=5, hvp_epsilon=0.0001, convergence_grad_threshold=1e-05, convergence_loss_threshold=1e-07, convergence_param_threshold=1e-06, n_stable=3, verbose=1)[source]

property phase: OptimizerPhase

Current optimizer phase.

step(closure)[source]

Perform one step of the state machine.

Parameters:
  • closure (callable) – A closure that re-evaluates the model loss. It should call optimizer.zero_grad(), compute the loss, call loss.backward(), and return the loss.

Returns:
  The loss value.

Return type:
  float or None

class torchref.refinement.optimizers.LangevinSA(params, dt=0.01, friction=10.0, T_initial=2500.0, T_final=0.01, total_steps=1000, cooling_schedule='exponential', adaptive_masses=True, mass_beta=0.999, mass_eps=1e-08, gradient_clip=None, max_step_size=0.1)[source]

Bases: Optimizer

BAOAB Langevin dynamics integrator with simulated annealing.

Implements the BAOAB splitting scheme (Leimkuhler & Matthews, 2013) for gradient-guided exploration with thermodynamically correct noise. Staggering the B (force) half-steps keeps the cost at one gradient evaluation per step.
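
For orientation, here is a minimal scalar sketch of one BAOAB step in units where k_B = 1. It is not the library's implementation: it uses a fixed mass, omits clamping and rollback, and the names baoab_step and harmonic_grad are illustrative. The gradient from the final B half-step is returned and reused by the next call, which is what keeps the cost at one evaluation per step.

```python
import math
import random


def baoab_step(x, v, grad, grad_fn, dt, friction, temperature, mass=1.0, rng=random):
    """One BAOAB step for a scalar coordinate (k_B = 1)."""
    v -= 0.5 * dt * grad / mass            # B: half kick with the cached gradient
    x += 0.5 * dt * v                      # A: half drift
    c1 = math.exp(-friction * dt)          # O: exact Ornstein-Uhlenbeck velocity update
    sigma = math.sqrt(temperature * (1.0 - c1 * c1) / mass)
    v = c1 * v + sigma * rng.gauss(0.0, 1.0)
    x += 0.5 * dt * v                      # A: half drift
    grad = grad_fn(x)                      # the only gradient evaluation in this step
    v -= 0.5 * dt * grad / mass            # B: half kick with the fresh gradient
    return x, v, grad


calls = {"n": 0}


def harmonic_grad(x):
    """Gradient of the toy potential 0.5 * x**2; counts how often it is called."""
    calls["n"] += 1
    return x


x, v = 1.0, 0.0
grad = harmonic_grad(x)                    # one initial evaluation before the loop
for _ in range(2000):
    x, v, grad = baoab_step(x, v, grad, harmonic_grad,
                            dt=0.01, friction=10.0, temperature=0.0)
```

At temperature 0 the O step adds no noise, so the run is deterministic and the coordinate relaxes toward the minimum at 0.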

Adaptive masses from EMA of squared gradients provide automatic scale invariance across all parameter types (xyz, B-factors, occupancies, torsions, etc.).

Call calibrate() before the main loop to probe parameter stiffness and warm up the adaptive masses without moving the structure.

Parameters:
  • params – Iterable of parameters or param groups.

  • dt – Integration timestep.

  • friction – Friction coefficient gamma. Controls thermalization speed.

  • T_initial – Starting temperature.

  • T_final – Final temperature.

  • total_steps – Total number of annealing steps.

  • cooling_schedule – 'exponential' or 'linear'.

  • adaptive_masses – Use EMA of grad² as per-element masses.

  • mass_beta – EMA decay for adaptive masses.

  • mass_eps – Floor for adaptive masses (numerical stability).

  • gradient_clip – Optional max gradient norm (per-parameter).

  • max_step_size – Maximum displacement per element per full step. Velocities are clamped so |v * dt| <= max_step_size.
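
The adaptive-mass and clamping rules can be illustrated per element. This sketch assumes the usual EMA form for the squared-gradient average and applies mass_eps as a hard floor; the helper names are hypothetical and operate on floats rather than tensors.

```python
def update_mass_ema(mass_ema, grad, mass_beta=0.999):
    """EMA of the squared gradient, used as the per-element mass."""
    return mass_beta * mass_ema + (1.0 - mass_beta) * grad * grad


def effective_mass(mass_ema, mass_eps=1e-8):
    """Apply the floor so that a zero gradient history never yields a zero mass."""
    return max(mass_ema, mass_eps)


def clamp_velocity(v, dt, max_step_size=0.1):
    """Clamp v so that the displacement |v * dt| stays within max_step_size."""
    v_max = max_step_size / dt
    return max(-v_max, min(v, v_max))


m = update_mass_ema(0.0, grad=2.0)          # 0.001 * 4 = 0.004
floor = effective_mass(0.0)                 # falls back to mass_eps
fast = clamp_velocity(50.0, dt=0.01)        # limited to 0.1 / 0.01 = 10
slow = clamp_velocity(-3.0, dt=0.01)        # already within bounds, unchanged
```

Stiff parameters (large gradients) acquire large masses and therefore move less for the same force, which is how the masses provide scale invariance across parameter types.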

__init__(params, dt=0.01, friction=10.0, T_initial=2500.0, T_final=0.01, total_steps=1000, cooling_schedule='exponential', adaptive_masses=True, mass_beta=0.999, mass_eps=1e-08, gradient_clip=None, max_step_size=0.1)[source]

property temperature

Current temperature from the annealing schedule.
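
The formulas behind the two schedules are not stated here. A common choice, assumed in this sketch, is geometric interpolation for 'exponential' and straight-line interpolation for 'linear' between T_initial and T_final. The function annealing_temperature is illustrative, not the library's code.

```python
def annealing_temperature(step, total_steps, T_initial=2500.0, T_final=0.01,
                          cooling_schedule="exponential"):
    """Temperature at a given step under an assumed interpolation rule."""
    # Fraction of the schedule completed, kept within [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    if cooling_schedule == "exponential":
        # Geometric interpolation: constant ratio between consecutive steps.
        return T_initial * (T_final / T_initial) ** frac
    if cooling_schedule == "linear":
        # Arithmetic interpolation: constant difference between consecutive steps.
        return T_initial + (T_final - T_initial) * frac
    raise ValueError(f"unknown cooling_schedule: {cooling_schedule!r}")


T_mid_exp = annealing_temperature(500, 1000)                             # geometric mean
T_mid_lin = annealing_temperature(500, 1000, cooling_schedule="linear")  # arithmetic mean
```

With the defaults, the exponential schedule has already fallen to about 5 at the halfway point, whereas the linear schedule is still near 1250, so the two spend their steps in very different temperature ranges.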

property current_step

property total_steps

property kinetic_energy

Sum of 0.5 * m * v^2 over all parameters (diagnostic).
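
As a worked instance of that sum, using plain lists in place of parameter tensors (the function name is illustrative):

```python
def kinetic_energy(masses, velocities):
    """Sum of 0.5 * m * v**2 over all elements."""
    return sum(0.5 * m * v * v for m, v in zip(masses, velocities))


ke = kinetic_energy([1.0, 2.0], [2.0, 1.0])   # 0.5*1*4 + 0.5*2*1 = 3.0
```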

calibrate(closure, n_steps=10)[source]

Probe parameter stiffness over n_steps, then roll back.

Runs small random perturbations to collect gradient statistics, sets the adaptive masses from the observed grad², then restores all parameters to their original values and initialises velocities from Maxwell-Boltzmann with correctly scaled masses.

Parameters:
  • closure – Same closure as for step(): it must zero the gradients, compute the loss, call backward, and return the loss.

  • n_steps – Number of probing steps.
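
The Maxwell-Boltzmann initialisation mentioned above draws each velocity from a zero-mean Gaussian with variance T / m (in units where k_B = 1). A minimal sketch with a hypothetical helper, using Python's random module rather than torch:

```python
import math
import random


def maxwell_boltzmann_velocities(masses, temperature, rng):
    """Draw one velocity per element from N(0, temperature / mass), with k_B = 1."""
    return [rng.gauss(0.0, math.sqrt(temperature / m)) for m in masses]


rng = random.Random(0)                      # seeded for reproducibility
velocities = maxwell_boltzmann_velocities([4.0, 1.0, 0.25], temperature=1.0, rng=rng)
```

Heavier (stiffer) elements start with smaller velocities, so each element carries the same expected kinetic energy of T / 2.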

step(closure)[source]

Perform one BAOAB Langevin dynamics step.

Tracks the best-loss configuration and rolls back to it when the loss exceeds loss_rollback_factor times the best loss seen so far. This prevents the dynamics from permanently damaging the structure while still allowing uphill exploration.

Parameters:
  • closure – A callable that re-evaluates the model and returns the loss. The closure must call loss.backward() before returning.

Returns:
  The loss value from the closure evaluation.
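
The rollback rule can be sketched independently of the dynamics. The class below is illustrative, not the library's code: it assumes positive loss values, and the factor of 2.0 is chosen only for the example, since the value of loss_rollback_factor is not documented here.

```python
class BestStateTracker:
    """Remember the best-loss state and restore it when the loss grows too much."""

    def __init__(self, loss_rollback_factor):
        self.factor = loss_rollback_factor
        self.best_loss = float("inf")
        self.best_state = None

    def update(self, loss, state):
        """Return the state to continue from after observing (loss, state)."""
        if loss < self.best_loss:
            # New best: store a copy so that later in-place changes cannot corrupt it.
            self.best_loss = loss
            self.best_state = list(state)
            return state
        if loss > self.factor * self.best_loss:
            # Too far uphill: roll back to the best configuration seen so far.
            return list(self.best_state)
        # Moderately uphill: allowed, so exploration continues from the current state.
        return state


tracker = BestStateTracker(loss_rollback_factor=2.0)  # example value, an assumption
s1 = tracker.update(10.0, [1.0])   # first observation becomes the best state
s2 = tracker.update(15.0, [2.0])   # uphill but within 2x: kept
s3 = tracker.update(25.0, [3.0])   # beyond 2x the best loss: rolled back
```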

class torchref.refinement.optimizers.MomentumStochasticSA(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, T_initial=1.0, T_final=0.01, total_steps=1000)[source]

Bases: Adam

Adam-based simulated annealing (SA) in which the injected noise is scaled by Adam's adaptive per-element learning rate, giving automatic scale invariance across parameters.
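
How the noise enters the update is not specified here. One plausible reading, sketched below for a single scalar parameter, multiplies unit Gaussian noise by sqrt(T) and by Adam's effective step lr / (sqrt(v_hat) + eps). Both that form and the helper names are assumptions rather than the library's code.

```python
import math
import random


def adam_effective_lr(lr, v_hat, eps=1e-8):
    """Adam's per-element step size given the bias-corrected second moment v_hat."""
    return lr / (math.sqrt(v_hat) + eps)


def noisy_adam_update(param, m_hat, v_hat, lr, temperature, rng):
    """Adam step plus temperature-scaled noise in the same adaptive units (assumed form)."""
    step = adam_effective_lr(lr, v_hat)
    noise = math.sqrt(temperature) * step * rng.gauss(0.0, 1.0)
    return param - step * m_hat + noise


rng = random.Random(0)
# At zero temperature the update reduces to a plain Adam step.
cold = noisy_adam_update(1.0, m_hat=0.5, v_hat=0.25, lr=0.001, temperature=0.0, rng=rng)
# Parameters with a large gradient history (large v_hat) receive proportionally less noise.
stiff_step = adam_effective_lr(0.001, v_hat=100.0)
soft_step = adam_effective_lr(0.001, v_hat=0.01)
```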

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, T_initial=1.0, T_final=0.01, total_steps=1000)[source]

step(closure=None)[source]

Perform a single optimization step.

Parameters:
  • closure (Callable, optional) – A closure that reevaluates the model and returns the loss.

Submodules