pallatom.helpers.atom_utils

Attributes

Classes

Protein

Protein structure representation.

Functions

atom37_to_atom5(→ tuple[jaxtyping.Float[torch.Tensor, ...)

Extract the 5 backbone+Cβ atoms from atom37 representation.

atom37_to_cb(→ tuple[jaxtyping.Float[torch.Tensor, ...)

Full pipeline: atom37 → atom5 → Cβ / pseudo-Cβ.

center_positions(np_example)

Center 'atom_positions' on CA center of mass.

get_cb_coords(→ tuple[jaxtyping.Float[torch.Tensor, ...)

Extract Cβ (slot 4) from atom5, replacing missing Cβ (Gly) with pseudo-Cβ.

make_fixed_size(np_example[, max_seq_length])

Pad features to a fixed sequence length along the residue axis (currently axis=0).

make_np_example(coords_dict)

Make a dictionary of non-batched numpy protein features.

protein_from_pdb(→ Protein)

Parse a PDB file (ATOM records only) into a Protein using the atom37 layout.

pseudo_cb(→ jaxtyping.Float[torch.Tensor, ... 3])

Compute a virtual Cβ from backbone geometry (Gly-safe).

to_pdb(→ str)

Converts a Protein instance to a PDB string.

Module Contents

class pallatom.helpers.atom_utils.Protein

Protein structure representation.

aatype: jaxtyping.Int[numpy.ndarray, num_res]
atom_mask: jaxtyping.Float[numpy.ndarray, num_res num_atom_type]
atom_positions: jaxtyping.Float[numpy.ndarray, num_res num_atom_type 3]
b_factors: jaxtyping.Float[numpy.ndarray, num_res num_atom_type]
chain_index: jaxtyping.Int[numpy.ndarray, num_res]
residue_index: jaxtyping.Int[numpy.ndarray, num_res]
pallatom.helpers.atom_utils.atom37_to_atom5(atom37_positions: jaxtyping.Float[torch.Tensor, B N_res 37 3], atom37_mask: jaxtyping.Float[torch.Tensor, B N_res 37]) → tuple[jaxtyping.Float[torch.Tensor, B N_res 5 3], jaxtyping.Float[torch.Tensor, B N_res 5]]

Extract the 5 backbone+Cβ atoms from atom37 representation.

Returns:

  • atom5_positions ((B, N_res, 5, 3))

  • atom5_mask ((B, N_res, 5))

pallatom.helpers.atom_utils.atom37_to_cb(atom37_positions: jaxtyping.Float[torch.Tensor, B N_res 37 3], atom37_mask: jaxtyping.Float[torch.Tensor, B N_res 37]) → tuple[jaxtyping.Float[torch.Tensor, B N_res 3], jaxtyping.Bool[torch.Tensor, B N_res]]

Full pipeline: atom37 → atom5 → Cβ / pseudo-Cβ.

Returns:

  • cb ((B, N_res, 3) — real Cβ where available, pseudo-Cβ otherwise)

  • pseudo_beta_mask ((B, N_res) — True where real Cβ present, False where pseudo-Cβ was used)

pallatom.helpers.atom_utils.center_positions(np_example)

Center 'atom_positions' on CA center of mass.
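The centering step can be illustrated with a minimal NumPy sketch. It assumes np_example is a dict whose 'atom_positions' entry has shape (num_res, num_atom_type, 3), that the centroid is an unweighted mean over all CA atoms (no mask applied), and that every atom slot is shifted by the same offset. The name center_positions_sketch is hypothetical; the library's actual behavior may differ on these points.

```python
import numpy as np

CA_INDEX = 1  # matches ATOM37_CA = 1 in the module constants


def center_positions_sketch(np_example):
    """Return a copy of the example with the CA centroid moved to the origin."""
    positions = np_example["atom_positions"]               # (num_res, num_atom_type, 3)
    ca_centroid = positions[:, CA_INDEX, :].mean(axis=0)   # (3,), unweighted mean (assumption)
    centered = dict(np_example)                            # shallow copy; input dict is not mutated
    centered["atom_positions"] = positions - ca_centroid   # broadcast over residues and atoms
    return centered


example = {"atom_positions": np.random.default_rng(0).normal(size=(4, 37, 3))}
centered = center_positions_sketch(example)
```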

pallatom.helpers.atom_utils.get_cb_coords(atom5_positions: jaxtyping.Float[torch.Tensor, B N_res 5 3], atom5_mask: jaxtyping.Float[torch.Tensor, B N_res 5], fill_pseudo: bool = True) → tuple[jaxtyping.Float[torch.Tensor, B N_res 3], jaxtyping.Bool[torch.Tensor, B N_res]]

Extract Cβ (slot 4) from atom5, replacing missing Cβ (Gly) with pseudo-Cβ.

Parameters:

  • atom5_positions ((B, N_res, 5, 3))

  • atom5_mask ((B, N_res, 5) — 1 where atom is present)

  • fill_pseudo (if True, compute pseudo-Cβ wherever Cβ is absent)

Returns:

  • cb ((B, N_res, 3) — real Cβ where available, pseudo-Cβ otherwise)

  • pseudo_beta_mask ((B, N_res) — True where real Cβ present, False where pseudo-Cβ was used)
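The selection logic described above (real Cβ where the mask marks it present, pseudo-Cβ elsewhere) can be sketched in NumPy as follows. This is an illustration only: the library operates on torch tensors, and the helper name select_cb and its virtual_cb argument (a precomputed array of virtual Cβ coordinates) are hypothetical. The fill_pseudo=False case is not covered, since its behavior is not documented here.

```python
import numpy as np

ATOM5_CB = 4  # Cβ slot in the atom5 layout, as listed in the module constants


def select_cb(atom5_positions, atom5_mask, virtual_cb):
    """Pick the real Cβ where the mask marks it present, else the supplied virtual Cβ."""
    has_real_cb = atom5_mask[..., ATOM5_CB] > 0.5      # (B, N_res) boolean
    real_cb = atom5_positions[..., ATOM5_CB, :]        # (B, N_res, 3)
    cb = np.where(has_real_cb[..., None], real_cb, virtual_cb)
    return cb, has_real_cb


positions = np.zeros((1, 2, 5, 3))
positions[0, 0, ATOM5_CB] = [1.0, 2.0, 3.0]
mask = np.ones((1, 2, 5))
mask[0, 1, ATOM5_CB] = 0.0                # second residue has no Cβ, as for Gly
virtual = np.full((1, 2, 3), 9.0)         # stand-in for computed pseudo-Cβ coordinates
cb, pseudo_beta_mask = select_cb(positions, mask, virtual)
```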

pallatom.helpers.atom_utils.make_fixed_size(np_example, max_seq_length=500)

Pad features to a fixed sequence length along the residue axis (currently axis=0).
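A minimal sketch of padding along axis 0 is shown below. It assumes every feature in np_example is a NumPy array whose first axis is the residue axis, that padding values are zeros, and that inputs longer than max_seq_length are rejected; none of these details are confirmed by this page. The name pad_to_fixed_size is hypothetical.

```python
import numpy as np


def pad_to_fixed_size(np_example, max_seq_length=500):
    """Zero-pad every feature along axis 0 up to max_seq_length."""
    padded = {}
    for name, value in np_example.items():
        pad_len = max_seq_length - value.shape[0]
        if pad_len < 0:
            # Assumption: over-long inputs are an error rather than being cropped.
            raise ValueError(f"{name} has {value.shape[0]} rows, more than {max_seq_length}")
        # Pad only at the end of axis 0; leave all other axes untouched.
        pad_width = [(0, pad_len)] + [(0, 0)] * (value.ndim - 1)
        padded[name] = np.pad(value, pad_width)  # default mode pads with zeros
    return padded


features = {
    "aatype": np.array([0, 7, 19], dtype=np.int64),
    "atom_positions": np.ones((3, 37, 3)),
}
fixed = pad_to_fixed_size(features, max_seq_length=8)
```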

pallatom.helpers.atom_utils.make_np_example(coords_dict)

Make a dictionary of non-batched numpy protein features.

pallatom.helpers.atom_utils.protein_from_pdb(pdb_path: str) → Protein

Parse a PDB file (ATOM records only) into a Protein using the atom37 layout.
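For orientation, reading a single ATOM record relies on the fixed-column layout of the PDB format. The sketch below slices one line into the fields a Protein would need; the helper name parse_atom_record and the returned dict keys are hypothetical, and this is not the parser used by protein_from_pdb.

```python
def parse_atom_record(line):
    """Slice one fixed-column PDB ATOM record into its fields (None for other records)."""
    if line[:6].strip() != "ATOM":
        return None  # HETATM, TER, etc. are skipped, in line with "ATOM records only"
    return {
        "atom_name": line[12:16].strip(),   # columns 13-16
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain_id": line[21],               # column 22
        "res_seq": int(line[22:26]),        # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "b_factor": float(line[60:66]),     # columns 61-66
    }
```

Each parsed atom name would then be mapped to its slot in the atom37 layout, with atom_mask set to 1 for atoms that were found.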

pallatom.helpers.atom_utils.pseudo_cb(n: jaxtyping.Float[torch.Tensor, ... 3], ca: jaxtyping.Float[torch.Tensor, ... 3], c: jaxtyping.Float[torch.Tensor, ... 3]) → jaxtyping.Float[torch.Tensor, ... 3]

Compute a virtual Cβ from backbone geometry (Gly-safe).

Uses the standard ideal-geometry recipe:

  b = Cα - N   (N→Cα bond vector)
  d = C - Cα   (Cα→C bond vector)

Cross them (b × d), then combine with ideal tetrahedral offsets.

This matches the AlphaFold2 / ESMFold convention exactly.
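The recipe can be sketched in NumPy as below. The three coefficients are an assumption: they are the ideal-geometry values widely used in structure-prediction and design code, and this page does not state which constants pseudo_cb uses. The function name pseudo_cb_sketch is hypothetical, and the library version takes torch tensors rather than NumPy arrays.

```python
import numpy as np


def pseudo_cb_sketch(n, ca, c):
    """Place a virtual Cβ from N, Cα, C coordinates of shape (..., 3)."""
    b = ca - n              # N→Cα bond vector
    d = c - ca              # Cα→C bond vector
    a = np.cross(b, d)      # normal to the N-Cα-C plane
    # Assumed ideal-geometry coefficients; not confirmed by this page.
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * d + ca


n = np.array([[-1.0, 0.0, 0.0]])
ca = np.array([[0.0, 0.0, 0.0]])
c = np.array([[0.0, 1.0, 0.0]])
virtual_cb = pseudo_cb_sketch(n, ca, c)
```

Because only coordinate differences enter the formula before Cα is added back, the result translates with the backbone, and it never reads the Cβ slot, which is what makes it safe for Gly.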

pallatom.helpers.atom_utils.to_pdb(prot: Protein) → str

Converts a Protein instance to a PDB string.

Parameters:

prot – The protein to convert to PDB.

Returns:

PDB string.
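To show what one line of such a PDB string looks like, here is a sketch that formats a single atom into the 80-column ATOM record layout. The helper name format_atom_record and its parameters are hypothetical, the occupancy default of 1.00 is an assumption, and the library may format records differently.

```python
def format_atom_record(serial, atom_name, res_name, chain_id, res_seq,
                       xyz, b_factor, element, occupancy=1.00):
    """Format one atom as a fixed-width, 80-column PDB ATOM record."""
    # Names shorter than four characters start in the second name column.
    name = atom_name if len(atom_name) == 4 else f" {atom_name}"
    x, y, z = xyz
    fields = [
        "ATOM  ",                              # record name, columns 1-6
        f"{serial:>5}",                        # atom serial number
        " ",
        f"{name:<4}",                          # atom name
        " ",                                   # alternate location (unused)
        f"{res_name:>3}",                      # residue name
        " ",
        f"{chain_id:>1}",                      # chain identifier
        f"{res_seq:>4}",                       # residue sequence number
        " " * 4,                               # insertion code + 3 blank columns
        f"{x:>8.3f}{y:>8.3f}{z:>8.3f}",        # coordinates in Å
        f"{occupancy:>6.2f}{b_factor:>6.2f}",  # occupancy and B-factor
        " " * 10,
        f"{element:>2}",                       # element symbol
        " " * 2,                               # charge (unused)
    ]
    return "".join(fields)


record = format_atom_record(1, "CA", "ALA", "A", 7, (1.0, 2.0, 3.0), 0.0, "C")
```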

pallatom.helpers.atom_utils.ATOM37_C = 2
pallatom.helpers.atom_utils.ATOM37_CA = 1
pallatom.helpers.atom_utils.ATOM37_CB = 4
pallatom.helpers.atom_utils.ATOM37_N = 0
pallatom.helpers.atom_utils.ATOM37_O = 3
pallatom.helpers.atom_utils.ATOM5_C = 2
pallatom.helpers.atom_utils.ATOM5_CA = 1
pallatom.helpers.atom_utils.ATOM5_CB = 4
pallatom.helpers.atom_utils.ATOM5_ELEMENTS
pallatom.helpers.atom_utils.ATOM5_N = 0
pallatom.helpers.atom_utils.ATOM5_NAMES = ['N', 'CA', 'C', 'O', 'CB']
pallatom.helpers.atom_utils.ATOM5_O = 3
pallatom.helpers.atom_utils.PDB_CHAIN_IDS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
pallatom.helpers.atom_utils.PDB_MAX_CHAINS = 62
pallatom.helpers.atom_utils.atom_types = ['N', 'CA', 'C', 'CB', 'O', 'CG', 'CG1', 'CG2', 'OG', 'OG1', 'SG', 'CD', 'CD1', 'CD2', 'ND1',...
pallatom.helpers.atom_utils.restype_1to3
pallatom.helpers.atom_utils.restype_3to1
pallatom.helpers.atom_utils.restype_num = 21
pallatom.helpers.atom_utils.restype_order
pallatom.helpers.atom_utils.restypes = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'X']
pallatom.helpers.atom_utils.rigid_group_atom_positions
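As a sketch of how the integer fields of Protein could relate to these constants: the example below assumes that aatype values index into restypes and that chain_index values index into PDB_CHAIN_IDS. Both mappings are assumptions not stated on this page, and the helper names decode_sequence and chain_id_for are hypothetical. The constant values are copied from the listing above.

```python
restypes = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K',
            'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'X']
PDB_CHAIN_IDS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
PDB_MAX_CHAINS = 62


def decode_sequence(aatype):
    """Turn integer residue types into a one-letter sequence (assumed mapping)."""
    return "".join(restypes[int(i)] for i in aatype)


def chain_id_for(chain_index):
    """Map an integer chain index to a single-character PDB chain ID (assumed mapping)."""
    if not 0 <= chain_index < PDB_MAX_CHAINS:
        raise ValueError(f"chain index {chain_index} exceeds the {PDB_MAX_CHAINS} available IDs")
    return PDB_CHAIN_IDS[chain_index]


sequence = decode_sequence([0, 7, 20])
first_lowercase_chain = chain_id_for(26)
```

The limit of 62 follows from the single-character chain field of the PDB format: 26 uppercase letters, 26 lowercase letters, and 10 digits.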