Analysis Module

Analysis Module

Unified MD trajectory analysis toolkit for the SciKit package.

All eight analyses are registered as sub-commands of the scical CLI entry-point (powered by Typer). Each command reads a topology and (optionally) a trajectory file, runs the analysis in parallel where supported, and writes results to plain-text .dat or NumPy .npy files.

Get started — list all commands and options:

scical --help

Per-command help:

scical msd --help
scical rg --help
scical dssp --help
scical distance --help
scical distance-acf --help
scical vector-acf --help
scical contacts --help
scical aggr --help
scical time-s2 --help

Available commands

Command

Description

msd

Per-segment Cα mean squared displacement (FFT, parallel)

rg

Per-segment radius of gyration time series

dssp

Per-residue DSSP helicity and β-sheet content

distance

Cα–Cα distances for user-defined residue pairs over a trajectory

distance-acf

Normalised fluctuation ACF of inter-Cα distances

vector-acf

End-to-end Cα vector autocorrelation function

contacts

Intra- and inter-chain heavy-atom contact maps (parallel)

aggr

Aggregation analysis: clustering, PBC recentering, radial density

time-s2

Time-dependent S² order parameter

Dependencies

mdanalysis, typer, numpy, scipy


CLI Commands

All functions below are registered as sub-commands of the scical entry-point, defined in pyproject.toml as:

[project.scripts]
scical = "SciKit.Analysis:app"

After installation, run scical --help for a full command summary, or scical <command> --help for the option list of any individual command:

scical --help
scical rg --help

Mean Squared Displacement

SciKit.Analysis.cmd_msd(top='conf.psf', traj='system.xtc', resid=None, outdir='./diffusion', start=0, stop=-1, stride=1, per_residue=False, max_tau=None, nproc=4)[source]

Calculate per-segment Cα mean squared displacement (FFT method).

Iterates over all segments in the topology that contain matching Cα atoms and dispatches each segment to a subprocess worker. The MSD is computed via the Einstein relation using MDAnalysis’ EinsteinMSD with FFT acceleration.

Parameters:
  • top (str) – Path to the topology file (PSF, PDB, GRO, …).

  • traj (str) – Path to the trajectory file (XTC, DCD, …).

  • resid (Optional[str]) – Residue range to restrict the Cα selection, e.g. "1:38". When omitted the full segment is used.

  • outdir (str) – Output directory; created automatically if absent.

  • start (int) – Index of the first trajectory frame to include.

  • stop (int) – Index of the last trajectory frame; -1 means end.

  • stride (int) – Step between analysed frames.

  • per_residue (bool) – When True, also write per-Cα MSD files.

  • max_tau (Optional[float]) – Truncate the analysis at this lag time (ps) by limiting the number of frames passed to EinsteinMSD. The maximum achievable lag equals (n_frames - 1) × dt, so only ceil(max_tau / dt) + 1 frames are ever read from disk. None (default) uses all available frames.

  • nproc (int) – Number of parallel worker processes.

Output:

<outdir>/<segid>_msd.dat — two columns: lag time (ps) and MSD (Ų). <outdir>/<segid>_CA_msd.dat — per-Cα MSD table (--per-residue only).

Example:

scical msd --top conf.psf --traj system.xtc --resid 1:50 \
           --max-tau 5000 --outdir ./msd_results --nproc 8

Radius of Gyration

SciKit.Analysis.cmd_rg(top='conf.psf', traj='system.xtc', sel=None, resid=None, out='rg.dat', start=0, stop=-1, stride=10, nproc=4)[source]

Calculate per-segment radius of gyration time series.

Computes Rg(t) for every protein-containing segment in parallel and writes a single multi-column .dat file with one column per segment, indexed by frame number.

Parameters:
  • top (str) – Path to the topology file.

  • traj (str) – Path to the trajectory file.

  • sel (Optional[str]) – Segment selection expression. Accepts a single segment ("R001"), a range ("R001-R010"), or a comma-separated combination ("R001,R003-R006,R010"). Omit to use all segments.

  • resid (Optional[str]) – Residue range to restrict the selection (e.g. "1:100"). Omit to use the full protein.

  • out (str) – Path for the output .dat file.

  • start (int) – Index of the first trajectory frame.

  • stop (int) – Index of the last trajectory frame; -1 means end.

  • stride (int) – Step between analysed frames.

  • nproc (int) – Number of parallel worker processes.

Output:

<out> — space-separated columns: frame  <segid1>  <segid2>  Units: frame index (dimensionless), Rg in ångström.

Example:

scical rg --top conf.psf --traj system.xtc --sel R001-R010 --out rg.dat --stride 5 --nproc 4

DSSP Secondary Structure

SciKit.Analysis.cmd_dssp(top='conf.psf', traj='system.xtc', hout='helicity.dat', bout='beta.dat', start=0, stop=-1, stride=10, nproc=4)[source]

Calculate per-residue DSSP helicity and beta-sheet content.

Assigns DSSP secondary-structure codes to every protein residue in every analysed frame using MDAnalysis’ built-in DSSP implementation. Per-residue fractions are averaged over time and written to two separate files — one for helicity ('H' codes) and one for β-sheet content ('E' codes).

Parameters:
  • top (str) – Path to the topology file.

  • traj (str) – Path to the trajectory file.

  • hout (str) – Output path for the per-residue helicity table.

  • bout (str) – Output path for the per-residue β-sheet table.

  • start (int) – Index of the first trajectory frame.

  • stop (int) – Index of the last trajectory frame; -1 means end.

  • stride (int) – Step between analysed frames.

  • nproc (int) – Number of parallel worker processes.

Output:

<hout> and <bout> — space-separated tables: residue  <segid1>  <segid2>  Values are per-residue helix / β-strand occupancy fractions in [0, 1].

Example:

scical dssp --top conf.psf --traj system.xtc --hout helicity.dat --bout beta.dat --stride 10 --nproc 4

Pairwise Cα Distances

SciKit.Analysis.cmd_dist(top='conf.psf', traj='system.dcd', pair_files=['./residue_pairs.dat'], start=None, stop=None, stride=1, workers=2)[source]

Calculate Cα–Cα distances for residue pairs over a trajectory.

Reads one or more pair-list files, resolves the corresponding Cα atom indices, and distributes the trajectory frames across parallel workers. Results are assembled and written to a _distance.dat file placed alongside each input pair file.

Pair file format (comma- or whitespace-separated, # comments allowed):

resid1 segid1 resid2 segid2 10 PROA 45 PROA 10 PROA 45 PROB

Parameters:
  • top (str) – Path to the topology file.

  • traj (str) – Path to the trajectory file.

  • pair_files (List[str]) – One or more residue-pair list files. Duplicate pairs across files are computed only once.

  • start (Optional[int]) – First trajectory frame to include (None = frame 0).

  • stop (Optional[int]) – Last trajectory frame, exclusive (None = last frame).

  • stride (int) – Step between analysed frames.

  • workers (int) – Number of parallel worker processes; defaults to the number of logical CPU cores.

Output:

<pair_file_stem>_distance.dat — one row per pair, one column per frame, prefixed by resid_1, segid_1, resid_2, segid_2.

Example:

scical distance --top conf.psf --traj system.dcd -f pairs.dat --stride 2 --workers 8

Distance Autocorrelation Function

SciKit.Analysis.cmd_dist_acf(top='conf.psf', traj='system.xtc', pairs='pairs.dat', out='distance_acf.dat', start=0, stop=-1, stride=10, nproc=4)[source]

Compute inter-Cα distance autocorrelation for residue pairs.

For each pair in pairs, extracts the Cα–Cα distance time series and computes the normalised fluctuation ACF via FFT (Wiener–Khinchin theorem). All ACF curves are written to a single .dat file.

Parameters:
  • top (str) – Path to the topology file.

  • traj (str) – Path to the trajectory file.

  • pairs (str) – Path to the residue-pair list file.

  • out (str) – Output file path for the ACF table.

  • start (int) – First trajectory frame to include.

  • stop (int) – Last trajectory frame; -1 means end.

  • stride (int) – Step between analysed frames (also scales the lag-time axis).

  • nproc (int) – Number of parallel worker processes.

Output:

<out> — columns: lag time (ps), normalised ACF per pair. The zero-lag value is normalised to 1.

Example:

scical distance-acf --top conf.psf --traj system.xtc --pairs pairs.dat --out distance_acf.dat --stride 10 --nproc 4

Vector Autocorrelation Function

SciKit.Analysis.cmd_vector_acf(top='conf.psf', traj='system.xtc', pairs='pairs.dat', out='vector_acf.dat', start=0, stop=-1, stride=10, nproc=4)[source]

Compute end-to-end Cα vector autocorrelation for residue pairs.

For each pair in pairs, collects the inter-Cα displacement vector **R**(t) and computes the normalised vector ACF:

\[C(t) = \frac{\langle \mathbf{R}(0) \cdot \mathbf{R}(t) \rangle} {\langle |\mathbf{R}|^2 \rangle}\]

All ACF curves are written to a single .dat file.

Parameters:
  • top (str) – Path to the topology file.

  • traj (str) – Path to the trajectory file.

  • pairs (str) – Path to the residue-pair list file.

  • out (str) – Output file path for the ACF table.

  • start (int) – First trajectory frame to include.

  • stop (int) – Last trajectory frame; -1 means end.

  • stride (int) – Step between analysed frames.

  • nproc (int) – Number of parallel worker processes.

Output:

<out> — columns: lag time (ps), normalised vector ACF per pair.

Example:

scical vector-acf --top conf.psf --traj system.xtc --pairs pairs.dat --out vector_acf.dat --stride 10 --nproc 4

Contact Maps

SciKit.Analysis.cmd_contacts(top='system.psf', traj=None, cutoff=6.0, components=None, pairs=None, start=0, stop=None, stride=1, nproc=1, out='contact_map.npy')[source]

Calculate intra- and inter-chain contact maps for multi-component systems (heavy-atom, parallel).

Identifies contacts between heavy atoms (CA, CB, CC, CD, CE, CF) within a user-defined distance cutoff, accounting for periodic boundary conditions. Supports systems containing multiple distinct peptide/protein components (e.g. 100 copies of chain A + 100 copies of chain B).

Components are defined by grouping auto-detected segids:

--components "A:100 B:100 C:100"   # first 100 segids → A, next 100 → B, …

Contact pairs to compute:

--pairs "A-A A-B B-B"   # default: all same-component pairs

Contacts are classified as:

  • Intra — same chain copy, sequence separation ≥ 4 residues (same-component pairs only).

  • Inter — different chain copies of the same component, or cross-component contacts.

Normalisation: sum over all copy pairs / (n_copies_X × n_frames), giving the average contacts that residue i of one copy makes with residue j per frame.

Parameters:
  • top (str) – Path to the topology file (PSF).

  • traj (Optional[str]) – Path to the trajectory file. When omitted, only the single frame in top is analysed.

  • cutoff (float) – Heavy-atom distance cutoff in ångström.

  • components (Optional[str]) – Component definitions e.g. "A:100 B:100". When omitted all segids are assigned to a single component "A".

  • pairs (Optional[str]) – Contact pairs to compute e.g. "A-A A-B". When omitted, all same-component pairs are computed.

  • start (int) – Index of the first trajectory frame.

  • stop (Optional[int]) – Index of the last trajectory frame (exclusive); None means use all frames.

  • stride (int) – Step between analysed frames.

  • nproc (int) – Number of parallel worker processes.

  • out (str) –

    Output file stem. Per requested pair X-Y the following files are written:

    • <stem>_X-Y_intra.npy — intra-copy map (same-component only).

    • <stem>_X-Y_inter.npy — inter-copy map (same-component only).

    • <stem>_X-Y_total.npy — combined (or cross-component) map.

    • <stem>_X-Y_resids.npy / <stem>_X-Y_resids_X.npy / _resids_Y.npy.

Example:

# Single-component system (original behaviour)
scical contacts --top system.psf --traj traj.dcd --cutoff 8.0 --stride 5 --nproc 8

# Multi-component: 100 A-chains + 100 B-chains, compute A-A and A-B maps
scical contacts --top system.psf --traj traj.dcd \
    --components "A:100 B:100" --pairs "A-A A-B" --nproc 8

Aggregation Analysis

SciKit.Analysis.cmd_aggr(top='conf.psf', traj='system.xtc', out='aggr.dat', outtraj='recentered.xtc', profile='density_profile.dat', ref='CA', rcut=8.0, castep=1, start=None, stop=None, stride=1, recenter=False, density=False, dr=2.0, n_frames_avg=50, nproc=1, sel=None)[source]

Analyse protein aggregation: cluster detection, PBC recentering, and radial density.

Processes each trajectory frame to:

  1. Cluster — detect multi-chain aggregates using kd-tree contact search on reference atoms (default: Cα), with a union-find merging scheme. Only the segments selected by --sel are considered.

  2. Recenter (optional, ``–recenter``) — unwrap the largest cluster across PBC and translate it to the box centre; write as a new trajectory. Always applied to the full system.

  3. Density profile (optional, ``–density``, requires ``–recenter``) — compute and average the radial monomer concentration (mM) over the last --n-frames-avg frames. Always computed from the full system.

Parallelism is over frames (--nproc): each worker process handles a contiguous chunk of the trajectory and writes its own temporary XTC fragment, which are merged in order at the end.

Example:

# Serial — cluster statistics only, select by segment name
scical aggr --top conf.psf --traj system.xtc --rcut 8.0 --sel R001-R099

# Multiple segment types
scical aggr --top conf.psf --traj system.xtc --rcut 8.0 \
    --sel R001-R099,K001-K010

# Single segment
scical aggr --top conf.psf --traj system.xtc --rcut 8.0 --sel R001

# 8 workers — with PBC recentering and radial density
scical aggr --top conf.psf --traj system.xtc --rcut 8.0 --sel R001-R099 \
    --recenter --density --dr 2.0 --n-frames-avg 100 --nproc 8