Entropy-Weighted Collapse Likelihood (EWCL)
EWCL quantifies local ensemble breadth directly from amino-acid sequence using residue patterning, charge–hydropathy context, and neighborhood weighting, without requiring structural inputs. In the sequence-only formulation (EWCLv1), each residue is assigned an ensemble-breadth score HEWCL ∈ [0,1]: higher values reflect sequence contexts statistically associated with broader, ensemble-like behavior, and lower values reflect a bias toward localized collapse or structural consolidation. A complementary index, CEWCL = 1 − HEWCL, is interpreted as a collapse-bias surrogate. Neither score is a literal probability; both are sequence-derived surrogates for local organizational tendencies.
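As a minimal sketch of the complementary index, the snippet below derives CEWCL from a list of per-residue HEWCL values. The function name `collapse_bias` and the example scores are hypothetical and used only for illustration; how HEWCL itself is computed is not specified here and is not reproduced.

```python
from typing import List


def collapse_bias(h_ewcl: List[float]) -> List[float]:
    """Return CEWCL = 1 - HEWCL for each residue (hypothetical helper)."""
    for i, h in enumerate(h_ewcl):
        # HEWCL is defined on [0, 1]; reject anything outside that range.
        if not 0.0 <= h <= 1.0:
            raise ValueError(f"HEWCL at index {i} is outside [0, 1]: {h}")
    return [1.0 - h for h in h_ewcl]


# Made-up per-residue scores, for illustration only.
h_scores = [0.10, 0.50, 0.85]
c_scores = collapse_bias(h_scores)
```

Because the two indices sum to 1 at every residue, they carry the same information; reporting CEWCL is a matter of which direction (collapse or breadth) is more convenient to read.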
The structural extension, EWCL-P3, aligns the EWCL signal to experimentally resolved or predicted coordinates (PDB/AlphaFold). When structural coverage is partial or discontinuous, EWCL-P3 is evaluated on the structure-resolved subsequence and re-mapped back to residue coordinates; resulting values may differ from EWCLv1 due to context truncation, windowing near chain breaks, or missing regions.
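The re-mapping step can be sketched as follows, assuming the structure-resolved residues are known as 0-based positions in the full sequence. The function name `remap_to_full_length` and the example values are hypothetical; this illustrates only the bookkeeping, not the EWCL-P3 scoring itself. Unresolved positions are left as `None` rather than interpolated.

```python
from typing import List, Optional, Sequence


def remap_to_full_length(
    sub_scores: Sequence[float],
    resolved_positions: Sequence[int],
    full_length: int,
) -> List[Optional[float]]:
    """Place scores computed on the resolved subsequence back at their
    original residue positions; unresolved residues stay None."""
    if len(sub_scores) != len(resolved_positions):
        raise ValueError("one score is required per resolved position")
    full: List[Optional[float]] = [None] * full_length
    for pos, score in zip(resolved_positions, sub_scores):
        if not 0 <= pos < full_length:
            raise ValueError(f"position {pos} is outside the sequence")
        full[pos] = score
    return full


# Made-up example: a 6-residue chain with residues 2 and 3 unresolved.
remapped = remap_to_full_length(
    sub_scores=[0.2, 0.3, 0.7, 0.6],
    resolved_positions=[0, 1, 4, 5],
    full_length=6,
)
```

In this example the residues flanking the gap (positions 1 and 4) are adjacent in the subsequence although they are not adjacent in the full chain, which is one way windowing near chain breaks can shift values relative to EWCLv1.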
This alignment enables per-residue comparison of sequence-encoded bias with geometry-driven confidence measures (e.g., pLDDT, B-factors). Concordance or disagreement between the two can highlight: (i) ensemble-favored segments in regions of high structural confidence; (ii) collapse-biased segments that appear structurally diffuse or low-confidence; (iii) candidate locations for stabilizing or destabilizing mutations (e.g., for protein engineering or disorder tuning); and (iv) regions of sequence–structure mismatch that may warrant experimental validation.
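One way to make this comparison concrete is to bin each residue by its HEWCL value and its pLDDT. The sketch below is illustrative only: the function name `classify_residue`, the HEWCL cutoff of 0.5, and the pLDDT cutoff of 70 are assumptions, not values prescribed by EWCL. B-factors are not handled here, since they run in the opposite direction (higher values indicate more mobility) and would need their own normalization.

```python
def classify_residue(
    h_ewcl: float,
    plddt: float,
    h_cut: float = 0.5,       # assumed cutoff, not part of EWCL
    plddt_cut: float = 70.0,  # assumed cutoff on the 0-100 pLDDT scale
) -> str:
    """Label one residue by sequence-encoded bias versus structural confidence."""
    ensemble_favored = h_ewcl >= h_cut
    high_confidence = plddt >= plddt_cut
    if ensemble_favored and high_confidence:
        return "mismatch: ensemble-favored, high-confidence"
    if not ensemble_favored and not high_confidence:
        return "mismatch: collapse-biased, low-confidence"
    if ensemble_favored:
        return "concordant: ensemble-favored, low-confidence"
    return "concordant: collapse-biased, high-confidence"


# Made-up (HEWCL, pLDDT) pairs, for illustration only.
examples = [(0.8, 92.0), (0.2, 45.0), (0.8, 40.0), (0.1, 95.0)]
labels = [classify_residue(h, p) for h, p in examples]
```

Residues in the two mismatch classes correspond to the first two cases listed above and are natural starting points for the mutation and validation candidates.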
EWCL is intended as an analytical and exploratory tool, not a replacement for structural or biophysical characterization.