Generalized Adduct Intervals Problem: A Formal Mathematical Proof

April 23, 2025

A rigorous mathematical framework establishing optimal mass spacing strategies for mass spectrometry-based detection of combinatorial cyclic peptide libraries, providing provable upper bounds for multi-objective evolutionary algorithms in drug discovery.

Generalized Adduct Intervals Problem

Abstract

This work presents a rigorous mathematical framework for the Generalized Adduct Intervals Problem, establishing optimal mass spacing strategies for mass spectrometry-based detection of combinatorial cyclic peptide libraries. The framework provides provable upper bounds on library size that are essential for multi-objective evolutionary algorithms in drug discovery applications. By bridging algorithmic optimization theory with analytical chemistry constraints, this work enables principled design of large-scale peptide libraries while guaranteeing unique MS identification of each library member.

Interactive Exploration

Explore the Adduct Intervals Problem interactively below. Adjust ionization modes, mass ranges, and resolution to see how parameter choices affect library size and interval spacing. This visualizer demonstrates the core optimization challenge addressed by the mathematical framework presented in this work.

Ionization Mode

Ionization Method

Active Adducts

H⁺(1.008 Da)Na⁺(22.990 Da)K⁺(38.964 Da)NH₄⁺(18.034 Da)

Lower Bound (L): 100 Da

Upper Bound (U): 1000 Da

Resolution (T): 0.50 Da

Computed Parameters

Number of peptides (n):0

Spacing (δ):0.000 Da

Critical separation (κ):0

Valid configuration:✓ No overlaps

1. Background and Motivation

1.1 The Cyclic Peptide Library Challenge

Cyclic peptides have emerged as powerful therapeutic candidates due to their high biological activity, selectivity, target affinity, and proteolytic stability. Large combinatorial libraries of cyclic peptides can be synthesized using split-and-pool methods and screened against biological targets. However, a critical bottleneck exists: cyclic peptides produce complex fragment ions in mass spectrometry, making post-screening sequence determination problematic.

The fundamental question this work addresses is: Given mass spectrometry detection constraints, what is the maximum number of distinct cyclic peptides that can be included in a combinatorial library while ensuring each can be uniquely identified?

1.2 Mass Spectrometry Detection: The Physical Foundation

1.2.1 Fundamental Principle

Mass spectrometry detects charged species by measuring their mass-to-charge ratios in electromagnetic fields. Neutral molecules cannot be detected—ionization is mandatory, not optional.

Critical Implication: For a molecule with bare (neutral) mass $m$ :

Probability of observing bare mass: $P(m) = 0$
Observable masses: Only ionized adduct forms $m + a_j$ for $j \in \{1, \ldots, k\}$
Design parameter vs. observable: Bare mass $m$ is synthesized but never measured

1.2.2 Adduct Formation Process

During ionization, molecules acquire charge by forming adducts—complexes with charged species from the ionization environment. A single molecule with bare mass $m$ can form multiple distinct adduct species simultaneously, each appearing at a different mass-to-charge ratio.

General phenomenon:

For adduct set $A = \{a_1, a_2, \ldots, a_k\}$ :

Each peptide produces up to $k$ observable peaks
Peak $j$ appears at $m/z = m + a_j$
Adduct formation is stochastic and uncontrollable
Multiple adducts form simultaneously from the same molecule

Example for clarification:

Consider two peptides with H⁺ adduct ( $a_1 = 1.008$ Da):

Peptide A: bare mass $m_A = 500.0$ Da → observed at $501.008$ Da ✓
Peptide B: bare mass $m_B = 501.0$ Da → observed at $502.008$ Da ✓
Bare masses $m_A, m_B$ : never observed (neutral) ✗

Even though $m_B = 501.0$ numerically equals the observable peak $m_A + a_1 = 501.008$ , no ambiguity arises because $m_B$ itself is invisible to the instrument.

1.2.3 Detection Uncertainty

Each observed peak has intrinsic width $\pm T$ due to:

Instrumental resolution limits
Natural isotope distributions
Thermal/kinetic energy distributions
Digital sampling and signal processing

1.2.4 The Library Identification Challenge

For a combinatorial library with $n$ peptides having bare masses $M = \{m_0, m_1, \ldots, m_{n-1}\}$ :

Observable peaks: up to $kn$ (k adducts × n peptides)
Design variables: the $n$ bare masses $m_i$
Observables: the $kn$ peak intervals $I_j(m_i)$
Constraint: All $kn$ observable intervals must be non-overlapping

Challenge: Maximize $n$ subject to non-overlapping detection intervals for all adduct peaks.

1.2.5 Ionization Method Examples

Different ionization techniques produce characteristic adduct sets:

Method	Common Adducts	Typical k
ESI	H⁺, Na⁺, K⁺, NH₄⁺	3-4
APCI	H⁺, (H₂O)H⁺	1-2
MALDI	H⁺, Na⁺, K⁺, matrix	3-5
Multiply-charged	Various z states	Variable

The framework presented applies to any adduct set $A$ , regardless of ionization method or chemical composition.

1.3 Multi-Objective Optimization Context

In designing combinatorial peptide libraries, multiple competing objectives must be balanced. Example objectives include:

Maximize library diversity (number of unique peptides)
Maximize chemical space coverage (structural diversity)
Maximize synthetic accessibility (cost and feasibility)
Optimize cheminformatic descriptors (e.g., logP, TPSA, rotatable bonds, hydrogen bond donors/acceptors)
Maximize sequence diversity (residue composition, stereochemistry)

The specific objectives and their relative importance depend on the therapeutic target, screening strategy, and development stage. For beyond Rule of Five (BRoF) drugs like cyclic peptides, objectives often emphasize chemical space exploration over traditional small molecule drug-likeness metrics. This mathematical framework provides the theoretical upper bound for library size (objective 1), which serves as a critical constraint in multi-objective evolutionary algorithms (MOEAs) such as NSGA-III, regardless of which additional objectives are included.

1.4 Novel Contributions

This work uniquely:

Prevents adduct overlap through optimal design rather than post-hoc correction
Generalizes beyond specific adduct sets to arbitrary ionization conditions
Proves optimality within uniform-spacing constructions
Enables principled MOEA fitness functions with normalized objectives

2. Problem Statement and Theoretical Foundation

The core theoretical innovation of this work is the derivation of a formal mathematical proof for the Generalized Adduct Intervals Problem, which provides a complete mathematical foundation for mass differentiation optimization with arbitrary adduct sets.

2.1 Given Inputs

Range $R = [L, U] \subset \mathbb{R}$ where $L < U$
Set of adducts $A = \{a_1, a_2, \ldots, a_k\}$ where $a_1 < a_2 < \cdots < a_k$
Interval half-width $T > 0$
Critical Assumption: Adducts are well-separated: $\forall j < j', \; a_{j'} - a_j > 2T$

Physical Interpretation of Inputs:

Range $R = [L, U]$ : The instrument’s detectable mass-to-charge window, determined by instrument design and settings
Adduct set $A = \{a_1, \ldots, a_k\}$ : Mass shifts caused by ionization, ordered $a_1 < a_2 < \cdots < a_k$ $a_{1} < a_{2} < \dots < a_{k}$
- Examples: $\{1.008, 22.990, 38.964\}$ Da for H⁺/Na⁺/K⁺ in ESI
Interval half-width $T$ : Instrumental mass resolution representing peak width
- Example: $T = 0.5$ Da corresponds to ~500 ppm at 1000 Da
Critical Assumption: $\forall j < j', \; a_{j'} - a_j > 2T$ $\forall j < j^{'}, a_{j^{'}} - a_{j} > 2 T$
- Physical justification: Different adduct peaks of the same peptide must not overlap
- Ensures $I_j(m_i) \cap I_{j'}(m_i) = \emptyset$ for same mass, different adducts

2.2 Mathematical Formulation

2.2.1 Design Variables and Observables

Design Variables: Bare (neutral) peptide masses $M = \{m_0, m_1, \ldots, m_n\}$

Controllable through peptide sequence selection
Never observed (neutral species undetectable)
Theoretical constructs for library design

Observable Quantities: Peak intervals $I_j(m_i)$ for all $(i,j)$ pairs

Uncontrollable (adduct formation is stochastic)
Actually measured by the instrument
The only signals in experimental mass spectra

2.2.2 Objective and Constraints

Objective: Construct mass set $M$ to maximize $|M|$ (library size)

Constraint 1 (Interval Definition): Each designed mass $m_i$ produces $k$ observable peaks. Peak $j$ from mass $i$ creates interval:

$I_j(m_i) = (m_i + a_j - T, m_i + a_j + T)$

Physical meaning: Any signal detected in this interval could be peptide $i$ with adduct $j$ .

Constraint 2 (Non-Overlap): All observable intervals must be pairwise disjoint:

$\forall (i,j) \neq (i',j'), \quad I_j(m_i) \cap I_{j'}(m_{i'}) = \emptyset$

Physical meaning: Unique identification—no ambiguity about which peptide-adduct pair produced each observed peak.

Constraint 3 (Range Coverage): All observable peaks must fall within detection range:

$\forall i \in \{0, \ldots, n\}, \; \forall j \in \{1, \ldots, k\}, \quad I_j(m_i) \subseteq R$

Physical meaning: All adduct forms of all peptides are detectable by the instrument.

Note on Bare Mass Overlaps:

The non-overlap constraint applies only to the $kn$ observable intervals, not the $n$ bare masses. Bare masses can overlap with:

Other bare masses (both never observed)
Adduct peaks from other peptides (bare mass never observed)

This is permissible because $P(\text{observing } m_i) = 0$ for all $i$ .

2.3 Validity Conditions

For the construction to accommodate at least one mass, we require: $U - L > a_k - a_1 + 2T$

Proof of Necessity:

If $U - L \leq a_k - a_1 + 2T$ , consider any mass $m$ that could fit both its smallest and largest adduct intervals within $R$ .

The span required for all adduct intervals of mass $m$ is: $\text{Span} = [m + a_1 - T, m + a_k + T]$ $\text{Width} = (m + a_k + T) - (m + a_1 - T) = a_k - a_1 + 2T$

For this to fit in $R = [L, U]$ : $U - L \geq a_k - a_1 + 2T$

Strict inequality ensures at least one mass can be accommodated: $U - L > a_k - a_1 + 2T \quad \square$

2.4 Applicability

This formulation generalizes beyond specific cases (such as three-adduct systems with $[\text{M+H}]^+$ , $[\text{M+Na}]^+$ , $[\text{M+K}]^+$ ) to provide optimal solutions for arbitrary adduct sets, applicable to diverse mass spectrometry ionization conditions and instrumental configurations.

3. Fundamental Theorems

Theorem 3.1 (Optimal Spacing for Same-Adduct Non-Overlap)

Statement: For intervals of the same adduct type from consecutive masses to be non-overlapping, the minimum spacing between consecutive masses is $\delta = 2T$ .

Proof :

Consider intervals $I_j(m_i)$ and $I_j(m_{i+1})$ for some fixed $j \in \{1, 2, \ldots, k\}$ .

$I_j(m_i) = (m_i + a_j - T, m_i + a_j + T)$ $I_j(m_{i+1}) = (m_{i+1} + a_j - T, m_{i+1} + a_j + T)$

For non-overlap, we require: $m_i + a_j + T \leq m_{i+1} + a_j - T$

Simplifying: $m_i + T \leq m_{i+1} - T$ $m_{i+1} - m_i \geq 2T$

Let $\delta = m_{i+1} - m_i$ be the spacing between consecutive masses. Then $\delta \geq 2T$ .

For optimal density (maximum number of masses), we choose the minimum allowable spacing: $\boxed{\delta = 2T}$

Optimality Argument:

The spacing $\delta = 2T$ is not merely necessary but also sufficient:

Necessity: Any $\delta' < 2T$ causes consecutive same-adduct intervals to overlap
Sufficiency: With $\delta = 2T$ , same-adduct intervals exactly touch at boundaries
Uniqueness: Any $\delta' > 2T$ wastes space, reducing maximum library size

Therefore, $\delta = 2T$ is the unique optimal spacing for maximizing density under same-adduct constraints.

■

Theorem 3.2 (Critical Constraint for Cross-Adduct Non-Overlap)

Statement: The critical constraint determining the maximum number of consecutive masses occurs between $I_k(m_i)$ and $I_1(m_{i+\kappa})$ for some $\kappa \geq 1$ .

Proof :

Consider any two intervals $I_j(m_i)$ and $I_{j'}(m_{i'})$ where $(i,j) \neq (i',j')$ and $i < i'$ .

For non-overlap: $m_i + a_j + T < m_{i'} + a_{j'} - T$

This gives us: $m_{i'} - m_i > a_j - a_{j'} + 2T$

To find the most restrictive constraint, we need to maximize the right-hand side. This occurs when:

$a_j$ is maximized: $a_j = a_k$ (largest adduct)
$a_{j'}$ is minimized: $a_{j'} = a_1$ (smallest adduct)
$i' - i$ is minimized subject to satisfying the constraint

Therefore, the critical constraint is: $m_{i+\kappa} - m_i > a_k - a_1 + 2T$

where $\kappa$ is the minimum integer such that this constraint is satisfied.

■

Lemma 3.3 (Determination of Critical Parameter )

Statement: With uniform spacing $\delta = 2T$ , the critical parameter is: $\kappa = \left\lceil\frac{a_k - a_1}{2T} + 1\right\rceil$

Proof :

With uniform spacing $\delta = 2T$ , we have: $m_{i+\kappa} = m_i + \kappa \cdot 2T$

The critical constraint from Theorem 2.2 becomes: $m_i + \kappa \cdot 2T - m_i > a_k - a_1 + 2T$ $\kappa \cdot 2T > a_k - a_1 + 2T$ $\kappa > \frac{a_k - a_1 + 2T}{2T} = \frac{a_k - a_1}{2T} + 1$

Since $\kappa$ must be a positive integer: $\boxed{\kappa = \left\lceil\frac{a_k - a_1}{2T} + 1\right\rceil}$

Physical Interpretation: $\kappa$ represents the minimum separation (in number of masses) required between two masses to ensure their most extreme adduct intervals don’t overlap.

■

Theorem 3.4 (Optimal Offset for Boundary Alignment)

Statement: To maximize range utilization by ensuring the leftmost interval starts at $L$ , the optimal offset is: $\varepsilon = T - a_1$

Proof :

Let $m_i = L + i \cdot \delta + \varepsilon$ for $i = 0, 1, 2, \ldots$

The leftmost interval is $I_1(m_0) = (m_0 + a_1 - T, m_0 + a_1 + T)$ .

For this interval to start exactly at $L$ : $m_0 + a_1 - T = L$ $L + \varepsilon + a_1 - T = L$ $\boxed{\varepsilon = T - a_1}$

■

Theorem 3.5 (Explicit Mass Construction Formula)

Statement: With optimal parameters $\delta = 2T$ and $\varepsilon = T - a_1$ , the mass values are: $M = \{L + T - a_1, L + 3T - a_1, L + 5T - a_1, \ldots, L + (2n+1)T - a_1\}$

Or equivalently: $m_i = L + (2i+1)T - a_1 \quad \text{for } i = 0, 1, 2, \ldots, n$

Proof :

Substituting the optimal parameters into the general form: $m_i = L + i \cdot \delta + \varepsilon$ $m_i = L + i \cdot 2T + (T - a_1)$ $m_i = L + 2iT + T - a_1$ $\boxed{m_i = L + (2i+1)T - a_1}$

■

Theorem 3.6 (Maximum Number of Masses)

Statement: The maximum number of masses that can be accommodated is: $n_{\max} = \left\lfloor\frac{U - L - a_k + a_1 - 2T}{2T}\right\rfloor$

with the total number of masses being $n_{\max} + 1$ (since we index from 0).

Proof :

For all intervals to lie within $R = [L, U]$ , the rightmost interval $I_k(m_n)$ must satisfy: $m_n + a_k + T \leq U$

Substituting $m_n = L + (2n+1)T - a_1$ : $L + (2n+1)T - a_1 + a_k + T \leq U$ $L + 2nT + T - a_1 + a_k + T \leq U$ $L + 2nT + a_k - a_1 + 2T \leq U$ $2nT \leq U - L - a_k + a_1 - 2T$ $n \leq \frac{U - L - a_k + a_1 - 2T}{2T}$

Since $n$ must be a non-negative integer: $\boxed{n_{\max} = \left\lfloor\frac{U - L - a_k + a_1 - 2T}{2T}\right\rfloor}$

Note: If $n_{\max} < 0$ , the range is too small to accommodate any mass with all its adduct intervals.

4. Correctness and Optimality Guarantees

■

Theorem 4.1 (Greedy Interval Packing Algorithm)

Definition 4.1

(Non-overlap Constraint): Intervals $I_j(m_i)$ and $I_{j'}(m_{i'})$ are non-overlapping if and only if: $m_i + a_j + T < m_{i'} + a_{j'} - T \quad \text{or} \quad m_{i'} + a_{j'} + T < m_i + a_j - T$

Equivalently: $m_i < m_{i'} + a_{j'} - a_j - 2T \quad \text{or} \quad m_i > m_{i'} + a_{j'} - a_j + 2T$

Definition 4.2

(Forbidden Zone): Given previously placed mass $m_\ell$ and adduct indices $j, j' \in \{1, \ldots, k\}$ , the forbidden zone is: $\mathcal{F}_{\ell,j,j'} = [m_\ell + a_{j'} - a_j - 2T, m_\ell + a_{j'} - a_j + 2T]$

A candidate mass $m$ violates non-overlap with $I_{j'}(m_\ell)$ if $I_j(m) \cap I_{j'}(m_\ell) \neq \emptyset$ , which occurs precisely when $m \in \mathcal{F}_{\ell,j,j'}$ .

Definition 4.3

(Cumulative Forbidden Set): After placing masses $\mathcal{M}_i = \{m_0, m_1, \ldots, m_{i-1}\}$ , the cumulative forbidden set is: $\mathcal{F}_i = \bigcup_{\ell=0}^{i-1} \bigcup_{j=1}^k \bigcup_{j'=1}^k \mathcal{F}_{\ell,j,j'}$

Lemma 4.4

(Valid Position Characterization): A mass $m$ is valid for the $i$ -th position if and only if:

$m \in [L, U]$ (within mass range)
$m \geq m_{i-1} + 2T$ (minimum spacing for same-mass intervals)
$m \notin \mathcal{F}_i$ (avoids all forbidden zones)

Proof of Lemma 4.4

Conditions (1) and (2) are immediate requirements. Condition (3) follows from Definition 4.2: $m \notin \mathcal{F}_i$ if and only if $m$ satisfies the non-overlap constraint with all previously placed intervals.

Definition 4.5

(Greedy Mass Sequence): The greedy sequence is defined recursively:

$m_0 = L + T - a_1$

$m_i = \inf\{m \in [L, U] : m \geq m_{i-1} + 2T \text{ and } m \notin \mathcal{F}_i\}$

with termination when $m_i + a_k + T > U$ or no valid $m_i$ exists.

Lemma 4.6

(Well-definedness): The infimum in Definition 4.5 is attained and yields $m_i \in [L, U]$ for all $i$ before termination.

Proof of Lemma 4.6

Since $\mathcal{F}_i$ is a finite union of closed intervals, the set $S_i = [m_{i-1} + 2T, U] \setminus \mathcal{F}_i$ is either empty or a finite union of intervals.

If $S_i$ is empty, the algorithm terminates.

If $S_i$ is non-empty, the infimum equals the left endpoint of the leftmost available gap. Since forbidden zones are closed intervals, this infimum is attained and belongs to $S_i$ .

Theorem 4.7

(Greedy Optimality): The greedy algorithm produces a sequence of maximum cardinality among all valid sequences satisfying the non-overlap constraint.

Proof :

Let $\mathcal{G} = (g_0, g_1, \ldots, g_{n_g-1})$ be the greedy sequence and $\mathcal{O} = (o_0, o_1, \ldots, o_{n_o-1})$ be any other valid sequence. We prove $n_g \geq n_o$ by showing $g_i \leq o_i$ for all $i < \min(n_g, n_o)$ .

Base case: By construction, $g_0 = L + T - a_1$ is the minimum valid starting position. Thus $g_0 \leq o_0$ .

Inductive step: Assume $g_j \leq o_j$ for all $j < i$ .

For any $\ell < i$ , since $g_\ell \leq o_\ell$ , the forbidden zones generated by the greedy sequence are left-shifted or equivalent compared to those generated by $\mathcal{O}$ . Therefore, valid positions for $\mathcal{O}$ at step $i$ include or precede valid positions for $\mathcal{G}$ .

Since $g_i$ is the earliest valid position for $\mathcal{G}$ and $o_i$ is a valid position for $\mathcal{O}$ , we have $g_i \leq o_i$ .

By induction, $g_i \leq o_i$ for all $i < \min(n_g, n_o)$ .

If $n_g < n_o$ , then after placing $g_{n_g-1}$ , no valid position exists for greedy. But since $g_i \leq o_i$ for all $i < n_g$ and $\mathcal{O}$ successfully places $o_{n_g}$ , this contradicts the greedy termination condition.

Therefore, $n_g \geq n_o$ , proving optimality. ✓

■

Remark

: This greedy algorithm is provably correct by construction—it explicitly verifies all non-overlap constraints through forbidden zone checking. The previous approach using fixed $\kappa$ -based stepping fails to prevent overlaps when $i' - i < \kappa$ for certain adduct configurations.

Theorem 4.2 (Asymptotic Density)

Statement: For the greedy sequence with $U - L \gg a_k - a_1$ , the achieved spacing approaches $\delta = 2T$ and the number of masses is asymptotically: $n \approx \frac{U - L - (a_k - a_1 + 2T)}{2T} + 1$

Proof :

When $U - L$ is large compared to $a_k - a_1$ , boundary effects at $L$ and $U$ become negligible.

In the interior of $[L, U]$ , if no forbidden zones beyond the minimum $2T$ spacing interfere, the greedy algorithm places masses at uniform spacing $m_i - m_{i-1} = 2T$ , which is the minimum required for same-mass intervals to not overlap (Theorem 3.1).

For generic adduct sets where cross-adduct forbidden zones do not create long-range obstructions, the greedy placement maintains this uniform spacing, yielding the asymptotic density formula. The floor function in practice accounts for discrete positions and boundary constraints. ✓

Note on Correctness vs. Density: The key advantage of the greedy algorithm over previous approaches is correctness—it guarantees zero overlaps by construction through explicit forbidden zone checking. The density achieved depends on the specific adduct configuration and whether forbidden zones create gaps, but the greedy algorithm provably achieves the maximum possible density for any given adduct set (Theorem 4.7).

5. Implementation Summary

■

5.1 Greedy Algorithm Implementation

Given inputs $R = [L, U]$ , $A = \{a_1, \ldots, a_k\}$ , and $T$ :

Verify Validity: Check that $U - L > a_k - a_1 + 2T$ and $\forall j < j', \; a_{j'} - a_j > 2T$
Initialize: Set $m_0 = L + T - a_1$ and $i = 0$
Greedy Placement Loop:
- Compute Forbidden Zones: For all $\ell < i$ , $j, j' \in \{1, \ldots, k\}$ : $\mathcal{F}_{\ell,j,j'} = [m_\ell + a_{j'} - a_j - 2T, m_\ell + a_{j'} - a_j + 2T]$
- Merge Overlapping Zones: Construct $\mathcal{F}_i = \bigcup_{\ell,j,j'} \mathcal{F}_{\ell,j,j'}$ by sorting and merging overlapping intervals
- Find Next Valid Position: Search for: $m_i = \min\{m \geq m_{i-1} + 2T : m \notin \mathcal{F}_i \text{ and } m + a_k + T \leq U\}$
- Termination: If no valid $m_i$ exists or $m_i + a_k + T > U$ , stop. Otherwise, increment $i$ and repeat.
Output: Return sequence $(m_0, m_1, \ldots, m_{n-1})$ with $n$ total masses

5.2 Computational Complexity

Per-iteration Complexity: $O(i \cdot k^2)$ to compute forbidden zones for the $i$ -th mass
Merging Complexity: $O(i \cdot k^2 \cdot \log(i \cdot k^2))$ to sort and merge zones
Total Time Complexity: $O(n^2 \cdot k^2 \cdot \log(n \cdot k^2))$ where $n$ is the final number of masses
Space Complexity: $O(n \cdot k^2)$ for storing forbidden zones
Optimizations: Incremental zone updates and spatial indexing can reduce complexity in practice

Note: While asymptotically slower than the fixed-stepping approach, the greedy algorithm guarantees correctness. For typical problem sizes in combinatorial library design ( $n \sim 10^2$ - $10^3$ , $k \sim 3$ - $5$ ), the computational cost remains negligible.

5.3 Application Examples Across Ionization Methods

The algorithm applies uniformly to any adduct set. We demonstrate with multiple realistic scenarios.

Interactive Exploration: The scenarios below can be explored using the interactive visualizer in the Interactive Exploration section above. Toggle between ionization modes and adjust parameters to see real-time updates.

The following sections provide detailed worked examples corresponding to common experimental scenarios.

5.3.1 Standard ESI-MS: The Three-Adduct Case

Experimental Context: Electrospray ionization time-of-flight mass spectrometry for cyclic peptide libraries

Given Parameters:

Adduct set: $A = \{1.008, 22.990, 38.964\}$ Da (H⁺, Na⁺, K⁺)
Interval half-width: $T = 0.5$ Da (typical TOF resolution)
Mass range: $R = [100, 1000]$ Da (peptide detection window)
Derived: $a_1 = 1.008$ , $a_k = 38.964$ , $L = 100$ , $U = 1000$

Step 1: Verify Validity Conditions

Range check: $U - L > a_k - a_1 + 2T$ $1000 - 100 > 38.964 - 1.008 + 2(0.5)$ $900 > 37.956 + 1.0 = 38.956 \quad \checkmark$

Adduct separation check: $\min_{j<j'}(a_{j'} - a_j) = \min\{21.982, 15.974\} = 15.974 > 2T = 1.0 \quad \checkmark$

Step 2: Compute Algorithm Parameters

Spacing: $\delta = 2T = 2(0.5) = 1.0 \text{ Da}$

Offset: $\varepsilon = T - a_1 = 0.5 - 1.008 = -0.508 \text{ Da}$

Maximum count: $n_{\max} = \left\lfloor\frac{U - L - a_k + a_1 - 2T}{2T}\right\rfloor$ $= \left\lfloor\frac{1000 - 100 - 38.964 + 1.008 - 1.0}{1.0}\right\rfloor$ $= \left\lfloor\frac{861.044}{1.0}\right\rfloor = 861$

Total masses: $n_{\max} + 1 = 862$ (indexed from $i=0$ to $i=861$ )

Critical separation: $\kappa = \left\lceil\frac{a_k - a_1}{2T} + 1\right\rceil = \left\lceil\frac{37.956}{1.0} + 1\right\rceil = \lceil 38.956 \rceil = 39$

Step 3: Generate Mass Values

Formula: $m_i = L + (2i+1)T - a_1 = 100 + (2i+1)(0.5) - 1.008$

Index $i$	Calculation	Mass $m_i$ (Da)
0	$100 + 1(0.5) - 1.008$	99.492
1	$100 + 3(0.5) - 1.008$	100.492
2	$100 + 5(0.5) - 1.008$	101.492
3	$100 + 7(0.5) - 1.008$	102.492
…	…	…
38	$100 + 77(0.5) - 1.008$	137.492
39	$100 + 79(0.5) - 1.008$	138.492
…	…	…
860	$100 + 1721(0.5) - 1.008$	959.492
861	$100 + 1723(0.5) - 1.008$	960.492

Step 4: Demonstrate Observable Intervals

For $m_0 = 99.492$ Da, the three adduct peaks appear at:

$I_1(m_0) = (99.492 + 1.008 - 0.5, 99.492 + 1.008 + 0.5) = (100.000, 101.000)$ $I_2(m_0) = (99.492 + 22.990 - 0.5, 99.492 + 22.990 + 0.5) = (121.982, 122.982)$ $I_3(m_0) = (99.492 + 38.964 - 0.5, 99.492 + 38.964 + 0.5) = (137.956, 138.956)$

For $m_1 = 100.492$ Da:

$I_1(m_1) = (101.000, 102.000)$ $I_2(m_1) = (122.982, 123.982)$ $I_3(m_1) = (138.956, 139.956)$

Step 5: Verify Non-Overlap (Sample Cases)

Case A: Same adduct, consecutive masses

Compare $I_1(m_0)$ and $I_1(m_1)$ :

Upper bound of $I_1(m_0)$ : 101.000
Lower bound of $I_1(m_1)$ : 101.000
Separation: 0 (boundary touch, non-overlapping) ✓

Case B: Different adducts, same mass

Compare $I_1(m_0)$ and $I_2(m_0)$ :

Gap: $121.982 - 101.000 = 20.982 > 0$ ✓

Case C: Critical separation—extreme adducts

Compare $I_3(m_0)$ and $I_1(m_{39})$ (at critical $\kappa = 39$ steps):

$I_3(m_0)$ upper: $99.492 + 38.964 + 0.5 = 138.956$
$I_1(m_{39})$ lower: $138.492 + 1.008 - 0.5 = 139.000$
Gap: $139.000 - 138.956 = 0.044$ Da ✓

This confirms masses separated by $\kappa$ steps have their extreme adduct intervals just barely non-overlapping.

Physical Interpretation:

862 distinct peptides uniquely identifiable
2,586 observable peaks (862 × 3 adducts)
1.0 Da spacing optimal for 0.5 Da resolution
39-step separation required for K⁺/H⁺ distinction
899.956 Da coverage of 900 Da available range (99.995% utilization)

5.3.2 Reduced Adduct Set: APCI with Two Adducts

Experimental Context: Atmospheric pressure chemical ionization (fewer adduct types)

Given:

$A = \{1.008, 18.034\}$ Da (H⁺, NH₄⁺ only)
$T = 0.3$ Da (higher resolution)
$R = [200, 800]$ Da

Calculation: $\delta = 2(0.3) = 0.6 \text{ Da}$ $\varepsilon = 0.3 - 1.008 = -0.708 \text{ Da}$ $n_{\max} = \left\lfloor\frac{600 - 17.026 - 0.6}{0.6}\right\rfloor = 970$ $\kappa = \left\lceil\frac{17.026}{0.6} + 1\right\rceil = 30$

Result: 971 peptides (13% increase over ESI case)

Analysis: Fewer adducts + better resolution → larger library capacity

5.3.3 Ultra-High Resolution: Orbitrap with Multiple Adducts

Experimental Context: Orbitrap mass spectrometry with extended adduct set

Given:

$A = \{1.008, 18.034, 22.990, 38.964, 54.938\}$ Da (H⁺, NH₄⁺, Na⁺, K⁺, 2Na⁺-H⁺)
$T = 0.01$ Da (R = 100,000 at m/z 1000)
$R = [500, 1500]$ Da

Calculation: $\delta = 2(0.01) = 0.02 \text{ Da}$ $n_{\max} = \left\lfloor\frac{1000 - 53.930 - 0.02}{0.02}\right\rfloor = 47,303$

Result: 47,304 peptides (55× improvement over standard ESI!)

Analysis: Resolution dominates capacity—ultra-high resolution enables massive libraries despite more adducts

5.3.4 Comparative Analysis

Method	k	T (Da)	Range (Da)	n_max	Peaks	Limiting Factor
ESI-TOF	3	0.5	900	862	2,586	Resolution
APCI	2	0.3	600	971	1,942	Range
Orbitrap	5	0.01	1000	47,304	236,520	Range

Key Insights:

Resolution impact: Halving $T$ approximately doubles capacity (inverse linear relationship)
Adduct count: More adducts reduce capacity, but effect is sublinear
Range size: Direct linear impact on capacity
Optimization trade-off: High resolution compensates for many adducts

Practical Implications:

Standard ESI-TOF: ~1,000 peptide libraries feasible
High-resolution instruments: ~50,000 peptide libraries achievable
Method selection: Balance resolution, range, and adduct complexity
Cost-benefit: Instrument resolution is the highest-leverage parameter

6. Connections to Existing Theory

6.1 Relationship to Classical Interval Scheduling

This problem extends the classical interval scheduling problem from computer science in several novel ways:

Classical Interval Scheduling	Generalized Adduct Intervals Problem
Given set of intervals	Generates intervals from masses
One interval per task	$k$ intervals per mass (one per adduct)
Select subset to maximize count	Construct masses to maximize count
Arbitrary interval positions	Uniform spacing required for synthesis
Greedy algorithm optimal	Constructive algorithm with proven bounds

While classical interval scheduling uses a greedy algorithm selecting intervals by earliest finishing time, our problem requires a constructive approach that generates optimally-spaced masses.

6.2 Gap in Mass Spectrometry Literature

Existing MS tools focus on post-acquisition handling of adduct overlaps:

SWARM: Corrects ESI mass spectra for signal overlap after data collection
AdductHunter: Identifies protein-metal complex adducts using constraint optimization
MzAdan: Annotates adducts in existing spectra

This approach is fundamentally different: it prevents overlap by design through optimal mass spacing, enabling larger libraries with guaranteed MS resolution.

6.3 Bridge to Combinatorial Library Design

Combinatorial cyclic peptide libraries face unique challenges:

Complex fragmentation patterns in MS
Need for post-screening sequence determination
Trade-off between library size and analytical feasibility

This framework provides the missing theoretical foundation for determining maximum library size given MS constraints.

7. Applications to Multi-Objective Optimization

7.1 Integration with Multi-Objective Evolutionary Algorithms

The theoretical upper bound serves multiple critical functions in MOEAs:

7.1.1 Fitness Function Normalization

For a multi-objective problem with objectives:

$f_1$ : Number of peptides in library
$f_2$ : Chemical diversity metric
$f_3$ : Synthesis feasibility score

The normalized fitness for $f_1$ becomes: $\hat{f}_1 = \frac{n_{actual}}{n_{\max}} \in [0,1]$

where $n_{\max}$ is computed using our formula.

7.1.2 Constraint Handling

Solutions proposing more than $n_{\max}$ peptides are automatically infeasible: $g(x) = n_{proposed}(x) - n_{\max} \leq 0$

This hard constraint prevents the algorithm from exploring impossible regions of the search space.

7.1.3 Reference Point Generation (NSGA-III)

NSGA-III uses reference points to maintain diversity. The theoretical maximum helps define the aspiration level for the “number of peptides” objective, ensuring reference points are realistically achievable.

7.2 Example: NSGA-III Implementation

def calculate_upper_bound(mass_range, adducts, resolution)
    """
    Calculate maximum number of peptides using our formula

    Args:
        mass_range: tuple (L, U) defining mass range
        adducts: list of adduct masses [a1, a2, ..., ak]
        resolution: half-width T of detection intervals

    Returns:
        n_max: maximum number of distinguishable peptides
    """
    L, U = mass_range
    a_min, a_max = min(adducts), max(adducts)
    n_max = floor((U - L - a_max + a_min - 2*resolution) / (2*resolution))
    return max(0, n_max)

def fitness_function(solution, upper_bound)
    """
    Multi-objective fitness evaluation

    Returns:
        objectives: [normalized_count, diversity, feasibility]
        constraints: [count_constraint]
    """
    n_peptides = len(solution.peptides)

    # Objective 1: Normalized peptide count
    f1 = n_peptides / upper_bound

    # Objective 2: Chemical diversity (example metric)
    f2 = calculate_diversity(solution.peptides)

    # Objective 3: Synthesis feasibility
    f3 = evaluate_synthesis_feasibility(solution)

    # Constraint: Cannot exceed theoretical maximum
    g = n_peptides - upper_bound

    return [f1, f2, f3], [g]

7.3 Dynamic Bound Adjustment

As the MOEA explores different conditions:

Different adduct sets (varying ionization conditions)
Different mass ranges (instrument capabilities)
Different resolution requirements (instrument settings)

The upper bound can be dynamically recalculated to provide accurate constraints for each scenario.

7.4 Performance Metrics

The theoretical bound enables rigorous performance assessment:

$\text{Efficiency} = \frac{\text{Achieved Library Size}}{\text{Theoretical Maximum}} \times 100\%$

This metric allows fair comparison across different:

Algorithms (NSGA-II vs. NSGA-III vs. MOEA/D)
Problem instances (different mass ranges or adduct sets)
Design strategies (uniform vs. adaptive spacing)

8. Conclusions

This mathematical framework provides:

Rigorous Foundation: Complete proofs for all constraints and optimality claims within uniform-spacing constructions
General Applicability: Works for arbitrary adduct sets beyond standard configurations
Practical Implementation: Direct translation to efficient algorithms with O(n) generation complexity
MOEA Integration: Essential upper bounds for fitness normalization and constraint handling
Extensibility: Framework can be adapted for non-uniform spacing or additional constraints

Significance for Drug Discovery

This work bridges a critical gap between:

Theoretical computer science (interval scheduling algorithms)
Analytical chemistry (mass spectrometry constraints)
Drug discovery (combinatorial library design)
Optimization theory (multi-objective evolutionary algorithms)

By providing provable upper bounds on library size, this framework enables:

Realistic goal-setting for library synthesis projects
Efficient resource allocation for high-throughput screening
Principled comparison of different library design strategies
Guaranteed MS-resolvability of all library members

Future Directions

Extension to non-uniform spacing: Investigate whether relaxing uniform spacing can increase $n_{\max}$
Multiple charge states: Extend framework to handle $z > 1$ charge states
Tandem MS constraints: Incorporate MS/MS fragmentation patterns
Machine learning integration: Use bounds to guide neural architecture search for library design
Experimental validation: Synthesize optimally-spaced libraries and verify MS resolution

The theoretical foundation presented here directly informs optimization algorithms for mass spectrometry experimental design, enabling maximum differentiation of molecular species while guaranteeing no spectral overlaps. This represents a significant advance in the rational design of large-scale cyclic peptide libraries for drug discovery applications.

References

[1] K. Deb and H. Jain, “An Evolutionary Many-Objective Optimization Algorithm Using Reference-Point-Based Nondominated Sorting Approach, Part I: Solving Problems With Box Constraints,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 577-601, 2014.

[2] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182-197, 2002.

[3] I. Das and J. E. Dennis, “Normal-boundary intersection: A new method for generating the Pareto surface in nonlinear multicriteria optimization problems,” SIAM Journal on Optimization, vol. 8, no. 3, pp. 631-657, 1998.

[4] J. Kleinberg and E. Tardos, Algorithm Design. Boston, MA: Addison-Wesley, 2005.

[5] “Interval Scheduling,” Algorithm Notes. [Online]. Available: https://stumash.github.io/Algorithm_Notes/greedy/intervals/intervals.html

[6] S. H. Joo et al., “High-throughput sequence determination of cyclic peptide library members by partial Edman degradation/mass spectrometry,” Journal of the American Chemical Society, vol. 128, no. 39, pp. 13000-13009, 2006.

[7] J. E. Redman et al., “Automated mass spectrometric sequence determination of cyclic peptide library members,” Journal of Combinatorial Chemistry, vol. 5, no. 1, pp. 33-40, 2003.

[8] P. I. Kitov et al., “Sliding Window Adduct Removal Method (SWARM) for enhanced electrospray ionization mass spectrometry binding data,” Journal of The American Society for Mass Spectrometry, vol. 30, no. 8, pp. 1446-1454, 2019.

[9] A. F. M. Gavriilidou et al., “AdductHunter: identifying protein-metal complex adducts in mass spectra,” Journal of Cheminformatics, vol. 16, no. 1, p. 15, 2024.

[10] C. C. Coello, G. B. Lamont, and D. A. Van Veldhuizen, Evolutionary Algorithms for Solving Multi-Objective Problems. New York: Springer, 2007.

[11] K. Miettinen, Nonlinear Multiobjective Optimization. Boston, MA: Kluwer Academic Publishers, 1999.