KL-DRO

class dro.linear_model.kl_dro.KLDRO(input_dim, model_type='svm', fit_intercept=True, solver='MOSEK', kernel='linear', eps=0.0)

Bases: BaseLinearDRO

Kullback-Leibler divergence-based Distributionally Robust Optimization (KL-DRO) model.

This model minimizes a KL-robust loss function for both regression and classification.

Reference: <https://optimization-online.org/wp-content/uploads/2012/11/3677.pdf>

Initialize KL-divergence Distributionally Robust Optimization model.

Inherits from BaseLinearDRO and configures KL ambiguity set parameters. The ambiguity set is defined by:

\[\mathcal{Q} = \{ Q \ll P \, | \, D_{KL}(Q\|P) \leq \epsilon \}\]

where \(D_{KL}\) is Kullback–Leibler divergence.
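
In the data-driven setting, \(P\) is the empirical distribution over the \(n\) training samples (uniform weights \(1/n\)) and \(Q\) ranges over reweightings of those samples. A minimal NumPy sketch (illustrative only, not part of the library API) of checking whether a candidate reweighting lies in \(\mathcal{Q}\):

>>> import numpy as np
>>> p = np.full(5, 0.2)                        # empirical reference weights, 1/n
>>> q = np.array([0.4, 0.3, 0.1, 0.1, 0.1])    # candidate reweighting Q
>>> float(np.sum(q * np.log(q / p)))           # D_KL(Q||P) ≈ 0.19; Q is in the ambiguity set iff this <= eps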

Parameters:
  • input_dim (int) – Dimension of input features. Must match training data features.

  • model_type (str) –

    Base model architecture. Supported:

    • 'svm': Hinge loss (classification)

    • 'logistic': Logistic loss (classification)

    • 'ols': Least squares (regression)

    • 'lad': Least absolute deviation (regression)

  • fit_intercept (bool) – If True, adds an intercept term \(b\) to the linear model: \(\theta^T X + b\). Disable when the data is pre-centered.

  • solver (str) –

    Convex optimization solver, supported values:

    • 'MOSEK' (recommended): Requires academic/commercial license

  • kernel (str) – Kernel type used in the optimization model. Default: 'linear'.

  • eps (float) –

    KL divergence bound (ε ≥ 0). Special cases:

    • ε = 0: Reduces to standard empirical risk minimization (no distributional robustness)

    • ε → ∞: Approaches the worst-case distribution (maximally conservative)

    Typical practical range: 0.01 ≤ ε ≤ 5.0

Raises:

ValueError

  • If input_dim ≤ 0

  • If eps < 0

  • If unsupported solver specified

Attribute Initialization:

  • self.dual_variable: Stores optimal dual variable λ* after calling fit()

  • self._p: Internal probability vector of shape (n_samples,)

  • self._solver_opts: Solver-specific options parsed from global config

Example:
>>> model = KLDRO(input_dim=5, model_type='logistic', eps=0.1)
>>> model.input_dim  # 5
>>> model.eps  # 0.1
>>> model.dual_variable  # None (until fit is called)
update(config)

Update KL-DRO model configuration parameters dynamically.

Primarily handles updates to the robustness parameter (eps) while keeping the optimization problem structure unchanged. Existing dual variables are preserved until the next fit() call.

Parameters:

config (Dict[str, Any]) –

Configuration dictionary containing parameters to update. Recognized keys:

  • eps (float): New KL divergence bound (ε ≥ 0).

Other keys are silently ignored.

Raises:
  • KLDROError

    • If eps value is invalid (not float/int or negative)

    • If provided eps > 100.0 (empirical stability threshold)

  • TypeError – If config is not a dictionary

Return type:

None

Example:
>>> model = KLDRO(input_dim=5, eps=0.5)
>>> model.update({"eps": 0.8})
>>> model.eps  # 0.8
>>> model.update({"invalid_key": 1.0})  # No-op
fit(X, y)

Solve KL-constrained distributionally robust optimization problem.

Constructs and solves the convex optimization problem:

\[\min_{\theta,b} \quad \sup_{Q \in \mathcal{Q}} \mathbb{E}_Q\big[\ell(\theta,b;X,y)\big], \qquad \mathcal{Q} = \{\, Q \ll P : D_{KL}(Q\|P) \leq \epsilon \,\}\]

where \(\mathcal{Q}\) is the ambiguity set defined by KL divergence constraint.
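
By the duality result in the reference above, the inner supremum admits the closed form

\[\sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[\ell(\theta,b;X,y)] = \inf_{\lambda > 0} \; \lambda \epsilon + \lambda \log \mathbb{E}_P\big[\exp(\ell(\theta,b;X,y)/\lambda)\big]\]

so the fit reduces to a single convex minimization over \((\theta, b, \lambda)\). The \(\lambda \log \mathbb{E}_P[\exp(\cdot/\lambda)]\) term can be expressed with exponential-cone constraints, which is presumably why an exponential-cone-capable solver such as MOSEK is the recommended choice; the returned dual variable corresponds to the optimal \(\lambda^*\).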

Parameters:
  • X (numpy.ndarray) – Training feature matrix of shape (n_samples, n_features). Must satisfy n_features == self.input_dim.

  • y (numpy.ndarray) –

    Target values of shape (n_samples,). Format requirements:

    • Classification: Binary labels in {-1, +1}

    • Regression: Continuous real values

Returns:

Solution dictionary containing:

  • theta: Weight vector of shape (n_features,)

  • b: Intercept term (present if fit_intercept=True)

  • dual: Optimal dual variable for KL constraint (λ*)

Return type:

Dict[str, Any]

Raises:
  • KLDROError

    • If problem is infeasible with current parameters

    • If solver fails to converge

  • ValueError

    • If X.shape[1] != self.input_dim

    • If X.shape[0] != y.shape[0]

    • If classification labels not in {-1, +1}

Example:
>>> import numpy as np
>>> model = KLDRO(input_dim=3, eps=0.1)
>>> X = np.random.randn(100, 3)
>>> y = np.sign(np.random.randn(100))  # Binary classification
>>> solution = model.fit(X, y)
>>> print(solution["theta"].shape)  # (3,)
>>> print(f"Dual variable: {solution['dual']:.4f}")
worst_distribution(X, y)

Compute the worst-case distribution under KL divergence constraint.

The worst-case distribution weights are computed via exponential tilting:

\[w_i = \frac{\exp(\ell(\theta^*;x_i,y_i)/\lambda^*)}{\sum_j \exp(\ell(\theta^*;x_j,y_j)/\lambda^*)}\]

where \(\theta^*\) is the optimal model parameter and \(\lambda^*\) is the optimal dual variable.
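
Numerically, these weights are a softmax of the per-sample losses scaled by \(1/\lambda^*\); a stable way to evaluate them (illustrative sketch only; losses and lam below are stand-ins for quantities produced by fit()):

>>> import numpy as np
>>> losses = np.array([0.2, 1.5, 0.7])        # per-sample losses at theta* (stand-in values)
>>> lam = 0.5                                 # optimal dual variable lambda* (stand-in value)
>>> z = losses / lam - np.max(losses / lam)   # subtract the max exponent; the shift cancels after normalization
>>> w = np.exp(z) / np.exp(z).sum()           # worst-case weights: nonnegative, sum to 1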

Parameters:
  • X (numpy.ndarray) – Feature matrix of shape (n_samples, n_features). Must match self.input_dim and training data dimension.

  • y (numpy.ndarray) –

    Target vector of shape (n_samples,). Format constraints:

    • Classification: Labels in {-1, +1}

    • Regression: Continuous values

Returns:

Worst-case distribution specification containing:

  • sample_pts: Original samples [X, y] (reference to inputs)

  • weight: Probability vector of shape (n_samples,)

  • entropy: KL divergence \(D_{KL}(Q^*\|P)\)

Return type:

Dict[str, Any]

Raises:
  • KLDROError

    • If inner optimization via fit() fails

    • If \(\lambda^* \leq 0\) (invalid dual variable)

    • If weight normalization fails (sum → 0)

  • ValueError

    • If input dimensions mismatch

    • If classification labels violate binary constraints

Example:
>>> model = KLDRO(input_dim=3)
>>> model.fit(X_train, y_train)
>>> dist = model.worst_distribution(X_test, y_test)
>>> dist["sample_pts"]
>>> dist["weight"]