KL-DRO

class dro.linear_model.kl_dro.KLDRO(input_dim, model_type='svm', fit_intercept=True, solver='MOSEK', kernel='linear', eps=0.0)

Bases: BaseLinearDRO

Kullback-Leibler divergence-based Distributionally Robust Optimization (KL-DRO) model.

This model minimizes a KL-robust loss function for both regression and classification.

Reference: <https://optimization-online.org/wp-content/uploads/2012/11/3677.pdf>

Initialize KL-divergence Distributionally Robust Optimization model.

Inherits from BaseLinearDRO and configures KL ambiguity set parameters. The ambiguity set is defined by:

\[\mathcal{Q} = \{ Q \ll P \, | \, D_{KL}(Q\|P) \leq \epsilon \}\]

where \(D_{KL}\) is Kullback–Leibler divergence.
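
In the data-driven setting, \(P\) is the empirical distribution over the \(n\) training samples (uniform weights \(1/n\)) and \(Q\) ranges over reweightings of those samples. A minimal NumPy sketch (illustrative only, not part of the library API) of checking whether a candidate reweighting lies in \(\mathcal{Q}\):

>>> import numpy as np
>>> p = np.full(5, 0.2)                        # empirical reference weights, 1/n
>>> q = np.array([0.4, 0.3, 0.1, 0.1, 0.1])    # candidate reweighting Q
>>> float(np.sum(q * np.log(q / p)))           # D_KL(Q||P) ≈ 0.19; Q is in the ambiguity set iff this <= eps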

Parameters:
  • input_dim (int) – Dimension of input features. Must match training data features.

  • model_type (str) –

    Base model architecture. Supported:

    • 'svm': Hinge loss (classification)

    • 'logistic': Logistic loss (classification)

    • 'ols': Least squares (regression)

    • 'lad': Least absolute deviation (regression)

  • fit_intercept (bool) – If True, adds an intercept term \(b\) to the linear model: \(\theta^T X + b\). Disable when the data is pre-centered.

  • solver (str) –

    Convex optimization solver, supported values:

    • 'MOSEK' (recommended): Requires academic/commercial license

  • kernel (str) – Kernel type used in the optimization model. Default: 'linear'.

  • eps (float) –

    KL divergence bound (ε ≥ 0). Special cases:

    • ε = 0: Reduces to standard empirical risk minimization (no distributional robustness)

    • ε → ∞: Approaches the worst-case distribution (maximally conservative)

    Typical practical range: 0.01 ≤ ε ≤ 5.0

Raises:

ValueError

  • If input_dim ≤ 0

  • If eps < 0

  • If unsupported solver specified

Attribute Initialization:

  • self.dual_variable: Stores optimal dual variable λ* after calling fit()

  • self._p: Internal probability vector of shape (n_samples,)

  • self._solver_opts: Solver-specific options parsed from global config

Example:
>>> model = KLDRO(input_dim=5, model_type='logistic', eps=0.1)
>>> model.input_dim  # 5
>>> model.eps  # 0.1
>>> model.dual_variable  # None (until fit is called)
update(config)

Update KL-DRO model configuration parameters dynamically.

Primarily handles updates to the robustness parameter (eps) while keeping the optimization problem structure unchanged. Existing dual variables are preserved until the next fit() call.

Parameters:

config (Dict[str, Any]) –

Configuration dictionary containing parameters to update. Recognized keys:

  • eps (float): New KL divergence bound (ε ≥ 0).

Other keys are silently ignored.

Raises:
  • KLDROError

    • If eps value is invalid (not float/int or negative)

    • If provided eps > 100.0 (empirical stability threshold)

  • TypeError – If config is not a dictionary

Return type:

None

Example:
>>> model = KLDRO(input_dim=5, eps=0.5)
>>> model.update({"eps": 0.8})
>>> model.eps  # 0.8
>>> model.update({"invalid_key": 1.0})  # No-op
fit(X, y)

Solve KL-constrained distributionally robust optimization problem.

Constructs and solves the convex optimization problem:

\[\min_{\theta,b} \quad \sup_{Q \in \mathcal{Q}} \mathbb{E}_Q\big[\ell(\theta,b;X,y)\big], \qquad \mathcal{Q} = \{\, Q \ll P : D_{KL}(Q\|P) \leq \epsilon \,\}\]

where \(\mathcal{Q}\) is the ambiguity set defined by KL divergence constraint.
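
By the duality result in the reference above, the inner supremum admits the closed form

\[\sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[\ell(\theta,b;X,y)] = \inf_{\lambda > 0} \; \lambda \epsilon + \lambda \log \mathbb{E}_P\big[\exp(\ell(\theta,b;X,y)/\lambda)\big]\]

so the fit reduces to a single convex minimization over \((\theta, b, \lambda)\). The \(\lambda \log \mathbb{E}_P[\exp(\cdot/\lambda)]\) term can be expressed with exponential-cone constraints, which is presumably why an exponential-cone-capable solver such as MOSEK is the recommended choice; the returned dual variable corresponds to the optimal \(\lambda^*\).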

Parameters:
  • X (numpy.ndarray) – Training feature matrix of shape (n_samples, n_features). Must satisfy n_features == self.input_dim.

  • y (numpy.ndarray) –

    Target values of shape (n_samples,). Format requirements:

    • Classification: Binary labels in {-1, +1}

    • Regression: Continuous real values

Returns:

Solution dictionary containing:

  • theta: Weight vector of shape (n_features,)

  • b: Intercept term (present if fit_intercept=True)

  • dual: Optimal dual variable for KL constraint (λ*)

Return type:

Dict[str, Any]

Raises:
  • KLDROError

    • If problem is infeasible with current parameters

    • If solver fails to converge

  • ValueError

    • If X.shape[1] != self.input_dim

    • If X.shape[0] != y.shape[0]

    • If classification labels not in {-1, +1}

Example:
>>> import numpy as np
>>> model = KLDRO(input_dim=3, eps=0.1)
>>> X = np.random.randn(100, 3)
>>> y = np.sign(np.random.randn(100))  # Binary classification
>>> solution = model.fit(X, y)
>>> print(solution["theta"].shape)  # (3,)
>>> print(f"Dual variable: {solution['dual']:.4f}")
worst_distribution(X, y)

Compute the worst-case distribution under KL divergence constraint.

The worst-case distribution weights are computed via exponential tilting:

\[w_i = \frac{\exp(\ell(\theta^*;x_i,y_i)/\lambda^*)}{\sum_j \exp(\ell(\theta^*;x_j,y_j)/\lambda^*)}\]

where \(\theta^*\) is the optimal model parameter and \(\lambda^*\) is the optimal dual variable.
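
Numerically, these weights are a softmax of the per-sample losses scaled by \(1/\lambda^*\); a stable way to evaluate them (illustrative sketch only; losses and lam below are stand-ins for quantities produced by fit()):

>>> import numpy as np
>>> losses = np.array([0.2, 1.5, 0.7])        # per-sample losses at theta* (stand-in values)
>>> lam = 0.5                                 # optimal dual variable lambda* (stand-in value)
>>> z = losses / lam - np.max(losses / lam)   # subtract the max exponent; the shift cancels after normalization
>>> w = np.exp(z) / np.exp(z).sum()           # worst-case weights: nonnegative, sum to 1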

Parameters:
  • X (numpy.ndarray) – Feature matrix of shape (n_samples, n_features). Must match self.input_dim and training data dimension.

  • y (numpy.ndarray) –

    Target vector of shape (n_samples,). Format constraints:

    • Classification: Labels in {-1, +1}

    • Regression: Continuous values

Returns:

Worst-case distribution specification containing:

  • sample_pts: Original samples [X, y] (reference to inputs)

  • weight: Probability vector of shape (n_samples,)

  • entropy: KL divergence \(D_{KL}(Q^*\|P)\)

Return type:

Dict[str, Any]

Raises:
  • KLDROError

    • If inner optimization via fit() fails

    • If \(\lambda^* \leq 0\) (invalid dual variable)

    • If weight normalization fails (sum → 0)

  • ValueError

    • If input dimensions mismatch

    • If classification labels violate binary constraints

Example:
>>> model = KLDRO(input_dim=3)
>>> model.fit(X_train, y_train)
>>> dist = model.worst_distribution(X_test, y_test)
>>> dist["sample_pts"]
>>> dist["weight"]