KL-DRO¶
- class dro.linear_model.kl_dro.KLDRO(input_dim, model_type='svm', fit_intercept=True, solver='MOSEK', kernel='linear', eps=0.0)¶
Bases: BaseLinearDRO
Kullback-Leibler divergence-based Distributionally Robust Optimization (KL-DRO) model.
This model minimizes a KL-robust loss function for both regression and classification.
Reference: <https://optimization-online.org/wp-content/uploads/2012/11/3677.pdf>
Initialize KL-divergence Distributionally Robust Optimization model.
Inherits from BaseLinearDRO and configures KL ambiguity set parameters. The ambiguity set is defined by:
\[\mathcal{Q} = \{ Q \ll P \, | \, D_{KL}(Q\|P) \leq \epsilon \}\]
where \(D_{KL}\) is the Kullback–Leibler divergence.
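For intuition, here is a minimal sketch (an illustrative helper, not part of the dro package) that computes the discrete KL divergence defining this ambiguity set; a reweighting \(Q\) of the empirical samples belongs to \(\mathcal{Q}\) exactly when this value is at most \(\epsilon\):

    import numpy as np

    # Illustrative helper (not the library API): discrete KL divergence
    # D_KL(Q||P) for two distributions on the same n sample points, assuming
    # Q is absolutely continuous w.r.t. P (q_i = 0 wherever p_i = 0).
    def kl_divergence(q, p):
        mask = q > 0
        return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

    p = np.full(4, 0.25)                 # empirical (uniform) reference weights
    q = np.array([0.4, 0.3, 0.2, 0.1])   # a candidate reweighting
    kl_divergence(q, p)                  # Q lies in the ambiguity set iff this <= eps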
- Parameters:
input_dim (int) – Dimension of input features. Must match training data features.
model_type (str) –
Base model architecture. Supported values:
'svm': Hinge loss (classification)
'logistic': Logistic loss (classification)
'ols': Least squares (regression)
'lad': Least absolute deviation (regression)
fit_intercept (bool) – If True, adds an intercept term \(b\) to the linear model: \(\theta^T X + b\). Disable when data is pre-centered.
solver (str) –
Convex optimization solver. Supported values:
'MOSEK' (recommended): Requires an academic or commercial license
kernel (str) – Kernel type used in the optimization model. Default: 'linear'.
eps (float) –
KL divergence bound (ε ≥ 0). Special cases (see the usage sketch after the Example below):
ε = 0: Reduces to standard empirical risk minimization (no distributional robustness)
ε → ∞: Approaches the worst-case distribution (maximally conservative)
Typical practical range: 0.01 ≤ ε ≤ 5.0
- Raises:
If input_dim ≤ 0
If eps < 0
If an unsupported solver is specified
Attribute Initialization:
self.dual_variable: Stores the optimal dual variable λ* after calling fit()
self._p: Internal probability vector of shape (n_samples,)
self._solver_opts: Solver-specific options parsed from the global config
- Example:
>>> model = KLDRO(input_dim=5, model_type='logistic', eps=0.1)
>>> model.input_dim   # 5
>>> model.eps   # 0.1
>>> model.dual_variable   # None (until fit is called)
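As a hypothetical illustration of the eps special cases listed above (parameter values chosen arbitrarily):

>>> erm_like = KLDRO(input_dim=5, eps=0.0)   # ε = 0: plain empirical risk minimization
>>> mild = KLDRO(input_dim=5, eps=0.1)       # small ε: mild distributional robustness
>>> strong = KLDRO(input_dim=5, eps=5.0)     # large ε: strongly conservative (near worst-case)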
- update(config)¶
Update KL-DRO model configuration parameters dynamically.
Primarily handles updates to the robustness parameter (eps) while maintaining the optimization problem structure. Preserves existing dual variables until the next fit() call.
- Parameters:
config (Dict[str, Any]) –
Configuration dictionary containing parameters to update. Recognized keys:
eps: (float) New KL divergence bound (ε ≥ 0).
Other keys are silently ignored.
- Raises:
KLDROError –
If the eps value is invalid (not float/int, or negative)
If the provided eps > 100.0 (empirical stability threshold)
TypeError – If config is not a dictionary
- Return type:
- Example:
>>> model = KLDRO(input_dim=5, eps=0.5)
>>> model.update({"eps": 0.8})
>>> model.eps   # 0.8
>>> model.update({"invalid_key": 1.0})   # No-op
- fit(X, y)¶
Solve KL-constrained distributionally robust optimization problem.
Constructs and solves the convex optimization problem:
\[\min_{\theta,b} \quad \sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[\ell(\theta,b;X,y)], \quad \text{s.t.} \quad D_{KL}(Q\|P) \leq \epsilon\]
where \(\mathcal{Q}\) is the ambiguity set defined by the KL divergence constraint.
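For a fixed parameter vector, the inner supremum over \(\mathcal{Q}\) admits the standard one-dimensional dual \(\min_{\lambda > 0} \; \lambda\epsilon + \lambda \log \mathbb{E}_P[\exp(\ell/\lambda)]\) (cf. the reference above). The sketch below is illustrative only and is not the routine used internally by fit(); it evaluates that dual value for a given vector of per-sample losses:

    import numpy as np
    from scipy.optimize import minimize_scalar

    # Illustrative sketch (not the library's solver): KL-robust expectation of
    # fixed per-sample losses via the 1-D dual
    #   min_{lam > 0}  lam * eps + lam * log( mean( exp(losses / lam) ) ).
    def kl_robust_loss(losses, eps):
        def dual(lam):
            m = losses.max()  # log-sum-exp shift for numerical stability
            return lam * eps + m + lam * np.log(np.mean(np.exp((losses - m) / lam)))
        return minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded").fun

    losses = np.abs(np.random.randn(100))   # e.g. per-sample losses at a fixed theta
    kl_robust_loss(losses, eps=0.1)         # >= losses.mean(), the eps = 0 value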
- Parameters:
X (numpy.ndarray) – Training feature matrix of shape (n_samples, n_features). Must satisfy n_features == self.input_dim.
y (numpy.ndarray) –
Target values of shape (n_samples,). Format requirements:
Classification: Binary labels in {-1, +1}
Regression: Continuous real values
- Returns:
Solution dictionary containing:
theta: Weight vector of shape (n_features,)
b: Intercept term (present if fit_intercept=True)
dual: Optimal dual variable for the KL constraint (λ*)
- Return type:
Dict[str, Any]
- Raises:
KLDROError –
If problem is infeasible with current parameters
If solver fails to converge
If X.shape[1] != self.input_dim
If X.shape[0] != y.shape[0]
If classification labels not in {-1, +1}
- Example:
>>> model = KLDRO(input_dim=3, eps=0.1)
>>> X = np.random.randn(100, 3)
>>> y = np.sign(np.random.randn(100))   # Binary classification
>>> solution = model.fit(X, y)
>>> print(solution["theta"].shape)   # (3,)
>>> print(f"Dual variable: {solution['dual']:.4f}")
- worst_distribution(X, y)¶
Compute the worst-case distribution under KL divergence constraint.
The worst-case distribution weights are computed via exponential tilting:
\[w_i = \frac{\exp(\ell(\theta^*;x_i,y_i)/\lambda^*)}{\sum_j \exp(\ell(\theta^*;x_j,y_j)/\lambda^*)}\]
where \(\theta^*\) is the optimal model parameter and \(\lambda^*\) is the optimal dual variable.
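A minimal numpy sketch of this tilting step (illustrative only; it assumes the per-sample losses at \(\theta^*\) and a dual variable \(\lambda^* > 0\) are already available, and is not the library implementation):

    import numpy as np

    # Illustrative sketch: exponential tilting of the empirical distribution,
    # given per-sample losses at theta* and the optimal dual variable lambda*.
    def worst_case_weights(losses, dual):
        z = (losses - losses.max()) / dual   # shift before exp for stability
        w = np.exp(z)
        return w / w.sum()

    losses = np.array([0.2, 1.5, 0.7, 3.0])
    worst_case_weights(losses, dual=0.8)     # larger loss -> larger weight; sums to 1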
- Parameters:
X (numpy.ndarray) – Feature matrix of shape (n_samples, n_features). Must match self.input_dim and the training data dimension.
y (numpy.ndarray) –
Target vector of shape (n_samples,). Format constraints:
Classification: Labels in {-1, +1}
Regression: Continuous values
- Returns:
Worst-case distribution specification containing:
sample_pts: Original samples [X, y] (reference to inputs)
weight: Probability vector of shape (n_samples,)
entropy: KL divergence \(D_{KL}(Q^*\|P)\)
- Return type:
Dict[str, Any]
- Raises:
KLDROError –
If the inner optimization via fit() fails
If \(\lambda^* \leq 0\) (invalid dual variable)
If weight normalization fails (sum → 0)
If input dimensions mismatch
If classification labels violate binary constraints
- Example:
>>> model = KLDRO(input_dim=3)
>>> model.fit(X_train, y_train)
>>> dist = model.worst_distribution(X_test, y_test)
>>> dist["sample_pts"]
>>> dist["weight"]