Wasserstein DRO

class dro.src.linear_model.wasserstein_dro.WassersteinDRO(input_dim, model_type='svm', fit_intercept=True, solver='MOSEK', kernel='linear')

Bases: BaseLinearDRO

Wasserstein Distributionally Robust Optimization (WDRO) model

This model minimizes a Wasserstein-robust loss function for both regression and classification.

The Wasserstein distance is defined as the minimum expected transport cost over all probability couplings of the two distributions, with respect to the ground metric:

\[d((X_1, Y_1), (X_2, Y_2)) = \|\Sigma^{1/2} (X_1 - X_2)\|_p^{\text{square}} + \kappa |Y_1 - Y_2|,\]

where parameters are:

  • \(\Sigma\): cost matrix (a PSD matrix);

  • \(\kappa\): weight on the label perturbation cost \(|Y_1 - Y_2|\);

  • \(p\): the norm order (Wasserstein order), \(p \geq 1\);

  • square: an exponent depending on the model type, where square = 2 for 'svm', 'logistic', 'lad' and square = 1 for 'ols'.
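For intuition, a minimal NumPy sketch of this ground cost between two samples (illustrative only; `ground_cost`, `Sigma`, `kappa`, and `square` are local names, not part of the API):
>>> import numpy as np
>>> def ground_cost(x1, y1, x2, y2, Sigma, kappa, p=2, square=2):
...     w, V = np.linalg.eigh(Sigma)               # Sigma is assumed PSD
...     root = V @ np.diag(np.sqrt(w)) @ V.T       # symmetric square root Sigma^{1/2}
...     feature_term = np.linalg.norm(root @ (x1 - x2), ord=p) ** square
...     return feature_term + kappa * abs(y1 - y2)
>>> ground_cost(np.zeros(3), 1, np.array([3.0, 4.0, 0.0]), -1, np.eye(3), kappa=1.0)  # 27.0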

Reference:

[1] OLS: <https://www.cambridge.org/core/journals/journal-of-applied-probability/article/robust-wasserstein-profile-inference-and-applications-to-machine-learning/4024D05DE4681E67334E45D039295527>

[2] LAD / SVM / Logistic: <https://jmlr.org/papers/volume20/17-633/17-633.pdf>

Initialize the Mahalanobis-Wasserstein DRO model.

Parameters:
  • input_dim (int) – Dimension of the feature space. Must satisfy \(\text{input\_dim} \geq 1\)

  • model_type (str) –

    Base model architecture. Supported:

    • 'svm': Hinge loss (classification)

    • 'logistic': Logistic loss (classification)

    • 'ols': Least squares (regression)

    • 'lad': Least absolute deviation (regression)

  • fit_intercept (bool) – Whether to learn intercept term \(b\). Set to False for pre-centered data. Defaults to True.

  • solver (str) –

    Convex optimization solver. Valid options:

    • 'MOSEK' (commercial, recommended)

  • kernel (str) – Kernel type used in the optimization model. Defaults to 'linear'.

Raises:

ValueError

  • If input_dim < 1

  • If unsupported solver is selected

Example:
>>> model = WassersteinDRO(
...     input_dim=5,
...     model_type='svm',
...     solver='MOSEK'
... )
>>> model.cost_matrix.shape  # (5, 5)

Note

  • Changing cost_matrix after initialization requires calling update()
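For example, continuing the model above (the diagonal values are illustrative):
>>> import numpy as np
>>> model.update({'cost_matrix': np.diag([1.0, 2.0, 1.0, 0.5, 1.0])})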

update(config)

Update Wasserstein-DRO model parameters dynamically.

Parameters:

config (dict[str, Any]) –

Configuration dictionary with keys:

  • 'cost_matrix': Mahalanobis metric matrix \(\Sigma^{-1} \succ 0\)

    • Shape: (input_dim, input_dim)

    • Type: numpy.ndarray

  • 'eps': Wasserstein radius \(\epsilon \geq 0\)

  • 'p': Wasserstein order \(p \geq 1\) or 'inf'

  • 'kappa': Y-ambiguity radius \(\kappa \geq 0\) or 'inf'

Raises:
  • ValueError

    • If cost_matrix is not positive definite

    • If eps < 0

    • If p < 1 and p ≠ ‘inf’

    • If kappa < 0 and kappa ≠ ‘inf’

  • TypeError

    • If cost_matrix is not numpy array

    • If numeric parameters are not float/int

Return type:

None

Example:
>>> model = WassersteinDRO(input_dim=3)
>>> new_config = {
...     'eps': 0.5,
...     'p': 2,
...     'cost_matrix': np.diag([1, 2, 3])
... }
>>> model.update(new_config)
>>> model.p  # 2.0

fit(X, y)

Fit the model using CVXPY to solve the WDRO problem.

Parameters:
  • X (numpy.ndarray) – Training feature matrix of shape (n_samples, n_features). Must satisfy n_features == self.input_dim.

  • y (numpy.ndarray) –

    Target values of shape (n_samples,). Format requirements:

    • Classification: ±1 labels

    • Regression: Continuous values

Returns:

Dictionary containing trained parameters:

  • theta: Weight vector of shape (n_features,)

  • b: Intercept term (present when fit_intercept=True)

Return type:

Dict[str, Any]

Raises:

WassersteinDROError – If the optimization problem fails to solve.
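Example (a sketch with synthetic data; assumes the MOSEK solver is available, and that the returned keys follow the description above):
>>> import numpy as np
>>> X = np.random.randn(100, 5)
>>> y = 2 * np.random.randint(0, 2, 100) - 1   # ±1 labels required for 'svm'
>>> model = WassersteinDRO(input_dim=5, model_type='svm')
>>> params = model.fit(X, y)
>>> sorted(params.keys())  # ['b', 'theta']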

worst_distribution(X, y, compute_type, gamma=0)

Compute worst-case distribution under Wasserstein ambiguity set.

Parameters:
  • X (numpy.ndarray) – Input feature matrix. Shape: (n_samples, n_features) Must satisfy n_features == input_dim

  • y (numpy.ndarray) –

    Target vector. Shape: (n_samples,)

    • Classification: binary labels (-1/1)

    • Regression: continuous values

  • compute_type (str) –

    Computation methodology. Options:

    • 'asymp': Asymptotic approximation (faster, less accurate)

      Supported models: ['svm', 'logistic', 'lad']

    • 'exact': Exact dual solution (slower, precise)

  • gamma (float) – Regularization parameter for asymptotic method. Must satisfy \(\gamma > 0\) when compute_type='asymp'. Defaults to 0.

Returns:

Dictionary containing:

  • 'sample_pts': Worst-case sample locations. Shape: (m, n_features)

  • 'weights': Probability weights. Shape: (m,) with \(\sum w_i = 1\)

Return type:

dict[str, Any]

Raises:
  • ValueError

    • If compute_type='asymp' with model_type='ols'

    • If compute_type='asymp' and kappa == 'inf'

    • If gamma ≤ 0 when required

  • TypeError

    • If input dimensions are mismatched

Example:
>>> X, y = np.random.randn(100, 3), 2 * np.random.randint(0, 2, 100) - 1
>>> model = WassersteinDRO(model_type='svm', input_dim=3)
>>> wc_dist = model.worst_distribution(X, y, 'asymp', gamma=0.1)
>>> wc_dist['weights'].sum()  # Approximately 1.0

Note

  • Asymptotic method ignores curvature regularization (\(\kappa = \infty\))

  • Exact method requires solver='MOSEK' for conic constraints

Reference of Worst-case Distribution:

[1] SVM / Logistic / LAD: Theorem 20 (ii) in https://jmlr.org/papers/volume20/17-633/17-633.pdf, where eta is the theta in eq(27) and gamma = 0 in that equation.

[2] In all cases, we use a reduced dual case (e.g., Remark 5.2 of https://arxiv.org/pdf/2308.05414) to compute their worst-case distribution.

[3] General Worst-case Distributions can be found in: https://pubsonline.informs.org/doi/abs/10.1287/moor.2022.1275, where norm_theta is lambda* here.
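A sketch of the exact computation (assumes MOSEK is installed; calling fit first is an assumption for illustration, not a documented prerequisite):
>>> import numpy as np
>>> X = np.random.randn(50, 3)
>>> y = 2 * np.random.randint(0, 2, 50) - 1
>>> model = WassersteinDRO(input_dim=3, model_type='svm', solver='MOSEK')
>>> model.update({'eps': 0.1, 'p': 2, 'kappa': 1.0})
>>> _ = model.fit(X, y)
>>> wc = model.worst_distribution(X, y, compute_type='exact')
>>> wc['sample_pts'].shape[1]  # 3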

class dro.src.linear_model.wasserstein_dro.WassersteinDROsatisficing(input_dim, model_type, fit_intercept=True, solver='MOSEK', kernel='linear')

Bases: BaseLinearDRO

Robust satisficing version of Wasserstein DRO

This model minimizes its objective subject to an (approximated) robust satisficing constraint for Wasserstein DRO. The Wasserstein distance is defined as the minimum expected transport cost over all probability couplings of the two distributions, with respect to the ground metric:

\[d((X_1, Y_1), (X_2, Y_2)) = \|\Sigma^{1/2} (X_1 - X_2)\|_p^{\text{square}} + \kappa |Y_1 - Y_2|,\]

Reference: <https://pubsonline.informs.org/doi/10.1287/opre.2021.2238>

Initialize the robust satisficing version of Wasserstein DRO.

Parameters:
  • input_dim (int) – Feature space dimension. Must satisfy \(d \geq 1\)

  • model_type (str) –

    Base model architecture. Supported:

    • 'svm'

    • 'logistic'

    • 'ols'

    • 'lad'

  • fit_intercept (bool) – Whether to learn intercept \(b\). Disable for standardized data. Defaults to True.

  • solver (str) –

    Convex optimization solver. Options:

    • 'MOSEK' (commercial, recommended)

  • kernel (str) – Kernel type used in the optimization model. Defaults to 'linear'.

Raises:

ValueError

  • If input_dim < 1

  • If invalid solver selected

Initialization Defaults:
  1. Cost matrix initialized as identity \(I_d\)

  2. Target ratio \(\tau = 1/0.8\) (20% performance margin)

  3. Wasserstein order \(p=1\) (earth mover’s distance)

Example:
>>> model = WassersteinDROsatisficing(
...     input_dim=5,
...     model_type='svm',
...     solver='MOSEK'
... )
>>> model.cost_matrix.shape  # (5, 5)

update(config)

Update model parameters based on configuration.

Parameters:

config (dict) – The model configuration

Return type:

None
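The accepted configuration keys are not enumerated here; the sketch below assumes they mirror WassersteinDRO.update (e.g., 'cost_matrix'), which should be checked against the implementation:
>>> import numpy as np
>>> model = WassersteinDROsatisficing(input_dim=3, model_type='svm')
>>> model.update({'cost_matrix': np.diag([1.0, 2.0, 3.0])})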

fit(X, y)

Fit model to data by solving an optimization problem.

Parameters:
  • X (numpy.ndarray) – Training feature matrix of shape (n_samples, n_features). Must satisfy n_features == self.input_dim.

  • y (numpy.ndarray) – Target values of shape (n_samples,). Classification models expect ±1 labels; regression models expect continuous values.

Return type:

Dict[str, Any]
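Example (a sketch with synthetic data; assumes MOSEK is available and that the returned dictionary mirrors WassersteinDRO.fit, which is an assumption):
>>> import numpy as np
>>> X = np.random.randn(80, 5)
>>> y = 2 * np.random.randint(0, 2, 80) - 1    # ±1 labels for 'svm'
>>> model = WassersteinDROsatisficing(input_dim=5, model_type='svm')
>>> params = model.fit(X, y)
>>> 'theta' in params  # True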

fit_oracle(X, y)

Deprecated. Finds the optimal solution under the given ambiguity constraint.

Parameters:
  • X (numpy.ndarray) – Input feature matrix with shape (n_samples, n_features).

  • y (numpy.ndarray) – Target vector with shape (n_samples,).

Returns:

Robust objective value.

Return type:

float

worst_distribution(X, y)

Compute the worst-case distribution under the Wasserstein ambiguity set (see WassersteinDRO.worst_distribution for the general interface).