DRO with Mixed Distances¶
In this part, we discuss Holistic DRO, Outlier-Robust Wasserstein DRO, Sinkhorn DRO, MOT-DRO, and MMD DRO.
Sinkhorn DRO¶
[2]:
from dro.src.linear_model.sinkhorn_dro import SinkhornLinearDRO
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
# Data generation
X, y = make_regression(n_samples=1000, n_features=10, noise=1, random_state=42)
# Model training
dro_model = SinkhornLinearDRO(
    input_dim=10, output_dim=1, k_sample_max=4, reg_param=0.001,
    lambda_param=100, max_iter=1000, model_type='lad',
)
params = dro_model.fit(X, y, optimization_type="SG")
print("Sinkhorn DRO Parameters:", params)
print(dro_model.score(X, y))
# Baseline comparison
lr_model = Ridge()
lr_model.fit(X, y)
print("Sklearn Coefficients:", lr_model.coef_)
print(lr_model.score(X, y))  # R^2 on the training data
Sinkhorn DRO Parameters: {'theta': array([33.60147 , 32.046486, 29.364248, 75.13656 , 7.143059, 10.22598 ,
76.198784, 9.39397 , 5.147702, 58.21924 ], dtype=float32), 'bias': array([-0.07683522], dtype=float32)}
0.9808581183352392
Sklearn Coefficients: [33.63648075 32.05820144 29.38822328 75.17312611 7.16263766 10.24411887
76.22903811 9.4250155 5.12332423 58.27112617]
0.9999457911466745
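The two coefficient vectors printed above can be compared feature by feature. The following is a minimal sketch that copies the printed values into plain Python lists and reports the largest absolute gap. The variable names are illustrative and are not part of the `dro` package, and nothing is refitted here.

```python
# Coefficients copied from the printed output above (not recomputed here).
sinkhorn_theta = [33.60147, 32.046486, 29.364248, 75.13656, 7.143059,
                  10.22598, 76.198784, 9.39397, 5.147702, 58.21924]
ridge_coef = [33.63648075, 32.05820144, 29.38822328, 75.17312611, 7.16263766,
              10.24411887, 76.22903811, 9.4250155, 5.12332423, 58.27112617]

# Absolute gap per feature, then the largest gap and the index where it occurs.
gaps = [abs(s - r) for s, r in zip(sinkhorn_theta, ridge_coef)]
max_gap = max(gaps)
max_idx = gaps.index(max_gap)

print(f"largest gap: {max_gap:.4f} at feature index {max_idx}")
```

On these numbers every coefficient agrees to within 0.06, so the Sinkhorn DRO fit and the Ridge baseline are nearly identical on this low-noise dataset.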
MOT-DRO¶
Based on Theorem 5.2, the current MOT-DRO implementation does not support OLS, since OLS does not satisfy Assumption 5.1, and it does not support the case where the uncertainty in Y also changes.
[3]:
from sklearn.datasets import make_regression
from dro.src.linear_model.mot_dro import MOTDRO
# Data generation
X, y = make_regression(n_samples=100, n_features=10, random_state=42)
# Model training
mot_dro_model = MOTDRO(input_dim=10, model_type='lad', fit_intercept=True)
mot_dro_model.update({'eps': 1, 'square': 2})
mot_dro_model.fit(X, y)
optimal
[3]:
{'theta': [76.21548985345878,
53.29365202998623,
5.247024494762593,
52.46410461035137,
71.74006738420996,
1.214010364770905,
63.72428598909404,
14.06587595353978,
2.997652104182758,
44.93666589113934],
'b': array(1.29102914)}
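Note that the two outputs in this section name the intercept differently: `'bias'` in the Sinkhorn DRO result and `'b'` here. As a rough illustration of how such a dictionary maps to a prediction, the sketch below evaluates the usual linear rule (theta · x plus intercept) in plain Python. The helper `predict_one` and the input row are hypothetical and are not part of the `dro` package; the coefficients are copied from the output above and rounded to four decimals.

```python
# Parameters copied from the MOT-DRO output above, rounded to four decimals.
mot_params = {
    "theta": [76.2155, 53.2937, 5.2470, 52.4641, 71.7401,
              1.2140, 63.7243, 14.0659, 2.9977, 44.9367],
    "b": 1.2910,
}


def predict_one(params, x):
    """Hypothetical helper: linear prediction theta . x + intercept.

    Accepts either 'b' or 'bias' as the intercept key, since the two
    outputs shown in this section name it differently.
    """
    intercept = params["b"] if "b" in params else params["bias"]
    return sum(t * xi for t, xi in zip(params["theta"], x)) + intercept


# Made-up input row: only the first feature is non-zero.
x_row = [1.0] + [0.0] * 9
print(predict_one(mot_params, x_row))
```

With this row the prediction reduces to the first coefficient plus the intercept, which makes it easy to check the helper by hand.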