Florian LALANDE (Okinawa) le 20 janvier 2023 à 11h30
par
Numerical data imputation algorithms consist in replacing missing values by estimates to allow extensive use of incomplete datasets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. We will see how this strategy can create artifacts leading to poor imputation in the presence of multimodal distributions. To tackle this problem, we introduce the kNNxKDE algorithm : a hybrid method tailored for numerical data imputation using nearest-neighbours (kNN) for conditional density estimation with Gaussian kernels (KDE). We qualitatively and quantitatively show that this method preserves the original data structure, yields great imputation results and provides better conditional distribution than current methods.