Robin Genuer (Univ. Bordeaux), October 8, 2021, 11:30 a.m.
Random forests are a statistical learning method widely used in many areas of scientific research, mainly for their ability to learn complex relationships between input and output variables and their capacity to handle high-dimensional data. However, current random forest approaches are not flexible enough to handle heterogeneous data such as curves, images and shapes. In this talk, we present Fréchet trees and Fréchet random forests, which can handle data whose input and output variables take values in general metric spaces. To this end, a new way of splitting the nodes of trees is introduced, and the prediction procedures of trees and forests are generalized. The out-of-bag error and variable importance score of random forests are then naturally adapted. The method is illustrated through several simulation scenarios on heterogeneous data combining longitudinal, image and scalar data. Finally, a real dataset from an HIV vaccine trial is analyzed with the proposed method.
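
To make the idea of splitting a node using only distances more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure presented in the talk: the Fréchet mean is approximated by the sample medoid, candidate splits are obtained by assigning each observation to the nearer of two input representatives, and the function names (`frechet_medoid`, `frechet_variance`, `best_split`) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def frechet_medoid(points: Sequence[Y], dist: Callable[[Y, Y], float]) -> Y:
    """Sample point minimising the sum of squared distances to all points.

    Assumption: the Frechet mean is restricted to the sample (a medoid),
    so only a distance function is needed, not a vector-space structure.
    """
    return min(points, key=lambda c: sum(dist(c, p) ** 2 for p in points))


def frechet_variance(points: Sequence[Y], dist: Callable[[Y, Y], float]) -> float:
    """Mean squared distance of the points to their medoid."""
    if not points:
        return 0.0
    m = frechet_medoid(points, dist)
    return sum(dist(m, p) ** 2 for p in points) / len(points)


def best_split(
    inputs: Sequence[X],
    outputs: Sequence[Y],
    d_in: Callable[[X, X], float],
    d_out: Callable[[Y, Y], float],
) -> Optional[Tuple[float, List[int], List[int]]]:
    """Exhaustively try pairs of input representatives (hypothetical rule).

    Each observation goes to the child whose representative is nearer in
    the input metric; the split kept is the one with the lowest
    size-weighted Frechet variance of the outputs.
    """
    n = len(inputs)
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            left = [k for k in range(n)
                    if d_in(inputs[k], inputs[i]) <= d_in(inputs[k], inputs[j])]
            left_set = set(left)
            right = [k for k in range(n) if k not in left_set]
            if not right:
                continue  # degenerate split, skip it
            cost = (
                len(left) * frechet_variance([outputs[k] for k in left], d_out)
                + len(right) * frechet_variance([outputs[k] for k in right], d_out)
            ) / n
            if best is None or cost < best[0]:
                best = (cost, left, right)
    return best


def abs_dist(a: float, b: float) -> float:
    """Toy metric on the real line, used only for the example below."""
    return abs(a - b)


# Toy usage with scalar data; any metric (e.g. on curves or images) could be
# plugged in for d_in and d_out instead.
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ys = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
cost, left, right = best_split(xs, ys, abs_dist, abs_dist)
```

The exhaustive search over pairs is quadratic in the node size and is chosen here only for readability; a clustering step in the input space would be a cheaper alternative.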