Feature-Selective Oblique Trees for Regression: Application to STEM Graduate Wage Prediction in Italy

Traditional decision trees are widely used but are limited by their axis-aligned splits, which can produce large and complex models when the data are high-dimensional or correlated. Oblique decision trees address this limitation by building the splitting hyperplane at each node from a linear combination of predictors. However, most oblique tree methods focus on classification tasks or rely on computationally intensive optimization procedures that hinder the interpretability and scalability of the trees [1]. In this work, we introduce a novel approach for constructing oblique decision trees for regression tasks. The method, called Selection variable weighted support vector machine Oblique Regression decision Tree (SORT), addresses the limitations of traditional and oblique trees by integrating a variable selection process and a weighted support vector machine (SVM) [2] with a linear kernel into the decision tree framework. At each node, SORT selects the features most correlated with the target variable y and then transforms y into a dichotomous variable using the quantiles of its distribution. For each quantile, a weighted linear SVM is fitted to find a splitting hyperplane, and the hyperplane yielding the largest deviance reduction is chosen. The weights assigned to the observations in the SVM are computed as the absolute values of the scaled elements of y, so that extreme values have a stronger influence on the splitting process. The process is repeated recursively until a stopping criterion is met. We carried out a simulation study to assess SORT’s performance under different data scenarios, such as noisy features, non-normal transformed variables, and the use of categorical variables. By analysing 3,840 simulated datasets, we found that selecting just two features at each split and using the median as the dichotomization threshold improves the tree's performance and computational efficiency.
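The single-node split described above can be illustrated with a minimal sketch. It is not the authors' implementation. All function names are hypothetical, and several details are assumptions: features are ranked by absolute Pearson correlation, y is scaled by standardization, deviance is the residual sum of squares, the candidate quantiles are 0.25, 0.5 and 0.75, and the weighted linear SVM is fitted with a plain subgradient descent on the weighted hinge loss as a stand-in for a proper SVM solver.

```python
import numpy as np


def deviance(y):
    """Residual sum of squares around the mean (assumed deviance measure)."""
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0


def top_correlated(X, y, k):
    """Indices of the k features with the largest |Pearson correlation| with y."""
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    corr = np.nan_to_num(np.array(corr))  # constant columns get correlation 0
    return np.sort(np.argsort(corr)[-k:])


def fit_weighted_linear_svm(Z, t, weights, lam=0.01, n_iter=500, lr=0.1):
    """Minimise lam/2 * ||w||^2 + weighted hinge loss by subgradient descent.

    Stand-in solver for illustration only; labels t must be in {-1, +1}.
    """
    w, b = np.zeros(Z.shape[1]), 0.0
    c = weights / weights.sum()  # normalised observation weights
    for it in range(n_iter):
        active = t * (Z @ w + b) < 1  # observations violating the margin
        grad_w = lam * w - (c[active] * t[active]) @ Z[active]
        grad_b = -(c[active] * t[active]).sum()
        step = lr / np.sqrt(it + 1)
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def sort_split(X, y, k=2, quantiles=(0.25, 0.5, 0.75)):
    """Find one oblique split: select features, dichotomise y, fit weighted SVMs."""
    feats = top_correlated(X, y, k)
    Z = X[:, feats]
    Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)  # standardised features
    # Weights: absolute value of the scaled target, so extreme y values count more.
    weights = np.abs((y - y.mean()) / (y.std() + 1e-12))
    best = None
    for q in quantiles:
        t = np.where(y > np.quantile(y, q), 1.0, -1.0)
        if abs(t.sum()) == t.size:  # only one class present, nothing to separate
            continue
        w, b = fit_weighted_linear_svm(Z, t, weights)
        right = Z @ w + b > 0
        if right.all() or not right.any():  # degenerate split, skip
            continue
        gain = deviance(y) - deviance(y[right]) - deviance(y[~right])
        if best is None or gain > best["gain"]:
            best = {"features": feats, "quantile": q, "w": w, "b": b,
                    "right": right, "gain": gain}
    return best


# Usage on synthetic data (illustrative only): y depends on features 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=300)
split = sort_split(X, y)
print(split["features"], split["quantile"], round(split["gain"], 2))
```

The returned coefficients `w` and `b` are expressed in standardized feature units. A full tree would apply `sort_split` recursively to the two child nodes until a stopping criterion is met, and passing `quantiles=(0.5,)` with `k=2` corresponds to the two-feature, median-threshold setting discussed in the text.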
Moreover, the two-feature, median-threshold parametrization of SORT also makes the trees more interpretable, since the resulting trees can be visualized and understood. In addition, SORT consistently outperformed five other decision tree methods across the simulated datasets in terms of predictive accuracy, including traditional CART and oblique trees such as ODT, CO2, HHCART, and BUTIA [3-7]. We also apply SORT to a real-world problem: predicting the wages of graduates in Science, Technology, Engineering and Mathematics (STEM) fields in Italy. The results indicate that SORT outperforms other oblique tree methods. Furthermore, using only two predictors at each node increases interpretability, allowing us to determine the main factors impacting salary levels and potentially revealing structural issues such as the gender wage gap and Italy's North-South divide. In conclusion, SORT offers a powerful and interpretable alternative to traditional regression trees, particularly in settings with complex feature interactions. Future work may investigate non-linear