Simulating DEA with machine learning: A clustering-based data preprocessing technique for training set selection

Barbara Kamińska (Doctoral School - Management, KSZiRO, PWr, Wrocław)

Data Envelopment Analysis (DEA) is a non-parametric technique for measuring the relative efficiency of a set of decision making units (DMUs) on the basis of multiple inputs and outputs. A typical DEA analysis requires solving a series of linear programs, one for each DMU. DEA therefore suffers from the curse of dimensionality: on large datasets the computational load becomes very high. The literature commonly addresses this issue by adopting Machine Learning (ML) algorithms. However, although the selection of the training dataset is crucial for such algorithms, the DEA literature neglects this factor, and all existing methods rely on random sampling. In this paper, we build on the existing literature and introduce a clustering-based data preprocessing technique that selects the training dataset so that it represents the entire dataset as closely as possible. We use simulated data to test this new technique against random sampling under different ML algorithms, numbers of netputs and standard DEA models. We further test it on a network DEA model for two-stage series structures, in which the efficiency scores form a two-dimensional vector. In all cases the results are statistically significant, showing that the proposed technique increases the accuracy of the ML algorithms and may even reduce the required computational load.
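The idea of choosing a representative training set by clustering can be illustrated with a minimal sketch. It is not the procedure from the paper, whose details the abstract does not give. It assumes one plausible reading: group the DMUs with k-means on their netputs, using as many clusters as training examples are wanted, and keep the DMU closest to each centroid. The function names (`kmeans`, `select_training_set`), the simulated data and all parameter values are hypothetical, and feature scaling is omitted because the simulated netputs share one range.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of vectors; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids as k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


def select_training_set(points, n_train, seed=0):
    """Indices of up to n_train points: per cluster, the point nearest its centroid."""
    centroids, labels = kmeans(points, n_train, seed=seed)
    chosen = []
    for c, centre in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:  # skip clusters that ended up empty
            chosen.append(min(members, key=lambda i: sq_dist(points[i], centre)))
    return sorted(chosen)


# Simulated data (assumption): 200 DMUs, each described by 3 netputs in [1, 10].
rng = random.Random(42)
dmus = [[rng.uniform(1, 10) for _ in range(3)] for _ in range(200)]

train_idx = select_training_set(dmus, n_train=20)        # clustering-based choice
random_idx = sorted(rng.sample(range(len(dmus)), 20))    # random-sampling baseline

print(len(train_idx), "DMUs selected by clustering")
```

In a full pipeline, only the DMUs in `train_idx` would be scored with DEA. An ML model trained on those scores would then predict the efficiency of the remaining DMUs, and its accuracy could be compared with a model trained on `random_idx`.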