Random forest for toxicity of chemical emissions: Features selection and uncertainty quantification
A. Marvuglia, M. Leuenberger, M. Kanevski, and E. Benetto
Journal of Environmental Accounting and Management, vol. 3, no. 3, pp. 229-241, 2015
Toxicity characterization of chemical emissions is a complex task that relies on multimedia fate and exposure models coupled with dose–response models. Several environmental multimedia models exist, but whichever one is used, a vast amount of data on the properties of the chemical compounds being assessed is required. This paper addresses the selection of informative variables for deriving ecotoxicological and human toxicological characterization factors of chemical compounds from molecular-based properties. The Random Forest algorithm has been applied to single out the most relevant variables when modelling one toxicity factor at a time. The set of variables retained varies with the modelled output factor, but certain variables almost always rank among the three most important ones, regardless of the output factor considered. The modelling performed in this paper is one of the first applications of nonlinear techniques to the database of organic substances made available by the multimedia fate and exposure model USEtox, which is widely used by the Life Cycle Assessment (LCA) community.
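The abstract describes ranking candidate molecular descriptors by Random Forest importance while modelling a single toxicity output at a time. The sketch below illustrates that idea under explicit assumptions. The data are synthetic (five hypothetical descriptors, indexed 0 to 4, of which only descriptors 0 and 1 drive the target), not the USEtox database. Importance is measured as the increase in held-out mean squared error when one descriptor is randomly permuted; this is one common Random Forest importance measure, not necessarily the one used in the paper. The forest is written from scratch with the standard library only so that the example is self-contained. All names (`make_data`, `fit_forest`, `permutation_importance`) and parameter values (`n_trees=50`, `mtry=2`, `max_depth=8`, `min_node=10`) are hypothetical illustrative choices. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used instead.

```python
import random
import statistics

N_FEATURES = 5  # hypothetical number of molecular descriptors


def make_data(n, rng):
    """Synthetic toy data: only x0 (linear) and x1 (quadratic) affect the target."""
    X = [[rng.random() for _ in range(N_FEATURES)] for _ in range(n)]
    y = [3 * r[0] + 2 * r[1] ** 2 + rng.gauss(0, 0.05) for r in X]
    return X, y


def best_split(X, y, idx, features):
    """Find the (feature, threshold) pair minimising the children's summed squared error."""
    best, best_sse = None, float("inf")
    for f in features:
        order = sorted(idx, key=lambda i: X[i][f])
        n = len(order)
        total_s = sum(y[i] for i in order)
        total_q = sum(y[i] ** 2 for i in order)
        left_s = left_q = 0.0
        for k in range(1, n):  # k = number of samples sent to the left child
            yi = y[order[k - 1]]
            left_s += yi
            left_q += yi * yi
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # no threshold separates identical values
            right_s, right_q = total_s - left_s, total_q - left_q
            sse = (left_q - left_s ** 2 / k) + (right_q - right_s ** 2 / (n - k))
            if sse < best_sse:
                best, best_sse = (f, (lo + hi) / 2), sse
    return best


def build_tree(X, y, idx, rng, mtry, depth=0, max_depth=8, min_node=10):
    """Grow a regression tree; leaves are floats, internal nodes are (f, t, left, right)."""
    leaf = statistics.fmean([y[i] for i in idx])
    if depth >= max_depth or len(idx) < min_node:
        return leaf
    features = rng.sample(range(N_FEATURES), mtry)  # random feature subset per node
    split = best_split(X, y, idx, features)
    if split is None:
        return leaf
    f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (f, t,
            build_tree(X, y, left, rng, mtry, depth + 1, max_depth, min_node),
            build_tree(X, y, right, rng, mtry, depth + 1, max_depth, min_node))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, mtry=2, seed=0):
    """Bagging: each tree is grown on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    n = len(X)
    return [build_tree(X, y, [rng.randrange(n) for _ in range(n)], rng, mtry)
            for _ in range(n_trees)]


def predict(forest, x):
    return statistics.fmean([predict_tree(tree, x) for tree in forest])


def mse(forest, X, y):
    return statistics.fmean([(predict(forest, x) - t) ** 2 for x, t in zip(X, y)])


def permutation_importance(forest, X, y, seed=1):
    """Increase in held-out MSE when one descriptor column is randomly shuffled."""
    rng = random.Random(seed)
    base = mse(forest, X, y)
    scores = []
    for f in range(N_FEATURES):
        column = [row[f] for row in X]
        rng.shuffle(column)
        X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
        scores.append(mse(forest, X_perm, y) - base)
    return scores


data_rng = random.Random(42)
X_train, y_train = make_data(300, data_rng)
X_test, y_test = make_data(200, data_rng)
forest = fit_forest(X_train, y_train)
importance = permutation_importance(forest, X_test, y_test)
ranking = sorted(range(N_FEATURES), key=lambda f: importance[f], reverse=True)
print("descriptor ranking (most important first):", ranking[:3])
```

With this toy target the two informative descriptors should rank first and second, and the three noise descriptors should receive importances close to zero. Refitting the forest separately for each output factor would give one ranking per factor, which is the kind of per-factor variable selection the abstract describes.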