Machine learning for toxicity characterization of organic chemical emissions using USEtox database: Learning the structure of the input space

Auteurs

A. Marvuglia, M. Kanevski, and E. Benetto

Référence

Environment International, vol. 83, pp. 72-85, 2015

Description

Toxicity characterization of chemical emissions in Life Cycle Assessment (LCA) is a complex task which usually proceeds via multimedia (fate, exposure and effect) models attached to models of dose–response relationships to assess the effects on target. Different models and approaches do exist, but all require a vast amount of data on the properties of the chemical compounds being assessed, which are hard to collect or hardly publicly available (especially for thousands of less common or newly developed chemicals), therefore hampering in practice the assessment in LCA. An example is USEtox, a consensual model for the characterization of human toxicity and freshwater ecotoxicity. This paper places itself in a line of research aiming at providing a methodology to reduce the number of input parameters necessary to run multimedia fate models, focusing in particular to the application of the USEtox toxicity model. By focusing on USEtox, in this paper two main goals are pursued: 1) performing an extensive exploratory analysis (using dimensionality reduction techniques) of the input space constituted by the substance-specific properties at the aim of detecting particular patterns in the data manifold and estimating the dimension of the subspace in which the data manifold actually lies; and 2) exploring the application of a set of linear models, based on partial least squares (PLS) regression, as well as a nonlinear model (general regression neural network — GRNN) in the seek for an automatic selection strategy of the most informative variables according to the modelled output (USEtox factor). After extensive analysis, the intrinsic dimension of the input manifold has been identified between three and four. The variables selected as most informative may vary according to the output modelled and the model used, but for the toxicity factors modelled in this paper the input variables selected as most informative are coherent with prior expectations based on scientific knowledge of toxicity factors modelling. Thus the outcomes of the analysis are promising for the future application of the approach to other portions of the model, affected by important data gaps, e.g., to the calculation of human health effect factors.

Lien

doi:10.1016/j.envint.2015.05.011

Partager cette page :