Abstract
Neural networks are well known for achieving high prediction accuracy. However, because of the non-linear computations inside the network, it is very difficult for users to understand which input features drive the final predictions. In this study, we propose a two-step pipeline that uses two sets of linear models to estimate the importance of the features in an input dataset X for the class prediction specified in Y. More specifically, the first model, a linear regression, derives the importance of the features in X in explaining the Z-code extracted from any hidden layer of a trained neural network. The second model, a linear classifier, captures the importance of the Z-code in predicting the target class Y. We then combine the X-to-Z importance with the Z-to-Y importance to approximate the non-linear importance from X to Y. Our experiments also show that the method is sound and stable in selecting the truly important features from input datasets, regardless of how the neural network is constructed, for example with different activation functions or different numbers of hidden layers.
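To make the two-step idea concrete, here is a minimal sketch in NumPy built on several assumptions that the abstract does not settle. (1) The X-to-Z and Z-to-Y importances are combined by composing the two linear maps, i.e. the matrix product W1 @ W2. This is one natural reading, not necessarily the authors' rule. (2) A least-squares classifier on one-hot targets stands in for the linear classification model; logistic regression would be an equally valid choice. (3) The Z-code comes from a random, untrained tanh layer. This is a hypothetical stand-in for a trained network's hidden layer, used only to keep the example self-contained. The function names `two_step_importance` and `_fit_linear` are illustrative.

```python
import numpy as np


def _fit_linear(inputs: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Least-squares fit of targets on inputs plus an intercept; returns slopes only."""
    design = np.column_stack([inputs, np.ones(len(inputs))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[:-1]  # drop the intercept row


def two_step_importance(X: np.ndarray, Z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Approximate X -> Y feature importance through a hidden-layer code Z.

    Step 1: linear regression X -> Z gives W1, shape (n_features, n_units).
    Step 2: linear classifier Z -> Y gives W2, shape (n_units, n_classes).
    Combination (assumed here): compose the two linear maps as W1 @ W2.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # put features on comparable scales
    classes = np.unique(y)
    onehot = (y[:, None] == classes[None, :]).astype(float)
    W1 = _fit_linear(Xs, Z)       # importance of each input feature for each Z unit
    W2 = _fit_linear(Z, onehot)   # least-squares classifier on one-hot class targets
    return W1 @ W2                # shape (n_features, n_classes)


rng = np.random.default_rng(0)
n, d, h = 2000, 6, 16
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only features 0 and 1 determine the label

# Hypothetical stand-in for a trained network's hidden layer (random, untrained).
A = rng.normal(size=(d, h)) / np.sqrt(d)
Z = np.tanh(X @ A + 0.1)

importance = np.abs(two_step_importance(X, Z, y)).sum(axis=1)  # aggregate over classes
ranking = np.argsort(importance)[::-1]
print("features ranked by importance:", ranking)
```

Because the label depends only on features 0 and 1, those two should rank highest. One reason for composing the maps this way is that, with intercepts in both fits, W1 @ W2 equals the slopes obtained by regressing the Z-based class scores directly on the standardized X. The composed importance therefore reads as a linear summary of how the network's hidden representation relates the inputs to each class.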