Learning Understandable Neural Networks With Nonnegative Weight Constraints

People can understand complex structures if they relate to more isolated yet understandable concepts. Despite this fact, popular pattern recognition tools, such as decision tree or production rule learners, produce only flat models which do not build intermediate data representations. On the other hand, neural networks typically learn hierarchical but opaque models.

We show how constraining neurons’ weights to be nonnegative improves the interpretability of a network’s operation. We analyze the proposed method on large data sets: the MNIST digit recognition data and the Reuters text categorization data. The patterns learned by traditional and constrained network are contrasted to those learned with principal component analysis and nonnegative matrix factorization.