Strong identifiability and parameter learning in regression with heterogeneous response
Published on arXiv, 2023
Mixture-of-regressions models are useful for regression learning when the response variable of interest is uncertain and heterogeneous. In addition to providing a rich predictive model for the response given covariates, the model parameters carry meaningful information about the heterogeneity of the data population, which is represented by the conditional distributions of the response given the covariates associated with a number of distinct but latent subpopulations. In this paper, we investigate conditions for strong identifiability, rates of convergence of the MLE for the conditional density and the model parameters, and the Bayesian posterior contraction behavior arising in finite mixture of regression models, under exact-fitted and over-fitted settings and when the number of components is unknown. This theory applies to common choices of link functions and families of conditional distributions employed by practitioners. We provide simulation studies and data illustrations that shed light on the parameter learning behavior observed in several popular regression mixture models reported in the literature.
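To make the model class concrete, here is a minimal sketch of a Gaussian mixture of linear regressions fitted by EM: each latent subpopulation k has its own regression coefficients and noise level, and the responsibilities recover which component likely generated each observation. This is an illustrative toy implementation under standard assumptions (Gaussian noise, identity link), not the estimator or theory studied in the paper; the function name `fit_mixture_of_regressions` and all settings are hypothetical.

```python
# Toy EM for y | x, z=k ~ N(x @ beta_k, sigma_k^2), z ~ Categorical(pi).
# Illustrative only; not the paper's methodology.
import numpy as np

def fit_mixture_of_regressions(X, y, K=2, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    resp = rng.dirichlet(np.ones(K), size=n)        # random initial responsibilities, shape (n, K)
    beta = np.zeros((K, d))
    sigma2 = np.ones(K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # M-step: weighted least squares and weighted residual variance per component
        for k in range(K):
            w = resp[:, k]
            Xw = X * w[:, None]
            beta[k] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(d), Xw.T @ y)
            resid = y - X @ beta[k]
            sigma2[k] = (w @ resid**2) / max(w.sum(), 1e-12)
            pi[k] = w.mean()
        # E-step: posterior responsibilities under the current parameters (log-space for stability)
        log_lik = np.zeros((n, K))
        for k in range(K):
            resid = y - X @ beta[k]
            log_lik[:, k] = (np.log(pi[k])
                             - 0.5 * np.log(2 * np.pi * sigma2[k])
                             - 0.5 * resid**2 / sigma2[k])
        log_lik -= log_lik.max(axis=1, keepdims=True)
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, beta, sigma2

# Example: two latent subpopulations with different intercepts and slopes
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.integers(0, 2, size=n)
true_beta = np.array([[1.0, 2.0], [-1.0, -0.5]])
y = np.einsum('ij,ij->i', X, true_beta[z]) + 0.3 * rng.normal(size=n)
pi, beta, sigma2 = fit_mixture_of_regressions(X, y, K=2)
print("mixing weights:", np.round(pi, 2))
print("component coefficients:\n", np.round(beta, 2))
```

In this exact-fitted setting (the fitted number of components matches the truth), the recovered coefficients should approach the true ones; the paper's questions concern how fast such parameter estimates converge, including in over-fitted settings and when the number of components is unknown.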