In this paper, we study endogeneity problems in algorithmic decision-making where data and actions are interdependent. When there are endogenous covariates in a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the covariates spills over to the actions. We propose a class of algorithms to correct for the bias by incorporating instrumental variables into leading online learning algorithms. These algorithms also attain regret levels that match the best known lower bound for the cases without endogeneity. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.