Whereas training sets were once scraped freely from the web or collected from low-paid annotators, companies are now turning to proprietary training data as a competitive advantage.