by Michael Zietz, Kathleen LaRow Brown, Undina Gisladottir, Nicholas P. Tatonetti

Complex diseases are a major challenge, and genetics underlies a large fraction of the risk for these diseases. Observational data are valuable for this research because of their large scale, cost-effectiveness, coverage of many different conditions, and future scalability. However, they also reflect non-genetic factors such as healthcare processes, access to care, and broader societal effects like systemic biases. Here, we introduce MaxGCP, a phenotyping method designed to purify the genetic signal in observational data. MaxGCP optimizes a phenotype definition to maximize its coheritability with the complex trait of interest, where coheritability is the genetic covariance between two traits normalized by the product of their phenotypic standard deviations. Unlike previous phenotype-combination methods, MaxGCP is phenotype-specific, has linear computational complexity in the number of features, and does not require manual feature selection. In an analysis of stroke, we found that MaxGCP boosts study power by more than 13 percent compared to conventional, single-code phenotype definitions. MaxGCP is a powerful tool for genetic discovery, and we anticipate that it will be broadly useful for studying complex diseases using observational data.
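
To make the coheritability objective concrete, the following is a minimal Python sketch, not the MaxGCP implementation. It assumes the phenotype definition is a linear combination of features, and that the phenotypic covariance matrix `P` of the features and the vector `g_target` of genetic covariances between each feature and the target trait have already been estimated. All function names, variable names, and numeric values are hypothetical and chosen only for illustration. Under these assumptions, the weights that maximize coheritability are proportional to the solution of `P w = g_target`, a standard result for ratios of this form.

```python
import numpy as np


def coheritability(genetic_cov, pheno_var_a, pheno_var_b):
    """Genetic covariance divided by the product of phenotypic standard deviations."""
    return genetic_cov / np.sqrt(pheno_var_a * pheno_var_b)


def combined_coheritability(weights, g_target, P, target_var):
    """Coheritability between a weighted feature combination and the target trait.

    weights    : feature weights defining the combined phenotype (hypothetical)
    g_target   : genetic covariance of each feature with the target trait
    P          : phenotypic covariance matrix of the features
    target_var : phenotypic variance of the target trait
    """
    w = np.asarray(weights, dtype=float)
    genetic_cov = w @ g_target   # genetic covariance of the combination with the target
    pheno_var = w @ P @ w        # phenotypic variance of the combination
    return coheritability(genetic_cov, pheno_var, target_var)


def max_coheritability_weights(g_target, P):
    """Weights maximizing the ratio above, assuming P is positive definite.

    Any positive rescaling of these weights gives the same coheritability.
    """
    return np.linalg.solve(P, g_target)


# Toy inputs (illustrative values only, not taken from any study).
P = np.array([[1.0, 0.3],
              [0.3, 1.0]])
g_target = np.array([0.10, 0.05])
target_var = 1.0

single_feature = combined_coheritability([1.0, 0.0], g_target, P, target_var)
w_opt = max_coheritability_weights(g_target, P)
optimized = combined_coheritability(w_opt, g_target, P, target_var)
print(f"single feature: {single_feature:.4f}, optimized: {optimized:.4f}")
```

In this toy setting the optimized combination can never have lower coheritability than any single feature, because a single feature is itself one of the candidate weight vectors.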