Large-scale molecular endotype discovery in synovial fluid reveals osteoarthritis as a single biological continuum

INTRODUCTION

Osteoarthritis (OA) of the knee is common, affecting up to a third of adults aged 60 years or older [1]. Characterised by failure of the synovial joint, OA is a major contributor to healthcare costs and is a leading cause of disability, largely through chronic pain and limitations in function. Age and obesity are important risk factors, both of which have contributed to increasing disease burden across global populations [2,3,4]. There are currently no approved treatments for knee OA that effectively target structural disease and those that target symptomatic disease have modest efficacy and are associated with adverse events [5,6]. There remains, therefore, a major unmet clinical need.

Limited understanding of disease pathogenesis coupled with a failure to translate findings from basic research to clinical settings has hampered clinical translation in OA [7,8]. Another significant challenge is the broad clinical spectrum of disease that has led many to question whether OA is one disease, or whether it is driven by multiple different pathways that converge on a common joint pathology [9,10]. Multiple clinical phenotypes have been suggested in the literature [11,12,13], but these have not been validated as clinically useful stratification tools either when testing treatment responses or as predictors of disease progression [14,15,16]. Endotypes, defined by distinct molecular signatures, may have higher value, and could in part explain observable characteristics of a phenotype [17]. This is an important hypothesis that has never been formally assessed.

Recent advances in understanding complex disease have been greatly enhanced by the application of multi-omic approaches to disease-relevant tissues [11,18]. The strengths of these approaches are the focus on human disease cohorts at scale, the unbiased and systematic nature of molecular identification, the ability to map molecules to shared pathways, and the ability to replicate results across independent cohorts.
Technological advances in genomics, transcriptomics, and proteomics have enabled such studies to be carried out with low tissue volumes and at an affordable cost.

To date, the majority of studies that have attempted to identify molecular subgroups in OA have used blood samples (serum or plasma) [19,20,21]. The synovial fluid (SF), in contrast, offers a promising alternative discovery biofluid, as it is close to the diseased joint tissues and is enriched with locally derived biomolecules. Thus, SF is likely to represent more accurately the disease in a given joint. We have also previously shown that proteins in knee OA or after knee injury are readily detected in the SF but correlate poorly in paired blood [22,23,24,25]. Furthermore, we have confirmed the utility of large-scale protein measurements in SF using the SomaScan™ platform, an aptamer-based assay [26,27]. The SomaScan platform v4.1 measures 6596 distinct human proteins.

The Synovial fluid To Detect Endotypes by Unbiased Proteomics in OA (STEpUP OA) Consortium was established to test the primary hypothesis that there are detectable, distinct molecular endotypes in knee OA. We set out to perform an unsupervised analysis of a single SF sample from 1361 individuals with established OA, where cross-sectional clinical data were also available. The standardised protocol, which describes the cohorts in detail and includes how we adjusted for pre-defined technical and other confounding factors, is available elsewhere [27]. Here we present the primary analysis of STEpUP OA, in which we determine whether protein molecular endotypes exist in the SF of participants with established knee OA, and further explore the relationship between proteomic signatures and structural and symptomatic disease.

RESULTS

Endotype detection in OA SF

To search for molecular endotypes in OA using SF protein profiles, the f(K) cluster metric was employed.
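As a rough illustration of how an f(K)-style metric can flag cluster structure, the sketch below assumes that f(K) follows the widely used formulation of Pham, Dimov and Nguyen (2005), in which the k-means distortion for K clusters is compared with a weighted distortion for K − 1 clusters. This is an assumption rather than a description of the STEpUP OA pipeline. The function names, the distortion values and the number of dimensions are hypothetical; only the 0.85 threshold is taken from the Fig. 1 legend.

```python
def f_k(distortions, n_dims):
    """Return [f(1), ..., f(K_max)] from k-means distortions.

    distortions[k - 1] is S_K, the within-cluster sum of squared distances
    for a K-cluster solution; n_dims is the number of features (N_d > 1).
    Assumes the Pham, Dimov and Nguyen (2005) definition of f(K).
    """
    if n_dims < 2:
        raise ValueError("the weight alpha_K is only defined for n_dims > 1")
    scores = []
    alpha = None
    for k, s_k in enumerate(distortions, start=1):
        if k == 1:
            scores.append(1.0)  # f(1) is defined as 1
            continue
        # Weight factor: alpha_2 = 1 - 3 / (4 * N_d); for K > 2,
        # alpha_K = alpha_{K-1} + (1 - alpha_{K-1}) / 6
        alpha = 1 - 3 / (4 * n_dims) if k == 2 else alpha + (1 - alpha) / 6
        s_prev = distortions[k - 2]
        # f(K) = S_K / (alpha_K * S_{K-1}), or 1 when S_{K-1} is zero
        scores.append(s_k / (alpha * s_prev) if s_prev != 0 else 1.0)
    return scores


def clustered_ks(scores, threshold=0.85):
    """Return the values of K whose f(K) falls below the threshold."""
    return [k for k, value in enumerate(scores, start=1) if value < threshold]


# Made-up distortions for K = 1, 2, 3 in a 4-dimensional feature space.
scores = f_k([100.0, 30.0, 27.0], n_dims=4)
supported = clustered_ks(scores)
```

In practice the distortions would come from repeated k-means fits, for example the `inertia_` attribute of scikit-learn's `KMeans`. Under this formulation, values of f(K) close to 1 indicate no more structure than expected for unclustered data, whereas values below the threshold point to K clusters.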
We had previously reported that a large contributor to variance in the initial processed data (principal component 1, accounting for 48% of variance) was due to intracellular proteins [27]. Appreciating that the intracellular protein signature could obscure subtle clustering patterns within the data, we performed cluster analyses with and without regression adjustment for intracellular protein [27], using an intracellular protein score (IPS) that correlated highly with principal component 1 (r = 0.94) [27]. Cluster analysis revealed 2 clusters that were evident within the Discovery, Replication and Combined datasets for the non-IPS regressed analysis (Fig. 1A, left panel). In contrast, no clusters were detected in the IPS-regressed dataset (Fig. 1A, right panel). Visualisation of the proteomic data structure in two-dimensional space showed that the two clusters were indistinct and could be defined by dichotomising the continuous IPS, a feature that was lost after IPS regression (Fig. 1B).

Fig. 1: Endotype discovery by cluster analysis in Discovery, Replication and Combined datasets.

A f(K) metric for non-IPS and IPS regressed analyses. Significant clustering was observed (f(K) < 0.85) across all three datasets (green = Discovery, pink = Replication, blue = Combined dataset) for non-IPS-regressed analyses only (left panel). B Visualisation of data structure and IPS on UMAP by dataset, stratified by non-IPS (top panel) and IPS regressed (bottom panel) analyses. f(K) metric plots for Combined dataset stratified by C biological sex (green = female, pink = male), D advanced radiographic status (KL grades: 0-2 as ‘Non-advanced OA’ (green) and ≥3 as ‘Advanced OA’ (pink)) or E blood staining (visual blood staining: 1 as ‘No blood staining’ (green) and ≥2 as ‘With blood staining’ (pink)) for non-IPS and IPS regressed analyses.
OA osteoarthritis, IPS intracellular protein score, UMAP Uniform Manifold Approximation and Projection, KL Kellgren Lawrence.

Association testing of IPS with pre-defined clinical and technical features (N = 1134, spun OA samples only) demonstrated that IPS was significantly, but modestly, greater in females, greater in advanced radiographic disease, and greater in SF samples with visual blood staining scores ≥2 (Table 1). We therefore repeated the cluster analysis, using IPS and non-IPS regressed datasets, but stratified by biological sex (Fig. 1C), radiographic disease severity (Fig. 1D), and presence of blood staining (Fig. 1E). As with our non-stratified analyses, clusters (again indistinct) were only identified in non-IPS regressed data. Collectively, these data suggest that there are two potential endotypes in the non-IPS regressed data, but they are on a continuum, defined by the IPS, and are not distinct. Furthermore, the cluster structure is independent of the stage of disease, biological sex, and visible blood staining.

Table 1 Baseline characteristics of participants, their SF samples and association of these factors with IPS

Synovial fluid protein associations with radiographic OA

We next examined which SF proteins were associated with radiographic disease severity. Over 1000 proteins were significantly associated with advanced radiographic disease severity (advanced (KL 3-4) vs. non-advanced (KL 0-2)) in each of the Discovery (N = 1021, 96.0% upregulated) and Replication datasets (N = 2524, 98.6% upregulated), with 688 (24.1%) proteins replicating across both datasets. Figure 2A shows the combined dataset where 3815 proteins were associated with radiographic disease severity. Top associated proteins that replicated (across Discovery and Replication cohorts) and that remained significant in the Combined dataset after cohort adjustment are labelled in orange.
Protein abundance profiles for a selection of the labelled proteins were also significantly associated with ordinal KL grade, either significantly decreasing with worsening radiographic disease severity (LYVE1, IGFBP-6, FGFP1, sFRP-3) or increasing (TSG-6, sTREM-1, Activin A, RSPO2) (Fig. 2B). Two additional proteins previously linked to OA, MMP-13 [28] and COL2 [29], followed this latter pattern. Using the Hallmark gene set repository, nine differentially expressed pathways were significantly enriched across at least one of the three datasets (Fig. 2C). Of these, “Epithelial Mesenchymal Transition (EMT)”, “Complement” and “Angiogenesis” were significantly associated with advanced radiographic OA across all datasets. These remained significantly enriched in the Combined dataset after adjustment for haemoglobin A, a surrogate marker for blood in the SF [27]. Protein-protein interactions within each of the enriched pathways are shown in Fig. 2D-F. “EMT” contained a number of molecules previously associated with matrix remodelling in OA [30] including, but not limited to, TIMP1, TIMP3, MMP-2, TGFβ1 and VEGFA. The correlation between protein associations within the Discovery and Replication datasets was r = 0.49 (p