Text-only LLMs may already know enough about sound to predict downstream audio model performance before an encoder is ever attached.