Inception, a Palo Alto-based AI company founded by Stanford professor Stefano Ermon, claims to have developed a novel diffusion-based large language model (DLM) that significantly outperforms traditional LLMs in speed and efficiency. "Inception's model offers the capabilities of traditional LLMs, including code generation and question-answering, but with significantly faster performance and reduced computing costs, according to the company," reports TechCrunch. From the report: Ermon hypothesized generating and modifying large blocks of text in parallel was possible with diffusion models. After years of trying, Ermon and a student of his achieved a major breakthrough, which they detailed in a research paper published last year. Recognizing the advancement's potential, Ermon founded Inception last summer, tapping two former students, UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov, to co-lead the company. [...] "What we found is that our models can leverage the GPUs much more efficiently," Ermon said, referring to the computer chips commonly used to run models in production. "I think this is a big deal. This is going to change the way people build language models." Inception offers an API as well as on-premises and edge device deployment options, support for model fine-tuning, and a suite of out-of-the-box DLMs for various use cases. The company claims its DLMs can run up to 10x faster than traditional LLMs while costing 10x less. "Our 'small' coding model is as good as [OpenAI's] GPT-4o mini while more than 10 times as fast," a company spokesperson told TechCrunch. "Our 'mini' model outperforms small open-source models like [Meta's] Llama 3.1 8B and achieves more than 1,000 tokens per second."Read more of this story at Slashdot.