Windows old-timers may remember PC gaming in the ’80s and ’90s. Games wouldn’t load in MS-DOS without something as rudimentary as the right sound card.

It’s no different with AI on Windows PCs today. Models don’t load without the right software tools, drivers, neural networks or relevant PC hardware.

But Microsoft is on the brink of solving this AI problem, much like it solved a gaming problem to transform Windows 95 into a PC gaming powerhouse.

The DirectX technology introduced in Windows 95 was a breakthrough. Games just worked, regardless of the hardware. Developers were sold on Windows 95’s ease of use, and DirectX revolutionized game development. Now PC gaming is bigger than console gaming.

Similarly, Microsoft hopes it has a breakthrough on its hands to run AI on PCs with Windows ML 2.0, a core AI runtime that will make AI models just run, regardless of hardware. The technology was announced at Microsoft’s Build developer show last week.

Windows ML 2.0 — which is based on the ONNX runtime — is a wrapper that allows developers to bring their own AI models and create AI apps for PCs. The runtime can compile reasonably sized models in a matter of minutes.

Developers can create apps and not worry about what’s under the hood. It’s like a version of Microsoft’s plug-and-play technology, in which hardware — old or new — just works.

Previous Mistakes and Problems

Microsoft relied on DirectML — a descendant of DirectX mostly for GPU acceleration — to do the job, but it wasn’t fast enough.

The company discovered weaknesses in DirectML when developing a feature called “Click to Do,” which can identify text and images on screen and take action on them.

“We’ve come to the realization that we need something faster,” said Ryan Demopoulos, principal product manager at Microsoft, during a Build session.

Windows ML makes sure AI apps automatically run on CPUs and neural processing units (NPUs) in addition to GPUs.

A Closer Look

Microsoft has clearly learned from past problems of offering multiple versions of Windows for x86 and Arm.

The AI chip ecosystem is even more diverse, with Windows 11 PCs supporting AI chips, CPUs and GPUs from Intel, AMD and NVIDIA.

That’s where Windows ML 2.0 steps in. The Windows ML runtime handles the heavy lifting of identifying the hardware and automating hardware support for AI models, while also extracting the maximum performance from AI chips.

Windows ML 2.0 also figures out dependencies, procurement and update management, and includes them in installers. That is typically a lot of manual labor in fragmented environments.

An experimental version of Windows ML 2.0 is now available for developers to try out.

“It’s not yet ready or meant for production apps. Please don’t use it in your production apps,” Demopoulos said.

The ‘Holy Grail’

Microsoft reached out to Reincubate, which develops the popular Camo webcam app, to take an early look at Windows ML.

Windows ML 2.0 meets Reincubate’s vision and desire for AI models to just work on silicon without the need to consistently quantize, tune, test and deploy for a bunch of frameworks, Aidan Fitzpatrick, CEO of Reincubate, told The New Stack.

“The holy grail is being able to take a single high-precision model and have it JIT — or ‘just work’ — seamlessly across Windows silicon with different drivers, different capabilities and different precision,” Fitzpatrick said.

Having Windows manage versioning and retrieval of frameworks and models dynamically makes sense, Fitzpatrick explained.
“It’ll make our lives easier. What’s wrong with smaller, faster downloads, installs and updates? Users appreciate that,” Fitzpatrick added.

An emerging Camo feature is a real-time, adjustable retoucher, so users can tweak appearances in meetings, streams and recordings. With Windows ML and models including feature and landmark detection, “we can make it work,” Fitzpatrick said.

“Windows ML has a lot of existing, robust components behind it such as ORT (ONNX runtime), and that’s made it a lot more straightforward than it otherwise might have been — in adopting it, we’ve not had to blow things up or start over,” Fitzpatrick said.

“Windows ML should be a powerful tool in … helping us to move at the speed of silicon innovation,” Fitzpatrick said.

How It Works

Developers can bring in their own models via Windows ML public APIs. The runtime identifies the hardware and manages and updates the dependencies.

AI models talk to silicon through an “execution provider,” which brings in the necessary hardware support. That layer identifies the hardware and scales AI performance accordingly.

Developers don’t have to create multiple copies of executables for different hardware configurations. Microsoft services updates to the runtime, so developers can focus on developing models.

“Once your app is installed and initializes Windows ML, then we will scan the current hardware and download any execution providers applicable for this device,” said Xiaoxi Han, senior software engineer at Microsoft.

Digging Deeper

Han demonstrated how developers could get their AI models running on Windows 11 PCs. The demonstration used a VS Code extension toolkit to convert a preselected ResNet model, which was then used to evaluate an image of a puppy.

She launched a new “conversion” tool to convert the preselected ResNet model to the open source ONNX format, optimize it and quantize it. Models downloaded from Hugging Face could also be converted to ONNX format.

“If you have a PyTorch model, if you’ve trained one or if you’ve obtained one, you can convert it into ONNX format and run it with Windows ML to run your on-device workloads,” Demopoulos said.

The conversion feature optimizes AI models to run locally on NPUs from Intel, Qualcomm and AMD. Over time, Microsoft plans to do away with this step as conversion comes to support all chips.

Clicking “Run” in the interface converted the Hugging Face model to ONNX format. A small ResNet model took about 30 seconds to convert.

Han next created the app in Visual Studio by starting a console project. The .NET version and the target Windows OS version were set in the project properties. Then a NuGet package called Microsoft.AI.Windows.MachineLearning was installed to bring in the Windows ML runtime, which also includes the ONNX runtime bits and the ML layer APIs for execution providers.

NuGet automatically sets up the dependencies between the app code and the Windows ML runtime. Other NuGet packages may be required for large language models.

Han created an entry point for the console app by bringing up the namespace and creating a program class. She initialized an ONNX runtime environment, and then Windows ML.

Han also created an infrastructure object, which needs to stay alive for the life of the app process. That object scans the hardware and downloads the relevant “execution provider” packages. One execution provider example is QNN, which helps AI models take advantage of NPUs in laptops with Qualcomm’s Snapdragon chips.

Then came standard AI code writing, which includes setting up the file paths to the model, the label file and the image file.

An ONNX session for inferencing was set up and configured, which included loading the image and setting policies based on the type of chip or on power consumption.

Running inference fed the processed image tensor into the ONNX session, which analyzed the image and returned raw prediction scores.

The output was then converted to probabilities and translated into a human-readable format, showing high confidence that the image was indeed of a golden retriever.

Coders can specify how much performance to extract from the hardware. A line with “MAX_PERFORMANCE” asks for top-line performance, while “PREFER_CPU,” “PREFER_NPU” or “PREFER_GPU” can steer AI models that run consistently in the background onto a particular processor. Another instruction can set AI models to run at minimal speed to save battery life.

“In the not-too-distant future, we also want to add … ‘workload splitting.’ You can have a single AI workload that is split across multiple different types of processors to get even greater performance,” Demopoulos said.

The full codebase from the demonstration is available on GitHub.
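Pieced together, the walkthrough maps to a short console program along the following lines. This is a sketch, not the demo’s actual code: it assumes the experimental Microsoft.AI.Windows.MachineLearning package (which bundles the standard ONNX Runtime C# API), and the infrastructure method names and the execution-provider policy call are inferred from the session’s description, so they may not match the shipping surface.

```csharp
// Condensed sketch of the demo flow described above. The Infrastructure
// method names and SetEpSelectionPolicy call are assumptions based on the
// session's description of the experimental Windows ML package.
using System;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using Microsoft.Windows.AI.MachineLearning;   // experimental Windows ML namespace

// Initialize the ONNX Runtime environment, then Windows ML.
var ortEnv = OrtEnv.Instance();

// The infrastructure object scans the hardware and downloads any applicable
// execution providers (e.g., QNN on Snapdragon laptops). It must stay alive
// for the life of the process. (Assumed method names.)
var infrastructure = new Infrastructure();
await infrastructure.DownloadPackagesAsync();
await infrastructure.RegisterExecutionProviderLibrariesAsync();

// Create an inference session for the converted ONNX model, with a
// device-selection policy (MAX_PERFORMANCE, PREFER_NPU, PREFER_CPU, ...).
var options = new SessionOptions();
options.SetEpSelectionPolicy(ExecutionProviderDevicePolicy.MAX_PERFORMANCE); // assumed API name
using var session = new InferenceSession(@"models\resnet.onnx", options);

// Stand-in for real preprocessing: a normalized 1x3x224x224 image tensor.
// A real app would load and normalize the puppy image here.
var imageData = new float[1 * 3 * 224 * 224];
var input = new DenseTensor<float>(imageData, new[] { 1, 3, 224, 224 });
string inputName = session.InputMetadata.Keys.First(); // input name varies by model

// Run inference and read the raw prediction scores.
using var results = session.Run(new[] { NamedOnnxValue.CreateFromTensor(inputName, input) });
float[] logits = results.First().AsEnumerable<float>().ToArray();

// Softmax: convert raw scores to probabilities, then report the top class.
// A real app would map the index through the label file to a name
// such as "golden retriever".
float max = logits.Max();
float[] exps = logits.Select(v => MathF.Exp(v - max)).ToArray();
float sum = exps.Sum();
int best = Array.IndexOf(exps, exps.Max());
Console.WriteLine($"Top class index {best} with probability {exps[best] / sum:P1}");
```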
What APIs?

The main Windows ML layer includes “initialization” APIs — Microsoft.Windows.AI.MachineLearning — which keep the runtime up to date and download the necessary elements for the model to talk to the hardware.

The main ML layer also includes generative AI APIs designed to help with generative AI loops for LLMs, such as Microsoft.ML.OnnxRuntimeGenAI.WinML. A runtime API layer gives developers fine-grained control over execution of the AI model.

The layers are exposed in WinRT, but Microsoft is also providing flat C wrappers with managed projections as a convenience, so developers don’t need to learn WinRT.
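To give a sense of what those generative AI APIs are for, a loop over an on-device LLM looks roughly like the ONNX Runtime GenAI C# samples. Treat the following as a hedged sketch only: exact type and method names vary across versions of the OnnxRuntimeGenAI packages, and the model folder and prompt format are hypothetical.

```csharp
// Rough sketch of a generative AI loop, loosely following the ONNX Runtime
// GenAI C# samples. Type and method names may differ in the
// Microsoft.ML.OnnxRuntimeGenAI.WinML package; the model folder is hypothetical.
using System;
using Microsoft.ML.OnnxRuntimeGenAI;

using var model = new Model(@"models\phi-3-mini");   // hypothetical local model folder
using var tokenizer = new Tokenizer(model);

// Prompt format depends on the model; this one is illustrative.
var sequences = tokenizer.Encode("<|user|>What is Windows ML?<|end|><|assistant|>");

using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 256);
generatorParams.SetInputSequences(sequences);

// Token-by-token generation loop.
using var generator = new Generator(model, generatorParams);
while (!generator.IsDone())
{
    generator.ComputeLogits();
    generator.GenerateNextToken();
}
Console.WriteLine(tokenizer.Decode(generator.GetSequence(0)));
```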