Developers coordinate code across README files, issue threads, and pull request discussions. Much of that exchange happens in English, and a large share happens in other languages. GitHub has released a dataset built to help researchers and developers locate public repositories that carry non-English natural-language content. The GitHub Multilingual Repositories Dataset is available on GitHub under the CC0-1.0 license. The release follows a commitment GitHub made in 2025 as part of Microsoft’s European Digital Commitments … The post GitHub releases an open dataset for multilingual developer content appeared first on Help Net Security.