B-vac a robust software package for bacterial vaccine design

Article | Open access | Published: 28 August 2025

Amjad Ali, Muhammad Hurrarah Bin Hamid, Samavi Nasir, Zaara Ishaq, …, Farha Anwer

Scientific Reports volume 15, Article number: 31745 (2025)

Subjects: Functional clustering, High-throughput screening, Infection, Infectious diseases, Protein analysis, Protein function predictions, Proteome informatics, Software, Vaccines, Virtual drug screening

Abstract

Reverse Vaccinology (RV) has revolutionized vaccine discovery, utilizing bioinformatics to surpass traditional methods in identifying genes and proteins. By analyzing pathogen genomic data, RV pinpoints proteins with key traits such as immunogenicity, surface localization, and conservation across strains. Despite its advantages, current RV tools face challenges in prediction accuracy, computational demands, and accessibility. To address these challenges, we introduce B-vac, an executable pipeline designed to streamline bacterial vaccine design. B-vac features a user-friendly interface and robust algorithms for high-throughput proteomics data analysis, covering modules such as Localization, Non-host Homolog, Virulence Factor, and Epitope Mapping. It operates offline, enhancing accessibility for researchers with limited computational resources. B-vac ships with epitope libraries, bacterial proteomes, and a virulence factor database, which allow the program to process protein sequences locally and return results to users, who can set cut-off and filter values through adjustable variables and toggles. The B-vac pipeline uses a string-based matching approach to match proteomes supplied by users against the pipeline’s curated database. This approach aligns and compares pathogen protein sequences by string similarity and enables researchers to easily identify motifs important for immunogenic function.
Evaluation of the pipeline using the Helicobacter pylori proteome demonstrated B-vac’s effectiveness in identifying vaccine candidates. B-vac offers a user-friendly, standalone solution for bacterial vaccine development, eliminating the need for external libraries and enabling offline use, thereby addressing key gaps in convenience and accessibility compared to existing RV tools. B-vac can be downloaded from: https://mgbio.tech/tools/.

Introduction

Bacterial infections and antibiotic resistance are now among the biggest global health challenges of the 21st century. The Centers for Disease Control and Prevention (CDC) reports that over two million people in the United States are affected by antibiotic-resistant infections annually, resulting in approximately 23,000 deaths. This alarming trend is compounded by the overuse and misuse of antibiotics, which renders them ineffective and fuels multidrug resistance among bacterial pathogens [1,2]. Bacteria have evolved various mechanisms to resist antibiotics, such as genetic mutations, acquisition of resistance genes, and alterations in gene expression [3,4]. These mechanisms continuously evolve, posing critical challenges to existing treatment strategies [5]. Antimicrobial Resistance (AMR) has been identified as a high-priority public health concern by the World Health Organization because of its impacts on human health and the economy, such as longer hospital stays and higher healthcare costs. Combating AMR requires cooperation across borders to rationalise antibiotic consumption, create new approaches to fighting infections, and promote equal access to potent medications [6,7].

Vaccines are emerging as promising alternatives to antibiotics in the fight against bacterial infections. They reduce the need for antibiotics by preventing infections, and consequently slow down the development of antibiotic resistance [8,9].
Vaccines targeting bacterial pathogens are particularly vital in regions with limited healthcare resources, as they are designed to be affordable, stable without refrigeration, and administrable orally or intranasally. These features make them suitable for widespread global use [10]. Moreover, vaccines can prevent infections caused by multidrug-resistant (MDR) bacteria, which are hard to treat with existing antibiotics [11,12]. While vaccines for extracellular bacteria like tetanus and diphtheria have been successful, developing vaccines against intracellular bacteria remains a complex task requiring advanced technologies [9]. Innovative vaccine technologies, including reverse vaccinology and novel adjuvants, are being explored to enhance vaccine efficacy against multidrug-resistant bacteria [8].

Reverse vaccinology (RV) is a revolutionary approach to vaccine development that uses a pathogen’s genomic information to identify potential vaccine candidates (PVCs) more quickly and precisely than traditional vaccinology methods. The approach, first introduced in the post-genomic era, starts by sequencing the pathogen’s genome, which allows researchers to analyze its whole antigenic repertoire. Unlike conventional methods, which often required cultivation of the pathogen in vitro, RV relies on in silico methods for the analysis of the pathogen’s genomic data. These tools look for genes that code for proteins with favorable characteristics for a vaccine, including immunogenicity, surface exposure, and conservation among different pathogens. This approach greatly accelerated the identification of vaccine targets and reduced its cost, making the journey from identifying a pathogen to developing a vaccine much faster [13,14].

Traditionally, vaccine development was based on principles pioneered by Louis Pasteur, who introduced key techniques such as isolating, inactivating, and injecting pathogens to induce protective immunity.
This approach resulted in the production of vaccines for diseases such as rabies, typhoid, diphtheria, and tetanus, among others, using attenuated pathogens or components of microbes that can trigger an immune response [15,16]. As time went on, advances in molecular biology and biotechnology brought new techniques, including genetic engineering, purification of microbial elements, and the use of live vectors to express vaccine proteins [17]. These improvements made vaccine production more accurate and safer; however, their use was limited by the amount of empirical testing still required. The advent of genomic technologies brought about a new era in vaccine development known as reverse vaccinology. This method not only overcame the challenges associated with traditional methods but also allowed the development of vaccines for pathogens that were previously considered intractable [18,19].

The first successful application of reverse vaccinology was in developing a vaccine against serogroup B Neisseria meningitidis (MenB), a significant cause of sepsis and meningitis [20]. The 4CMenB vaccine includes three recombinant antigens (fHbp, NadA, and NHBA) combined with outer membrane vesicles. This multicomponent vaccine has been shown to enhance the immune response across various age groups [21,22,23]. The 4CMenB vaccine underwent extensive clinical trials to evaluate its safety and efficacy. It was approved in Europe in 2013 and included in the UK’s National Immunization Program in 2015, showing an effectiveness of 83% against invasive MenB disease [22,23]. Research continues to refine MenB vaccines, exploring new antigens and formulations to enhance coverage and effectiveness. The use of reverse vaccinology remains a promising strategy for developing vaccines against other pathogens as well [24,25,26].

Since then, several tools have been developed on the principles of reverse vaccinology, each with unique features and methodologies.
NERVE was designed to be user-friendly and integrates multiple algorithms for protein analysis. It ranks vaccine candidates and maintains comprehensive data for further analysis. NERVE is noted for its high recall of known protective antigens, making it efficient in identifying safe and experimentally viable candidates [27]. The authors of NERVE have since published an updated version, NERVE 2.0 (https://nerve-bio.org/home), which we have included in our benchmarking to evaluate its performance against other state-of-the-art tools [28]. Vaxign was the first web-based RV tool, and Vaxign2 enhances it with machine learning capabilities. Vaxign and Vaxign2 (https://violinet.org/vaxign2) offer a comprehensive framework for vaccine design, including predictive and post-prediction analysis components [29]. VaxiJen (https://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html), known for its application in predicting vaccine candidates for various pathogens, is also widely used in RV [30]. It has been particularly applied to SARS-CoV-2, although experimental validation of its predictions is limited [31]. VacSol (https://sourceforge.net/projects/vacsol/) automates the prediction of vaccine candidates using a high-throughput approach. It efficiently screens bacterial proteomes and reduces false positives, making it a cost-effective tool for vaccine candidate identification [32]. Jenner-Predict focuses on host-pathogen interactions and pathogenesis, using functional domains to predict vaccine candidates. It has demonstrated better prediction accuracy than other tools, particularly in identifying non-cytosolic proteins involved in host-pathogen interactions [33]. Despite these strengths, current RV tools also face several technical and scientific limitations. Many RV tools, including VaxiJen and Jenner-Predict, have low prediction accuracy, which limits their application in vaccine development.
Only a small fraction of predicted candidates undergoes experimental validation, which is crucial for confirming their potential as vaccines [31,33,34]. Some tools, such as NERVE, are designed to be user-friendly but still require significant expertise to install, run, and interpret effectively. This complexity can be a barrier to broader adoption [27]. Many tools focus on limited criteria, such as adhesin-likeliness, without considering other functional classes of proteins that may be involved in host-pathogen interactions and pathogenesis [33]. Tools like VacSol aim to reduce computational costs and time, but the efficiency of these processes can still be improved [32]. Moreover, most current RV tools, such as NERVE, Vaxign, and VacSol, integrate various open-source bioinformatics tools and protein analysis algorithms to screen pathogen proteomes for potential vaccine candidates. Despite their utility, these tools often require internet access, local installations, and heavy computational resources, making them less accessible to researchers without advanced computational expertise or infrastructure.

To address these limitations, we developed B-vac, an executable program that integrates a series of internally designed algorithms for protein sequence processing, comparison, and vaccine target analysis. Unlike the existing tools described earlier, B-vac is designed to improve prediction accuracy by employing a streamlined, specialized approach to vaccine target prediction and analysis, reducing reliance on broad, less accurate criteria. It also prioritizes ease of use, requiring no internet connection, command-line execution, or advanced computational expertise. B-vac’s self-contained architecture, built on Python, and its user-friendly interface make it accessible to a broader range of researchers, including those without extensive bioinformatics experience.
By focusing on practical, efficient workflows and eliminating the need for external dependencies, B-vac facilitates the identification of potential vaccine candidates with greater reliability and accessibility.

The features predicted by B-vac include protein subcellular localization, virulence factors, epitope mapping across pathogen genomes, and sequence similarity to the host (human) proteome. Surface-exposed proteins, such as secreted proteins, fimbrial proteins, and outer membrane proteins, are crucial for vaccine development as they are accessible to the immune system. Studies have identified various surface proteins in pathogens like Streptococcus pneumoniae and Leptospira interrogans, which are promising vaccine targets due to their role in virulence and immune response elicitation [35,36,37]. In contrast, non-surface proteins are less suitable as they do not interact directly with host cells. Moreover, vaccine candidates should include virulence factors to elicit strong immune responses. Proteins that contribute to a pathogen’s virulence, such as adhesins, exoenzymes, and toxins, are essential for effective vaccines. These factors ensure a strong immune response, making them ideal candidates for vaccine development [35,36,38]. Additionally, effective vaccine targets should avoid sequence similarity to host proteins: identifying unique antigens that share no homology with host proteins is critical to preventing autoimmunity. For instance, the Cp-P34 protein in Cryptosporidium is unique to the parasite and elicits immune responses, making it a potential vaccine candidate. These considerations are integral to the B-vac pipeline [39]. The overall architecture of the B-vac pipeline is given in Fig. 1.

Fig. 1 Overall architecture of B-vac pipeline.

B-vac implementation

B-vac is written in Python v3.10.8, with its graphical user interface (GUI) developed using the Tkinter v8.6.12 library, a standard Python library for creating simple and user-friendly desktop interfaces. To ensure compatibility and ease of use on Windows and Linux (Ubuntu) platforms, it is packaged using PyInstaller v6.10.0, a tool that bundles Python applications into standalone executables, allowing them to run without a separate Python installation. The pipeline integrates extensive pre-saved datasets critical for reverse vaccinology. These datasets include protein FASTA files for each bacterial strain, specifically containing secreted, outer membrane, and fimbrial proteins, downloaded from LocTree3 (http://www.rostlab.org/services/loctree3) [40] for protein localization filtering; 916 CD4+ epitopes and 1659 CD8+ epitopes across multiple HLA alleles, stored in CSV format and obtained from the IEDB database v3 (accessed on March 13, 2025, https://www.iedb.org/); and 27,502 virulence factors obtained from the Virulence Factors Database (https://www.mgc.ac.cn/VFs/) with their corresponding IDs and protein FASTA sequences (accessed on September 12, 2022) [41,42,43]. Additionally, it includes 67,297 B-cell linear epitopes in FASTA format obtained from IEDB and the human reference proteome downloaded from UniProt (accessed on October 5, 2022, https://www.uniprot.org/) for non-host homolog analysis [41,43].

B-vac is optimized for local execution without internet dependency. Testing was performed on two systems: an Intel i5-8350U CPU (1.70 GHz base / 1.90 GHz max) quad-core processor with 8 GB RAM running Windows 11, and an Intel i5-4570 CPU (3.20 GHz) quad-core processor with 4 GB RAM running Ubuntu 22.04.2 LTS. The pipeline supports batch processing of multiple protein sequences, with processing times averaging 20 min for 100 proteins under default parameters.
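As a rough illustration of how a pre-saved epitope library could be matched locally against user-supplied proteins, the sketch below parses FASTA text and reports every exact substring occurrence of each epitope. It is a minimal sketch under stated assumptions, not B-vac’s actual implementation: the function names `parse_fasta` and `map_epitopes`, the toy sequences, and the choice of exact substring matching are all hypothetical, since the article states only that a string-based matching approach is used.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def map_epitopes(proteins, epitopes):
    """Return {protein_id: [(epitope, 1-based start), ...]} for exact hits.

    Assumption: exact substring matching stands in for whatever
    string-based comparison the real pipeline performs.
    """
    hits = {}
    for header, seq in proteins:
        protein_id = header.split()[0]
        found = []
        for epitope in epitopes:
            start = seq.find(epitope)
            while start != -1:
                found.append((epitope, start + 1))
                # Continue one residue later so overlapping hits are kept.
                start = seq.find(epitope, start + 1)
        if found:
            hits[protein_id] = found
    return hits


# Toy data, invented for illustration only (not from any B-vac dataset).
EXAMPLE_FASTA = """>prot1 hypothetical outer membrane protein
MKKLAVSLLA
GYTSAQAKKL
>prot2 hypothetical cytosolic protein
MSTNPKPQRK
"""
EXAMPLE_EPITOPES = ["KKL", "SAQA", "WWWW"]

result = map_epitopes(parse_fasta(StringIO(EXAMPLE_FASTA)), EXAMPLE_EPITOPES)
```

For the toy input, `result` is `{'prot1': [('KKL', 2), ('KKL', 18), ('SAQA', 14)]}`; `prot2` is omitted because none of the example epitopes occurs in it. A real library lookup would read the epitopes from the CSV or FASTA files mentioned above rather than from an in-memory list.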
B-vac’s architecture utilizes pre-saved datasets to enable local, resource-efficient processing of protein data. The GUI provides adjustable parameters (e.g., sequence identity thresholds, epitope lengths) and dynamically displays results, including filtered proteins, virulence factors, and mapped epitopes. By eliminating cloud dependencies and offering offline compatibility, B-vac streamlines strain-specific vaccine candidate identification while maintaining low memory overhead (