Documentation
Your complete guide to installing Virus-Mapper, preparing input files, and understanding the annotated results.
Installation via Docker
The recommended way to run Virus-Mapper is with Docker. This method bundles all dependencies into a single package, ensuring it runs reliably on any system (Windows, macOS, or Linux) without complex setup.
Step 1: Install Docker Desktop
If you don't already have it, you'll need to install Docker Desktop. It's a free application that manages containers on your local machine.
Download it from the official Docker website and follow the installation instructions for your operating system.
Step 2: Open a Terminal
The next steps involve running commands in a terminal (also known as a command line or shell). Here’s how to open one:
- On Windows: Press the Windows Key, type `PowerShell` or `cmd`, and press Enter.
- On macOS: Open Spotlight Search (Cmd + Spacebar), type `Terminal`, and press Enter. You can also find it in Applications > Utilities.
- On Linux: Press Ctrl + Alt + T or search for "Terminal" in your applications menu.
Step 3: Create a Persistent Data Folder (Volume)
With your terminal open, run the command below. It creates a Docker volume: a dedicated, safe location for the application's data, such as your license key and downloaded databases. Because the volume lives outside the container, your data is not lost when you update the application.
docker volume create pangenome_app_data
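If you want to confirm that the volume exists, a small helper like the one below can do it. This is only a sketch for a POSIX shell such as bash (macOS, Linux, or WSL on Windows); `show_volume` is a hypothetical name, while `docker volume inspect` and its `--format` option are standard Docker CLI.

```shell
# Report whether a Docker volume exists and where Docker keeps it.
# Usage: show_volume pangenome_app_data
show_volume() {
  if mountpoint="$(docker volume inspect --format '{{ .Mountpoint }}' "$1" 2>/dev/null)"; then
    echo "volume '$1' exists at $mountpoint"
  else
    echo "volume '$1' not found"
    return 1
  fi
}
```

With Docker Desktop, the path that is printed usually refers to a location inside Docker's own virtual machine, so do not expect to browse it directly from your file manager.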
Step 4: Run the Virus-Mapper Container
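Copy and paste the following command into your terminal and press Enter. This will download the latest version of Virus-Mapper and start the web server. The trailing backslashes continue the command onto the next line in macOS and Linux shells; in PowerShell use a backtick (`) instead, in cmd use a caret (^), or enter everything on a single line.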
docker run -d \
--restart unless-stopped \
-p 8787:8787 \
-v pangenome_app_data:/data/.pangenome_projector_webui \
--name pangenome-projector \
milan5678/pangenome-projector:latest
What this command does:
- `-d`: Runs the application in the background.
- `--restart unless-stopped`: Ensures the app automatically restarts if your computer reboots.
- `-p 8787:8787`: Makes the application available on your computer at port 8787.
- `-v ...`: Connects the persistent data folder you created in Step 3.
- `--name ...`: Gives the running container a memorable name.
- `milan5678/...`: The name of the official application image on Docker Hub.
Step 5: Access the Web Interface
Once the command finishes, open your web browser and navigate to:
http://localhost:8787
Important: For this to work, you must have your Docker Desktop application running. The web interface is only available while the Docker container is active.
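The server may need a few seconds to start after the container is launched. If you prefer to check from the terminal before opening the browser, the following sketch polls the address until it answers. It assumes a POSIX shell with `curl` installed and assumes that the root page returns a successful HTTP status; `wait_for_ui` is a hypothetical helper name.

```shell
# Poll a URL until it responds or the attempts run out.
# Usage: wait_for_ui URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ui() {
  url="$1"; attempts="${2:-30}"; delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failures, -s: silent, -o /dev/null: discard the page
    if curl -fs -o /dev/null "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from $url" >&2
  return 1
}
# Example: wait_for_ui http://localhost:8787
```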
Early Access & Activation
Virus-Mapper is currently in an Early Access phase. During this period, full access to the application's analysis features requires an activation key.
How to Get a Key
Keys are provided on a request-only basis to early access users and collaborators. Please contact us directly via the support page to request your key.
How to Activate
Upon launching the application for the first time, you will be greeted with a license activation screen. To unlock the tool, simply paste your provided key into the input field and click "Activate".
Your key is saved securely in the persistent Docker volume you created, so you only need to perform this activation step once.
Managing the Application
Checking the Status
To see if the container is currently running, use this command in your terminal:
docker ps
You should see an entry with the name `pangenome-projector` in the list.
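If you run several containers, the full `docker ps` table can be noisy. As a sketch, the helper below narrows the output to this application using the standard `--filter` and `--format` options of `docker ps`. The function name `app_status` is hypothetical, and note that the name filter matches substrings, so a container with a longer but similar name would also match.

```shell
# Print the status of the Virus-Mapper container, or a hint if it is not running.
# The default name is the one given with --name in Step 4.
app_status() {
  name="${1:-pangenome-projector}"
  status="$(docker ps --filter "name=$name" --format '{{.Status}}')"
  if [ -n "$status" ]; then
    echo "$name: $status"
  else
    echo "$name: not running"
  fi
}
# Example: app_status
```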
Stopping the Application
To stop the server, run:
docker stop pangenome-projector
Updating to the Latest Version
Updating is easy and preserves all your data and settings.
- Stop the current container if it's running:
docker stop pangenome-projector
- Remove the stopped container so that its name can be reused. Your data is kept in the persistent volume and is not affected:
docker rm pangenome-projector
- Pull the latest version of the application image:
docker pull milan5678/pangenome-projector:latest
- Start the container again using the same `docker run` command from Step 4 above. Docker will automatically use the new version you just downloaded.
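If you update often, the steps can be wrapped in one function. The sketch below assumes a POSIX shell and reuses the exact names and options from Step 4; `update_app` is a hypothetical helper name. It also removes the stopped container before starting a new one, because Docker refuses to create a second container with a name that is still in use. The named volume is left alone.

```shell
# Stop, remove, pull and restart in one go. The named volume is untouched,
# so the license key and downloaded databases survive the update.
update_app() {
  name="pangenome-projector"
  image="milan5678/pangenome-projector:latest"
  docker stop "$name" >/dev/null 2>&1   # ignored if the container is not running
  docker rm "$name" >/dev/null 2>&1     # ignored if the container does not exist
  docker pull "$image" || return 1
  docker run -d \
    --restart unless-stopped \
    -p 8787:8787 \
    -v pangenome_app_data:/data/.pangenome_projector_webui \
    --name "$name" \
    "$image"
}
# Example: update_app
```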
System Requirements & Performance
Virus-Mapper is designed for performance, but its memory usage depends on the size of your input files. Here are some guidelines to ensure a smooth experience.
Memory (RAM)
RAM is the most critical resource. The total memory required is roughly the sum of the memory taken by the loaded graph and the memory taken by the results collected from your VCF file.
Pangenome Graph: The entire graph is loaded into memory. As a rule of thumb, you should have 1.5x to 2x the size of your ODGI graph file as available RAM. Multi-gigabyte graphs are supported if your system has sufficient memory.
VCF File: While the VCF is streamed initially, all final results are collected in memory before being written to the output file. This can be significant. Based on our tests, a 50 MB VCF file can increase peak memory usage by over 1.5 GB. Please be mindful of this when processing very large VCFs.
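To turn these guidelines into a number for your own files, a back-of-the-envelope calculation like the one below may help. Treat it as a rough sketch only: it takes the upper end of the graph rule of thumb (2x the file size) and assumes that VCF memory grows linearly from the single data point above (50 MB leading to about 1.5 GB, so roughly 30x the file size). That linear scaling is an assumption, not a measured property of the tool, and `estimate_ram_mb` is a hypothetical helper name.

```shell
# Usage: estimate_ram_mb GRAPH_FILE VCF_FILE
# Prints a rough estimate of peak RAM in MB (rounded up).
estimate_ram_mb() {
  graph_bytes="$(wc -c < "$1" | tr -d ' ')"
  vcf_bytes="$(wc -c < "$2" | tr -d ' ')"
  # Graph: upper end of the 1.5x to 2x rule of thumb.
  # VCF: ASSUMPTION, linear scaling at about 30x the file size.
  total_bytes=$((graph_bytes * 2 + vcf_bytes * 30))
  echo $(( (total_bytes + 1048575) / 1048576 ))
}
# Example: estimate_ram_mb my_graph.og my_variants.vcf
```

The file names in the example line are placeholders. Compare the result with the memory that is actually available to Docker, which can be lower than the memory installed in your computer.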
CPU
Virus-Mapper will automatically use all available CPU cores to parallelize the analysis. More cores will result in a faster analysis time.
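Keep in mind that "available" here means available to Docker. On Windows and macOS, Docker Desktop runs containers inside a virtual machine whose CPU and memory allowance is configured in the Docker Desktop settings. As a sketch, the helper below prints what Docker reports, using the `NCPU` and `MemTotal` fields of `docker info`; the name `docker_resources` is hypothetical.

```shell
# Show how many CPUs and how much memory the Docker engine can use.
docker_resources() {
  # NCPU is a core count, MemTotal is in bytes.
  set -- $(docker info --format '{{.NCPU}} {{.MemTotal}}')
  [ "$#" -eq 2 ] || { echo "could not query Docker" >&2; return 1; }
  echo "CPUs: $1, memory: $(( $2 / 1048576 )) MB"
}
# Example: docker_resources
```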
Input File Requirements
The tool requires three key inputs. The UI's "Validate Files" button runs a pre-flight check to ensure your files are compatible before you start a time-consuming analysis.
1. VCF File (Your Variants)
A VCF (Variant Call Format) file is a standard text file that lists genetic variants. Think of it as a list of differences found between your sample's DNA and a standard reference genome.
2. Pangenome Graph (The Advanced Reference)
Instead of a single linear reference, a pangenome graph represents the genetic variation from many individuals. This allows for a more comprehensive analysis of your variants.
3. Reference FASTA (The Baseline Genome)
A FASTA file contains the DNA sequence of a linear reference genome. This is used as the "ground truth" to interpret variants from the VCF file, especially complex structural variants.
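A common source of trouble is a VCF whose sequence names do not appear in the FASTA. The sketch below is a quick command-line sanity check for uncompressed files. It is not the application's "Validate Files" check and does not replace it; it only looks at the first line of each file and compares the VCF's CHROM names with the FASTA record names. `check_inputs` is a hypothetical helper name.

```shell
# Usage: check_inputs VCF FASTA
# Prints "ok", or the contig names used in the VCF that the FASTA lacks.
check_inputs() {
  vcf="$1"; fasta="$2"
  head -n 1 "$vcf" | grep -q '^##fileformat=VCF' || { echo "not a VCF: $vcf"; return 1; }
  head -n 1 "$fasta" | grep -q '^>' || { echo "not a FASTA: $fasta"; return 1; }
  # First file (FASTA): remember each record name, the text after ">" up to
  # the first whitespace. Second file (VCF): report unknown CHROM values once.
  missing="$(awk '
    NR == FNR { if (/^>/) seen[substr($1, 2)] = 1; next }
    /^#/ { next }
    !($1 in seen) && !reported[$1]++ { print $1 }
  ' "$fasta" "$vcf")"
  if [ -n "$missing" ]; then
    echo "contigs missing from FASTA: $missing"
    return 1
  fi
  echo "ok"
}
```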
Download Test Data
To get started right away, you can download this compatible set of COVID-19 test files. Use these files to familiarize yourself with the application's workflow.
Filtering Your Variants
Before running the main analysis, you can pre-filter your VCF file to focus only on the variants that are most relevant to your research. This saves time and simplifies the final results.
Using the Filter Builder
The user interface provides an intuitive rule builder. After you upload a VCF file, the "Select Field..." dropdown will populate with all available fields from your VCF header.
- Click "Add Filter Rule" to create a new line.
- Use the dropdowns to select a Field (e.g., `QUAL` or `INFO.DP`) and an Operator (e.g., `>=`).
- Enter a Value in the text box (e.g., `50`).
- You can add multiple rules, which will be combined with a logical "AND".
Common Filter Examples
- High-Quality Variants: `QUAL >= 50`
- Sufficient Read Depth: `INFO.DP > 20` (DP must be a field in your VCF)
- PASS Variants Only: `FILTER == "PASS"`
The full expression is built for you in the text box at the bottom, which is then passed to the analysis engine.
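To see what combining rules with a logical "AND" means in practice, here is a command-line sketch that applies the three example rules together to an uncompressed VCF using `awk`. It is an illustration only and is not how the analysis engine evaluates the expression; `filter_vcf` is a hypothetical helper name.

```shell
# Usage: filter_vcf VCF MIN_QUAL MIN_DP
# Keeps header lines plus records where
#   QUAL >= MIN_QUAL AND INFO.DP > MIN_DP AND FILTER == "PASS"
filter_vcf() {
  awk -F '\t' -v min_qual="$2" -v min_dp="$3" '
    /^#/ { print; next }                 # pass header lines through
    {
      dp = ""                            # DP may be absent from INFO
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^DP=/) { dp = substr(kv[i], 4) }
      }
      # Every rule must hold for the record to be kept.
      if ($6 + 0 >= min_qual + 0 && dp != "" && dp + 0 > min_dp + 0 && $7 == "PASS") {
        print
      }
    }
  ' "$1"
}
# Example: filter_vcf input.vcf 50 20 > filtered.vcf
```

A record that fails any single rule is dropped, which is why adding more rules can only shrink the result.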
Understanding Your Results
The output is an annotated VCF file. Your original VCF data is preserved, with new fields added to the `INFO` column to provide the analysis results.
- `G_ALLELE_STATE`: This is the main classification. It will be `KNOWN` if the variant's exact alternative allele was found in the pangenome graph. If not, it will be marked as `Novel` followed by a reason, such as `Novel (Reference Mismatch)` or `Novel (No valid alternate path found)`.
- `G_HAPLOTYPES`: For `KNOWN` variants, this is a comma-separated list of all the samples or paths in the pangenome graph that also contain this exact variant. This helps you understand how common the variant is within the pangenome cohort.
- `V_GENE`: (Optional) If viral annotation is enabled, this tag indicates which gene(s) the variant overlaps (e.g., "S", "NSP3").
- `V_FEAT`: (Optional) If annotation is enabled, this indicates the type of genomic feature the variant overlaps (e.g., "CDS" for a coding sequence, or "intergenic" if it's between genes).
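For a quick overview of an annotated file, you can tally the `G_ALLELE_STATE` values from the command line. The sketch below assumes the fields are written as `KEY=value` pairs separated by semicolons in the `INFO` column, as is usual for VCF; the exact serialization used by the tool is an assumption here, and `summarize_states` is a hypothetical helper name.

```shell
# Usage: summarize_states ANNOTATED_VCF
# Prints "<count><TAB><state>" for each distinct G_ALLELE_STATE, largest first.
summarize_states() {
  awk -F '\t' '
    /^#/ { next }                        # skip header lines
    {
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^G_ALLELE_STATE=/) {
          state = kv[i]
          sub(/^G_ALLELE_STATE=/, "", state)
          count[state]++
        }
      }
    }
    END { for (s in count) printf "%d\t%s\n", count[s], s }
  ' "$1" | sort -rn
}
# Example: summarize_states annotated.vcf
```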
About the Viral Annotation Feature
Data Source
The optional viral annotation feature uses data from the NCBI RefSeq Viral GenBank (GBFF) database. When you click the "Run One-Time Setup" button in the application, this comprehensive database is downloaded and stored locally in your persistent data volume.
How It Works
The process is designed to be fast and efficient:
- When the annotator is first used, it builds an in-memory index of all gene and feature locations from the downloaded GenBank file.
- For each variant in your VCF file, it checks if the variant's genomic position overlaps with any known features in the index.
- If a variant overlaps multiple features (e.g., both a gene and a CDS within that gene), it prioritizes the most functionally specific feature type (CDS is preferred over gene). If no features overlap, it is marked as "intergenic".
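The overlap-and-prioritize rule can be illustrated with a few lines of `awk`. This is a simplified sketch, not the application's code: it scans a small tab-separated feature table line by line instead of using an in-memory index, and the table layout (start, end, type, gene) and the helper name `annotate_pos` are hypothetical.

```shell
# Usage: annotate_pos FEATURES_TSV POSITION
# FEATURES_TSV columns (hypothetical layout): start, end, type, gene
# Prints "<type><TAB><gene>" for the most specific overlap, or "intergenic".
annotate_pos() {
  awk -F '\t' -v pos="$2" '
    BEGIN { rank["CDS"] = 2; rank["gene"] = 1; best = 0 }   # higher is more specific
    pos + 0 >= $1 + 0 && pos + 0 <= $2 + 0 && rank[$3] + 0 > best {
      best = rank[$3]; type = $3; gene = $4
    }
    END {
      if (best > 0) printf "%s\t%s\n", type, gene
      else print "intergenic"
    }
  ' "$1"
}
# Example: annotate_pos features.tsv 200
```

A position inside both a gene and its CDS is reported as CDS, a position inside the gene only is reported as gene, and anything else falls through to "intergenic".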
Future Development
This feature is currently in its initial version. We plan to expand its capabilities in the future based on user feedback, potentially including more detailed annotation types (like amino acid changes) or allowing users to provide their own custom annotation files. We welcome your suggestions!
Frequently Asked Questions (FAQ)
Is my data uploaded to a server?
No. Virus-Mapper runs 100% locally on your machine. Your VCF, GFA, and FASTA files are processed directly by the local server running in Docker and are never sent over the internet.
What does the "Novel (Reference Mismatch)" error mean?
This indicates an inconsistency in your input files. It means that the reference allele (REF) in your VCF file at a specific position does not match the base found at that same position in your reference FASTA or the graph's reference path. Please ensure your VCF and FASTA are built against the same reference version.
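If you see many of these, you can check the VCF against the FASTA yourself before re-running the analysis. The sketch below loads the FASTA into memory with `awk` and compares each REF allele with the sequence at the stated position. It is meant for small, uncompressed genomes such as viral references, it does not look at the graph's reference path, and `check_ref` is a hypothetical helper name.

```shell
# Usage: check_ref VCF FASTA
# Prints one line per record whose REF allele differs from the FASTA at POS.
check_ref() {
  awk '
    NR == FNR {                          # first file: the FASTA
      if (/^>/) { name = substr($1, 2) }
      else { seq[name] = seq[name] toupper($0) }
      next
    }
    /^#/ { next }                        # second file: skip VCF header lines
    {
      expected = substr(seq[$1], $2, length($4))   # VCF POS is 1-based
      if (toupper($4) != expected) {
        printf "%s:%s REF=%s FASTA=%s\n", $1, $2, $4, expected
      }
    }
  ' "$2" "$1"
}
# Example: check_ref my_variants.vcf reference.fasta
```

No output means every REF allele agrees with the FASTA. An empty `FASTA=` value usually means the sequence name in the VCF was not found in the FASTA at all.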
Can I use this for non-viral genomes like human or plant data?
Yes. While the tool is named Virus-Mapper, its core pangenome analysis engine is genome-agnostic and works for any species. The viral annotation feature is an optional add-on; you can simply leave it disabled for non-viral projects.