Genomic variants, such as single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels), are identified by variant analysis pipelines that combine a short-read aligner with a variant caller. Because different pipelines often produce poorly concordant variant calls, the clinical genomics community has sought standardized benchmarking of pipeline performance. Rasa's variant-calling data analysis services therefore include a systematic comparison of variant-calling performance against a gold-standard data set of reference variant calls, carried out ahead of variant annotation, the final stage of our variant-calling data analysis services.
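The following Python snippet is a minimal sketch of comparing calls against a gold-standard data set. It labels each of one pipeline's calls as a true positive, false positive, or false negative by exact matching on chromosome, position, reference allele, and alternate allele. The `Variant` type, the `classify_calls` helper, and the toy variants are hypothetical illustrations and are not part of Rasa's services. The sketch assumes both call sets are already normalized (left-aligned, one ALT allele per record). Dedicated comparison tools also reconcile different representations of the same variant, which exact matching does not do.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical normalized variant key (assumes left-aligned, one ALT per record)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_calls(calls: set[Variant], truth: set[Variant]) -> dict[str, set[Variant]]:
    """Split one pipeline's calls into TP/FP/FN by exact key match against the gold standard."""
    return {
        "tp": calls & truth,  # called and present in the gold standard
        "fp": calls - truth,  # called but absent from the gold standard
        "fn": truth - calls,  # in the gold standard but missed by the pipeline
    }


# Toy data for illustration only.
truth = {Variant("chr1", 1000, "A", "G"), Variant("chr1", 2000, "C", "T"), Variant("chr2", 500, "AT", "A")}
calls = {Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "AT", "A"), Variant("chr3", 42, "G", "C")}

result = classify_calls(calls, truth)
print({k: len(v) for k, v in result.items()})  # {'tp': 2, 'fp': 1, 'fn': 1}
```

Set operations keep the classification symmetric and simple. Real call sets would be loaded from VCF files and normalized before this step.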
In addition to simple variant analysis, Rasa's variant-calling data analysis services compare multiple variant-calling pipelines on a single sequencing data set using positive predictive value (PPV, also known as precision) and sensitivity (also known as recall).
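Each pipeline yields counts of true positives (TP), false positives (FP), and false negatives (FN). From these counts, PPV is TP / (TP + FP) and sensitivity is TP / (TP + FN). The sketch below computes both metrics for two pipelines. The pipeline names and counts are made-up illustrative values, not benchmark results.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): fraction of emitted calls that are true."""
    return tp / (tp + fp) if tp + fp else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): fraction of gold-standard variants that were called."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical TP/FP/FN counts for two pipelines on one data set (illustrative only).
counts = {
    "pipeline_A": {"tp": 90, "fp": 10, "fn": 30},
    "pipeline_B": {"tp": 110, "fp": 40, "fn": 10},
}

for name, c in counts.items():
    print(f"{name}: PPV={ppv(c['tp'], c['fp']):.3f} sensitivity={sensitivity(c['tp'], c['fn']):.3f}")
# pipeline_A: PPV=0.900 sensitivity=0.750
# pipeline_B: PPV=0.733 sensitivity=0.917
```

In this toy case `pipeline_A` has the higher PPV and `pipeline_B` has the higher sensitivity. Neither metric on its own ranks the two pipelines.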
Importantly, results from the variant-calling data analysis services can reveal significant variation across pipelines, which suggests the need for further detailed analysis. Rasa's variant-calling data analysis services are designed around two limitations of previous studies. First, those studies analyzed only a single data set, so data-specific effects could not be excluded. Second, they benchmarked performance by measuring PPV and sensitivity separately, so a difference in false positive rate between high-scoring and low-scoring variants was not reflected. A single benchmarking score, such as the area under a precision-recall curve (APR), reflects the intrinsic trade-off between precision (i.e., PPV) and recall (i.e., sensitivity) and therefore provides a more informative performance score.
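Average precision is one common way to estimate APR from a pipeline's scored calls. The calls are ranked by score, and at each score threshold the precision is added, weighted by the gain in recall. The sketch below assumes that each call carries a numeric quality score and has already been labeled true or false against the gold standard. The `average_precision` function and the toy call lists are hypothetical. Other estimators, such as interpolated curves, give somewhat different values. Recall is divided by the total number of gold-standard variants, so variants that a pipeline never called keep the curve from reaching full recall.

```python
def average_precision(scored_calls: list[tuple[float, bool]], n_truth: int) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    scored_calls: hypothetical (quality_score, is_true_positive) pairs, one per emitted call.
    n_truth: number of gold-standard variants, so missed variants cap recall below 1.
    """
    if n_truth <= 0:
        raise ValueError("the gold standard must contain at least one variant")
    ranked = sorted(scored_calls, key=lambda call: call[0], reverse=True)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(ranked):
        # Lower the threshold to the next score, consuming all calls tied at it.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / n_truth
        # Add the rectangle: precision at this threshold times the recall gained.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy data: the same 3 TPs and 2 FPs out of 4 gold-standard variants in both lists.
fp_at_low_scores = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
fp_at_high_scores = [(0.9, False), (0.8, False), (0.7, True), (0.6, True), (0.5, True)]

print(f"{average_precision(fp_at_low_scores, 4):.4f}")   # 0.6875
print(f"{average_precision(fp_at_high_scores, 4):.4f}")  # 0.3583
```

Both toy lists give PPV = 0.6 and sensitivity = 0.75, so separate PPV and sensitivity values cannot tell them apart. Only the rank of the false positives differs. The single APR score separates the two lists because it penalizes false positives that carry high scores.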