armitage
Armitage trend test for trait and SNP dosage Authors: Jason Sinnwell (primary contact) Dan Schaid Link: armitage_0.2.1.tar.gz Language/Platform: R
arp.gee
Generalized Estimating Equations for Affected Relative Pairs Authors: Dan Schaid Jason Sinnwell Link: arp.gee_0.1.1.tar.gz Language/Platform: R
Arsenal
An Arsenal of ‘R’ Functions for Large-Scale Statistical Summaries An Arsenal of ‘R’ functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in ‘R’ and ‘RStudio’ and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple […]
Attribrisk
Population Attributable Risk Estimates population (etiological) attributable risk for unmatched, pair-matched or set-matched case-control designs and returns a list containing the estimated attributable risk, estimates of coefficients, and their standard errors, from the (conditional, if necessary) logistic regression used for estimating the relative risk. Authors: Beth Atkinson (primary contact) Louis Schenck Cindy Crowson Terry Therneau […]
bdsmatrix
Routines for Block Diagonal Symmetric matrices This is a special case of sparse matrices, used by coxme Authors: Terry Therneau Available at: https://cran.r-project.org/web/packages/bdsmatrix/index.html Language/Platform: R
bilinear.fit
bilinear regression to 16O/18O isotope label experiments Authors: Doug Mahoney (primary contact) Jeanette Eckel-Passow Available at: bilinear.fit.tar.gz Language/Platform: R
BIMA
A mapping/alignment customized for mate-pair library next generation sequencing.
BioR Toolkit – Old Versions
Warning! These versions contain a critical tabix-related bug that causes a small percentage of regions to “miss” when using bior_overlap and bior_same_variant against some catalogs. Please use one of the fixed versions instead. These old versions are maintained here for archive and re-creation purposes only, and should NOT be used […]
BioR: Rapid, Flexible System for Genomic Annotation
A toolkit and set of catalogs to retrieve genomic annotation for variants, genes, diseases, conditions, genetic tests, and drugs.
bnmlci
Exact confidence intervals for a proportion.
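For readers unfamiliar with exact intervals, the following is a minimal Python sketch of the Clopper-Pearson construction, one common exact method for a binomial proportion. It is not the bnmlci macro's code, and whether the macro uses this particular method is an assumption; the function names are hypothetical.

```python
from math import comb


def binom_pmf(i, n, p):
    """Probability of exactly i successes in n trials with success rate p."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def _bisect(f, target, increasing, iters=100):
    """Solve f(p) = target on [0, 1] for a monotone function f by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies to the left of mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Two-sided exact interval for x successes out of n trials (illustrative)."""
    if not 0 <= x <= n or n == 0:
        raise ValueError("need n > 0 and 0 <= x <= n")

    def upper_tail(p):  # P(X >= x), increasing in p
        return sum(binom_pmf(i, n, p) for i in range(x, n + 1))

    def lower_tail(p):  # P(X <= x), decreasing in p
        return sum(binom_pmf(i, n, p) for i in range(0, x + 1))

    lower = 0.0 if x == 0 else _bisect(upper_tail, alpha / 2, increasing=True)
    upper = 1.0 if x == n else _bisect(lower_tail, alpha / 2, increasing=False)
    return lower, upper
```

With zero successes the upper limit has a closed form, 1 - (alpha/2)^(1/n), which gives a quick sanity check on the bisection.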
boot
Select bootstrap samples.
CAP-miRSEQ
miRNAs play a key role in normal physiology and various diseases such as cancer. However, analyzing miRNA sequencing data is challenging due to the requirement of significant computational resources and bioinformatics expertise. To address this, we present a comprehensive analysis pipeline for deep microRNA sequencing (CAP-miRSeq) that integrates read preprocessing, alignment, mature/precursor/novel miRNA quantification, variant detection in miRNA coding region, and flexible differential expression between experimental conditions. Using well characterized data, we demonstrated the pipeline’s superior performance, flexibility, and practical use in research and biomarker discovery.
ChIP-RNA-seqPRO
ChIP-RNA-seqPRO is a resource motivated by this current need and provides a strategy that enables the user to profile regulatory associations between epigenomic modifications and co/post-transcriptional processes.
Circ-Seq
Circ-Seq: A comprehensive bioinformatics workflow for detecting circular RNAs Circular RNAs (circRNAs) are recently discovered members of the noncoding RNA family that range in length from a few hundred to thousands of nucleotides. In contrast to linear RNA transcripts, which are normally spliced tail-to-head, circRNAs are formed by the covalent bonding of their 3´ and […]
CNVNator
Structural Variations (SVs) and Copy Number Variations (CNVs) are the major source of genomic variations. CNVnator is a tool for Copy Number Variation (CNV) discovery and genotyping from depth-of-coverage by mapped reads. It accepts .bam files as input and generates CNV calls in less than 10 hours of calculations. The source code and extended descriptions […]
comprisk
Cumulative incidence in the presence of competing risks.
coxme
Mixed Effects Cox Models Cox proportional hazards models containing Gaussian random effects, also known as frailty models. Authors: Terry Therneau Available at: https://cran.r-project.org/web/packages/coxme/index.html Language/Platform: R
criskcox
Competing risk survival analysis with covariates.
deming
Deming, Theil-Sen and Passing-Bablock Regression Generalized Deming regression, Theil-Sen regression and Passing-Bablock regression functions. Authors: Terry Therneau Available at: https://cran.r-project.org/web/packages/deming/index.html Language/Platform: R
dist
Estimates the distance matrix between two groups (e.g. cases and potential controls) on the basis of a set of X’s.
eSNV-Detect
eSNV-Detect v1.0: Reliable Identification of Variants Using RNA-seq Data
Exogene
A workflow for detecting viral integrations and viral presence from sequencing data.
Ezimputer
Ezimputer is an impute2-based genotype imputation workflow that greatly simplifies the process of imputation and achieves a significant speedup of imputation using multiple CPUs on a computer cluster.
fastlo
Fast Loess Authors: Doug Mahoney Jeanette Eckel-Passow Ann Oberb Link: fastlo_1.3.tar.gz Language/Platform: R
findcut
Uses the method of Contal and O’Quigley (1999) to find the best cutpoint in a continuous variable with regards to a survival outcome.
Fusion-sense
A tool used to calculate the estimated sensitivity of fusion finding for an RNA-seq experiment. It plots the estimated sensitivity as a function of the distance to the 3’ end and also calculates the decay rate for the sample.
GeneSetScan
GeneSetScan is a pre-compiled binary for 64-bit linux systems. It offers a general approach to scan genome-wide SNP data for gene-set association analyses.
GenomeSmasher
GenomeSmasher is a set of tools used to create diploid FASTA files containing SNPs, indels, duplications, deletions and translocations.
gmatch
Computerized matching of cases to controls using the greedy matching algorithm
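The idea of greedy matching can be sketched in a few lines of Python. This is an illustration only, not the gmatch macro: it assumes a single numeric matching variable, 1:1 matching, and an optional caliper, and the function name is hypothetical.

```python
def greedy_match(cases, controls, caliper=float("inf")):
    """Pair each case, in order, with its nearest still-unused control.

    Returns a list of (case_index, control_index) pairs. A case is left
    unmatched when no unused control lies within the caliper.
    """
    used = set()
    pairs = []
    for i, case_value in enumerate(cases):
        # Distances to every control that is still available and close enough.
        candidates = [
            (abs(case_value - control_value), j)
            for j, control_value in enumerate(controls)
            if j not in used and abs(case_value - control_value) <= caliper
        ]
        if candidates:
            _, best_j = min(candidates)  # ties go to the lowest control index
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs
```

Because each case takes the best control available at that moment, the result depends on the order of the cases and does not necessarily minimize the total distance across all pairs.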
Guide
Required elements: name of project/tool; short description of project/tool (1-3 sentences); authors, primary contact; suggested tags; link to source code & data (tarred/gzipped if large). Other elements you may want to include: date last updated (?); longer description; user manual; links to publications for the software; system requirements; licensing information. If you want to deploy an […]
haplo.stats
Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous This software offers a suite of R routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (because of unknown linkage phase of the genetic markers). The genetic markers are […]
HGT-ID
HGT-ID v1.0: An efficient and sensitive program for detecting viral insertion sequences in the genome of human cancers
HiChIP Pipeline
HiChIP: A high-throughput pipeline for integrative analysis of ChIP-Seq data HiChIP pipeline is designed for performing comprehensive analysis of chromatin immunoprecipitation and sequencing (ChIP-Seq) data. It can be used to analyze profiles from transcription factor binding, histone modifications, histone variants, and chromatin regulators. Paired-end and single-end NexGen sequencing data from ChIP experiment with different antibodies, […]
hwe
Hardy-Weinberg Equilibrium Tests Test the fit of genotype frequencies to Hardy-Weinberg Equilibrium proportions for autosomes and the X chromosome. Different statistical tests are provided, as well as an option to evaluate statistical significance by either exact methods or simulations Authors: Jason Sinnwell (primary contact) Dan Schaid Dan Folie Link: hwe_0.3.1.tar.gz Language/Platform: R
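As a point of reference, the simplest test of Hardy-Weinberg proportions for a biallelic autosomal marker is the one-degree-of-freedom chi-square test, sketched below in Python. This is the asymptotic test, swapped in for brevity; it is not the exact or simulation-based method the hwe package offers, and the function name is hypothetical.

```python
from math import erfc, sqrt


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of genotype counts to HWE proportions.

    Returns (chi-square statistic, p-value) for a biallelic autosomal marker.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0.0 or p == 1.0:
        raise ValueError("marker is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

The asymptotic approximation becomes unreliable when expected genotype counts are small, which is one reason exact and simulation-based tests exist.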
Hybrid-Denovo
Microbiota pipeline that utilizes and integrates information from a mix of both paired-end and single-end reads.
ibdreg
Regression Methods for IBD Linkage With Covariates A method to test genetic linkage with covariates by regression methods with response IBD sharing for relative pairs. Account for correlations of IBD statistics and covariates for relative pairs within the same pedigree. Authors: Jason Sinnwell (primary contact) Dan Schaid Available at: https://cran.r-project.org/web/packages/ibdreg/index.html Language/Platform: R
ICQ-lincRNA
ICQ-lincRNA (Identification, Characterization, and Quantification of Long Intergenic Non-Coding RNAs), offers an end-to-end solution to identify and annotate expressed lincRNAs in next generation RNA sequencing data. Specifically, ICQ-lincRNA: Conducts ab-initio genome-wide transcript assembly by both Cufflinks and Scripture using Binary Alignment/Map (BAM) files Conducts downstream quantitative analyses including gene count, exon count, overlap with known […]
jitplot
Produces a gplot of a continuous variable (y-axis) vs. a group variable (x-axis) in such a way that no points are hidden.
kinship2
Pedigree Functions Authors: Jason Sinnwell (primary contact) Terry Therneau Beth Atkinson Dan Schaid Available at: https://cran.r-project.org/web/packages/kinship2/index.html Language/Platform: R
ld.pairs
LD calculations on multi-allele and SNP variants, including the composite LD measure.
logisuni
Univariate logistic regression model summaries with multiple dependent variables and predictors.
MAP-RSeq
The MAP-RSeq workflow integrates a suite of open source bioinformatics tools along with in-house developed methods to analyze paired-end RNA-Seq data.
mccc
Computes Lin’s concordance correlation coefficient (CCC) for any number of raters.
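For orientation, the two-rater form of Lin's concordance correlation coefficient is 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²). The Python sketch below implements only that two-rater case with 1/n moment estimates; it is not the mccc macro, which handles any number of raters, and the function name is hypothetical.

```python
def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for two raters.

    Uses 1/n (not 1/(n-1)) variances and covariance. Raises
    ZeroDivisionError if both raters give the same constant value.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
```

Unlike the Pearson correlation, the coefficient is penalized by a shift between raters: ratings of 1, 2, 3 against 2, 3, 4 are perfectly correlated but have a CCC of 4/7.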
mend.err
Check Pedigrees for Mendelian Errors Check Pedigrees for Mendelian Errors and, when errors are found, systematically jackknifes every typed pedigree member to determine if eliminating this member will remove all Mendelian Errors from the pedigree Authors: Jason Sinnwell (primary contact) Dan Schaid Dan Folie Link: mend.err_1.3.tar.gz Language/Platform: R
multic
Quantitative Linkage Analysis Tools using the Variance Components Approach Calculate the polygenic and major gene models for quantitative trait linkage analysis using the variance components approach. Authors: Pat Votruba (primary contact) Beth Atkinson Mariza de Andrade Available at: https://cran.r-project.org/web/packages/multic/index.html Language/Platform: R
nesttest
Conducts likelihood ratio tests for nested logistic and Cox proportional hazards models.
newsurv
Uses Graph Template Language to create a highly customizable Kaplan-Meier curve.
nobs
This macro creates a macro variable containing the number of observations in a SAS dataset.
outsumm
Creates a single RTF file containing multiple tables created by %SUMMARY.
PANDA
The PANDA (Pathway AND Annotation) Explorer is a data visualization tool capable of annotating genes with any data type and graphically displaying the result within the context of pathways.
Panoply
PANOPLY is a novel computational approach that integrates both germline and somatic data obtained from multi-omics platforms for an individual of interest and analyzes that data in the context of matched-control samples.
PatternCNV
A versatile tool for detecting copy number changes from exome sequencing data.
pedgene
Gene-Level Statistics for Pedigree Data Gene-level association tests with disease status for pedigree data: kernel and burden association statistics. Authors: Jason Sinnwell (primary contact) Dan Schaid Available at: https://cran.r-project.org/web/packages/pedgene/index.html Language/Platform: R
pleio
Pleiotropy Test for Multiple Traits on a Genetic Marker Perform tests for pleiotropy of multiple traits of various variable types on genotypes for a genetic marker. Authors: Jason Sinnwell (primary contact) Dan Schaid Available at: https://cran.r-project.org/web/packages/pleio/index.html Language/Platform: R
plotcorr
Proc Plot with correlation/regression statistics appended.
plotmat
Create a scatterplot matrix graphically displaying the bivariate relationships between a number of variables.
Protein Panoramic annoTation Tool (P2T2)
P2T2 is a web-based platform for the annotation of proteins using population variants; experimentally determined functional and phenotype-associated variants; literature-mined variant-phenotype relationships; and structural bioinformatics features such as linear motifs, domains and experimental structures.
RNASeqPower
Sample size for RNAseq studies Authors: Terry Therneau (primary contact) Steven Hart JP Kocher Available at: https://bioconductor.org/packages/release/bioc/html/RNASeqPower.html Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842884/ Language/Platform: R
rocplus
Estimates the Integrated Discrimination Index (IDI) and Net Reclassification Improvement (NRI) for comparison of a new risk model to an old model.
rpart
Recursive Partitioning and Regression Trees Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone. Authors: Beth Atkinson Terry Therneau Brian Ripley (primary contact) Available at: https://cran.r-project.org/web/packages/rpart/index.html Language/Platform: R
RVboost
RVboost v0.1: RNA-seq variant prioritization approach for Illumina next-generation sequencing data.
SAAP-RRBS
The Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting, and visualization. With this package, bioinformaticians or investigators can submit sequencing reads and quickly receive a fully annotated CpG methylation report.
schoen
Schoenfeld residuals for proportional hazards model.
SMART
Sequential Multiple Assignment Randomized Trial (SMART) design includes multiple stages of randomization, where participants are randomized to an initial treatment in the first stage and then subsequently re-randomized between treatments in the following stage. Includes methods for mean and variance as a function of specificity/sensitivity, and power calculations.
SnowShoes-FTD
SnowShoes-FTD is a bioinformatics tool to identify fusion transcripts from paired-end transcriptome sequencing data.
SNPPicker
A post-processor to optimize the selection of tag SNPs from common bin-tagging programs. SNPPicker uses a multi-step search strategy in combination with a statistical model to produce optimal genotyping panels.
SoftSearch
SoftSearch is a sensitive structural variant (SV) detection tool for Illumina paired-end next-generation sequencing data.
SpatialNorm6
SpatialNorm6 Spatial normalization of Affymetrix SNP 6.0 cel file, which adjusts for spatial bias based on wavelet decomposition Authors: Chai High Seng (primary contact) Link: SpatialNorm6_1.1.tar.gz
Stress.dfArray
Stress.dfArray Calculates normalization Stress and dfArray quality for a set of arrays. Authors: Doug Mahoney (primary contact) Jeanette Eckel-Passow Available at: Stress.dfArray_1.1.tar.gz Language/Platform: R
summary
Creates a table of variable summaries plus test statistics for the difference between two or more independent samples.
SuperLearner
Super Learner Prediction Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner. Authors: Eric Polley (primary contact) LeDell van der Laan Kennedy Lendle Available at: https://cran.r-project.org/web/packages/SuperLearner/index.html https://github.com/ecpolley/SuperLearner Language/Platform: R
surv
Complete Kaplan-Meier survival analysis with printing options and logrank statistic.
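The Kaplan-Meier estimate itself is a running product: at each distinct event time, the survival probability is multiplied by 1 − d/n, where d is the number of events and n the number still at risk. The Python sketch below shows only that product-limit step; it is not the surv macro, omits the logrank statistic and confidence limits, and uses a hypothetical function name.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve
```

Censored subjects count in the risk set up to their censoring time but never produce a step in the curve.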
survcstd
Calculates the c-statistic (concordance, discrimination index) for survival data with time-dependent covariates.
survival
Survival Analysis Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models. Authors: Terry Therneau (primary contact) Available at: https://cran.r-project.org/web/packages/survival/index.html https://github.com/therneau/survival Language/Platform: R
survlrk
Calculates logrank statistics for the surv macro.
survlt
General survival statistics: p(t), standard error, confidence limits, and median survival time for left-truncated survival analyses.
survplot
Creates high-quality and easily customized Kaplan-Meier plots.
symmchk
Checks for symmetry and suggests the best power transformation, if one exists, to make an asymmetric distribution symmetric
ToxT
ToxT is a combination of methods to analyze TOXicity data over Time. It can be used for adverse event data and for patient reported outcomes. It includes longitudinal mixed models, bug plots, time-to-event analyses, heatmaps, area under the curve analyses, profile analysis, comparisons at each time point, and plots over time.
Trace-RRBS
trace-rrbs v0.1: Targeted Alignment and Artificial Cytosine Elimination for RRBS for Illumina next-generation sequencing data.
TREAT
TREAT is a Targeted RE-sequencing Annotation Tool that offers a comprehensive, open framework, end-to-end solution for analyzing and interpreting targeted re-sequencing data.
trex
trex Package that calculates a truncated exact test for two-stage case-control studies for rare genetic variants. The first stage is for screening rare variants in only cases. If the number of case-carriers of any rare variants exceeds a user-specified threshold, then additional cases and controls are genotyped for the detected variants and carrier status of […]
uagreemt
Measures agreement, precision, accuracy, total deviation index and coverage probability.
UCLncR Pipeline
The Ultrafast and Comprehensive lncRNA detection (UClncRNA) pipeline leverages fast transcript assembly and parallel computing tools, along with multi-step filters for increased specificity, to provide comprehensive lncRNA characterization.
VCF-Miner
The Variant Call Format (VCF) is the de facto standard for storing variant information from next-generation DNA sequencing experiments.
vmatch
Computerized matching of cases to controls using variable optimal matching.
WANDY
Wandy: A program for CNV/Aneuploidy detection from WGS sequencing data Introduction Wandy is designed for Copy Number Variation (CNV) and Aneuploidy detection from large genomes such as human. It takes a sorted BAM file as input, reports predicted chromosome regions that have amplifications or deletions using LOG2 ratio, and generates graphical reports. There are […]