My Specializations – drjibinjohn.com

My Specializations


Advanced bioinformatics and data analyses across genomics, transcriptomics, and population-level studies.

🖥️ R & Python Programming

Expert-level programming in R and Python for bioinformatics, statistical analysis, and pipeline development, with extensive use of specialized libraries and frameworks.
Key Methods & Techniques:

R (Bioconductor, tidyverse, Shiny)

Python (pandas, NumPy, scikit-learn, PyTorch, TensorFlow)

Workflow automation & reproducible research

Scripting for high-performance computing (HPC) and cloud environments

🧬 Exome & Whole-Genome Sequencing

Comprehensive analysis of coding and non-coding variants to identify clinically relevant findings.
Key Methods & Techniques:

Raw data QC, alignment & post-alignment QC

Somatic & germline SNV, indel, and CNV calling

Variant QC & annotation

Variant interpretation (ACMG/AMP for germline, AMP/ASCO/CAP for somatic)

HPO-based variant prioritization

Pedigree/segregation analysis, trio analysis, carrier screening
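As a minimal illustration of the trio analysis listed above, the sketch below flags candidate de novo variants: sites where the child carries an alternate allele that neither parent shows. The record layout, the VCF-style genotype strings, and the `min_depth=20` cutoff are illustrative assumptions; production filters typically also check parental depth, genotype quality, and allele balance.

```python
# Minimal sketch: flag candidate de novo variants from trio genotype calls.
# Assumed (hypothetical) record layout:
#   (variant_id, child_gt, mother_gt, father_gt, child_depth)
# Genotypes use VCF-style strings: "0/0" hom-ref, "0/1" het, "1/1" hom-alt.

def is_candidate_de_novo(child_gt, mother_gt, father_gt, child_depth, min_depth=20):
    """True if the child carries an alt allele absent from both parents at adequate depth."""
    child_has_alt = child_gt in ("0/1", "1/1")
    parents_hom_ref = mother_gt == "0/0" and father_gt == "0/0"
    return child_has_alt and parents_hom_ref and child_depth >= min_depth


def filter_de_novo(records, min_depth=20):
    """Return the IDs of records that pass the de novo filter."""
    return [
        rec[0]
        for rec in records
        if is_candidate_de_novo(*rec[1:], min_depth=min_depth)
    ]


# Toy calls (made-up positions) for illustration only.
trio_calls = [
    ("chr1:12345:A>G", "0/1", "0/0", "0/0", 35),  # candidate de novo
    ("chr2:67890:C>T", "0/1", "0/1", "0/0", 40),  # inherited from the mother
    ("chr3:11111:G>A", "0/1", "0/0", "0/0", 8),   # depth below the cutoff
]
print(filter_de_novo(trio_calls))  # ['chr1:12345:A>G']
```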

🌐 Genome-Wide Association Studies (GWAS)

Statistical analysis of genetic variants across the genome to identify loci associated with complex traits and diseases.
Key Methods & Techniques:

Data QC & preprocessing

Association testing (single-variant & gene-based)

Covariate adjustment

Family-based GWAS methods
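To make the single-variant association step concrete, here is a minimal sketch of an allelic chi-square test on case/control allele counts. This test is unadjusted: covariate adjustment (for example, age, sex, or ancestry principal components) is usually done with logistic or linear regression instead. The counts below are hypothetical.

```python
import math

# Minimal sketch: unadjusted allelic association test for one variant.
# Input is a 2x2 table of allele counts (cases vs controls, alt vs ref).

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) and its p-value for a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df equals a two-sided normal tail at sqrt(chi2).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles each).
chi2, p = allelic_chi2(case_alt=600, case_ref=1400, ctrl_alt=500, ctrl_ref=1500)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```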

🧮 Polygenic Risk Scores (PRS)

Development and application of PRS models for predicting disease susceptibility and treatment response.
Key Methods & Techniques:

PRS calculation (PRSice, LDpred, PRS-CS)

Cross-ancestry PRS transferability

Validation in independent cohorts

Integration with clinical covariates
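At its core, a PRS is a weighted sum: for individual i, PRS_i = Σ_j β_j × G_ij, where β_j is the effect size of variant j and G_ij is the individual's effect-allele dosage. Tools such as PRSice, LDpred, and PRS-CS differ mainly in how they select or shrink the β_j to account for linkage disequilibrium. The sketch below only applies weights that are assumed to be final; the variant IDs and values are hypothetical placeholders.

```python
# Minimal sketch: a polygenic score as the effect-size-weighted sum of allele dosages.
# Assumes the weights are already final (e.g. after clumping/thresholding or shrinkage);
# the variant IDs and values below are hypothetical placeholders.

def polygenic_score(dosages, weights):
    """Sum of weight * effect-allele dosage over variants present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


def standardize(scores):
    """Convert raw scores to z-scores within the scored cohort."""
    values = list(scores.values())
    mean = sum(values) / len(values)
    sd = (sum((x - mean) ** 2 for x in values) / (len(values) - 1)) ** 0.5
    return {sid: (x - mean) / sd for sid, x in scores.items()}


weights = {"var_1": 0.12, "var_2": -0.05, "var_3": 0.30}  # per-allele effect sizes (log-odds)
cohort = {
    "sample_A": {"var_1": 2, "var_2": 1, "var_3": 0},  # effect-allele dosages (0, 1, 2)
    "sample_B": {"var_1": 0, "var_2": 2, "var_3": 1},
    "sample_C": {"var_1": 1, "var_2": 0, "var_3": 2},
}
raw = {sid: polygenic_score(d, weights) for sid, d in cohort.items()}
print(raw)
print(standardize(raw))
```

Standardizing within a cohort makes scores comparable as "standard deviations above or below the cohort mean", which is how PRS effects are often reported when integrated with clinical covariates.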

📖 De novo Transcriptome Assembly & Annotation

Reference-free reconstruction of transcriptomes for organisms lacking a genome assembly.
Key Methods & Techniques:

De novo transcriptome assembly (Trinity, Oases)

Isoform discovery & quantification

Transcriptome completeness assessment (BUSCO)

Functional annotation (BLAST, InterProScan, GO, KEGG)

🧫 Single-Cell RNA Sequencing

High-resolution analysis of gene expression at the single-cell level.
Key Methods & Techniques:

Cell type clustering & annotation

Trajectory/pseudotime analysis

Differential expression in single cells

Batch correction & dataset integration
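Before clustering or differential expression, single-cell counts are usually normalized for sequencing depth per cell. A minimal sketch, assuming a scale factor of 10,000 (a widely used default) followed by a natural-log `log1p` transform; the gene symbols are real markers but the counts are made up.

```python
import math

# Minimal sketch: per-cell library-size normalization followed by log(1 + x),
# a common first step before clustering. Assumes QC has already removed empty cells.

def log_normalize(counts_by_cell, scale_factor=10_000):
    """Scale each cell's counts to `scale_factor` total, then apply natural log1p."""
    normalized = {}
    for cell, counts in counts_by_cell.items():
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"{cell} has no counts; filter it during QC")
        normalized[cell] = {
            gene: math.log1p(count / total * scale_factor)
            for gene, count in counts.items()
        }
    return normalized


# Toy UMI counts (made up) for two cells and three marker genes.
cells = {
    "cell_1": {"CD3E": 5, "MS4A1": 0, "LYZ": 15},
    "cell_2": {"CD3E": 0, "MS4A1": 8, "LYZ": 2},
}
for cell, values in log_normalize(cells).items():
    print(cell, {gene: round(v, 2) for gene, v in values.items()})
```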

🧩 Bacterial GWAS

Genome-wide association studies in bacteria using diverse genetic features.
Key Methods & Techniques:

SNP-based bacterial GWAS

GWAS on indels, k-mers, unitigs & orthologous genes

Alignment-based & k-mer–based approaches

Population structure correction
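As a toy illustration of the k-mer-based approach, the sketch below extracts canonical k-mers from each genome and keeps only those present in some isolates but not others; these variable features are what get tested against a phenotype after population structure correction. The isolate names, sequences, and `k=3` are placeholders; real analyses use much longer k-mers (or unitigs) across many genomes.

```python
# Minimal sketch: build a k-mer presence/absence matrix across bacterial genomes,
# the kind of variable features tested in k-mer-based bacterial GWAS.
# Toy sequences and a tiny k are used for illustration only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(sequence, k):
    """Set of canonical k-mers: the lexicographic min of each k-mer and its reverse complement."""
    seq = sequence.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other ambiguity codes
            revcomp = kmer.translate(COMPLEMENT)[::-1]
            found.add(min(kmer, revcomp))
    return found


def presence_absence(genomes, k):
    """Map each k-mer to {genome: 0/1}, keeping only k-mers that vary between genomes."""
    per_genome = {name: canonical_kmers(seq, k) for name, seq in genomes.items()}
    all_kmers = sorted(set().union(*per_genome.values()))
    matrix = {}
    for kmer in all_kmers:
        row = {name: int(kmer in kmers) for name, kmers in per_genome.items()}
        if 0 < sum(row.values()) < len(row):  # uninformative if in all or no genomes
            matrix[kmer] = row
    return matrix


genomes = {"isolate_1": "ATGCGTAC", "isolate_2": "ATGCGAAC", "isolate_3": "TTGCGTAC"}
for kmer, row in presence_absence(genomes, k=3).items():
    print(kmer, row)
```

Canonical k-mers collapse a sequence and its reverse complement into one feature, so the result does not depend on which strand each assembly happens to report.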

🧪 Epigenomics & Chromatin Analysis

Profiling epigenetic modifications and chromatin accessibility.
Key Methods & Techniques:

ChIP-seq, CUT&RUN, CUT&Tag analysis

ATAC-seq for chromatin accessibility

DNA methylation (arrays & WGBS)

Differential methylation analysis
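For array data, differential methylation is commonly assessed on beta values (fraction methylated) or their logit-transformed M-values. A minimal sketch of both transforms and a simple per-CpG delta-beta between groups, assuming the Illumina-style offset of 100 and made-up probe intensities; a full analysis would fit a per-CpG statistical model rather than compare raw group means.

```python
import math

# Minimal sketch: array-style methylation values and a per-CpG delta-beta.
# The offset of 100 follows the common Illumina beta-value convention;
# the intensities below are made up.

def beta_value(methylated, unmethylated, offset=100):
    """Fraction methylated, with an offset that stabilizes low-intensity probes."""
    return methylated / (methylated + unmethylated + offset)


def m_value(beta):
    """Base-2 logit of a beta value; often preferred for statistical testing."""
    return math.log2(beta / (1 - beta))


def delta_beta(group_a, group_b):
    """Difference in mean beta between two groups of samples at one CpG."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)


cases = [beta_value(m, u) for m, u in [(9000, 1000), (8500, 1500), (8800, 1200)]]
controls = [beta_value(m, u) for m, u in [(4000, 6000), (4500, 5500), (5000, 5000)]]
print(f"delta beta = {delta_beta(cases, controls):.3f}")
print([round(m_value(b), 2) for b in cases])
```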

⚙️ Automated Pipeline Development

Design and implementation of reproducible and scalable pipelines for omics data analysis.
Key Methods & Techniques:

Nextflow, Snakemake, Cromwell/WDL

Containerization (Docker, Singularity)

Workflow deployment on HPC & cloud systems

Continuous integration & automated testing

💻 Scientific Computing

High-performance and cloud-based computing for large-scale data analysis.
Key Methods & Techniques:

Parallel & distributed computing (MPI, Dask, Spark)

HPC job scheduling (SLURM, PBS)

Cloud platforms (AWS, GCP, Azure)

Scalable storage & resource optimization

📊 Data Visualization

Advanced visualization techniques for genomic and biological data using modern plotting libraries and interactive visualization tools.
Key Methods & Techniques:

ggplot2, plotly, seaborn, matplotlib

Interactive dashboards (Shiny, Dash, Streamlit)

Genome browser integration

Publication-ready visualizations

🔍 Rare Variant Burden Testing

Statistical methods to evaluate the cumulative effect of rare variants within genes or genomic regions on disease risk.
Key Methods & Techniques:

Burden tests, SKAT, SKAT-O

Collapsing methods

Family-based & population-based testing
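The simplest collapsing approach can be sketched as follows: each individual is reduced to carrier or non-carrier of at least one qualifying rare variant in the gene, and carrier counts are compared between cases and controls with Fisher's exact test. The 1% MAF threshold, variant IDs, and genotypes are illustrative assumptions. SKAT and SKAT-O are variance-component tests that tolerate variants with opposite effect directions; they are a different technique and are not shown here.

```python
from math import comb

# Minimal sketch: a CAST-style collapsing burden test for one gene.
# Each individual is collapsed to carrier (>= 1 qualifying rare allele) or
# non-carrier, then carrier counts are compared with Fisher's exact test.
# The 1% MAF threshold and the toy genotypes below are assumptions.

def is_carrier(genotypes, mafs, maf_threshold=0.01):
    """1 if the individual has an alt allele at any variant rarer than the threshold."""
    return int(any(genotypes.get(v, 0) > 0 for v, maf in mafs.items() if maf < maf_threshold))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p for [[a, b], [c, d]] (rows: cases, controls)."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # hypergeometric probability of x carriers among cases
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


mafs = {"var_1": 0.002, "var_2": 0.004, "var_3": 0.20}  # var_3 is common: excluded
cases = [{"var_1": 1}, {"var_2": 1}, {}, {"var_1": 1}, {"var_3": 1}]
controls = [{}, {}, {"var_3": 2}, {"var_2": 1}, {}]

case_carriers = sum(is_carrier(g, mafs) for g in cases)
ctrl_carriers = sum(is_carrier(g, mafs) for g in controls)
p = fisher_exact_two_sided(case_carriers, len(cases) - case_carriers,
                           ctrl_carriers, len(controls) - ctrl_carriers)
print(case_carriers, ctrl_carriers, round(p, 4))
```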

📈 Post-GWAS Analysis

Downstream functional interpretation of GWAS findings to link genetic associations with biology.
Key Methods & Techniques:

Functional annotation of variants

Gene mapping & gene-based association testing

Pathway & gene set enrichment analysis

Meta-analysis

Cross-trait & pleiotropy analysis

Heritability estimation & partitioned heritability

Statistical fine-mapping & functional priors

eQTL & colocalization analysis

Mendelian Randomization (MR)

Visualization (Manhattan plots, LocusZoom)
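Of the post-GWAS steps above, meta-analysis reduces to a short calculation. A minimal sketch of the fixed-effect inverse-variance method, where each cohort's estimate is weighted by 1/SE², together with Cochran's Q as a basic heterogeneity check; the per-cohort estimates are hypothetical, and random-effects models are often used when Q indicates heterogeneity.

```python
import math

# Minimal sketch: fixed-effect inverse-variance meta-analysis of one variant
# across cohorts, plus Cochran's Q as a basic heterogeneity check.
# The per-cohort estimates below are hypothetical.

def ivw_meta(betas, ses):
    """Return pooled beta, its SE, z-score, two-sided p, and Cochran's Q."""
    weights = [1 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, z, p, q


beta, se, z, p, q = ivw_meta(betas=[0.12, 0.08, 0.15], ses=[0.04, 0.05, 0.06])
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.1e} Q={q:.2f}")
```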

🎶 RNA-seq Data Analysis

Comprehensive transcriptomic analysis from bulk RNA sequencing data.
Key Methods & Techniques:

Raw data QC, alignment, & post-alignment QC

Quantification & differential expression analysis

Pathway enrichment & over-representation analysis

Alternative splicing analysis

Batch correction & removal of unwanted variation

Gene fusion detection

Weighted gene co-expression network analysis (WGCNA)

Allele-specific expression
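For the quantification step, one common within-sample unit is transcripts per million (TPM). A minimal sketch with placeholder transcript IDs and raw lengths (quantifiers typically use effective lengths instead); note that differential expression is normally run on raw counts with count-based models, not on TPM.

```python
# Minimal sketch: transcripts per million (TPM) from read counts and lengths.
# Uses raw transcript lengths for simplicity; quantifiers typically use
# effective lengths that account for fragment size. Values are made up.

def tpm(counts, lengths_bp):
    """Length-normalize counts (reads per kilobase), then scale to sum to one million."""
    rate = {t: counts[t] / (lengths_bp[t] / 1_000) for t in counts}
    total = sum(rate.values())
    return {t: r / total * 1_000_000 for t, r in rate.items()}


counts = {"tx_A": 500, "tx_B": 500, "tx_C": 1000}
lengths = {"tx_A": 1_000, "tx_B": 2_000, "tx_C": 4_000}
print({t: round(v) for t, v in tpm(counts, lengths).items()})
# {'tx_A': 500000, 'tx_B': 250000, 'tx_C': 250000}
```

Dividing by length before scaling is what distinguishes TPM from plain counts per million: tx_C has twice the reads of tx_A but, being four times longer, ends up with half its TPM.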

🔗 Multi-Omics Integration

Integrative analysis of genomics, transcriptomics, epigenomics, proteomics, metabolomics, and other omics layers to obtain a system-level view of biology. Combining layers helps uncover interactions between molecular levels, improve disease classification, elucidate regulatory mechanisms, and gain statistical power and resolution beyond single-omics studies.
Key Methods & Techniques:

Cross-omics correlation and network construction (co-expression, co-methylation, multi-omics networks)

Multi-view data integration methods (e.g., MOFA, DIABLO, joint NMF)

Dimensionality reduction and latent factor modelling

Regulatory inference (linking epigenetic marks to gene expression and downstream effects)

Multi-omics clustering and subtype discovery
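Cross-omics correlation networks start from pairwise associations between features measured on the same samples. A minimal sketch using Spearman correlation between hypothetical CpG methylation and gene expression profiles, keeping strongly correlated pairs as network edges; the feature names, values, and the |rho| ≥ 0.8 cutoff are assumptions. Methods such as MOFA or DIABLO go further by modelling all layers jointly.

```python
import math

# Minimal sketch: Spearman correlations between methylation and expression
# features measured on the same samples, kept as network edges above a cutoff.
# Feature names, values, and the |rho| >= 0.8 cutoff are hypothetical.

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


methylation = {"cpg_1": [0.9, 0.8, 0.3, 0.2, 0.1], "cpg_2": [0.5, 0.4, 0.6, 0.5, 0.4]}
expression = {"gene_X": [1.0, 2.0, 6.0, 7.0, 9.0], "gene_Y": [3.0, 5.0, 2.0, 4.0, 1.0]}

edges = []
for m, m_vals in methylation.items():
    for g, g_vals in expression.items():
        rho = spearman(m_vals, g_vals)
        if abs(rho) >= 0.8:  # keep strong positive or negative associations
            edges.append((m, g, round(rho, 2)))
print(edges)
```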

🦠 Microbial Genome Assembly & Annotation

Complete microbial genome assembly, annotation, and comparative genomics.
Key Methods & Techniques:

De novo genome assembly & quality control

Genome annotation pipelines

Replicon, integron, transposon, prophage & plasmid identification

Phylogroup & MLST typing

Phylogenetic tree construction

Virulence & antimicrobial resistance gene detection

Stress (heat, salt) resistance, biofilm formation & heavy-metal resistance genes
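Assembly quality control usually starts with contiguity statistics. A minimal sketch of N50 and L50 on made-up contig lengths; completeness and contamination checks would complement these numbers.

```python
# Minimal sketch: basic contiguity statistics for assembly QC.
# N50 is the length L such that contigs of length >= L together cover at least
# half of the total assembly; L50 is the number of contigs needed to reach it.

def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0  # empty assembly


contigs = [1_200_000, 950_000, 700_000, 300_000, 45_000, 5_000]  # made-up lengths
n50, l50 = n50_l50(contigs)
print(f"total={sum(contigs):,} bp  N50={n50:,} bp  L50={l50}")
# total=3,200,000 bp  N50=950,000 bp  L50=2
```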

🌍 16S/18S/ITS & Whole-Genome Metagenomics

Comprehensive microbial community analysis using amplicon and shotgun sequencing.
Key Methods & Techniques:

16S, 18S & ITS amplicon analysis

Shotgun metagenomics

Taxonomic profiling (Kraken, MetaPhlAn)

Functional profiling (HUMAnN, eggNOG)

Community structure & diversity metrics
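Community structure is commonly summarized with alpha diversity (within a sample) and beta diversity (between samples). A minimal sketch of the Shannon and Gini-Simpson indices and Bray-Curtis dissimilarity; the sample names and counts are made up, and real pipelines often rarefy or otherwise normalize for sequencing depth first.

```python
import math

# Minimal sketch: common alpha- and beta-diversity metrics from taxon counts.
# Sample names and counts are made up; real pipelines often rarefy or
# otherwise normalize for sequencing depth first.

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over observed taxa."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def gini_simpson(counts):
    """Probability that two reads drawn with replacement come from different taxa."""
    total = sum(counts.values())
    return 1 - sum((c / total) ** 2 for c in counts.values())


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count profiles (0 = identical)."""
    taxa = a.keys() | b.keys()
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


gut_1 = {"Bacteroides": 40, "Prevotella": 10, "Faecalibacterium": 30, "Escherichia": 20}
gut_2 = {"Bacteroides": 70, "Prevotella": 0, "Faecalibacterium": 25, "Escherichia": 5}
print(f"Shannon: {shannon(gut_1):.3f} vs {shannon(gut_2):.3f}")
print(f"Gini-Simpson: {gini_simpson(gut_1):.3f} vs {gini_simpson(gut_2):.3f}")
print(f"Bray-Curtis: {bray_curtis(gut_1, gut_2):.3f}")
```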

🌳 Phylogenetic Analysis

Evolutionary inference using molecular sequences and genomes.
Key Methods & Techniques:

Sequence alignment & phylogenetic tree construction

Molecular clock models

Comparative genomics

Phylogenetic placement & species identification
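Distance-based tree building starts from a matrix of pairwise evolutionary distances. A minimal sketch using the Jukes-Cantor (JC69) correction on a made-up alignment, skipping gapped positions; model-based methods such as maximum likelihood use richer substitution models.

```python
import math

# Minimal sketch: pairwise Jukes-Cantor (JC69) distances from aligned sequences,
# the kind of matrix used by distance-based tree methods such as neighbor joining.
# Gap positions ("-") are skipped; the toy alignment is made up.

def p_distance(seq1, seq2):
    """Proportion of differing sites among positions where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor(seq1, seq2):
    """JC69 distance d = -3/4 * ln(1 - 4p/3); undefined once p reaches 0.75."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("sequences too divergent for JC69 (p >= 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


alignment = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAA",
    "taxon_C": "ACGAAC-TTA",
}
names = list(alignment)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        print(x, y, round(jukes_cantor(alignment[x], alignment[y]), 4))
```

The correction inflates the raw proportion of differences to account for multiple substitutions at the same site, which matters more as sequences diverge.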

🤖 Machine Learning

Application of machine learning for predictive modeling in genomics and biomedicine.
Key Methods & Techniques:

Supervised & unsupervised learning

Feature selection & dimensionality reduction

Deep learning (CNNs, RNNs, transformers)

Model validation & interpretation (SHAP, LIME)
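Model validation rests on evaluating performance on data the model has not seen. A minimal sketch of k-fold cross-validation splits in plain Python; libraries such as scikit-learn provide equivalent, more complete utilities (including stratified splits). The fold count and seed are arbitrary choices.

```python
import random

# Minimal sketch: k-fold cross-validation splits for model validation.
# Every sample appears in exactly one test fold. Any feature selection or
# scaling should be fit on the training indices only, to avoid information leakage.

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for fold, (train, test) in enumerate(k_fold_indices(10, k=3), start=1):
    print(f"fold {fold}: train={sorted(train)} test={sorted(test)}")
```

Running feature selection inside each training fold, rather than once on the full dataset, keeps the held-out estimate honest; selecting features on all samples first is a common source of optimistic results in high-dimensional genomic data.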