My Specializations – drjibinjohn.com

My Specializations

Advanced bioinformatics and data analyses across genomics, transcriptomics, and population-level studies.

🖥️ R & Python Programming

Expert-level programming in R and Python for bioinformatics, statistical analysis, and pipeline development, with extensive use of specialized libraries and frameworks.
Key Methods & Techniques:
  • R (Bioconductor, tidyverse, Shiny)
  • Python (pandas, NumPy, scikit-learn, PyTorch, TensorFlow)
  • Workflow automation & reproducible research
  • Scripting for high-performance computing (HPC & cloud)

🧬 Exome & Whole-Genome Sequencing

Comprehensive analysis of coding and non-coding variants to identify clinically relevant findings.
Key Methods & Techniques:
  • Raw data QC, alignment & post-alignment QC
  • Somatic & germline SNV, indel, and CNV calling
  • Variant QC & annotation
  • Variant interpretation (ACMG/AMP for germline variants, AMP/ASCO/CAP for somatic variants)
  • HPO-based variant prioritization
  • Pedigree/segregation analysis, trio analysis, carrier screening

🌐 Genome-Wide Association Studies (GWAS)

Statistical analysis of genetic variants across the genome to identify loci associated with complex traits and diseases.
Key Methods & Techniques:
  • Data QC & preprocessing
  • Association testing (single variant & gene-based)
  • Covariate adjustment
  • Family-based GWAS methods

🧮 Polygenic Risk Scores (PRS)

Development and application of PRS models for predicting disease susceptibility and treatment response.
Key Methods & Techniques:
  • PRS calculation (PRSice, LDpred, PRS-CS)
  • Cross-ancestry PRS transferability
  • Validation in independent cohorts
  • Integration with clinical covariates
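
As a minimal illustration of the PRS calculation listed above: at its core, a score is the sum of an individual's effect-allele dosages weighted by per-variant effect sizes. The sketch below assumes the weights have already been estimated (for example by one of the tools named above). The function name, sample IDs, and toy values are hypothetical and stand in for real genotype and summary-statistic data.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant effect sizes (e.g. log odds ratios) for three variants.
effect_sizes = [0.5, -0.25, 1.0]

# Hypothetical effect-allele dosages per sample, in the same variant order.
cohort = {
    "sample_A": [0, 1, 2],
    "sample_B": [2, 2, 0],
}

scores = {sample: polygenic_score(d, effect_sizes) for sample, d in cohort.items()}
print(scores)
```

In practice the raw scores are usually standardized within a cohort before being combined with clinical covariates.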

📖 De novo Transcriptome Assembly & Annotation

Reference-free reconstruction of transcriptomes for organisms lacking a genome assembly.
Key Methods & Techniques:
  • De novo transcriptome assembly (Trinity, Oases)
  • Isoform discovery & quantification
  • Transcriptome completeness assessment (BUSCO)
  • Functional annotation (BLAST, InterProScan, GO, KEGG)

🧫 Single-Cell RNA Sequencing

High-resolution analysis of gene expression at the single-cell level.
Key Methods & Techniques:
  • Cell type clustering & annotation
  • Trajectory/pseudotime analysis
  • Differential expression in single cells
  • Batch correction & dataset integration

🧩 Bacterial GWAS

Genome-wide association studies in bacteria using diverse genetic features.
Key Methods & Techniques:
  • SNP-based bacterial GWAS
  • Indels, k-mers, unitigs, orthologous genes
  • Alignment-based & k-mer–based approaches
  • Population structure correction

🧪 Epigenomics & Chromatin Analysis

Profiling epigenetic modifications and chromatin accessibility.
Key Methods & Techniques:
  • ChIP-seq, CUT&RUN, CUT&Tag analysis
  • ATAC-seq for chromatin accessibility
  • DNA methylation (arrays & WGBS)
  • Differential methylation analysis

⚙️ Automated Pipeline Development

Design and implementation of reproducible and scalable pipelines for omics data analysis.
Key Methods & Techniques:
  • Nextflow, Snakemake, Cromwell/WDL
  • Containerization (Docker, Singularity)
  • Workflow deployment on HPC & cloud systems
  • Continuous integration & automated testing

💻 Scientific Computing

High-performance and cloud-based computing for large-scale data analysis.
Key Methods & Techniques:
  • Parallel & distributed computing (MPI, Dask, Spark)
  • HPC job scheduling (SLURM, PBS)
  • Cloud platforms (AWS, GCP, Azure)
  • Scalable storage & resource optimization

📊 Data Visualization

Advanced visualization techniques for genomic and biological data using modern plotting libraries and interactive visualization tools.
Key Methods & Techniques:
  • ggplot2, plotly, seaborn, matplotlib
  • Interactive dashboards (Shiny, Dash, Streamlit)
  • Genome browser integration
  • Publication-ready visualizations

🔍 Rare Variant Burden Testing

Statistical methods to evaluate the cumulative effect of rare variants within genes or genomic regions on disease risk.
Key Methods & Techniques:
  • Burden tests, SKAT, SKAT-O
  • Collapsing methods
  • Family-based & population-based testing
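
To make the collapsing idea above concrete, here is a minimal sketch: each individual's rare variants within a gene are collapsed into a single carrier flag, and carrier counts in cases and controls are then compared with a one-sided Fisher's exact test. This is only one simple form of burden test, not SKAT or SKAT-O. The function names and the toy genotype matrices are hypothetical.

```python
from math import comb


def carrier_flags(genotypes):
    """Collapse per-variant rare-allele counts into one carrier flag per individual."""
    return [int(any(g > 0 for g in row)) for row in genotypes]


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """P(at least `case_carriers` carriers among cases) under the hypergeometric null."""
    carriers = case_carriers + control_carriers
    denom = comb(n_cases + n_controls, carriers)
    upper = min(carriers, n_cases)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


# Hypothetical rare-allele counts: rows are individuals, columns are variants in one gene.
case_genotypes = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
control_genotypes = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]

case_carriers = sum(carrier_flags(case_genotypes))
control_carriers = sum(carrier_flags(control_genotypes))
p_value = fisher_one_sided(
    case_carriers, len(case_genotypes), control_carriers, len(control_genotypes)
)
print(case_carriers, control_carriers, p_value)
```

A regression-based burden test would additionally allow covariate adjustment, which this sketch leaves out.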

📈 Post-GWAS Analysis

Downstream functional interpretation of GWAS findings to link genetic associations with biology.
Key Methods & Techniques:
  • Functional annotation of variants
  • Gene mapping & gene-based association testing
  • Pathway & gene set enrichment analysis
  • Meta-analysis
  • Cross-trait & pleiotropy analysis
  • Heritability estimation & partitioned heritability
  • Statistical fine-mapping & functional priors
  • eQTL & colocalization analysis
  • Mendelian Randomization (MR)
  • Visualization (Manhattan plots, LocusZoom)
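
As one small example from the list above, a fixed-effect meta-analysis of a single variant can be sketched with inverse-variance weighting: each study's effect estimate is weighted by the reciprocal of its squared standard error. The function name and the two toy studies are hypothetical, and the sketch assumes the effect alleles have already been harmonized across studies.

```python
from math import sqrt


def ivw_meta(betas, standard_errors):
    """Fixed-effect inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical per-study effect sizes and standard errors for one variant.
study_betas = [0.2, 0.4]
study_ses = [0.1, 0.2]

pooled_beta, pooled_se = ivw_meta(study_betas, study_ses)
z_score = pooled_beta / pooled_se
print(pooled_beta, pooled_se, z_score)
```

The more precise study (smaller standard error) dominates the pooled estimate, which is the intended behavior of this weighting scheme.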

🎶 RNA-seq Data Analysis

Comprehensive transcriptomic analysis from bulk RNA sequencing data.
Key Methods & Techniques:
  • Raw data QC, alignment & post-alignment QC
  • Quantification & differential expression analysis
  • Pathway enrichment & over-representation analysis
  • Alternative splicing analysis
  • Batch correction & removal of unwanted variation
  • Gene fusion detection
  • Weighted gene co-expression network analysis (WGCNA)
  • Allele-specific expression

🔗 Multi-Omics Integration

Integrative analysis of genomics, transcriptomics, epigenomics, proteomics, metabolomics and other omics layers to obtain a holistic view of biological systems. This approach helps uncover interactions between different molecular layers, improve disease classification, elucidate regulatory mechanisms, and increase power and resolution beyond single-omics studies.
Key Methods & Techniques:
  • Cross-omics correlation and network construction (co-expression, co-methylation, multi-omics networks)
  • Multi-view data integration methods (e.g. MOFA, DIABLO, joint NMF)
  • Dimensionality reduction and latent factor modelling
  • Regulatory inference (linking epigenetic marks to gene expression and downstream effects)
  • Multi-omics clustering and subtype discovery

🦠 Microbial Genome Assembly & Annotation

Complete microbial genome assembly, annotation, and comparative genomics.
Key Methods & Techniques:
  • De novo genome assembly & quality control
  • Genome annotation pipelines
  • Replicon, integron, transposon, prophage & plasmid identification
  • Phylogroup & MLST typing
  • Phylogenetic tree construction
  • Virulence & antimicrobial resistance gene detection
  • Stress/heat/salt resistance genes, biofilm genes, heavy metal resistance genes

🌍 16S/18S/ITS & Whole-Genome Metagenomics

Comprehensive microbial community analysis using amplicon and shotgun sequencing.
Key Methods & Techniques:
  • 16S, 18S & ITS amplicon analysis
  • Shotgun metagenomics
  • Taxonomic profiling (Kraken, MetaPhlAn)
  • Functional profiling (HUMAnN, eggNOG)
  • Community structure & diversity metrics
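
As a brief example of a diversity metric from the list above, the Shannon index of a sample can be sketched from its taxon counts as H = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. The function name and the toy count vectors are hypothetical.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity (natural log) from per-taxon read counts; zero counts are skipped."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one read")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * log(p) for p in proportions)


# Hypothetical per-taxon read counts for two samples.
even_sample = [10, 10, 10, 10]   # four equally abundant taxa
dominated_sample = [40, 0, 0, 0]  # a single taxon holds every read

even_h = shannon_index(even_sample)
dominated_h = shannon_index(dominated_sample)
print(even_h, dominated_h)
```

The evenly distributed sample reaches the maximum for four taxa (ln 4), while the single-taxon sample has zero diversity.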

🌳 Phylogenetic Analysis

Evolutionary inference using molecular sequences and genomes.
Key Methods & Techniques:
  • Sequence alignment & phylogenetic tree construction
  • Molecular clock models
  • Comparative genomics
  • Phylogenetic placement & species identification

🤖 Machine Learning

Application of machine learning for predictive modeling in genomics and biomedicine.
Key Methods & Techniques:
  • Supervised & unsupervised learning
  • Feature selection & dimensionality reduction
  • Deep learning (CNNs, RNNs, transformers)
  • Model validation & interpretation (SHAP, LIME)