// Open to Data + AI Opportunities
Hi, I'm Sumant Jadiyappagoudar — a data analyst specialising in machine learning, SQL analytics, Tableau dashboards, and bioinformatics pipelines.
// Featured Builds
NLP + Machine Learning
Python · scikit-learn · TF-IDF · DistilBERT · FastAPI · Streamlit · NLTK · pandas
Built and deployed an NLP classifier on 20,000 Amazon Fine Food Reviews, achieving 91% accuracy and 0.954 ROC-AUC with TF-IDF, Logistic Regression, class-balanced training, and GridSearchCV optimisation.
Benchmarked against a DistilBERT baseline, built FastAPI and Streamlit inference flows, and documented error-analysis findings around sarcasm, short reviews, and class-level F1 performance.
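The two evaluation metrics named above, ROC-AUC and class-level F1, can be sketched in plain Python. This is an illustrative sketch rather than the project's actual evaluation code: the function names `roc_auc` and `f1_per_class` are hypothetical, and the pairwise AUC loop is only practical for small samples.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive example
    is scored above a random negative one (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_per_class(y_true, y_pred):
    """F1 score computed separately for each class label."""
    result = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        result[cls] = 2 * tp / denom if denom else 0.0
    return result


# Toy data, not taken from the review dataset.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_f1 = f1_per_class([1, 1, 1, 0], [1, 1, 0, 0])
```

Reporting F1 per class alongside overall accuracy helps show whether a high headline score hides weak performance on the smaller class.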
Pharmacovigilance Analytics
Python · pandas · Plotly · Streamlit · SQL · SciPy · PRR
Analysed 528,000 FDA FAERS adverse-event records from 2015–2025 and applied Proportional Reporting Ratio methodology to identify 8,920 disproportionate drug-reaction signals, including 1,403 high-priority pairs.
Built a reproducible pipeline from raw CSVs to a 218,977-report serious-event subset, then developed a Streamlit dashboard with dynamic Plotly views.
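For readers unfamiliar with the Proportional Reporting Ratio, the calculation on a 2x2 contingency table can be sketched as follows. The function names and the example counts are hypothetical, and the thresholds used in the project to flag a signal are not stated here, so none are assumed.

```python
import math


def prr(a, b, c, d):
    """Proportional Reporting Ratio for one drug-reaction pair.

    a: reports with the drug and the reaction
    b: reports with the drug and any other reaction
    c: reports with any other drug and the reaction
    d: reports with any other drug and any other reaction
    """
    return (a / (a + b)) / (c / (c + d))


def prr_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval, computed on the log scale."""
    estimate = prr(a, b, c, d)
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)


# Made-up counts for illustration only.
example_prr = prr(20, 80, 100, 9900)
example_low, example_high = prr_confidence_interval(20, 80, 100, 9900)
```

A PRR well above 1 with a lower confidence bound above 1 suggests the reaction is reported disproportionately often for that drug, which is a prompt for review rather than proof of causation.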
E-commerce Analytics
Python · DuckDB · pandas · XGBoost · SHAP · Streamlit · Plotly
Built an end-to-end analytics project on 99,441 Brazilian e-commerce orders, joining 9 Olist datasets into a DuckDB star schema and engineering RFM, delivery, payment, review, and purchase-behaviour features.
Developed customer segmentation, churn prediction, and 12-month CLV modelling, then shipped a Streamlit dashboard with segment explorer, churn-risk, CLV, and business-summary views.
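The RFM (recency, frequency, monetary) features mentioned above can be derived from an order list roughly like this. It is a minimal standard-library sketch, not the project's DuckDB implementation: the tuple layout `(customer_id, order_date, amount)`, the function name, and the snapshot date are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_features(orders, snapshot):
    """Compute recency (days), frequency (orders) and monetary (total spend)
    per customer from (customer_id, order_date, amount) tuples."""
    grouped = defaultdict(list)
    for customer_id, order_date, amount in orders:
        grouped[customer_id].append((order_date, amount))

    features = {}
    for customer_id, rows in grouped.items():
        last_order = max(order_date for order_date, _ in rows)
        features[customer_id] = {
            "recency_days": (snapshot - last_order).days,
            "frequency": len(rows),
            "monetary": sum(amount for _, amount in rows),
        }
    return features


# Hypothetical orders, not taken from the Olist data.
sample_orders = [
    ("c1", date(2018, 1, 5), 100.0),
    ("c1", date(2018, 3, 1), 50.0),
    ("c2", date(2018, 2, 10), 30.0),
]
sample_rfm = rfm_features(sample_orders, snapshot=date(2018, 3, 31))
```

Fixing the snapshot date, rather than using today's date, keeps the recency values reproducible between runs.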
Bioinformatics Automation
Python · R · FastAPI · PostgreSQL · n8n · Plotly · JavaScript
Engineered an end-to-end bioinformatics pipeline processing 10,000–20,000 genes per dataset using differential gene expression analysis, empirical Bayes methods, and a 0.05 p-value cutoff.
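With 10,000 to 20,000 genes tested at once, a 0.05 cutoff is commonly applied to p-values adjusted for multiple testing. Whether this pipeline applies the cutoff to raw or adjusted values is not stated, so the sketch below assumes a Benjamini-Hochberg adjustment purely for illustration; the function names and gene identifiers are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # the minimum over all equal-or-higher ranks.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def significant_genes(gene_p_values, cutoff=0.05):
    """Names of genes whose adjusted p-value falls below the cutoff."""
    names = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[name] for name in names])
    return [name for name, q in zip(names, adjusted) if q < cutoff]


# Made-up p-values for four hypothetical genes.
hits = significant_genes(
    {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.2}
)
```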
People Analytics
Python · scikit-learn · XGBoost · pandas · SMOTE · seaborn
Built an ML pipeline on the IBM HR Analytics dataset to predict employee attrition with 86.7% accuracy, using XGBoost with SMOTE oversampling to handle the class imbalance (16.1% minority class).
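The core idea behind SMOTE is to create synthetic minority-class rows by interpolating between a real minority sample and one of its nearest minority neighbours. The sketch below illustrates that idea only; it is not the imbalanced-learn implementation, and `smote_sketch` with its parameters is a hypothetical name.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-feature minority class.
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
toy_synthetic = smote_sketch(toy_minority, n_new=5, k=2, seed=42)
```

Oversampling of this kind should be applied to the training split only, so that evaluation data contains no synthetic rows.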
Business Intelligence
SQL · PostgreSQL · Power BI · Data Modeling · DAX
Designed a complete analytics pipeline from raw CSV data to interactive Power BI dashboards. Built DAX measures and data models for revenue, profit margin, and monthly trend drill-downs.
YouTube Analytics
Python · YouTube Data API v3 · PostgreSQL · pandas · SQLAlchemy · seaborn
Built an automated ETL pipeline that extracts India's trending YouTube videos across 5 categories, enriches them with engagement metrics, and loads clean records into PostgreSQL.
Collected 383 unique videos across 35 successful pipeline runs, then analysed category performance, channel-size effects, title patterns, video duration, and engagement behaviour.
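Because the same video can appear in several pipeline runs, a deduplication step is needed to arrive at a count of unique videos. The sketch below shows one way to do that and to derive a simple engagement metric. The field names (`video_id`, `fetched_at`, `views`, `likes`, `comments`) and the engagement formula are assumptions for illustration, not the pipeline's actual schema or metric.

```python
def latest_per_video(snapshots):
    """Keep only the most recent snapshot for each video_id."""
    latest = {}
    for row in snapshots:
        current = latest.get(row["video_id"])
        # ISO 8601 timestamps in the same format compare correctly as strings.
        if current is None or row["fetched_at"] > current["fetched_at"]:
            latest[row["video_id"]] = row
    return list(latest.values())


def engagement_rate(row):
    """Likes plus comments per view (an assumed definition)."""
    views = row["views"]
    return (row["likes"] + row["comments"]) / views if views else 0.0


# Hypothetical snapshots from two runs.
runs = [
    {"video_id": "v1", "fetched_at": "2025-01-01T06:00:00", "views": 1000, "likes": 40, "comments": 10},
    {"video_id": "v1", "fetched_at": "2025-01-02T06:00:00", "views": 2000, "likes": 90, "comments": 10},
    {"video_id": "v2", "fetched_at": "2025-01-01T06:00:00", "views": 0, "likes": 0, "comments": 0},
]
unique_videos = latest_per_video(runs)
rates = {row["video_id"]: engagement_rate(row) for row in unique_videos}
```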
Bioinformatics
Python · Biopython · BioPandas
COVID-19 sequence analysis pipeline built with Biopython for biological pattern exploration and comparative genomics.
Bioinformatics
Python · Biopython · Needleman–Wunsch
Sequence alignment experiments comparing biological sequences with reproducible, peer-reviewable methods.
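Needleman-Wunsch global alignment fills a dynamic-programming score matrix and then traces back from the bottom-right cell. A minimal pure-Python sketch follows; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, not necessarily those used in the experiments.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill: each cell is the best of a diagonal step or a gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


best_score, top, bottom = needleman_wunsch("ACGT", "AGT")
```

When several alignments share the best score, this traceback prefers diagonal moves, so it returns one optimal alignment rather than all of them.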
Tableau
Tableau Public · KPI Design · Filters · Storytelling
Built an interactive Tableau dashboard with clear KPI views, filters, and narrative visuals for stakeholder-friendly reporting.
// About Me
I specialise in creating data products that are not only accurate, but useful for real decisions. My work combines machine learning, robust analysis, and communication that stakeholders can act on.
class DataAnalyst:
    name = "Sumant J"
    focus = [
        "Analytics",
        "Machine Learning",
        "Bioinformatics",
    ]
    tools = {
        "query": "SQL",
        "visualise": "Tableau",
        "model": "scikit-learn",
    }

    def value(self):
        return "insight → action"
// Let's Build
If you're hiring for data, analytics, or AI-focused roles, I'd love to connect and discuss how I can contribute to your team.