abhishek.kumar
Open to international remote & DevRel at AI-native teams

Abhishek Kumar.

I’m an LLM engineer.

Ten years of building production AI, mostly in the parts you don’t see — retrieval that doesn’t lie, evals that don’t lie either, and agents that politely admit when they’re wrong. Currently leading enterprise RAG delivery for a Tier-1 engagement; on weekends I run a 7,000-person practitioner community in Bangalore.

Abhishek Kumar sitting on stone steps in a white shirt and sunglasses, string lights and a tiled-roof building behind him. Black and white.
Bangalore-based. Internet-based.
Currently mid-thought.

A short, dated list of what’s top of mind.

Last updated: May 2026 · in the spirit of nownownow.com
  • JobPilot: a personal job-hunt copilot for the v0 global hackathon. Next.js + Neon Postgres + Apify on Vercel. Shipping the v0 submission this week. · building
  • Porting the Superpowers plugin to Kiro IDE on Windows; pairing on Windows is a love story I’m still negotiating. · porting
  • Publishing Claude Agent Skills: v0-prompting, humanizer, AI Ark, 360Brew profile optimizer, and a LinkedIn content one I keep tweaking at 2am. · shipping
  • Quietly exploring international remote roles and DevRel positions at AI-native companies. (Yes, this is a hint.) · listening
  • Reading: the PageIndex paper, agentic eval harnesses, and anything Anthropic’s research org puts on arXiv at 11pm IST. · reading

Four outcomes, not a wall of skills.

I think people hire engineers for results, not for buzzword bingo. So here’s the work, grouped by what it actually produces.

01 / production

Production RAG systems

Compliance-grade retrieval pipelines that cite their sources and don’t make things up: hybrid retrieval, reranking, and a chunking strategy that survives a 300-page PDF.

RAG (retrieval kind) Bedrock Qdrant FastAPI
02 / quality gates

LLM evaluation & quality gates

Eval harnesses that turn “vibes” into a number: RAGAS, TruLens, LLM-as-judge scoring, and regression suites that ship with every prompt change.

RAGAS TruLens DeepEval judge-LLMs
03 / agents

AI agent architectures

Multi-agent orchestration, tool calling, and MCP integrations — built so you can read the trace and understand exactly why it did what it did.

LangGraph MCP Anthropic SDK DSPy
04 / leadership

GenAI engineering leadership

Taking AI pilots from prototype to production with governance, cost controls, and quality gates. The unglamorous middle of the lifecycle, owned end-to-end.

pilot → prod governance cost controls EU AI Act

Six things I’ve shipped, roughly in order of recency.

Pulled from the resume. Each one has a working artifact behind it — happy to walk through eval results, retrieval configs, or the bits that almost shipped sideways.

2026 → now

PIA Compliance RAG Pipeline

Production

“Claude reads 300 pages of privacy law and cites its sources.”

End-to-end production RAG for a Tier-1 enterprise engagement: Claude Sonnet on Bedrock, Qdrant for hybrid retrieval, FastAPI orchestration, and RAGAS quality gates. Running a formal head-to-head of PageIndex vs. chunked RAG with reranking, measuring faithfulness, context precision, latency, and per-query cost; the result is the decision framework other compliance workflows now follow.

Claude Sonnet · Bedrock · Qdrant · RAGAS · PageIndex · FastAPI

2025

Enterprise Asset Classification at Scale

Production

“222 asset classes, structured outputs, a multi-week process killed.”

LLM classifier on Claude/Bedrock for a regulated enterprise client, with structured-output validation, confidence scoring, rationale generation, and a human-in-the-loop review queue for audit defensibility. Replaced a multi-week manual categorization process with near-real-time classification.

Claude · Bedrock · structured outputs · HITL · governance

2025

JobPilot

In flight

“A job-hunt copilot that actually reads the listings.”

Personal project, v0 global hackathon submission. Next.js + Neon Postgres + Apify scrapers on Vercel, with retrieval over my own work history and a small evaluator for fit. Currently being dogfooded on my own search.

Next.js · Neon · Apify · v0 · personal

2024

Claude Interview Platform

🏆 Hackathon Winner

“$10K and a year of validation.”

Winner of the AWS Global GenAI Hackathon. Claude on Bedrock with dynamic, context-aware question generation and RAG-grounded transcript evaluation against role competencies, plus prompt orchestration, structured-output scoring, and quality validation across both generated and evaluated content.

Claude · Bedrock · RAG · eval-LLM · $10K prize

2024

PharmaGraph

Production

“A knowledge graph that explains itself to a pharmacist.”

Neo4j knowledge graph with LLM-based entity linking and a RAG retrieval interface for complex pharmacological queries, built for a large US healthcare deployment. Every response is grounded in a source document, with inline citation and auditor-visible provenance. Stack: Azure OpenAI + Azure AI Search.

Neo4j · Azure OpenAI · AI Search · grounded RAG · provenance

2025

GTMind

Personal

“Multi-tenant B2B prospecting, wrapped in MCP.”

Multi-tenant B2B prospecting SaaS exposed as an MCP server, built with FastAPI + the Anthropic MCP SDK. Clients pull leads through Claude: the agent does the boring part, the human makes the call.

FastAPI · MCP · Anthropic SDK · multi-tenant

A decade, four chapters, one consistent through-line.

Started in analytics, ended up in production GenAI. The titles shifted; the work — turning messy real-world data into systems people can trust — didn’t.

2015 → 2020 Analytics & Data Science (multiple roles)
Gambit Sports · Rooster Properties · Jupiter Infrastructure

Started in sports and operational analytics; built predictive models, ETL pipelines, and early ML systems across sports, real estate, and infrastructure verticals.

2021 → 2023 Lead Business Analyst
Practo Technologies · healthcare consumer tech

Marketing Mix Modeling, real-time campaign optimization, user segmentation. First serious production ML work — the kind that has SLAs.

2023 → 2025 Manager, Data Science & Analytics
Factspan Analytics

Transitioned into GenAI leadership. Built production RAG, knowledge graphs, and MLOps infrastructure. Won the AWS Global GenAI Hackathon during this period.

2025 → now Digital Engineering Staff Engineer, AI & GenAI
NTT DATA

Leading enterprise RAG delivery, LLM evaluation frameworks, and AI governance for Tier-1 clients. Architecting parallel cloud-native reference implementations across AWS Bedrock and Azure OpenAI.

7,000 practitioners, 40+ events, one chai habit.

I lead the Bangalore chapter of The AI Collective — the largest practitioner community in India for production AI engineers. Since March 2025, we’ve shipped 40+ events, hosted speakers from across the ecosystem, and built the kind of room where people argue about eval harnesses and then go get dinner.

Apr 2026 · Bangalore
Claude Code Meetup
200 attendees · speaker + organizer
Mar 2026 · Bangalore
Vercel v0 Buildathon
Buildathon co-lead
Feb 2026 · Microsoft
GenAI Demo Day 2.0
Hosted at Microsoft Bangalore
Jan 2026 · Bangalore
Genspark Event
80 attendees · talk + Q&A
2025 — ongoing
Andela — AI for DevOps
Cohort facilitator (paid)
Since Mar 2025
AI Collective — Bangalore
Chapter lead · 7,000+ members
In partnership with Microsoft, Anthropic, Vercel, Genspark, and Andela.

Mostly Claude Agent Skills and notes on what broke.

I publish small, sharp agent skills — opinionated prompts and tools designed to make Claude better at one specific thing. A few are below; the rest are in drafts.

Published agent skills: v0-prompting · humanizer (de-AIs your copy) · AI Ark · OpenClaw · 360Brew (profile optimizer) · LinkedIn content
Soon · essay “What I got wrong about RAG eval (twice).”
Soon · teardown “PageIndex vs chunked RAG: an honest benchmark.”
Soon · field notes “Running a 7,000-person AI community on weekends.”

The short version of the long PDF.

M.S., Machine Learning & AI · Liverpool John Moores University, UK · 2019 → 2021
PG Diploma, ML & AI · IIIT Bangalore · 2019 → 2020
B.Tech, Engineering · Manipal Institute of Technology · 2012 → 2015
  • 🏆 AWS Global GenAI Hackathon Winner
  • GenAI Academy · Green Belt
  • Azure AI Fundamentals
  • Azure Generative AI
  • GCP Intro to GenAI
  • deeplearning.ai · DL Specialization
  • CSPO® · Scrum Alliance

Working on something interesting? Send a note.

I read everything. Best for: production RAG / eval consulting, DevRel conversations at AI-native companies, community collabs, or just a good thread about retrieval. Slow for: cold sales pitches.
