Life Science HPC on AWS MCP Server¶
A RAG-powered search system that provides semantic search across documentation for products, platforms, and services commonly used in AWS-centric life science HPC environments. It is designed to assist with:
- Design and deployment of HPC clusters and computational chemistry pipelines
- Troubleshooting application-specific and Slurm job failures
- Configuration of complex HPC software stacks
The server exposes 32 MCP tools covering CryoSPARC (cryo-EM), GROMACS (molecular dynamics), PLUMED (enhanced sampling), RELION (cryo-EM), Schrodinger Suite (computational chemistry) and the Schrödinger support Knowledge Base, Slurm (workload management), Posit Workbench (multi-user IDE / HPC integration), and BioTeam consulting notes. These tools make the documentation accessible to AI assistants through the Model Context Protocol.
Site access
Most pages on this site are public, including this landing page, About, Costs, the Documentation Sources catalog, and the MCP Tools reference. The /health endpoint is also public.
The following areas require SSO authentication with a named BioTeam user account and will redirect you to Okta to sign in:
- Admin — architecture, deployment, and ingestion internals
- Integration — MCP client configuration guides
- Stats — live in-memory usage dashboard
- `/observability/` — hourly aggregate access/traffic dashboard (contains security-relevant logs)
MCP server requires authentication
The MCP server endpoint (/mcp) requires SSO authentication independently of the docs site. When connecting from Claude Code or another MCP client, you will be prompted to authenticate via Okta with your BioTeam credentials. Only named BioTeam users have access.
Internal sandbox project
This is an internal BioTeam sandbox/test project exploring RAG-powered MCP servers on AWS. Infrastructure costs are tracked publicly as part of the experiment to understand the operational cost profile of this architecture.
Documentation Sources¶
| Source | Chunks | Description |
|---|---|---|
| CryoSPARC | 1,713 | Cryo-EM data processing — job types, tutorials, setup guides |
| GROMACS | 2,108 | Molecular dynamics — installation, mdp options, reference manual (2026.1 + 2025.4) |
| PLUMED | 3,790 | Enhanced sampling — actions, CVs, biases, tutorials (v2.10 + v2.9) |
| Schrodinger Suite | 41,842 | Computational chemistry — Glide, FEP+, Maestro, Desmond, and 70+ products across 2025-4, 2026-1, and 2026-2, plus public release notes for all three |
| Schrödinger KB | 747 | Support-team Knowledge Base articles — error decoding, install/license troubleshooting, Job Server diagnostics |
| Slurm | 2,141 | HPC workload manager — commands, configuration, FAQ from 7 sources |
| RELION | 264 | Cryo-EM structure determination — SPA/STA tutorials, installation, AWS ParallelCluster build guide |
| Posit Workbench | 788 | Workbench Server Pro admin guide — Slurm/K8s Job Launcher, authentication, load balancing |
| BioTeam Consulting Notes | 142 | Postmortems, build guides, troubleshooting from BioTeam HPC consulting work |
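As a quick sanity check on the table above, the chunk counts can be summed to show how the index is distributed. The sketch below only restates the numbers from the Chunks column; it adds up to 53,535 chunks, of which Schrodinger Suite accounts for roughly 78%.

```python
# Chunk counts copied from the Documentation Sources table.
CHUNKS = {
    "CryoSPARC": 1_713,
    "GROMACS": 2_108,
    "PLUMED": 3_790,
    "Schrodinger Suite": 41_842,
    "Schrödinger KB": 747,
    "Slurm": 2_141,
    "RELION": 264,
    "Posit Workbench": 788,
    "BioTeam Consulting Notes": 142,
}


def chunk_shares(chunks):
    """Return each source's share of the whole index, as a percentage."""
    total = sum(chunks.values())
    return {name: 100 * count / total for name, count in chunks.items()}


total_chunks = sum(CHUNKS.values())
shares = chunk_shares(CHUNKS)

print(f"total chunks: {total_chunks:,}")
# List sources from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:<26} {pct:5.1f}%")
```

Because one source dominates the index, the per-source tools (and the product/version filters on the Schrodinger search tool) are likely to matter more than a single catch-all search.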
Available Tools (32)¶
CryoSPARC (4 tools)¶
- `cryosparc_search_docs` — Semantic search across CryoSPARC documentation
- `cryosparc_search_job_type` — Search scoped to job type documentation
- `cryosparc_list_sections` — List documentation sections with chunk counts
- `cryosparc_get_doc_stats` — Index statistics
Schrodinger Suite (5 tools)¶
- `schrodinger_search_docs` — Semantic search with product/version filtering (covers reference docs + release notes)
- `schrodinger_search_linux_packages` — Exact Linux package dependency lookup
- `schrodinger_list_products` — List indexed products and sections
- `schrodinger_list_versions` — List indexed documentation versions
- `schrodinger_get_doc_stats` — Index statistics with version breakdown
Schrödinger KB (3 tools)¶
- `schrodinger_kb_search` — Semantic search across the support Knowledge Base with optional theme filter
- `schrodinger_kb_list_articles` — Enumerate indexed articles, optionally filtered by theme
- `schrodinger_kb_get_doc_stats` — Index statistics with theme and triage-score breakdown
Slurm (6 tools)¶
- `slurm_search_docs` — General semantic search with filters
- `slurm_search_command` — Command-specific search (sbatch, slurm.conf, etc.)
- `slurm_search_faq` — FAQ and troubleshooting search
- `slurm_list_topics` — Browse documentation by type/category/source
- `slurm_list_sources` — Show all sources with version, date, chunk counts
- `slurm_get_doc_stats` — Index statistics
RELION (4 tools)¶
- `relion_search_docs` — Semantic search across RELION 5.x docs, community guides, and ParallelCluster build guide
- `relion_search_job_type` — Search scoped to SPA/STA job type documentation
- `relion_list_sections` — List documentation sections with chunk counts
- `relion_get_doc_stats` — Index statistics with authority and job type breakdown
GROMACS (3 tools)¶
- `gromacs_search_docs` — Semantic search across GROMACS manual (2026.1 + 2025.4) with version/section filters
- `gromacs_list_sections` — List documentation sections with chunk counts
- `gromacs_get_doc_stats` — Index statistics with version and section breakdown
PLUMED (3 tools)¶
- `plumed_search_docs` — Semantic search across PLUMED docs (v2.10 + v2.9) with version/section filters
- `plumed_list_sections` — List sections (modules) with chunk counts
- `plumed_get_doc_stats` — Index statistics with version and section breakdown
Posit Workbench (1 tool)¶
- `posit_search_docs` — Semantic search across Posit Workbench Server Pro admin documentation with optional section filter
BioTeam Consulting Notes (3 tools)¶
- `bioteam_search_consulting_notes` — Search postmortems, build guides, and troubleshooting notes with optional software filter
- `bioteam_list_notes` — List all consulting notes grouped by software area
- `bioteam_get_doc_stats` — Index statistics by software and note type
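MCP clients invoke the tools listed above with JSON-RPC 2.0 `tools/call` requests. The sketch below shows only the shape of such a message. The `query` argument name is an assumption for illustration; each tool's real input schema is published in the MCP Tools reference and in the server's `tools/list` response. A real client would also complete the `initialize` handshake and attach an OAuth access token before sending anything.

```python
import json
from itertools import count

# JSON-RPC ids only need to be unique within one session.
_request_ids = count(1)


def build_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "query" is a hypothetical argument name, not confirmed by this page.
request = build_tool_call(
    "slurm_search_faq",
    {"query": "job stuck in PENDING with reason Resources"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds these messages for you; the point here is only that every tool name in the lists above is addressed the same way.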
Connect¶
All clients authenticate via OAuth 2.1 — you'll be prompted to sign in with Okta on first connection.
Add to Claude Code:
claude mcp add --transport http hpc-docs https://hpc-mcp.apps.bioteam.cloud/mcp
Add to .vscode/mcp.json:
{
  "servers": {
    "hpc-docs": {
      "type": "http",
      "url": "https://hpc-mcp.apps.bioteam.cloud/mcp"
    }
  }
}
Add to ~/.codex/config.toml:
[mcp_servers.hpc-docs]
type = "http"
url = "https://hpc-mcp.apps.bioteam.cloud/mcp"
Add to Claude Desktop settings (claude_desktop_config.json):
{
  "mcpServers": {
    "hpc-docs": {
      "type": "http",
      "url": "https://hpc-mcp.apps.bioteam.cloud/mcp"
    }
  }
}
See the Integration Guide for detailed setup instructions, usage examples, and troubleshooting.
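If a JSON config file already lists other MCP servers, the `hpc-docs` entry should be merged in rather than pasted over the file. The helper below is a minimal sketch of that merge. `add_server` is a hypothetical helper name, and the sketch assumes the file is strict JSON (no comments) and that the caller passes the right top-level key (`servers` for `.vscode/mcp.json`, `mcpServers` for `claude_desktop_config.json`).

```python
import json
from pathlib import Path

SERVER_NAME = "hpc-docs"
SERVER_ENTRY = {"type": "http", "url": "https://hpc-mcp.apps.bioteam.cloud/mcp"}


def add_server(config_path, top_level_key="servers"):
    """Merge the hpc-docs entry into a JSON MCP config, keeping other servers."""
    config_path = Path(config_path)
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault(top_level_key, {})[SERVER_NAME] = dict(SERVER_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_server(".vscode/mcp.json")` would update a VS Code workspace config, and `add_server(path, "mcpServers")` would do the same for a file that uses the `mcpServers` key.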
Complementary MCP Sources¶
AWS Knowledge MCP¶
For AWS infrastructure topics — ParallelCluster, Parallel Computing Service (PCS), EFA, FSx for Lustre, EC2 instance types, networking, and IAM — configure the AWS Knowledge MCP alongside this server. The two complement each other:
| Server | Covers |
|---|---|
| hpc-docs (this server) | Application-layer documentation: CryoSPARC, GROMACS, PLUMED, RELION, Schrodinger Suite, Slurm, Posit Workbench |
| AWS Knowledge MCP | Infrastructure-layer documentation: ParallelCluster, PCS, EFA, FSx, EC2, CloudFormation |
The AWS Knowledge MCP has strong coverage of both AWS HPC orchestration services:
- ParallelCluster — Full v3 configuration reference, Slurm customization (CustomSlurmSettings, memory-based scheduling), extensive troubleshooting (cluster creation failures, node initialization, scaling issues with specific failure codes and log paths), and current awareness (latest releases). Covers both v2 and v3 docs, CloudFormation integration, and AWS Batch mode.
- Parallel Computing Service (PCS) — User guides, API reference, compute node group management, Slurm CLI filter plugins, custom Slurm settings (60+ parameters), managed accounting, troubleshooting (bootstrap failures, node registration, instance termination), and HPC blog posts with real-world deployment patterns.
Configure it in Claude Code:
claude mcp add --transport http aws-knowledge https://knowledge-mcp.global.api.aws
Use both servers together
When working on AWS-based HPC deployments, configure both MCP servers. Ask application-specific questions (CryoSPARC job configuration, Schrodinger licensing, Slurm partition tuning) against hpc-docs, and infrastructure questions (ParallelCluster cluster config, EFA placement groups, FSx for Lustre performance) against AWS Knowledge MCP.
Architecture¶
Embedding Engine¶
- Model: BAAI/bge-base-en-v1.5 (768 dimensions)
- Backend: fastembed with ONNX Runtime
- Query latency: ~5ms per embedding
- Why ONNX: Native ARM64 NEON/SVE acceleration on Graviton, and a footprint of ~200 MB vs ~2 GB for PyTorch. Ingest is slower, but query time (what matters when serving) is comparable.
Vector Store¶
- Engine: ChromaDB with HNSW index
- Collections: One per source (`cryosparc_docs`, `gromacs_docs`, `plumed_docs`, `schrodinger_docs`, `slurm_docs`, `relion_docs`, `bioteam_consulting_notes`)
- Storage: SQLite-backed, baked into the Docker image at build time
- Portability: The `chroma_db/` directory is fully cross-platform (x86_64, aarch64). Ingest locally, deploy the same bytes to Graviton.
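A rough size estimate helps explain why baking the index into the image is practical. The sketch below assumes one 768-dimension vector per chunk, stored as float32, and uses the 53,535 total from the Chunks column of the Documentation Sources table. It counts raw vector data only; the HNSW graph, SQLite metadata, and stored document text add to this.

```python
EMBEDDING_DIM = 768      # BAAI/bge-base-en-v1.5 output size
BYTES_PER_VALUE = 4      # assumption: vectors are stored as float32
TOTAL_CHUNKS = 53_535    # sum of the Chunks column in Documentation Sources


def raw_vector_bytes(n_vectors, dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed for the embedding values alone, with no index overhead."""
    return n_vectors * dim * bytes_per_value


size_bytes = raw_vector_bytes(TOTAL_CHUNKS)
size_mib = size_bytes / 2**20
print(f"{size_bytes:,} bytes = {size_mib:.1f} MiB")
```

Under these assumptions the raw vectors come to about 157 MiB, which leaves headroom within the 2 GB Fargate task described under Infrastructure.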
MCP Server¶
- Framework: FastMCP (from the `mcp` Python package)
- Transport: Streamable HTTP with `stateless_http=True`
- Auth: MCP OAuth 2.1 Authorization Server with Okta delegation
- Landing page: MkDocs Material static site served at `/`, built in a separate Docker stage
Authentication¶
The system has two independent auth layers, both delegating to Okta:
| Layer | Protects | Mechanism | Token Lifetime |
|---|---|---|---|
| MCP OAuth 2.1 AS | `/mcp` endpoint | MCP server acts as its own OAuth AS, delegates user auth to Okta. Issues opaque access/refresh tokens in-memory. | Access: 1h, Refresh: 24h |
| Docs OIDC | Documentation pages | CF Function validates HMAC-SHA256 session cookie. Lambda@Edge handles Okta OIDC callback. | Session cookie: 8h |
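The docs-layer check in the table can be illustrated with a short sketch. This is not the site's CloudFront Function (those run JavaScript, and its cookie layout is not documented here). It is a Python illustration of the same idea, with a hypothetical cookie format of `<base64url JSON payload>.<hex HMAC-SHA256>` and the 8h lifetime from the table.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SESSION_TTL_SECONDS = 8 * 60 * 60  # 8h session cookie lifetime


def sign_session(subject: str, key: bytes, now: Optional[float] = None) -> str:
    """Return `<payload>.<signature>` (hypothetical format, for illustration)."""
    issued = time.time() if now is None else now
    payload = json.dumps({"sub": subject, "exp": int(issued) + SESSION_TTL_SECONDS})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    signature = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def verify_session(cookie: str, key: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature matches and the session is unexpired."""
    body, separator, signature = cookie.rpartition(".")
    if not separator:
        return None
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return payload if payload["exp"] > current else None
```

The signature is checked before the payload is parsed, so a tampered cookie is rejected without its contents ever being trusted.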
The MCP server implements Dynamic Client Registration (RFC 7591) so clients like Claude Code can self-register on first connection. Okta's Org Authorization Server doesn't support open DCR, which is why the MCP server acts as its own OAuth AS rather than pointing clients directly at Okta. See the architecture docs for detailed sequence diagrams of both auth flows.
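For reference, a Dynamic Client Registration request is a JSON POST of client metadata. The sketch below builds such a request without sending it. The field names come from RFC 7591, but the specific values, the `/register` path, and the loopback redirect URI are assumptions for illustration; a real client reads the `registration_endpoint` from the authorization server's metadata document.

```python
import json
import urllib.request


def build_registration_request(registration_endpoint, client_name, redirect_uri):
    """Build (but do not send) an RFC 7591 client registration request."""
    metadata = {
        "client_name": client_name,
        "redirect_uris": [redirect_uri],
        "grant_types": ["authorization_code", "refresh_token"],
        "response_types": ["code"],
        # Public client with no secret; it relies on PKCE instead.
        "token_endpoint_auth_method": "none",
    }
    return urllib.request.Request(
        registration_endpoint,
        data=json.dumps(metadata).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_registration_request(
    "https://hpc-mcp.apps.bioteam.cloud/register",  # hypothetical path
    "example-mcp-client",                           # hypothetical client name
    "http://127.0.0.1:8765/callback",               # hypothetical loopback URI
)
print(req.get_method(), req.full_url)
print(req.data.decode())
```

A successful registration response would carry a `client_id` that the client then uses in the authorization code flow.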
Infrastructure¶
- Compute: AWS ECS Fargate, ARM64 (Graviton), 1 vCPU / 2 GB
- Edge: CloudFront with path-based routing, TLS termination, ACM cert
- Storage: S3 private bucket with OAC for documentation site
- NAT: fck-nat instance (cost-optimized, ~$3/month vs ~$32/month for NAT Gateway)
- DNS: Route 53 at `hpc-mcp.apps.bioteam.cloud`
See the full architecture docs for detailed diagrams, design decisions, and infrastructure configuration.
Contact¶
For questions about this project or BioTeam's HPC consulting services, visit bioteam.net.