Life Science HPC on AWS MCP Server

A RAG-powered search system that provides semantic search across documentation for products, platforms, and services commonly used in AWS-centric life science HPC environments. It is designed to assist with:

  • Design and deployment of HPC clusters and computational chemistry pipelines
  • Troubleshooting application-specific and Slurm job failures
  • Configuration of complex HPC software stacks

The server exposes 32 MCP tools covering CryoSPARC (cryo-EM), GROMACS (molecular dynamics), PLUMED (enhanced sampling), RELION (cryo-EM), Schrodinger Suite (computational chemistry) plus the Schrödinger support Knowledge Base, Slurm (workload management), Posit Workbench (multi-user IDE / HPC integration), and BioTeam consulting notes. These tools make the documentation accessible to AI assistants through the Model Context Protocol (MCP).

Site access

Most pages on this site are public, including this landing page, About, Costs, the Documentation Sources catalog, and the MCP Tools reference. The /health endpoint is also public.

The following areas require SSO authentication with a named BioTeam user account and will redirect you to Okta to sign in:

  • Admin — architecture, deployment, and ingestion internals
  • Integration — MCP client configuration guides
  • Stats — live in-memory usage dashboard
  • /observability/ — hourly aggregate access/traffic dashboard (contains security-relevant logs)

MCP server requires authentication

The MCP server endpoint (/mcp) requires SSO authentication independently of the docs site. When connecting from Claude Code or another MCP client, you will be prompted to authenticate via Okta with your BioTeam credentials. Only named BioTeam users have access.

Internal sandbox project

This is an internal BioTeam sandbox/test project exploring RAG-powered MCP servers on AWS. Infrastructure costs are tracked publicly as part of the experiment to understand the operational cost profile of this architecture.

Documentation Sources

| Source | Chunks | Description |
|---|---|---|
| CryoSPARC | 1,713 | Cryo-EM data processing: job types, tutorials, setup guides |
| GROMACS | 2,108 | Molecular dynamics: installation, mdp options, reference manual (2026.1 + 2025.4) |
| PLUMED | 3,790 | Enhanced sampling: actions, CVs, biases, tutorials (v2.10 + v2.9) |
| Schrodinger Suite | 41,842 | Computational chemistry: Glide, FEP+, Maestro, Desmond, and 70+ products across 2025-4, 2026-1, and 2026-2, plus public release notes for all three |
| Schrödinger KB | 747 | Support-team Knowledge Base articles: error decoding, install/license troubleshooting, Job Server diagnostics |
| Slurm | 2,141 | HPC workload manager: commands, configuration, FAQ from 7 sources |
| RELION | 264 | Cryo-EM structure determination: SPA/STA tutorials, installation, AWS ParallelCluster build guide |
| Posit Workbench | 788 | Workbench Server Pro admin guide: Slurm/K8s Job Launcher, authentication, load balancing |
| BioTeam Consulting Notes | 142 | Postmortems, build guides, troubleshooting from BioTeam HPC consulting work |

Available Tools (32)

CryoSPARC (4 tools)

  • cryosparc_search_docs — Semantic search across CryoSPARC documentation
  • cryosparc_search_job_type — Search scoped to job type documentation
  • cryosparc_list_sections — List documentation sections with chunk counts
  • cryosparc_get_doc_stats — Index statistics

Schrodinger Suite (5 tools)

  • schrodinger_search_docs — Semantic search with product/version filtering (covers reference docs + release notes)
  • schrodinger_search_linux_packages — Exact Linux package dependency lookup
  • schrodinger_list_products — List indexed products and sections
  • schrodinger_list_versions — List indexed documentation versions
  • schrodinger_get_doc_stats — Index statistics with version breakdown

Schrödinger KB (3 tools)

  • schrodinger_kb_search — Semantic search across the support Knowledge Base with optional theme filter
  • schrodinger_kb_list_articles — Enumerate indexed articles, optionally filtered by theme
  • schrodinger_kb_get_doc_stats — Index statistics with theme and triage-score breakdown

Slurm (6 tools)

  • slurm_search_docs — General semantic search with filters
  • slurm_search_command — Command-specific search (sbatch, slurm.conf, etc.)
  • slurm_search_faq — FAQ and troubleshooting search
  • slurm_list_topics — Browse documentation by type/category/source
  • slurm_list_sources — Show all sources with version, date, chunk counts
  • slurm_get_doc_stats — Index statistics

RELION (4 tools)

  • relion_search_docs — Semantic search across RELION 5.x docs, community guides, and ParallelCluster build guide
  • relion_search_job_type — Search scoped to SPA/STA job type documentation
  • relion_list_sections — List documentation sections with chunk counts
  • relion_get_doc_stats — Index statistics with authority and job type breakdown

GROMACS (3 tools)

  • gromacs_search_docs — Semantic search across GROMACS manual (2026.1 + 2025.4) with version/section filters
  • gromacs_list_sections — List documentation sections with chunk counts
  • gromacs_get_doc_stats — Index statistics with version and section breakdown

PLUMED (3 tools)

  • plumed_search_docs — Semantic search across PLUMED docs (v2.10 + v2.9) with version/section filters
  • plumed_list_sections — List sections (modules) with chunk counts
  • plumed_get_doc_stats — Index statistics with version and section breakdown

Posit Workbench (1 tool)

  • posit_search_docs — Semantic search across Posit Workbench Server Pro admin documentation with optional section filter

BioTeam Consulting Notes (3 tools)

  • bioteam_search_consulting_notes — Search postmortems, build guides, and troubleshooting notes with optional software filter
  • bioteam_list_notes — List all consulting notes grouped by software area
  • bioteam_get_doc_stats — Index statistics by software and note type
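
MCP clients invoke the tools listed above with JSON-RPC 2.0 `tools/call` requests. As a rough sketch of what a client sends, the snippet below builds such a request body for `slurm_search_docs`. The `query` argument name is an assumption made for illustration; the real input schema for each tool is whatever the server returns from `tools/list`.

```python
import json
from itertools import count

# JSON-RPC ids only need to be unique within a session.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request body for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "query" is an assumed argument name, not confirmed by this page.
request = build_tool_call(
    "slurm_search_docs",
    {"query": "job pending with reason Resources"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this framing, along with the OAuth handshake, so the sketch is mainly useful for reading traffic logs or debugging a client.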

Connect

All clients authenticate via OAuth 2.1 — you'll be prompted to sign in with Okta on first connection.

For Claude Code, run:

claude mcp add --transport http hpc-docs https://hpc-mcp.apps.bioteam.cloud/mcp

Add to .vscode/mcp.json:

{
  "servers": {
    "hpc-docs": {
      "type": "http",
      "url": "https://hpc-mcp.apps.bioteam.cloud/mcp"
    }
  }
}

Add to ~/.codex/config.toml:

[mcp_servers.hpc-docs]
type = "http"
url = "https://hpc-mcp.apps.bioteam.cloud/mcp"

Add to Claude Desktop settings (claude_desktop_config.json):

{
  "mcpServers": {
    "hpc-docs": {
      "type": "http",
      "url": "https://hpc-mcp.apps.bioteam.cloud/mcp"
    }
  }
}

See the Integration Guide for detailed setup instructions, usage examples, and troubleshooting.

Complementary MCP Sources

AWS Knowledge MCP

For AWS infrastructure topics — ParallelCluster, Parallel Computing Service (PCS), EFA, FSx for Lustre, EC2 instance types, networking, and IAM — configure the AWS Knowledge MCP alongside this server. The two complement each other:

| Server | Covers |
|---|---|
| hpc-docs (this server) | Application-layer documentation: CryoSPARC, GROMACS, PLUMED, RELION, Schrodinger Suite, Slurm, Posit Workbench |
| AWS Knowledge MCP | Infrastructure-layer documentation: ParallelCluster, PCS, EFA, FSx, EC2, CloudFormation |

The AWS Knowledge MCP has strong coverage of both AWS HPC orchestration services:

  • ParallelCluster — Full v3 configuration reference, Slurm customization (CustomSlurmSettings, memory-based scheduling), extensive troubleshooting (cluster creation failures, node initialization, scaling issues with specific failure codes and log paths), and current awareness (latest releases). Covers both v2 and v3 docs, CloudFormation integration, and AWS Batch mode.
  • Parallel Computing Service (PCS) — User guides, API reference, compute node group management, Slurm CLI filter plugins, custom Slurm settings (60+ parameters), managed accounting, troubleshooting (bootstrap failures, node registration, instance termination), and HPC blog posts with real-world deployment patterns.

Configure it in Claude Code:

claude mcp add --transport http aws-knowledge https://knowledge-mcp.global.api.aws

Use both servers together

When working on AWS-based HPC deployments, configure both MCP servers. Ask application-specific questions (CryoSPARC job configuration, Schrodinger licensing, Slurm partition tuning) against hpc-docs, and infrastructure questions (ParallelCluster cluster config, EFA placement groups, FSx for Lustre performance) against AWS Knowledge MCP.
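
The split between the two servers can be pictured as a simple keyword router. This is only an illustrative sketch: MCP clients normally let the model pick tools on its own, and the matching rule here is hypothetical. The keyword lists come from the topics named above, and the server names match the `claude mcp add` commands.

```python
import re

# Topic keywords taken from the lists above; the matching rule is hypothetical.
AWS_TOPICS = {"parallelcluster", "pcs", "efa", "fsx", "ec2", "cloudformation", "iam"}
APP_TOPICS = {"cryosparc", "gromacs", "plumed", "relion", "schrodinger", "slurm", "posit"}


def suggest_server(question: str) -> str:
    """Suggest which MCP server a question most likely belongs to."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    aws_hits = sum(word in AWS_TOPICS for word in words)
    app_hits = sum(word in APP_TOPICS for word in words)
    if aws_hits > app_hits:
        return "aws-knowledge"
    if app_hits > aws_hits:
        return "hpc-docs"
    # Mixed or unrecognized questions are worth asking of both servers.
    return "both"


print(suggest_server("How do I tune Slurm partitions?"))
print(suggest_server("EFA placement groups on EC2"))
print(suggest_server("Slurm settings in ParallelCluster"))
```

Questions that touch both layers, such as Slurm settings inside a ParallelCluster config, are the ones where having both servers configured pays off.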

Architecture

Embedding Engine

  • Model: BAAI/bge-base-en-v1.5 (768 dimensions)
  • Backend: fastembed with ONNX Runtime
  • Query latency: ~5ms per embedding
  • Why ONNX: Native ARM64 NEON/SVE acceleration on Graviton, and a footprint of ~200 MB versus ~2 GB for PyTorch. Ingestion is slower than with PyTorch, but query latency, which is what matters when serving, is comparable.

Vector Store

  • Engine: ChromaDB with HNSW index
  • Collections: One per source, for example cryosparc_docs, gromacs_docs, plumed_docs, schrodinger_docs, slurm_docs, relion_docs, and bioteam_consulting_notes
  • Storage: SQLite-backed, baked into the Docker image at build time
  • Portability: The chroma_db/ directory is fully cross-platform (x86_64, aarch64). Ingest locally, deploy the same bytes to Graviton.
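
Conceptually, a query is embedded into a vector and the store returns the chunks whose vectors lie closest to it. The sketch below shows that lookup as an exact brute-force search over toy 3-dimensional vectors. It is not the server's code: the HNSW index answers the same question approximately and far faster over 768-dimensional embeddings, the chunk IDs are made up, and cosine similarity is an assumption because the distance metric used by the collections is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the ids of the k chunks most similar to the query vector.

    `chunks` is a list of (chunk_id, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in chunks]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:k]]


# Toy vectors standing in for real embeddings; ids are hypothetical.
chunks = [
    ("sbatch-options", [0.9, 0.1, 0.0]),
    ("mdp-integrator", [0.0, 0.8, 0.6]),
    ("slurm-conf", [0.7, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], chunks))
```

Brute force scales linearly with the number of chunks, which is why an approximate index such as HNSW is the usual choice once a collection reaches tens of thousands of vectors.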

MCP Server

  • Framework: FastMCP (from the mcp Python package)
  • Transport: Streamable HTTP with stateless_http=True
  • Auth: MCP OAuth 2.1 Authorization Server with Okta delegation
  • Landing page: MkDocs Material static site served at /, built in a separate Docker stage

Authentication

The system has two independent auth layers, both delegating to Okta:

| Layer | Protects | Mechanism | Token Lifetime |
|---|---|---|---|
| MCP OAuth 2.1 AS | /mcp endpoint | MCP server acts as its own OAuth authorization server (AS) and delegates user authentication to Okta. Issues opaque access and refresh tokens held in memory. | Access: 1h, Refresh: 24h |
| Docs OIDC | Documentation pages | CloudFront Function validates an HMAC-SHA256 session cookie. Lambda@Edge handles the Okta OIDC callback. | Session cookie: 8h |
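
To make the docs-site layer more concrete, here is a minimal sketch of signing and validating an HMAC-SHA256 session cookie with an 8-hour lifetime. It is written in Python for illustration only; the real check runs in a CloudFront Function, and the cookie layout (`payload.signature`), the claim names, and the secret handling are all assumptions rather than the deployed format.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SESSION_TTL_SECONDS = 8 * 60 * 60  # 8h session cookie lifetime


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _signature(payload: str, secret: bytes) -> str:
    return _b64(hmac.new(secret, payload.encode(), hashlib.sha256).digest())


def sign_session(user: str, secret: bytes, now: Optional[float] = None) -> str:
    """Create a cookie value of the (assumed) form `payload.signature`."""
    issued = time.time() if now is None else now
    claims = {"sub": user, "exp": issued + SESSION_TTL_SECONDS}
    payload = _b64(json.dumps(claims).encode())
    return f"{payload}.{_signature(payload, secret)}"


def verify_session(cookie: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Accept the cookie only if the signature matches and it has not expired."""
    try:
        payload, sig = cookie.split(".")
    except ValueError:
        return False
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(sig, _signature(payload, secret)):
        return False
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    current = time.time() if now is None else now
    return current < claims["exp"]
```

Because the signature is checked before the payload is decoded, a tampered or foreign cookie is rejected without its contents ever being parsed.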

The MCP server implements Dynamic Client Registration (RFC 7591) so clients like Claude Code can self-register on first connection. Okta's Org Authorization Server doesn't support open DCR, which is why the MCP server acts as its own OAuth AS rather than pointing clients directly at Okta. See the architecture docs for detailed sequence diagrams of both auth flows.
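
As an illustration of self-registration, the sketch below builds the kind of client metadata document that RFC 7591 defines for a registration request. The field names come from the RFC; the client name and redirect URI are hypothetical, and the values a given MCP client actually sends may differ. Clients typically discover where to POST this document from the `registration_endpoint` field of the authorization server metadata (RFC 8414).

```python
import json


def build_registration_request(client_name: str, redirect_uri: str) -> dict:
    """Build RFC 7591 client metadata for a public, browser-redirect client."""
    return {
        "client_name": client_name,
        "redirect_uris": [redirect_uri],
        "grant_types": ["authorization_code", "refresh_token"],
        "response_types": ["code"],
        # "none" marks a public client that holds no client secret.
        "token_endpoint_auth_method": "none",
    }


# Hypothetical client name and loopback redirect URI.
metadata = build_registration_request(
    "example-mcp-client",
    "http://127.0.0.1:8765/callback",
)
print(json.dumps(metadata, indent=2))
```

A successful registration response returns a `client_id` that the client then uses in the authorization code flow.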

Infrastructure

  • Compute: AWS ECS Fargate, ARM64 (Graviton), 1 vCPU / 2 GB
  • Edge: CloudFront with path-based routing, TLS termination, ACM cert
  • Storage: S3 private bucket with OAC for documentation site
  • NAT: fck-nat instance (cost-optimized, ~$3/month vs ~$32/month for NAT Gateway)
  • DNS: Route 53 at hpc-mcp.apps.bioteam.cloud

See the full architecture docs for detailed diagrams, design decisions, and infrastructure configuration.

Contact

For questions about this project or BioTeam's HPC consulting services, visit bioteam.net.