# 1000 Genomes Project Dataset MCP Server

Natural language access to _**1000 Genomes Project dataset**_, hosted online in _[Dnaerys variant store](https://dnaerys.org/)_

Sequenced & aligned by _New York Genome Center_ (_GRCh38_). _3202 samples_: 2504 unrelated samples from phase
three panel + 698 samples from 602 family trios - [dataset details](https://www.internationalgenome.org/data-portal/data-collection/30x-grch38)

### Key Features

- _real-time_ access to _138 044 723_ unique variants and _~442 billion_ individual genotypes

- variant, sample and genotype selection based on coordinates, annotations, zygosity

- filtering by VEP (impact, biotype, feature type, variant class, consequences), ClinVar Clinical Significance (202502),
gnomADe + gnomADg 4.1, AlphaMissense Score & AlphaMissense Class annotations

  - [full annotation composition](./docs/annotations.md)

- returned variants annotated with _HGVSp_, _gnomADe + gnomADg_, _AlphaMissense score_ + cohort-wide statistics

## Online Service

Remote MCP service via _Streamable HTTP:_

- http://db.dnaerys.org/mcp
- https://db.dnaerys.org/mcp

## Examples

#### Macromolecular structural complexes

> _The MCM2-7 Complex (The "DNA Helicase Motor") is a molecular masterpiece. It’s a heterohexameric ring where each subunit is
> a distinct "gear" in the DNA-unzipping motor. Unlike homomeric rings (where every subunit is the same), this complex is asymmetric.
> Each interface between subunits is unique, and they don't all burn ATP at the same rate. The MCM2/5 interface is the "gate" that
> must physically open to allow DNA to enter the ring and then snap shut. This is a high-stress mechanical point._
>
> _Identify individuals in the KGP cohort carrying missense variants at the MCM2/5 interface. Specifically, look for
> 'charge-reversal' variants (e.g., Aspartate to Lysine). In these specific samples, analyze the 'compensatory coupling':
> do they carry a secondary, reciprocal charge-reversal variant on the opposing subunit interface that restores the
> electrostatic 'latch'?_
>
> _Identify individuals in the KGP cohort who carry high-pathogenicity variants in the Walker A or Walker B motifs
> (the ATP-burning heart) of any MCM subunit in the MCM2-7 Complex. For these individuals, perform a 'Systemic Flux' analysis:
> look at their variants in the leading-strand polymerase (POLE) and the sliding clamp (PCNA). Do you detect a signature of
> 'Coordinated Deceleration' where the motor, the clamp, and the polymerase all carry variants that suggest a slower but
> highly accurate replication fork?_

#### Macromolecular structural complexes

> _The human RNA Exosome (Exo-9 core) is a "dead machine" that acts as a scaffold. In lower organisms the ring itself can degrade RNA.
> In humans, the 9-subunit ring has lost all its catalytic teeth and is purely a structural tunnel that guides RNA into the catalytic
> subunits (DIS3 or EXOSC10) attached at the bottom. Since RNA is a highly negatively charged polymer, the residues lining this pore
> are typically positively charged (Lysine, Arginine), but not too "sticky" or RNA will jam. So, to reach the "shredder" at the bottom
> it must slide through a narrow pore formed by the Exo-9 ring._
>
> _The task: analyse all missense variants in the KGP cohort that map to the internal pore-lining residues of the Exo-9 ring.
> Look for 'charge-swap' variants where a positive residue (K, R) is replaced by a negative one (D, E). If an individual is healthy
> despite having a 'negative patch' in the tunnel that should repel RNA, do they carry a compensatory variant in the cap subunits
> (EXOSC1, 2, 3) that widens the entrance? Use a 3D electrostatic surface map to determine if the 'healthy' cohort maintains a specific
> electrostatic gradient._

#### Synergistic Epistasis in Redox Homeostasis

> _Cellular redox homeostasis is maintained by two parallel antioxidant systems: the glutathione system
> and the thioredoxin system. Complete loss of either GSR or TXNRD1 is incompatible with mammalian development, yet population
> databases contain individuals carrying variants predicted to impair enzyme function._
>
> _Identify clusters of individuals in the KGP cohort who carry multiple 'Moderate' impact VEP variants across both systems.
> Reasoning through the AlphaMissense structural implications, can you detect a 'balancing act' where a loss of efficiency
> in Glutathione reductase is consistently paired with high-confidence benign or potentially activating variants in the
> Thioredoxin system? Synthesize a model of 'Redox Robustness' based on the co-occurrence of these variants across the cohort._

#### Macromolecular structural complexes

> _Treat the 26S Proteasome as a mechanically redundant 3D machine and map every missense variant from the KGP individuals
> across all 33 subunits. Perform a spatial analysis to determine if pathogenic variation is statistically partitioned toward
> the distal 'Lid' (Zone C) rather than the more evolutionarily constrained 'Core' (Zone A) or 'Gating' (Zone B) interfaces.
> Identify individuals with a high cumulative burden (2+ 'Likely Pathogenic' variants) to investigate inter-subunit compensation,
> searching for paired 'weakening' and 'stabilizing' mutations at protein-protein hinges. Finally, define the 'mechanical
> tolerance' of the proteasome by establishing the maximum cumulative structural disruption observed in a single healthy
> individual based on AlphaMissense scores and calculated ΔΔG values._

_[More examples](./examples/README.md)_

---

## Architecture

Implemented as a Java EE service, accessing _KGP dataset_ via gRPC calls to public Dnaerys variant store service.

- provides MCP over _Streamable HTTP_, _HTTP/SSE_ and _STDIO_ transports
- service implementation is based on [Quarkus MCP Server framework](https://docs.quarkiverse.io/quarkus-mcp-server/dev/)
- tools: _computeAlphaMissenseAvg, computeVariantBurden, countSamples, countSamplesHomozygousReference, countVariants,
  countVariantsInSamples, getDatasetInfo, getKinshipDegree, selectSamples, selectSamplesHomozygousReference,
  selectVariants, selectVariantsInSamples_
  - [implementation](./src/main/java/org/dnaerys/mcp/OneKGPdMCPServer.java)

## Installation

Project can be run locally with MCP over _stdio_ and/or _http_ transports

#### Option A - build & run locally

- build the project and package it as a single _über-jar_:
  - jar is located in `target/onekgpd-mcp-runner.jar` and includes all dependencies

```shell script
./mvnw clean package -DskipTests -Dquarkus.package.jar.type=uber-jar
```

or, to skip test compilation as well:

```shell script
./mvnw clean package -Dmaven.test.skip=true -Dquarkus.package.jar.type=uber-jar
```

- run it locally with _dev profile_
  - both _stdio_ and _http_ transports are enabled
  - http transport is on quarkus [http.port](./src/main/resources/application.properties)
  - project expects _JRE 21_ to be available at runtime

```shell script
java -Dquarkus.profile=dev -jar <full path>/onekgpd-mcp-runner.jar
```

#### Option B - build & run in docker

- in order to run in docker, _stdio_ transport needs to be disabled to prevent the application from stopping itself
due to closed stdio in containers
  - it's already configured in _prod profile_
  - it's the default configuration overall

- build with _prod profile_

```shell script
docker build -f Dockerfile -t onekgpd-mcp .
```

- run as you prefer, e.g.

```shell script
docker run -p 9000:9000 --name onekgpd-mcp --rm onekgpd-mcp
```

#### Option C - pull from Docker Hub

- pull prebuilt image; _stdio_ transport disabled, _http_ transport on port 9000

```shell script
docker pull dnaerys/onekgpd-mcp:latest
```

- run the pulled image (note the `dnaerys/` prefix; the bare `onekgpd-mcp` name only exists if you built it locally as in Option B)

```shell script
docker run -p 9000:9000 --name onekgpd-mcp --rm dnaerys/onekgpd-mcp:latest
```

---

#### Connecting with MCP clients

- to connect via _http_ transport, _remote or local_, simply direct the client to a destination,
_e.g._ `http://localhost:9000/mcp` or `https://db.dnaerys.org:443/mcp`
  - _NB:_ _Claude Desktop_ won't work with the `http://localhost:9000/mcp` option (e.g. when running MCP server in a docker container).
    This option is for clients like _Goose_.

- to connect via _stdio_ transport, MCP client should start the application with _dev profile_ and with a full path to the jar file
  - e.g. for _Claude Desktop_ add to config files (e.g. `claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "OneKGPd": {
      "command": "java",
      "args": ["-Dquarkus.profile=dev", "-jar", "/full/path/onekgpd-mcp-runner.jar"]
    }
  }
}
```

#### Verification

Once connected, ask the client a simple question, e.g.

> How many variants exist in the 1000 Genomes Project?

### Test Coverage Status

| Component | Type | Tests | Status |
|-----------|------|-------|--------|
| Entity Mappers (9 classes) | Unit | 314 | ✅ Complete |
| DnaerysClient | Unit | 58 (7 disabled) | ✅ Complete |
| DnaerysClient | Integration | 5 (1 disabled) | ✅ Complete |
| OneKGPdMCPServer | Unit | 26 | ✅ Complete |
| OneKGPdMCPServer | Integration | 5 | ✅ Complete |
| Other | Unit | 1 | ✅ Complete |
| Other | Integration | 1 | ✅ Complete |
| **Total** | | **410 tests** | **402 passing, 8 disabled** |

**Test Breakdown:**
- Unit tests: 399 (7 disabled, 392 passing)
- Integration tests: 11 (1 disabled, 10 passing)

**Disabled Tests:**
- 7 DnaerysClient unit tests (PaginationTests, streaming gRPC limitation - `wiremock-grpc-extension:0.11.0` cannot mock streaming RPCs yet)
- 1 DnaerysClient integration test (PaginationLogicTests, streaming gRPC limitation - `wiremock-grpc-extension:0.11.0` cannot mock streaming RPCs yet)

### Running Tests

```bash
# Unit tests only (no server required)
./mvnw test

# Integration tests (requires db.dnaerys.org access)
./mvnw verify -DskipITs=false

# Update test baselines after data changes
./mvnw verify -DskipITs=false -DupdateBaseline=true
```

See [TEST_SPECIFICATION.md](./docs/TEST_SPECIFICATION.md) for detailed test documentation.

---

_Test part of this project is written by Claude. Fun part is written by humans._

---

## Privacy Policy

OneKGPd MCP Server operates as a read-only interface layer for the 1000 Genomes Project dataset.
Server does not collect, store, or transmit any user data. No conversation data is recorded.
No personal information is collected. No cookies, tracking mechanisms or authentication are used.

## Support

For issues, questions, or feedback: https://github.com/dnaerys/onekgpd-mcp/issues

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](./LICENSE) file for details.
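
#### Probing the HTTP endpoint without an MCP client

The Streamable HTTP endpoints listed under Online Service can also be probed directly from a terminal. The sketch below is not part of the project. It assumes `curl` is installed, that the server follows the standard MCP handshake (a JSON-RPC 2.0 `initialize` request sent with an `Accept` header covering both JSON and event-stream responses), and that protocol version `2025-03-26` is acceptable to it. The helper names `initialize_payload` and `mcp_post`, the `MCP_URL` variable and the `clientInfo` values are made up for illustration.

```shell
# Endpoint to probe; override with MCP_URL=http://localhost:9000/mcp for a local container.
MCP_URL="${MCP_URL:-https://db.dnaerys.org/mcp}"

# Print a JSON-RPC 2.0 body for the MCP "initialize" handshake.
# The protocol version and clientInfo values are assumptions for this sketch.
initialize_payload() {
  printf '%s' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"readme-probe","version":"0.0.1"}}}'
}

# POST a JSON-RPC body to the endpoint.
# A Streamable HTTP server may answer with plain JSON or with an SSE stream,
# so the request declares that it accepts both.
mcp_post() {
  curl -sS -X POST "$MCP_URL" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -d "$1"
}

# Usage (needs network access, so it is left commented out):
#   mcp_post "$(initialize_payload)"
```

If the handshake succeeds, the response should carry the server's `initialize` result, either as a JSON object or as an SSE `data:` line. Follow-up calls such as `tools/list` may additionally require the session header returned by the server, which this sketch does not handle.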