
Getting Started

Building from Source

PlinkingDuck is built as a DuckDB extension using DuckDB's standard extension build system.

Prerequisites

  • C++17 compiler (GCC or Clang)
  • CMake 3.12+
  • Make
  • Git (with submodule support)
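Before building, you can confirm that the tools above are installed with a quick shell check like the one below. It is a sketch and not part of the repository. `version_ge` and `check_prereqs` are hypothetical helper names. The version comparison relies on `sort -V` (GNU coreutils). The compiler check only looks for a `g++` or `clang++` driver and does not verify C++17 support.

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite check; helper names are illustrative, not from the repo.

# version_ge A B: succeed when version A >= version B (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# check_prereqs: report missing tools or a too-old CMake; return 1 if anything is wrong.
check_prereqs() {
  local status=0 tool cmake_version
  for tool in cmake make git; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  # Only checks that a compiler driver exists; C++17 support is not verified here.
  if ! command -v g++ >/dev/null 2>&1 && ! command -v clang++ >/dev/null 2>&1; then
    echo "missing: C++ compiler (g++ or clang++)"
    status=1
  fi
  if command -v cmake >/dev/null 2>&1; then
    # The first line looks like "cmake version 3.22.1"; field 3 is the version.
    cmake_version="$(cmake --version | head -n1 | awk '{print $3}')"
    if ! version_ge "$cmake_version" 3.12; then
      echo "cmake $cmake_version is older than 3.12"
      status=1
    fi
  fi
  return "$status"
}

check_prereqs && echo "prerequisites look OK"
```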

Build

git clone --recurse-submodules https://github.com/teaguesterling/plinking_duck.git
cd plinking_duck
make -j4
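The `-j4` above hard-codes four parallel jobs. The helper below is a sketch that uses the machine's core count instead. `build_jobs` is a hypothetical name. It tries `nproc` (GNU coreutils, Linux) first, then `sysctl -n hw.ncpu` (macOS), and falls back to 4.

```shell
#!/usr/bin/env bash
# build_jobs: print a parallel job count for make (hypothetical helper).
build_jobs() {
  nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4
}

echo "building with $(build_jobs) jobs"
# From the repository root:
#   make -j"$(build_jobs)"
```

This assumes the repository uses the stock DuckDB extension-template Makefile. In that case, plain `make` builds the release configuration under build/release, and `make debug` builds an unoptimized variant under build/debug.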

This produces:

  • build/release/duckdb: DuckDB shell with the extension auto-loaded
  • build/release/test/unittest: test runner
  • build/release/extension/plinking_duck/plinking_duck.duckdb_extension: loadable extension binary
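To confirm that a build finished, a check like the one below reports which of these three artifacts exist under a given repository root. It is a sketch, and `check_outputs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_outputs [ROOT]: list which build artifacts exist; return 1 if any are missing.
check_outputs() {
  local root="${1:-.}" status=0 f
  for f in build/release/duckdb \
           build/release/test/unittest \
           build/release/extension/plinking_duck/plinking_duck.duckdb_extension; do
    if [ -e "$root/$f" ]; then
      echo "found   $f"
    else
      echo "missing $f"
      status=1
    fi
  done
  return "$status"
}
```

Run it from the repository root as `check_outputs`, or pass the checkout path as the first argument.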

Run Tests

make test
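`make test` runs the whole suite. While iterating, it can be quicker to call the test runner directly on a single file or glob. This sketch assumes the stock extension-template conventions: tests live under `test/`, and `unittest` accepts a test file path or a quoted glob. `run_sql_tests` and the file name in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# run_sql_tests [PATTERN]: run the built test runner on PATTERN (default: everything under test/).
run_sql_tests() {
  local runner="build/release/test/unittest"
  if [ ! -x "$runner" ]; then
    echo "test runner not found at $runner; run make first" >&2
    return 1
  fi
  # Quote the pattern so the runner, not the shell, interprets the wildcard.
  "$runner" "${1:-test/*}"
}
```

Usage from the repository root: `run_sql_tests test/sql/example.test`, where the file name is a placeholder.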

Loading the Extension

When using the built DuckDB shell (build/release/duckdb), the extension is automatically loaded. For other DuckDB installations:

LOAD 'path/to/plinking_duck.duckdb_extension';
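A stock DuckDB refuses to load extension binaries that DuckDB has not signed, and a local build is unsigned. An extension binary also loads only into the DuckDB version it was built against, which here is the version pinned by the duckdb submodule. The shell sketch below starts the CLI with its `-unsigned` flag and runs the `LOAD` before your query. `run_with_extension`, `DUCKDB_BIN`, and `EXT` are hypothetical names. `EXT` defaults to the build output path listed earlier, and you can override either variable from the environment.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: query through a separately installed DuckDB CLI with the local build loaded.
DUCKDB_BIN="${DUCKDB_BIN:-duckdb}"
EXT="${EXT:-build/release/extension/plinking_duck/plinking_duck.duckdb_extension}"

run_with_extension() {
  if ! command -v "$DUCKDB_BIN" >/dev/null 2>&1; then
    echo "DuckDB CLI not found: $DUCKDB_BIN" >&2
    return 127
  fi
  # -unsigned permits loading an extension binary that DuckDB has not signed.
  "$DUCKDB_BIN" -unsigned -c "LOAD '$EXT'; $1"
}
```

Usage: `run_with_extension "SELECT CHROM, POS, ID FROM read_pvar('cohort.pvar') LIMIT 5;"`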

First Query

Given a PLINK 2 fileset (cohort.pgen, cohort.pvar, cohort.psam):

-- Preview the first 10 variants in the dataset
SELECT CHROM, POS, ID, REF, ALT
FROM read_pvar('cohort.pvar')
LIMIT 10;

-- Preview the first 10 samples
SELECT IID, SEX
FROM read_psam('cohort.psam')
LIMIT 10;

-- Read genotypes for a single chromosome
SELECT CHROM, POS, ID, genotypes
FROM read_pgen('cohort.pgen')
WHERE CHROM = '1';

-- Genotype orientation (orient := 'genotype'): one row per variant-sample pair, skipping NULL genotypes
SELECT chrom, pos, id, iid, genotype
FROM read_pfile('cohort', orient := 'genotype')
WHERE genotype IS NOT NULL
LIMIT 20;
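You can also run these functions non-interactively, for example from a script, by piping SQL into the built shell. This sketch counts variants per chromosome and ranks samples by the fraction of missing calls. `cohort_summary` and `DUCKDB` are hypothetical names. It assumes the cohort fileset is in the working directory. It also assumes a missing call appears as a NULL `genotype` in genotype orientation, as the `IS NOT NULL` filter above suggests.

```shell
#!/usr/bin/env bash
# Hypothetical summary script; assumes cohort.pgen/.pvar/.psam in the working directory.
DUCKDB="${DUCKDB:-build/release/duckdb}"

cohort_summary() {
  if [ ! -x "$DUCKDB" ]; then
    echo "DuckDB shell not found at $DUCKDB; build it first" >&2
    return 1
  fi
  "$DUCKDB" <<'SQL'
-- Variants per chromosome
SELECT CHROM, count(*) AS n_variants
FROM read_pvar('cohort.pvar')
GROUP BY CHROM
ORDER BY CHROM;
-- Samples with the highest fraction of NULL (assumed missing) genotype calls
SELECT iid, avg(CASE WHEN genotype IS NULL THEN 1 ELSE 0 END) AS missing_rate
FROM read_pfile('cohort', orient := 'genotype')
GROUP BY iid
ORDER BY missing_rate DESC
LIMIT 10;
SQL
}
```

Calling `cohort_summary` from the repository root runs both queries in an in-memory database and prints the result tables.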

PlinkingDuck reads both PLINK 2 and legacy PLINK 1 file formats. See the File Formats reference for details on which features of each format are supported.