plink_score

Compute polygenic risk scores (PRS) from genotype data and variant weights.

Synopsis

plink_score(path VARCHAR, weights := ...
            [, pvar := ..., psam := ..., samples := ..., region := ...,
            center := ..., no_mean_imputation := ...]) -> TABLE

Parameters

Name	Type	Default	Description
`path`	`VARCHAR`	(required)	Path to the `.pgen` file
`weights`	`LIST(DOUBLE)` or `LIST(STRUCT)`	(required)	Variant scoring weights
`pvar`	`VARCHAR`	Auto-discovered	Path to `.pvar` or `.bim` file
`psam`	`VARCHAR`	Auto-discovered	Path to `.psam` or `.fam` file (a sample file is required, whether auto-discovered or given explicitly)
`samples`	`LIST(VARCHAR)` or `LIST(INTEGER)`	All	Subset to specific samples
`region`	`VARCHAR`	All	Filter to genomic region (`chr:start-end`)
`center`	`BOOLEAN`	`false`	Variance-standardized scoring
`no_mean_imputation`	`BOOLEAN`	`false`	Skip mean imputation for missing genotypes

See Common Parameters for details on pvar, psam, samples, and region.

Weights Parameter

The weights parameter accepts two formats:

Positional Mode: `LIST(DOUBLE)`

A list of numeric weights, one per variant, in file order. The list length must equal the number of variants in the file, or the number of variants remaining after filtering when region is set. Variants with a weight of zero are skipped.

plink_score('data.pgen', weights := [0.5, -0.3, 0.0, 1.2])
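The positional rules can be illustrated with a minimal Python sketch. It models only the semantics described above (length check, zero weights skipped) and is not the extension's implementation; the helper name `pair_positional_weights` and the variant IDs are hypothetical.

```python
def pair_positional_weights(variant_ids, weights):
    """Pair each variant with its positional weight, dropping zero weights.

    variant_ids: variants in file order (after any region filter).
    weights: one weight per variant, in the same order.
    """
    if len(weights) != len(variant_ids):
        raise ValueError(
            f"expected {len(variant_ids)} weights, got {len(weights)}"
        )
    # Zero-weight variants contribute nothing to the score, so they are skipped.
    return [(vid, w) for vid, w in zip(variant_ids, weights) if w != 0.0]


# Hypothetical four-variant file scored with the weights from the call above.
scored = pair_positional_weights(
    ["rs1", "rs2", "rs3", "rs4"], [0.5, -0.3, 0.0, 1.2]
)
print(scored)  # [('rs1', 0.5), ('rs2', -0.3), ('rs4', 1.2)]
```

In this sketch the third variant drops out because its weight is zero, and a weight list of the wrong length is rejected up front.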

ID-keyed Mode: `LIST(STRUCT(id VARCHAR, allele VARCHAR, weight DOUBLE))`

A list of structs specifying variant ID, scored allele, and weight. Variants not found in the .pvar file are silently skipped. The allele field determines orientation:

If allele matches ALT: dosage is used as-is
If allele matches REF: dosage is flipped (2 - alt_dosage)
If allele matches neither: the variant is skipped

plink_score('data.pgen', weights := [
    {'id': 'rs1', 'allele': 'A', 'weight': 0.5},
    {'id': 'rs2', 'allele': 'T', 'weight': -0.3}
])
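The orientation rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the extension's code: `orient_dosage` is a hypothetical helper, allele matching is assumed to be an exact string comparison, and only biallelic variants are considered.

```python
def orient_dosage(alt_dosage, ref, alt, scored_allele):
    """Return the dosage of scored_allele, or None if the variant is skipped.

    alt_dosage: ALT allele dosage in the range 0..2.
    Assumption: alleles are compared as exact strings.
    """
    if scored_allele == alt:
        return alt_dosage          # ALT is the scored allele: use as-is
    if scored_allele == ref:
        return 2.0 - alt_dosage    # REF is the scored allele: flip
    return None                    # matches neither allele: skip the variant


print(orient_dosage(1.0, ref="G", alt="A", scored_allele="A"))  # 1.0
print(orient_dosage(0.0, ref="T", alt="C", scored_allele="T"))  # 2.0
print(orient_dosage(2.0, ref="G", alt="A", scored_allele="C"))  # None
```

The second call shows the flip: a sample carrying no ALT alleles carries two copies of REF, so its scored dosage is 2.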

Output Columns

Column	Type	Description
`FID`	`VARCHAR`	Family ID (NULL if no FID column in `.psam`)
`IID`	`VARCHAR`	Individual ID
`ALLELE_CT`	`INTEGER`	Total alleles observed (2 per non-missing scored variant)
`DENOM`	`INTEGER`	Denominator for score averaging (same as ALLELE_CT)
`NAMED_ALLELE_DOSAGE_SUM`	`DOUBLE`	Sum of scored allele dosages across variants
`SCORE_SUM`	`DOUBLE`	Sum of weight * dosage across all scored variants
`SCORE_AVG`	`DOUBLE`	`SCORE_SUM / ALLELE_CT` (0 if ALLELE_CT is 0)

Description

plink_score computes a polygenic risk score for each sample by reading genotype dosages (via pgenlib's PgrGetD) and applying variant-specific weights.

Scoring Formula

For each sample, the score is:

SCORE_SUM = sum(weight_i * dosage_i) for all scored variants i
SCORE_AVG = SCORE_SUM / ALLELE_CT

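Here dosage_i is the dosage of the scored allele at variant i (0, 1, or 2 for hardcalled genotypes) and weight_i is that variant's weight. SCORE_AVG is 0 when ALLELE_CT is 0.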
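As a worked example of the formula, the following Python sketch scores one sample with no missing genotypes. It is an arithmetic illustration only: `score_sample` is a hypothetical helper, and the weights were chosen to be exactly representable in binary floating point so the printed values are exact. Counting two alleles per scored variant and leaving zero-weight variants out of ALLELE_CT follow this page's descriptions of ALLELE_CT and of positional mode.

```python
def score_sample(dosages, weights):
    """Return (SCORE_SUM, ALLELE_CT, SCORE_AVG) for one fully genotyped sample."""
    # Zero-weight variants are skipped, so they add neither score nor alleles.
    score_sum = sum((w * d for w, d in zip(weights, dosages) if w != 0.0), 0.0)
    allele_ct = 2 * sum(1 for w in weights if w != 0.0)
    # SCORE_AVG is defined as 0 when no alleles were counted.
    score_avg = score_sum / allele_ct if allele_ct else 0.0
    return score_sum, allele_ct, score_avg


# Four variants with dosages 2, 1, 0, 1; the third weight is zero.
result = score_sample([2, 1, 0, 1], [0.5, -0.25, 0.0, 1.5])
print(result)  # (2.25, 6, 0.375)
```

Here SCORE_SUM = 0.5*2 + (-0.25)*1 + 1.5*1 = 2.25, three variants are scored so ALLELE_CT = 6, and SCORE_AVG = 2.25 / 6 = 0.375.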

Missing Data Handling

Three modes control how missing genotypes are handled:

Mode	Behavior
Default (mean imputation)	Missing dosages are replaced with the variant's mean dosage
`no_mean_imputation := true`	A sample with a missing genotype at a variant is skipped for that variant (its ALLELE_CT counts only its non-missing variants)
`center := true`	Variance-standardized scoring: each dosage is replaced by `(dosage - mean) / sd`. A sample with a missing genotype at a variant is skipped for that variant. Monomorphic variants are excluded.

center and no_mean_imputation cannot both be true.
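The three modes can be compared side by side with a small Python sketch that computes each sample's contribution from a single variant. It is a reading of the table above, not the extension's implementation. Several details are assumptions: `variant_contributions` is a hypothetical helper, the mean and standard deviation are taken over observed (non-missing) dosages, the population standard deviation is used, and an imputed genotype adds to the score but not to ALLELE_CT.

```python
import statistics


def variant_contributions(dosages, weight, center=False, no_mean_imputation=False):
    """Per-sample (score contribution, alleles counted) for one variant.

    dosages: one entry per sample; None marks a missing genotype.
    """
    if center and no_mean_imputation:
        raise ValueError("center and no_mean_imputation cannot both be true")

    observed = [d for d in dosages if d is not None]
    mean = statistics.fmean(observed) if observed else 0.0

    if center:
        # Assumption: population SD over the observed dosages.
        sd = statistics.pstdev(observed) if observed else 0.0
        if sd == 0.0:
            # Monomorphic variant: excluded for every sample.
            return [(0.0, 0) for _ in dosages]
        return [
            (0.0, 0) if d is None else (weight * (d - mean) / sd, 2)
            for d in dosages
        ]

    out = []
    for d in dosages:
        if d is None:
            # Default mode imputes the variant mean; no_mean_imputation skips.
            out.append((0.0, 0) if no_mean_imputation else (weight * mean, 0))
        else:
            out.append((weight * d, 2))
    return out


dosages = [0.0, 2.0, None]  # third sample is missing
print(variant_contributions(dosages, 0.5))
# [(0.0, 2), (1.0, 2), (0.5, 0)]
print(variant_contributions(dosages, 0.5, no_mean_imputation=True))
# [(0.0, 2), (1.0, 2), (0.0, 0)]
print(variant_contributions(dosages, 0.5, center=True))
# [(-0.5, 2), (0.5, 2), (0.0, 0)]
```

With observed dosages 0 and 2, the mean is 1 and the standard deviation is 1, so the default mode credits the missing sample with 0.5 * 1, while the other two modes leave it at zero.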

Requirements

The .psam file is required (sample IDs are needed for output)
The weights parameter is required

Examples

-- Positional weights (one per variant)
SELECT IID, SCORE_SUM, SCORE_AVG
FROM plink_score('data/example.pgen',
    weights := [0.5, -0.3, 1.2, 0.0]);

-- ID-keyed weights with allele specification
SELECT IID, SCORE_SUM
FROM plink_score('data/example.pgen', weights := [
    {'id': 'rs1', 'allele': 'A', 'weight': 0.5},
    {'id': 'rs2', 'allele': 'T', 'weight': -0.3}
]);

-- Variance-standardized scoring
SELECT IID, SCORE_AVG
FROM plink_score('data/example.pgen',
    weights := [1.0, 1.0, 1.0, 1.0],
    center := true);

-- Without mean imputation
SELECT IID, SCORE_SUM, ALLELE_CT
FROM plink_score('data/example.pgen',
    weights := [0.5, -0.3, 1.2, 0.8],
    no_mean_imputation := true);

-- Score a subset of samples
SELECT IID, SCORE_AVG
FROM plink_score('data/example.pgen',
    weights := [0.5, 0.3, 0.2, 0.1],
    samples := ['SAMPLE1', 'SAMPLE2']);