plink_score
Compute polygenic risk scores (PRS) from genotype data and variant weights.
Synopsis
plink_score(path VARCHAR, weights := ...
    [, pvar := ..., psam := ..., samples := ..., region := ...,
     center := ..., no_mean_imputation := ...]) -> TABLE
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| path | VARCHAR | (required) | Path to the .pgen file |
| weights | LIST(DOUBLE) or LIST(STRUCT) | (required) | Variant scoring weights |
| pvar | VARCHAR | Auto-discovered | Path to .pvar or .bim file |
| psam | VARCHAR | Auto-discovered | Path to .psam or .fam file (required) |
| samples | LIST(VARCHAR) or LIST(INTEGER) | All | Subset to specific samples |
| region | VARCHAR | All | Filter to genomic region (chr:start-end) |
| center | BOOLEAN | false | Variance-standardized scoring |
| no_mean_imputation | BOOLEAN | false | Skip mean imputation for missing genotypes |
See Common Parameters for details on pvar, psam, samples, and region.
Weights Parameter
The weights parameter accepts two formats:
Positional Mode: LIST(DOUBLE)
A list of numeric weights, one per variant, in file order. The list length must equal the number of variants in the file or, when region is set, the number of variants remaining after the region filter. Variants with a weight of zero are skipped.
plink_score('data.pgen', weights := [0.5, -0.3, 0.0, 1.2])
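The positional rules above (length check, zero weights skipped) can be illustrated with a small Python sketch. This is not the extension's code: `select_scored_variants` and its error message are hypothetical names chosen for illustration, and only the two rules stated above are modeled.

```python
def select_scored_variants(weights, variant_count):
    """Return (variant_index, weight) pairs for variants that contribute.

    Hypothetical helper: mirrors the documented rules only, namely that the
    list length must match the variant count and zero weights are skipped.
    """
    if len(weights) != variant_count:
        raise ValueError(
            f"expected {variant_count} weights, got {len(weights)}"
        )
    # Zero-weight variants add nothing to the score, so drop them up front.
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


# Four variants, the third has weight 0.0 and is therefore not scored.
scored = select_scored_variants([0.5, -0.3, 0.0, 1.2], variant_count=4)
```

With the weights from the example call, the third variant is dropped and the remaining three keep their original positions.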
ID-keyed Mode: LIST(STRUCT(id VARCHAR, allele VARCHAR, weight DOUBLE))
A list of structs specifying variant ID, scored allele, and weight. Variants not found in the .pvar file are silently skipped. The allele field determines orientation:
- If allele matches ALT: dosage is used as-is
- If allele matches REF: dosage is flipped (2 - alt_dosage)
- If allele matches neither: the variant is skipped
plink_score('data.pgen', weights := [
{'id': 'rs1', 'allele': 'A', 'weight': 0.5},
{'id': 'rs2', 'allele': 'T', 'weight': -0.3}
])
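The orientation rules for ID-keyed weights can be sketched in Python as follows. The names `resolve_weights` and `scored_dosage`, the dictionary standing in for the .pvar file, and the use of exact string comparison for alleles are all assumptions made for illustration; the extension's actual matching logic may differ.

```python
def resolve_weights(weights, pvar):
    """Map ID-keyed weights onto known variants.

    weights: list of dicts with 'id', 'allele', 'weight'
    pvar:    dict of variant ID -> (ref, alt); a hypothetical stand-in
             for the contents of the .pvar file
    Returns (id, weight, flip) tuples; flip is True when the scored
    allele is REF and the ALT dosage must be inverted.
    """
    resolved = []
    for entry in weights:
        alleles = pvar.get(entry["id"])
        if alleles is None:
            continue  # ID not present in the .pvar: silently skipped
        ref, alt = alleles
        if entry["allele"] == alt:
            resolved.append((entry["id"], entry["weight"], False))
        elif entry["allele"] == ref:
            resolved.append((entry["id"], entry["weight"], True))
        # allele matches neither REF nor ALT: variant is skipped
    return resolved


def scored_dosage(alt_dosage, flip):
    """Dosage of the scored allele, given the ALT dosage."""
    return 2.0 - alt_dosage if flip else alt_dosage


pvar = {"rs1": ("G", "A"), "rs2": ("T", "C"), "rs3": ("A", "G")}
weights = [
    {"id": "rs1", "allele": "A", "weight": 0.5},   # matches ALT
    {"id": "rs2", "allele": "T", "weight": -0.3},  # matches REF, flipped
    {"id": "rs3", "allele": "C", "weight": 0.9},   # matches neither
    {"id": "rs9", "allele": "A", "weight": 0.1},   # not in pvar
]
resolved = resolve_weights(weights, pvar)
```

Only rs1 and rs2 survive here; rs2 is marked for flipping, so a sample with ALT dosage 2 at rs2 contributes a scored-allele dosage of 0.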
Output Columns
| Column | Type | Description |
|---|---|---|
| FID | VARCHAR | Family ID (NULL if no FID column in .psam) |
| IID | VARCHAR | Individual ID |
| ALLELE_CT | INTEGER | Total alleles observed (2 per non-missing scored variant) |
| DENOM | INTEGER | Denominator for score averaging (same as ALLELE_CT) |
| NAMED_ALLELE_DOSAGE_SUM | DOUBLE | Sum of scored allele dosages across variants |
| SCORE_SUM | DOUBLE | Sum of weight * dosage across all scored variants |
| SCORE_AVG | DOUBLE | SCORE_SUM / ALLELE_CT (0 if ALLELE_CT is 0) |
Description
plink_score computes a polygenic risk score for each sample by reading genotype dosages (via pgenlib's PgrGetD) and applying variant-specific weights.
Scoring Formula
For each sample, the score is:
SCORE_SUM = sum(weight_i * dosage_i) for all scored variants i
SCORE_AVG = SCORE_SUM / ALLELE_CT
Where dosage_i is the allele dosage (0, 1, or 2 for hardcalled genotypes).
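As a worked instance of the formula, the following Python sketch scores a single sample with no missing genotypes. `score_sample` is a hypothetical helper, not part of the extension, and the dosages and weights are made-up values chosen so the arithmetic is easy to follow.

```python
def score_sample(dosages, weights):
    """Score one sample from parallel lists of dosages and weights.

    Assumes every listed variant is non-missing for this sample.
    """
    score_sum = sum(w * d for w, d in zip(weights, dosages))
    allele_ct = 2 * len(dosages)  # 2 alleles per non-missing scored variant
    # SCORE_AVG is defined as 0 when no alleles were observed.
    score_avg = score_sum / allele_ct if allele_ct else 0.0
    return score_sum, allele_ct, score_avg


# Three scored variants with hardcalled dosages 2, 1, 0.
score_sum, allele_ct, score_avg = score_sample(
    dosages=[2.0, 1.0, 0.0],
    weights=[0.5, -0.25, 1.5],
)
# score_sum = 0.5*2 + (-0.25)*1 + 1.5*0 = 0.75
# allele_ct = 6, score_avg = 0.75 / 6 = 0.125
```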
Missing Data Handling
Three modes control how missing genotypes are handled:
| Mode | Behavior |
|---|---|
| Default (mean imputation) | Missing dosages are replaced with the variant's mean dosage |
| no_mean_imputation := true | Missing genotypes are skipped for the affected sample (its ALLELE_CT reflects only non-missing variants) |
| center := true | Variance-standardized scoring: (dosage - mean) / sd. Missing genotypes are skipped for the affected sample. Monomorphic variants are excluded. |
center and no_mean_imputation cannot both be true.
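The three modes can be compared on a single variant with the Python sketch below. It is an illustration under stated assumptions rather than the extension's implementation: `variant_contributions` and its `mode` strings are hypothetical, the mean and SD are taken over the observed (non-missing) dosages, the SD is the population standard deviation, and a variant with no observed dosages is treated as unscored. The source does not specify any of these details.

```python
from statistics import fmean, pstdev


def variant_contributions(dosages, weight, mode="impute"):
    """Per-sample contribution of one variant; None marks a missing genotype.

    Returns one entry per sample: a float, or None where the variant is
    skipped for that sample.
    """
    observed = [d for d in dosages if d is not None]
    if not observed:
        # Assumption: with nothing observed there is no mean to impute.
        return [None] * len(dosages)
    mean = fmean(observed)
    if mode == "impute":
        # Default: missing dosages are replaced by the variant's mean dosage.
        return [weight * (mean if d is None else d) for d in dosages]
    if mode == "no_mean_imputation":
        return [None if d is None else weight * d for d in dosages]
    if mode == "center":
        sd = pstdev(observed)  # assumption: population SD of observed dosages
        if sd == 0.0:
            return [None] * len(dosages)  # monomorphic variant: excluded
        return [None if d is None else weight * (d - mean) / sd
                for d in dosages]
    raise ValueError(f"unknown mode: {mode}")


dosages = [0.0, 2.0, None]  # third sample has a missing genotype
imputed = variant_contributions(dosages, 0.5, "impute")
skipped = variant_contributions(dosages, 0.5, "no_mean_imputation")
centered = variant_contributions(dosages, 0.5, "center")
```

In this example the observed mean is 1.0 and the SD is 1.0, so the third sample receives 0.5 under mean imputation and no contribution in the other two modes.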
Requirements
- The .psam file is required (sample IDs are needed for output)
- The weights parameter is required
Examples
-- Positional weights (one per variant)
SELECT IID, SCORE_SUM, SCORE_AVG
FROM plink_score('data/example.pgen',
weights := [0.5, -0.3, 1.2, 0.0]);
-- ID-keyed weights with allele specification
SELECT IID, SCORE_SUM
FROM plink_score('data/example.pgen', weights := [
{'id': 'rs1', 'allele': 'A', 'weight': 0.5},
{'id': 'rs2', 'allele': 'T', 'weight': -0.3}
]);
-- Variance-standardized scoring
SELECT IID, SCORE_AVG
FROM plink_score('data/example.pgen',
weights := [1.0, 1.0, 1.0, 1.0],
center := true);
-- Without mean imputation
SELECT IID, SCORE_SUM, ALLELE_CT
FROM plink_score('data/example.pgen',
weights := [0.5, -0.3, 1.2, 0.8],
no_mean_imputation := true);
-- Score a subset of samples
SELECT IID, SCORE_AVG
FROM plink_score('data/example.pgen',
weights := [0.5, 0.3, 0.2, 0.1],
samples := ['SAMPLE1', 'SAMPLE2']);
See Also
- plink_freq -- allele frequencies (useful for QC before scoring)
- Genotype Encoding -- encoding reference