plink_missing
Compute per-variant or per-sample missingness rates.
Synopsis
```
plink_missing(path VARCHAR [, pvar := ..., psam := ..., samples := ...,
              region := ..., mode := ...]) -> TABLE
```
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| path | VARCHAR | (required) | Path to the .pgen file |
| pvar | VARCHAR | Auto-discovered | Path to .pvar or .bim file |
| psam | VARCHAR | Auto-discovered | Path to .psam or .fam file |
| samples | LIST(VARCHAR) or LIST(INTEGER) | All | Subset to specific samples |
| region | VARCHAR | All | Filter to genomic region (chr:start-end) |
| mode | VARCHAR | 'variant' | 'variant' for per-variant or 'sample' for per-sample |
See Common Parameters for details on pvar, psam, samples, and region.
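As a minimal sketch of how the optional parameters combine, the query below passes every named argument from the table explicitly. The file paths and sample IDs are placeholders, not files or samples that ship with the extension, and the sketch assumes the IDs in samples match IIDs in the .psam file.

```sql
-- Placeholder paths and sample IDs, for illustration only.
SELECT ID, MISSING_CT, OBS_CT, F_MISS
FROM plink_missing('data/example.pgen',
    pvar := 'data/example.pvar',
    psam := 'data/example.psam',
    samples := ['SAMPLE_A', 'SAMPLE_B', 'SAMPLE_C'],
    region := '1:1-50000000',
    mode := 'variant');
```

When pvar and psam are omitted, the companion files are auto-discovered, so the explicit form is mainly useful when they live somewhere other than next to the .pgen file.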
Output Columns
Variant Mode (mode := 'variant')
| Column | Type | Description |
|---|---|---|
| CHROM | VARCHAR | Chromosome |
| POS | INTEGER | Base-pair position |
| ID | VARCHAR | Variant identifier |
| REF | VARCHAR | Reference allele |
| ALT | VARCHAR | Alternate allele |
| MISSING_CT | INTEGER | Number of missing genotypes |
| OBS_CT | INTEGER | Number of non-missing genotypes |
| F_MISS | DOUBLE | Fraction missing (MISSING_CT / total_samples) |
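The three count columns are related: if every scanned sample is either missing or observed at a variant, then MISSING_CT + OBS_CT should equal the number of samples and F_MISS can be recomputed from the counts. The query below is a hedged consistency check built on that assumption; f_miss_recomputed is an illustrative alias, not an output column of the function.

```sql
-- Assumes total_samples = MISSING_CT + OBS_CT for each variant.
SELECT ID,
       MISSING_CT,
       OBS_CT,
       F_MISS,
       MISSING_CT::DOUBLE / (MISSING_CT + OBS_CT) AS f_miss_recomputed
FROM plink_missing('data/example.pgen')
LIMIT 10;
```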
Sample Mode (mode := 'sample')
| Column | Type | Description |
|---|---|---|
| FID | VARCHAR | Family ID (NULL if unavailable) |
| IID | VARCHAR | Individual ID |
| MISSING_CT | INTEGER | Number of missing genotypes across variants |
| OBS_CT | INTEGER | Number of non-missing genotypes across variants |
| F_MISS | DOUBLE | Fraction missing (MISSING_CT / total_variants) |
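Per-sample missingness is often reported as its complement, the call rate. As a small sketch using only the columns listed above, the query below derives it as 1 - F_MISS and lists the samples with the most missing data first; call_rate is an illustrative alias, not a column the function returns.

```sql
-- call_rate is derived here; the function itself only returns F_MISS.
SELECT IID,
       MISSING_CT,
       OBS_CT,
       1.0 - F_MISS AS call_rate
FROM plink_missing('data/example.pgen', mode := 'sample')
ORDER BY F_MISS DESC
LIMIT 20;
```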
Description
plink_missing computes missingness statistics using pgenlib's PgrGetMissingness fast path, which extracts missingness without full genotype decompression.
Variant Mode
In variant mode (the default), each row represents one variant with its missingness count across all (or selected) samples. This mode supports multi-threaded parallel scanning.
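Because variant mode can scan in parallel, the number of worker threads DuckDB makes available may affect how quickly a full scan completes. The sketch below uses DuckDB's standard threads setting; that the variant scan scales with this setting is an assumption, and the value 4 is arbitrary.

```sql
-- Standard DuckDB setting; assumed to bound the parallel variant scan.
SET threads = 4;

-- Count variants whose missingness exceeds 5% across all samples.
SELECT count(*) AS n_high_missing
FROM plink_missing('data/example.pgen')
WHERE F_MISS > 0.05;
```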
Sample Mode
In sample mode, the function scans all variants and accumulates per-sample missing counts, then outputs one row per sample. Sample mode:
- Requires a .psam or .fam file (for IID output)
- Is single-threaded (scans all variants, then emits sample rows)
- Supports region filtering (only counts missingness for variants in the region)
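The points above can be combined in one call: the sketch below names the sample file explicitly and restricts counting to a region. The paths are placeholders, and it is an assumption here that F_MISS in this case is relative to the variants inside the region rather than the whole file.

```sql
-- Per-sample missingness restricted to variants on chromosome 1, 1-50 Mb.
SELECT IID, MISSING_CT, OBS_CT, F_MISS
FROM plink_missing('data/example.pgen',
    psam := 'data/example.psam',
    region := '1:1-50000000',
    mode := 'sample')
ORDER BY F_MISS DESC;
```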
Projection Pushdown
If only metadata columns are selected (variant identifiers in variant mode, FID/IID in sample mode), the missingness computation is skipped.
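To make the distinction concrete, the two queries below differ only in the columns they select. The first touches metadata columns alone, so by the rule above it should not need the missingness computation; the second selects F_MISS and therefore does. Treating CHROM, POS, ID, REF, and ALT as the full set of metadata columns is an assumption based on the variant-mode output table.

```sql
-- Metadata only: the missingness computation can be skipped.
SELECT CHROM, POS, ID, REF, ALT
FROM plink_missing('data/example.pgen');

-- Selecting MISSING_CT, OBS_CT, or F_MISS requires the computation.
SELECT ID, F_MISS
FROM plink_missing('data/example.pgen');
```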
Examples
```sql
-- Per-variant missingness
SELECT * FROM plink_missing('data/example.pgen');

-- Per-sample missingness
SELECT * FROM plink_missing('data/example.pgen', mode := 'sample');

-- Find variants with high missingness
SELECT ID, F_MISS
FROM plink_missing('data/example.pgen')
WHERE F_MISS > 0.05;

-- Find samples with high missingness
SELECT IID, F_MISS
FROM plink_missing('data/example.pgen', mode := 'sample')
WHERE F_MISS > 0.1;

-- Missingness in a specific region
SELECT ID, F_MISS
FROM plink_missing('data/example.pgen',
    region := '1:1-50000000');
```
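Since the function returns an ordinary table, its output can also be aggregated with standard SQL. As one possible sketch, the query below summarizes variant-level missingness per chromosome; the 0.05 cutoff mirrors the example above and is not a recommended threshold.

```sql
-- Per-chromosome summary of variant missingness
SELECT CHROM,
       count(*) AS n_variants,
       avg(F_MISS) AS mean_f_miss,
       max(F_MISS) AS max_f_miss,
       count(*) FILTER (WHERE F_MISS > 0.05) AS n_above_5pct
FROM plink_missing('data/example.pgen')
GROUP BY CHROM
ORDER BY CHROM;
```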
See Also
- plink_freq -- allele frequencies (includes MISSING_CT with counts := true)
- plink_hardy -- HWE test
- Quality Control Guide -- missingness thresholds in QC