plink_freq¶
Compute per-variant allele frequencies using pgenlib's fast counting path.
Synopsis¶
plink_freq(path VARCHAR [, pvar := ..., psam := ..., samples := ...,
region := ..., counts := ...]) -> TABLE
Parameters¶
| Name | Type | Default | Description |
|---|---|---|---|
path |
VARCHAR |
(required) | Path to the .pgen file |
pvar |
VARCHAR |
Auto-discovered | Path to .pvar or .bim file |
psam |
VARCHAR |
Auto-discovered | Path to .psam or .fam file |
samples |
LIST(VARCHAR) or LIST(INTEGER) |
All | Subset to specific samples |
region |
VARCHAR |
All | Filter to genomic region (chr:start-end) |
counts |
BOOLEAN |
false |
Include genotype count columns |
See Common Parameters for details on pvar, psam, samples, and region.
Reserved Parameters (Not Yet Implemented)¶
| Name | Type | Description |
|---|---|---|
dosage |
BOOLEAN |
Dosage-based frequency computation (raises error if true) |
Output Columns¶
Default Columns¶
| Column | Type | Description |
|---|---|---|
CHROM |
VARCHAR |
Chromosome |
POS |
INTEGER |
Base-pair position |
ID |
VARCHAR |
Variant identifier |
REF |
VARCHAR |
Reference allele |
ALT |
VARCHAR |
Alternate allele |
ALT_FREQ |
DOUBLE |
Alternate allele frequency (NULL if all missing) |
OBS_CT |
INTEGER |
Number of observed alleles (2 * non-missing samples) |
Additional Columns with counts := true¶
| Column | Type | Description |
|---|---|---|
HOM_REF_CT |
INTEGER |
Homozygous reference count |
HET_CT |
INTEGER |
Heterozygous count |
HOM_ALT_CT |
INTEGER |
Homozygous alternate count |
MISSING_CT |
INTEGER |
Missing genotype count |
Description¶
plink_freq computes allele frequencies for each variant using pgenlib's PgrGetCounts fast path, which counts genotypes without full decompression. This is significantly faster than extracting individual genotypes.
Frequency Calculation¶
The alternate allele frequency is computed as:
ALT_FREQ = (HET_CT + 2 * HOM_ALT_CT) / (2 * (HOM_REF_CT + HET_CT + HOM_ALT_CT))
When all genotypes for a variant are missing, ALT_FREQ is NULL and OBS_CT is 0.
Projection Pushdown¶
If only metadata columns (CHROM, POS, ID, REF, ALT) are selected, genotype counting is skipped entirely.
Examples¶
-- Basic allele frequencies
SELECT ID, ALT_FREQ FROM plink_freq('data/example.pgen');
-- With genotype counts
SELECT ID, ALT_FREQ, HOM_REF_CT, HET_CT, HOM_ALT_CT
FROM plink_freq('data/example.pgen', counts := true);
-- Filter rare variants (MAF < 1%)
SELECT ID, ALT_FREQ
FROM plink_freq('data/example.pgen')
WHERE ALT_FREQ < 0.01 OR ALT_FREQ > 0.99;
-- Frequency in a sample subset
SELECT ID, ALT_FREQ
FROM plink_freq('data/example.pgen',
samples := ['CASE1', 'CASE2', 'CASE3']);
-- Region-restricted frequencies
SELECT ID, ALT_FREQ
FROM plink_freq('data/example.pgen',
region := '22:1-50000000');
See Also¶
- plink_hardy -- HWE test (also uses genotype counts)
- plink_missing -- missingness analysis
- Quality Control Guide -- QC workflows using frequency data