pycmplot.liftover
Lazy hg18 → hg38 and hg19 → hg38 coordinate conversion powered by
pyliftover.
Conversion is triggered only when a genome-build column is detected (or
explicitly named via --build_column), and only for rows annotated as
hg18 or hg19. All other rows are passed through unchanged.
Genome coordinate liftover utilities (hg18 → hg38 and hg19 → hg38).
The pyliftover.LiftOver objects are initialised lazily — they
are created on first use and cached in a module-level dictionary, so
importing this module never triggers a file-not-found error even if the
chain files have not been configured yet.
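The lazy-initialisation behaviour described above can be sketched as a module-level dictionary that is filled on first use. This is a minimal illustration and not the module's actual code: `get_converter`, `_CONVERTERS`, and `fake_loader` are hypothetical names, and the loader is injected so the sketch runs without pyliftover or a chain file.

```python
from typing import Callable, Dict

# Module-level cache: one converter object per conversion key.
# Nothing is loaded when the module is imported.
_CONVERTERS: Dict[str, object] = {}


def get_converter(key: str, chain_path: str, loader: Callable[[str], object]) -> object:
    """Return the cached converter for `key`, building it on first use."""
    if key not in _CONVERTERS:
        # The chain file is only touched here, so a missing file can only
        # raise at the first conversion, never at import time.
        _CONVERTERS[key] = loader(chain_path)
    return _CONVERTERS[key]


# Stand-in loader that records its calls. In real use the loader would be
# something like pyliftover.LiftOver (assumption, not confirmed by the source).
calls = []


def fake_loader(path: str) -> dict:
    calls.append(path)
    return {"chain": path}


first = get_converter("hg19", "hg19ToHg38.over.chain.gz", fake_loader)
second = get_converter("hg19", "hg19ToHg38.over.chain.gz", fake_loader)
```

The second call returns the same object without invoking the loader again, which is the property that makes repeated per-row conversions cheap.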
Supported conversions
pycmplot harmonises input coordinates to GRCh38. Two source assemblies are supported:
hg19 / GRCh37 → GRCh38 (default, bundled chain file)
hg18 / NCBI36 → GRCh38 (bundled chain file; used when input rows carry an hg18 build label)
Resource configuration
Chain file paths are resolved through
ResourceConfig. By default, bundled chain
files are used (pycmplot/data/hg19ToHg38.over.chain.gz and
pycmplot/data/hg18ToHg38.over.chain.gz). They can be overridden by
setting the environment variables:
export PYCMPLOT_CHAIN_HG19_HG38=/path/to/hg19ToHg38.over.chain.gz
export PYCMPLOT_CHAIN_HG18_HG38=/path/to/hg18ToHg38.over.chain.gz
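The override rule above (environment variable first, bundled file otherwise) could look roughly like the following. How `ResourceConfig` actually performs the lookup is not shown in this page, so `resolve_chain_path` is a hypothetical helper; only the variable names and bundled paths are taken from the text.

```python
import os
from typing import Mapping, Optional

# Bundled defaults and environment variable names as listed above.
BUNDLED = {
    "hg19": "pycmplot/data/hg19ToHg38.over.chain.gz",
    "hg18": "pycmplot/data/hg18ToHg38.over.chain.gz",
}
ENV_VARS = {
    "hg19": "PYCMPLOT_CHAIN_HG19_HG38",
    "hg18": "PYCMPLOT_CHAIN_HG18_HG38",
}


def resolve_chain_path(build: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: a non-empty environment variable wins,
    otherwise the bundled chain file path is returned."""
    env = os.environ if env is None else env
    return env.get(ENV_VARS[build]) or BUNDLED[build]


# Passing an explicit mapping keeps the example independent of the real environment.
default_path = resolve_chain_path("hg19", env={})
custom_path = resolve_chain_path(
    "hg18", env={"PYCMPLOT_CHAIN_HG18_HG38": "/path/to/hg18ToHg38.over.chain.gz"}
)
```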
- pycmplot.liftover.liftover_hg18_to_hg38(chrom: str, pos: int, resources: ResourceConfig | None = None) → int | None
Convert a single hg18 (NCBI36) position to its hg38 equivalent.
Uses a lazily loaded and cached LiftOver object backed by the hg18→hg38 chain file specified in resources. When multiple hg38 mappings exist for a given position, the one with the highest chain score is returned.
- Parameters:
chrom (str) – Chromosome name without the 'chr' prefix (e.g. '1', 'X'). The prefix is added internally before querying pyliftover.
pos (int) – 0-based hg18 position, as expected by pyliftover.LiftOver.
resources (ResourceConfig, optional) – ResourceConfig instance. Falls back to default_resources when None.
- Returns:
Corresponding 0-based hg38 position, or None if the position could not be mapped (unmapped region, chromosome gap, or deleted sequence).
- Return type:
int or None
See also
liftover_hg19_to_hg38: Equivalent helper for hg19 coordinates.
liftover_position: Applies the appropriate per-row dispatcher to a full DataFrame.
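The two rules stated for the single-position helpers (add the 'chr' prefix, then keep the mapping with the highest chain score) can be sketched as below. It assumes the result shape that pyliftover's convert_coordinate documents: a list of (chromosome, position, strand, chain score) tuples, or None for an unknown chromosome. The helper names are hypothetical, and the input is hand-made rather than a real liftover result.

```python
from typing import List, Optional, Tuple

# One candidate mapping: (chromosome, position, strand, chain_score).
Hit = Tuple[str, int, str, int]


def with_chr_prefix(chrom: str) -> str:
    """Turn '1' or 'X' into 'chr1' or 'chrX'; leave prefixed names alone."""
    return chrom if chrom.startswith("chr") else f"chr{chrom}"


def pick_best_position(results: Optional[List[Hit]]) -> Optional[int]:
    """Return the position of the highest-scoring hit, or None if unmapped."""
    if not results:
        # Covers both None (unknown chromosome) and [] (no mapping found).
        return None
    best = max(results, key=lambda hit: hit[3])
    return best[1]


# Hand-made candidates for illustration only, not real liftover output.
candidates = [("chr1", 100, "+", 5), ("chr1", 200, "+", 9)]
best_pos = pick_best_position(candidates)
```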
- pycmplot.liftover.liftover_hg19_to_hg38(chrom: str, pos: int, resources: ResourceConfig | None = None) → int | None
Convert a single hg19 position to its hg38 equivalent.
Uses a lazily loaded and cached LiftOver object backed by the hg19→hg38 chain file specified in resources. When multiple hg38 mappings exist for a given position, the one with the highest chain score is returned.
- Parameters:
chrom (str) – Chromosome name without the 'chr' prefix (e.g. '1', 'X'). The prefix is added internally before querying pyliftover.
pos (int) – 0-based hg19 position, as expected by pyliftover.LiftOver.
resources (ResourceConfig, optional) – ResourceConfig instance. Falls back to default_resources when None.
- Returns:
Corresponding 0-based hg38 position, or None if the position could not be mapped (unmapped region, chromosome gap, or deleted sequence).
- Return type:
int or None
Notes
pyliftover uses 0-based coordinates (BED convention). GWAS summary statistics files typically use 1-based coordinates (VCF/Ensembl convention). The caller (liftover_position()) is responsible for any coordinate-system adjustment.
Examples
>>> from pycmplot.liftover import liftover_hg19_to_hg38
>>> new_pos = liftover_hg19_to_hg38("11", 5246695)
>>> new_pos
5225465
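The coordinate-system note for liftover_hg19_to_hg38 implies that a caller holding 1-based positions has to shift them before and after conversion. One way to do that is sketched here. Whether liftover_position() applies exactly this adjustment is not stated in this page, so `lift_one_based` is a hypothetical wrapper, and `fake_convert` is a stand-in for a real 0-based converter.

```python
from typing import Callable, Optional

ZeroBasedConverter = Callable[[str, int], Optional[int]]


def lift_one_based(chrom: str, pos_1based: int, convert: ZeroBasedConverter) -> Optional[int]:
    """Convert a 1-based position using a converter that works in 0-based space."""
    new_0based = convert(chrom, pos_1based - 1)  # 1-based to 0-based
    if new_0based is None:
        return None  # unmappable positions stay unmappable
    return new_0based + 1  # back to 1-based


def fake_convert(chrom: str, pos_0based: int) -> Optional[int]:
    """Stand-in converter: shifts everything by 1000, cannot map chromosome 'Y'."""
    return None if chrom == "Y" else pos_0based + 1000


lifted = lift_one_based("11", 5001, fake_convert)
unmapped = lift_one_based("Y", 5001, fake_convert)
```

Comparing against None explicitly, and not relying on truthiness, avoids treating a legitimate 0-based result of 0 as a failure at this level.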
- pycmplot.liftover.liftover_position(df: DataFrame, hg38_chr_limits: dict = None, resources: ResourceConfig | None = None) → DataFrame
Liftover all hg18/hg19 rows in df to hg38 coordinates.
Iterates over every row in df and dispatches to liftover_hg19_to_hg38() for rows whose BUILD column equals 'hg19' or to liftover_hg18_to_hg38() for rows whose BUILD column equals 'hg18'. Rows with any other build value are passed through unchanged. Rows for which liftover returns None or 0 (unmappable positions) are silently dropped.
Two provenance columns are added to the returned DataFrame so that the original coordinates remain accessible:
OLD_POS: the pre-liftover base-pair position.
OLD_BUILD: the original build value (e.g. 'hg19').
After processing, the BUILD column is updated to 'hg38' for all rows.
- Parameters:
df (pandas.DataFrame) – Summary statistics DataFrame with canonical columns CHR, POS, and BUILD. The POS column is coerced to int before processing.
resources (ResourceConfig, optional) – ResourceConfig instance supplying the chain file path. Falls back to default_resources when None.
- Returns:
A copy of df with:
POS replaced by hg38 coordinates for all hg18 and hg19 rows.
BUILD set to 'hg38' for all rows.
OLD_POS and OLD_BUILD columns added.
Rows with unmappable positions (new POS == 0) removed.
- Return type:
pandas.DataFrame
See also
liftover_hg19_to_hg38: Single-position conversion function called internally.
Examples
>>> from pycmplot.liftover import liftover_position
>>> df_hg38 = liftover_position(df)
>>> df_hg38["BUILD"].unique()
array(['hg38'], dtype=object)
>>> "OLD_POS" in df_hg38.columns
True
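The per-row dispatch that liftover_position is described as performing can be sketched without pandas, using plain dicts as rows. This is a simplified illustration under stated assumptions, not the library's implementation: `liftover_rows` is a hypothetical name, the converters are stand-ins, and the undocumented hg38_chr_limits argument is ignored.

```python
from typing import Callable, Dict, List, Optional

Converter = Callable[[str, int], Optional[int]]


def liftover_rows(rows: List[dict], converters: Dict[str, Converter]) -> List[dict]:
    """Dispatch each row by its BUILD value, keep provenance, drop unmappable rows."""
    out = []
    for row in rows:
        new_row = dict(row)                    # work on a copy
        new_row["OLD_POS"] = row["POS"]        # provenance columns
        new_row["OLD_BUILD"] = row["BUILD"]
        convert = converters.get(row["BUILD"])
        if convert is not None:                # only hg18/hg19 rows are converted
            new_pos = convert(str(row["CHR"]), int(row["POS"]))
            if not new_pos:                    # None or 0 means unmappable: drop
                continue
            new_row["POS"] = new_pos
        new_row["BUILD"] = "hg38"              # every surviving row is now hg38
        out.append(new_row)
    return out


# Stand-in converters for illustration only; they are not real liftover results.
converters = {
    "hg19": lambda chrom, pos: None if chrom == "3" else pos + 10,
    "hg18": lambda chrom, pos: pos + 20,
}
rows = [
    {"CHR": "1", "POS": 100, "BUILD": "hg19"},
    {"CHR": "1", "POS": 200, "BUILD": "hg18"},
    {"CHR": "2", "POS": 300, "BUILD": "hg38"},  # passed through unchanged
    {"CHR": "3", "POS": 400, "BUILD": "hg19"},  # unmappable, dropped
]
lifted = liftover_rows(rows, converters)
```

A dictionary of converters keyed by build label keeps the dispatch in one place, so supporting another source assembly would only mean adding an entry.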