pycmplot.liftover

Lazy hg18 → hg38 and hg19 → hg38 coordinate conversion powered by pyliftover. Conversion is triggered only when a genome-build column is detected (or explicitly named via --build_column), and only for rows annotated as hg18 or hg19. All other rows are passed through unchanged.

Genome coordinate liftover utilities (hg18 → hg38 and hg19 → hg38).

The pyliftover.LiftOver objects are initialised lazily — they are created on first use and cached in a module-level dictionary, so importing this module never triggers a file-not-found error even if the chain files have not been configured yet.

Supported conversions

pycmplot harmonises input coordinates to GRCh38. Two source assemblies are supported:

  • hg19 / GRCh37 → GRCh38 (default, bundled chain file)

  • hg18 / NCBI36 → GRCh38 (bundled chain file; used when input rows carry an hg18 build label)

Resource configuration

Chain file paths are resolved through ResourceConfig. By default, bundled chain files are used (pycmplot/data/hg19ToHg38.over.chain.gz and pycmplot/data/hg18ToHg38.over.chain.gz). They can be overridden by setting the environment variables:

export PYCMPLOT_CHAIN_HG19_HG38=/path/to/hg19ToHg38.over.chain.gz
export PYCMPLOT_CHAIN_HG18_HG38=/path/to/hg18ToHg38.over.chain.gz
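
The override rule can be illustrated with a short sketch. How ResourceConfig resolves paths internally is not shown here, so `resolve_chain_path` and `BUNDLED_HG19` are hypothetical names; the sketch only assumes that a set, non-empty environment variable takes precedence over the bundled file.

```python
import os
from pathlib import Path

# Assumed bundled default, taken from the path quoted above.
BUNDLED_HG19 = Path("pycmplot/data/hg19ToHg38.over.chain.gz")


def resolve_chain_path(env_var: str, bundled: Path) -> Path:
    """Prefer the path in env_var when set and non-empty, else the bundled file."""
    override = os.environ.get(env_var)
    return Path(override) if override else bundled


# No override set: the bundled file is used.
os.environ.pop("PYCMPLOT_CHAIN_HG19_HG38", None)
default_path = resolve_chain_path("PYCMPLOT_CHAIN_HG19_HG38", BUNDLED_HG19)

# Override set: the environment variable wins.
os.environ["PYCMPLOT_CHAIN_HG19_HG38"] = "/path/to/hg19ToHg38.over.chain.gz"
custom_path = resolve_chain_path("PYCMPLOT_CHAIN_HG19_HG38", BUNDLED_HG19)

print(default_path)
print(custom_path)
```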

pycmplot.liftover.liftover_hg18_to_hg38(chrom: str, pos: int, resources: ResourceConfig | None = None) → int | None

Convert a single hg18 (NCBI36) position to its hg38 equivalent.

Uses a lazily loaded and cached LiftOver object backed by the hg18→hg38 chain file specified in resources. When multiple hg38 mappings exist for a given position, the one with the highest chain score is returned.
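The "highest chain score" rule can be sketched against the result shape that pyliftover's `convert_coordinate` returns: a list of `(chromosome, position, strand, chain_score)` tuples, an empty list for an unmapped position, or `None` for an unknown chromosome. The helper name `best_position` is hypothetical, and whether pycmplot applies further filtering (for example on the target chromosome) is not stated here.

```python
from typing import List, Optional, Tuple

# Shape of one pyliftover result: (chromosome, position, strand, chain_score).
Hit = Tuple[str, int, str, int]


def best_position(hits: Optional[List[Hit]]) -> Optional[int]:
    """Return the position of the highest-scoring hit, or None if there is none."""
    if not hits:  # None (unknown chromosome) or [] (unmapped position)
        return None
    return max(hits, key=lambda hit: hit[3])[1]


# Hand-made hits for illustration only; these are not real liftover results.
hits = [("chr1", 1000, "+", 500), ("chr1", 2000, "+", 900)]
print(best_position(hits))  # position belonging to the score-900 hit
print(best_position([]))
print(best_position(None))
```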

Parameters:
  • chrom (str) – Chromosome name without the 'chr' prefix (e.g. '1', 'X'). The prefix is added internally before querying pyliftover.

  • pos (int) – 0-based hg18 position, as expected by pyliftover.LiftOver.

  • resources (ResourceConfig, optional) – ResourceConfig instance. Falls back to default_resources when None.

Returns:

Corresponding 0-based hg38 position, or None if the position could not be mapped (unmapped region, chromosome gap, or deleted sequence).

Return type:

int or None

See also

liftover_hg19_to_hg38

Equivalent helper for hg19 coordinates.

liftover_position

Applies the appropriate per-row dispatcher to a full DataFrame.

pycmplot.liftover.liftover_hg19_to_hg38(chrom: str, pos: int, resources: ResourceConfig | None = None) → int | None

Convert a single hg19 position to its hg38 equivalent.

Uses a lazily loaded and cached LiftOver object backed by the chain file specified in resources. When multiple hg38 mappings exist for a given position, the one with the highest chain score is returned.

Parameters:
  • chrom (str) – Chromosome name without the 'chr' prefix (e.g. '1', 'X'). The prefix is added internally before querying pyliftover.

  • pos (int) – 0-based hg19 position, as expected by pyliftover.LiftOver.

  • resources (ResourceConfig, optional) – ResourceConfig instance. Falls back to default_resources when None.

Returns:

Corresponding 0-based hg38 position, or None if the position could not be mapped (unmapped region, chromosome gap, or deleted sequence).

Return type:

int or None

Notes

pyliftover uses 0-based coordinates (BED convention). GWAS summary statistics files typically use 1-based coordinates (VCF/Ensembl convention). The caller (liftover_position()) is responsible for any coordinate-system adjustment.
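The adjustment mentioned above amounts to subtracting 1 before the lookup and adding 1 afterwards. The sketch below shows that arithmetic with a toy converter; `lift_one_based` and `toy_lift` are hypothetical names, and the source does not state whether liftover_position() performs the adjustment in exactly this way.

```python
from typing import Callable, Optional


def lift_one_based(
    chrom: str,
    pos_1based: int,
    lift_fn: Callable[[str, int], Optional[int]],
) -> Optional[int]:
    """Lift a 1-based position using a converter that works in 0-based space."""
    lifted = lift_fn(chrom, pos_1based - 1)  # 1-based -> 0-based
    return None if lifted is None else lifted + 1  # 0-based -> 1-based


# Toy converter standing in for liftover_hg19_to_hg38: shifts chromosome "1"
# by a constant and reports every other chromosome as unmappable.
def toy_lift(chrom: str, pos: int) -> Optional[int]:
    return pos + 100 if chrom == "1" else None


print(lift_one_based("1", 1000, toy_lift))
print(lift_one_based("2", 1000, toy_lift))
```

Keeping the unmappable case as None, rather than letting `None + 1` raise, lets the caller decide how to handle dropped positions.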

See also

liftover_position

Applies liftover_hg19_to_hg38() row-wise to a full DataFrame.

Examples

>>> from pycmplot.liftover import liftover_hg19_to_hg38
>>> new_pos = liftover_hg19_to_hg38("11", 5246695)
>>> new_pos
5225465

pycmplot.liftover.liftover_position(df: DataFrame, hg38_chr_limits: dict = None, resources: ResourceConfig | None = None) → DataFrame

Lift over all hg18/hg19 rows in df to hg38 coordinates.

Iterates over every row in df and dispatches to liftover_hg19_to_hg38() for rows whose BUILD column equals 'hg19' or to liftover_hg18_to_hg38() for rows whose BUILD column equals 'hg18'. Rows with any other build value are passed through unchanged. Rows for which liftover returns None or 0 (unmappable positions) are silently dropped.

Two provenance columns are added to the returned DataFrame so that the original coordinates remain accessible:

  • OLD_POS: the pre-liftover base-pair position.

  • OLD_BUILD: the original build value ('hg18' or 'hg19' for lifted rows).

After processing, the BUILD column is updated to 'hg38' for all rows.
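The dispatch, pass-through, and drop rules described above can be sketched in plain Python, using a list of dicts in place of a DataFrame so that the example runs without pandas or chain files. `lift_rows` and the toy lifters are hypothetical, and recording OLD_POS and OLD_BUILD for pass-through rows as well is an assumption rather than documented behaviour.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Lifter = Callable[[str, int], Optional[int]]


def lift_rows(rows: List[Row], lifters: Dict[str, Lifter]) -> List[Row]:
    """Lift rows whose BUILD has a lifter, keep the rest, drop unmappable rows."""
    out: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy, so the input is never mutated
        new_row["OLD_POS"] = row["POS"]
        new_row["OLD_BUILD"] = row["BUILD"]
        lifter = lifters.get(row["BUILD"])
        if lifter is not None:
            new_pos = lifter(str(row["CHR"]), int(row["POS"]))
            if not new_pos:  # None or 0 marks an unmappable position
                continue
            new_row["POS"] = new_pos
        new_row["BUILD"] = "hg38"
        out.append(new_row)
    return out


rows = [
    {"CHR": "1", "POS": 100, "BUILD": "hg19"},
    {"CHR": "1", "POS": 200, "BUILD": "hg18"},
    {"CHR": "1", "POS": 300, "BUILD": "hg38"},
]
# Toy lifters: hg19 shifts by 10, hg18 never maps.
toy_lifters = {"hg19": lambda c, p: p + 10, "hg18": lambda c, p: None}
result = lift_rows(rows, toy_lifters)
for r in result:
    print(r)
```

In this toy run the hg19 row is lifted, the hg18 row is dropped because its lifter returns None, and the hg38 row passes through with its position unchanged.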

Parameters:
  • df (pandas.DataFrame) – Summary statistics DataFrame with canonical columns CHR, POS, and BUILD. The POS column is coerced to int before processing.

  • resources (ResourceConfig, optional) – ResourceConfig instance supplying the chain file paths. Falls back to default_resources when None.

Returns:

A copy of df with:

  • POS replaced by hg38 coordinates for all hg18 and hg19 rows.

  • BUILD set to 'hg38' for all rows.

  • OLD_POS and OLD_BUILD columns added.

  • Rows with unmappable positions (liftover result None or 0) removed.

Return type:

pandas.DataFrame

See also

liftover_hg19_to_hg38, liftover_hg18_to_hg38

Single-position conversion functions called internally.

Examples

>>> from pycmplot.liftover import liftover_position
>>> # df: a summary statistics DataFrame with CHR, POS, and BUILD columns
>>> df_hg38 = liftover_position(df)
>>> df_hg38["BUILD"].unique()
array(['hg38'], dtype=object)
>>> "OLD_POS" in df_hg38.columns
True