API Reference
Parsers
parse_xml
uniprotlib.parse_xml(*paths)
Stream-parse one or more UniProt XML files, yielding UniProtEntry objects.
Accepts plain XML or gzip-compressed files (auto-detected from .gz
extension). Handles both namespace variants (http:// for single-entry
web downloads, https:// for bulk FTP dumps). Files are processed
sequentially. Memory stays bounded regardless of file size.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *paths | str \| Path | One or more file paths (str or Path) to UniProt XML files. | () |
Yields:
| Type | Description |
|---|---|
| UniProtEntry | One UniProtEntry for each entry in the input files. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no paths are provided. |
Example::

    from uniprotlib import parse_xml

    for entry in parse_xml("uniprot_sprot.xml.gz"):
        print(entry.primary_accession, entry.organism.scientific_name)
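The streaming behaviour described above (gzip detection by extension, two namespace variants, bounded memory) can be illustrated with the standard library alone. The following is a minimal sketch, not the implementation used by uniprotlib: the helper names `open_maybe_gzip` and `iter_accessions` are hypothetical, and the two namespace URIs are an assumption based on the http/https variants mentioned in the description.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed namespace URIs for the http:// and https:// variants.
NAMESPACES = ("http://uniprot.org/uniprot", "https://uniprot.org/uniprot")
ENTRY_TAGS = {f"{{{ns}}}entry" for ns in NAMESPACES}


def open_maybe_gzip(path):
    """Open a file in binary mode, using gzip when the name ends in .gz."""
    path = Path(path)
    return gzip.open(path, "rb") if path.suffix == ".gz" else open(path, "rb")


def iter_accessions(*paths):
    """Yield the first <accession> text of each <entry>, one file at a time."""
    if not paths:
        raise ValueError("at least one path is required")
    for path in paths:
        with open_maybe_gzip(path) as handle:
            # "end" events fire once an element has been read completely.
            for _event, elem in ET.iterparse(handle, events=("end",)):
                if elem.tag in ENTRY_TAGS:
                    # Reuse whichever namespace this file declared.
                    ns = elem.tag[: elem.tag.index("}") + 1]
                    yield elem.findtext(f"{ns}accession")
                    # Drop the finished entry's children to keep memory low.
                    elem.clear()
```

Because this is a generator, the ValueError surfaces on the first iteration rather than at call time. The root element still keeps an emptied shell per entry, so a production parser would likely also detach processed entries from their parent.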
parse_idmapping
uniprotlib.parse_idmapping(*paths, id_type=None)
Stream-parse one or more UniProt idmapping.dat files.
Yields one IdMapping per line (one accession–database–id triple).
Accepts plain text or gzip-compressed files (auto-detected from .gz
extension).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *paths | str \| Path | One or more file paths (str or Path) to idmapping.dat files. | () |
| id_type | str \| None | If set, only yield rows matching this database type. | None |
Yields:
| Type | Description |
|---|---|
| IdMapping | IdMapping for each (matching) line in the file. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no paths are provided. |
Example::

    from uniprotlib import parse_idmapping

    for m in parse_idmapping("idmapping.dat.gz", id_type="GeneID"):
        print(m.accession, m.id)
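To make the row format concrete: idmapping.dat is a tab-separated file with three columns (accession, database type, identifier). The sketch below shows one way such a filter-while-streaming parser could look using only the standard library. `Row` and `iter_rows` are hypothetical stand-ins for illustration and are not the library's own classes or code.

```python
import gzip
from pathlib import Path
from typing import Iterator, NamedTuple, Optional


class Row(NamedTuple):
    """Hypothetical stand-in for IdMapping (accession, id_type, id)."""

    accession: str
    id_type: str
    id: str


def iter_rows(*paths, id_type: Optional[str] = None) -> Iterator[Row]:
    """Yield one Row per non-empty line, optionally filtered by id_type."""
    if not paths:
        raise ValueError("at least one path is required")
    for path in paths:
        path = Path(path)
        # Pick the opener from the file extension, as described above.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line:
                    continue
                accession, kind, value = line.split("\t", 2)
                if id_type is None or kind == id_type:
                    yield Row(accession, kind, value)
```

Filtering inside the loop means non-matching rows are never materialized, which matters for the full idmapping dump.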
Models
UniProtEntry
uniprotlib.UniProtEntry
dataclass
A single UniProtKB entry parsed from XML.
Attributes:
| Name | Type | Description |
|---|---|---|
| primary_accession | str | Primary accession. |
| accessions | list[str] | All accessions, including primary and secondary. |
| entry_name | str | Mnemonic entry name. |
| dataset | str | |
| protein_name | str \| None | Recommended full protein name. None if not annotated. |
| gene | Gene \| None | Gene names. None if the entry has no gene annotation. |
| organism | Organism | Source organism with taxonomy. |
| sequence | Sequence | Amino acid sequence with metadata. |
| keywords | list[str] | UniProt keywords. |
| db_references | list[DbReference] | Cross-references to external databases. |
| protein_existence | str \| None | Protein existence evidence level. |
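Several attributes above may be None (protein_name, gene, and the organism's name fields), so consumer code needs to guard against them. The following sketch shows one way to build a summary line that tolerates missing values. The function `summarize` is hypothetical, and the demo object is a stand-in with made-up values; real entries would come from parse_xml.

```python
from types import SimpleNamespace


def summarize(entry) -> str:
    """Build a tab-separated summary, tolerating the optional fields."""
    gene = entry.gene.primary if entry.gene is not None else None
    fields = [
        entry.primary_accession,
        entry.entry_name,
        gene or "-",
        entry.protein_name or "-",
        entry.organism.scientific_name or "-",
    ]
    return "\t".join(fields)


# Stand-in object with illustrative values only.
demo = SimpleNamespace(
    primary_accession="P12345",
    entry_name="DEMO_HUMAN",
    protein_name=None,
    gene=None,
    organism=SimpleNamespace(scientific_name="Homo sapiens"),
)
print(summarize(demo))
```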
Organism
uniprotlib.Organism
dataclass
Organism annotation from a UniProt entry.
Attributes:
| Name | Type | Description |
|---|---|---|
| scientific_name | str \| None | Binomial name. |
| common_name | str \| None | Vernacular name. |
| tax_id | str \| None | NCBI Taxonomy identifier as a string. |
| lineage | list[str] | Taxonomic lineage from root to most specific taxon. |
Gene
uniprotlib.Gene
dataclass
Gene names associated with a UniProt entry.
Attributes:
| Name | Type | Description |
|---|---|---|
| primary | str \| None | Primary gene name. |
| synonyms | list[str] | Alternative gene names. |
| ordered_locus_names | list[str] | Systematic locus identifiers. |
| orf_names | list[str] | Open reading frame identifiers. |
Sequence
uniprotlib.Sequence
dataclass
Protein amino acid sequence.
Attributes:
| Name | Type | Description |
|---|---|---|
| value | str | Amino acid string (no whitespace). |
| length | int | Number of amino acids. |
| mass | int | Molecular mass in Daltons. |
| checksum | str | CRC64 checksum of the sequence. |
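Since value is a plain string with no whitespace, it can be wrapped into FASTA records directly. The sketch below is a hypothetical helper (`to_fasta` is not part of the documented API) that takes a header and a residue string, such as an entry's primary accession and its sequence value.

```python
def to_fasta(header: str, residues: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping residues at `width` characters."""
    lines = [f">{header}"]
    lines.extend(residues[i : i + width] for i in range(0, len(residues), width))
    return "\n".join(lines) + "\n"
```

With a parsed entry, a call might look like `to_fasta(entry.primary_accession, entry.sequence.value)`.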
DbReference
uniprotlib.DbReference
dataclass
Cross-reference to an external database.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | str | Database name. |
| id | str | Identifier in that database. |
| molecule | str \| None | Isoform identifier. |
| properties | dict[str, str] | Additional key-value properties. |
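A common task with db_references is collecting identifiers per database. The following sketch groups references by their type attribute. `ids_by_database` is a hypothetical helper, and it relies only on the type and id attributes listed in the table above.

```python
from collections import defaultdict


def ids_by_database(db_references):
    """Map each database name to the list of ids seen for it, in order."""
    grouped = defaultdict(list)
    for ref in db_references:
        grouped[ref.type].append(ref.id)
    return dict(grouped)
```

Called as `ids_by_database(entry.db_references)`, this returns a plain dict, so missing databases raise KeyError instead of silently creating empty lists.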
IdMapping
uniprotlib.IdMapping
dataclass
Single row from a UniProt idmapping.dat file.
Each row maps a UniProt accession to one identifier in an external database.
Attributes:
| Name | Type | Description |
|---|---|---|
| accession | str | UniProtKB accession. |
| id_type | str | Database name. |
| id | str | Identifier in that database. |