biostrings

Efficient manipulation of genomic sequences in Python, inspired by the design of Bioconductor's Biostrings package.

The core design relies on a "pool and ranges" memory model:

DNAStringSet stores all sequences in a single contiguous block of memory (the pool).
Individual sequences are defined by start and width coordinates (the ranges).
Slicing a DNAStringSet returns a view (a new set of ranges pointing to the same pool), so subsetting copies no sequence data and stays fast regardless of the total size of the sequences.
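
The idea can be illustrated with a toy model in plain Python. This is only a sketch of the pool-and-ranges concept, not the package's implementation: the `RangedPool` class and its `subset` and `sequence` methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RangedPool:
    """Hypothetical toy model: one shared byte pool plus (start, width) ranges."""

    pool: bytes
    ranges: Tuple[Tuple[int, int], ...]

    @classmethod
    def from_strings(cls, seqs: Sequence[str]) -> "RangedPool":
        # Concatenate every sequence into one block and record where each lives.
        ranges, pos = [], 0
        for s in seqs:
            ranges.append((pos, len(s)))
            pos += len(s)
        return cls("".join(seqs).encode("ascii"), tuple(ranges))

    def __len__(self) -> int:
        return len(self.ranges)

    def subset(self, sl: slice) -> "RangedPool":
        # Only the selected ranges are copied; the pool object itself is shared.
        return RangedPool(self.pool, self.ranges[sl])

    def sequence(self, i: int) -> str:
        start, width = self.ranges[i]
        return self.pool[start:start + width].decode("ascii")


full = RangedPool.from_strings(["ACGT", "GATTACA", "ACGTACGT"])
view = full.subset(slice(1, 3))

print(view.sequence(0))        # GATTACA
print(view.pool is full.pool)  # True
```

In this sketch the cost of `subset` depends on how many ranges are selected, not on how many bases the pool holds.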

Install

To get started, install the package from PyPI:

pip install biostrings

Quick Start

Working with Single Sequences

The DNAString class represents a single DNA sequence. It enforces the IUPAC DNA alphabet and supports efficient byte-level operations.

from biostrings import DNAString

# Create a DNA string
dna = DNAString("TTGAAAA-CTC-N")
print(dna)
# Output: TTGAAAA-CTC-N

# Basic operations
print(len(dna))            # 13
print(dna[0:3])            # the first three bases, TTG, as a new DNAString

# Reverse Complement
# Handles IUPAC ambiguity codes correctly (e.g., N -> N, M -> K)
rc = dna.reverse_complement()
print(rc)
# Output: N-GAG-TTTTCAA
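
For readers unfamiliar with IUPAC ambiguity codes, the complement mapping can be sketched with a plain translation table. This is a standalone illustration using only the standard library; the `reverse_complement` function and the `_TABLE` constant are assumptions made for the example and say nothing about how the package implements the operation.

```python
# Each IUPAC code maps to the code for the complementary base set,
# e.g. M (A or C) -> K (T or G). N and the gap character map to themselves.
_IUPAC = "ACGTMRWSYKVHDBN-"
_COMP = "TGCAKYWSRMBDHVN-"
_TABLE = str.maketrans(_IUPAC, _COMP)


def reverse_complement(seq: str) -> str:
    """Complement every symbol, then reverse the result."""
    return seq.translate(_TABLE)[::-1]


print(reverse_complement("TTGAAAA-CTC-N"))  # N-GAG-TTTTCAA
print(reverse_complement("M"))              # K
```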

Working with Sets of Sequences

DNAStringSet is the primary container for collections of sequences (e.g., reads from a FASTA file).

from biostrings import DNAStringSet

# Efficiently create a set from a list of strings
seqs = [
    "ACGT",
    "GATTACA",
    "TTGAAAA-CTC-N",
    "ACGTACGT"
]
dss = DNAStringSet(seqs, names=["s1", "s2", "s3", "s4"])

print(dss)
# Output:
# <DNAStringSet of length 4>
#   [ 1]   4 ACGT                 s1
#   [ 2]   7 GATTACA              s2
#   [ 3]  13 TTGAAAA-CTC-N        s3
#   [ 4]   8 ACGTACGT             s4
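
When the sequences come from a FASTA file, one way to obtain the two lists is a small hand-written parser. The sketch below is an assumption-laden example rather than part of the package: `read_fasta` is a hypothetical helper, it takes the first whitespace-delimited token of each header as the name, and it does no alphabet validation.

```python
import io
from typing import Iterable, List, Tuple


def read_fasta(handle: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (names, seqs) from FASTA-formatted lines (hypothetical helper)."""
    names: List[str] = []
    seqs: List[str] = []
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if names:
                # Close the previous record before starting a new one.
                seqs.append("".join(chunks))
            header = line[1:].split()
            names.append(header[0] if header else "")
            chunks = []
        elif names:
            # Sequence lines may be wrapped, so collect them until the next header.
            chunks.append(line)
    if names:
        seqs.append("".join(chunks))
    return names, seqs


fasta = io.StringIO(">s1 first read\nACGT\n>s2\nGATT\nACA\n")
names, seqs = read_fasta(fasta)
print(names)  # ['s1', 's2']
print(seqs)   # ['ACGT', 'GATTACA']
```

The resulting lists have the same shape as the ones passed to `DNAStringSet(seqs, names=...)` in the example above.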

Note

This project has been set up using BiocSetup and PyScaffold.
