How I used Google Colab’s A100 to basecall 100 GB of data in 3 hours (and it only cost me $2)
By Igor Bogorad
Our new sequencing company processes a lot of data using MinION flow cells from Oxford Nanopore Technologies (ONT). The MinION is an amazing instrument: it sequences long strands of DNA by passing them through a protein pore and recording the electrical signal as the DNA moves through. These signals are then decoded, or “basecalled,” by sophisticated AI-driven software such as Dorado. With a GPU, the signals can be converted into DNA sequences almost as fast as the sequencer generates them (live basecalling). While live basecalling usually works smoothly, hiccups can happen.
Recently, I hit one of those hiccups after updating MinKNOW, ONT’s device-control software: the live basecaller started producing corrupted FASTQ files.

This problem was reported [here], and ONT’s support team quickly provided a patch [here]. While MinKNOW can perform post-run basecalling, this is painfully slow on modest GPUs (e.g., the RTX models ONT recommends for MinIONs [link]). Basecalling 100 GB this way could take 10+ hours.
But Dorado’s repository clearly states that it is “heavily optimized for Nvidia A100 and H100 GPUs” and performs best on these systems. Of course, buying an A100 outright (~$10,000) is unrealistic for most labs.
The workaround: use Google Colab’s A100 instances in the cloud.
Practical Setup
- Colab Plan: If you’re not a heavy user, Colab’s Pay-As-You-Go option is great. $10 gives you 100 credits, which is more than enough for several sequencing runs. A100 GPUs cost 5 credits/hour ($0.50/hr).
- Pro vs Pro+: A Pro subscription increases your chances of landing an A100. With Pro+, your priority is even higher. If the A100 isn’t immediately available, wait a bit and try again.
- GPU selection: choose A100 under “Runtime” → “Change runtime type”.
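The pricing above is easy to sanity-check. The sketch below just multiplies out the Pay-As-You-Go figures quoted in this post ($10 for 100 credits, 5 credits per A100 hour); those rates can change, so check Colab’s current pricing before relying on them. The function names are illustrative helpers, not part of any Colab API.

```python
# Cost sketch using the Pay-As-You-Go figures quoted in this post.
# Assumption: these rates are as reported here and may have changed since.
PRICE_PER_CREDIT_USD = 10.0 / 100  # $10 buys 100 credits -> $0.10 per credit
A100_CREDITS_PER_HOUR = 5.0        # credits consumed per hour on an A100


def a100_cost_usd(hours: float) -> float:
    """Estimated dollar cost of keeping an A100 runtime for `hours`."""
    return hours * A100_CREDITS_PER_HOUR * PRICE_PER_CREDIT_USD


def runs_per_purchase(hours_per_run: float, credits: float = 100.0) -> int:
    """How many whole sessions of `hours_per_run` fit into a credit purchase."""
    return int(credits // (hours_per_run * A100_CREDITS_PER_HOUR))


if __name__ == "__main__":
    print(f"3 h of A100 time: ${a100_cost_usd(3):.2f}")
    print(f"4 h sessions per 100 credits: {runs_per_purchase(4)}")
```

At these rates, three hours of A100 time comes to $1.50, and 100 credits cover five 4-hour sessions. Credits are consumed while the runtime is allocated, not only while the GPU is busy, so installing Dorado and downloading the model also count toward the bill.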
Caveat: upload bandwidth matters. Below roughly 50 Mbps, the time spent uploading can outweigh the GPU speedup. At Bonneville Labs, where our lab is based, fast fiber internet makes this approach practical.
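Here is the back-of-the-envelope arithmetic behind that 50 Mbps rule of thumb, as a small sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s), a steady upload rate, and no protocol overhead, so real uploads will be somewhat slower.

```python
# Upload-time estimate for moving a run to Google Drive.
# Assumptions: decimal units, sustained throughput, no protocol overhead.


def upload_hours(data_gb: float, upload_mbps: float) -> float:
    """Hours needed to upload `data_gb` gigabytes at `upload_mbps` megabits/s."""
    bits = data_gb * 1e9 * 8               # gigabytes -> bits
    seconds = bits / (upload_mbps * 1e6)   # megabits/s -> bits/s
    return seconds / 3600


if __name__ == "__main__":
    for mbps in (50, 200, 1000):
        print(f"100 GB at {mbps:>4} Mbps: {upload_hours(100, mbps):5.1f} h")
```

At 50 Mbps, 100 GB takes about 4.4 hours just to reach Google Drive, which is already longer than the ~3 hours of A100 basecalling. At 1 Gbps the same upload drops to roughly 13 minutes.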
Workflow (Simplified)
For those who aren’t full-time coders (like me), here’s the rough sequence:
- Mount Google Drive so Colab can access your data.
- Download and install Dorado (this must be repeated every session, because Colab resets the runtime).
- Download a basecalling model, such as dna_r10.4.1_e8.2_400bps_sup@v5.0.0
- List the POD5/FAST5 files and confirm their paths.
- Run basecalling → output a combined FASTQ.
- Demultiplex (if needed) using dorado demux.
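For readers who prefer a script to a notebook, the steps above can be sketched in Python. This is not the notebook linked in this post: the Dorado release version and download URL, the Drive folder layout, and the barcoding kit name are assumptions to adapt. The Dorado subcommands and flags used (basecaller, download, demux, --recursive, --emit-fastq, --kit-name, --output-dir) are from Dorado’s CLI, but check `dorado <subcommand> --help` for your version.

```python
import subprocess
from pathlib import Path

# Assumptions: release version and URL pattern (check Dorado's README for the
# current release). The model name is the one quoted in this post.
DORADO_VERSION = "0.9.1"  # hypothetical choice of release
DORADO_DIR = f"dorado-{DORADO_VERSION}-linux-x64"
DORADO_URL = f"https://cdn.oxfordnanoportal.com/software/analysis/{DORADO_DIR}.tar.gz"
MODEL = "dna_r10.4.1_e8.2_400bps_sup@v5.0.0"


def find_signal_files(root: Path) -> list[Path]:
    """List POD5/FAST5 files anywhere under `root` so paths can be checked."""
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.suffix.lower() in {".pod5", ".fast5"})


def basecall_cmd(dorado: Path, model: Path, data_dir: Path) -> list[str]:
    """dorado basecaller, searching subfolders and writing FASTQ to stdout."""
    return [str(dorado), "basecaller", str(model), str(data_dir),
            "--recursive", "--emit-fastq"]


def demux_cmd(dorado: Path, reads: Path, kit: str, out_dir: Path) -> list[str]:
    """Optional dorado demux, writing FASTQ output per barcode into out_dir."""
    return [str(dorado), "demux", "--kit-name", kit,
            "--output-dir", str(out_dir), "--emit-fastq", str(reads)]


def run_pipeline(data_dir: Path, out_fastq: Path) -> None:
    """Mount Drive, install Dorado, fetch the model, and basecall (Colab only)."""
    from google.colab import drive  # importable only inside a Colab runtime

    drive.mount("/content/drive")
    # Colab wipes the VM between sessions, so Dorado is reinstalled every time.
    subprocess.run(f"wget -q {DORADO_URL} && tar -xzf {DORADO_DIR}.tar.gz",
                   shell=True, check=True)
    dorado = Path(DORADO_DIR) / "bin" / "dorado"
    # By default the model is saved into the current working directory.
    subprocess.run([str(dorado), "download", "--model", MODEL], check=True)

    files = find_signal_files(data_dir)
    if not files:
        raise FileNotFoundError(f"no POD5/FAST5 files under {data_dir}")
    print(f"Found {len(files)} signal files, e.g. {files[0]}")

    # Stream the combined FASTQ from stdout straight into a file.
    with out_fastq.open("w") as fh:
        subprocess.run(basecall_cmd(dorado, Path(MODEL), data_dir),
                       stdout=fh, check=True)
```

Inside Colab, a call such as run_pipeline(Path("/content/drive/MyDrive/run1/pod5"), Path("/content/drive/MyDrive/run1/calls.fastq")) covers everything up to the combined FASTQ (the run1 folder is a hypothetical example). For barcoded runs, demux_cmd(...) with your kit name (for example SQK-NBD114-24) can then be passed to subprocess.run. Writing the output to Drive rather than the VM’s local disk keeps it safe when the session resets.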
Here is a link to the Google Colab notebook. Feel free to make a copy and edit it as you like.
Result: 100 GB basecalled in ~3 hours at a total compute cost of about $2.
Why It Matters
Cheap access to powerful GPUs can be a lifesaver when you hit unexpected software issues. Beyond basecalling, A100s in Colab could also support other GPU-intensive genomics tasks.
Have you tried other fast or low-cost cloud solutions for nanopore data? Drop a note below—I’d love to compare strategies.
Interested in high-quality and inexpensive nanopore sequencing? Try Angstrom Innovation at sequencing.angstrominno.com
