That's not true. I just sequenced and assembled a new species of fungus to high quality from my home lab using nanopore. You can see all the code I used for assembly and analysis, which will be referenced in a paper I plan to publish in Jan, here: https://github.com/EverymanBio/pestalotiopsis
Given that the decoder is machine-learned and depends on a training set to go from squiggle -> ATGC..., how do you ensure that sequences it hasn't seen before (not in the training set) are still basecalled accurately?
We used Guppy for basecalling, which is neural-network based and turns the raw signal data into predicted bases. There are no guarantees of accuracy, only tools to determine and assess quality. One major way of assessing accuracy is to compare the subject genome with similar reference genomes and note the high degree of homology in highly conserved regions.
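To make the read-level part of that concrete: Guppy writes per-base Phred quality scores into its basecalled FASTQ output, so one simple sanity check is to convert those scores into an expected per-read accuracy. This is just a minimal sketch, assuming standard Phred+33 encoding; the file name is a placeholder, not something from the repo.

    # Minimal sketch: estimate expected per-read accuracy from the Phred
    # quality scores in a basecalled FASTQ (e.g. Guppy output).
    # Assumes standard Phred+33 encoding; "reads.fastq" is a placeholder.

    def read_accuracies(fastq_path):
        """Yield (read_id, expected_accuracy) for each FASTQ record."""
        with open(fastq_path) as fh:
            while True:
                header = fh.readline()
                if not header:
                    break
                fh.readline()              # sequence line (unused here)
                fh.readline()              # '+' separator line
                quals = fh.readline().strip()
                # Phred score Q -> error probability 10^(-Q/10)
                errs = [10 ** (-(ord(c) - 33) / 10) for c in quals]
                accuracy = 1.0 - sum(errs) / len(errs)
                yield header[1:].split()[0], accuracy

    if __name__ == "__main__":
        accs = [a for _, a in read_accuracies("reads.fastq")]
        print(f"reads: {len(accs)}, "
              f"mean expected accuracy: {sum(accs) / len(accs):.3%}")

That only tells you what the basecaller thinks of its own calls, which is why the downstream checks against conserved regions and assembly completeness still matter.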
My question is whether, in the future, we would be able to fully rely on these predicted bases alone, or whether there would always be a need to cross-check against a different sequencing methodology for truly novel genetic information (i.e. de novo sequencing with no reference genome available).
Is there publicly available information on how accurate Guppy is, and on how its accuracy scales with the amount of training data?
It didn't seem like these things were mentioned explicitly in the Community Update, beyond the expectation that accuracy will continue improving; a clearer roadmap would definitely be helpful.
There are quality checks throughout the entire process, starting from the raw read quality scores returned directly by the sequencer all the way to the completeness of the fully assembled genome. In our paper, one of the tools we used for this is BUSCO[0], which scored our assembly at 97.9%, a relatively high score for a de novo assembly.
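For anyone curious what that number means: BUSCO searches the assembly for a lineage-specific set of near-universal single-copy orthologs and reports the fraction it finds complete. Here's a rough sketch of pulling that figure out of BUSCO's short summary file; the exact summary-line format and the file path are assumptions based on typical BUSCO output, not copied from our pipeline.

    # Rough sketch: extract completeness figures from a BUSCO short summary.
    # Assumes the usual one-line summary of the form
    #   C:97.9%[S:97.4%,D:0.5%],F:0.8%,M:1.3%,n:758
    # and a placeholder path; adapt to your actual BUSCO output directory.
    import re

    def busco_completeness(summary_path):
        """Return (complete_pct, fragmented_pct, missing_pct, total_buscos)."""
        pattern = re.compile(
            r"C:(?P<c>[\d.]+)%\[S:[\d.]+%,D:[\d.]+%\],"
            r"F:(?P<f>[\d.]+)%,M:(?P<m>[\d.]+)%,n:(?P<n>\d+)"
        )
        with open(summary_path) as fh:
            for line in fh:
                match = pattern.search(line)
                if match:
                    return (float(match["c"]), float(match["f"]),
                            float(match["m"]), int(match["n"]))
        raise ValueError("no BUSCO summary line found")

    c, f, m, n = busco_completeness("short_summary.txt")
    print(f"{c}% complete, {f}% fragmented, {m}% missing out of {n} BUSCOs")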