
That's not true. I just did a high-quality sequence and assembly of a new species of fungus from my home lab using nanopore. All the code I used for assembly and analysis, which will be referenced in a paper I plan to publish in January, is here: https://github.com/EverymanBio/pestalotiopsis


Given that the decoder is machine-learned and depends on a training set to go from squiggle -> ATGC..., how do you ensure that sequences which haven't been seen before (not in the training set) are still accurately accounted for?


We used Guppy for basecalling, which is neural-network based and turns raw signal data into predicted bases. There are no guarantees of accuracy, only tools to determine and assess quality. One major way of assessing accuracy is to compare the subject genome with similar reference genomes and check for a high degree of homology in highly conserved regions.


My question is whether, in the future, we will be able to rely fully on the predicted bases for sequencing, or whether there will always be a need to compare against a different sequencing methodology for de novo genetic information that hasn't been seen before (where no reference genome is available).

Is there publicly available information on how accurate Guppy is, and on how its accuracy scales with the amount of training data?

These things didn't seem to be mentioned explicitly in the Community Update, other than that it's expected to continue improving. A clearer roadmap would be much more helpful.


How do you know the quality of the resulting sequence?


There are quality checks throughout the entire process, from the raw read quality scores returned directly by the sequencer all the way to the completeness of the fully assembled genome. In our paper, one of the tools we used for this is BUSCO[0], which scored our assembly at 97.9%, a relatively high score for de novo assemblies.

[0] https://pubmed.ncbi.nlm.nih.gov/31020564/
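
As a rough illustration of the first of those checks, here is a minimal sketch of how per-base quality scores in a FASTQ record map to error probabilities. It assumes the standard Phred+33 encoding (Q = -10 * log10(p)) and averages error probabilities rather than Q values. The function names are made up for this example and are not taken from Guppy or from the linked repository.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the estimated per-base error probability."""
    return 10 ** (-q / 10)


def mean_read_quality(quality_string, offset=33):
    """Mean quality of one read from its FASTQ quality string.

    Assumes Phred+33 encoding. Averages the error probabilities first and
    converts back to a Q score, since averaging Q values directly would
    overstate the quality of a read with a few very bad bases.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    probs = [phred_to_error_prob(ord(ch) - offset) for ch in quality_string]
    mean_prob = sum(probs) / len(probs)
    return -10 * math.log10(mean_prob)


if __name__ == "__main__":
    # '5' encodes Q20 under Phred+33, i.e. an estimated 1% error rate per base.
    print(phred_to_error_prob(20))
    print(mean_read_quality("5555555555"))
```

These scores are the basecaller's own estimate, so they say how confident the model is, not how correct it is. That is why assembly-level checks like BUSCO are still needed.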


You don't, not without either resequencing it with another sequencing system or benchmarking the sequencer with a known sequence.
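
A minimal sketch of the benchmarking option, assuming you have a called sequence and a known reference for the same region: compute the edit distance between the two and turn it into an identity fraction. Normalizing by the longer of the two lengths is an assumption made for this example. Real pipelines align reads with a dedicated aligner rather than a quadratic-time loop like this.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(called, reference):
    """Fraction of agreement between a called sequence and a known reference.

    Hypothetical helper: normalizes the edit distance by the longer length.
    """
    if not reference:
        raise ValueError("empty reference")
    longest = max(len(called), len(reference))
    return 1 - edit_distance(called, reference) / longest


if __name__ == "__main__":
    known = "ACGTACGTAC"
    called = "ACGTTCGTAC"  # one substitution relative to the known sequence
    print(identity(called, known))
```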


I thought I recognized your name from the side hustle story. :) This is super cool!!!


Thanks man!



