Pan-genomic advances for fighting reference bias

Talk
Ben Langmead
Time: 09.15.2022 14:00 to 15:00
Location: IRB 4105

Also on Zoom: https://umd.zoom.us/j/97287503999

Sequencing data analysis often begins with aligning sequencing reads to a reference genome, where the reference takes the form of a linear string of bases. But linearity leads to reference bias, a tendency to miss or misreport alignments containing non-reference alleles, which can confound downstream statistical and biological results. This is a major concern in human genomics; we don't want to live in a world where diagnostics and therapeutics are differentially effective depending on how closely our genome matches the reference.

Fortunately, computer science and bioinformatics are meeting the moment. In particular, we can now index and align sequencing reads to references that include many population variants. Here I will describe this journey from the early days of efficient genome indexing -- especially the FM index approach behind Bowtie and BWA -- continuing through more modern methods for graph-shaped references and references that include many genomes. I will emphasize recent results that show how to optimize simple and complex pan-genome representations for effective avoidance of reference bias. Finally, I will outline some promising future areas, including a new class of compressed indexes that improves locality of reference.