Thursday, April 14, 2016

What about the molecules?

It is very likely that every person who has asked me about my research has heard a comment like: "Yes, I love beetles and morphology, but I don't really like molecules... and I know I will have to deal with them at some point."

My experience with molecular data is very limited at this point. I have performed maybe one or two extractions and just one full protocol (DNA extraction, electrophoresis and PCR), without ever seeing the final product, just to learn the technique. That was around 2009, during my Master's at the Biology Department at UPRM. So when I joined the Short Lab in August 2015, I knew that working with molecules was going to happen!

The lab has been working actively with molecular data, so by December 2015 a full set of amplified samples was sent for sequencing, and by mid-January this year it was my duty to look at the sequences and edit them where necessary in order to produce clean alignments. All the visualizations, edits and alignments were done using Geneious R-8, which I consider a friendly and efficient tool.

The process (which may be obvious for those familiar with this kind of data) starts with importing the files into the Program. At this point we had two sequences for each gene of each specimen: forward and reverse. Those two sequences need to be matched to get one Assembly for each specimen. On the Assembly (composed of two strands) it is necessary to check for ambiguities, for example when one strand says A but the other says G, and make a decision (based on the height of the peak and the probability on each strand) to call that base as G or A so that both strands match. You can check the image below to see what I'm trying to explain. During this editing process, we created files that track all the modifications made to the sequences, in case it is necessary to check for errors some steps ahead.

Screenshot of Geneious R-8: Edit of COI sequence.
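
For readers who prefer to see the logic written down, here is a minimal Python sketch of the ambiguity check described above. It is not what Geneious does internally. It assumes the reverse read has already been reverse-complemented and lined up with the forward read, and that each base comes with a numeric quality score (higher means more reliable). The function name `resolve_ambiguities` and the log format are made up for illustration.

```python
def resolve_ambiguities(forward, reverse, fwd_qual, rev_qual, specimen="specimen"):
    """Merge two aligned reads of equal length into one sequence.

    Where the strands disagree, keep the base with the higher quality
    score and record the decision. Ties are called as 'N' so that they
    can be reviewed by eye later. (Hypothetical helper, not a Geneious API.)
    """
    if not (len(forward) == len(reverse) == len(fwd_qual) == len(rev_qual)):
        raise ValueError("reads and quality lists must all have the same length")

    merged = []
    log = []  # one line per edit, so every decision can be traced back
    columns = zip(forward, reverse, fwd_qual, rev_qual)
    for pos, (f, r, fq, rq) in enumerate(columns, start=1):
        if f == r:
            merged.append(f)
            continue
        if fq == rq:
            called = "N"  # no strand is better supported
        else:
            called = f if fq > rq else r
        merged.append(called)
        log.append(
            f"{specimen}\tpos {pos}\tforward={f}({fq})\treverse={r}({rq})\tcalled={called}"
        )
    return "".join(merged), log


if __name__ == "__main__":
    seq, notes = resolve_ambiguities(
        "ACGT", "ACAT", [40, 40, 10, 40], [40, 40, 35, 40], specimen="SLE0001"
    )
    print(seq)
    for line in notes:
        print(line)
```

The log lines play the same role as the notes we kept by hand: if an odd base shows up later in the alignment, you can see whether it came from one of these decisions.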

After you get your editing done (and documented), it is time to convert the Assembly into a Consensus Sequence, which contains only one strand and is longer than either of the previous files, as it includes both the forward and reverse tails of the amplified fragment. You need to do that for all of the specimens. Once you have all the edited Consensus sequences, you can align them (see image below) and ask the Program to highlight the base pairs that differ among sequences. Watch out for sequences that might be reversed, because they will not align properly.

Screenshot of Geneious R-8: Edit of COI sequence.
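
The two checks mentioned above (spotting a reversed sequence and highlighting the positions that differ) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the sequences contain plain A, C, G and T, the comparison against a reference is done position by position without a real alignment, and the function names are invented for this example.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def identity(a, b):
    """Fraction of matching positions over the shared length (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def orient(seq, reference):
    """Return (sequence, was_reversed), choosing the orientation that
    matches the reference better."""
    rc = reverse_complement(seq)
    if identity(rc, reference) > identity(seq, reference):
        return rc, True
    return seq, False


def differing_columns(aligned):
    """1-based columns where equal-length aligned sequences disagree."""
    return [
        i for i, column in enumerate(zip(*aligned), start=1) if len(set(column)) > 1
    ]


if __name__ == "__main__":
    reference = "ATGGCCTTAA"
    fixed, was_reversed = orient("TTAAGGCCAT", reference)
    print(fixed, was_reversed)
    print(differing_columns(["ACGT", "ACAT", "ACGT"]))
```

In practice the alignment program handles this with a proper alignment, so treat the position-by-position comparison here as a rough stand-in.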

At this stage you check for gaps and 'misplaced' bases, which strongly depend on which gene you are working with. Then you might return to the Assemblies to see whether the gaps are real and whether the odd bases are the product of one of the previously resolved ambiguities (here is where the notes come in handy). Finally, you can ask the Program to produce a tree from selected Consensus sequences (see image below).

Screenshot of Geneious R-8: Raw COI tree.
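
As one example of how the gap check depends on the gene: COI is protein-coding, so an internal gap whose length is not a multiple of three would shift the reading frame and deserves a second look in the Assembly. The Python sketch below flags such gaps in an aligned sequence. It assumes gaps are written as '-' and that leading and trailing gaps only reflect shorter reads. The function names are hypothetical, and the rule of three is my own illustration rather than a check built into the workflow described here.

```python
import re


def gap_runs(aligned_seq):
    """Return (start, length) for each run of '-' (start is 1-based)."""
    return [(m.start() + 1, len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def suspicious_gaps(aligned_seq, protein_coding=True):
    """Return the internal gaps worth re-checking in the Assembly.

    Leading and trailing gaps are skipped, since they usually mean the
    read was simply shorter. For protein-coding genes only gaps whose
    length is not a multiple of three are reported; otherwise every
    internal gap is reported.
    """
    total = len(aligned_seq)
    internal = [
        (start, length)
        for start, length in gap_runs(aligned_seq)
        if start != 1 and start + length - 1 != total
    ]
    if protein_coding:
        return [(start, length) for start, length in internal if length % 3 != 0]
    return internal


if __name__ == "__main__":
    print(suspicious_gaps("--ATG-CCT---GGA--"))
    print(suspicious_gaps("--ATG-CCT---GGA--", protein_coding=False))
```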

As far as I can tell, Geneious produces a raw result that is good enough for visualization. It also lets you run sequences through BLAST when you suspect contamination or think you amplified something else by accident. I still don't know if you can run formal analyses and models of molecular evolution in the program... I haven't got there yet.

One of the 'extra' lessons learned from this process is that consistency (applying the same rules to everything), order and documentation are fundamental to doing this kind of work efficiently.