I was trying to build a CAZy reference database to explore the carbohydrate profiles of the genomes in our microbial laboratory collection. I have a parallel script that downloads the protein sequences from NCBI, aligns them, and builds an HMM profile for later use.
This morning I was looking through the output of the process that ran overnight and spotted some alignments that didn’t work. Trying to reproduce the error:
OK, let’s have a look at sequence 3205 in our FASTA file
Trying to figure out what was happening there for a good 15 minutes until suddenly…
Turns out the NCBI download failed a couple of times during the night and just added the error message to the FASTA file…
Oh well…
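In hindsight, a quick sanity check on the FASTA file before alignment would probably have caught this. Below is a minimal sketch of what such a check could look like in Python. The function name `find_bad_records` and the set of allowed characters are assumptions made for illustration, not part of the original pipeline.

```python
import re

# Characters accepted on a protein sequence line: IUPAC amino-acid codes
# plus '*' (stop) and '-' (gap). This alphabet is an assumption; adjust it
# to whatever your downstream aligner accepts.
VALID_SEQ = re.compile(r"^[ACDEFGHIKLMNPQRSTVWYBXZJUO*\-]+$", re.IGNORECASE)


def find_bad_records(lines):
    """Return (record_number, header, offending_line) for each suspicious line.

    `lines` is any iterable of strings, e.g. an open file handle.
    Record numbers are 1-based and count '>' header lines; a line that
    appears before the first header is reported with record number 0.
    """
    problems = []
    record_no = 0
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            record_no += 1
            header = line
        elif header is None or not VALID_SEQ.match(line):
            # Either sequence data with no header, or a line containing
            # characters that are not valid residues (spaces, digits, ':', '<', ...).
            problems.append((record_no, header, line))
    return problems
```

To use it, pass an open file handle, for example `find_bad_records(open("proteins.fasta"))` (the file name here is hypothetical), and skip or re-download any record it reports. This is only a heuristic: an error message made up entirely of letters that happen to be valid residue codes would slip through, so checking the download's return status at fetch time is the more robust fix.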