Hi everyone and thanks in advance for any of your help.

I am running into an issue recently when trying to calculate the scores of a high number of short peptide alignments (10000).

I have to calculate all the alignement scores (calculation based on the BLOSUM62 matrix, but I could also want to use other BLOSUM matrix) of a set of 10000 alignments.

Those alignments are between one "reference" sequence (that does not vary) and 10000 other (almost random) sequences. Those sequences are all very short (9 to 20aa — *my example bellow is with 9aa*).

First, I already generated the list of 10000 (almost random) peptide sequences in an .xls (or .cvs) table. No issue for that.

Then, I found this **biopython script** to calculate the score of **one** alignement (here i put one example with 2 of my peptides): (the script come from this forum)

```
from Bio.SubsMat.MatrixInfo import blosum62 as blosum
blosum.update(((b,a),val) for (a,b),val in list(blosum.items()))
def score_pairwise(seq1, seq2, matrix, gap_s, gap_e, gap = True):
for A,B in zip(seq1, seq2):
diag = ('-'==A) or ('-'==B)
yield (gap_e if gap else gap_s) if diag else matrix[(A,B)]
gap = diag
seq1 = 'ALNSSGTLT'
seq2 = 'AVNDSG-LT'
print(sum(score_pairwise(seq1, seq2, blosum, -3, -1)))
```

—> this give me the correct score of 27. All perfect.

The issue I am facing now is that although this script is perfect for me to calculate the score of the 2 sequences aligned (seq1 and seq2), I am struggling to find a neat way to apply this Python script on 10000 alignements.

As I said, **seq1 does not vary** so nothing to change here. Now, i wanted python to **retrieve the 10000 other seq2** sequences from my xls/cvs sheet into a **list** so that the variable seq2 could take the various string values from that list. For that I did:

```
import pandas as pd
all_seq2_list=pd.read_csv('10000seq2.csv')
all_seq2_list.values.T[0].tolist()
```

Which return me the list ['randompeptide1', 'randompeptide2',…, 'randompeptide10000']

So far so good.

I have a vague idea that I should find a way to make a loop where the seq2 variable changes each time, assigning it the new string value (peptide sequence) corresponding to the next sequence in my all_seq2_list but I don't have enough technical knowledge in python to understand how to do that.

Bottom line is:
**I am struggling with the construction of the loop and how to (for each iteration) give the next value from my all_seq2_list to my seq2 variable; and then calculate the 10000 resulting scores**
I think it is probably not very difficult for someone who use python every day but as a newbie I am stuck.

Can anyone help me ? Thank you!

The part that import the BLOSUM matrix and defines pairwise scoring should be outside of the loop, as it needs to be defined only once at the top of the script. What you are doing here is repeating these imports and definitions for each of your sequences. While it won't hurt the final result, it is a waste of time.

`seq1`

can also be defined outside of the loop since it never changes.