In order for your WAM run to work correctly, your sequences should be aligned with the known structures provided on the input page. Autoalign will do this for you in the majority of cases, but if your sequence contains non-standard features, you will need to align it manually. How to do this is detailed below, but first, a word on the numbering system used for residues in the sequence submission section of the WAM site.

A word on numbering

A residue numbering scheme is used to label residues. This is shown above and below the known structures on the input page. It is an aligned numbering system, designed to account for deletions in CDRs and other parts of the sequence, so that a residue in a particular position (for example the first residue of the H3 loop) is numbered the same in all structures. A unified numbering scheme is used for both the light and the heavy chain, such that the light chain residues are numbered 1 to 117 and the heavy chain residues 118 to 249. This is detailed below.
Thus, a loop is numbered the same whatever its length, and in alignments deletions are denoted as '-'. You should use '-' when inserting deletions. See the aligned light and heavy chains on the WAM input page as an example.

Aligning the sequence

Two topics are covered: aligning the CDRs, and aligning the framework. In most cases, only the CDRs need to be aligned; however, lambda light chains have a framework deletion at residue 13, and occasionally deletions need to be added to compensate for missing residues at the beginning or the end of the sequence (see here). Note that you should only input the Fv section of each chain (VL and VH), up to the -KRA, -LGQ or -TVS motif (see here); do not enter any constant region sequence!

Aligning the CDRs

Most deletions will occur in the CDRs, so they are the most important sections to align correctly.
1. Determining the CDRs from conserved framework residues

The first stage in your alignment should be to determine which residues comprise each CDR. All CDRs are flanked by highly conserved framework residues on either side, which can be used to determine this.

L1

The L1 loop is preceded by a conserved cysteine (C) at residue 23, and followed by a conserved tryptophan (W) at residue 41. The latter is usually followed by a tyrosine (Y) at residue 42. This is illustrated with the structure fragments below.
222                 444
123<------L-1------>123  Residue number
ITCSATSSVN-------YMHWFQ
ITCSASSSVS-------SLHWYQ
MTCSASSSVN-------YMYWYQ
ITCRASENIYS------NLAWYQ
ITCKASQDIRK------YLNWYQ
ITCSANALPNQ------YAYWYQ
MACRASSSVSST-----YLHWYQ
ISCSGTSSNIG----SSTVNWYQ
ISCTGSSSNIG---AGHNVKWYQ
ISCRASESVDDD--GNSFLHWYQ
ISCRSSQSLVHS-NGNTYLHWYL
MSCKSSQSLLNSGNQKNFLAWYQ
MSCKSSQSLLNSGNQKNYLTWYQ

L2

The L2 loop, almost always 7 residues in length, is preceded by a tyrosine (Y) at residue 55 and followed by a glycine (G) at residue 63. Because it is so well conserved, it rarely needs to be aligned.

L3

The L3 loop is preceded by a conserved cysteine (C) at residue 94 and followed by a conserved Phe-Gly (FG) pair at residues 106 and 107. This is illustrated below.
              111
999           000
234<---L-3--->678  Residue number
YYCQQRSS--YPITFGS
YYCQQWTY--PLITFGA
YYCQQWGR--NP-TFGG
YYCQHFWG--TPYTFGG
YYCLQHGE--SPYTFGG
YYCQAWDN--SASIFGG
YYCQQYSG--YPLTFGA
YYCAAWDVSLNAYVFGT
YYCQSYDR--SLRVFGG
YYCQQSNE--DPLTFGA
YFCSQSTH--VPLTFGA
YYCQNDHS--YPLTFGA
YYCQNDYS--YPLTFGA
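
Once a light chain has been aligned, the landmark residues quoted above can be checked mechanically. The following is a minimal Python sketch, not part of WAM: the names `LIGHT_ANCHORS` and `check_light_chain` are hypothetical, and it assumes the aligned VL string has exactly one character (residue or '-') per residue number, so that residue n sits at index n - 1.

```python
# Conserved light-chain landmarks quoted in the text (1-based residue numbers).
# Assumption: every column of the aligned string corresponds to one residue number.
LIGHT_ANCHORS = {23: "C", 41: "W", 55: "Y", 63: "G", 94: "C", 106: "F", 107: "G"}


def check_light_chain(aligned, anchors=LIGHT_ANCHORS):
    """Return (residue number, expected, found) for each landmark that does not match."""
    mismatches = []
    for number, expected in sorted(anchors.items()):
        # Residue n lives at index n - 1; report None if the string is too short.
        found = aligned[number - 1] if number <= len(aligned) else None
        if found != expected:
            mismatches.append((number, expected, found))
    return mismatches
```

An empty list means every landmark is where the numbering expects it. A mismatch is only a hint that a deletion may be misplaced, since these residues are highly conserved rather than invariant.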
H1
The H1 loop is preceded, 4 residues before its first residue, by a conserved
cysteine (C) at residue 139 and followed by a conserved Trp-Val (WV) pair at
residues 155 and 156. In a few cases (e.g. the first two examples below)
there are two tryptophans in this region, and the V can be replaced by I,
which can be confusing. If so, look out for a motif like WIRQ, WVRQ or WVKQ,
i.e. a charged residue two after the W and then a hydrophilic residue
(normally Q).
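
The motif search described above can be sketched with a regular expression. This is an illustration only: the pattern name `H1_END_MOTIF` and the function `find_h1_end` are hypothetical, and the residue classes are assumptions made for this sketch (charged taken as K, R, D, E; hydrophilic taken as Q, E, N, K, R, D, S, T, H).

```python
import re

# Assumption: W, then V or I, then a charged residue, then a hydrophilic residue.
H1_END_MOTIF = re.compile(r"W[VI][KRDE][QENKRDSTH]")


def find_h1_end(seq):
    """Return 0-based indices of tryptophans that start a W-[V/I]-charged-hydrophilic motif."""
    return [m.start() for m in H1_END_MOTIF.finditer(seq)]
```

Applied to an ungapped stretch containing two tryptophans, only the W that starts the motif is reported, which is the one to align with residue 155. Unusual sequences may still give no hit or more than one, so the result should be checked by eye.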
1111            1111
3444            5555
9012<----H1---->5678  Residue number
CKATGYTFSE--YWIEWVKE
CKATGYTFST--YWIEWVKQ
CTVSGFSLTG--YGVNWVRQ
CATSGFTFTD--YYMSWVRQ
CKASGFNIKD--YYMHWVKQ
CAASGFDFSK--YWMSWVRQ
CATSGFTFSD--YYMYWVRQ
CKASGYAFSS--FWVNWVKQ
CATSGFTFSD--FYMEWVRQ
CAASGYSITS-DYAWNWIRQ

Here the last example has an I rather than the usual V, and the first two examples have two tryptophans. The second tryptophan (residue 155) is the one to look for; the first (residue 152) is within the loop.

H2
The H2 loop is preceded, 3 residues before its first residue, by a conserved
tryptophan (W) at residue 166, often part of a WIG motif, and followed by a
conserved tyrosine (Y) at residue 181. Sometimes more than one Y appears in
the sequence around this point, but the correct one can be identified because
it is normally followed 5 residues later by a lysine (K) at residue 186. Even
so, aligning the H2 can be difficult in some cases. See here for hints on
aligning the heavy chain.
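
The tyrosine-then-lysine check can be expressed in a few lines. This is a rough sketch rather than anything WAM itself does; the function name `find_h2_end_tyrosine` is hypothetical, and it assumes the input is an ungapped stretch of heavy chain, so that "5 residues later" is a plain index offset.

```python
def find_h2_end_tyrosine(seq):
    """Return 0-based indices of every Y that has a K exactly 5 residues later.

    Assumption: seq contains no '-' characters, so index offsets equal residue offsets.
    """
    return [i for i in range(len(seq) - 5) if seq[i] == "Y" and seq[i + 5] == "K"]
```

If exactly one index comes back, that tyrosine is the candidate for residue 181. No hits or several hits mean the sequence does not follow the usual pattern and needs to be aligned by eye.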
H3

The H3 loop is preceded, 3 residues before its first residue, by a conserved cysteine (C) at residue 217, which typically forms part of a CAR or CTR motif. The loop is followed by a conserved Trp-Gly (WG) pair at residues 240 and 241.
222                    222
111                    444
789<--------H-3------->012  Residue number
CTRGYSS-------------MDYWGQ
CARGDGN-------------YGYWGQ
CARERDYR------------LDYWGQ
CTRDPYGP------------AAYWGQ
CARDNSYY------------FDYWGQ
CARLHYYGY-----------NAYWGQ
CARHGGYYA-----------MDYWGQ
CARSGNYPYA----------MDYWGQ
CARNYYGSTWY---------FDVWGA
CAKHRVSYVLTG--------FDSWGQ
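
The H3 loop occupies residues 220 to 239, i.e. 20 columns, so padding a loop amounts to adding enough '-' characters to reach that width. The sketch below is an illustration under stated assumptions: the name `align_h3` is hypothetical, and the choice to put the deletions just before the last three loop residues is inferred from the example rows above rather than taken from a stated WAM rule.

```python
import re

H3_COLUMNS = 20  # residues 220 to 239 in the unified numbering


def align_h3(fragment):
    """Pad an ungapped fragment that starts at the conserved C (217) and includes the WG pair."""
    # Greedy match: the loop runs up to the last WG found in the fragment.
    m = re.fullmatch(r"(C[A-Z]{2})([A-Z]+)(WG[A-Z]*)", fragment)
    if m is None:
        raise ValueError("fragment must start at the conserved C and contain the WG pair")
    head, loop, tail = m.groups()
    if len(loop) > H3_COLUMNS:
        raise ValueError("loop is longer than the 20 available columns")
    gaps = "-" * (H3_COLUMNS - len(loop))
    # Assumption from the examples: deletions sit before the final three loop residues.
    return head + loop[:-3] + gaps + loop[-3:] + tail
```

For real submissions the deletions should still be compared against a known structure with the same loop length, since that is the placement the modelling relies on.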
2. Aligning the loops themselves

Once the residues making up each CDR have been determined, deletions will need to be added to the sequence in most cases. It is important that, for a loop of a given length, these are in the correct place in the loop. First work out how many deletions are needed by looking for loops of the same length as yours in the aligned sequences above, then add the deletions ('-') in the corresponding place. See the tips below for help with the heavy chain, which can be difficult to align. If no known loop matches the length of yours, the deletions can be added anywhere, as long as they are within that loop!

3. General tips for aligning the heavy chain
Here are a couple of general tips for aligning loops in the
heavy chain, where the H2 can be a bit tricky. Align the H1 first, which
is normally quite easy, then add deletions immediately before the CAR or CTR
that precedes the H3 until it is aligned. This is illustrated below.
EDSAVYFCARSGNYPYA----------MDYWG  Known structure
TAIYFCARLHYYGYNAYWG               Your sequence

2. Post "CAR" alignment:

EDSAVYFCARSGNYPYA----------MDYWG  Known structure
TAIYF--CARLHYYGYNAYWG             Your sequence

Then align the H3 loop correctly, so that the WG at its C terminus lines up:

3. Post H3 alignment:

EDSAVYFCARSGNYPYA----------MDYWG  Known structure
TAIYF--CARLHYYGY-----------NAYWG  Your sequence

and add deletions at the C terminus of the chain if necessary, referring to this important note. You should then be left with none, two or three deletions before the CAR/CTR. Remove them, and insert them in the H2 loop at the same position as in the known H2 sequences with the same number of deletions (see the H2 examples above). The Trp at residue 166 should now be aligned, so it should be easy to find the start of the H2 loop; if you cannot, for example because the "WIG" motif is not present in your sequence, another way is to count 14 residues beyond the Trp at the end of H1. The whole heavy chain should now be aligned; if not, you've done something wrong!

3a. Post H3, Pre H2 alignment:

   -----H2-----                   ---------H3---------
WVAYISNGGG--STYYPDTVK...EDSAVYFCARSGNYPYA----------MDYWG
WIGQIYPGDGDNKYNGKFK...EDTAIYFCAR--LHYYGY-----------NAYWG

Top: known structure; bottom: your sequence

4. Post H2 alignment:

   -----H2-----                   ---------H3---------
WVAYISNGGG--STYYPDTVK...EDSAVYFCARSGNYPYA----------MDYWG
WIGQIYPGDG--DNKYNGKFK...EDTAIYFCARLHYYGY-----------NAYWG

Top: known structure; bottom: your sequence

If there is a different number of deletions in H2 (i.e. not none, 2 or 3) it will be a non-canonical loop anyway, so as long as they go in the H2 region it doesn't matter where.
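
Placing deletions "at the same position as a known loop of the same length" can be done by copying the gap columns from that known loop. The following is a minimal sketch of that idea; the function name `copy_gap_pattern` is hypothetical, and it assumes you have already picked a known aligned loop with the same number of residues as yours.

```python
def copy_gap_pattern(template_loop, residues):
    """Write `residues` into the non-gap columns of `template_loop`, keeping its '-' columns."""
    slots = sum(1 for c in template_loop if c != "-")
    if slots != len(residues):
        raise ValueError("template and loop must contain the same number of residues")
    remaining = iter(residues)
    # Keep each deletion where the template has it; fill every other column in order.
    return "".join("-" if c == "-" else next(remaining) for c in template_loop)
```

With the known H2 loop `YISNGGG--STY` as the template and the ungapped loop `QIYPGDGDNK`, this gives `QIYPGDG--DNK`, which is the H2 placement shown in the post-H2 alignment of the heavy-chain walkthrough.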
Occasionally there are deletions in the framework. These are normally found at
the termini of the light and heavy chains; use the aligned sequences text box
to look for conserved residue patterns at the termini and align your sequence
accordingly. Conserved (or almost-conserved) motifs at the C-terminus include
KRA for kappa light chains, LGQ for lambda light chains and TVS for heavy
chains. So, for example, if your light chain ends in KR, you need to add a
deletion. An important thing to note is that VH sequences often specify an
extra S at the C-terminus, i.e. TVSS. This second S should be removed when you
submit your sequence to WAM, as it is not needed. Do not replace it with a
deletion, though!
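
The two C-terminal cases spelled out above can be sketched as a small helper. This is an illustration only: the function name `tidy_c_terminus` and the chain labels 'heavy' and 'kappa' are choices made for this sketch, and it handles nothing beyond the two cases the text describes.

```python
def tidy_c_terminus(seq, chain):
    """Apply the two C-terminal fixes described in the text; other input is returned unchanged."""
    if chain == "heavy" and seq.endswith("TVSS"):
        # Drop the extra S. No deletion is added in its place.
        return seq[:-1]
    if chain == "kappa" and seq.endswith("KR"):
        # The KRA motif is one residue short, so pad with a deletion.
        return seq + "-"
    return seq
```

Other truncations, or a lambda chain that stops short of LGQ, are left untouched here and would need to be aligned by eye against the known sequences.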
Last updated 22/11/01