WAM sequence submission

For your WAM run to work correctly, your sequences must be aligned with the known structures provided on the input page. Autoalign will do this for you in the majority of cases, but if your sequence contains non-standard features you will need to align it manually. How to do this is detailed below, but first, a word on the residue numbering system used in the sequence submission section of the WAM site.

A word on numbering

A residue numbering scheme is used to label residues. It is shown above and below the known structures on the input page. It is an aligned numbering system, designed to account for deletions in the CDRs and other parts of the sequence, so that a residue in a particular position - for example the first residue of the H3 loop - has the same number in all structures. A single numbering scheme covers both chains: light chain residues are numbered 1 to 117 and heavy chain residues 118 to 249. The ranges are listed below.

Fv fragment    Start    End
Light chain    1        117
Heavy chain    118      249
CDR-L1         24       40
CDR-L2         56       62
CDR-L3         95       105
CDR-H1         143      154
CDR-H2         169      180
CDR-H3         220      239

Thus, a loop is numbered the same whatever its length; and in alignments, deletions are denoted as '-'. You should use '-' when inserting deletions. See the aligned light and heavy chains on the WAM input page as an example.
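
The numbering table can be written down as a small lookup, which is handy when checking which region an aligned column belongs to. This is only a sketch: the names `CHAINS`, `CDRS`, `locate` and `residue_count` are made up for illustration and are not part of WAM. The ranges themselves are copied from the table above.

```python
# Region boundaries copied from the table above (1-based, inclusive).
CHAINS = {"light": (1, 117), "heavy": (118, 249)}
CDRS = {
    "CDR-L1": (24, 40),
    "CDR-L2": (56, 62),
    "CDR-L3": (95, 105),
    "CDR-H1": (143, 154),
    "CDR-H2": (169, 180),
    "CDR-H3": (220, 239),
}


def locate(resnum):
    """Return (chain, region) for an aligned residue number."""
    for chain, (start, end) in CHAINS.items():
        if start <= resnum <= end:
            break
    else:
        raise ValueError(f"residue {resnum} is outside the 1-249 numbering")
    for cdr, (start, end) in CDRS.items():
        if start <= resnum <= end:
            return chain, cdr
    return chain, "framework"


def residue_count(aligned_loop):
    """Count real residues in an aligned loop, ignoring '-' deletions."""
    return len(aligned_loop.replace("-", ""))


print(locate(220))                         # first residue of the H3 loop
print(residue_count("SATSSVN-------YMH"))  # a 17-column L1 loop with deletions
```

Because every loop occupies a fixed number of columns, the number of deletions in a loop is simply its column width minus its residue count.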

Aligning the sequence

Two topics are covered: aligning the CDRs, and aligning the framework. In most cases, only the CDRs need to be aligned; however, lambda light chains have a framework deletion at residue 13, and occasionally deletions need to be added to compensate for missing residues at the beginning or the end of the sequence (see Framework deletions below). Note that you should only input the Fv section of each chain (VL and VH), up to the -KRA, -LGQ or -TVS motif (see Framework deletions below); do not enter any constant region sequence!

Aligning the CDRs

Most deletions will occur in the CDRs, so they are the important sections to get right in terms of alignment.

1. Determining the CDRs from conserved framework residues

The first stage in your alignment should be to determine which residues comprise the CDR. All CDRs are marked by highly conserved framework residues to either side, which can be used to determine this.

L1

The L1 loop is preceded by a conserved cysteine (C) at residue 23, and followed by a conserved tryptophan (W) at residue 41. The latter is usually followed by a tyrosine (Y) at residue 42. This is illustrated with the structure fragments below.

 
222                 444
123<------L-1------>123 Residue number

ITCSATSSVN-------YMHWFQ
ITCSASSSVS-------SLHWYQ
MTCSASSSVN-------YMYWYQ
ITCRASENIYS------NLAWYQ
ITCKASQDIRK------YLNWYQ
ITCSANALPNQ------YAYWYQ
MACRASSSVSST-----YLHWYQ
ISCSGTSSNIG----SSTVNWYQ
ISCTGSSSNIG---AGHNVKWYQ
ISCRASESVDDD--GNSFLHWYQ
ISCRSSQSLVHS-NGNTYLHWYL
MSCKSSQSLLNSGNQKNFLAWYQ
MSCKSSQSLLNSGNQKNYLTWYQ
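
The flanking-residue rule for L1 can be tried out as a pattern search on an unaligned light chain. The sketch below is a heuristic, not WAM's own method: the function name `find_l1` is hypothetical, and the 10-17 residue window is an assumption taken only from the example loops above.

```python
import re

# Conserved Cys (residue 23), then the L1 loop, then the conserved Trp
# (residue 41). The lazy quantifier stops at the nearest qualifying Trp.
# The 10-17 length window is an assumption drawn from the examples above.
L1_PATTERN = re.compile(r"C([A-Z]{10,17}?)W")


def find_l1(light_chain):
    """Return the (start, end) slice indices of the L1 loop, or None."""
    match = L1_PATTERN.search(light_chain)
    return match.span(1) if match else None


fragment = "ITCRASENIYSNLAWYQ"  # fourth example above with '-' removed
start, end = find_l1(fragment)
print(fragment[start:end])  # RASENIYSNLA
```

A loop that itself contains a tryptophan, or a sequence with an unusual loop length, would need checking by eye.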

L2

The L2 loop, almost always 7 residues in length, is preceded by a tyrosine (Y) at residue 55 and followed by a glycine (G) at residue 63. Due to its conservation, it rarely needs to be aligned.

L3

The L3 loop is preceded by a conserved cysteine (C) at residue 94 and followed by a conserved Phe-Gly (FG) pair at residues 106 and 107. This is illustrated below.



              111
999           000
234<---L-3--->678 Residue number

YYCQQRSS--YPITFGS
YYCQQWTY--PLITFGA
YYCQQWGR--NP-TFGG
YYCQHFWG--TPYTFGG
YYCLQHGE--SPYTFGG
YYCQAWDN--SASIFGG
YYCQQYSG--YPLTFGA
YYCAAWDVSLNAYVFGT
YYCQSYDR--SLRVFGG
YYCQQSNE--DPLTFGA
YFCSQSTH--VPLTFGA
YYCQNDHS--YPLTFGA
YYCQNDYS--YPLTFGA
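
Once a light chain has been aligned, the anchor residues described for L1, L2 and L3 should sit at fixed columns, which gives a quick sanity check. A minimal sketch, assuming the aligned chain is a 117-character string; `LIGHT_ANCHORS` and `check_light_anchors` are illustrative names, not part of WAM.

```python
# Anchor residues named in the L1, L2 and L3 descriptions above,
# keyed by aligned residue number (1-based).
LIGHT_ANCHORS = {23: "C", 41: "W", 55: "Y", 63: "G", 94: "C", 106: "F", 107: "G"}
LIGHT_CHAIN_COLUMNS = 117


def check_light_anchors(aligned):
    """Return (residue_number, expected, found) for each anchor that differs."""
    if len(aligned) != LIGHT_CHAIN_COLUMNS:
        raise ValueError(
            f"expected {LIGHT_CHAIN_COLUMNS} aligned columns, got {len(aligned)}"
        )
    return [
        (number, expected, aligned[number - 1])
        for number, expected in sorted(LIGHT_ANCHORS.items())
        if aligned[number - 1] != expected
    ]
```

A reported mismatch is a hint to re-check the alignment around that residue rather than proof of an error, since not every anchor is strictly conserved.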

H1

The H1 loop is preceded, 4 residues before the first residue, by a conserved cysteine (C) at residue 139 and followed by a conserved Trp-Val (WV) pair at residues 155 and 156. Two things can cause confusion: in a few cases (e.g. the first two examples below) there are two tryptophans in this region, and the V is sometimes replaced by I. If so, look for a motif like WIRQ, WVRQ or WVKQ, i.e. a charged residue two positions after the W, followed by a hydrophilic residue (normally Q).
This is shown below.


1111            1111
3444            5555
9012<----H1---->5678 Residue number

CKATGYTFSE--YWIEWVKE
CKATGYTFST--YWIEWVKQ
CTVSGFSLTG--YGVNWVRQ
CATSGFTFTD--YYMSWVRQ
CKASGFNIKD--YYMHWVKQ
CAASGFDFSK--YWMSWVRQ
CATSGFTFSD--YYMYWVRQ
CKASGYAFSS--FWVNWVKQ
CATSGFTFSD--FYMEWVRQ
CAASGYSITS-DYAWNWIRQ

Here the last example has an I rather than the normal V; and the first two examples have two tryptophans. The second one is the correct one to look for (155); the first one (152) is within the loop.
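
The "which tryptophan?" rule can be sketched as a pattern that only accepts a W followed by V or I, a charged residue and then Q or E. The character classes and the 5-12 residue window are assumptions generalised from the examples above, and `find_h1` is a hypothetical helper, not a WAM function.

```python
import re

# Cys 139, three framework residues, the H1 loop, then the Trp at 155 as
# part of a W[VI][RK][QE] motif. The motif classes and the loop length
# window are assumptions generalised from the examples above.
H1_PATTERN = re.compile(r"C[A-Z]{3}([A-Z]{5,12}?)W[VI][RK][QE]")


def find_h1(heavy_chain):
    """Return the (start, end) slice indices of the H1 loop, or None."""
    match = H1_PATTERN.search(heavy_chain)
    return match.span(1) if match else None


fragment = "CKATGYTFSEYWIEWVKE"  # first example above with '-' removed
start, end = find_h1(fragment)
print(fragment[start:end])  # GYTFSEYWIE: the first W (152) stays inside the loop
```

The first tryptophan is rejected because it is followed by I and then E, which is acidic rather than one of the basic residues the pattern expects two positions after the W.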

H2

The H2 loop is preceded, 3 residues before the first residue, by a conserved tryptophan (W) at residue 166, often part of a WIG motif, and followed by a conserved tyrosine (Y) at residue 181. Sometimes more than one Y appears in the sequence around this point, but the correct one can be identified because it is normally followed 5 residues later by a lysine (K) at residue 186. Even so, aligning the H2 can be difficult in some cases. See the general tips for aligning the heavy chain below for hints.
Some example H2 loops are shown below.


111            111111
666            888888
678<----H2---->123456 Residue number

WIGEILPGSG--RTNYREKFK
WIGEILPGSG--STYYNEKFK
WLGMIWGDG---NTDYNSALK
WLGFIRNKADGYTTEYSASVK
WIGLIDPENG--NTIYDPKFQ
WIGEIHPDSG--TINYTPSLK
WVAYISNGGG--STYYPDTVK
WIGQIYPGDG--DNKYNGKFK
WIAASRNKGNKYTTEYSASVK
WVSGVFGSGG--NTDYADAVK

N.B. Don't get the lysine at position 186 mixed up with occasional lysines at 184!
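
The tyrosine-plus-lysine check can be sketched as follows. The helper name `find_y181` is made up for illustration, and the allowed loop lengths (9 to 12 residues) are an assumption taken from the example loops above.

```python
def find_y181(heavy_chain, trp166_index, loop_lengths=range(9, 13)):
    """Return indices of Tyr candidates for residue 181.

    trp166_index is the 0-based index of the conserved Trp (residue 166)
    in the unaligned sequence. A Tyr qualifies if it sits where a loop of
    an allowed length would end and has Lys five residues later (186).
    """
    hits = []
    for length in loop_lengths:
        tyr = trp166_index + 3 + length  # the loop starts 3 residues after the Trp
        lys = tyr + 5
        if lys >= len(heavy_chain):
            continue
        if heavy_chain[tyr] == "Y" and heavy_chain[lys] == "K":
            hits.append(tyr)
    return hits


fragment = "WIGEILPGSGSTYYNEKFK"  # second example above with '-' removed
print(find_y181(fragment, 0))  # [13]: the Y at index 12 has F, not K, five residues on
```

Sequences without a lysine at 186, such as the fifth example above (which ends in KFQ), return an empty list and have to be aligned by eye.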

H3

The H3 loop is preceded, 3 residues before the first residue, by a conserved cysteine (C) at residue 217 - this typically forms part of a CAR or CTR motif. The loop is followed by a conserved Trp-Gly (WG) pair at residues 240 and 241.


222                    222
111                    444
789<--------H-3------->012 Residue number

CTRGYSS-------------MDYWGQ
CARGDGN-------------YGYWGQ
CARERDYR------------LDYWGQ
CTRDPYGP------------AAYWGQ
CARDNSYY------------FDYWGQ
CARLHYYGY-----------NAYWGQ
CARHGGYYA-----------MDYWGQ
CARSGNYPYA----------MDYWGQ
CARNYYGSTWY---------FDVWGA
CAKHRVSYVLTG--------FDSWGQ
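
Finding the H3 loop and counting the deletions it needs can be sketched like this. The pattern `C[AT][RK]` covers the CAR, CTR and CAK starts seen in the examples; that character class is an assumption, and `find_h3` is a hypothetical name.

```python
import re

H3_COLUMNS = 20  # residues 220-239 in the aligned numbering

# C[AT][RK] covers the CAR, CTR and CAK starts seen in the examples above;
# this character class is an assumption, not a WAM rule.
H3_PATTERN = re.compile(r"C[AT][RK]([A-Z]{1,20}?)WG")


def find_h3(heavy_chain):
    """Return (loop, deletions_needed), or None if the flanks are not found."""
    match = H3_PATTERN.search(heavy_chain)
    if match is None:
        return None
    loop = match.group(1)
    return loop, H3_COLUMNS - len(loop)


print(find_h3("EDSAVYFCARSGNYPYAMDYWG"))  # ('SGNYPYAMDY', 10)
```

The second value is the number of '-' characters the loop needs to fill its 20 columns.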

2. Aligning the loops themselves

Once the residues making up the CDR have been determined, deletions will need to be added to the sequence in most cases. It is important that, for a loop of a given length, these are in the correct place in the loop. First work out how many deletions are needed by finding a loop of the same length as yours in the aligned sequences above, then add the deletions ('-') in the corresponding places. See the tips below for help with the heavy chain, which can be difficult to align. If no loop matches the length of yours, the deletions can be added anywhere, as long as they are within that loop!
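
Copying the deletion pattern from a known loop of the same length can be sketched as below. `align_loop` is a hypothetical helper; when no known loop matches, it puts the deletions at the end of the loop, which is an arbitrary choice since any position inside the loop is allowed.

```python
def align_loop(loop, known_loops):
    """Copy the deletion pattern of a known loop with the same residue count.

    loop        -- your unaligned loop (no '-')
    known_loops -- aligned loops of one CDR, all the same width
    """
    width = len(known_loops[0])
    if len(loop) > width:
        raise ValueError("loop is longer than the aligned loop region")
    for known in known_loops:
        if len(known.replace("-", "")) != len(loop):
            continue
        residues = iter(loop)
        aligned = []
        for column in known:
            aligned.append("-" if column == "-" else next(residues))
        return "".join(aligned)
    # No known loop of this length: the deletions may go anywhere inside the
    # loop, so this sketch arbitrarily puts them at the end.
    return loop + "-" * (width - len(loop))


# H2 loops (residues 169-180) taken from the H2 examples above.
known_h2 = ["EILPGSG--RTN", "MIWGDG---NTD", "FIRNKADGYTTE"]
print(align_loop("QIYPGDGDNK", known_h2))  # QIYPGDG--DNK
```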

3. General tips for aligning the heavy chain

Here are a couple of general tips for aligning loops in the heavy chain, where the H2 can be a bit tricky. Align the H1 first, which is normally quite easy, then add deletions immediately before the CAR or CTR which precedes the H3, until it is aligned. This is illustrated below.

1. Pre "CAR" alignment:



EDSAVYFCARSGNYPYA----------MDYWG Known structure
TAIYFCARLHYYGYNAYWG           Your sequence

2. Post "CAR" alignment:


EDSAVYFCARSGNYPYA----------MDYWG Known structure
TAIYF--CARLHYYGYNAYWG         Your sequence

Then align the H3 loop correctly, so that the WG at its C terminus lines up:

3. Post H3 alignment:


EDSAVYFCARSGNYPYA----------MDYWG Known structure
TAIYF--CARLHYYGY-----------NAYWG Your sequence

Then add deletions at the C terminus of the chain if necessary, referring to the note on TVS and TVSS under Framework deletions.
You should now be left with none, two or three deletions before the CAR/CTR. Remove them, and insert them in the H2 loop at the same position as in the known H2 sequences with the same number of deletions (see the H2 examples above). The Trp at residue 166 should now be aligned, so the start of the H2 loop should be easy to find. If it is not, for example because the "WIG" motif is absent from your sequence, count 14 residues beyond the Trp at the end of H1 instead; this lands on residue 169, the first residue of H2. The whole heavy chain should now be aligned; if not, you've done something wrong!

3a. Post H3, Pre H2 alignment:


   -----H2-----                   ---------H3---------

WVAYISNGGG--STYYPDTVK...EDSAVYFCARSGNYPYA----------MDYWG
WIGQIYPGDGDNKYNGKFK...EDTAIYF--CARLHYYGY-----------NAYWG

Top: known structure; bottom: your sequence

4. Post H2 alignment:



   -----H2-----                   ---------H3---------

WVAYISNGGG--STYYPDTVK...EDSAVYFCARSGNYPYA----------MDYWG
WIGQIYPGDG--DNKYNGKFK...EDTAIYFCARLHYYGY-----------NAYWG 

Top: known structure; bottom: your sequence

If there is a different number of deletions in H2 (i.e. not none, 2 or 3), the loop will be non-canonical anyway, so it does not matter where they go as long as they are within the H2 region.
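
The "pad before CAR/CTR" step from the tips above can be sketched as a small function. It assumes both strings start at the same aligned column, as in the illustration, and `pad_before_car` is a hypothetical name. The returned count is the number of deletions that later move into H2.

```python
import re

CAR = re.compile(r"C[AT]R")  # the CAR or CTR motif that precedes H3


def pad_before_car(known, yours):
    """Insert '-' just before CAR/CTR in `yours` so it lines up with `known`.

    Returns (padded_sequence, number_of_deletions_added).
    """
    known_hit = CAR.search(known)
    your_hit = CAR.search(yours)
    if known_hit is None or your_hit is None:
        raise ValueError("CAR/CTR motif not found in one of the sequences")
    shift = known_hit.start() - your_hit.start()
    if shift < 0:
        raise ValueError("your sequence is longer than the known one before CAR/CTR")
    padded = yours[:your_hit.start()] + "-" * shift + yours[your_hit.start():]
    return padded, shift


known = "EDSAVYFCARSGNYPYA----------MDYWG"
print(pad_before_car(known, "TAIYFCARLHYYGYNAYWG"))  # ('TAIYF--CARLHYYGYNAYWG', 2)
```

A count of 0, 2 or 3 matches the known H2 loops; any other value points to a non-canonical H2.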

Framework deletions

Occasionally there are deletions in the framework. These are normally found at the termini of the light and heavy chains; use the aligned sequences text box to look for conserved residue patterns at the termini and align your sequence accordingly. Conserved (or almost-conserved) motifs at the C terminus include KRA for kappa light chains, LGQ for lambda light chains and TVS for heavy chains. So, for example, if your light chain ends in KR, you need to add a deletion. Note that VH sequences often include an extra S at the C terminus, i.e. TVSS. Remove this second S when you submit your sequence to WAM, as it is not needed. Do not replace it with a deletion, though!
Another thing to watch for is a deletion at residue 13 in lambda light chains.
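
The C-terminal rules can be sketched as a small clean-up function. This is an illustration only: `fix_c_terminus` is a hypothetical name, it handles just the one-residue-short case given as an example in the text, and it does not detect constant region sequence or the lambda deletion at residue 13.

```python
# C-terminal motifs named in the text, by chain type.
C_TERMINAL_MOTIFS = {"kappa": "KRA", "lambda": "LGQ", "heavy": "TVS"}


def fix_c_terminus(sequence, chain_type):
    """Trim or pad the C terminus so that it ends on the expected motif."""
    motif = C_TERMINAL_MOTIFS[chain_type]
    if chain_type == "heavy" and sequence.endswith("TVSS"):
        return sequence[:-1]  # drop the extra S; do not replace it with '-'
    if sequence.endswith(motif):
        return sequence
    if sequence.endswith(motif[:-1]):
        return sequence + "-"  # e.g. a kappa chain ending in KR
    raise ValueError(f"C terminus does not resemble {motif}; align it by hand")


print(fix_c_terminus("GTKLEIKR", "kappa"))  # made-up fragment ending in KR
```

Because the motifs are only almost-conserved, the error branch is a prompt to inspect the terminus against the aligned sequences rather than a sign that the sequence is invalid.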

Last updated 22/11/01