This page details a number of tentative sequence-structure rules for CDR-H3s, which can optionally be used in WAM to improve accuracy in certain cases. In general, they relate to the RMSD screen (as that is the screen which compares each model to known H3 structures), although for 12-residue H3s they apply whichever screen is used (see here). The rules extend the section on H3 structure; to be included here, however, a rule must be backed by some supporting evidence. Note that the residue numbering used here differs from the "AbM" numbering used on the input and alignment page; it takes the form ZZ:N+x or ZZ:C-y, where ZZ is the loop identity, x is the displacement forwards from the loop N-terminus and y is the displacement backwards from the loop C-terminus. At present, there are rules relating to 5-, 8-, 10-, 11- and 12-residue H3s. The rules for 5-residue H3s, in particular, are given with a reasonable amount of confidence.
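The ZZ:N+x / ZZ:C-y numbering described above can be converted into a position within a loop sequence mechanically. The following Python sketch is illustrative only and is not part of WAM: the names parse_position, loop_index and residue_at are hypothetical, and the example sequence is the 1ejo H3 quoted in the 11-residue section of this page.

```python
import re
from typing import NamedTuple, Optional


class LoopPosition(NamedTuple):
    loop: str       # loop identity, e.g. "H3" or "L2"
    terminus: str   # "N" or "C"
    offset: int     # signed displacement from that terminus (0 if none given)


# Accepts labels such as "H3:N", "H3:N+1", "H3:C-2", "L2:N-1", "H3:C+1".
_LABEL = re.compile(r"^([HL][1-3]):([NC])([+-]\d+)?$")


def parse_position(label: str) -> LoopPosition:
    """Split a label of the form ZZ:N+x or ZZ:C-y into its three parts."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a loop position label: {label!r}")
    loop, terminus, offset = match.groups()
    return LoopPosition(loop, terminus, int(offset) if offset else 0)


def loop_index(pos: LoopPosition, loop_length: int) -> int:
    """Zero-based index relative to the first loop residue.

    The result may fall outside 0..loop_length-1 for labels such as
    H3:N-1 or H3:C+1, which refer to flanking framework residues.
    """
    base = 0 if pos.terminus == "N" else loop_length - 1
    return base + pos.offset


def residue_at(label: str, loop_seq: str) -> Optional[str]:
    """Residue at the labelled position, or None if it lies outside the loop."""
    idx = loop_index(parse_position(label), len(loop_seq))
    if 0 <= idx < len(loop_seq):
        return loop_seq[idx]
    return None
```

For example, residue_at("H3:C-4", "RAFDSDVGFAS") returns "V". A label such as H3:N-1 returns None here, because the flanking residues are not part of the loop sequence passed in.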
5-residue H3 structure

Due to their small size, 5-residue H3s do not show well-defined "kinked" or "extended" C-termini. Nevertheless, they do show either the "kinked" H3:C-2 carbonyl/Trp H3:C+1 sidechain interaction or, in one case (1nld), the "extended" Asp H3:C-1 sidechain/Trp H3:C+1 sidechain interaction. Furthermore, unlike the longer loops, they separate into reasonably well-defined structural subgroups which can be determined from the sequence. These are detailed below; for the moment it should be noted that they are sufficiently well-defined to recommend keeping the sequence-structure option "on" when modelling 5-residue H3s with the RMSD screen as the final screen. The RMS deviations between the structures are shown below.

a) No Asp at H3:C-1
This, together with b), is one of the rules which, for longer loops, define a "kinked" structure. Three structures, 1ggi, 1ind and 1aj7, show this sequence pattern and, as the table shows, all are close in conformation. 1ind is a little different, as it is perturbed slightly by an interaction between His H3:C and Ser H3:N-1. Two further structures, 1psk and 3fct, show this conformation despite having an Asp at this position. 3fct is discussed below; in 1psk, the Lys at H3:N takes the place of the Arg in forming the salt bridge with the Asp.

b) Asp at H3:C-1 and Arg at H3:N-1
Three structures, 1nld, 1e4x (a new structure, not shown) and 3fct, fit this alternative "kinked" definition. However, while 3fct forms a group a) structure, 1nld and 1e4x do not: they have a more "extended" structure, with the typical "extended" Asp H3:C-1/Trp H3:C+1 interaction formed. The most likely reason is the small size of the loop: it is difficult for the Arg H3:N-1 to form the salt bridge with the Asp, and so an alternative conformation is adopted. Quite why 3fct shows a group a) structure is unclear; it may be that the second interaction, between Arg H3:N and Asp H3:N+1, "tips the balance" in favour of the "kinked" interaction pattern, unlike in 1nld.

c) Asp at H3:C-1 and no Arg at H3:N-1, plus Lys at L2:N-1
The Asp and no Arg sequence pattern would normally result in the "extended" interaction pattern (see above). However, in the two 5-residue structures with this pattern (1gpo and 1dqq), the "kinked" interaction of the H3:C-2 carbonyl with the Trp H3:C+1 is shown, as the Asp salt-bridges with Lys L2:N-1. Lys is not always seen at this position and, indeed, the 5-residue H3 with an Asp at H3:C-1, no Arg at H3:N-1, and no Lys or Arg at L2:N-1 does form an extended structure (see below). More support for this rule comes from a low-resolution structure, 3hfm, which has an Asp at H3:C-1 and a Lys at L2:N-1. This falls in the same conformational class as 1gpo and 1dqq.

d) Asp at H3:C-1 and no Arg at H3:N-1, no Lys at L2:N-1
There is just one example (1cr9), and this forms the extended structure (see above).

Conclusion
On the evidence so far, it can be said with reasonable confidence that structural class c) above will lead to the conformation shown in 1gpo, 1dqq and 3hfm, and that structural class a) will lead to the conformation shown in 1ggi, 1ind and 1aj7, with a slight perturbation if the H-bond present in 1ind can be formed.
Due to the problem of 3fct, the likelihood of structural class b) forming the conformation shown in 1nld and 1e4x is rather lower, but for the moment we can say that, as long as there is no positively charged residue at H3:N, that conformation will form.

Effect on WAM
The sequence-structure option on the submission page has been extended to cover 5-residue H3s. If a given 5-residue H3 fits one of the above sequence rules, the RMSD screen will compare the generated models with an average of the known structures fitting that rule rather than with the whole set. Group a) is split into two: if (as in 1ind) there is a His at H3:C and a Ser, Asn or Thr at H3:N-1, the generated models are compared against 1ind; if not, they are compared against 1aj7, 1ggi and 1psk.

Tryptophan as first residue in 8-residue H3s
In a recent paper, the Shirai group studied the role of the first residue of CDR-H3 loops by comparing the structures of two 8-residue loops: one beginning with Trp (1kem: WGSYAMDY) and one not (1plg: GGKFAMDY). The general conclusion was that the Trp is important in influencing the H3 conformation. Our own investigations support this: examination with molecular graphics software clearly shows the Trp in 1kem forcing the loop into an upright conformation.

Effect on WAM
In view of the findings on Trp as the first N-terminal residue, the RMSD screen has been altered accordingly. For sequences with a Trp at the first residue, each CAMAL-generated conformation is compared with the structure of 1kem rather than with the average of the known 8-residue CDR-H3 structures. Accordingly, 1kem is removed from the list of 8-residue H3s used to generate the average conformation when screening sequences without a Trp at the first residue.
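
The two "Effect on WAM" passages above (the 5-residue rules and the Trp rule for 8-residue loops) both amount to choosing a reference set from the sequence. The Python sketch below restates that choice and is not WAM's actual code. Several points are assumptions: the function names are hypothetical, a "positively charged residue" at H3:N is taken to mean Arg or Lys, returning None (meaning "no rule applies") for that case is a guess at the fallback, and whether WAM includes the low-resolution 3hfm in the class c) average is not stated on this page.

```python
from typing import List, Optional


def five_residue_reference(h3: str, h3_n_minus_1: str,
                           l2_n_minus_1: str) -> Optional[List[str]]:
    """Pick reference structures for a 5-residue H3 from rules a) to d).

    h3            one-letter sequence of the loop (H3:N .. H3:C)
    h3_n_minus_1  residue immediately before the loop (H3:N-1)
    l2_n_minus_1  residue immediately before L2 (L2:N-1)
    """
    if len(h3) != 5:
        raise ValueError("expected a 5-residue H3 loop")
    h3_n, h3_c_minus_1, h3_c = h3[0], h3[3], h3[4]

    if h3_c_minus_1 != "D":
        # Rule a): no Asp at H3:C-1. 1ind-like sequences are split off.
        if h3_c == "H" and h3_n_minus_1 in {"S", "N", "T"}:
            return ["1ind"]
        return ["1aj7", "1ggi", "1psk"]

    if h3_n_minus_1 == "R":
        # Rule b): Asp at H3:C-1 and Arg at H3:N-1. The 3fct exception is
        # handled by skipping the rule when H3:N is Arg or Lys (assumption).
        if h3_n in {"R", "K"}:
            return None
        return ["1nld", "1e4x"]

    if l2_n_minus_1 == "K":
        # Rule c): Asp at H3:C-1, no Arg at H3:N-1, Lys at L2:N-1.
        return ["1gpo", "1dqq", "3hfm"]

    # Rule d): Asp at H3:C-1, no Arg at H3:N-1, no Lys at L2:N-1.
    return ["1cr9"]


def eight_residue_trp_reference(h3: str, known_8mers: List[str]) -> List[str]:
    """Apply only the Trp rule: Trp-first loops are compared with 1kem,
    all others with the known 8-residue set minus 1kem."""
    if len(h3) != 8:
        raise ValueError("expected an 8-residue H3 loop")
    if h3[0] == "W":
        return ["1kem"]
    return [pdb for pdb in known_8mers if pdb != "1kem"]
```

With the two sequences quoted above, eight_residue_trp_reference("WGSYAMDY", ...) selects 1kem alone, while "GGKFAMDY" selects whatever list of 8-residue structures is passed in, with 1kem removed.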

Aspartate as first residue in 8-residue H3s
With the solution of a new structure (1jpt), it has become clear that even within the close group of 8-residue H3s there is a subgroup particularly close in conformation, namely 1jpt, 1fgn and 1mim. All three have unfitted H3/H3 RMSDs below 1.0 Å. Examination of the sequences has shed some light on this: all three have an aspartate at H3:N, which hydrogen-bonds to backbone H atoms at H3:N+1 and H3:N+2. One other 8-residue structure (1fl5) also has an aspartate at H3:N; however, unlike the other three, it has a Ser at H1:C which forms an alternative hydrogen bond.

Effect on WAM
Since this interaction has been observed in three structures, it is now offered as an option in the RMSD screen in WAM. If the structure being modelled has an aspartate at H3:N and any residue other than Ser (or the similar Thr and Asn) at H1:C, it will be compared against 1fgn, 1mim and 1jpt rather than the full main group of 8-residue kinked H3s.

Glycine as first residue in 8-residue H3s
Two structures, 1plg and 1cic, are very similar in conformation, each having a "weak kink". Both also have Gly at H3:N. Kim et al., on the basis of dynamics simulations, have speculated that the Gly at the N-terminus could be important. Our own investigations suggest that this could indeed be true for 8- and 9-residue loops, as the 9-residue structure 1mfb also has a "weak" kink and is the only 9-residue example with Gly at the N-terminus. Furthermore, the 8-residue H3 loop of 1b2w is similar in conformation to 1plg and 1cic, if not quite as similar as they are to each other (2.0 Å RMSD from 1plg; 1.5 Å RMSD from 1cic). It too has Gly at H3:N and a "weak" kink, although its apex differs somewhat, due to the large Trp H3:N+4 sidechain and the Pro H3:N+3.
However, glycines at the N-terminus of loops of other lengths do not produce "weak" kinks; it thus appears to be a steric effect (due to Gly allowing more freedom in the backbone than other residue types) that depends on loop length.

Effect on WAM
Since this interaction has been observed in two structures, it is now offered as an option in the RMSD screen in WAM. If the structure being modelled has a glycine at H3:N, it will be compared against 1plg and 1cic rather than the full main group of 8-residue kinked H3s.

Arg/His interaction in 10-residue H3s
With the solution of new structures (1jgu and 1c1e), it has become clear that even within the close group of 10-residue H3s there is a subgroup particularly close in conformation, namely 1a4j, 1jgu, 1c1e and 1dba. All members of this group are within 1.0 Å unfitted H3/H3 RMSD of each other. Examination of the sequences has suggested a reason for the closeness in conformation. All except 1dba have an Arg at H3:C-4, which packs against His at L1:N+7. 1dba has a Trp at H3:C-4, but this unfortunately does not pack against the His, so one cannot be totally confident of this rule.

Effect on WAM
Since this interaction has been observed in three structures, it is now offered as an option in the RMSD screen in WAM. If the structure being modelled has the appropriate Arg and His residues, it will be compared against 1a4j, 1jgu and 1c1e rather than the full group. The confidence in this rule is somewhat lower than for the others here, due to 1dba, but, like the others, it is offered as an option.

11-residue H3 loops: a common structure
With the solving of a new 11-residue H3 structure (PDB code 1ejo), a possible canonical-type structure can be defined. This is very similar (1.3 Å RMSD) to another 11-residue H3 (1fpt), and both are moderately similar (1.8 Å RMSD) to a third, 1frg. This compares with a typical RMSD of 2 to 2.5 Å amongst the more diffuse "main group of 11-residue H3 structures" on the H3 structure page.
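
The RMSD figures quoted above can be illustrated with a short Python sketch. It assumes that "unfitted" means the two loops are compared in the coordinates they already have (for example after the frameworks have been superposed), with no further loop-on-loop fit; which atoms WAM includes in the calculation is not stated on this page, so the function simply takes two matched lists of atom positions. The name unfitted_rmsd is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def unfitted_rmsd(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two matched atom lists.

    No superposition is performed: the coordinates are used as given,
    so both loops must already be expressed in a common frame.
    """
    if len(atoms_a) != len(atoms_b) or len(atoms_a) == 0:
        raise ValueError("need two non-empty atom lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(atoms_a, atoms_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(atoms_a))
```

Two identical loops give 0.0; if every atom of one loop is displaced by the same vector of length d, the result is d, which is why an unfitted RMSD is sensitive to the position of the loop on the framework as well as to its shape.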
On considering the sequences:

1ejo  RAFDSDVGFAS
1fpt  DFYDYDVGFDY
1frg  RERYDEKGFAY

we can see that both 1ejo and 1fpt contain the motif VGF from residue H3:C-4 to residue H3:C-2. Examination of different 11-residue H3 structures on Insight-II reveals that the residue H3:C-4 is normally buried, while the residue H3:C-3 (normally non-Gly, and often large) sticks upwards. In 1ejo and 1fpt, Val H3:C-4 instead sticks upwards, packing against the Tyr immediately before the L2 and occupying the space left by the Gly at H3:C-3. This appears to influence the loop conformation. In 1frg, there is a Lys rather than a Val, but similar behaviour is observed (if not exactly the same conformation, due to the differing properties of Lys compared to Val).

Effect on WAM
The sequence-structure option on the submission page has been extended to cover 11-residue kinked H3s. If the VGF motif is present, the RMSD screen will compare the generated models with an average of the structures of 1ejo and 1fpt rather than the full 11-residue set, as long as the sequence-structure rules option on the submission page is selected.

12-residue H3 loops: a common N-terminal structure
With the solving of a new 12-residue H3 structure (PDB code 1dee), some possible sequence-structure rules for certain instances of 12-residue H3s, in particular for the N-terminal region, can be postulated. Examination on Insight-II has revealed that 1dee, 1osp and 1igm all share a particular conformation of the N-terminal 4 residues. 1dee and 1osp are similar at the apex, whereas 1igm differs. Additionally, 1igm and 1vge, although they have differing N-terminal conformations, share a similar conformation in the remainder of the loop from the apex to the C-terminus. On considering the sequences:

1dee  VKFYDPTAPNDY
1osp  SRDYYGSSGFAF
1igm  HRVSYVLTGFDS
1vge  DPYGGGKSEFDY

we can see that all three of 1dee, 1osp and 1igm contain an Arg or Lys at H3:N+1.
In all three cases, this residue interacts with a Glu or Gln at L2:C-1, and it is this interaction which defines the conformation of the N-terminal four residues.

The interaction defining the common apex-to-kink conformation in 1igm and 1vge is rather less well defined. It is best seen in 1igm, where the Val at H3:N+5 packs with hydrophobic residues at L3:C-3 and L3:C-1. The only other 12-residue H3 structure with hydrophobic residues at both these L3 positions is 1vge, which has a Gly at H3:N+5 rather than a Val. Nonetheless, the L3 residues appear to influence the backbone. This rule is thus rather speculative, and is limited to 12-residue H3s, as no other H3 length shows a conformation dependent on the L3:C-3 and L3:C-1 residues.

Effect on WAM
At present, only the first of the two interactions is included in the sequence-structure rule option on the submission page, as it appears to be clear-cut. Since it applies to the N-terminal region only, using it at the RMSD screen stage would be ill-advised: it would weight the results too heavily towards similarity at the N-terminus when the apex needs to be considered too. This differs from the approach taken for 5-, 8-, 10- and 11-residue loops, where the rules apply to the whole CDR and the RMSD screen can therefore be used safely. Instead, the screening is done at the database search stage, using N-terminal C-alpha to C-alpha distance constraints derived only from 1dee, 1igm and 1osp, so that, if the sequence-structure rules are matched, the database search selects conformations with an N-terminal region similar to these three structures. Since apices vary considerably in these long loops, the VFF (energy) screen is the recommended final screen.

Last updated 21/10/02