Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0000263Rosetta[All Projects] Input Handlingpublic2013-05-01 22:372014-01-03 12:42
Assigned Toamw579 
PlatformAll platformsOSAnyOS VersionAny
Product Version 
Fixed in VersionAll 
Summary0000263: does not recognize D-canonical amino acids properly
DescriptionWhen listing AAs other than the Big Twenty in resfile lines, you use notation like:


or something similar. This works fine for L and D NCAAs, whose names are their three letter codes. This does not work with D-CAAs. If you use the three letter code (i.e. DLE for D-leucine; DLY for D-lysine), quits because those aren't in the residue type set. Looking at the line in question, it'c clear why--it's asking for the name, not the three letter code.

Unfortunately, inputting e.g.


is equivalent to NATAA--it silently fails to ever design in the D amino acids. I suppose I'm not _certain_ that it's a problem with resfile parsing, but it seems likely.
Steps To ReproduceAttempt to perform some kind of design with D-CAAs using a resfile.
Tagsdesign, resfile
Application(s) AffectedConfirmed to affect oop_dock_design and hbs_dock_design; almost certain to affect homodimer_design, etc.
Command Line Used/scratch/amw579/rosetta/rosetta_source/bin/oop_dock_design.linuxgccrelease @flags -database /scratch/amw579/rosetta/rosetta_database/
Developer Options
Fixed in SVN Versiond16b60c9d6d
Attached Files? file icon flags [^] (272 bytes) 2013-05-01 22:37 [Show Content]

- Relationships

-  Notes
amw579 (Administrator)
2013-05-02 13:30

As far as I can tell, it looks like this would be resolved by having ResidueLevelTask::allow_noncanonical_aa to operate on the 3-letter IO_STRING of a D-CAA, since for L-CAAs, L-NCAAs, and D-NCAAs they are identical... or by making the params files for the D-CAAs have their names be the same as the 3-letter IO_STRINGs. I don't think we lose much by changing the params files in this way.
rmoretti (Attentive Developer)
2013-05-02 13:57

One issue you run into is that the 3-letter IO string is not necessarily unique. There could potentially be a number of different residue types with the same 3 letter IO string. Using the full residue name allows for better control over which one you're substituting.

There's a potential to adjust the NC resfile reader to have a fall-back to using 3-letter codes if it doesn't find the the token in the full list, but you may have problems if there are multiple (potentially patched) residue types with the same 3-letter code. You'll also might run into problems where the 3-letter code contains spaces - the resfile reader might not know where that's supposed to go, as the resfile syntax ignores whitespace.

All in all, the better solution to the problem is likely better documentation that the resfile is looking for residue type names rather than three letter codes, and better awareness that they're not necessarily going to be the same. - Changing the D-CAA residue type names might address this particular issue, but sidesteps the underlying cause.
amw579 (Administrator)
2013-05-02 14:21

In re: documentation, I don't think it's sufficient to say "please give the name, not the three letter code." Because in the status quo, _neither the name nor three letter code works_.

Giving the residue type name (i.e. DLEU) does nothing. It's equivalent to NATAA. I think when it's parsing it only grabs three characters, or something, and so it always fails a comparison to any residue names in the residue set. That's my inference; I haven't actually dug into the parser. So maybe fixing that parser would be sufficient.

Giving the three letter code fails because no residue type with that _name_ is in the residue type set.

I don't see why the 3-letter code is any more likely to contain spaces than the name, though that might be my personal misunderstanding.

Certainly, at the moment, changing the names of the 19 D-CAAs to be three characters long doesn't create name-name overlaps with anything. (After all, since all other names were identical to their 3-letter codes and all other 3-letter codes are unique, the three first letters of the D-CAAs were guaranteed to be unoccupied.)

I guess fixing the parser is the best long-term solution.
smlewis (Administrator)
2013-05-03 17:23
edited on: 2013-05-03 17:24

You did modify the residue type set to include the d-caas, right? (either residue_types.txt or -extra_res_fa?) This is in the spirit of "most likely problems first".

NC is not modifying tokens. allow_noncanonical_aa is explicitly looking for a name3.

smlewis (Administrator)
2013-05-03 17:28

Nevermind, I checked with my own toy problem and it duplicates, it's not -extra_res_fa
smlewis (Administrator)
2013-05-03 17:33

OK, so, the NC token does this:

    if ( residue_set.has_name( nc_to_include_ )){

but the packertask function does this:

            aaname == (**aas_iter).name3() )

We look for the name in the first place, and the name3 in the second. That's the bug.

I don't think it matters which way this code is unified (both to name3, or both to name), so long as it happens. I'd suggest changing NC to look for name3, not name, because otherwise I'm not sure how to get the logic in packertask to work.

Changing the dcaa's to have name3 == name will also bury the problem. (or, it did in my test).
smlewis (Administrator)
2013-05-03 17:34

ResidueTypeSet has has_name3, BTW. Try that first.
amw579 (Administrator)
2013-05-03 18:16

That makes sense; I wasn't sure where to look for the conflict so it's good you were able to find it in packertask.

Yeah, it's currently working for me when I change the param files so name3 == name, too.
smlewis (Administrator)
2013-05-03 18:19

To be super-clear, it's in the NC token command for the resfile reader, not the packer task. If you didn't find the right line, this function of

  PackerTask & task,
  Size resid
) const

  core::chemical::ResidueTypeSet const & residue_set = task.residue_task( resid ).get_original_residue_set();
  if ( residue_set.has_name( nc_to_include_ )){
    task.nonconst_residue_task(resid).allow_noncanonical_aa( nc_to_include_ );
  } else {
    std::stringstream err_msg;
    err_msg << "Unable to add non-canonical amino acid " << nc_to_include_ << " because it is not in the ResidueTypeSet for residue " << resid << ".";
}//end NC

change has_name to has_name3 and give it a shot.
amw579 (Administrator)
2013-05-03 19:54

This appears to work. Alas, I don't think I have SVN access yet (at least, it's timed out every time I've tried to log in thus far for some reason...) so I can only relate the fact that that's in fact an adequate solution on my end and maybe someone with real power can implement it.
smlewis (Administrator)
2013-05-04 20:20

SVN access? You know we switched the whole community to git on github last week, right? You are on rosetta-devel, right?
amw579 (Administrator)
2013-05-04 21:09

I knew that we were going to switch to git, and I remember from minirosettacon that it was around now, but I don't think anyone ever added me to rosetta-devel, maybe for the same reason (though god knows what that is) that I never got svn access.

Oh, well. Fresh start?
AndrewLeaverFay (Administrator)
2014-01-03 12:42

Adding a new description for a ResidueType: its interchangeability group.

    The packer will look at all the residues types in the a given interchangeability
    group to decide which ones to allow. This functionality replaces the previous
    "name3" string which is overloaded for PDB I/O and is unnecessarily restirctive
    in length.

    The interchangeability group can be specified in the residue topology file,
    but if it is not given, then it will be set from the IO_STRING field (the name3).

    New PatchOperation classes for working with interchangeability group data.
    Moving PatchOperation class definitions from the .hh file to the .cc file.

    The "NC" ResfileOperation now uses the interchangeability group to append
    non-canonical amino acids (and other not-canonical-amino-acid residue types)
    to the ResidueLevelTask.

    Updating the interchangeability groups for the residue types defined in the
    fullatom/residue_types/rosetta_specific directory (which previously all
    shared the name3 of "XXX").

    Updating the patch files that used the SET_IO_STRING command so that they
    now also update the interchangeability group.

- Issue History
Date Modified Username Field Change
2013-05-01 22:37 amw579 New Issue
2013-05-01 22:37 amw579 File Added: flags
2013-05-01 22:38 amw579 Tag Attached: design
2013-05-01 22:38 amw579 Tag Attached: resfile
2013-05-02 13:30 amw579 Note Added: 0000269
2013-05-02 13:57 rmoretti Note Added: 0000270
2013-05-02 14:21 amw579 Note Added: 0000271
2013-05-03 17:23 smlewis Note Added: 0000274
2013-05-03 17:24 smlewis Note Edited: 0000274 View Revisions
2013-05-03 17:28 smlewis Note Added: 0000275
2013-05-03 17:33 smlewis Note Added: 0000276
2013-05-03 17:34 smlewis Note Added: 0000277
2013-05-03 18:16 amw579 Note Added: 0000278
2013-05-03 18:19 smlewis Note Added: 0000279
2013-05-03 18:20 smlewis Assigned To => amw579
2013-05-03 18:20 smlewis Status new => assigned
2013-05-03 19:54 amw579 Note Added: 0000280
2013-05-04 20:20 smlewis Note Added: 0000281
2013-05-04 21:09 amw579 Note Added: 0000282
2014-01-03 12:42 AndrewLeaverFay Fixed in SVN Version => d16b60c9d6d
2014-01-03 12:42 AndrewLeaverFay Note Added: 0000324
2014-01-03 12:42 AndrewLeaverFay Status assigned => resolved
2014-01-03 12:42 AndrewLeaverFay Fixed in Version => All
2014-01-03 12:42 AndrewLeaverFay Resolution open => fixed

Copyright © 2000 - 2012 MantisBT Group
Powered by Mantis Bugtracker