MantisBT - Rosetta
View Issue Details
0000263Rosetta[All Projects] Input Handlingpublic2013-05-01 22:372014-01-03 12:42
All platformsAnyAny
Confirmed to affect oop_dock_design and hbs_dock_design; almost certain to affect homodimer_design, etc.
/scratch/amw579/rosetta/rosetta_source/bin/oop_dock_design.linuxgccrelease @flags -database /scratch/amw579/rosetta/rosetta_database/
0000263: does not recognize D-canonical amino acids properly
When listing AAs other than the Big Twenty in resfile lines, you use notation like:


or something similar. This works fine for L and D NCAAs, whose names are their three letter codes. This does not work with D-CAAs. If you use the three letter code (i.e. DLE for D-leucine; DLY for D-lysine), quits because those aren't in the residue type set. Looking at the line in question, it'c clear why--it's asking for the name, not the three letter code.

Unfortunately, inputting e.g.


is equivalent to NATAA--it silently fails to ever design in the D amino acids. I suppose I'm not _certain_ that it's a problem with resfile parsing, but it seems likely.
Attempt to perform some kind of design with D-CAAs using a resfile.
design, resfile
? flags (272) 2013-05-01 22:37
Issue History
2013-05-01 22:37amw579New Issue
2013-05-01 22:37amw579File Added: flags
2013-05-01 22:38amw579Tag Attached: design
2013-05-01 22:38amw579Tag Attached: resfile
2013-05-02 13:30amw579Note Added: 0000269
2013-05-02 13:57rmorettiNote Added: 0000270
2013-05-02 14:21amw579Note Added: 0000271
2013-05-03 17:23smlewisNote Added: 0000274
2013-05-03 17:24smlewisNote Edited: 0000274bug_revision_view_page.php?bugnote_id=274#r79
2013-05-03 17:28smlewisNote Added: 0000275
2013-05-03 17:33smlewisNote Added: 0000276
2013-05-03 17:34smlewisNote Added: 0000277
2013-05-03 18:16amw579Note Added: 0000278
2013-05-03 18:19smlewisNote Added: 0000279
2013-05-03 18:20smlewisAssigned To => amw579
2013-05-03 18:20smlewisStatusnew => assigned
2013-05-03 19:54amw579Note Added: 0000280
2013-05-04 20:20smlewisNote Added: 0000281
2013-05-04 21:09amw579Note Added: 0000282
2014-01-03 12:42AndrewLeaverFayFixed in SVN Version => d16b60c9d6d
2014-01-03 12:42AndrewLeaverFayNote Added: 0000324
2014-01-03 12:42AndrewLeaverFayStatusassigned => resolved
2014-01-03 12:42AndrewLeaverFayFixed in Version => All
2014-01-03 12:42AndrewLeaverFayResolutionopen => fixed

2013-05-02 13:30   
As far as I can tell, it looks like this would be resolved by having ResidueLevelTask::allow_noncanonical_aa to operate on the 3-letter IO_STRING of a D-CAA, since for L-CAAs, L-NCAAs, and D-NCAAs they are identical... or by making the params files for the D-CAAs have their names be the same as the 3-letter IO_STRINGs. I don't think we lose much by changing the params files in this way.
2013-05-02 13:57   
One issue you run into is that the 3-letter IO string is not necessarily unique. There could potentially be a number of different residue types with the same 3 letter IO string. Using the full residue name allows for better control over which one you're substituting.

There's a potential to adjust the NC resfile reader to have a fall-back to using 3-letter codes if it doesn't find the the token in the full list, but you may have problems if there are multiple (potentially patched) residue types with the same 3-letter code. You'll also might run into problems where the 3-letter code contains spaces - the resfile reader might not know where that's supposed to go, as the resfile syntax ignores whitespace.

All in all, the better solution to the problem is likely better documentation that the resfile is looking for residue type names rather than three letter codes, and better awareness that they're not necessarily going to be the same. - Changing the D-CAA residue type names might address this particular issue, but sidesteps the underlying cause.
2013-05-02 14:21   
In re: documentation, I don't think it's sufficient to say "please give the name, not the three letter code." Because in the status quo, _neither the name nor three letter code works_.

Giving the residue type name (i.e. DLEU) does nothing. It's equivalent to NATAA. I think when it's parsing it only grabs three characters, or something, and so it always fails a comparison to any residue names in the residue set. That's my inference; I haven't actually dug into the parser. So maybe fixing that parser would be sufficient.

Giving the three letter code fails because no residue type with that _name_ is in the residue type set.

I don't see why the 3-letter code is any more likely to contain spaces than the name, though that might be my personal misunderstanding.

Certainly, at the moment, changing the names of the 19 D-CAAs to be three characters long doesn't create name-name overlaps with anything. (After all, since all other names were identical to their 3-letter codes and all other 3-letter codes are unique, the three first letters of the D-CAAs were guaranteed to be unoccupied.)

I guess fixing the parser is the best long-term solution.
2013-05-03 17:23   
(edited on: 2013-05-03 17:24)
You did modify the residue type set to include the d-caas, right? (either residue_types.txt or -extra_res_fa?) This is in the spirit of "most likely problems first".

NC is not modifying tokens. allow_noncanonical_aa is explicitly looking for a name3.

2013-05-03 17:28   
Nevermind, I checked with my own toy problem and it duplicates, it's not -extra_res_fa
2013-05-03 17:33   
OK, so, the NC token does this:

    if ( residue_set.has_name( nc_to_include_ )){

but the packertask function does this:

            aaname == (**aas_iter).name3() )

We look for the name in the first place, and the name3 in the second. That's the bug.

I don't think it matters which way this code is unified (both to name3, or both to name), so long as it happens. I'd suggest changing NC to look for name3, not name, because otherwise I'm not sure how to get the logic in packertask to work.

Changing the dcaa's to have name3 == name will also bury the problem. (or, it did in my test).
2013-05-03 17:34   
ResidueTypeSet has has_name3, BTW. Try that first.
2013-05-03 18:16   
That makes sense; I wasn't sure where to look for the conflict so it's good you were able to find it in packertask.

Yeah, it's currently working for me when I change the param files so name3 == name, too.
2013-05-03 18:19   
To be super-clear, it's in the NC token command for the resfile reader, not the packer task. If you didn't find the right line, this function of

  PackerTask & task,
  Size resid
) const

  core::chemical::ResidueTypeSet const & residue_set = task.residue_task( resid ).get_original_residue_set();
  if ( residue_set.has_name( nc_to_include_ )){
    task.nonconst_residue_task(resid).allow_noncanonical_aa( nc_to_include_ );
  } else {
    std::stringstream err_msg;
    err_msg << "Unable to add non-canonical amino acid " << nc_to_include_ << " because it is not in the ResidueTypeSet for residue " << resid << ".";
}//end NC

change has_name to has_name3 and give it a shot.
2013-05-03 19:54   
This appears to work. Alas, I don't think I have SVN access yet (at least, it's timed out every time I've tried to log in thus far for some reason...) so I can only relate the fact that that's in fact an adequate solution on my end and maybe someone with real power can implement it.
2013-05-04 20:20   
SVN access? You know we switched the whole community to git on github last week, right? You are on rosetta-devel, right?
2013-05-04 21:09   
I knew that we were going to switch to git, and I remember from minirosettacon that it was around now, but I don't think anyone ever added me to rosetta-devel, maybe for the same reason (though god knows what that is) that I never got svn access.

Oh, well. Fresh start?
2014-01-03 12:42   
Adding a new description for a ResidueType: its interchangeability group.

    The packer will look at all the residues types in the a given interchangeability
    group to decide which ones to allow. This functionality replaces the previous
    "name3" string which is overloaded for PDB I/O and is unnecessarily restirctive
    in length.

    The interchangeability group can be specified in the residue topology file,
    but if it is not given, then it will be set from the IO_STRING field (the name3).

    New PatchOperation classes for working with interchangeability group data.
    Moving PatchOperation class definitions from the .hh file to the .cc file.

    The "NC" ResfileOperation now uses the interchangeability group to append
    non-canonical amino acids (and other not-canonical-amino-acid residue types)
    to the ResidueLevelTask.

    Updating the interchangeability groups for the residue types defined in the
    fullatom/residue_types/rosetta_specific directory (which previously all
    shared the name3 of "XXX").

    Updating the patch files that used the SET_IO_STRING command so that they
    now also update the interchangeability group.