MantisBT - Rosetta |
View Issue Details |
|
ID | Project | Category | View Status | Date Submitted | Last Update |
0000263 | Rosetta | [All Projects] Input Handling | public | 2013-05-01 22:37 | 2014-01-03 12:42 |
|
Reporter | amw579 | |
Assigned To | amw579 | |
Priority | normal | Severity | minor | Reproducibility | always |
Status | resolved | Resolution | fixed | |
Platform | All platforms | OS | Any | OS Version | Any |
Product Version | | |
Fixed in Version | All | |
Application(s) Affected | Confirmed to affect oop_dock_design and hbs_dock_design; almost certain to affect homodimer_design, etc. |
Command Line Used | /scratch/amw579/rosetta/rosetta_source/bin/oop_dock_design.linuxgccrelease @flags -database /scratch/amw579/rosetta/rosetta_database/ |
Developer Options | |
Fixed in SVN Version | d16b60c9d6d |
|
Summary | 0000263: ResfileReader.cc does not recognize D-canonical amino acids properly |
Description | When listing AAs other than the Big Twenty in resfile lines, you use notation like:
20 A EMPTY NC APA NC MAL NC A06
or something similar. This works fine for L and D NCAAs, whose names are their three letter codes. This does not work with D-CAAs. If you use the three letter code (i.e. DLE for D-leucine; DLY for D-lysine), ResfileReader.cc quits because those aren't in the residue type set. Looking at the line in question, it'c clear why--it's asking for the name, not the three letter code.
Unfortunately, inputting e.g.
20 A EMPTY NC DVAL NC DLYS NC DLEU
is equivalent to NATAA--it silently fails to ever design in the D amino acids. I suppose I'm not _certain_ that it's a problem with resfile parsing, but it seems likely. |
Steps To Reproduce | Attempt to perform some kind of design with D-CAAs using a resfile. |
Additional Information | |
Tags | design, resfile |
Relationships | |
Attached Files | flags (272) 2013-05-01 22:37 https://carbon.structbio.vanderbilt.edu/mantisbt/file_download.php?file_id=29&type=bug |
|
Issue History |
Date Modified | Username | Field | Change |
2013-05-01 22:37 | amw579 | New Issue | |
2013-05-01 22:37 | amw579 | File Added: flags | |
2013-05-01 22:38 | amw579 | Tag Attached: design | |
2013-05-01 22:38 | amw579 | Tag Attached: resfile | |
2013-05-02 13:30 | amw579 | Note Added: 0000269 | |
2013-05-02 13:57 | rmoretti | Note Added: 0000270 | |
2013-05-02 14:21 | amw579 | Note Added: 0000271 | |
2013-05-03 17:23 | smlewis | Note Added: 0000274 | |
2013-05-03 17:24 | smlewis | Note Edited: 0000274 | bug_revision_view_page.php?bugnote_id=274#r79 |
2013-05-03 17:28 | smlewis | Note Added: 0000275 | |
2013-05-03 17:33 | smlewis | Note Added: 0000276 | |
2013-05-03 17:34 | smlewis | Note Added: 0000277 | |
2013-05-03 18:16 | amw579 | Note Added: 0000278 | |
2013-05-03 18:19 | smlewis | Note Added: 0000279 | |
2013-05-03 18:20 | smlewis | Assigned To | => amw579 |
2013-05-03 18:20 | smlewis | Status | new => assigned |
2013-05-03 19:54 | amw579 | Note Added: 0000280 | |
2013-05-04 20:20 | smlewis | Note Added: 0000281 | |
2013-05-04 21:09 | amw579 | Note Added: 0000282 | |
2014-01-03 12:42 | AndrewLeaverFay | Fixed in SVN Version | => d16b60c9d6d |
2014-01-03 12:42 | AndrewLeaverFay | Note Added: 0000324 | |
2014-01-03 12:42 | AndrewLeaverFay | Status | assigned => resolved |
2014-01-03 12:42 | AndrewLeaverFay | Fixed in Version | => All |
2014-01-03 12:42 | AndrewLeaverFay | Resolution | open => fixed |
Notes |
|
(0000269)
|
amw579
|
2013-05-02 13:30
|
|
As far as I can tell, it looks like this would be resolved by having ResidueLevelTask::allow_noncanonical_aa to operate on the 3-letter IO_STRING of a D-CAA, since for L-CAAs, L-NCAAs, and D-NCAAs they are identical... or by making the params files for the D-CAAs have their names be the same as the 3-letter IO_STRINGs. I don't think we lose much by changing the params files in this way. |
|
|
|
One issue you run into is that the 3-letter IO string is not necessarily unique. There could potentially be a number of different residue types with the same 3 letter IO string. Using the full residue name allows for better control over which one you're substituting.
There's a potential to adjust the NC resfile reader to have a fall-back to using 3-letter codes if it doesn't find the the token in the full list, but you may have problems if there are multiple (potentially patched) residue types with the same 3-letter code. You'll also might run into problems where the 3-letter code contains spaces - the resfile reader might not know where that's supposed to go, as the resfile syntax ignores whitespace.
All in all, the better solution to the problem is likely better documentation that the resfile is looking for residue type names rather than three letter codes, and better awareness that they're not necessarily going to be the same. - Changing the D-CAA residue type names might address this particular issue, but sidesteps the underlying cause. |
|
|
(0000271)
|
amw579
|
2013-05-02 14:21
|
|
In re: documentation, I don't think it's sufficient to say "please give the name, not the three letter code." Because in the status quo, _neither the name nor three letter code works_.
Giving the residue type name (i.e. DLEU) does nothing. It's equivalent to NATAA. I think when it's parsing it only grabs three characters, or something, and so it always fails a comparison to any residue names in the residue set. That's my inference; I haven't actually dug into the parser. So maybe fixing that parser would be sufficient.
Giving the three letter code fails because no residue type with that _name_ is in the residue type set.
I don't see why the 3-letter code is any more likely to contain spaces than the name, though that might be my personal misunderstanding.
Certainly, at the moment, changing the names of the 19 D-CAAs to be three characters long doesn't create name-name overlaps with anything. (After all, since all other names were identical to their 3-letter codes and all other 3-letter codes are unique, the three first letters of the D-CAAs were guaranteed to be unoccupied.)
I guess fixing the parser is the best long-term solution. |
|
|
(0000274)
|
smlewis
|
2013-05-03 17:23
(edited on: 2013-05-03 17:24) |
|
You did modify the residue type set to include the d-caas, right? (either residue_types.txt or -extra_res_fa?) This is in the spirit of "most likely problems first".
NC is not modifying tokens. allow_noncanonical_aa is explicitly looking for a name3.
|
|
|
|
Nevermind, I checked with my own toy problem and it duplicates, it's not -extra_res_fa |
|
|
|
OK, so, the NC token does this:
if ( residue_set.has_name( nc_to_include_ )){
but the packertask function does this:
aaname == (**aas_iter).name3() )
We look for the name in the first place, and the name3 in the second. That's the bug.
I don't think it matters which way this code is unified (both to name3, or both to name), so long as it happens. I'd suggest changing NC to look for name3, not name, because otherwise I'm not sure how to get the logic in packertask to work.
Changing the dcaa's to have name3 == name will also bury the problem. (or, it did in my test). |
|
|
|
ResidueTypeSet has has_name3, BTW. Try that first. |
|
|
(0000278)
|
amw579
|
2013-05-03 18:16
|
|
That makes sense; I wasn't sure where to look for the conflict so it's good you were able to find it in packertask.
Yeah, it's currently working for me when I change the param files so name3 == name, too. |
|
|
|
To be super-clear, it's in the NC token command for the resfile reader, not the packer task. If you didn't find the right line, this function of ResfileReader.cc:
void
NC::residue_action(
PackerTask & task,
Size resid
) const
{
core::chemical::ResidueTypeSet const & residue_set = task.residue_task( resid ).get_original_residue_set();
if ( residue_set.has_name( nc_to_include_ )){
task.nonconst_residue_task(resid).allow_noncanonical_aa( nc_to_include_ );
} else {
std::stringstream err_msg;
err_msg << "Unable to add non-canonical amino acid " << nc_to_include_ << " because it is not in the ResidueTypeSet for residue " << resid << ".";
onError(err_msg.str());
}
}//end NC
change has_name to has_name3 and give it a shot. |
|
|
(0000280)
|
amw579
|
2013-05-03 19:54
|
|
This appears to work. Alas, I don't think I have SVN access yet (at least, it's timed out every time I've tried to log in thus far for some reason...) so I can only relate the fact that that's in fact an adequate solution on my end and maybe someone with real power can implement it. |
|
|
|
SVN access? You know we switched the whole community to git on github last week, right? You are on rosetta-devel, right? |
|
|
(0000282)
|
amw579
|
2013-05-04 21:09
|
|
I knew that we were going to switch to git, and I remember from minirosettacon that it was around now, but I don't think anyone ever added me to rosetta-devel, maybe for the same reason (though god knows what that is) that I never got svn access.
Oh, well. Fresh start? |
|
|
|
Adding a new description for a ResidueType: its interchangeability group.
The packer will look at all the residues types in the a given interchangeability
group to decide which ones to allow. This functionality replaces the previous
"name3" string which is overloaded for PDB I/O and is unnecessarily restirctive
in length.
The interchangeability group can be specified in the residue topology file,
but if it is not given, then it will be set from the IO_STRING field (the name3).
New PatchOperation classes for working with interchangeability group data.
Moving PatchOperation class definitions from the .hh file to the .cc file.
The "NC" ResfileOperation now uses the interchangeability group to append
non-canonical amino acids (and other not-canonical-amino-acid residue types)
to the ResidueLevelTask.
Updating the interchangeability groups for the residue types defined in the
fullatom/residue_types/rosetta_specific directory (which previously all
shared the name3 of "XXX").
Updating the patch files that used the SET_IO_STRING command so that they
now also update the interchangeability group. |
|