Mantis Bug Tracker

ID: 0000135
Project: Rosetta
Category: [All Projects] Crash
View Status: public
Date Submitted: 2012-10-18 13:49
Last Update: 2013-02-26 13:41
Assigned To: delucasl
Platform: All platforms
OS: Any
OS Version: Any
Product Version: Trunk
Fixed in Version: Trunk
Summary: 0000135: MySQL transaction commits can deadlock; they should be retried rather than aborting.
Description: If a lot of MPI processes are writing to one database simultaneously, the database engine will occasionally deadlock. Right now Rosetta handles this by utility exiting; it should instead retry statement.exec() several times.

The annoying thing is that the cppdb::mysql::Deadlock error shown below is thrown as a generic cppdb_error exception rather than as a specific exception type. Either cppdb should be modified to throw a special exception for this, or we can detect it with string matching in safely_write_to_database().
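
A minimal sketch of the string-matching option, assuming the deadlock surfaces as a generic exception whose what() text contains "Deadlock found" (as in the log under Additional Information). The names is_deadlock_message and exec_with_retry are hypothetical, and std::runtime_error stands in for cppdb_error so the sketch compiles without cppdb:

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if the error text looks like a MySQL deadlock report.
inline bool is_deadlock_message(const std::string& msg) {
    return msg.find("Deadlock found") != std::string::npos;
}

// Hypothetical retry wrapper: calls exec() up to max_attempts times, retrying only
// when the failure looks like a deadlock. Returns the number of attempts used.
// Any other error, or a deadlock on the final attempt, is rethrown to the caller.
inline int exec_with_retry(const std::function<void()>& exec, int max_attempts) {
    for (int attempt = 1; ; ++attempt) {
        try {
            exec();
            return attempt;
        } catch (const std::runtime_error& e) {  // stand-in for cppdb_error
            if (!is_deadlock_message(e.what()) || attempt >= max_attempts) throw;
        }
    }
}
```

Matching on message text is brittle if the server or driver ever rewords the error, which is the main argument for a dedicated exception type instead.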
Steps To Reproduce: Make a few hundred connections to one database and start writing a few hundred thousand structures; it will happen eventually.
Additional Information:
ERROR: cppdb::mysql::Deadlock found when trying to get lock; try restarting transaction
ERROR:: Exit from: src/basic/database/ line: 430
application called MPI_Abort(MPI_COMM_WORLD, 911) - process 0
Assertion failed in file socksm.c at line 1663: (it_plfd->revents & 0x008) == 0
Assertion failed in file socksm.c at line 1663: (it_plfd->revents & 0x008) == 0
Assertion failed in file socksm.c at line 1663: (it_plfd->revents & 0x008) == 0
internal ABORT - process 3
internal ABORT - process 4
internal ABORT - process 5
Assertion failed in file socksm.c at line 1663: (it_plfd->revents & 0x008) == 0
internal ABORT - process 1
Assertion failed in file socksm.c at line 1663: (it_plfd->revents & 0x008) == 0
internal ABORT - process 2
mpiexec: Warning: task 0 exited with status 143.
mpiexec: Warning: tasks 1-5 exited with status 1.
Tags: No tags attached.
Application(s) Affected: any JD2 application
Command Line Used: score_jd2.linuxgccrelease -database $database @path_to_flags -nstruct $large_number
Developer Options: Confirmed As Bug
Fixed in SVN Version: 51751

-  Notes
momeara (Attentive Developer)
2012-10-18 15:10

I'm in favor of refining the cppdb error codes, though our licensing of cppdb requires we pass any modifications upstream.

Sam, we have played with chunking transactions into larger blocks. Have you seen this or tried it yourself? It seemed to help when we had many more client nodes than server nodes. When you create a transaction, set the transaction_mode. The biggest trick is that the sessions need to live long enough to do the chunking, which I'm not sure happens with the JD2DatabaseJobOutputter.
delucasl (Administrator)
2012-10-18 15:17

I'm definitely leaning towards adding a deadlock_error exception. As far as I know each individual structure is a single transaction in the Job Outputter. Making bigger chunks than that would be problematic because the DatabaseFilter code assumes that the set of structures visible in the database represents all completed models.

The documentation for InnoDB says that you can mostly avoid deadlocks by structuring the database and queries such that transactions are serialized. I suspect that there are performance and scaling consequences to doing this, though. These deadlocks are rare enough that simply retrying is probably the best outcome in our case. I'll try that method first before doing anything more complex.
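
The dedicated-exception approach could look roughly like the sketch below. It assumes a hypothetical deadlock_error type derived from a base error class (a local db_error stands in for cppdb_error here), and run_with_deadlock_retry is an illustrative name. The retry count and backoff delay are made-up parameters, not a description of what revision 51751 actually does.

```cpp
#include <chrono>
#include <stdexcept>
#include <string>
#include <thread>

// Local stand-in for cppdb_error so the sketch compiles without cppdb.
struct db_error : std::runtime_error {
    explicit db_error(const std::string& msg) : std::runtime_error(msg) {}
};

// Hypothetical dedicated exception, so callers can catch deadlocks by type.
struct deadlock_error : db_error {
    explicit deadlock_error(const std::string& msg) : db_error(msg) {}
};

// Runs one transaction, retrying on deadlock_error with a linearly growing pause.
// Returns true on success, false if every attempt deadlocked.
// Other exceptions propagate unchanged.
template <typename Transaction>
bool run_with_deadlock_retry(Transaction&& txn, int max_attempts,
                             std::chrono::milliseconds base_delay) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        try {
            txn();
            return true;
        } catch (const deadlock_error&) {
            // Back off briefly so competing writers are less likely to collide again.
            if (attempt < max_attempts) {
                std::this_thread::sleep_for(base_delay * attempt);
            }
        }
    }
    return false;
}
```

Keeping each structure as its own transaction inside the retried callable preserves the assumption that every structure visible in the database is a completed model.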
delucasl (Administrator)
2013-02-26 13:41

I left this open to confirm that 51751 fixed the problem. It does. Closing bug.

- Issue History
Date Modified Username Field Change
2012-10-18 13:49 delucasl New Issue
2012-10-18 13:49 delucasl Status new => assigned
2012-10-18 13:49 delucasl Assigned To => delucasl
2012-10-18 15:10 momeara Note Added: 0000119
2012-10-18 15:17 delucasl Note Added: 0000120
2013-02-26 13:41 delucasl Fixed in SVN Version => 51751
2013-02-26 13:41 delucasl Note Added: 0000146
2013-02-26 13:41 delucasl Status assigned => resolved
2013-02-26 13:41 delucasl Fixed in Version => Trunk
2013-02-26 13:41 delucasl Resolution open => fixed
