Weird Behaviour when AQUA is suspended
Message boards : AQUA@home : Weird Behaviour when AQUA is suspended
| Author | Message |
|---|---|
|
Got a very strange issue here: | |
| ID: 4725 | Rating: 0 | rate:
| |
|
Maybe related to the "Possible checkpointing problem?" I posted this morning (which I'll probably downgrade to "False alarm" soon, because I haven't seen it again). I wasn't fully awake then, but I saw some signs (yellow 'low efficiency' bands in BoincView) that other CPU tasks were running, but very slowly, while AQUA was supposedly suspended - but apparently still clocking up the CPU seconds, as noted in that thread. | |
| ID: 4726 | Rating: 0 | rate:
| |
I'm also using BOINC v6.10.13 (I take it your 6.11... is a typo?), so we ought to keep an eye on this one. Yeah, typo! | |
| ID: 4728 | Rating: 0 | rate:
| |
|
Had another weird one: task 1216815. Again, far too much CPU time compared to the run time:
Run time 7729.640625
CPU time 50103.48
I have some intermediate data which I'll look at in the morning, but provisionally: The AQUA task paused (task switch) in the middle of the run, but didn't do so cleanly. While AQUA was pre-empted:
* AQUA was showing 'waiting to run'
* CPU time was allocated to AQUA
* Elapsed time was not allocated to AQUA
* Other tasks (SETI, CPDN Beta) were showing 'Running'
* CPU time was not allocated to the other tasks
* The other tasks did accrue elapsed time
* Neither AQUA nor the other tasks actually made any progress
* CUDA (SETI) work continued, as normal and at normal speed
I'll report it to BOINC_alpha after the American (US/Canada) holiday weekend, if we don't have any answers before then. | |
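As a rough sanity check on the figures in this post: a multi-threaded task can legitimately log more CPU time than run time, but only up to run time multiplied by the number of cores it used. The sketch below assumes both figures are in seconds; the function names and the core counts passed in are illustrative assumptions, since the post does not say how many cores the host has.

```python
def cpu_to_run_ratio(run_time_s, cpu_time_s):
    """Average number of cores the task would have needed to keep busy."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


def is_plausible(run_time_s, cpu_time_s, n_cores):
    """CPU time is plausible only if it fits within run_time * n_cores."""
    return cpu_time_s <= run_time_s * n_cores


# Figures reported for task 1216815.
RUN_TIME = 7729.640625
CPU_TIME = 50103.48

ratio = cpu_to_run_ratio(RUN_TIME, CPU_TIME)
print(f"CPU time is {ratio:.2f}x the run time")
print("plausible on 4 cores:", is_plausible(RUN_TIME, CPU_TIME, 4))
```

On a host with fewer cores than the printed ratio, the extra CPU time cannot have come from real computation, which fits the observation that CPU time kept accruing while the task was pre-empted.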
| ID: 4754 | Rating: 0 | rate:
| |
|
I'm having an identical or at least similar problem. | |
| ID: 4836 | Rating: 0 | rate:
| |
|
Look, assignment of cores is completely up to BOINC, not up to us. Please feel free to share suggestions and ideas with each other here, but no amount of petty, misplaced name-calling will have any effect on the behaviour of BOINC in choosing when and how to run AQUA. If you have any suggestions for how BOINC should manage multi-threading, you can post them to the BOINC development mailing list (after subscribing): http://www.mail-archive.com/boinc_dev@ssl.berkeley.edu/info.html | |
| ID: 4840 | Rating: 0 | rate:
| |
Look, assignment of cores is completely up to BOINC, not up to us. Please feel free to share suggestions and ideas with each other here, but no amount of petty, misplaced name-calling will have any effect on the behaviour of BOINC in choosing when and how to run AQUA. If you have any suggestions for how BOINC should manage multi-threading, you can post them to the BOINC development mailing list (after subscribing): http://www.mail-archive.com/boinc_dev@ssl.berkeley.edu/info.html I have already reported to boinc_alpha (the bug-reporting list) that "AQUA, in its snooty and un-cooperative way, insists on not sharing the CPU with anybody else." Please note that this was written, in context, as a *joke*, and should be read as: "BOINC regards AQUA and other MT projects as snooty and un-cooperative, and insists on not letting them share the CPU with anybody else." That hasn't been specifically addressed, but an earlier comment of mine about BOINC starting AQUA tasks as soon as they had been downloaded (queue-jumping tasks already ready to run) led to changeset [19312]: " - client: multi-thread jobs were being given too high priority; in particular, they were preempting jobs in the middle of time slice. Solution: 1) don't use MT in the sort order defined by more_important(). 2) add a 2nd reordering in which MT jobs are moved ahead of non-MT jobs, but only if #CPUs used is < #CPUs (see promote_multi_thread_jobs())" That should be in the new v6.10.14 and newer v6.10.15: I haven't had a chance to test them over the weekend, but I'll load .15 tomorrow and see how it plays. NB I don't expect this change to have any relationship to the original subject of this thread, which was (?BOINC continuing to allocate?) CPU time to AQUA when its task was preempted, yet no progress being made. | |
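One possible reading of the two-step ordering described in the changeset [19312] note is sketched below. This is not the BOINC client code: the `Job` class, the `priority` field (standing in for whatever more_important() compares) and the `order_jobs` function are hypothetical, and the promotion rule is only an interpretation of "moved ahead of non-MT jobs, but only if #CPUs used is < #CPUs".

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: float  # higher runs first; hypothetical stand-in for more_important()
    ncpus: int       # 1 for single-threaded, >1 for multi-threaded (MT)


def order_jobs(jobs, total_cpus):
    # Pass 1: sort on priority alone; MT status plays no part here.
    ordered = sorted(jobs, key=lambda j: j.priority, reverse=True)

    # Pass 2: walk the sorted list, counting CPUs claimed so far.
    # An MT job reached while CPUs are still unclaimed is moved ahead
    # of the non-MT jobs that precede it.
    result = []
    cpus_used = 0
    for job in ordered:
        if job.ncpus > 1 and cpus_used < total_cpus:
            # Insert after MT jobs already promoted, before non-MT jobs.
            pos = sum(1 for j in result if j.ncpus > 1)
            result.insert(pos, job)
        else:
            result.append(job)
        cpus_used += job.ncpus
    return result


jobs = [
    Job("A", 5, 1), Job("B", 4, 1), Job("MT", 3, 4),
    Job("C", 2, 1), Job("D", 1, 1),
]
promoted = [j.name for j in order_jobs(jobs, total_cpus=4)]
print(promoted)
```

Under this reading, an MT job that sorts behind enough single-threaded jobs to fill every core is left where it is, so it no longer jumps the queue.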
| ID: 4842 | Rating: 0 | rate:
| |
|
I've now installed v6.10.15 on three machines: it appears to have severe scheduling problems when set to share a host with projects with single-threaded CPU tasks. The cure is worse than the disease, in that cores are left idle. | |
| ID: 4846 | Rating: 0 | rate:
| |
|
Thanks Richard for sharing the info. | |
| ID: 4847 | Rating: 0 | rate:
| |
|
I am running an i7 with only AQUA crunching now. Previously I had been running E@h and Cosmo with an occasional M.W. unit, but found that AQUA was hogging the CPUs. | |
| ID: 4848 | Rating: 0 | rate:
| |
Thanks Richard for sharing the info. BOINC v6.10.17 has just been made the recommended version for all users. Unfortunately, it still has difficulties co-scheduling AQUA MT tasks alongside other projects' single-threaded CPU tasks. (Idle CPUs, as previously discussed). The developers are aware of the problem, but because it only affects a very small sub-set of BOINC users (i.e. us!), they decided not to hold up the major release for everybody else. They have also said that there will be a maintenance release to address the problem somewhere along the way, but have given no timescale as yet. In the meantime, I recommend people who want to run both AQUA and other CPU projects should stick with v6.10.13: http://boinc_6.10.13_windows_intelx86.exe (32 bit) http://boinc_6.10.13_windows_x86_64.exe (64 bit) | |
| ID: 4884 | Rating: 0 | rate:
| |
BOINC v6.10.17 has just been made the recommended version for all users. This is the price to pay for being ahead of the herd :-) | |
| ID: 4885 | Rating: 0 | rate:
| |
|
So it seems. I started having this problem yesterday as well. I was able to get it up to using 5 of the 6 I'm allowing BOINC to use, better than one, but still sucks. :P Sadly I was victim to the "won't connect to the localhost" problem with 6.10.13, and since .17 was recommended now.... C'est la vie, eh? | |
| ID: 4899 | Rating: 0 | rate:
| |
|
The problem I saw with v6.10.15/.16 (and they say it hasn't been changed with .17) is that when AQUA is 'resting' (waiting to run, allowing other projects their turn), not enough other projects started running to utilise all the available or permitted CPU cores. | |
| ID: 4900 | Rating: 0 | rate:
| |
Unfortunately, in order to do any AQUA work, you'll have to un-suspend it at some point, and when you do so you run the risk that BOINC will make a core idle again: either a task from another project will finish, and nothing will start in its place, or one will reach the 'task switch interval' and be pre-empted, again with no replacement. AQUA claims all the cores, not a fraction of them, so BOINC cannot run AQUA with other projects without "over-claiming" the CPU (running twice as many threads as cores). Eventually all single-thread apps should stop executing (so cores become idle one by one till all cores are available), and then AQUA should be able to run. Does that actually happen? If not, then we can see what happens if AQUA claims a fraction of the cores (number_of_cores * 0.9 for example). This may make it possible for BOINC to schedule AQUA and at least one other single-thread app. However, this will increase the running time of both apps because of competition over resources such as cache, which means that instead of full-speed computation, two threads will just thrash each other's cache and run at a slower pace than if they could run in order. We can try the above but maybe the best solution is for BOINC to stop and start multiple single-thread apps when it sees that a multi-thread app is present. | |
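The fractional-claim idea can be illustrated with a small calculation. The admission rule used here is an assumption made purely for illustration (a task may start only while the CPUs already claimed are strictly fewer than the core count); the thread does not state how BOINC actually decides, and `single_thread_slots` is a hypothetical helper.

```python
def single_thread_slots(n_cores, aqua_claim):
    """Count single-threaded tasks that could start alongside AQUA.

    Assumed (hypothetical) rule: a new task may start only while the
    CPUs already claimed are strictly fewer than n_cores.
    """
    claimed = aqua_claim
    slots = 0
    while claimed < n_cores:
        claimed += 1.0  # each single-threaded task claims one CPU
        slots += 1
    return slots


n_cores = 4
full_claim = single_thread_slots(n_cores, n_cores)            # AQUA claims every core
fractional_claim = single_thread_slots(n_cores, n_cores * 0.9)  # AQUA claims 90%
print(full_claim, fractional_claim)
```

Under that assumed rule a full claim leaves no room for anything else, while a 0.9 claim admits one single-threaded task, at the cost of the cache contention described above.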
| ID: 4908 | Rating: 0 | rate:
| |
Unfortunately, in order to do any AQUA work, you'll have to un-suspend it at some point, and when you do so you run the risk that BOINC will make a core idle again: either a task from another project will finish, and nothing will start in its place, or one will reach the 'task switch interval' and be pre-empted, again with no replacement. I don't think you need to adjust the AQUA settings just yet. This is primarily a BOINC debugging issue: it's a convenient anthropomorphism to say 'AQUA hogs the CPU', but in reality these are BOINC scheduling decisions, and should be addressed as such. BOINC decides which task(s) to run next on the basis of short term debt (STD). This is defined so that the sum of STD for all projects on the computer is zero: some will be positive, some negative, and a project with no work downloaded is always zero. BOINC allocates tasks to run from the top STD downwards. So a typical situation, and the problematic one, is when AQUA is "piggy in the middle", with some other projects with higher STD and other projects with lower STD. Given that AQUA wants to claim all the cores, how should that be handled? v6.10.13 tries to allocate first the project with the top STD, then AQUA, then .... oops .... AQUA, if it's going to get anything at all, gets the whole shooting match, so forget the top STD and run AQUA instead. That keeps the CPU utilised, but gives AQUA a disproportionate time-slice: it only surrenders the CPU when its STD gets -ve enough (below enough other projects to utilise the CPU fully before even considering AQUA). [I also have a gripe that AQUA is allocated the remaining cores instantly, without the other projects even being given enough time to reach their next checkpoint, but that's a secondary issue] v6.10.17 was the next attempt. It tries to allocate first the project with the top STD: OK, it runs. Then it tries to allocate AQUA: oops, can't do that, AQUA gets all or nothing. In this case, nothing. So then BOINC tries to allocate ....
Except it doesn't. That's the problem: projects below AQUA in STD don't get allocated, and the core(s) they should be utilising remain idle instead. That's why I think .17 is worse than .13, and why I advise multi-project crunchers to delay upgrading until the BOINC developers have had another go at the problem. | |
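The two behaviours described in this post can be mimicked with a toy model. This is a sketch of the description only, not of the BOINC scheduler itself: the function names, the STD values and the assumption of one ready task per project are all hypothetical.

```python
def by_std(projects):
    """Order (name, std, ncpus) tuples from the top STD downwards."""
    return sorted(projects, key=lambda p: p[1], reverse=True)


def schedule_like_v13(projects, n_cores):
    """If the MT project is reached while cores remain, it takes them all."""
    running, free = [], n_cores
    for name, _std, ncpus in by_std(projects):
        if free == 0:
            break
        if ncpus > 1:
            return [name], 0  # earlier picks are displaced
        running.append(name)
        free -= 1
    return running, free


def schedule_like_v17(projects, n_cores):
    """If the MT project cannot have every core, allocation stops there."""
    running, free = [], n_cores
    for name, _std, ncpus in by_std(projects):
        if free == 0:
            break
        if ncpus > free:
            break  # nothing below the MT project is tried
        running.append(name)
        free -= ncpus
    return running, free


N_CORES = 4
# Hypothetical STD values, chosen to sum to zero as described above,
# with AQUA as "piggy in the middle".
projects = [
    ("SETI", 300.0, 1),
    ("AQUA", 100.0, N_CORES),
    ("E@h", -150.0, 1),
    ("CPDN", -250.0, 1),
]
assert sum(std for _, std, _ in projects) == 0

v13 = schedule_like_v13(projects, N_CORES)
v17 = schedule_like_v17(projects, N_CORES)
print("v6.10.13-like:", v13)
print("v6.10.17-like:", v17)
```

In this model the .13-style rule keeps every core busy but hands them all to AQUA, while the .17-style rule runs only the project above AQUA and leaves the remaining cores idle.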
| ID: 4911 | Rating: 0 | rate:
| |
|
I am running BOINC 6.6.41 on Linux and I see it sharing my two cores well enough within my 5 projects, including AQUA. When a new AQUA unit starts it goes high priority, getting all the cores, but after a few hours it alternates with other projects. When running high priority it grabs both cores, but in normal usage it gets from 130 to 140% of the CPU, according to the "top" command. However after one hour it stops and gives a chance to two other projects, each running on one core. | |
| ID: 4912 | Rating: 0 | rate:
| |
|
Windows 7 64-bit, Core2 Quad (multiple machines), 4 GB RAM, BOINC 6.10.17. The same thing has been happening to me. Also one more factoid. I have a similar scenario. AQUA is running on 4 CPUs, swaps out as part of a normal task switch, and then fewer than 4 CPUs start work on other tasks. If I suspend the AQUA task which is waiting to run, the idle processors pick up work. | |
| ID: 4930 | Rating: 0 | rate:
| |
|
Well, I just encountered this bug too when updating Boinc from 6.6.40 to 6.6.18 on Linux. It started even before any Aqua work unit was started, just by attaching to the project and downloading a work unit, so the above suspicions are correct that just having an Aqua wu in the queue gives problems. | |
| ID: 4948 | Rating: 0 | rate:
| |