Weird Behaviour when AQUA is suspended



Message boards : AQUA@home : Weird Behaviour when AQUA is suspended

[B^S] DonaldXP
Joined: Dec 10 08
Posts: 2
Credit: 994,478
RAC: 3,031
Message 4725 - Posted 10 Oct 2009 14:09:21 UTC

    Last modified: 10 Oct 2009 14:09:46 UTC

    Got a very strange issue here:

    WIN Vista HomePremium 64Bit
    Core2Quad 8200@2.33
    4 GB RAM
    BOINC 6.11.13 (also happened with 6.10.11)

    The situation:

    4 WUs are running (e.g. 2 Einstein, 1 QMC, 1 WCG).
    After a while BOINC switches to AQUA, which uses
    all 4 cores (everything alright so far...).
    BUT after the next switch (AQUA is suspended now)
    only 1 WU (e.g. Einstein) is running and 3 cores
    are idle. Even stopping and restarting BOINC does not
    change that. Only after the AQUA WU is finished does everything go back to normal...
    I tried several scenarios with suspending other projects, but this only happens if AQUA is involved. Very strange!!!
    Is anybody else having this issue?

    Richard Haselgrove
    Joined: Jun 18 09
    Posts: 342
    Credit: 5,517,505
    RAC: 22,801
    Message 4726 - Posted 10 Oct 2009 14:43:23 UTC - in response to Message 4725.

      Maybe related to the "Possible checkpointing problem?" I posted this morning (which I'll probably downgrade to "False alarm" soon, because I haven't seen it again). I wasn't fully awake then, but I saw some signs (yellow 'low efficiency' bands in BoincView) that other CPU tasks were running, but very slowly, while AQUA was supposedly suspended - but apparently still clocking up the CPU seconds, as noted in that thread.

      I'm also using BOINC v6.10.13 (I take it your 6.11... is a typo?), so we ought to keep an eye on this one.

      [B^S] DonaldXP
      Joined: Dec 10 08
      Posts: 2
      Credit: 994,478
      RAC: 3,031
      Message 4728 - Posted 10 Oct 2009 15:18:15 UTC - in response to Message 4726.

        I'm also using BOINC v6.10.13 (I take it your 6.11... is a typo?), so we ought to keep an eye on this one.


        Yeah,typo!

        Richard Haselgrove
        Joined: Jun 18 09
        Posts: 342
        Credit: 5,517,505
        RAC: 22,801
        Message 4754 - Posted 12 Oct 2009 0:12:19 UTC

          Had another weird one: task 1216815. Again, far too much CPU time compared to the run time:

          Run time 7729.640625
          CPU time 50103.48
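
For reference, the ratio of the two figures can be worked out directly. This is a minimal sketch using only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted for task 1216815.
run_time_s = 7729.640625   # elapsed (wall-clock) seconds
cpu_time_s = 50103.48      # CPU seconds charged to the task

# Average number of cores the task would have had to keep busy for the
# whole run in order to accumulate this much CPU time.
avg_busy_cores = cpu_time_s / run_time_s
print(f"CPU time / run time = {avg_busy_cores:.2f}")
```

If the host has fewer cores than this ratio (the post does not say how many it has), the task cannot genuinely have used that much CPU, which would fit CPU time being charged while the task was pre-empted.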

          I have some intermediate data which I'll look at in the morning, but provisionally:

          The AQUA task paused (task switch) in the middle of the run, but didn't do so cleanly. While AQUA was pre-empted:

          * AQUA was showing 'waiting to run'
          * CPU time was allocated to AQUA
          * Elapsed time was not allocated to AQUA
          * Other tasks (SETI, CPDN Beta) were showing 'Running'
          * CPU time was not allocated to the other tasks
          * The other tasks did accrue elapsed time
          * Neither AQUA nor the other tasks actually made any progress
          * CUDA (SETI) work continued, as normal and at normal speed

          I'll report it to BOINC_alpha after the American (US/Canada) holiday weekend, if we don't have any answers before then.

          lorr
          Joined: Sep 29 09
          Posts: 1
          Credit: 2,238
          RAC: 0
          Message 4836 - Posted 19 Oct 2009 16:12:40 UTC - in response to Message 4725.

            I'm having an identical or at least similar problem.
            To me it seems to grab all cores and never release them while suspended,
            while seeming to hold one core and chug along.

            The whole point of distributed processing is to break the problem down to small enough work units that any crap single core can do it.

            I know you're a "special needs" project?

            Yes you, ufluids and bbc climate are all so "special",
            you should be banned from boinc until you learn to be less greedy whores.




            Neo
            Volunteer moderator
            Project administrator
            Project developer
            Project scientist
            Joined: Dec 7 08
            Posts: 486
            Credit: 5,306,180
            RAC: 13,459
            Message 4840 - Posted 19 Oct 2009 19:18:13 UTC

              Look, assignment of cores is completely up to BOINC, not up to us. Please feel free to share suggestions and ideas with each other here, but no amount of petty, misplaced name-calling will have any effect on the behaviour of BOINC in choosing when and how to run AQUA. If you have any suggestions for how BOINC should manage multi-threading, you can post them to the BOINC development mailing list (after subscribing): http://www.mail-archive.com/boinc_dev@ssl.berkeley.edu/info.html

              Richard Haselgrove
              Joined: Jun 18 09
              Posts: 342
              Credit: 5,517,505
              RAC: 22,801
              Message 4842 - Posted 19 Oct 2009 21:20:59 UTC - in response to Message 4840.

                Look, assignment of cores is completely up to BOINC, not up to us. Please feel free to share suggestions and ideas with each other here, but no amount of petty, misplaced name-calling will have any effect on the behaviour of BOINC in choosing when and how to run AQUA. If you have any suggestions for how BOINC should manage multi-threading, you can post them to the BOINC development mailing list (after subscribing): http://www.mail-archive.com/boinc_dev@ssl.berkeley.edu/info.html

                I have already reported to boinc_alpha (the bug-reporting list) that "AQUA, in its snooty and un-cooperative way, insists on not sharing the CPU with anybody else."

                Please note that this was written, in context, as a *joke*, and should be read as:

                "BOINC regards AQUA and other MT projects as snooty and un-cooperative, and insists on not letting them share the CPU with anybody else."

                That hasn't been specifically addressed, but an earlier comment of mine about BOINC starting AQUA tasks as soon as they had been downloaded (queue-jumping tasks already ready to run) led to changeset [19312]:

                " - client: multi-thread jobs were being given too high priority; in particular, they were preempting jobs in the middle of time slice.
                Solution:
                1) don't use MT in the sort order defined by more_important().
                2) add a 2nd reordering in which MT jobs are moved ahead of non-MT jobs, but only if #CPUs used is < #CPUs
                (see promote_multi_thread_jobs())"

                That should be in the new v6.10.14 and newer v6.10.15: I haven't had a chance to test them over the weekend, but I'll load .15 tomorrow and see how it plays.
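
The two-step ordering in the changeset note can be sketched roughly as follows. This is only one possible reading of the note, not the BOINC client code: the `Job` class, the `importance` field (a stand-in for whatever `more_important()` compares) and the `order_jobs` function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    ncpus: int         # cores the job claims; > 1 means multi-threaded (MT)
    importance: float  # hypothetical stand-in for the more_important() criteria


def order_jobs(jobs, total_cpus):
    # Step 1: order by importance only; MT status plays no part here.
    ordered = sorted(jobs, key=lambda j: j.importance, reverse=True)

    # Step 2: walk the ordered list, tracking CPUs claimed so far. An MT job
    # reached while fewer than total_cpus are claimed is moved ahead of the
    # non-MT jobs; otherwise it keeps its place.
    promoted, rest, used = [], [], 0
    for job in ordered:
        if job.ncpus > 1 and used < total_cpus:
            promoted.append(job)
        else:
            rest.append(job)
        used += job.ncpus
    return promoted + rest


jobs = [Job("A", 1, 3.0), Job("B", 1, 2.0), Job("MT", 4, 1.0), Job("C", 1, 0.5)]
print([j.name for j in order_jobs(jobs, total_cpus=4)])
print([j.name for j in order_jobs(jobs, total_cpus=2)])
```

With four CPUs the MT job is reached while cores are still unclaimed and moves to the front; with two CPUs the single-threaded jobs ahead of it already fill the machine, so it stays where it was.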

                NB I don't expect this change to have any relationship to the original subject of this thread, which was (?BOINC continuing to allocate?) CPU time to AQUA when its task was preempted, yet no progress being made.

                Richard Haselgrove
                Joined: Jun 18 09
                Posts: 342
                Credit: 5,517,505
                RAC: 22,801
                Message 4846 - Posted 20 Oct 2009 14:25:20 UTC

                  I've now installed v6.10.15 on three machines: it appears to have severe scheduling problems when set to share a host with projects with single-threaded CPU tasks. The cure is worse than the disease, in that cores are left idle.

                  So far as I can tell (limited observation so far), there should be no problem with running AQUA on its own, or AQUA with GPU projects only - perhaps someone could test that out?

                  But I would advise against v6.10.15 if you're sharing with other CPU projects.

                  Boinc Admin
                  Volunteer moderator
                  Project administrator
                  Project developer
                  Project scientist
                  Joined: Nov 4 08
                  Posts: 1083
                  Credit: 2,228,570
                  RAC: 2,135
                  Message 4847 - Posted 20 Oct 2009 18:31:27 UTC - in response to Message 4846.

                    Thanks Richard for sharing the info.

                    dougdoug
                    Joined: Oct 5 09
                    Posts: 8
                    Credit: 954,639
                    RAC: 0
                    Message 4848 - Posted 20 Oct 2009 18:46:12 UTC

                      I am running an i7 with only AQUA crunching now. Previously I had been running E@h and Cosmo with an occasional M.W. unit, but found that AQUA was hogging the CPUs.
                      Cosmo run time increased by 35% and Aqua by 30%.
                      So it's easier to run Aqua only.
                      Seti Cuda runs easy with no problems alongside Aqua

                      Richard Haselgrove
                      Joined: Jun 18 09
                      Posts: 342
                      Credit: 5,517,505
                      RAC: 22,801
                      Message 4884 - Posted 30 Oct 2009 17:05:10 UTC - in response to Message 4847.

                        Thanks Richard for sharing the info.

                        BOINC v6.10.17 has just been made the recommended version for all users.

                        Unfortunately, it still has difficulties co-scheduling AQUA MT tasks alongside other projects' single-threaded CPU tasks. (Idle CPUs, as previously discussed). The developers are aware of the problem, but because it only affects a very small sub-set of BOINC users (i.e. us!), they decided not to hold up the major release for everybody else. They have also said that there will be a maintenance release to address the problem somewhere along the way, but have given no timescale as yet.

                        In the meantime, I recommend people who want to run both AQUA and other CPU projects should stick with v6.10.13:

                        http://boinc_6.10.13_windows_intelx86.exe (32 bit)
                        http://boinc_6.10.13_windows_x86_64.exe (64 bit)

                        Boinc Admin
                        Volunteer moderator
                        Project administrator
                        Project developer
                        Project scientist
                        Joined: Nov 4 08
                        Posts: 1083
                        Credit: 2,228,570
                        RAC: 2,135
                        Message 4885 - Posted 30 Oct 2009 17:08:35 UTC - in response to Message 4884.

                          BOINC v6.10.17 has just been made the recommended version for all users.

                          Unfortunately, it still has difficulties co-scheduling AQUA MT tasks alongside other projects' single-threaded CPU tasks. (Idle CPUs, as previously discussed). The developers are aware of the problem, but because it only affects a very small sub-set of BOINC users (i.e. us!), they decided not to hold up the major release for everybody else. They have also said that there will be a maintenance release to address the problem somewhere along the way, but have given no timescale as yet.

                          In the meantime, I recommend people who want to run both AQUA and other CPU projects should stick with v6.10.13:

                          http://boinc_6.10.13_windows_intelx86.exe (32 bit)
                          http://boinc_6.10.13_windows_x86_64.exe (64 bit)


                          This is the price to pay for being ahead of the herd :-)

                          Zanth
                          Joined: Aug 8 09
                          Posts: 5
                          Credit: 169,063
                          RAC: 1
                          Message 4899 - Posted 1 Nov 2009 18:04:21 UTC - in response to Message 4885.

                            So it seems. I started having this problem yesterday as well. I was able to get it up to using 5 of the 6 cores I'm allowing BOINC to use, better than one, but still sucks. :P Sadly I was victim to the "won't connect to the localhost" problem with 6.10.13, and since .17 was recommended now.... C'est la vie, eh?

                            Richard Haselgrove
                            Joined: Jun 18 09
                            Posts: 342
                            Credit: 5,517,505
                            RAC: 22,801
                            Message 4900 - Posted 1 Nov 2009 18:19:00 UTC

                              The problem I saw with v6.10.15/.16 (and they say it hasn't been changed with .17) is that when AQUA is 'resting' (waiting to run, allowing other projects their turn), not enough other projects started running to utilise all the available or permitted CPU cores.

                              Counter-intuitively, the easiest workround is to suspend the AQUA task: even though it isn't running anyway, suspending it takes it out of the CPU scheduling calculation entirely, and BOINC schedules all the available cores properly. (If you have a second AQUA task downloaded and ready to run, you'll have to suspend that task too.)

                              Unfortunately, in order to do any AQUA work, you'll have to un-suspend it at some point, and when you do so you run the risk that BOINC will make a core idle again: either a task from another project will finish, and nothing will start in its place, or one will reach the 'task switch interval' and be pre-empted, again with no replacement.

                              Boinc Admin
                              Volunteer moderator
                              Project administrator
                              Project developer
                              Project scientist
                              Joined: Nov 4 08
                              Posts: 1083
                              Credit: 2,228,570
                              RAC: 2,135
                              Message 4908 - Posted 2 Nov 2009 8:09:05 UTC - in response to Message 4900.

                                Unfortunately, in order to do any AQUA work, you'll have to un-suspend it at some point, and when you do so you run the risk that BOINC will make a core idle again: either a task from another project will finish, and nothing will start in its place, or one will reach the 'task switch interval' and be pre-empted, again with no replacement.


                                AQUA claims all the cores, not a fraction of them, so BOINC cannot run AQUA with other projects without "over-claiming" the CPU (running twice as many threads as cores). Eventually all single-thread apps should stop executing (so cores become idle one by one till all cores are available), and then AQUA should be able to run. Does that actually happen?

                                If not, then we can see what happens if AQUA claims a fraction of the cores (number_of_cores * 0.9 for example). This may make it possible for BOINC to schedule AQUA and at least one other single-thread app. However, this will increase the running time of both apps because of competition over resources such as cache, which means that instead of full-speed computation, two threads will just thrash each other's cache and run at a slower pace than if they could run in order.

                                We can try the above but maybe the best solution is for BOINC to stop and start multiple single-thread apps when it sees that a multi-thread app is present.
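
As a rough illustration of the fractional claim suggested above, the sketch below counts how many single-thread tasks could start next to one multi-thread task. It rests on an assumption that is not stated in this thread: that the client keeps starting tasks while the CPUs already claimed are strictly below the number of cores. The function name `single_thread_slots` is made up for this example.

```python
def single_thread_slots(ncpus: int, mt_fraction: float) -> int:
    """Count single-thread tasks that could start alongside one MT task.

    Assumption (not from the thread): the client keeps starting tasks
    while the CPUs already claimed are strictly below ncpus.
    """
    claimed = ncpus * mt_fraction  # cores claimed by the MT task
    slots = 0
    while claimed < ncpus:
        claimed += 1.0             # each single-thread task claims one core
        slots += 1
    return slots


for fraction in (1.0, 0.9):
    print(fraction, single_thread_slots(4, fraction))
```

Under that assumption a full claim leaves no room on a quad-core host, while a 0.9 claim admits exactly one single-thread task, at the cost of the cache contention described above.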

                                Richard Haselgrove
                                Joined: Jun 18 09
                                Posts: 342
                                Credit: 5,517,505
                                RAC: 22,801
                                Message 4911 - Posted 2 Nov 2009 17:19:25 UTC - in response to Message 4908.

                                  Unfortunately, in order to do any AQUA work, you'll have to un-suspend it at some point, and when you do so you run the risk that BOINC will make a core idle again: either a task from another project will finish, and nothing will start in its place, or one will reach the 'task switch interval' and be pre-empted, again with no replacement.

                                  AQUA claims all the cores, not a fraction of them, so BOINC cannot run AQUA with other projects without "over-claiming" the CPU (running twice as many threads as cores). Eventually all single-thread apps should stop executing (so cores become idle one by one till all cores are available), and then AQUA should be able to run. Does that actually happen?

                                  If not, then we can see what happens if AQUA claims a fraction of the cores (number_of_cores * 0.9 for example). This may make it possible for BOINC to schedule AQUA and at least one other single-thread app. However, this will increase the running time of both apps because of competition over resources such as cache, which means that instead of full-speed computation, two threads will just thrash each other's cache and run at a slower pace than if they could run in order.

                                  We can try the above but maybe the best solution is for BOINC to stop and start multiple single-thread apps when it sees that a multi-thread app is present.

                                  I don't think you need to adjust the AQUA settings just yet. This is primarily a BOINC debugging issue: it's a convenient anthropomorphism to say 'AQUA hogs the CPU', but in reality these are BOINC scheduling decisions, and should be addressed as such.

                                  BOINC decides which task(s) to run next on the basis of short term debt (STD). This is defined so that the sum of STD for all projects on the computer is zero: some will be positive, some negative, and a project with no work downloaded is always zero. BOINC allocates tasks to run from the top STD downwards.
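
A toy version of that bookkeeping, purely for illustration: the real client's debt calculation is more involved, and the function names and debt figures below are invented. Projects with no work downloaded are marked `None` and pinned at zero; the rest are shifted so that the total is zero.

```python
def normalise_std(raw):
    """Shift debts so they sum to zero; projects with no work stay at zero."""
    with_work = {p: d for p, d in raw.items() if d is not None}
    mean = sum(with_work.values()) / len(with_work) if with_work else 0.0
    return {p: (d - mean if d is not None else 0.0) for p, d in raw.items()}


def run_order(std, has_work):
    """Projects with work, highest short term debt first."""
    return sorted((p for p in std if p in has_work),
                  key=lambda p: std[p], reverse=True)


# Invented figures; QMC has no work downloaded.
raw = {"Einstein": 300.0, "AQUA": 100.0, "WCG": -100.0, "QMC": None}
std = normalise_std(raw)
order = run_order(std, has_work={"Einstein", "AQUA", "WCG"})
print(std)
print(order)
```

Here AQUA ends up in the middle of the order, with one project above it and one below.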

                                  So a typical situation, and the problematic one, is when AQUA is "piggy in the middle", with some other projects with higher STD and other projects with lower STD. Given that AQUA wants to claim all the cores, how should that be handled?

                                  v6.10.13 tries to allocate first the project with the top STD, then AQUA, then .... oops .... AQUA, if it's going to get anything at all, gets the whole shooting match, so forget the top STD and run AQUA instead. That keeps the CPU utilised, but gives AQUA a disproportionate time-slice: it only surrenders the CPU when its STD gets -ve enough (below enough other projects to utilise the CPU fully before even considering AQUA). [I also have a gripe that AQUA is allocated the remaining cores instantly, without the other projects even being given enough time to reach their next checkpoint, but that's a secondary issue]

                                  v6.10.17 was the next attempt. It tries to allocate first the project with the top STD: OK, it runs. Then it tries to allocate AQUA: oops, can't do that, AQUA gets all or nothing. In this case, nothing. So then BOINC tries to allocate ....

                                  Except it doesn't. That's the problem: projects below AQUA in STD don't get allocated, and the core(s) they should be utilising remain idle instead. That's why I think .17 is worse than .13, and why I advise multi-project crunchers to delay upgrading until the BOINC developers have had another go at the problem.
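
The difference between the two behaviours described above can be modelled with a toy scheduler. This is a sketch of the behaviour as observed in this thread, not the BOINC client's logic; the function names and the example queue are made up, and each task is reduced to a name and the number of cores it needs, listed from highest STD to lowest.

```python
def pick_v13_like(queue, ncpus):
    """queue: list of (name, cores_needed), highest STD first."""
    running, used = [], 0
    for name, need in queue:
        if used >= ncpus:
            break                  # CPU already full, stop looking
        if need >= ncpus:
            # MT task reached with cores still free: it takes the whole
            # CPU, displacing anything picked so far.
            return [name]
        running.append(name)
        used += need
    return running


def pick_v17_like(queue, ncpus):
    running, used = [], 0
    for name, need in queue:
        if used + need > ncpus:
            # The task does not fit. As observed, nothing below it in the
            # queue is tried, so any remaining cores stay idle.
            break
        running.append(name)
        used += need
    return running


queue = [("Einstein", 1), ("AQUA", 4), ("QMC", 1), ("WCG", 1)]
print("v13-like:", pick_v13_like(queue, 4))
print("v17-like:", pick_v17_like(queue, 4))
```

With AQUA second in the queue on a quad-core host, the v13-like rule runs AQUA alone on all four cores, while the v17-like rule runs a single Einstein task and leaves three cores idle, which matches the symptom in the opening post.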

                                  tullio
                                  Joined: Dec 22 08
                                  Posts: 168
                                  Credit: 228,791
                                  RAC: 2
                                  Message 4912 - Posted 2 Nov 2009 18:53:27 UTC

                                    I am running BOINC 6.6.41 on Linux and I see it sharing my two cores well enough within my 5 projects, including AQUA. When a new AQUA unit starts it goes high priority, getting all the cores, but after a few hours it alternates with other projects. When running high priority it grabs both cores, but in normal usage it gets from 130 to 140% of the CPU, according to the "top" command. However after one hour it stops and gives a chance to two other projects, each running on one core.
                                    Tullio

                                    John Sikora
                                    Joined: Jul 11 09
                                    Posts: 9
                                    Credit: 6,940,685
                                    RAC: 33,300
                                    Message 4930 - Posted 6 Nov 2009 19:10:26 UTC

                                      Windows 7 64-bit, Core2 Quad (multiple machines), 4 GB RAM, BOINC 6.10.17. The same thing has been happening to me, in a similar scenario, with one more data point: AQUA is running on 4 CPUs, swaps out as part of a normal task switch, and then fewer than 4 CPUs start work on other tasks. If I suspend the AQUA task which is waiting to run, the idle processors pick up work.

                                      Kenneth Larsen
                                      Joined: Feb 8 09
                                      Posts: 10
                                      Credit: 865,237
                                      RAC: 734
                                      Message 4948 - Posted 12 Nov 2009 12:07:24 UTC

                                        Well, I just encountered this bug too when updating Boinc from 6.6.40 to 6.6.18 on Linux. It started even before any Aqua work unit was started, just by attaching to the project and downloading a work unit, so the above suspicions are correct that just having an Aqua wu in the queue gives problems.
                                        As per the recommendation, I've now suspended the work unit.



                                        Copyright © 2010 D-Wave Systems Inc.