Scheduler wait: VM job unmanageable, restarting later


JelleNZ
Joined: 26 Sep 11
Posts: 63
Credit: 1,559,395
RAC: 0
Message 16234 - Posted: 24 Mar 2014, 1:51:48 UTC

I just got the phrase in the title above for the T4T status reported by BOINC Manager after closing another VirtualBox Virtual Machine that had nothing to do with T4T. The T4T task now seems to be stuck in this state. Any suggestions?

JelleNZ
Joined: 26 Sep 11
Posts: 63
Credit: 1,559,395
RAC: 0
Message 16236 - Posted: 24 Mar 2014, 2:55:44 UTC

Following up on my previous message: I shut down BOINC, then saved the state of the VM in VirtualBox Manager, which showed it as still running even after the BOINC Manager shutdown. Then I turned the computer off and on again. After starting BOINC again, the T4T task resumed. The WU in question was:

http://lhcathome2.cern.ch/test4theory/workunit.php?wuid=4108124

Unfortunately, it seems to have been a waste of crunching after the restart, even though I did get full credit for it. Stderr output, from after the restart, is as follows:

<core_client_version>7.2.39</core_client_version>
<![CDATA[
<stderr_txt>
running.
2014-03-24 15:32:34 (1948): Status Report: virtualbox/vboxheadless is no longer running.
....

repeated
2014-03-24 15:44:20 (1948): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 15:44:21 (1948): Powering off VM.
2014-03-24 15:44:21 (1948): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 15:44:21 (1948): Successfully powered off VM.
2014-03-24 15:44:21 (1948): Deregistering VM.
2014-03-24 15:44:21 (1948): Deleting stale snapshot.
2014-03-24 15:44:21 (1948): Removing network bandwidth throttle group from VM.
2014-03-24 15:44:22 (1948): Removing storage controller(s) from VM.
2014-03-24 15:44:22 (1948): Removing VM from VirtualBox.
2014-03-24 15:44:22 (1948): Removing virtual disk drive from VirtualBox.
2014-03-24 15:44:22 (1948): Removing virtual floppy disk from VirtualBox.
15:44:27 (1948): called boinc_finish

</stderr_txt>

So, I still don't know how this happened. It just picked up a new WU for T4T, so let's see if I can reproduce this behaviour.

JelleNZ
Joined: 26 Sep 11
Posts: 63
Credit: 1,559,395
RAC: 0
Message 16237 - Posted: 24 Mar 2014, 3:03:21 UTC

Further update. New WU started running. Looking at the stderr.txt again, however, it seemed to be wasting its time. After the initial messages it just keeps repeating:

Status Report: virtualbox/vboxheadless is no longer running.

So I don't know what to do. I have aborted the WU for now and stopped T4T from getting new tasks. Any advice would be greatly appreciated.

Tom*
Joined: 11 Aug 11
Posts: 95
Credit: 3,789,336
RAC: 301
Message 16239 - Posted: 24 Mar 2014, 3:32:39 UTC
Last modified: 24 Mar 2014, 3:40:50 UTC

Jelle, as soon as you started running 260.72 it started failing.

I thought all (except Macs) were supposed to be running 260.73??

Is your Linux a 32 bit OS?

Try running the wrapper by itself and check for any failures.

You might want to set No new tasks, then abort, then RESET PROJECT to get the 260.73 versions.

tullio
Joined: 28 Nov 10
Posts: 1611
Credit: 1,560,230
RAC: 134
Message 16241 - Posted: 24 Mar 2014, 6:02:12 UTC
Last modified: 24 Mar 2014, 6:02:47 UTC

I am still getting 260.72 on my Ubuntu Virtual Machine. "virtualbox/vboxheadless is no longer running", this is the report in stderr.txt. CPU time very high, few credits given, no CERN work done.
Tullio

JelleNZ
Joined: 26 Sep 11
Posts: 63
Credit: 1,559,395
RAC: 0
Message 16242 - Posted: 24 Mar 2014, 7:50:58 UTC - in response to Message 16239.

I got home and have the same problem on another machine. Like my other machine, this one also runs Ubuntu Linux 12.04 64-bit. Last stderr.txt:

2014-03-24 02:22:07 (7735): ../../projects/lhcathome2.cern.ch_test4theory/vboxwrapper_26072_x86_64-pc-linux-gnu__vbox64: starting
2014-03-24 02:22:10 (7735): Detected: VirtualBox 4.3.8r92456
2014-03-24 02:22:15 (7735): Create VM. (boinc_53291cf6e0fd7e75, slot#0)
2014-03-24 02:22:16 (7735): Setting CPU Count for VM. (1)
2014-03-24 02:22:17 (7735): Setting Memory Size for VM. (256MB)
2014-03-24 02:22:17 (7735): Setting Chipset Options for VM.
2014-03-24 02:22:17 (7735): Setting Boot Options for VM.
2014-03-24 02:22:17 (7735): Setting Network Configuration for VM.
2014-03-24 02:22:17 (7735): Disabling USB Support for VM.
2014-03-24 02:22:17 (7735): Disabling COM Port Support for VM.
2014-03-24 02:22:18 (7735): Disabling LPT Port Support for VM.
2014-03-24 02:22:18 (7735): Disabling Audio Support for VM.
2014-03-24 02:22:18 (7735): Disabling Clipboard Support for VM.
2014-03-24 02:22:18 (7735): Disabling Drag and Drop Support for VM.
2014-03-24 02:22:18 (7735): Hardware acceleration CPU extensions not detected. Disabling VirtualBox hardware acceleration support.
2014-03-24 02:22:18 (7735): Disabling hardware acceleration support for virtualization.
2014-03-24 02:22:18 (7735): Adding storage controller to VM.
2014-03-24 02:22:19 (7735): Adding virtual disk drive to VM. (vm_image.vdi)
2014-03-24 02:22:19 (7735): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2014-03-24 02:22:19 (7735): Adding virtual floppy disk drive to VM.
2014-03-24 02:22:19 (7735): Enabling network access for VM.
2014-03-24 02:22:19 (7735): Enabling VM firewall rules.
2014-03-24 02:22:20 (7735): Enabling remote desktop for VM.
2014-03-24 02:22:20 (7735): Starting VM.
2014-03-24 02:22:23 (7735): Successfully started VM. (PID = '8174')
2014-03-24 02:22:23 (7735): Reporting VM Process ID to BOINC.
2014-03-24 02:22:23 (7735): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 02:22:23 (7735): Lowering VM Process priority.
2014-03-24 02:22:23 (7735): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 02:22:23 (7735): VM state change detected. (old = 'poweroff', new = 'running')
2014-03-24 02:22:23 (7735): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 02:22:23 (7735): Preference change detected
2014-03-24 02:22:23 (7735): Setting CPU throttle for VM. (100%)
2014-03-24 02:22:24 (7735): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 02:22:25 (7735): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 02:22:26 (7735): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 02:22:27 (7735): Status Report: virtualbox/vboxheadless is no longer running.

The last line repeats endlessly.

I aborted the WU and reset the project. It took a long time to re-download the VM. Same result. I was still getting the 26072 wrapper after the reset.

I have now aborted again, and also set my home machine to no new tasks for T4T.

Again, please advise when the project may be able to run again on 64-bit Linux.

Ben Segal
Project administrator
Project developer
Project tester
Project scientist
Joined: 1 Nov 10
Posts: 1228
Credit: 264
RAC: 0
Message 16243 - Posted: 24 Mar 2014, 8:13:25 UTC - in response to Message 16241.

I am still getting 260.72 on my Ubuntu Virtual Machine. "virtualbox/vboxheadless is no longer running", this is the report in stderr.txt. CPU time very high, few credits given, no CERN work done.
Tullio


I have asked Rom for advice on this error under Linux (and perhaps on Mac too).

Ben

tullio
Joined: 28 Nov 10
Posts: 1611
Credit: 1,560,230
RAC: 134
Message 16244 - Posted: 24 Mar 2014, 8:21:47 UTC - in response to Message 16243.
Last modified: 24 Mar 2014, 8:52:01 UTC

Ben, v015 was running OK on both SuSE and Ubuntu.
Tullio
On yesterday's La Repubblica I saw a picture of Carlo Rubbia (80 years old) with Abdus Salam and Paolo Budinich, my professor of Theoretical Physics and Thesis Advisor, although most of my advising was done by Giordano Bisiacchi, who died early in his career in a car crash. Physicists also die.

Ray Murray
Volunteer moderator
Joined: 10 Aug 11
Posts: 248
Credit: 2,156,082
RAC: 649
Message 16253 - Posted: 24 Mar 2014, 11:23:11 UTC
Last modified: 24 Mar 2014, 11:36:46 UTC

I've been getting similar with 26072 on my Virtual Xubuntu running within its own VM.
In Boinc Manager:
Waiting to run (Scheduler wait: VM job unmanageable, restarting later.)
But it never does restart. The VM (and Headless) continue to run and process events but no new snapshots are created. Boinc doesn't seem to be able to see it.

From event log:
Mon 24 Mar 2014 10:03:16 GMT | Test4Theory@Home | [task] Process for wu_1395388338_3768_0 exited, status 0, task state 1
Mon 24 Mar 2014 10:03:16 GMT | Test4Theory@Home | [task] task called temporary_exit(86400.000000)
Mon 24 Mar 2014 10:03:16 GMT | Test4Theory@Home | [task] task_state=UNINITIALIZED for wu_1395388338_3768_0 from temporary exit

After exiting Boinc Manager, the client and the VM continue to run. I've been doing manual snapshots so that it doesn't rewind to the hour-ago (or whatever) snapshot. Starting up Boinc again, it runs happily for a while before doing the same again.

I tried renicing Headless and the wrapper to higher priorities but it might be another process that needs fiddling with.

I tried writing an app_info to see if I could get an earlier wrapper but I don't really know what I'm doing with that (especially for Linux) so I could only get as far as it not starting the VM.

[Couple of edits for clarity.]

Maeax
Joined: 30 Jan 12
Posts: 277
Credit: 7,437,600
RAC: 925
Message 16266 - Posted: 24 Mar 2014, 16:39:58 UTC - in response to Message 16253.

Hello Ray,

I had the same with Windows 8.1 Pro (x64).

The T4T task goes into sleep mode.

I took one or more manual snapshots until the next BOINC number was reached. I don't know what you call it (860, 1234, etc.).

Then I closed the BOINC client and started it again.

When the next BOINC number was reached, the T4T job in BOINC ran normally and, after 10 minutes, removed the old snapshots.


maeax

Rom Walton (BOINC)
Joined: 25 Nov 10
Posts: 281
Credit: 39,018
RAC: 0
Message 16267 - Posted: 24 Mar 2014, 20:36:07 UTC

Scheduler wait: VM job unmanageable, restarting later


Normally something like this happens if a suspend or resume request fails.

Vboxwrapper attempts to shut everything down and tells BOINC to reschedule the job in 24 hours. In theory this gives the VirtualBox system time to reset itself so that Vboxwrapper can manage the VM again in the next attempt.

More details can be found in the stderr.txt file. If you are feeling adventurous you can look at vbox_trace.txt to see what commands were executed and what return values vboxmanage gave us.

----- Rom

Ray Murray
Volunteer moderator
Joined: 10 Aug 11
Posts: 248
Credit: 2,156,082
RAC: 649
Message 16268 - Posted: 24 Mar 2014, 22:31:48 UTC
Last modified: 24 Mar 2014, 23:31:10 UTC

Hi Rom,
Got a new WU just to catch those outputs. I'll kill it after this post.
From stderr:

2014-03-24 20:52:29 (1611): Enabling remote desktop for VM.
2014-03-24 20:52:30 (1611): Starting VM.
2014-03-24 20:52:32 (1611): Successfully started VM. (PID = '2383')
2014-03-24 20:52:32 (1611): Reporting VM Process ID to BOINC.
2014-03-24 20:52:32 (1611): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 20:52:32 (1611): Lowering VM Process priority.
2014-03-24 20:52:32 (1611): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 20:52:32 (1611): VM state change detected. (old = 'poweroff', new = 'running')
2014-03-24 20:52:32 (1611): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 20:52:33 (1611): Preference change detected
2014-03-24 20:52:33 (1611): Setting CPU throttle for VM. (100%)
2014-03-24 20:52:33 (1611): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 20:52:34 (1611): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 20:52:35 (1611): Status Report: virtualbox/vboxheadless is no longer running.

It looks like a normal startup, but Boinc seems to detect a problem right from the off.
This continues every second or so, interspersed with perfectly normal checkpoints, right down to the ultimate halt condition:

2014-03-24 21:32:16 (1611): Creating new snapshot for VM.
2014-03-24 21:32:16 (1611): Restoring VM Process priority.
2014-03-24 21:32:21 (1611): Lowering VM Process priority.
2014-03-24 21:32:21 (1611): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 21:32:22 (1611): Deleting stale snapshot.
2014-03-24 21:32:22 (1611): Error in delete stale snapshot for VM: -2147024809
Command:
VBoxManage -q snapshot "boinc_bb8754e3ef638f7c" delete "384ff72d-a616-4daa-98d4-375d90c4475d"
Output:
VBoxManage: error: Code NS_ERROR_INVALID_ARG (0x80070057) - Invalid argument value (extended info not available)
VBoxManage: error: Context: "DeleteSnapshot(bstrSnapGuid.raw(), pProgress.asOutParam())" at line 421 of file VBoxManageSnapshot.cpp

2014-03-24 21:32:22 (1611): ERROR: Checkpoint maintenance failed, rescheduling task for a later time. (-2147024809)
2014-03-24 21:32:22 (1611): Powering off VM.
2014-03-24 21:32:22 (1611): Error in poweroff VM for VM: -2135228414
Command:
VBoxManage -q controlvm "boinc_bb8754e3ef638f7c" poweroff
Output:
VBoxManage: error: Invalid machine state: DeletingSnapshotOnline (must be Running, Paused or Stuck)
VBoxManage: error: Details: code VBOX_E_INVALID_VM_STATE (0x80bb0002), component Console, interface IConsole, callee nsISupports
VBoxManage: error: Context: "PowerDown(progress.asOutParam())" at line 222 of file VBoxManageControlVM.cpp
2014-03-24 21:32:22 (1611): VM did not power off when requested.

At which point the "Scheduler wait" appears in Boinc Manager. Event log says:

Mon 24 Mar 2014 21:32:23 GMT | Test4Theory@Home | [task] Process for wu_1395388338_5281_0 exited, status 0, task state 1
Mon 24 Mar 2014 21:32:23 GMT | Test4Theory@Home | [task] task called temporary_exit(86400.000000)
Mon 24 Mar 2014 21:32:23 GMT | Test4Theory@Home | [task] task_state=UNINITIALIZED for wu_1395388338_5281_0 from temporary exit

The last few lines of vbox_trace:
2014-03-24 21:32:22 (1611):
Command: VBoxManage -q snapshot "boinc_bb8754e3ef638f7c" delete "384ff72d-a616-4daa-98d4-375d90c4475d"
Exit Code: -2147024809
Output:
VBoxManage: error: Code NS_ERROR_INVALID_ARG (0x80070057) - Invalid argument value (extended info not available)
VBoxManage: error: Context: "DeleteSnapshot(bstrSnapGuid.raw(), pProgress.asOutParam())" at line 421 of file VBoxManageSnapshot.cpp

2014-03-24 21:32:22 (1611):
Command: VBoxManage -q controlvm "boinc_bb8754e3ef638f7c" poweroff
Exit Code: -2135228414
Output:
VBoxManage: error: Invalid machine state: DeletingSnapshotOnline (must be Running, Paused or Stuck)
VBoxManage: error: Details: code VBOX_E_INVALID_VM_STATE (0x80bb0002), component Console, interface IConsole, callee nsISupports
VBoxManage: error: Context: "PowerDown(progress.asOutParam())" at line 222 of file VBoxManageControlVM.cpp
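
The negative numbers in the log and the hex codes in the VBoxManage output are the same values: -2147024809 is 0x80070057 (NS_ERROR_INVALID_ARG) and -2135228414 is 0x80bb0002 (VBOX_E_INVALID_VM_STATE), read as signed 32-bit integers. A minimal sketch of that conversion follows; the function names are made up for illustration and are not taken from vboxwrapper.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from vboxwrapper): reinterpret the signed
   32-bit exit code shown in the log as the unsigned result code that
   VBoxManage prints in hex. The conversion is modulo 2^32. */
uint32_t exit_code_to_result(int32_t exit_code) {
    return (uint32_t)exit_code;
}

/* Hypothetical helper: write the result code as hex text, e.g. for
   searching the VBoxManage output for the matching error line. */
void format_result(int32_t exit_code, char *buf, size_t len) {
    snprintf(buf, len, "0x%08" PRIx32, exit_code_to_result(exit_code));
}
```

Passing the two exit codes from the trace through format_result() gives the hex strings that appear in the corresponding VBoxManage error lines.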

The full stderr is with the task here and I've kept a full copy of the vbox_trace in case there might be something else that you might want to look for.
Repeatable every time so I can do the same again if you want me to look for something else.
Closing Boinc Manager, the VM still runs. Starting Boinc Manager again rewinds to the last checkpoint, or a manually created one, and it runs apparently fine for c.40 mins before going into "wait" again.

Xubuntu 12.04.4 with all updates inside a VM on a Windows 7 host, which ran fine with 0.15 (whatever wrapper that was), if a little slow because of the overheads. Host is happy with 26073.
The task doesn't end at that point; events continue to be processed (possibly faster due to not being encumbered by Boinc). I ended it gracefully myself rather than waiting to see what might happen after the 24-hour backoff, although that can be done if desired.

Hope all the above might be of some use to you.

[Couple of edits 'cause I'm getting tired. Will look out for replies 2moro.]

Rom Walton (BOINC)
Joined: 25 Nov 10
Posts: 281
Credit: 39,018
RAC: 0
Message 16271 - Posted: 25 Mar 2014, 14:49:46 UTC

Okay, that is interesting.

Ray, as an experiment, using a new task, could you verify that the process id written in stderr.txt actually exists in the output of 'ps -A'?

I wonder if I am grabbing the log file too early on Linux and picking up the previous process id and not the new one. Or something like that.

----- Rom
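
The check Rom asks for can also be scripted. The sketch below is only an illustration and is not part of BOINC or vboxwrapper: parse_vm_pid() and pid_is_alive() are hypothetical names, the line format is assumed from the "Successfully started VM. (PID = '2383')" entries quoted in this thread, and the existence test relies on the POSIX behaviour of kill() with signal 0, which sends no signal and only checks that the PID exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: extract the PID from a stderr.txt line such as
   "... Successfully started VM. (PID = '2383')".
   Returns -1 if the line carries no PID. */
long parse_vm_pid(const char *line) {
    const char *p = strstr(line, "(PID = '");
    long pid;
    if (p == NULL) return -1;
    if (sscanf(p, "(PID = '%ld'", &pid) != 1) return -1;
    return pid;
}

/* Hypothetical helper: returns 1 if a process with this PID exists,
   0 otherwise. Signal 0 performs only the existence and permission
   checks; EPERM means the process exists but belongs to another user. */
int pid_is_alive(pid_t pid) {
    if (pid <= 0) return 0;
    if (kill(pid, 0) == 0) return 1;
    return errno == EPERM;
}
```

Feeding each "Successfully started VM" line through parse_vm_pid() and the result through pid_is_alive() answers the same question as looking for the PID in the 'ps -A' listing.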

Ray Murray
Volunteer moderator
Joined: 10 Aug 11
Posts: 248
Credit: 2,156,082
RAC: 649
Message 16275 - Posted: 25 Mar 2014, 17:04:37 UTC
Last modified: 25 Mar 2014, 17:48:00 UTC

I won't be able to do that for about 3 hours, but a quick look at the stderr below, copied while the task was still "running" but in the wait state, shows 2383, whereas the stderr reported on the website shows 1492 near the bottom. I'll look at the saved file when I get home to see if there is a change in the ID with the top of the file not being reported. I'll grab another task and keep a close eye on it.

[Edit]
It may have been assigned a new ID in the close and reopen of Boinc during the graceful shutdown?

Rom Walton (BOINC)
Joined: 25 Nov 10
Posts: 281
Credit: 39,018
RAC: 0
Message 16276 - Posted: 25 Mar 2014, 18:08:56 UTC

I've just committed this:


Revision: 6c42a3fff7bfe5c3ff6e410179e1fb6cbb8b8ba2
Author: Rom Walton <rwalton@ssl.berkeley.edu>
Date: 3/25/2014 1:14:45 PM
Message:
LIB: Possible fix for process_exists() on Linux. Newer versions of Linux appear to have stricter parameter validation requirements for waitpid().
----
Modified: lib/util.cpp


You can download the new wrappers from here:
x86: http://boinc.berkeley.edu/dl/vboxwrapper_26074_i686-pc-linux-gnu.zip
x64: http://boinc.berkeley.edu/dl/vboxwrapper_26074_x86_64-pc-linux-gnu.zip

I believe this will resolve the issue with the spamming of these messages:

2014-03-24 20:52:33 (1611): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 20:52:34 (1611): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-24 20:52:35 (1611): Status Report: virtualbox/vboxheadless is no longer running.


Tullio is checking whether rolling back to 4.2.16 fixes the delete stale snapshot issue, where you see error messages like this:

2014-03-19 18:19:15 (23595): Error in delete stale snapshot for VM: -2147024809
Command:
VBoxManage -q snapshot "boinc_2e9d1d83323a9436" delete "4574e372-f867-499c-8112-afc131532d41"
Output:
VBoxManage: error: Code NS_ERROR_INVALID_ARG (0x80070057) - Invalid argument value (extended info not available)
VBoxManage: error: Context: "DeleteSnapshot(bstrSnapGuid.raw(), pProgress.asOutParam())" at line 421 of file VBoxManageSnapshot.cpp

2014-03-19 18:19:15 (23595): ERROR: Checkpoint maintenance failed, rescheduling task for a later time. (-2147024809)


----- Rom

tullio
Joined: 28 Nov 10
Posts: 1611
Credit: 1,560,230
RAC: 134
Message 16277 - Posted: 25 Mar 2014, 18:39:25 UTC

It's been running one hour on 4.2.16. Let's cross our fingers.
Tullio

Ray Murray
Volunteer moderator
Joined: 10 Aug 11
Posts: 248
Credit: 2,156,082
RAC: 649
Message 16278 - Posted: 25 Mar 2014, 19:30:15 UTC
Last modified: 25 Mar 2014, 20:29:41 UTC

stderr correctly shows the same PID as ps -A

16067 ? 00:00:03 vboxwrapper_260 ....(not 26072 or 260.72 ? )
16860 ? 00:07:51 VBoxHeadless

and Boinc event log shows
[task] ACTIVE_TASK::start(): forked process: pid 16067

stderr:
2014-03-25 18:33:45 (16067): Starting VM.
2014-03-25 18:33:48 (16067): Successfully started VM. (PID = '16860')
2014-03-25 18:33:48 (16067): Reporting VM Process ID to BOINC.
2014-03-25 18:33:48 (16067): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-25 18:33:48 (16067): Lowering VM Process priority.
2014-03-25 18:33:48 (16067): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-25 18:33:48 (16067): VM state change detected. (old = 'poweroff', new = 'running')
2014-03-25 18:33:48 (16067): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-25 18:33:48 (16067): Preference change detected
2014-03-25 18:33:48 (16067): Setting CPU throttle for VM. (100%)
2014-03-25 18:33:49 (16067): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-25 18:33:50 (16067): Status Report: virtualbox/vboxheadless is no longer running.

Three normal checkpoints were reported in Boinc, then the wait state at the point of checkpoint 4, and the wrapper no longer shows in the ps -A output.

With Headless and therefore the VM still running, I'll wait for it to finish the current Cern job then gracefully end it.

My previous attempt at editing an app_info from Windows to Linux didn't work, but if someone could supply one I'd be happy to check out the 26074 wrapper.

Further to earlier speculation: the wrapper does indeed get a new PID when Boinc Manager is closed and restarted.

Ray Murray
Volunteer moderator
Joined: 10 Aug 11
Posts: 248
Credit: 2,156,082
RAC: 649
Message 16280 - Posted: 26 Mar 2014, 0:01:12 UTC
Last modified: 26 Mar 2014, 0:06:19 UTC

Finally got it to accept 26074 but with the same initial result:

2014-03-25 23:38:08 (16996): Successfully started VM. (PID = '17430')
2014-03-25 23:38:08 (16996): Reporting VM Process ID to BOINC.
2014-03-25 23:38:08 (16996): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-25 23:38:08 (16996): Lowering VM Process priority.
2014-03-25 23:38:08 (16996): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-25 23:38:08 (16996): VM state change detected. (old = 'poweroff', new = 'running')
2014-03-25 23:38:08 (16996): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-25 23:38:09 (16996): Preference change detected
2014-03-25 23:38:09 (16996): Setting CPU throttle for VM. (100%)
2014-03-25 23:38:09 (16996): Status Report: virtualbox/vboxheadless is no longer running.
2014-03-25 23:38:10 (16996): Status Report: virtualbox/vboxheadless is no longer running.
.
.
still repeating every second.

VM running and processing events.
I'm off to bed so I'll leave it running and see in the morning whether it goes into "wait" at the 4th checkpoint/snapshot again.
Job for 2moro will be to downgrade VBox to the old, but reliable, 4.2.16 that Tullio is trying.

Rom Walton (BOINC)
Joined: 25 Nov 10
Posts: 281
Credit: 39,018
RAC: 0
Message 16281 - Posted: 26 Mar 2014, 0:58:31 UTC

Oh Linux, why must you vex me so?

Okay, I've posted a new build:
x86: http://boinc.berkeley.edu/dl/vboxwrapper_26075_i686-pc-linux-gnu.zip
x64: http://boinc.berkeley.edu/dl/vboxwrapper_26075_x86_64-pc-linux-gnu.zip

I've added an additional trace statement so that every time I call waitpid() it dumps the inputs and outputs to the stderr.txt log file. It is temp code and should help me figure out what is going on.

Here is what the additional call looks like:


fprintf(stderr, "process_exists(): pid = '%d', p = '%d', status = '%d'\n", pid, p, status);


----- Rom
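
For readers who want to see what such a trace line reports, here is a minimal sketch of a non-blocking waitpid() poll that logs in the format quoted above. It is not the code in lib/util.cpp, which the thread does not show; traced_waitpid() is a hypothetical name. What is standard POSIX behaviour: with WNOHANG, waitpid() returns 0 while a child is still running, the child's PID once it has changed state, and -1 with errno set to ECHILD when the PID is not a child of the caller. Whether that last case is what produced the repeated "no longer running" messages is not established in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: poll a PID without blocking and log the inputs
   and outputs in the same format as the trace statement quoted above.
   Returns waitpid()'s result; *err receives errno on failure, else 0. */
pid_t traced_waitpid(pid_t pid, int *status, int *err) {
    pid_t p;
    *status = 0;
    errno = 0;
    p = waitpid(pid, status, WNOHANG);
    *err = (p == -1) ? errno : 0;  /* saved before fprintf can alter errno */
    fprintf(stderr, "process_exists(): pid = '%d', p = '%d', status = '%d'\n",
            (int)pid, (int)p, *status);
    return p;
}
```

A caller that treats any non-zero return as "process gone" would report a running process as dead if that process was never its child, so the p and errno values in the log are the ones to look at.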

tullio
Joined: 28 Nov 10
Posts: 1611
Credit: 1,560,230
RAC: 134
Message 16283 - Posted: 26 Mar 2014, 2:53:08 UTC
Last modified: 26 Mar 2014, 3:21:00 UTC

Still running on 4.2.16 after 9 hours.
Tullio
Yes, Linux is a strange beast, with SuSE trying to upgrade my BOINC to 6.12.34 while the old wrapper runs beautifully on the laptop with BOINC 6.10.58 and VBox 4.3.8.
