Version 8 (modified by blyth, 15 years ago)

NuWa Slave : automated build/test setup
Slave Status (27 Aug 2010)
How to setup a slave
Getting the slave to do periodic builds
1. Develop/Debug the cron commandline
What happens when builds/tests fail ?
1. Causes of test failure
2. Updating reference output/histograms

NuWa Slave : automated build/test setup

Running a slave provides :

an automatically updated and tested dybinst-allation
a web interface to the status of the installation, including the history of build/test status

Slave Status (27 Aug 2010)

location  responsible  host                   status
NUU       Simon        belle7.nuu.edu.tw      nearly continuous operation for several years
NTU       Simon        cms01.phys.ntu.edu.tw  nearly continuous operation for several years
BNL       ?Jiajie      daya0001.rcf.bnl.gov   trial runs in process, added to dybinst config
IHEP      Miao/Qiumei  lxslc.ihep.ac.cn       trial runs by Miao, added to dybinst config
Dayabay   Miao/Qiumei  ?                      ?
Caltech   ?Dan         ?                      ?
LBNL      ?            ?                      ?

General Build status and that of dybinst configurations are available at

How to setup a slave

Pre-requisites : python 2.5?, setuptools, bitten ( 0.6dev-r561 )

Although bitten is installed by dybinst into the nuwa python as part of the nosebit external, it is more logical to install it into your system python. The slave can then perform green-field dybinst builds without relying on an existing dybinst-allation.

svn checkout http://svn.edgewall.org/repos/bitten/branches/experimental/trac-0.11@561 bitn
cd bitn
python setup.py develop       ## probably with sudo

More recent revisions of bitten have incompatibilities with the trac 0.11 master.

Interactive Test Running of the slave

Verify that bitten-slave is installed and in your PATH and is the expected standard version

[blyth@belle7 ~]$ which bitten-slave
/usr/bin/bitten-slave
[blyth@belle7 ~]$ bitten-slave --version
bitten-slave 0.6dev-r561

Export dybinst into the directory to be used for slave builds (an existing dybinst-allation can also be used).
Then make an interactive test run of the slave :
```
./dybinst trunk slave 
```
- this should fail, complaining about the lack of slave config in your $HOME/.dybinstrc

add or create $HOME/.dybinstrc containing connection credentials

slv_buildsurl=http://dayabay.ihep.ac.cn/tracs/dybsvn/builds
slv_username=slave
slv_password=***
slv_loghost=http://your.address       ## if you are able to publish logfiles
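
Before starting the slave it can help to check that the expected keys are present. The following is a minimal sketch, assuming the file holds only simple key=value lines with optional # comments as in the example above. The helper names and the choice of which keys count as required are assumptions for illustration, not part of dybinst.

```python
import os

# Assumption: these three keys are needed to contact the master;
# slv_loghost is treated as optional, as the comment in the example suggests.
REQUIRED_KEYS = ("slv_buildsurl", "slv_username", "slv_password")


def parse_dybinstrc(text):
    """Parse simple key=value lines, ignoring blank lines and '#' comments.

    Limitation of this sketch: a '#' inside a value (for example in a
    password) would be treated as the start of a comment.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_slave_keys(settings):
    """Return the required slv_* keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


def check_dybinstrc(path=os.path.expanduser("~/.dybinstrc")):
    """Read the rc file and report which required keys are missing."""
    with open(path) as handle:
        return missing_slave_keys(parse_dybinstrc(handle.read()))
```

An empty list from check_dybinstrc() means all three assumed keys were found.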

If your credentials are correct the expected startup messages are :

[blyth@cms01 trunk]$ ./dybinst trunk slave
Updating existing installation directory installation/trunk/dybinst.
Updating existing installation directory installation/trunk/dybtest.


Mon Aug  9 16:12:04 CST 2010
Start Logging to /data/env/local/dyb/trunk/dybinst-20100809-161204.log (or dybinst-recent.log)


Starting dybinst commands: slave

Stage: "slave"... 


dybinst-slave invoking : /data/env/local/dyb/trunk/installation/trunk/dybinst/scripts/slave.sh trunk

Contacting the master instance, this will take a while.  Go get muffins...

=== slv-main : derive config /home/blyth/.bitten-slave/dybslv.cfg from source /home/blyth/.dybinstrc
[INFO    ] Setting socket.defaulttimeout : 15.0 
[INFO    ] Setting socket.defaulttimeout : 15.0 
[DEBUG   ] Sending POST request to 'http://dayabay.ihep.ac.cn/tracs/dybsvn/builds'
[INFO    ] No pending builds

Note that the slave asked the master whether there are any builds to do and got the reply No pending builds. The default config is to ask the master every 5 mins if there is anything to do.

In order for the master to instruct the slave to perform builds you must send the hostname to Simon :

[blyth@belle7 ~]$ hostname
belle7.nuu.edu.tw

who will add the slave to the master through the Trac Admin web interface.

Running the slave continuously

Supervisord is recommended to keep the slave running :

http://supervisord.org/

Install supervisord into your system python with easy_install or pip :

easy_install supervisor

For tips on using supervisord, see :

http://dayabay.phys.ntu.edu.tw/tracs/env/browser/trunk/base/sv.bash
- ( includes functions to setup redhat init.d scripts that restart supervisord and all its children when your machine is rebooted )

An example of the supervisord config used to keep the dybslv running :

[program:dybslv]
environment=HOME=/home/blyth,BITTEN_SLAVE=/usr/bin/bitten-slave,SLAVE_OPTS=--verbose
directory=/data1/env/local/dyb
command=/data1/env/local/dyb/dybinst -l dybinst-slave.log trunk slave
redirect_stderr=true
redirect_stdout=true
autostart=true
autorestart=true
priority=999
user=blyth
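
Since the supervisord config is an ini-style file, it can be sanity checked with the Python standard library before supervisord is restarted. This is a sketch only: the function names are hypothetical, and the environment parser assumes plain KEY=VALUE pairs separated by commas with no quoting, as in the example above.

```python
import configparser


def read_program(conf_text, name):
    """Return the options of the [program:<name>] section as a dict."""
    # interpolation=None so that any '%' in a value is kept verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return dict(parser["program:" + name])


def parse_environment(value):
    """Split a KEY=VALUE,KEY=VALUE environment string into a dict.

    Assumption: values contain no commas or quotes.
    """
    return dict(item.split("=", 1) for item in value.split(",") if item)
```

For the dybslv example, parse_environment(read_program(text, "dybslv")["environment"]) gives the HOME, BITTEN_SLAVE and SLAVE_OPTS settings that the slave process will see.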

Refreshing the slave build

For reasons of efficiency the slave build (which can be performed multiple times each day) is done as an update build. Certain types of commits are known to be likely to cause issues with update builds, including :

changes to DataModel classes

In order to freshen up the build you can try rebuilding after removing various directories, in progressively increasing levels of cleanliness :

rm -rf NuWa-trunk/dybgaudi/DybRelease/$CMTCONFIG
rm -rf NuWa-trunk/dybgaudi/InstallArea
rm -rf NuWa-trunk/dybgaudi/* ; svn up NuWa-trunk/dybgaudi

To trigger a slave build after the removal, invalidate the last build on the node in question using the web interface (BUILD_ADMIN privilege required)
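
The three removal commands above can be wrapped in a small helper so that the chosen level is explicit and can be previewed first. This is a sketch under assumptions: the function names and the numbering of levels are illustrative, and the svn up that follows the third level still has to be run separately.

```python
import glob
import os
import shutil


def cleanup_targets(nuwa_dir, level, cmtconfig):
    """Return the paths removed at a given cleanliness level (1 = lightest)."""
    dybgaudi = os.path.join(nuwa_dir, "dybgaudi")
    if level == 1:
        return [os.path.join(dybgaudi, "DybRelease", cmtconfig)]
    if level == 2:
        return [os.path.join(dybgaudi, "InstallArea")]
    if level == 3:
        # Like the shell's dybgaudi/*, this glob skips dot-directories
        # such as .svn, so a later "svn up" can restore the sources.
        return sorted(glob.glob(os.path.join(dybgaudi, "*")))
    raise ValueError("level must be 1, 2 or 3")


def clean(nuwa_dir, level, cmtconfig, dry_run=True):
    """Remove (or, with dry_run, only list) the targets for a level."""
    targets = cleanup_targets(nuwa_dir, level, cmtconfig)
    for path in targets:
        if dry_run:
            print("would remove", path)
        elif os.path.isdir(path):
            shutil.rmtree(path)
        elif os.path.exists(path):
            os.remove(path)
    return targets
```

A call such as clean("NuWa-trunk", 1, os.environ["CMTCONFIG"]) only prints what would be removed; pass dry_run=False to actually delete.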

Monitoring the slave node

After many failures on a slave, it is wise to check the running processes with ps aux, as many tens of stuck nuwa.py processes can accumulate and kill your node. List them with pgrep -f nuwa.py and clean up with pkill -f nuwa.py
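
A periodic check for piled-up processes could look like the sketch below. It only reports PIDs and leaves the killing to pkill. The function names and the threshold of 10 are assumptions chosen for illustration.

```python
import subprocess


def matching_pids(ps_output, needle="nuwa.py"):
    """Return PIDs from `ps -eo pid,args` output whose command contains needle."""
    pids = []
    for line in ps_output.splitlines()[1:]:      # skip the header row
        parts = line.strip().split(None, 1)      # "PID", "full command line"
        if len(parts) == 2 and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def stuck_nuwa_pids(threshold=10):
    """Return nuwa.py PIDs if at least `threshold` are running, else []."""
    out = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = matching_pids(out)
    return pids if len(pids) >= threshold else []
```

Note that a plain substring match cannot tell a stuck process from a legitimately running one, so the result is a hint to look more closely rather than a verdict.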

Getting the slave to do periodic builds

To zeroth order only a few steps are needed to convert a standard update-build bitten slave into a periodic (daily/weekly) builder.

Develop/Debug the cron commandline

Starting point ... interactive trials with :

SLAVE_OPTS="--single --dry-run" ./dybinst -b singleshot_\\\${revision} -l /dev/stdout  trunk slave

dybinst options
  -l /dev/stdout                send logging to stdout, for debugging
  -b singleshot_\\\${revision}  option propagated to bitten-slave --build-dir
                                (variables evaluated in build context supplied by the master)

The SLAVE_OPTS are incorporated into the bitten-slave commandline,

--dry-run is for debugging only : builds are performed but not reported to the master.
--single perform a single build before exiting

While debugging, increase verbosity by adding this line to ~/.dybinstrc :

slv_verbose=yes

Issues Foreseen / Things TODO

may need more escaping \\\${revision} of the build-dir
the cron command might not get a build to perform within the period (if no qualifying commits),
- process pile-up will occur ...
  - maybe avoid by exiting if existing slave process ?
  - perhaps add a first step that checks

will need some purging to avoid filling the disk with builds
- could add a build step to do this cleanup

failed builds need to be marked as such in the file system as well as in the web interface
- add a final build step that checks status and takes action for failures ...
  - renaming of build directories
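
One possible way to avoid the process pile-up mentioned above is to have the cron job take an exclusive lock and exit if a previous slave still holds it. The sketch below uses flock from the Python standard library (Unix only). The function name and the lock file path are hypothetical, and nothing like this exists in dybinst as far as this page states.

```python
import fcntl
import os


def acquire_single_instance_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if already held.

    The lock is released when the returned file is closed or the process exits.
    """
    handle = open(lock_path, "a+")       # "a+" avoids truncating a held file
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:                      # another holder: do not wait
        handle.close()
        return None
    handle.seek(0)
    handle.truncate()                    # replace any stale PID
    handle.write(str(os.getpid()))
    handle.flush()
    return handle
```

A cron wrapper would call acquire_single_instance_lock("/tmp/dybslv.lock") first and exit immediately on None, keeping the returned file open for as long as the slave runs.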

Understanding how `./dybinst trunk slave` works

dybinst invokes helper scripts (such as the installation/trunk/dybinst/scripts/slave.sh seen in the startup messages above) which construct and evaluate the bitten-slave commandline to talk to the master and perform builds.

bitten-slave options

[blyth@belle7 dyb]$ bitten-slave --help
Usage: bitten-slave [options] url

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --name=NAME           name of this slave (defaults to host name)
  -f FILE, --config=FILE
                        path to configuration file
  -u USERNAME, --user=USERNAME
                        the username to use for authentication
  -p PASSWORD, --password=PASSWORD
                        the password to use when authenticating

  building:
    -d DIR, --work-dir=DIR
                        working directory for builds
    --build-dir=BUILD_DIR
                        name pattern for the build dir to use inside the
                        working dir ["build_${config}_${build}"]
    -k, --keep-files    don't delete files after builds
    -s, --single        exit after completing a single build
    -n, --dry-run       don't report results back to master
    -i SECONDS, --interval=SECONDS
                        time to wait between requesting builds

  logging:
    -l FILENAME, --log=FILENAME
                        write log messages to FILENAME
    -v, --verbose       print as much as possible
    -q, --quiet         print as little as possible
    --dump-reports      whether report data should be printed
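
To make the relation between SLAVE_OPTS, the -b option and the final commandline concrete, here is a sketch of how such a commandline could be assembled. It uses only flags from the help output above, but the function itself is hypothetical and is not a transcription of what the dybinst scripts actually do (they derive a config file from ~/.dybinstrc, as the startup messages show).

```python
import shlex


def bitten_slave_command(builds_url, username, password,
                         build_dir=None, slave_opts="",
                         executable="bitten-slave"):
    """Assemble an argument list for bitten-slave.

    slave_opts is a string such as "--single --dry-run", split the way a
    shell would split it.
    """
    cmd = [executable, "--user=" + username, "--password=" + password]
    if build_dir:
        cmd.append("--build-dir=" + build_dir)
    cmd.extend(shlex.split(slave_opts))
    cmd.append(builds_url)               # the master url comes last
    return cmd
```

Passing the result as a list to subprocess avoids the shell, so a pattern like singleshot_${revision} reaches bitten-slave unmodified and needs no extra escaping.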

What happens when builds/tests fail ?

Failures result in notification emails and an entry on the timeline. Following the link in the email gets you to the build status page, such as :

http://dayabay.ihep.ac.cn/tracs/dybsvn/build/dybinst/3800

Examining the error reporting there and on the summary page

http://dayabay.ihep.ac.cn/tracs/dybsvn/build/dybinst

will tell you which step of the build/tests failed.

You can confirm the error by running pkg tests via dybinst, eg for rootiotest

./dybinst trunk tests rootiotest

and investigate further by getting into the environment and directory of the pkg and running the tests directly :

nosetests -v

Causes of test failure

Non-Run tests can fail by

an assertion/exception in the test being triggered

Run-style tests have many additional ways to fail...

stdout + stderr from command matches a pattern with integer code > 0
time taken by the command exceeds the limit
command returns with non-zero exit code
memory(maxrss) taken by the command exceeds limit
for reference=True tests, the output does not match the reference
for histref=path/to/hists.root tests, any of created histograms do not match the reference path/to/histref_hists.root
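
The first three failure modes for Run-style tests can be illustrated with a short sketch. This is not the dybtest implementation: the function name, the dict of patterns and the messages are assumptions, and the memory and reference checks are left out.

```python
import re
import subprocess


def run_checks(cmd, timeout, patterns=None):
    """Run cmd and return a list of failure reasons (empty list = pass).

    patterns maps a regular expression to an integer code; only a match
    with code > 0 counts as a failure.
    """
    patterns = patterns or {}
    try:
        proc = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return ["time limit of %s s exceeded" % timeout]
    failures = []
    if proc.returncode != 0:
        failures.append("non-zero exit code %d" % proc.returncode)
    for pattern, code in patterns.items():
        if code > 0 and re.search(pattern, proc.stdout):
            failures.append("output matches %r (code %d)" % (pattern, code))
    return failures
```

For example, run_checks(["nuwa.py", "--help"], timeout=60, patterns={"ERROR": 1}) would report a failure if the command exits non-zero, runs longer than a minute, or prints ERROR to stdout or stderr.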

Updating reference output/histograms

To update reference outputs or histograms :

simply delete the old reference. A new one will be created at the next run, and subsequent runs will compare against the new reference.
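
The delete-and-recreate behaviour can be pictured with the sketch below. It assumes an exact text comparison and uses hypothetical function names; the real test machinery may compare differently, and histogram references are not covered.

```python
import os


def check_against_reference(output, ref_path):
    """Return True if output matches the reference at ref_path.

    If no reference exists yet (for example after it was deleted), the
    current output is written as the new reference and the check passes.
    """
    if not os.path.exists(ref_path):
        with open(ref_path, "w") as handle:
            handle.write(output)
        return True
    with open(ref_path) as handle:
        return handle.read() == output


def reset_reference(ref_path):
    """Delete the reference so that the next run creates a fresh one."""
    if os.path.exists(ref_path):
        os.remove(ref_path)
```

Because the first run after a deletion always passes, it is worth inspecting the newly created reference before committing it.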

Find test_name.ref and histref_*.root by :

[blyth@cms01 ~]$ cd $DYB/NuWa-trunk/dybgaudi
[blyth@cms01 dybgaudi]$ find . -name '*.ref'

./Simulation/GenTools/test_diffuser.ref
./Simulation/GenTools/test_gun.ref
./Simulation/DetSim/test_historian.ref
./Simulation/DetSim/test_basic_physics.ref
./DataModel/Conventions/test_Conventions.ref
./Production/MDC10b/test_dby0.ref
./RootIO/RootIOTest/test_dybo.ref
./RawData/RawDataTest/share/rawpython.log.ref
./DybAlg/test_dmp.ref
./Tutorial/Quickstart/test_printrawdata_output.ref
./Database/DbiTest/scripts/TestDbiIhep.log.ref
./Database/DbiValidate/tests/test_Conventions.ref

[blyth@cms01 dybgaudi]$ find . -name 'histref_*.root'
./Production/MDC10b/histref_dby1test.root
./Tutorial/Quickstart/histref_rawDataResult.root
[blyth@cms01 dybgaudi]$
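
The same search can be done from Python, which is convenient when the list of references feeds another script. A minimal sketch with a hypothetical function name:

```python
from pathlib import Path


def find_references(top):
    """Return (text_refs, hist_refs) as sorted paths relative to top."""
    top = Path(top)
    text_refs = sorted(str(p.relative_to(top)) for p in top.rglob("*.ref"))
    hist_refs = sorted(str(p.relative_to(top)) for p in top.rglob("histref_*.root"))
    return text_refs, hist_refs
```

Called as find_references("NuWa-trunk/dybgaudi"), it returns the *.ref files and the histref_*.root files as two separate lists.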
