[[TracNav(NuwaNav)]]
[[PageOutline]]

= NuWa Slave : automated build/test setup =

Running a slave provides :

 * automatically updated and tested dybinst'allation
 * web interface to the status of the installation including history of build/test status

= Slave Status (Dec 2010. Updated Aug2011) =

|| '''location''' || '''responsible''' || '''host''' || '''supervisord''' || '''status''' ||
|| NUU || Simon || belle7.nuu.edu.tw || Y || nearly continuous operation for several years ||
|| NTU || Simon || cms01.phys.ntu.edu.tw || Y || nearly continuous operation for several years ||
|| BNL || Jiajie/DavidJ || daya0001.rcf.bnl.gov || Y || operational ||
|| IHEP || Miao/Qiumei || lxslc\d\d.ihep.ac.cn || Y || operational ||
|| Dayabay || Miao || offline.dyb.local || || ? ||
|| LBNL || Cheng-Ju || pdyb-\d\d.nersc.gov || Y || operational ||
|| VT || Deb Mohapatra || ? || || initial investigations ||
|| Wisconsin || ?Wei || ? || || ? ||
|| Shandong || ? || ? || || ? ||
|| Caltech || ?Dan || ? || || ? ||

The general build status and that of the '''dybinst''' configurations are available at

 * http://dayabay.ihep.ac.cn/tracs/dybsvn/build
 * http://dayabay.ihep.ac.cn/tracs/dybsvn/build/dybinst

CMTCONFIG for operational slaves (for '''opt''' ones just swap '''dbg''' for '''opt''') :

|| belle7 || i686-slc5-gcc41-dbg ||
|| cms01 || i686-slc4-gcc34-dbg ||
|| RACF/BNL|| x86_64-slc5-gcc43-dbg ||
|| PDSF/LBL|| x86_64-slc5-gcc41-dbg ||

= How to setup a slave =

== Decisions : how many nodes ? which configs ? which nodes ? ==

Currently the typical configs to build are :

 * dybinst (debug version)
 * opt.dybinst (optimized version)

These can both be auto-built using a slave on a single node, or the configs can be split between two nodes so that builds can proceed in parallel.

You will need to install a few packages into the system python on the nodes, and let Simon know which configs should be handled by which hostnames (the exact output of the {{{hostname}}} command on the nodes is needed).
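As a convenience when reporting a node, the hostname and the debug/optimized CMTCONFIG pair can be collected in one go. The following is only a sketch: the helper name {{{opt_cmtconfig}}} is hypothetical, and it assumes that the tag ends in {{{-dbg}}} as in the table of operational slaves.

```python
import socket


def opt_cmtconfig(dbg_tag):
    """Return the optimized tag for a debug CMTCONFIG by swapping the trailing 'dbg'.

    Hypothetical helper, not part of dybinst.
    """
    if not dbg_tag.endswith("-dbg"):
        raise ValueError("expected a tag ending in -dbg, got %r" % dbg_tag)
    return dbg_tag[: -len("dbg")] + "opt"


if __name__ == "__main__":
    # socket.gethostname() normally agrees with the hostname command,
    # but the output of the command itself is what should be reported
    print("hostname : %s" % socket.gethostname())
    for tag in ("i686-slc5-gcc41-dbg", "x86_64-slc5-gcc43-dbg"):
        print("%s -> %s" % (tag, opt_cmtconfig(tag)))
```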
Requirements for slave nodes :

 * able to access [/ http://dayabay.ihep.ac.cn/tracs/dybsvn]
 * able to install a few packages into system python 2.4-2.6

If your institute's policies allow you to make the node web accessible (eg by running nginx/lighttpd/apache) then your slave node will be more useful, as it can publish documentation, build logs etc.

== Before you start : do a greenfield dybinst build on each node ==

This will verify that your intended slave nodes are ready to be auto-builders, and prime your build directory ready for the first auto-builds.

{{{
cd /path/to/build/dir
svn export http://dayabay.ihep.ac.cn/svn/dybsvn/installation/trunk/dybinst/dybinst
screen ./dybinst trunk all
}}}

== Pre-requisites : python 2.4-2.6 , setuptools, bitten ( 0.6dev-r561 ) ==

A precise version of bitten is required (note this SVN branch no longer exists), so ensure you use this precise URL :

{{{
svn checkout http://svn.edgewall.org/repos/bitten/branches/experimental/trac-0.11@561 bitn
cd bitn
python setup.py develop ## probably with sudo
}}}

 * more recent revisions of bitten have incompatibilities with the trac 0.11 master

Bitten is no longer installed by dybinst into the nuwa python. It is more logical to install it into your system python, as the slave can then perform ''green-field'' dybinst builds without recourse to existing dybinst-allations.

If your slave runs on a shared node you are recommended to use the alternative install with the security patch that avoids passwords in the process list.

=== alternative install with patching of the slave for secure running ===

See #580 for background.
{{{
svn checkout http://svn.edgewall.org/repos/bitten/branches/experimental/trac-0.11@561 bitn ## you may need to accept a certificate
cd bitn
svn export http://dayabay.phys.ntu.edu.tw/repos/env/trunk/trac/patch/bitten/bitten-trac-0.11-561.patch
patch -p0 < bitten-trac-0.11-561.patch
python setup.py develop ## probably with sudo
}}}

To configure secure running set the below in your '''~/.dybinstrc''' , and stop and start the slave to test :

{{{
slv_secure=yes
}}}

=== pre-requisite troubleshooting ===

Bitten needs to be installed into the system python, check if that is done using :

{{{
[blyth@tbird bitn]$ which python
/usr/bin/python
[blyth@tbird ~]$ python -c "import bitten ; print bitten.__file__ "
Traceback (most recent call last):
  File "<string>", line 1, in ?
ImportError: No module named bitten
}}}

Desired response is something like:

{{{
[blyth@belle7 ~]$ python -c "import bitten ; print bitten.__file__ "
/data1/env/local/env/trac/bitn/bitten/__init__.pyc
}}}

In order to install bitten into the system python, it must have '''setuptools''' already, check if this is the case with:

{{{
[blyth@tbird bitn]$ python setup.py --help
Traceback (most recent call last):
  File "setup.py", line 13, in ?
    from setuptools import setup, find_packages
ImportError: No module named setuptools
}}}

For help on installing setuptools see http://pypi.python.org/pypi/setuptools

=== porting to py27 ===

The slave is known not to work with py27. Porting to allow this is not straightforward. Attempts to get the slave running at Virginia Tech and IHEP on py27 failed. A few low-hanging fixes were done and propagated into the bitten patch but the process is not complete.

The porting is not easy due to our use of an ancient bitten version and we cannot change that without changing the version of the Trac master. This is because the HTTP based protocol that the slave and master use to communicate ties together their versions.
Thus stick with py24/py25/py26.

== Interactive Test Running of the slave ==

 * Verify that '''bitten-slave''' is installed and in your PATH and is the expected ''standard'' version
{{{
[blyth@belle7 ~]$ which bitten-slave
/usr/bin/bitten-slave
[blyth@belle7 ~]$ bitten-slave --version
bitten-slave 0.6dev-r561
}}}
 * export dybinst into the directory to be used for slave builds (you could use an existing dybinst-allation also)
 * interactive test run of the slave
{{{
./dybinst trunk slave
}}}
 * this should fail complaining of lack of config in your {{{$HOME/.dybinstrc}}}
 * add or create {{{$HOME/.dybinstrc}}} containing connection credentials
{{{
slv_buildsurl=http://dayabay.ihep.ac.cn/tracs/dybsvn/builds
slv_username=slave
slv_password=***
slv_loghost=http://your.address ## if you are able to publish logfiles
}}}

If your credentials are correct the expected startup messages are :

{{{
[blyth@cms01 trunk]$ ./dybinst trunk slave
Updating existing installation directory installation/trunk/dybinst.
Updating existing installation directory installation/trunk/dybtest.
Mon Aug 9 16:12:04 CST 2010
Start Logging to /data/env/local/dyb/trunk/dybinst-20100809-161204.log (or dybinst-recent.log)
Starting dybinst commands: slave
Stage: "slave"...
dybinst-slave invoking : /data/env/local/dyb/trunk/installation/trunk/dybinst/scripts/slave.sh trunk
Contacting the master instance, this will take a while. Go get muffins...
=== slv-main : derive config /home/blyth/.bitten-slave/dybslv.cfg from source /home/blyth/.dybinstrc
[INFO ] Setting socket.defaulttimeout : 15.0
[INFO ] Setting socket.defaulttimeout : 15.0
[DEBUG ] Sending POST request to 'http://dayabay.ihep.ac.cn/tracs/dybsvn/builds'
[INFO ] No pending builds
}}}

Note that the slave asked the master if there are any builds to do and got the reply '''No pending builds'''. The default config is to ask the master every 5 minutes whether there is anything to do.
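Since the interactive run fails when {{{$HOME/.dybinstrc}}} lacks the connection settings, a quick pre-flight check of that file can be handy. The sketch below is illustrative only and is not how {{{slave.sh}}} derives its config: the function names are hypothetical, the file is treated as plain {{{key=value}}} lines with {{{#}}} comments, and the list of required keys is an assumption taken from the example above ({{{slv_loghost}}} is regarded as optional).

```python
import os

# Assumed-required keys, taken from the example .dybinstrc above
REQUIRED = ("slv_buildsurl", "slv_username", "slv_password")


def read_dybinstrc(text):
    """Parse simple key=value lines, ignoring blank lines and '#' comments.

    Simplification: a '#' inside a value would be treated as a comment start.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings, required=REQUIRED):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


if __name__ == "__main__":
    path = os.path.expanduser("~/.dybinstrc")
    text = open(path).read() if os.path.exists(path) else ""
    missing = missing_keys(read_dybinstrc(text))
    print("missing: %s" % (", ".join(missing) if missing else "none"))
```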
In order for the master to instruct the slave to perform builds you must send the '''hostname''' to Simon :

{{{
[blyth@belle7 ~]$ hostname
belle7.nuu.edu.tw
}}}

who will add the slave to the master through the Trac Admin web interface.

== Getting the Trac Master to request a build ==

Normally manual slave runs dutifully ask the master if there is anything to do and then report in the negative:

{{{
[INFO ] No pending builds
}}}

To force a build you could make a qualifying commit and then wait for the cooling down period to complete (hoping that no blocking commits impinge). A more convenient approach is to invalidate the most recent build through the web interface. For users with BUILD_ADMIN privilege an '''Invalidate Build''' button appears to the upper left of the individual build pages accessible from

 * http://dayawane.ihep.ac.cn/tracs/dybsvn/build/dybinst/

== Running the slave continuously ==

Network glitches or other problems that prevent the slave from contacting the master (every 5min) result in death of the slave. These stoppages are frequent (typically every few days), so in order to restart the slave automatically it is run as a child of '''supervisord''', which is able to keep the slave running continuously (even surviving reboots if you are able to setup the initd script).
 * http://supervisord.org/

Install supervisord into your system python with easy_install or pip :

{{{
easy_install supervisor
}}}

For tips on using supervisord, see :

 * http://dayabay.phys.ntu.edu.tw/tracs/env/browser/trunk/base/sv.bash
   * ( includes functions to setup redhat init.d scripts that restart supervisord and all its children when your machine is rebooted )

An example of the supervisord config used to keep the dybslv running :

{{{
[program:dybslv]
environment=HOME=/home/blyth,BITTEN_SLAVE=/usr/bin/bitten-slave,SLAVE_OPTS=--verbose
directory=/data1/env/local/dyb
command=/data1/env/local/dyb/dybinst -l dybinst-slave.log trunk slave
redirect_stderr=true
redirect_stdout=true
autostart=true
autorestart=true
priority=999
user=blyth
}}}

== Slave maxtime and maxrss tuning ==

Some '''dybtest.Run''' based nosetests impose '''maxtime''' and '''maxrss''' limits on test running. In order to handle slaves with very different performance levels, per-slave factors are implemented (from r11671 and r11672)

|| '''.dybinstrc variable''' || '''action''' ||
|| slv_factor_cpu || scales maxtime, use greater than 1.0 for slow slaves ||
|| slv_factor_rss || scales maxrss ||

To configure these per-slave settings add key value pairs to '''~/.dybinstrc''', for example:

{{{
slv_factor_cpu=2.0
}}}

and stop and restart the slave (eg with {{{supervisorctl restart dybslv}}} )

== Refreshing the slave build ==

For reasons of efficiency the slave build (which can be performed multiple times each day) is done as an update build.
Certain types of commits are known to be likely to cause issues with update builds, including :

 * changes to DataModel classes

In order to freshen up the build you can try rebuilding after removing various directories, in progressively increasing levels of cleanliness :

 * {{{rm -rf NuWa-trunk/dybgaudi/DybRelease/$CMTCONFIG}}}
 * {{{rm -rf NuWa-trunk/dybgaudi/InstallArea}}}
 * {{{rm -rf NuWa-trunk/dybgaudi/* ; svn up NuWa-trunk/dybgaudi}}}

To trigger a slave build after the removal, invalidate the last build on the node in question using the web interface (BUILD_ADMIN privilege required)

== Greenfield rebuild ==

If attempts to refresh dybgaudi fail to get auto-building working again, the next thing to try is a full rebuild from scratch, including externals. This will take quite a few hours.

Stop the slave using the supervisord commandline interface :

{{{
N> status
dybslv RUNNING pid 5278, uptime 17:13:09
N> stop dybslv
dybslv: stopped
}}}

Move (or delete) the build directory into which dybinst was exported :

{{{
cd path/to/dybinst/export/dir/..
mv dyb dyb.old ## dyb is an example name only
}}}

Do a manual dybinst full run (screen avoids the build terminating when you lose the connection):

{{{
mkdir dyb ; cd dyb
svn export http://dayabay.ihep.ac.cn/svn/dybsvn/installation/trunk/dybinst/dybinst
screen ./dybinst trunk all
}}}

See {{{man screen}}} for details :

 * {{{C-a d}}} detach from session, leaving processes running
 * {{{screen -r}}} re-attach to the session

Follow what the build is doing with eg :

{{{
[blyth@belle7 dyb]$ tail -f dybinst-20110216-113627.log
}}}

'''Remember to restart the slave''', from supervisorctl:

{{{
N> start dybslv
}}}

=== greenfield optimized build ===

Once the default debug build is working proceed to the optimized build on nodes slated for optimized building. Create an '''opt''' folder within the default '''dbg''' directory:

{{{
[blyth@belle7 dyb]$ mkdir opt
[blyth@belle7 dyb]$ cd opt
[blyth@belle7 opt]$ cp ../dybinst .
}}}

Force an optimized build by using the '''-O''' option:

{{{
[blyth@belle7 opt]$ screen ./dybinst -O trunk all
}}}

Make the installation opt-by-default as described [#OptimizedBuilds below].

== Monitoring the slave node ==

After many failures on a slave it is wise to check the running processes with {{{ps aux}}} : it can happen that many tens of stuck nuwa.py processes kill your node. Clean up with {{{pgrep -f nuwa.py ; pkill -f nuwa.py}}}

== Restarting a slave after a hiatus ==

Before restarting after an extended hiatus (more than a few days) it is better to get someone with '''BUILD_ADMIN''' privilege (currently Me, Miao, Cheng-Ju ) to set the build start revision to a recent one, to avoid too many catch-up/out-of-order builds of dubious validity. The current starting revisions for each config are visible on the build status page.

= Getting the slave to do periodic builds =

To zeroth order only a few steps are needed to convert a standard update-build bitten slave into a periodic (daily/weekly) builder.

== Develop/Debug the cron commandline ==

Starting point ... interactive trials with :

{{{
SLAVE_OPTS="--single --dry-run" ./dybinst -b singleshot_\\\${revision} -l /dev/stdout trunk slave
}}}

|| '''dybinst''' options || ||
|| '''-l /dev/stdout''' || send logging to stdout, for debugging ||
|| '''-b singleshot_\\\${revision}''' || option propagated to bitten-slave '''--build-dir''' ||
|| || (variables evaluated in build context supplied by the master) ||

The '''SLAVE_OPTS''' are incorporated into the bitten-slave commandline,

 * '''--dry-run''' is for debugging only : builds are performed but not reported to the master.
 * '''--single''' perform a single build before exiting

While debugging increase verbosity by adding a line to {{{~/.dybinstrc}}} :

{{{
slv_verbose=yes
}}}

=== Issues Foreseen / Things TODO ===

 * may need more escaping '''\\\${revision}''' of the '''build-dir'''
 * the cron command might not get a build to perform within the period (if no qualifying commits),
   * process pile-up will occur ...
   * maybe avoid by exiting if existing slave process ?
   * perhaps add a first '''step''' that checks
 * will need some purging to avoid filling the disk with builds
   * could add a build step to do this cleanup
 * failed builds need to be marked as such in the file system as well as in the web interface
   * add a final build step that checks status and takes action for failures ...
   * renaming of build directories

=== Understanding how {{{./dybinst trunk slave}}} works ===

'''dybinst''' invokes the below which construct and evaluate the bitten-slave commandline to talk to the master and perform builds

 * source:installation/trunk/dybinst/scripts/dybinst-slave
 * source:installation/trunk/dybinst/scripts/slave.sh

=== bitten-slave options ===

{{{
[blyth@belle7 dyb]$ bitten-slave --help
Usage: bitten-slave [options] url

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --name=NAME           name of this slave (defaults to host name)
  -f FILE, --config=FILE
                        path to configuration file
  -u USERNAME, --user=USERNAME
                        the username to use for authentication
  -p PASSWORD, --password=PASSWORD
                        the password to use when authenticating

  building:
    -d DIR, --work-dir=DIR
                        working directory for builds
    --build-dir=BUILD_DIR
                        name pattern for the build dir to use inside the
                        working dir ["build_${config}_${build}"]
    -k, --keep-files    don't delete files after builds
    -s, --single        exit after completing a single build
    -n, --dry-run       don't report results back to master
    -i SECONDS, --interval=SECONDS
                        time to wait between requesting builds

  logging:
    -l FILENAME, --log=FILENAME
                        write log messages to FILENAME
    -v, --verbose       print as much as possible
    -q, --quiet         print as little as possible
    --dump-reports      whether report data should be printed
}}}

= What happens when builds/tests fail ? =

Failures result in notification emails and an entry on the timeline. Following the link in the email gets you to the build status page, such as :

 * http://dayabay.ihep.ac.cn/tracs/dybsvn/build/dybinst/3800

Examining the error reporting there and on the summary page

 * http://dayabay.ihep.ac.cn/tracs/dybsvn/build/dybinst

will tell you which '''step''' of the build/tests failed. You can confirm the error by running pkg tests via dybinst, eg for '''rootiotest'''

{{{
./dybinst trunk tests rootiotest
}}}

and investigate further by getting into the environment and directory of the pkg and running the tests

{{{
nosetests -v
}}}

== Causes of test failure ==

Non-''Run'' tests can fail by

 * an assertion/exception in the test being triggered

''Run-style'' tests have many additional ways to fail...

 * stdout + stderr from command matches a pattern with integer code > 0
 * time taken by the command exceeds the limit
 * command returns with non-zero exit code
 * memory(maxrss) taken by the command exceeds limit
 * for '''{{{reference=True}}}''' tests, the output does not match the reference
 * for '''{{{histref=path/to/hists.root}}}''' tests, any of the created histograms do not match the reference '''{{{path/to/histref_hists.root}}}'''

== Updating reference output/histograms ==

To update reference outputs or histograms :

 * simply delete the old one, a new reference will be created at the next run, subsequent runs will compare against the new reference

Find '''test_name.ref''' and '''histref_*.root''' by :

{{{
[blyth@cms01 ~]$ cd $DYB/NuWa-trunk/dybgaudi
[blyth@cms01 dybgaudi]$ find . -name '*.ref'
./Simulation/GenTools/test_diffuser.ref
./Simulation/GenTools/test_gun.ref
./Simulation/DetSim/test_historian.ref
./Simulation/DetSim/test_basic_physics.ref
./DataModel/Conventions/test_Conventions.ref
./Production/MDC10b/test_dby0.ref
./RootIO/RootIOTest/test_dybo.ref
./RawData/RawDataTest/share/rawpython.log.ref
./DybAlg/test_dmp.ref
./Tutorial/Quickstart/test_printrawdata_output.ref
./Database/DbiTest/scripts/TestDbiIhep.log.ref
./Database/DbiValidate/tests/test_Conventions.ref
[blyth@cms01 dybgaudi]$ find . -name 'histref_*.root'
./Production/MDC10b/histref_dby1test.root
./Tutorial/Quickstart/histref_rawDataResult.root
[blyth@cms01 dybgaudi]$
}}}

== Investigating Issues ==

The primary duty is to isolate the cause and report the problem to the author/responsible in the form of a Trac ticket that enables the investigator to rapidly reproduce the issue.

While investigating remember to stop the slave to avoid interference and resource competition from additional builds starting ... eg if using supervisord :

{{{
[blyth@cms01 dybgaudi]$ supervisorctl
dybslv RUNNING pid 28651, uptime 1 day, 22:27:01
C> stop dybslv
dybslv: stopped
}}}

=== attach to python nuwa.py process with gdb ===

Start the failing test :

{{{
[blyth@cms01 MDC10b]$ nosetests tests/test_mdc10b.py:test_dby0
Warning in : duplicate entry =vector.dll> for level 0; ignored
Run MDC10b.runLED_Muon.FullChain with double-pulsing of LEDs and no muons to produces 50 readouts ...
}}}

Attach gdb to the process and continue '''c''' :

{{{
[blyth@cms01 dybgaudi]$ gdb `which python` $(pgrep -f $(which nuwa.py))
...
Loaded symbols for /data/env/local/dyb/trunk/NuWa-trunk/dybgaudi/InstallArea/i686-slc4-gcc34-dbg/lib/libG4DataHelpers.so
0xb6687b23 in ParticlePropertySvc::anti (this=0xaa28798, pp=0xaa66a98) at ../src/ParticlePropertySvc/ParticlePropertySvc.cpp:445
445 const ParticleProperty* ap = *it ;
(gdb)
}}}

Unfortunately with this approach gdb sometimes gets '''Killed''' due to '''Out of Memory'''.
=== running the command under gdb ===

Grab the command from the source of the test (if simple) or from the process table :

{{{
ps --no-headers -o command -p $(pgrep -f $(which nuwa.py)) > cmd
}}}

Edit the cmd file, fixing up any missing quotes and prefixing it with the gdb command '''set args''', allowing :

{{{
[blyth@cms01 dybgaudi]$ gdb `which python` -x cmd
GNU gdb Red Hat Linux (6.3.0.0-1.162.el4rh)
Copyright 2004 Free Software Foundation, Inc.
...
}}}

Capture the backtrace '''bt''' when you meet problems :

{{{
ElecSimProc INFO Processing hit collections
ToolSvc.EsIdealFeeTool INFO Processing 73 pmt pulses.
ToolSvc.TsMultTriggerTool INFO Max multiplicity for DayaBayAD1 is 44
*** glibc detected *** malloc(): memory corruption: 0x0fe95d10 ***

Program received signal SIGABRT, Aborted.
[Switching to Thread -1208318272 (LWP 17858)]
0x00a1e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb)
(gdb) bt
#0 0x00a1e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00a5f915 in raise () from /lib/tls/libc.so.6
#2 0x00a61379 in abort () from /lib/tls/libc.so.6
#3 0x00a93e1a in __libc_message () from /lib/tls/libc.so.6
#4 0x00a9b473 in _int_malloc () from /lib/tls/libc.so.6
#5 0x00a9d0f1 in malloc () from /lib/tls/libc.so.6
#6 0x04fa911e in operator new () from /usr/lib/libstdc++.so.6
#7 0x032762ca in __gnu_cxx::new_allocator > >::allocate (this=0x32798c4, __n=1) at /usr/lib/gcc/i386-redhat-linux/3.4.6/../../../../include/c++/3.4.6/ext/new_allocator.h:81
#8 0x03276232 in std::_Rb_tree, std::_Select1st >, std::less, std::allocator > >::_M_get_node (this=0x32798c4) at /usr/lib/gcc/i386-redhat-linux/3.4.6/../../../../include/c++/3.4.6/bits/stl_tree.h:356
#9 0x03276159 in std::_Rb_tree, std::_Select1st >, std::less, std::allocator > >::_M_create_node (this=0x32798c4, __x=@0xbfe81c88) at /usr/lib/gcc/i386-redhat-linux/3.4.6/../../../../include/c++/3.4.6/bits/stl_tree.h:365
#10 0x03275ce5 in std::_Rb_tree, std::_Select1st >, std::less, std::allocator > >::_M_insert (this=0x32798c4, __x=0x0, __p=0xfe95b88, __v=@0xbfe81c88) at /usr/lib/gcc/i386-redhat-linux/3.4.6/../../../../include/c++/3.4.6/bits/stl_tree.h:809
#11 0x03275ac9 in std::_Rb_tree, std::_Select1st >, std::less, std::allocator > >::insert_unique (this=0x32798c4, __v=@0xbfe81c88) at /usr/lib/gcc/i386-redhat-linux/3.4.6/../../../../include/c++/3.4.6/bits/stl_tree.h:929
#12 0x0327583f in std::map, std::allocator > >::insert (this=0x32798c4, __x=@0xbfe81c88) at /usr/lib/gcc/i386-redhat-linux/3.4.6/../../../../include/c++/3.4.6/bits/stl_map.h:360
#13 0x032755cf in DybDaq::FeeTraits::defaultTraits () at ../src/FeeTraits.cc:52
#14 0xb5880e3c in DayaBay::DaqReadoutPmtCrate::channel (this=0xfe97a80, channelId=@0xbfe81dc0) at ../src/DaqReadoutPmtCrate.cc:170
#15 0xb5884bd5 in DayaBay::ReadoutPmtCrate::daqReadout (this=0xfe97780, run=0, event=0) at ../src/ReadoutPmtCrate.cc:77
#16 0xaeb14500 in SingleLoader::execute (this=0xab6ec28) at ../src/SingleLoader.cc:112
#17 0x03f95d2c in Algorithm::sysExecute (this=0xab6ec28) at ../src/Lib/Algorithm.cpp:558
#18 0xaeb1f6fc in DybAlgorithm::sysExecute (this=0xab6ec28) at /data/env/local/dyb/trunk/NuWa-trunk/dybgaudi/InstallArea/include/DybAlg/DybAlgorithmImp.h:59
#19 0x01825d45 in GaudiSequencer::execute (this=0xab6bc00) at ../src/lib/GaudiSequencer.cpp:100
#20 0xb58d3823 in Stage::nextElement (this=0xab6ae78, pIStgData=@0xbfe8248c, erase=true) at ../src/Stage.cc:48
#21 0xb58c0a4e in Sim15::execute (this=0xaae7608) at ../src/Sim15.cc:121
Killed
}}}

Report findings in Trac tickets such as #565

=== why are my added tests not running ? ===

As a precaution nosetests does not run tests from executable modules unless you do : {{{nosetests --exe}}} OR explicitly specify the path {{{nosetests tests/test_mdc10bfadc.py}}}. Thus you can use {{{chmod ugo-x}}} or {{{chmod ugo+x}}} as a simple way to swap in/out modules of tests from the standard package tests.
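To see at a glance which test modules are currently switched out in this way, the execute bits can be listed. The following is a minimal sketch: the function name is hypothetical, and matching only files named {{{test*.py}}} is a simplification of the pattern nose actually uses to find tests.

```python
import os


def executable_test_modules(directory):
    """Return the test*.py files in directory that have any execute bit set.

    By default (without --exe) nosetests skips such modules during collection.
    """
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not (name.startswith("test") and name.endswith(".py")):
            continue
        if os.path.isfile(path) and os.stat(path).st_mode & 0o111:
            found.append(name)
    return found


if __name__ == "__main__":
    # assumes the package keeps its tests in a "tests" sub-directory
    if os.path.isdir("tests"):
        for name in executable_test_modules("tests"):
            print("skipped unless --exe or explicit path : tests/%s" % name)
```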
= Optimized Builds =

A new bitten config for doing optimized builds is '''opt.dybinst''' :

 * http://dayabay.ihep.ac.cn/tracs/dybsvn/build/opt.dybinst

Optimized builds are done in an "opt" directory within the normal dybinst directory :

{{{
dybinst
external
NuWa-trunk
opt/
    dybinst
    external
    NuWa-trunk
}}}

The master can be configured to distribute "dybinst" and/or "opt.dybinst" configs to your slave. It is not necessary to setup two slaves to perform the "opt" builds, although if you have another node available it has the advantage that "dbg" and "opt" builds can then proceed in parallel. Otherwise with a single slave you will have to wait for the "dbg" build to complete before the "opt" build starts (or vice versa).

If you want to setup parallel "dbg" and "opt" builds then send me 2 lists of hostnames for "opt.dybinst" and "dybinst" builds.

== ''opt-by-default'' setup ==

For the slave test steps to work a manual step is required: to set up your opt installation to be ''opt-by-default'', one line needs to be added to {{{opt/NuWa-trunk/setup/default/cmt/requirements}}} as described at

 * https://wiki.bnl.gov/dayabay/index.php?title=Installing_Optimized_Software_Build

An easy way to do this, from the '''opt''' folder containing '''dybinst''' :

{{{
[blyth@belle7 opt]$ . installation/trunk/dybinst/scripts/dybinst-common.sh ## source bash funcs
[blyth@belle7 opt]$ type opt-by-default- ## check what the func is going to do
opt-by-default- is a function
opt-by-default- ()
{
    local msg="=== $FUNCNAME :";
    local req=${1:-.}/NuWa-trunk/setup/default/cmt/requirements;
    [ ! -f "$req" ] && echo $msg ABORT cannot find req $req && return 0;
    local add="macro host-optdbg 'opt'";
    echo $msg insert \"$add\" into $req;
    perl -pi -e 'BEGIN{ undef $/ ; }' -e "s,(^use LCG_Settings v\*\n)(^set CMTCONFIG .*\$),\$1$add\n\$2,msg" $req
}
[blyth@belle7 opt]$ opt-by-default- ## run the func
=== opt-by-default- : insert "macro host-optdbg 'opt'" into ./NuWa-trunk/setup/default/cmt/requirements
[blyth@belle7 opt]$ cat ./NuWa-trunk/setup/default/cmt/requirements ## check
package default
version v0
use LCG_Settings v*
macro host-optdbg 'opt'
set CMTCONFIG ${host-cmtconfig}
}}}

= dybinst copy step =

The final copy step of builds allows the update build directory to be copied ( using ''dybbin pack/unpack/setup'' ) into a revision named directory. When enabled this prevents breakage of trunk from hindering progress, by allowing users to trivially shift to a recent prior revision.

== when builds/tests fail ==

If a build fails (eg dybgaudi fails to compile) then the copy step is not reached and no copy is made. However if the build completes but some of the tests fail then the copy is still done, but the name of the copied directory is changed to indicate the number of failed tests.

=== debugging slvmon results ===

The return code from {{{installation/trunk/dybinst/scripts/slvmon.py}}} records the number of test failures discerned from the xml logfiles written by the slave. If you are surprised by this return code and the resulting renamed directory then debug the issue by turning up the logging level ...

{{{
cd /dybinst/export/dir
python installation/trunk/dybinst/scripts/slvmon.py dybinst/4059_9542 -l DEBUG

cd /dybinst/export/dir/opt
python installation/trunk/dybinst/scripts/slvmon.py opt.dybinst/4059_9542 -l DEBUG
}}}

The single required argument is the '''BUILD_SLUG''' which identifies the configuration, build number and revision.
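As an illustration of that structure, a slug such as {{{opt.dybinst/4059_9542}}} can be split into its three parts. This is a sketch only and not taken from {{{slvmon.py}}}: the function name is hypothetical, and reading the two numbers as build number followed by revision is an assumption based on the order given in the sentence above.

```python
def parse_build_slug(slug):
    """Split 'config/build_revision' into (config, build, revision).

    Assumes the first number is the build number and the second the revision.
    """
    config, _, rest = slug.rpartition("/")
    build, _, revision = rest.partition("_")
    if not config or not build.isdigit() or not revision.isdigit():
        raise ValueError("unexpected BUILD_SLUG : %r" % slug)
    return config, int(build), int(revision)


if __name__ == "__main__":
    for slug in ("dybinst/4059_9542", "opt.dybinst/4059_9542"):
        print("%s -> %r" % (slug, parse_build_slug(slug)))
```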
'''slvmon''' can also be used in ''scan'' mode to report on the status of all builds for which logfiles are available, see the help for details ...

{{{
installation/trunk/dybinst/scripts/slvmon.py --help
}}}

== configuration of the copy step ==

The copy is configured by means of variables '''dyb_copy..''' in envfiles such as '''~/.dybinstrc'''. To allow separate configuration for debug and opt builds, variants of the config vars ending with '''_opt''' or '''_dbg''' are accepted and take precedence over the generic vars.

 * '''dyb_copybase''' : directory to which revision directories are copied, not configuring this or the '''_dbg/_opt''' variant prevents the copying from being done
 * '''dyb_copykeep''' : number of revision directories to be retained (defaults to 10, can have different opt/dbg settings using '''_dbg/_opt'''), others are purged

Currently the purge algorithm decides what to purge/retain based on

 * modification time of the revision directory
 * number of symbolic links within '''dyb_copybase''' that point to the revision directory

=== debugging copy step config ===

Following r9845 you can debug a mis-behaving copy step using the '''DYBCOPY_DBG''' envvar, for example :

{{{
[blyth@cms01 trunk]$ export DYBCOPY_DBG=1
[blyth@cms01 trunk]$ ./dybinst trunk copy dummy
Wed Oct 27 11:46:08 CST 2010
Start Logging to /data/env/local/dyb/trunk/dybinst-20101027-114608.log (or dybinst-recent.log)
Starting dybinst commands: copy
Stage: "copy"...
Found CMTCONFIG="i686-slc4-gcc34-dbg" from lcgcmt
Checking your CMTCONFIG="i686-slc4-gcc34-dbg"...
...ok.
DYBCOPY_DBG : 1
CMTCONFIG : i686-slc4-gcc34-dbg
relver : trunk
target : dummy
slug :
dyb_copybase : /data/env/local/dyb
dyb_copybase_opt :
dyb_copybase_dbg :
dyb_copykeep : 8
dyb_copykeep_opt :
dyb_copykeep_dbg :
dybinst-copy: trunk installation to directory "/data/env/local/dyb/dummy" derived from base:"/data/env/local/dyb" and target:"dummy" slug:"" slvmon:""
as DYBCOPY_DBG is defined skipping : do-copy trunk /data/env/local/dyb/dummy
base : /data/env/local/dyb
copyto : /data/env/local/dyb/dummy
slvmon :
}}}

== planting daily links ==

Coordinated ''blessed-for-24hrs'' revisions for the configurations '''dybinst''' and '''opt.dybinst''' are provided by the bitten master at :

 * http://dayabay.ihep.ac.cn/tracs/dybsvn/daily/dybinst
 * http://dayabay.ihep.ac.cn/tracs/dybsvn/daily/opt.dybinst

The revisions listed correspond to the last revision that was successfully built by all operational slaves for the corresponding configuration prior to the cutoff time :

|| '''18:00 Dayabay time''' ||

The '''slvmgr.py''' script accesses these pages in order to determine the blessed revisions for each configuration when it is invoked with the '''--diabolic''' option.

The planting of daily links to revision dirs is best done from a cron job running at a coordinated time rather than as part of the copy step. For coordinated diabolic links it is recommended that cron invokes the diabolic option 15 minutes after the cutoff time, eg with the cron command line below (with the time converted to your machine's timezone).

{{{
HOME=/home/joe
15 18 * * * ( cd /path/to/dybinst/export/dir ; python installation/trunk/dybinst/scripts/slvmgr.py --diabolic dybinst opt.dybinst ) > $HOME/diabolic.log 2>&1
}}}

Note that diabolic calls outside the time window (cutoff + 10min, cutoff + 20min) do not plant links. Thus, to avoid having to change the cron config twice a year for daylight saving time changes, you can add cron entries an hour ahead of and an hour behind the target time 18:15 in your timezone.
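The reasoning behind the extra cron entries can be checked with a little date arithmetic. The sketch below is not taken from {{{slvmgr.py}}}: the function name is hypothetical, times are expressed in Dayabay time, and treating the window as open at both ends is an assumption.

```python
from datetime import datetime, timedelta


def in_diabolic_window(now, cutoff):
    """True when now lies between cutoff + 10min and cutoff + 20min (exclusive)."""
    return cutoff + timedelta(minutes=10) < now < cutoff + timedelta(minutes=20)


if __name__ == "__main__":
    cutoff = datetime(2011, 8, 1, 18, 0)  # arbitrary example day, cutoff 18:00
    # three cron entries : the target time and one hour either side of it
    for hour in (17, 18, 19):
        fired = cutoff.replace(hour=hour, minute=15)
        print("%s plants links : %s" % (fired.strftime("%H:%M"), in_diabolic_window(fired, cutoff)))
```

Only the 18:15 invocation falls inside the window, so the two extra entries are harmless no-ops until a daylight saving change shifts one of them onto the target time.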