You are viewing sharkcz


How to debug weird build issues

When working on a secondary-arch Fedora such as s390x, we sometimes witness interesting build issues. For example, a sudden test failure in e2fsprogs in Rawhide: no issue with the previous build, no issue with the same sources in F-22. So we started to look at what had changed, and one thing in Rawhide was that hardened builds had been enabled globally for all packages. With the hardening disabled, the test case passed. That suggests two possible causes: either the code is somehow bad, or there is a bug in the compiler. And when a new major gcc version is released, we usually find a couple of bugs, sometimes even general ones, not specific to our architecture. When the issue is in gcc, it often depends on the optimization level, so I tried switching from the Fedora default -O2 to -O1. And voilà, the test passed again. But that is a global option, and we need to find the piece of code that might be mis-compiled. We call the procedure that follows "bisecting", inspired by bisecting in git as a method to find an offending commit. Here it means limiting the lower optimization level first to a specific directory, then to one source file, and finally to a single function. It is a time-consuming process and requires modifying compiler flags in the buildsystem, using #pragma GCC optimize("O1") in files, or adding __attribute__((optimize("O1"))) to functions. In the case of the e2fsprogs test we were quite sure it had to be either the resize2fs binary or the e2fsck binary. In the end we identified three functions in the rehash.c source file of e2fsprogs that had to be built with -O1 for the test case to pass. That looked a bit strange to me; usually it is a single function that gcc mis-compiles. But I knew from the past that another possible cause of interesting failures is aliasing in combination with wrong code, as was the case here. A quick test build with -fno-strict-aliasing also made the problem go away.

The gcc maintainer then identified some pieces of the code that are clearly not aliasing-safe, and after a short discussion with the e2fsprogs developer we decided to disable strict aliasing for this package as an interim solution, as the code is complex and fixing it properly will take time. And the conclusion? Using non-mainstream architectures helps in discovering bugs in applications. And in the toolchain too, but that will be another story :-)

A new printer and scanner

Recently my dated HP LaserJet 2300dtn printer stopped cooperating; the paper kept getting stuck in the duplex unit. It might just have needed the internals cleaned, but I started to think about how to replace both the LaserJet and an old multi-function HP inkjet that was by then used only as a scanner. My requirements were laser (or rather, no ink), color, duplex printing, wired network, and scanner/copier. My choice was the Brother DCP-9020CDW, which also comes at a reasonable price. Setting up the printer wasn't hard; it was a matter of selecting the best-matching PPD profile from the foomatic database. Unfortunately the Fedora 20 version lacked anything close to the 9020, so I updated the database from a Fedora 21 package and selected DCP-9045CDN as the type. It seems to work fine. Brother offers its own Linux drivers, but they include a binary blob and don't seem to be necessary. Using the scanner has been a different story; there a blob is necessary. I used a guide from Arch Linux and the scanner now works in e.g. simple-scan. In my opinion there is a chance for an open-source scanner driver, because the driver for the brscan3 family (the DCP-9020 is brscan4) is also distributed in source form.

OpenVPN and NetworkManager conflict

I was trying to configure a system-wide VPN using OpenVPN on an F-22 Alpha system by editing the config file under /etc/openvpn, and got into a situation where not all routes sent by the OpenVPN server were applied on the client. After looking at the system journal, the cause seemed to be a conflict between NetworkManager and OpenVPN: OpenVPN opens a new tun0 interface and NM wants to own it. The solution was to create a /etc/sysconfig/network-scripts/ifcfg-tun0 file with these two lines:
DEVICE=tun0
NM_CONTROLLED=no

Multipath storage in a KVM guest

Having multipathed storage is quite common in the server world. Multipath means that a storage device is accessible to the host via multiple paths, usually Fibre Channel links. But who has an FC array at home :-) The good thing is that this kind of setup can also be tested locally, using a guest under KVM. I will now describe how this can be done using virt-manager.


  • I have started by updating my Fedora 20 system to the latest and greatest QEMU and libvirt from http://fedoraproject.org/wiki/Virtualization_Preview_Repository

  • then I created an empty guest

  • then I added the first disk, of SCSI (virtio-scsi) type, pointing to a logical volume, and set its serial number in "Advanced Options", see http://fedora.danny.cz/kvm-mpath-1.png

  • then I added a second disk of the same type, pointing to the same logical volume (it will work with a disk image too), set the same serial number, and ignored the warning virt-manager gives

  • as the last step I updated the boot options so the guest would boot first from the disks, then from PXE

This is how the multipathed disk looks in libvirt's XML guest description:
...
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/Linux/kvm-tmp'/>
      <target dev='sda' bus='scsi'/>
      <serial>0001</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/Linux/kvm-tmp'/>
      <target dev='sdb' bus='scsi'/>
      <serial>0001</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
 ...


I then booted the installation media, selected the detected multipathed disk as the target device and left everything at the defaults. After a while I had an installed system :-)

[root@localhost ~]# multipath -l
mpatha (0QEMU_QEMU_HARDDISK_0001) dm-0 QEMU    ,QEMU HARDDISK 
size=20G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:0:0:0 sda 8:0  active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 2:0:0:1 sdb 8:16 active undef running


If virt-manager is not your friend, you can achieve a similar result with the following command:
qemu-kvm -m 1024 -device virtio-scsi-pci,id=scsi -drive if=none,id=hda,file=foo.img,serial=0001 -device scsi-hd,drive=hda -drive if=none,id=hdb,file=foo.img,serial=0001 -device scsi-hd,drive=hdb

Empty build group for a koji-shadow build

Recently I struggled with a strange error when working on Fedora/s390x. It was, I think, the second time I had seen yum report that there are no packages in the build group.

from root.log of python3-3.4.0-7.fc21 in s390 koji:
...
DEBUG util.py:332:  Executing command: ['/usr/bin/yum', '--installroot', '/var/lib/mock/SHADOWBUILD-f21-python-362783-236636/root/', 'groupinstall', 'build', '--setopt=tsflags=nocontexts'] with env {'LANG': 'en_US.UTF-8', 'TERM': 'vt100', 'SHELL': '/bin/bash', 'HOSTNAME': 'mock', 'PROMPT_COMMAND': 'echo -n ""', 'HOME': '/builddir', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin'}
DEBUG util.py:282:  There is no installed groups file.
DEBUG util.py:282:  Maybe run: yum groups mark convert (see man yum)
DEBUG util.py:282:  Warning: Group build does not have any packages to install.
DEBUG util.py:282:  Maybe run: yum groups mark install (see man yum)
DEBUG util.py:372:  Child return code was: 0
...


For the first occurrence I just queued a regular (non-shadow) build, but this time I wanted to find the reason. So I started by adding debug output into the koji-shadow script, and after some time I had it. koji-shadow populates the build group from the common content of all buildroots (from the buildArch tasks on all architectures) of the primary build. There should be 3 buildArch tasks (and buildroots) in primary: one for i686, one for x86_64 and one for armhfp. Unfortunately, in these 2 cases (doxygen-1.8.7-1.fc21, python3-3.4.0-7.fc21) one of the buildroots had its content removed from the database, so there were only 2 occurrences of the buildroot packages instead of the expected 3, and as a result no package got included in the build group for the shadow build.

edit 2014-09-19: the workaround is to edit koji-shadow with the following change:

@@ -524,11 +527,13 @@
         #        repo and others the new one.
         base = []
         for name, brlist in bases.iteritems():
+#            print("DEBUG: name=%s brlist=%s" % (name, brlist))
             #We want to determine for each name if that package was present
             #in /all/ the buildroots or just some.
             #Because brlist is constructed only from elements of buildroots, we
             #can simply check the length
             assert len(brlist) <= len(buildroots)
+##            if len(brlist) == len(buildroots)-1:
             if len(brlist) == len(buildroots):
                 #each buildroot had this as a base package
                 base.append(name)

Disable font anti-aliasing in XFCE Terminal

After upgrading my laptop to Fedora 19 I found that the XFCE Terminal application had lost the ability to disable font anti-aliasing; the check box went away. And because the anti-aliased Anonymous Pro font was unreadable, I started looking into what went wrong. The first finding was a commit in the xfce4-terminal git tree. I then looked for a solution and found that fontconfig has broad possibilities for changing the default behaviour; the answer I was looking for was, for example, here.

And the solution is to store the following snippet in /etc/fonts/conf.d/29-msimonson-anonymouspro.conf

<?xml version='1.0'?>
<!DOCTYPE fontconfig SYSTEM 'fonts.dtd'>
<fontconfig>
  <dir>~/.fonts</dir>
  <match target="pattern">
    <test name="family">
      <string>Anonymous Pro</string>
    </test>
    <edit mode="assign" name="antialias">
      <bool>false</bool>
    </edit>
  </match>
</fontconfig>

Installing Fedora 20 in Hercules

Hercules is a software implementation of the IBM mainframe architectures and a viable option for many tasks for people who don't have access to real mainframe hardware. Using these steps you can install the latest Fedora in Hercules on one emulated ECKD DASD device, with a CTC adapter used for networking. The procedure doesn't require manual intervention, as it uses a kickstart file for unattended installation. LVM is used for managing the storage, so it's easy to add new DASDs to expand the available space. The resulting product of the procedure below can be found at http://s390.koji.fedoraproject.org/test/hercules/20/

  1. create directory structure
    cd somewhere
    mkdir dasd images
    
  2. get the Hercules config file
    wget http://s390.koji.fedoraproject.org/test/ks/fedora.cnf
    
  3. create an empty ECKD DASD image
    cd dasd
    dasdinit -bz2 -linux linux-ckd.130 3390-9 LNX000
    cd ..
    
  4. get the installer kernel and initrd
    cd images
    wget http://s390.koji.fedoraproject.org/tree/releases/20/Fedora/s390x/os/images/kernel.img
    wget http://s390.koji.fedoraproject.org/tree/releases/20/Fedora/s390x/os/images/initrd.img
    wget http://s390.koji.fedoraproject.org/tree/releases/20/Fedora/s390x/os/images/initrd.addrsize
    
  5. get the parameters file
    wget http://s390.koji.fedoraproject.org/test/ks/generic.prm.kslvm
    cd ..
    
  6. get the LPAR ins file
    wget http://s390.koji.fedoraproject.org/test/ks/ks-lvm.ins
    
  7. the resulting directory structure is then
    ├── dasd
    │   └── linux-ckd.130
    ├── fedora.cnf
    └── images
        ├── generic.prm.kslvm
        ├── initrd.addrsize
        ├── initrd.img
        └── kernel.img
    
  8. add the masquerade rule to your local firewall and enable forwarding
    sudo iptables -t nat -A POSTROUTING -s 192.168.200.0/24 -d 0.0.0.0/0 -j MASQUERADE
    echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
    
    You should also check whether there are other firewall rules that could conflict with the Hercules traffic.
  9. start Hercules
    sudo hercules -f fedora.cnf
    
    and check that you see the devices 0130 (DASD) and 0600-0601 (CTC network interface)

  10. IPL Fedora installer
    ipl ks-lvm.ins
    
  11. the installation is now running

  12. log into the running installation as root (no password is set) and apply the workaround for bug 904245
    You need to wait until you see these messages on the console
    ...
    Started OpenSSH server daemon.
    ...
    Started Network Manager.
    ...
    
    then do:
    ssh -l root 192.168.200.3
    chmod 0644 /sys/firmware/reipl/ccw/loadparm
    exit
    
  13. wait some time (about 3 hours on my dated workstation) until Hercules starts to throw exceptions (from the MSCH opcode), then quit Hercules and you are done
    HHCCP014I CPU0000: Operand exception CODE=0015 ILC=4
    CPU0000:  PSW=00001000 80000000 00000000001167F0 INST=B232D116     MSCH  278(13)                modify_subchannel
    
    EDIT 2014-01-20: following advice from Robert Knight I changed the reboot command in the kickstart to poweroff, so the guest shuts down correctly after the installation and no longer floods the console with error messages

  14. enjoy Fedora 20 on your virtual mainframe
    sudo hercules -f fedora.cnf
    ipl 130
    
    and from another terminal run
    ssh -l root 192.168.200.3 (password=fedora as set in the kickstart file)
    

More information

  • if you see on console
    Warning: /dev/root does not exist
    ...
    Starting Dracut Emergency Shell...
    
    then the root= parameter is missing from your generic.prm, points to an inaccessible file, or firewall rules are blocking the IP traffic (either the HTTP connection or the DNS queries)

  • description of Anaconda options
  • read the Release notes, also follow the links to the previous releases

Ideal Fedora bug workflow

This is how an ideal workflow for a bug in Fedora could look:
  • report a bug for a package
  • realize you know how to fix the bug
  • prepare a patch and get it accepted by upstream
  • add the patch to the Fedora package
  • create a new update with the fixed package
    and all done by one person, I know such examples ;-)

Caching buildroot rpms in Koji 1.7+

Recent Koji changed the URLs used in the repositories that populate buildroots from $pkgurl/$name/$version/$release/... to $topurl/$tag/$repoid/toplink/$name/... The presence of repoid means the rpms can't easily be cached: every newly created repo has a unique path for its rpms. This isn't such a big problem for regular builds in Koji, where a stable buildroot is used and updates are pushed in batches, but it becomes a serious problem when koji-shadow is used, because it creates a new repository, as close as possible to the original buildroot, for every single build. We tried to find a solution with the Koji developer, and although there were some ideas, they would have required normalizing paths containing relative locations, which doesn't generally work in HTTP clients. So I came up with another solution: rewriting the URLs into cacheable ones directly inside a Squid cache. The rewritten URL is used to identify the files in the cache, so it works nicely.

Add these 2 options to your /etc/squid.conf:

url_rewrite_program /etc/squid/koji-redirect.pl
url_rewrite_children 2

and create /etc/squid/koji-redirect.pl with the following content:
#!/usr/bin/perl
$|=1;
while (<>) {
    s@kojifiles/.*/toplink/@kojifiles/@;
    print;
}
    
koji-shadow improvements

If you are involved as a package maintainer in Fedora, you have probably already heard "do the fix in primary and koji-shadow will pick it up". The procedure is that all builds in primary Fedora (x86) are replicated in the secondary Fedoras (ARM, PPC, s390) using buildroots as close to the original as possible. Generally, only when a build doesn't exist in a secondary arch, because it failed to build there, is it replaced by a newer build. The two levels of architectures exist because the secondary architectures are not so widespread among Fedora users, and a failing build on a secondary architecture doesn't affect the package flow in primary. I was told that the original koji-shadow wasn't meant for building secondary architectures and that this functionality (the ability to include newer builds in buildroots) was only added later by Dennis Gilmore. Another useful feature is importing noarch builds instead of rebuilding them, which saves a significant portion of the buildsystem's resources.

The koji-shadow tool can run in 2 modes: doing a single build (all missing dependencies are also built) or scanning a tag and queueing all missing builds. I will now describe the improvements koji-shadow has received in the previous months, plus some existing features.

  • koji-shadow supports 2 lists of packages that won't be built; the lists are merged together and the packages listed there are skipped. This allows one list to be generated automagically from the ExcludeArch/ExclusiveArch tags in source RPMs, while the second list can be maintained manually and contain packages where queueing a build is a waste of resources, even if it hasn't been decided yet whether they will be fixed or properly excluded,

  • you can use globs in the ignore list, so I can have "nodejs-*" on s390 and nothing from the nodejs stack will be included there, because it depends on v8 and v8 is exclusive to the x86 and ARM architectures,

  • originally koji-shadow required all builds to be rebuilt before it would finish, which is not always the situation in the prefer-new mode, where some of them can be skipped; now it always finishes when there are no more packages to build,

  • the single-build mode wasn't compatible with the prefer-new option; it required all dependencies to be present. Now, when a tag is specified on the command line, it can run in prefer-new mode, with the tag used to search for replacements for the missing builds,


  • running in the prefer-new mode originally resulted in the newest build being included when the exact build wasn't present, often causing dependency problems when a too-new build was pulled in; now the closest newer build is included,

  • this is an existing capability, but it has become more important now: sometimes a build in a secondary Koji is successful but causes additional broken dependencies or simply doesn't work. This is the use case for "substitute", which lets the bad build be replaced by an older (or newer) good one.

And what can you expect in the future? Karsten prepared a patch to print information about which failed builds are blocking the most other builds, and also has logging improvements in development; right now I use the logging feature of screen to capture the output. In the longer term, work is being done to integrate koji-shadow with fedmsg, so builds are started when they finish in primary instead of relying on periodic scans. Naturally, additional ideas are welcome, and if patches are attached, even better :-)