
Why it's useful to use a desktop on ppc64le

I want to provide a short example of what I've run into in the past weeks while dog-fooding a ppc64le Fedora desktop environment on my OpenPOWER-based Talos II. We have experienced segfaults coming from a smashed stack in some desktop components, although no one using the mainstream arches noticed them. The toolchain guys will be able to explain why e.g. x86_64 is immune (or just lucky), but the problems were real issues in the projects' source code. The common denominator was an incorrect callback signature in GTK+ based apps: the callbacks expected different parameters than were passed by their callers. And this kind of inconsistency can't be caught at compile time.

IMHO it opens possibilities for some static analysis before producing the binaries, by looking at the signal definitions in GTK+ and at what functions/callbacks are then attached to them in the projects. Or for some AI that will analyze the crashes, look for the common pattern and recommend a solution. And what's the conclusion - as usual, heterogeneity helps to improve quality :-)
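
To make that class of bug concrete, here is a minimal, hypothetical sketch (not code from any of the projects below, GTK+ 3 API assumed): the G_CALLBACK() cast erases the function type, so a handler with the wrong prototype still compiles and the mismatch only shows up at run time, possibly differently on each architecture's ABI.

    /* Hypothetical example of a callback-signature mismatch (GTK+ 3). */
    #include <gtk/gtk.h>

    /* "clicked" on GtkButton expects: void handler(GtkButton *button, gpointer user_data) */
    static void on_clicked_good(GtkButton *button, gpointer user_data)
    {
        g_print("correct signature\n");
    }

    /* Wrong prototype (it matches "delete-event"-style handlers instead); it still
     * compiles because G_CALLBACK() casts the function pointer, but the "event"
     * argument is read from whatever the caller left in that register/stack slot. */
    static gboolean on_clicked_bad(GtkWidget *widget, GdkEvent *event, gpointer user_data)
    {
        g_print("event=%p\n", (void *)event);   /* garbage value, may crash later */
        return FALSE;
    }

    int main(int argc, char *argv[])
    {
        gtk_init(&argc, &argv);
        GtkWidget *win = gtk_window_new(GTK_WINDOW_TOPLEVEL);
        GtkWidget *btn = gtk_button_new_with_label("Click me");
        gtk_container_add(GTK_CONTAINER(win), btn);

        g_signal_connect(btn, "clicked", G_CALLBACK(on_clicked_good), NULL);
        g_signal_connect(btn, "clicked", G_CALLBACK(on_clicked_bad), NULL);  /* compiles, misbehaves */
        g_signal_connect(win, "destroy", G_CALLBACK(gtk_main_quit), NULL);

        gtk_widget_show_all(win);
        gtk_main();
        return 0;
    }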

Now the details:

  • gcr-prompter - this annoyed me a lot, every time I used "ssh" :-)
    https://bugzilla.redhat.com/show_bug.cgi?id=1631759
    https://gitlab.gnome.org/GNOME/gcr/merge_requests/16

  • LibreOffice Draw - kudos to Caolán for the prompt fix
    https://bugzilla.redhat.com/show_bug.cgi?id=1719378
    https://gerrit.libreoffice.org/#/c/73829/

  • gthumb
    https://bugzilla.redhat.com/show_bug.cgi?id=1720701
    https://gitlab.gnome.org/GNOME/gthumb/merge_requests/6

    How to debug weird build issues

    When working on a secondary-arch Fedora like s390x, we sometimes witness interesting build issues. Like a sudden test failure in e2fsprogs in Rawhide: no issue with the previous build, no issue with the same sources in F-22. So we started to look at what had changed, and one thing in Rawhide was enabling the hardened builds globally for all builds. With the hardening disabled the test case passed. That can mean two possible causes - either the code is somehow bad, or there is a bug in the compiler. And when a new major gcc version is released we usually find a couple of bugs, sometimes even general ones, not specific to our architecture.

    When the issue is in gcc, it often depends on the optimization level, so I tried to switch from the Fedora default -O2 to -O1. And voila, the test passed again. But that is a global option, and we need to find the piece of code that might be mis-compiled. We call the procedure that follows "bisecting", inspired by bisecting in git as a method to find an offending commit. Here it means limiting the lower optimization level to a specific directory, then to one source file, and then to a single function. It is a time-consuming process and requires modifying compiler flags in the buildsystem, using #pragma GCC optimize("O1") in files or adding __attribute__((optimize("O1"))) to functions (see the sketch below). In the case of the test in e2fsprogs we were quite sure it would be either the resize2fs binary or the e2fsck binary. In the end we identified 3 functions in the rehash.c source file of e2fsprogs that had to be built with -O1 for the test case to pass. That looked a bit strange to me; usually it is one function that gcc mis-compiles. But from the past I knew another possible cause of interesting failures could be aliasing in combination with wrong code, like here. A quick test build with -fno-strict-aliasing also made the problem go away. The gcc maintainer then identified some pieces of the code that are clearly not aliasing safe, and after a short discussion with the e2fsprogs developer we decided to disable strict aliasing for this package as an interim solution, as the code is complex and it will take time to fix it properly.

    And what's the conclusion - using non-mainstream architectures helps in discovering bugs in applications. And also in the toolchain, but that will be another story :-)
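
    Here is a minimal, hypothetical sketch of those per-function and per-file knobs, plus the kind of construct that is not strict-aliasing safe; the function names are made up and have nothing to do with the actual e2fsprogs code.

      /* Per-function knob: compile just one suspect function with -O1 while the
       * rest of the file keeps the default -O2. */
      __attribute__((optimize("O1")))
      static int suspect_helper(int x)
      {
          return x * 2;
      }

      /* The kind of construct that is not strict-aliasing safe: an unsigned int
       * object is accessed through a float pointer.  At -O2 (which enables
       * -fstrict-aliasing) GCC may assume the accesses don't alias and reorder
       * or drop them; building with -fno-strict-aliasing hides the symptom. */
      static float int_bits_as_float(unsigned int u)
      {
          return *(float *)&u;   /* undefined behaviour under the C aliasing rules */
      }

      /* Per-file knob: placing this near the top of a source file compiles every
       * function defined after it with -O1:
       *     #pragma GCC optimize("O1")
       */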

    EDIT 2016-03-01:
    Other useful things to try are:
  • __attribute__((noinline, noclone)) to make sure a function is not inlined or cloned (see the sketch below)
  • the -mno-lra option to disable LRA in case code is mis-compiled due to register allocation
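
    A hypothetical example of the attribute; the function is made up, the point is only that GCC will neither inline it nor create optimized clones of it, so its generated code stays in one inspectable place.

      /* Keep a suspect function out of inlining and cloning so it can be
       * disassembled and bisected in isolation. */
      __attribute__((noinline, noclone))
      static long checksum_block(const unsigned char *buf, long len)
      {
          long sum = 0;
          for (long i = 0; i < len; i++)
              sum += buf[i];
          return sum;
      }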

    EDIT 2016-03-22
  • -fno-delete-null-pointer-checks and/or -fno-lifetime-dse (or -flifetime-dse=1) for detecting potentially buggy C/C++ code (see the sketch below)
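
    A hypothetical example of the code pattern that -fno-delete-null-pointer-checks helps to expose: the pointer is dereferenced before it is tested, so with the default -O2 behaviour GCC may conclude the pointer cannot be NULL and silently remove the check.

      #include <stddef.h>

      int first_element(const int *p)
      {
          int first = p[0];      /* dereference happens first ...           */
          if (p == NULL)         /* ... so GCC may delete this check at -O2 */
              return -1;
          return first;
      }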

    EDIT 2017-02-09
  • GCC has its own FAQ entry

    Brother DCP-9020CDW multi-function printer in Fedora 20

    Recently my dated HP LaserJet 2300dtn printer stopped cooperating, the paper was getting stuck in the duplex unit. It might be just a matter of cleaning the internals, but I started to think about how to replace the LaserJet and also an old multi-function HP inkjet which was now used as a scanner only. My requirements were laser (or rather no ink), color, duplex printing, wired network and scanner/copier. My choice was the Brother DCP-9020CDW, which also came at a reasonable price. Setting up the printer wasn't hard, it was about selecting the best matching PPD profile from the foomatic database. Unfortunately the Fedora 20 version didn't contain anything close to the 9020, so I updated the database from a Fedora 21 package and selected DCP-9045CDN as its type. It seems to work fine. Brother offers its own Linux drivers, but they include a blob and don't seem to be necessary. Using the scanner has been a different story; there a blob is necessary. I've used a guide from Arch Linux and the scanner now works in e.g. simple-scan. In my opinion there is a chance for an open-sourced scanner driver, because the driver for the brscan3 family (the DCP-9020 is brscan4) is also distributed in source form.

    OpenVPN and NetworkManager conflict

    I was trying to configure a system-wide VPN using OpenVPN on an F-22 Alpha system by editing the config file under /etc/openvpn and got into a situation where not all routes sent by the OpenVPN server were applied on the client. After looking at the system journal, the cause seemed to be a conflict between NetworkManager and OpenVPN, where OpenVPN opens a new tun0 interface and NM wants to own it. The solution was to create a /etc/sysconfig/network-scripts/ifcfg-tun0 file with these 2 lines:
    DEVICE=tun0
    NM_CONTROLLED=no
    

    Using KVM for testing multipath storage in Fedora 21

    Having multipathed storage is quite common in the server world. Multipath means that a storage device is accessible to the host via multiple paths, usually via Fibre Channel links. But who has an FC array at home :-) The good thing is that this kind of setup can also be tested on your local host using a guest under KVM. I will now describe how this can be done with virt-manager.


    • I started by updating my Fedora 20 system to the latest and greatest QEMU and libvirt from http://fedoraproject.org/wiki/Virtualization_Preview_Repository

    • then I created an empty guest

    • then I added the first disk of SCSI (virtio-scsi) type, pointing to a logical volume, and set its serial number in "Advanced Options", see http://fedora.danny.cz/kvm-mpath-1.png

    • then I added a second disk of the same type, pointing to the same logical volume (it will work with a disk image too); you have to ignore the warning virt-manager gives you and set the same serial number

    • as the last step I updated the boot options so the guest would boot first from the disks, then from PXE

    This is how the multipathed disk looks in libvirt's XML guest description:
    ...
      <devices>
        <emulator>/usr/bin/qemu-kvm</emulator>
        <disk type='block' device='disk'>
          <driver name='qemu' type='raw' cache='none' io='native'/>
          <source dev='/dev/Linux/kvm-tmp'/>
          <target dev='sda' bus='scsi'/>
          <serial>0001</serial>
          <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
        <disk type='block' device='disk'>
          <driver name='qemu' type='raw' cache='none' io='native'/>
          <source dev='/dev/Linux/kvm-tmp'/>
          <target dev='sdb' bus='scsi'/>
          <serial>0001</serial>
          <address type='drive' controller='0' bus='0' target='0' unit='1'/>
        </disk>
     ...


    When I booted the installation media, I selected the detected multipathed disk as the target device and left everything at the defaults. After a while I got an installed system :-)

    [root@localhost ~]# multipath -l
    mpatha (0QEMU_QEMU_HARDDISK_0001) dm-0 QEMU    ,QEMU HARDDISK 
    size=20G features='0' hwhandler='0' wp=rw
    |-+- policy='service-time 0' prio=0 status=active
    | `- 2:0:0:0 sda 8:0  active undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      `- 2:0:0:1 sdb 8:16 active undef running


    When you are not a friend of virt-manager, you can achieve a similar result by using the following command:
    qemu-kvm -m 1024 -device virtio-scsi-pci,id=scsi -drive if=none,id=hda,file=foo.img,serial=0001 -device scsi-hd,drive=hda -drive if=none,id=hdb,file=foo.img,serial=0001 -device scsi-hd,drive=hdb
    
    

    Empty build group for a koji-shadow build

    Recently I struggled with a strange error when working on Fedora/s390x. I think it was the second time I saw an error from yum that there are no packages in the build group.

    from root.log of python3-3.4.0-7.fc21 in s390 koji:
    ...
    DEBUG util.py:332:  Executing command: ['/usr/bin/yum', '--installroot', '/var/lib/mock/SHADOWBUILD-f21-python-362783-236636/root/', 'groupinstall', 'build', '--setopt=tsflags=nocontexts'] with env {'LANG': 'en_US.UTF-8', 'TERM': 'vt100', 'SHELL': '/bin/bash', 'HOSTNAME': 'mock', 'PROMPT_COMMAND': 'echo -n ""', 'HOME': '/builddir', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin'}
    DEBUG util.py:282:  There is no installed groups file.
    DEBUG util.py:282:  Maybe run: yum groups mark convert (see man yum)
    DEBUG util.py:282:  Warning: Group build does not have any packages to install.
    DEBUG util.py:282:  Maybe run: yum groups mark install (see man yum)
    DEBUG util.py:372:  Child return code was: 0
    ...
    


    For the first occurrence I just queued a regular (non-shadow) build, but now I wanted to find the reason. So I started by adding debug output to the koji-shadow script, and after some time I got it. koji-shadow populates the build group based on the common content of all buildroots (from the buildArch tasks on all architectures) from the primary build. There should be 3 buildArch tasks (and buildroots) in primary, one for i686, one for x86_64 and one for armhfp. Unfortunately in these 2 cases (doxygen-1.8.7-1.fc21, python3-3.4.0-7.fc21) one of the buildroots got its content removed in the database, meaning there were only 2 occurrences of the buildroot packages, not 3 as expected, and as a result no package got included in the build group for the shadow build.

    EDIT 2014-09-19: the workaround is to edit koji-shadow with the following change:

    @@ -524,11 +527,13 @@
             #        repo and others the new one.
             base = []
             for name, brlist in bases.iteritems():
    +#            print("DEBUG: name=%s brlist=%s" % (name, brlist))
                 #We want to determine for each name if that package was present
                 #in /all/ the buildroots or just some.
                 #Because brlist is constructed only from elements of buildroots, we
                 #can simply check the length
                 assert len(brlist) <= len(buildroots)
    +##            if len(brlist) == len(buildroots)-1:
                 if len(brlist) == len(buildroots):
                     #each buildroot had this as a base package
                     base.append(name)
    

    Disable font anti-aliasing in XFCE Terminal

    After upgrading my laptop to Fedora 19 I found that the XFCE Terminal application lost the capability to disable font anti-aliasing, the check box went away. And because the anti-aliased Anonymous Pro font was unreadable, I started looking into what went wrong. The first find was a commit in the xfce4-terminal git tree. I looked for a solution and found that fontconfig has broad possibilities to change the default behaviour; the answer I was looking for was, for example, here.

    And the solution is to store the following snippet in /etc/fonts/conf.d/29-msimonson-anonymouspro.conf
    
    <?xml version='1.0'?>
    <!DOCTYPE fontconfig SYSTEM 'fonts.dtd'>
    <fontconfig>
      <dir>~/.fonts</dir>
      <match target="pattern">
        <test name="family">
          <string>Anonymous Pro</string>
        </test>
        <edit mode="assign" name="antialias">
          <bool>false</bool>
        </edit>
      </match>
    </fontconfig>
    

    Installing Fedora 20 in Hercules

    Hercules is a software implementation of the IBM mainframe architectures and serves as a viable solution for various tasks for people who don't have access to real mainframe hardware. Using these steps you can install the latest Fedora in Hercules on one emulated ECKD DASD device, with a CTC adapter used for networking. The procedure doesn't require manual intervention as it uses a kickstart file for unattended installation. LVM is used for managing the storage, so it's easy to add new DASDs to expand the available space. The resulting product of the procedure below can be found at http://s390.koji.fedoraproject.org/test/hercules/20/

    1. create directory structure
      cd somewhere
      mkdir dasd images
      
    2. get the Hercules config file
      wget http://s390.koji.fedoraproject.org/test/ks/fedora.cnf
      
    3. create an empty ECKD DASD image
      cd dasd
      dasdinit -bz2 -linux linux-ckd.130 3390-9 LNX000
      cd ..
      
    4. get the installer kernel and initrd
      cd images
      wget http://s390.koji.fedoraproject.org/tree/releases/20/Fedora/s390x/os/images/kernel.img
      wget http://s390.koji.fedoraproject.org/tree/releases/20/Fedora/s390x/os/images/initrd.img
      wget http://s390.koji.fedoraproject.org/tree/releases/20/Fedora/s390x/os/images/initrd.addrsize
      
    5. get the parameters file
      wget http://s390.koji.fedoraproject.org/test/ks/generic.prm.kslvm
      cd ..
      
    6. get the LPAR ins file
      wget http://s390.koji.fedoraproject.org/test/ks/ks-lvm.ins
      
    7. the resulting directory structure is then
      ├── dasd
      │   └── linux-ckd.130
      ├── fedora.cnf
      └── images
          ├── generic.prm.kslvm
          ├── initrd.addrsize
          ├── initrd.img
          └── kernel.img
      
    8. add the masquerade rule to your local firewall and enable forwarding
      sudo iptables -t nat -A POSTROUTING -s 192.168.200.0/24 -d 0.0.0.0/0 -j MASQUERADE
      sudo sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
      
      You should also check whether there are other firewall rules that could conflict with the Hercules traffic.
    9. start Hercules
      sudo hercules -f fedora.cnf
      
      and check that you see the devices 0130 (DASD) and 0600-0601 (CTC network interface)

    10. IPL Fedora installer
      ipl ks-lvm.ins
      
    11. the installation is now running

    12. log into the running installation as root (no password is set) and apply the workaround for bug 904245
      You need to wait until you see these messages on the console
      ...
      Started OpenSSH server daemon.
      ...
      Started Network Manager.
      ...
      
      then do:
      ssh -l root 192.168.200.3
      chmod 0644 /sys/firmware/reipl/ccw/loadparm
      exit
      
    13. wait some time (circa 3 hours on my dated workstation) until Hercules starts to throw exceptions (from the MSCH opcode), then quit Hercules and you are done
      HHCCP014I CPU0000: Operand exception CODE=0015 ILC=4
      CPU0000:  PSW=00001000 80000000 00000000001167F0 INST=B232D116     MSCH  278(13)                modify_subchannel
      
      EDIT 2014-01-20: after advice from Robert Knight I changed the reboot command in the kickstart to poweroff, so the guest will shut down correctly after installation; no more tons of error messages

    14. enjoy Fedora 20 on your virtual mainframe
      sudo hercules -f fedora.cnf
      ipl 130
      
      and from another terminal run
      ssh -l root 192.168.200.3 (password=fedora as set in the kickstart file)
      

    More information

    • if you see on the console
      Warning: /dev/root does not exist
      ...
      Starting Dracut Emergency Shell...
      
      then either the root= parameter is missing from your generic.prm or points to an inaccessible file, or firewall rules block the IP traffic, either the HTTP connection or the DNS queries

    • description of Anaconda options
    • read the Release notes, and also follow the links to the previous releases
    EDIT 2016-07-11: the referenced files should all be available here

    Ideal Fedora bug workflow

    This is what an ideal workflow for a bug in Fedora could look like:
  • report a bug for a package
  • realize you know how to fix the bug
  • prepare a patch and get it accepted by upstream
  • add the patch to the Fedora package
  • create a new update with the fixed package
    and all done by one person; I know such examples ;-)

    Caching buildroot rpms in Koji 1.7+

    Recent Koji changed the URLs used in the repositories that populate buildroots from $pkgurl/$name/$version/$release/... to $topurl/$tag/$repoid/toplink/$name/... The presence of the repoid means the rpms can't be easily cached, because every newly created repo has a unique path for the rpms. This isn't such a big problem for regular builds in Koji, where a stable buildroot is used and updates are pushed in batches, but it becomes a serious problem when koji-shadow is used, because it creates a new repository as close as possible to the original buildroot for every build. We tried to find a solution with the Koji developer, and although there were some ideas, they would require normalization of a path containing relative locations, which doesn't generally work in HTTP clients. So I came up with another solution, which is to rewrite the URLs to cacheable ones directly inside a Squid cache. The rewritten URL is used to identify the files in the cache, so it works nicely.

    Add these 2 options to your /etc/squid.conf
    url_rewrite_program /etc/squid/koji-redirect.pl
    url_rewrite_children 2
    

    and create /etc/squid/koji-redirect.pl with the following content
    #!/usr/bin/perl

    $|=1;
    while (<>) {
        s@kojifiles/.*/toplink/@kojifiles/@;
        print;
    }