Tuesday, December 14, 2010

SBR600: To Thumb? or Not to Thumb? (v0.3)


This is my last entry for the Fedora-ARM and SBR600 @ Seneca College project "To Thumb or Not To Thumb?".

When I first read about the project, frankly, I had no idea where to begin. ARM the company and its chips, GuruPlugs, BeagleBoards and the Fedora-ARM architecture were all new to me, and the project became more interesting as I worked out the connections between them. My project description is as follows:

Fedora-ARM does not use thumb. The purpose of this project is to discover whether thumb provides any significant savings in terms of code size, whether programs compiled to thumb execute more quickly or more slowly than non-thumb programs on common ARM processors, whether a thumb compilation takes more or less time than non-thumb, and whether there are any other factors that would influence the decision to support thumb. Ultimately, this project should make a recommendation on the use of the thumb instruction set for the Fedora-ARM secondary architecture.

To complete this project, I mainly did the following:
 - Downloaded different source rpms that are compiled with gcc.
 - Using rpmbuild, recompiled each source rpm in Thumb mode.
 - Using rpmbuild, recompiled the same source rpm without Thumb mode.
 - Compared whether Thumb sped up the compilation or not.
 - Compared whether Thumb shrank the file sizes or not.
 - Compared whether applications compiled with Thumb run faster or not.

I worked alone on this project, using cdot-guru-4-1 (Fedora 13, armv5tel), cdot-beagleXM-0-3 and hongkong.proximity.on.ca at the Seneca Centre for Development of Open Technology (CDOT). My project page can be found on the Seneca Wiki: To Thumb or Not to Thumb.

Major tools, commands and configuration files I used are
 - rpmbuild
 - /usr/lib/rpm/redhat/rpmrc
 - /usr/lib/rpm/redhat/macros
 - time
 - screen
 - yumdownloader
 - rpm2cpio
 
In the beginning, I compiled, benchmarked and posted my results without fully understanding rpmbuild, and wasted tons of time. During my SBR600 class, the professor, Chris Tyler, told the class that the directories under rpmbuild serve the following purposes.

rpmbuild/SOURCES — Contains the original sources, patches, and icon files.
rpmbuild/SPECS — Contains the spec files used to control the build process.
rpmbuild/BUILD — The directory in which the sources are unpacked, and the software is built.
rpmbuild/RPMS — Contains the binary package files created by the build process.
rpmbuild/SRPMS — Contains the source package files created by the build process.

So here are the file size comparisons after compiling source rpm packages without THUMB, with THUMB and with THUMB2. The rpm macro settings were identical except for the compiler flags: -march=armv5tel (no THUMB), -march=armv5tel -mthumb -mthumb-interwork (THUMB) and -march=armv7 -mthumb (THUMB2).


The numbers in the tables are file and directory sizes in bytes. File sizes were checked with ll (ls -l) and directory sizes with du --bytes.
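For anyone who wants to script the same two measurements, here is a minimal Python sketch. The numbers in this post came from ll and du --bytes, not from this code. The function names are made up for the example, and tree_size only sums regular files, so it should typically come out a little lower than du --bytes, which also counts the directory entries themselves.

```python
import os
import tempfile


def file_size(path):
    """Size of a single file in bytes (the size column of `ls -l`)."""
    return os.path.getsize(path)


def tree_size(root):
    """Sum of the sizes of all regular files under root, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip symlinks, count real files only
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    # Tiny self-check on a throwaway directory instead of a real rpm tree.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.bin"), "wb") as handle:
            handle.write(b"x" * 10)
        print(file_size(os.path.join(root, "a.bin")), tree_size(root))
```

In practice you would point tree_size at the directory where the rpm was extracted and file_size at the binary inside it.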


nled-2.52
Produced RPM: nled-2.52-6.fc12.armv5tel.rpm
Directories and files: 2 directories, 1 file
Binary File: usr/bin/nled


           w/o Thumb  THUMB     THUMB2    THUMB / NON-THUMB  THUMB / THUMB2
RPM        20776      20204     20328     0.9724             0.9939
Directory  50892      43084     42684     0.8465             1.0093
Binary     38604      30796     30396     0.7977             1.0131
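The two ratio columns are plain divisions of the byte counts. As a small sketch, the Python below recomputes them for nled. The helper name ratio_4dp is made up for this example, and it truncates to four decimal places rather than rounding, because that is what the published ratios appear to do (an inference from the numbers, not something stated elsewhere).

```python
def ratio_4dp(numerator, denominator):
    """Return numerator/denominator as text, truncated to 4 decimal places."""
    # Integer arithmetic keeps the truncation exact (no float rounding).
    scaled = numerator * 10000 // denominator
    return f"{scaled // 10000}.{scaled % 10000:04d}"


# nled sizes in bytes, copied from the table: (w/o Thumb, THUMB, THUMB2)
nled = {
    "RPM": (20776, 20204, 20328),
    "Directory": (50892, 43084, 42684),
    "Binary": (38604, 30796, 30396),
}

for label, (non_thumb, thumb, thumb2) in nled.items():
    # Columns: THUMB / NON-THUMB, then THUMB / THUMB2
    print(label, ratio_4dp(thumb, non_thumb), ratio_4dp(thumb, thumb2))
```

A ratio below 1 in the first column means the THUMB build is smaller than the non-THUMB build; a ratio above 1 in the second column means the THUMB build is larger than the THUMB2 build.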


wget-1.11.4
Produced RPM: wget-1.11.4-5.fc12.armv5tel.rpm
Directories and files: 86 directories, 48 files
Binary File: usr/bin/wget


           w/o Thumb  THUMB     THUMB2    THUMB / NON-THUMB  THUMB / THUMB2
RPM        422360     420788    419852    0.9962             1.0022
Directory  1863679    1811467   1810431   0.9719             1.0006
Binary     226464     174252    173216    0.7694             1.0060


httpd-2.2.13
Produced RPM: httpd-2.2.13-4.fc12.armv5tel.rpm
Directories and files: 37 directories, 334 files
Binary File: usr/sbin/httpd


           w/o Thumb  THUMB     THUMB2    THUMB / NON-THUMB  THUMB / THUMB2
RPM        748180     738632    736500    0.9872             1.0029
Directory  2752985    2425373   2408025   0.8810             1.0072
Binary     287988     233660    229712    0.8113             1.0172


gimp-2.6.7
Produced RPM: gimp-2.6.7-2.fc12.armv5tel.rpm
Directories and files: 206 directories, 1204 files
Binary File 1: usr/bin/gimp-2.6
Binary File 2: usr/bin/gimp-console-2.6


           w/o Thumb  THUMB     THUMB2    THUMB / NON-THUMB  THUMB / THUMB2
RPM        12423104   12431184  12350600  1.0006             1.0065
Directory  45780103   43891439  43730427  0.9587             1.0037
Binary 1   4284848    3450936   3371440   0.8054             1.0236
Binary 2   2131228    1688860   1656092   0.7924             1.0198


abiword-2.8.1
Produced RPM: libabiword-2.8.1-1.fc12.armv5tel.rpm
Directories and files: 39 directories, 617 files
Binary File: usr/lib/libabiword-2.8


           w/o Thumb  THUMB     THUMB2    THUMB / NON-THUMB  THUMB / THUMB2
RPM        5561088    4779800   4742828   0.8595             1.0078
Directory  21295155   17055836  16987340  0.8009             1.0040
Binary     5806908    3450936   3371440   0.5943             1.0236


tar-1.22
Produced RPM: tar-1.22-8.fc12.armv5tel.rpm
Directories and files: 79 directories, 49 files
Binary File: bin/tar


           w/o Thumb  THUMB     THUMB2    THUMB / NON-THUMB  THUMB / THUMB2
RPM        744340     742784    742180    0.9979             1.0008
Directory  2737195    2681159   2675975   0.9795             1.0019
Binary     251844     195808    190624    0.7775             1.0272


cpio-2.10
Produced RPM: cpio-2.10-5.fc12.armv5tel.rpm
Directories and files: 49 directories, 30 files
Binary File: bin/cpio


           w/o Thumb  THUMB     THUMB / NON-THUMB
RPM        189908     189276    0.9967
Directory  842194     816194    0.9691
Binary     117064     91064     0.7779


gzip-1.3.12
Produced RPM: gzip-1.3.12-15.fc12.armv5tel.rpm
Directories and files: 9 directories, 33 files
Binary File: bin/gzip


           w/o Thumb  THUMB     THUMB / NON-THUMB
RPM        116560     115736    0.9929
Directory  263922     252086    0.9551
Binary     61708      49872     0.8082


bzip2-1.0.5
Produced RPM: bzip2-1.0.5-6.fc12.armv5tel.rpm
Directories and files: 7 directories, 21 files
Binary File: /usr/bin/bzip2


           w/o Thumb  THUMB     THUMB / NON-THUMB
RPM        49476      49232     0.9951
Directory  109233     103449    0.9470
Binary     33024      28076     0.8502


On average, compared to NON-THUMB, THUMB reduced the sizes by about
2.24% for *.rpm files,
7.67% for extracted directories, and
22.16% for executables.
While it might save some disk space for executable files, it doesn't make a big difference for rpm packages, since rpm has its own compression mechanism.
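These averages can be reproduced from the THUMB / NON-THUMB column of the nine tables above. The sketch below is just that arithmetic in Python; the function name is made up for the example. The executable list has ten entries because gimp contributes two binaries.

```python
from statistics import mean

# THUMB / NON-THUMB ratios copied from the tables above, in package order:
# nled, wget, httpd, gimp, abiword, tar, cpio, gzip, bzip2
rpm = [0.9724, 0.9962, 0.9872, 1.0006, 0.8595, 0.9979, 0.9967, 0.9929, 0.9951]
directory = [0.8465, 0.9719, 0.8810, 0.9587, 0.8009, 0.9795, 0.9691, 0.9551, 0.9470]
# gimp has two binaries, so this list has one extra entry
binary = [0.7977, 0.7694, 0.8113, 0.8054, 0.7924, 0.5943, 0.7775, 0.7779, 0.8082, 0.8502]


def average_reduction_percent(ratios):
    """Average size reduction in percent for a list of THUMB / NON-THUMB ratios."""
    return (1 - mean(ratios)) * 100


for label, ratios in (("rpm", rpm), ("directory", directory), ("binary", binary)):
    print(f"{label}: {average_reduction_percent(ratios):.2f}%")
```

Note that this is an unweighted mean of ratios, so a small package like nled counts as much as gimp.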

On average, THUMB produced files that are larger than THUMB2's by about
0.24% for *.rpm files,
0.45% for extracted directories, and
1.86% for executables.
In other words, THUMB produces files that are almost as small as THUMB2's.


In terms of performance of binaries that are built with THUMB and non-THUMB, there weren't too many programs that I could efficiently test in the command line interface.

I downloaded the source code for openoffice (openoffice.org-3.1.1-19.34.fc12.src.rpm), extracted it twice into two separate directories, and archived each one with a tar binary, one compiled with THUMB and one without. I ran every test 4 times for better accuracy.
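The timings in this post were taken by hand with the shell's time command. For repeated runs, something like the following Python sketch could automate it. This is only an illustration under a few assumptions: the function name time_command is made up, it relies on the Unix-only resource module, and the user and sys values are the CPU time of the child process as reported by getrusage.

```python
import resource
import subprocess
import sys
import time


def time_command(argv, attempts=4):
    """Run argv several times; return real/user/sys seconds for each run."""
    results = []
    for _ in range(attempts):
        before = resource.getrusage(resource.RUSAGE_CHILDREN)
        start = time.perf_counter()
        # Discard the command's output so a verbose flag does not flood the terminal.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        real = time.perf_counter() - start
        after = resource.getrusage(resource.RUSAGE_CHILDREN)
        results.append({
            "real": real,
            "user": after.ru_utime - before.ru_utime,
            "sys": after.ru_stime - before.ru_stime,
        })
    return results


if __name__ == "__main__":
    # Harmless stand-in command; replace it with the binary under test.
    for run in time_command([sys.executable, "-c", "pass"], attempts=2):
        print(f"real {run['real']:.3f}s  user {run['user']:.3f}s  sys {run['sys']:.3f}s")
```

To compare two builds, you would call time_command once with the THUMB binary and once with the non-THUMB binary, using the same arguments and input.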

System: cdot-guru-4-1
File: Extracted openoffice.org-3.1.1-19.34.fc12.src.rpm
Command Used: time ./tar -cvf nothumb.tar ~/temp


             w/o Thumb                          THUMB
             real       user       sys          real       user       sys
1st Attempt  0m33.013s  0m0.090s   0m3.800s     0m45.746s  0m0.170s   0m3.710s
2nd Attempt  0m38.410s  0m0.150s   0m3.860s     0m40.039s  0m0.130s   0m3.760s
3rd Attempt  0m37.572s  0m0.110s   0m3.820s     0m42.034s  0m0.140s   0m3.820s
4th Attempt  0m36.934s  0m0.150s   0m3.670s     0m43.051s  0m0.100s   0m3.830s
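To compare the two columns as averages rather than attempt by attempt, the time output has to be turned into seconds first. Here is a minimal Python sketch of that, using the four tar archiving attempts above; the helper name to_seconds is made up for the example.

```python
import re
from statistics import mean


def to_seconds(stamp):
    """Convert a `time`-style value such as '0m33.013s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (real, user, sys) per attempt, copied from the table above
no_thumb = [("0m33.013s", "0m0.090s", "0m3.800s"), ("0m38.410s", "0m0.150s", "0m3.860s"),
            ("0m37.572s", "0m0.110s", "0m3.820s"), ("0m36.934s", "0m0.150s", "0m3.670s")]
thumb = [("0m45.746s", "0m0.170s", "0m3.710s"), ("0m40.039s", "0m0.130s", "0m3.760s"),
         ("0m42.034s", "0m0.140s", "0m3.820s"), ("0m43.051s", "0m0.100s", "0m3.830s")]


def mean_real(rows):
    """Average wall-clock time in seconds."""
    return mean(to_seconds(real) for real, _user, _sys in rows)


def mean_cpu(rows):
    """Average user + sys CPU time in seconds."""
    return mean(to_seconds(user) + to_seconds(sys_) for _real, user, sys_ in rows)


for label, rows in (("w/o Thumb", no_thumb), ("THUMB", thumb)):
    print(f"{label}: real {mean_real(rows):.2f}s, user+sys {mean_cpu(rows):.2f}s")
```

For these four attempts the average real time is roughly 36.5 seconds without THUMB and 42.7 seconds with THUMB, while the average user + sys time is about 3.9 seconds for both. So most of the gap is in wall-clock time rather than in CPU time spent by tar itself.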

The tar that was compiled with THUMB was a little slower at archiving in every attempt. Just to confirm that both tars archived correctly, I checked the sizes of the tarballs.

-rw-rw-r-- 1 thlee3 thlee3 218183680 2010-11-26 10:43 nothumb.tar
-rw-rw-r-- 1 thlee3 thlee3 218183680 2010-11-26 10:41 thumb.tar
Both tarballs are the same size, so they seem OK.

As a second test, I tried extracting one of the tarballs I created in the first test, using both tars.

System: cdot-guru-4-1
File: nothumb.tar
Command Used: time ~/tar.mthumb-interwork/bin/tar -xvf nothumb.tar


             w/o Thumb                          THUMB
             real       user       sys          real       user       sys
1st Attempt  0m48.399s  0m0.170s   0m3.560s     0m54.197s  0m0.160s   0m3.480s
2nd Attempt  0m46.306s  0m0.170s   0m3.280s     0m48.234s  0m0.160s   0m3.680s
3rd Attempt  0m45.602s  0m0.130s   0m3.800s     0m49.501s  0m0.150s   0m3.490s
4th Attempt  0m46.546s  0m0.110s   0m3.420s     0m52.897s  0m0.120s   0m3.460s

Again, the tar that was built with THUMB extracted the tarball a little slower. I should focus more on the performance difference between binaries built with THUMB and non-THUMB in this project.

In addition to the benchmark above, I also compared the performance of bzip2 and gzip compiled with and without THUMB.
I extracted the gimp-2.6.7-2.fc12.armv5tel.rpm file, archived the extracted directory with tar, and tried compressing and decompressing the archive using the bzip2 and gzip builds compiled with and without THUMB.

System: cdot-guru-4-1
File: gimp.tar (44MB)
Command Used: time ./bzip2 ~/gimp.tar


             w/o Thumb                          THUMB
             real       user       sys          real       user       sys
1st Attempt  1m4.616s   0m59.250s  0m0.650s     1m1.379s   0m57.240s  0m0.720s
2nd Attempt  1m4.985s   1m0.530s   0m0.790s     0m59.899s  0m57.190s  0m0.600s
3rd Attempt  1m5.214s   0m58.050s  0m0.620s     1m7.306s   1m0.100s   0m0.530s
4th Attempt  1m4.292s   0m56.880s  0m0.700s     1m4.859s   0m57.690s  0m0.770s
5th Attempt  0m59.811s  0m57.520s  0m0.470s     1m1.930s   0m57.210s  0m0.540s
6th Attempt  1m3.452s   0m57.040s  0m0.590s     1m1.532s   0m57.930s  0m0.790s
7th Attempt  1m0.604s   0m57.710s  0m0.650s     1m2.430s   0m57.420s  0m0.730s

For compression with bzip2, it shows almost no difference in performance.


System: cdot-guru-4-1
File: gimp.tar.bz2
Command Used: time ./bzip2 -d ~/gimp.tar.bz2


             w/o Thumb                          THUMB
             real       user       sys          real       user       sys
1st Attempt  0m18.828s  0m17.430s  0m0.820s     0m19.662s  0m17.850s  0m0.990s
2nd Attempt  0m19.549s  0m17.610s  0m1.160s     0m19.647s  0m17.940s  0m0.850s
3rd Attempt  0m19.462s  0m17.600s  0m0.950s     0m19.271s  0m17.620s  0m0.870s
4th Attempt  0m23.303s  0m17.800s  0m0.850s     0m22.863s  0m17.290s  0m1.120s
5th Attempt  0m24.544s  0m18.120s  0m1.060s     0m19.155s  0m17.560s  0m1.030s
6th Attempt  0m19.397s  0m17.540s  0m0.800s     0m22.998s  0m17.400s  0m0.970s
7th Attempt  0m19.462s  0m17.460s  0m1.090s     0m19.688s  0m17.910s  0m0.870s

Similar to compression, decompression with bzip2 shows no noticeable difference in performance.


System: cdot-guru-4-1
File: gimp.tar
Command Used: time ./gzip ~/gimp.tar


             w/o Thumb                          THUMB
             real       user       sys          real       user       sys
1st Attempt  0m32.809s  0m30.620s  0m0.780s     0m35.226s  0m33.390s  0m0.770s
2nd Attempt  0m43.640s  0m38.300s  0m0.790s     0m38.300s  0m34.380s  0m0.650s
3rd Attempt  0m45.471s  0m33.280s  0m0.670s     0m42.994s  0m36.880s  0m0.710s
4th Attempt  0m42.212s  0m32.690s  0m0.510s     0m41.896s  0m34.540s  0m0.740s
5th Attempt  0m43.309s  0m31.400s  0m0.740s     0m39.998s  0m35.240s  0m0.840s
6th Attempt  0m45.217s  0m31.850s  0m0.550s     0m39.942s  0m34.540s  0m0.670s

The numbers were up and down for compressing with gzip. I actually made about 15 attempts and the results weren't similar at all.
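One way to put a number on "up and down" is the mean and the sample standard deviation of each column. The sketch below does that in Python for the six gzip compression attempts listed above (not all of the roughly 15 attempts); the values are the real and user times converted to seconds by hand, and the function name summarize is made up for the example.

```python
from statistics import mean, stdev

# gzip compression timings in seconds, converted from the table above
no_thumb = {
    "real": [32.809, 43.640, 45.471, 42.212, 43.309, 45.217],
    "user": [30.620, 38.300, 33.280, 32.690, 31.400, 31.850],
}
thumb = {
    "real": [35.226, 38.300, 42.994, 41.896, 39.998, 39.942],
    "user": [33.390, 34.380, 36.880, 34.540, 35.240, 34.540],
}


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(samples), stdev(samples)


for label, data in (("w/o Thumb", no_thumb), ("THUMB", thumb)):
    for kind in ("real", "user"):
        avg, spread = summarize(data[kind])
        print(f"{label} {kind}: mean {avg:.2f}s, stdev {spread:.2f}s")
```

On these six attempts the mean real time is lower for THUMB, while the mean user time is lower for non-THUMB, which fits the observation that the results do not point clearly in one direction.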



System: cdot-guru-4-1
File: gimp.tar.gz
Command Used: time ./gzip -d ~/gimp.tar.gz


             w/o Thumb                          THUMB
             real       user       sys          real       user       sys
1st Attempt  0m3.341s   0m2.520s   0m0.750s     0m3.688s   0m3.040s   0m0.530s
2nd Attempt  0m3.702s   0m2.740s   0m0.630s     0m4.876s   0m2.950s   0m0.650s
3rd Attempt  0m3.372s   0m2.680s   0m0.600s     0m4.889s   0m2.970s   0m0.580s
4th Attempt  0m3.512s   0m2.760s   0m0.470s     0m4.800s   0m2.890s   0m0.650s
5th Attempt  0m4.413s   0m2.800s   0m0.470s     0m4.016s   0m2.930s   0m0.620s
6th Attempt  0m3.316s   0m2.430s   0m0.800s     0m4.033s   0m2.960s   0m0.640s

The numbers didn't fluctuate as much for decompressing with gzip, and it seems that the gzip compiled with THUMB performs a little slower overall.


Lastly, I compared THUMB and NON-THUMB using cpio. I archived the same extracted gimp directory as in the benchmarks above and measured the time.

System: cdot-guru-4-1
File: ~/gimp/ (206 directories, 1204 files)
Command Used: time find ~/gimp/ -depth -print | ./cpio -ov -O ~/gimp.cpio


             w/o Thumb                          THUMB
             real       user       sys          real       user       sys
1st Attempt  0m7.311s   0m0.330s   0m2.300s     0m7.432s   0m0.290s   0m2.480s
2nd Attempt  0m8.064s   0m0.300s   0m2.460s     0m7.574s   0m0.230s   0m2.510s
3rd Attempt  0m6.415s   0m0.280s   0m2.430s     0m6.954s   0m0.300s   0m2.510s
4th Attempt  0m7.167s   0m0.250s   0m2.500s     0m7.129s   0m0.240s   0m2.520s
5th Attempt  0m8.562s   0m0.220s   0m2.540s     0m6.228s   0m0.290s   0m2.500s
6th Attempt  0m6.851s   0m0.210s   0m2.580s     0m7.883s   0m0.240s   0m2.610s
7th Attempt  0m7.382s   0m0.290s   0m2.420s     0m7.855s   0m0.320s   0m2.430s

The two cpio builds did not show a major difference in performance.


At this point, I concluded that THUMB does not greatly affect software performance, although the effect varies depending on the software.

I also recorded how long rpmbuild took to compile each package. I did not notice any major difference in compile times, although this is hard to measure accurately because the durations fluctuated in every attempt I made.


         w/o Thumb                                THUMB
         real         user         sys            real         user         sys
nled     0m13.701s    0m12.190s    0m1.020s       0m14.821s    0m11.830s    0m1.060s
wget     3m57.207s    3m1.920s     0m40.610s      3m59.251s    3m0.750s     0m39.810s
gimp     112m40.989s  92m48.050s   10m34.130s     110m46.166s  90m13.740s   10m32.150s
abiword  148m58.630s  130m32.050s  8m51.320s      140m30.739s  123m38.600s  8m46.910s


So far, THUMB has mostly shown improvements over NON-THUMB. Obviously, the number of experiments I ran is far from enough to give a firm answer, but for now I'm positive that THUMB for Fedora-ARM is worth further development.

Please check out my project page in Seneca Cdot Wiki at

This concludes my project To Thumb? or Not to Thumb? and SBR600 in Seneca College. I would like to thank Chris Tyler and Paul Whalen for all the assistance. It has been a great experience to be in SBR600 and be part of CDOT. Thank you.
