Saturday, December 1, 2012

Using performance counters on Linux

One extremely useful feature Linux has to offer is the ability to profile your whole system, both user space and kernel, using the perf utility.

In a nutshell, perf lets you count hardware and software events in the Linux kernel. As a bonus, while counting these events you can also record the CPU instruction pointer at the time of each event. These instruction pointer records can later be used to generate a concise execution profile of kernel and user space code.

Similarly to git, perf is organized as a set of sub-commands, each providing a different piece of functionality (this post uses list, stat, record and report).

The first step is to list available events:

root@dev-12:# perf list

List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]

  cpu-clock                                          [Software event]
  task-clock                                         [Software event]
  page-faults OR faults                              [Software event]

There are tons of events which we can divide into the following main categories:
  • Hardware events
  • Hardware cache events
  • Software events
  • Tracepoint events
Tracepoint events are special places in the kernel that developers marked as good positions to trace. Stopping at one usually brings you to the point where some important kernel function starts or completes.

For example, the block:block_rq_complete tracepoint is hit when a block I/O request completes.
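Under the hood, perf opens a tracepoint through the perf_event_open(2) system call with type PERF_TYPE_TRACEPOINT and the tracepoint's numeric id as config; that id is read from the tracing filesystem. Here is a minimal sketch of the lookup, assuming tracefs is reachable under /sys/kernel/debug/tracing. TRACING_DIR, tracepoint_id_path and tracepoint_attr are my own illustrative names, not part of perf:

```c
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>

/* Assumed tracefs location. Older kernels expose it only under debugfs,
 * newer ones also at /sys/kernel/tracing. */
#define TRACING_DIR "/sys/kernel/debug/tracing"

/* Turn "subsystem:event" (e.g. "block:block_rq_complete") into the path of
 * the file that holds the tracepoint's numeric id:
 *   TRACING_DIR/events/block/block_rq_complete/id
 * Returns 0 on success, -1 if there is no ':' or the buffer is too small. */
int tracepoint_id_path(const char *name, char *buf, size_t len)
{
    const char *colon = strchr(name, ':');
    if (colon == NULL)
        return -1;
    int n = snprintf(buf, len, "%s/events/%.*s/%s/id", TRACING_DIR,
                     (int)(colon - name), name, colon + 1);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the id and fill a PERF_TYPE_TRACEPOINT attr with it.
 * Reading the id usually requires root; returns -1 on any failure. */
int tracepoint_attr(const char *name, struct perf_event_attr *attr)
{
    char path[512];
    unsigned long long id;

    if (tracepoint_id_path(name, path, sizeof path) != 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%llu", &id) == 1);
    fclose(f);
    if (!ok)
        return -1;

    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_TRACEPOINT; /* for tracepoints, config is the id */
    attr->config = id;
    return 0;
}
```

Run as root, tracepoint_attr("block:block_rq_complete", &attr) followed by perf_event_open on that attr should count block request completions, much like perf stat -e block:block_rq_complete does.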

Hardware events use special CPU registers (the performance counters), which count CPU-specific hardware events and trigger an interrupt when a certain threshold is passed. Software events do not require special hardware support; they are usually generated by the kernel handlers that process events such as page faults.
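Each name printed by perf list boils down to a (type, config) pair in struct perf_event_attr, the structure perf hands to the perf_event_open(2) system call. The table below is a hand-written sketch covering a few hardware, hardware cache and software events; event_attr_from_name is a hypothetical helper, not perf's real event parser, which knows many more names and aliases:

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

struct event_desc {
    const char *name;
    uint32_t type;
    uint64_t config;
};

/* Hardware cache events pack cache id, operation and result into config. */
#define HW_CACHE(cache, op, result) \
    ((uint64_t)(cache) | ((uint64_t)(op) << 8) | ((uint64_t)(result) << 16))

static const struct event_desc events[] = {
    { "cycles",        PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES },
    { "instructions",  PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS },
    { "cache-misses",  PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES },
    { "branch-misses", PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES },
    { "L1-dcache-load-misses", PERF_TYPE_HW_CACHE,
      HW_CACHE(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ,
               PERF_COUNT_HW_CACHE_RESULT_MISS) },
    { "cpu-clock",     PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK },
    { "task-clock",    PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK },
    { "page-faults",   PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS },
};

/* Fill attr for the named event. Returns 0 on success, -1 if unknown
 * (attr is left untouched in that case). */
int event_attr_from_name(const char *name, struct perf_event_attr *attr)
{
    for (size_t i = 0; i < sizeof events / sizeof events[0]; i++) {
        if (strcmp(events[i].name, name) == 0) {
            memset(attr, 0, sizeof *attr);
            attr->size = sizeof *attr; /* lets the kernel check the ABI version */
            attr->type = events[i].type;
            attr->config = events[i].config;
            return 0;
        }
    }
    return -1;
}
```

Tracepoints are the odd one out: their config is not a fixed constant but an id the kernel assigns at runtime.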

To watch event statistics you can use the stat command:

root@dev-12:# perf stat -e syscalls:sys_enter_write,page-faults \
                                       echo hello world > 1.txt

 Performance counter stats for 'echo hello world':

                 1 syscalls:sys_enter_write                                    
               165 page-faults                                                 

       0.001013974 seconds time elapsed

The -e flag selects which events to count, and the -p flag counts events of an already running process instead of launching a new command.
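perf stat works in counting mode: open a counter with perf_event_open(2), run the workload, read back one 64-bit value. Here is a minimal C sketch that counts page faults for a piece of code in the current thread; count_page_faults and its callback parameter are illustrative names. It sets exclude_kernel because unprivileged users are typically not allowed to count kernel-side events (see /proc/sys/kernel/perf_event_paranoid), so faults taken while the kernel itself touches user memory are not counted and the number may be lower than what perf stat reports as root:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open(2), so go through syscall(2). */
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                           int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Count user-space page faults while workload(arg) runs in this thread.
 * Returns the count, or -1 if the kernel refuses the event (paranoid
 * setting, container seccomp policy, ...). */
long long count_page_faults(void (*workload)(void *), void *arg)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_PAGE_FAULTS;
    attr.disabled = 1;       /* created stopped; enabled around workload() */
    attr.exclude_kernel = 1; /* usually required for unprivileged users */
    attr.exclude_hv = 1;

    /* pid 0 = calling thread, cpu -1 = whichever CPU it runs on. */
    int fd = perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    workload(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0; /* default read_format: a single u64 value */
    ssize_t n = read(fd, &count, sizeof count);
    close(fd);
    return n == (ssize_t)sizeof count ? (long long)count : -1;
}
```

Passing another process's pid instead of 0 attaches the counter to that process, which is the mechanism behind perf stat -p.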

The next step in using perf is to realize that events can also be recorded, using the record command:

root@dev-12:# perf record -f hdparm -t /dev/vda > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.038 MB (~1673 samples) ]
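Recording uses sampling mode instead: the counter gets a sample_period, raises an interrupt each time that many events have been counted, and the kernel writes a PERF_RECORD_SAMPLE into a ring buffer that perf mmap()s and saves to perf.data. The sketch below shows an attr asking for instruction-pointer-only samples, plus a walker over records already copied out of the ring buffer. The mmap and data_head/data_tail bookkeeping of struct perf_event_mmap_page is left out, and sampling_attr and collect_ips are hypothetical names:

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sample the cpu-clock software event every `period` units (nanoseconds
 * for cpu-clock); each sample carries only the instruction pointer. */
void sampling_attr(struct perf_event_attr *attr, uint64_t period)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->sample_period = period;
    attr->sample_type = PERF_SAMPLE_IP; /* sample body = one u64 ip */
    attr->disabled = 1;
}

/* Walk a buffer of perf records and collect the ip of every sample.
 * Other record types (mmap, comm, ...) are skipped using header.size.
 * Returns the number of ips stored (at most max_ips). */
size_t collect_ips(const unsigned char *buf, size_t len,
                   uint64_t *ips, size_t max_ips)
{
    size_t off = 0, n = 0;

    while (off + sizeof(struct perf_event_header) <= len) {
        struct perf_event_header hdr;
        memcpy(&hdr, buf + off, sizeof hdr);
        if (hdr.size < sizeof hdr || off + hdr.size > len)
            break; /* truncated or corrupt record */
        if (hdr.type == PERF_RECORD_SAMPLE && n < max_ips &&
            hdr.size >= sizeof hdr + sizeof(uint64_t))
            memcpy(&ips[n++], buf + off + sizeof hdr, sizeof(uint64_t));
        off += hdr.size;
    }
    return n;
}
```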

Later, an execution profile can be generated with the report command (each time an event occurs, the current instruction pointer is recorded as well):

root@dev-12:/home/local/a/hdparm-9.37# perf report
# Events: 1K cpu-clock
# Overhead  Command      Shared Object                                Symbol
# ........  .......  .................  ....................................
    25.80%   hdparm  [kernel.kallsyms]  [k] copy_user_generic_string
     8.21%   hdparm  [kernel.kallsyms]  [k] blk_flush_plug_list
     4.69%   hdparm  [kernel.kallsyms]  [k] get_page_from_freelist
     3.62%   hdparm  [kernel.kallsyms]  [k] add_to_page_cache_locked
     3.09%   hdparm  hdparm             [.] read_big_block
     2.77%   hdparm  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     2.51%   hdparm  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.87%   hdparm  [kernel.kallsyms]  [k] __mem_cgroup_commit_charge
     1.76%   hdparm  [kernel.kallsyms]  [k] file_read_actor
     1.76%   hdparm  [kernel.kallsyms]  [k] __alloc_pages_nodemask
     1.55%   hdparm  [kernel.kallsyms]  [k] alloc_pages_current
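Roughly speaking, the Overhead column is each symbol's share of the recorded samples: perf report maps every saved instruction pointer to the symbol that contains it (using the binary's symbol tables, and kallsyms for the [kernel.kallsyms] entries) and counts the hits. Strictly, perf weights each sample by its period, which makes no difference when the period is fixed. Here is a simplified sketch with a hypothetical, pre-sorted symbol table and a binary search; struct sym, resolve, attribute_samples and overhead_pct are illustrative names:

```c
#include <stddef.h>
#include <stdint.h>

/* One symbol covering [start, start + size); the table must be sorted by
 * start and non-overlapping. */
struct sym {
    uint64_t start, size;
    const char *name;
    unsigned long hits;
};

/* Binary search for the symbol containing ip, or NULL. */
static struct sym *resolve(struct sym *syms, size_t n, uint64_t ip)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ip < syms[mid].start)
            hi = mid;
        else if (ip >= syms[mid].start + syms[mid].size)
            lo = mid + 1;
        else
            return &syms[mid];
    }
    return NULL;
}

/* Attribute every sampled ip to a symbol. Returns how many samples matched
 * no symbol (perf prints those as raw addresses such as 0x16b0). */
size_t attribute_samples(struct sym *syms, size_t nsyms,
                         const uint64_t *ips, size_t nips)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < nips; i++) {
        struct sym *s = resolve(syms, nsyms, ips[i]);
        if (s)
            s->hits++;
        else
            unresolved++;
    }
    return unresolved;
}

/* Overhead of one symbol as a percentage of all samples. */
double overhead_pct(const struct sym *s, size_t total_samples)
{
    return total_samples ? 100.0 * (double)s->hits / (double)total_samples
                         : 0.0;
}
```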

We can filter out the kernel (and library) code, keeping only samples from the hdparm binary itself, and see where in our program we spend most of our time:

root@dev-12:# perf report -d hdparm
# Events: 64  cpu-clock
# Overhead  Command          Symbol
# ........  .......  ..............
    90.62%   hdparm  read_big_block
     9.38%   hdparm  time_device
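Note that the -d view re-normalizes: the percentages are relative to the 64 samples that landed in the hdparm binary, not to all recorded samples, which is why read_big_block jumps from 3.09% in the full report to 90.62% here. Assuming every sample carries the same weight, the two rows work out to 58 and 6 of those 64 samples (perf prints the values rounded to two decimals):

```latex
\text{Overhead}(s) = \frac{N_s}{N_{\text{hdparm}}} \times 100\%, \qquad
\frac{58}{64} \times 100\% = 90.625\%, \qquad
\frac{6}{64} \times 100\% = 9.375\%
```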

In the runs above, cpu-clock was the event at which perf stopped to look at our code execution (as the report header shows). We can sample on different events to find out interesting things, for example which places in our code cause page faults:

root@dev-12:# perf record -f -e page-faults -F 100000 hdparm -t /dev/vda > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.006 MB (~278 samples) ]
root@dev-12:# perf report
# Events: 87  page-faults
# Overhead  Command      Shared Object                    Symbol
# ........  .......  .................  ........................
    72.05%   hdparm  hdparm             [.] prepare_timing_buf
    10.96%   hdparm         [.] 0x16b0          
     5.34%   hdparm       [.] 0x86fc0         
     2.53%   hdparm  [kernel.kallsyms]  [k] copy_user_generic_string
     1.26%   hdparm       [.] __ctype_init
     1.26%   hdparm       [.] _IO_vfscanf
     1.26%   hdparm       [.] strchrnul
     1.26%   hdparm       [.] mmap64
     1.26%   hdparm  hdparm             [.] main
     1.26%   hdparm  hdparm             [.] get_dev_geometry
     1.26%   hdparm  [kernel.kallsyms]  [k] __strncpy_from_user
     0.28%   hdparm  [kernel.kallsyms]  [k] __clear_user
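The -F 100000 flag puts perf record into frequency mode: instead of sampling every N events, the kernel keeps adjusting the period so that it aims for about 100000 samples per second, within the limit set by /proc/sys/kernel/perf_event_max_sample_rate. A sketch of the corresponding attr; page_fault_sampling_attr is a hypothetical helper name, and the exact sample_type perf requests may differ:

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Roughly the attr behind `perf record -e page-faults -F <freq>`. */
void page_fault_sampling_attr(struct perf_event_attr *attr, uint64_t freq)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_PAGE_FAULTS;
    attr->freq = 1;           /* treat the union below as a frequency */
    attr->sample_freq = freq; /* shares storage with sample_period */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
    attr->disabled = 1;
    attr->enable_on_exec = 1; /* start counting once the traced command exec()s */
}
```

A fixed period is the other option: perf record -c 1 sets sample_period to 1 (with freq left at 0), so every single page fault would be recorded.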

To conclude, perf is an extremely powerful tool. It provides accurate statistics with very little overhead, doesn't require recompilation, and can be used to understand in depth the overheads of kernel modules and user space applications.

What a nice addition to the arsenal of your favorite hacking tools :)

Saturday, November 24, 2012

Software Craftsmanship vs Hackers

Recently it occurred to me that there are two main schools of programmers in the world. The two groups have opposite values and possess orthogonal skills.

The first group, called "Software craftsmanship", focuses on the art of writing beautiful and flexible code. The talented people in this group build beautiful software designs, practice TDD/BDD and refactor their code until the last code smell disappears. Their role models are people like Uncle Bob, and they have all read The Pragmatic Programmer and Clean Code.

The other group is what we call "hackers": not only in the sense of people practicing computer security and reverse engineering, but also in the broader sense of people with a low-level orientation, who can quickly understand how a closed system works and modify it to serve their purposes. These extraordinary people can write truly amazing software in a short period of time and use one line of Perl to destroy somebody's world (try the following piece of Perl if you feel lucky).

# /usr/bin/env perl
# Danger !!! This script can kill your root file system

$? ? s:;s:s;;$?: : s;;=]=>%-{<-|}<&|`{; ;
y; -/:-@[-`{-};`-{/" -; ;

Now, if you ask your fellow "hacker" to design a production system, there is a high probability that you will get it soon enough, written in ASSEMBLY :). Or you will get it in your favorite C++, but without classes, inheritance, templates, STL and all the fancy stuff real hackers never use. And god forbid any unit tests or comments in the code, because it is written in plain C++ or C or whatever... and you should know how to read code! And it works! Unless there is a bug, which you can fix in 1 sec :)

If you ask your enlightened friend from the "Software craftsmanship" group to find out why your mouse stops working whenever you send a UDP packet to port 666, there is a good chance that you will find him, after 10 hours of googling and sending mails to vendors and support teams all around the world, telling you that your mouse is stupid, that there is no real service on port 666, and that there is no problem with port 667, so who cares...

So what should you do? Which group should you join? As with everything in the world, it is not just black and white. In my opinion, a good programmer should have a balance of both types of skills. Don't compromise on the quality of your code, and don't be afraid of low-level stuff. Create quick prototypes when you need to, but remember that real people will later have to read and maintain your code. And most importantly, remember that it is better to be healthy and rich than poor and sick (or just poor, or just sick).

Friday, November 9, 2012

Another one bites the blog

So after many, many years of not doing what the rest of the world is doing (eating hamburgers?), I finally decided to join the community and start my own technical blog.

In this blog I hope to focus on some of the interesting stuff I enjoy in my profession: low level programming, kernel, storage, virtualization and functional programming.

The name of the blog, meta-x86, merges two things: the shortcut in my favorite editor (meta+x) and the architecture I enjoy hacking.

Get ready for quality posts :)