[maemo-developers] Profiling on Nokia N810

From: Eero Tamminen eero.tamminen at nokia.com
Date: Fri Sep 5 18:29:30 EEST 2008

Hi,

Please keep the maemo-devel on CC.

ext Bruno wrote:
> I tried to use opreport, but the results are shown on the Nokia
> screen, and are unreadable.

Sorry, I don't understand.  How are they unreadable?
(you're using ssh to the device, aren't you?)


> On top of that it doesn't seem to give much
> information (I did opreport -c -l as said in the doc).

You need to have debug symbol packages (available from the maemo
repositories) installed if you want function names.  If you want
Oprofile call-graphs from ARM, you need to re-compile all the related
software with frame-pointers, but that's really too much pain,
especially considering how much nicer the Kcachegrind UI[1] is for
analysing them and the source code:
	http://kcachegrind.sourceforge.net/cgi-bin/show.cgi
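
For reference, a minimal sketch of a Callgrind run that produces data
Kcachegrind can open.  It assumes the program is the ./src binary from
the quoted session, built for x86, and that valgrind is installed in
that environment; the guard is only there so the snippet is harmless
where either is missing.

```shell
PROG=./src    # program name from the quoted session; adjust to your binary

if command -v valgrind >/dev/null 2>&1 && [ -x "$PROG" ]; then
    # Callgrind writes its data to callgrind.out.<pid> in the current directory
    valgrind --tool=callgrind "$PROG" || echo "callgrind run failed" >&2
    # Afterwards, on the desktop: kcachegrind callgrind.out.<pid>
else
    echo "valgrind or $PROG not available; nothing profiled" >&2
fi
```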


Btw. Oprofile also has a UI:
	http://maemo.org/development/tools/doc/diablo/oprofileui/

You can run the UI on the desktop and connect to the oprofile-server
running on the device (the debug symbol files will still need to
be on the device for function names).


> Is it possible to have a text output with opreport ?

It outputs text by default, doesn't it?
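
If the problem is only that the output scrolls past on the device
screen, redirecting it to a file may be enough.  A small sketch, assuming
the profiled binary is ./src as in the quoted session; report.txt is just
a made-up file name, and the guard only keeps the snippet harmless where
opreport is not installed.

```shell
PROG=./src           # program name from the quoted session
REPORT=report.txt    # hypothetical output file name

if command -v opreport >/dev/null 2>&1; then
    # -l lists samples per symbol; both stdout and stderr go to the file
    opreport -l "$PROG" > "$REPORT" 2>&1 || echo "opreport failed; see $REPORT" >&2
else
    echo "opreport not installed here; skipping" >&2
fi
```

The file can then be copied off the device (e.g. with scp) and read in
any editor.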


> Concerning profiling on x86, will the result be similar to what I
> would get profiling on the Nokia?

Yes, in general that seems to be true.


> What's the point of profiling in an x86 Scratchbox environment?  It
> would be the same on any normal Linux x86 computer, wouldn't it?

The point is to get the same environment as on the target.  Same versions
of all the libraries, X server with 16-bit display (Xephyr) and a certain
set of X extensions etc.  The more differences you have in your
environment, the less the results will correspond.


> I wanted to profile it on ARM because some parts will be harder to
> process with that kind of architecture, especially the floating point
> calculation parts I guess.
>
> But if these results are proportional to what I would get on ARM,
> then I will probably do that on x86!

I don't know how true that is for floating point operations, ARM VFP is
a bit limited compared to Intel FPUs.  So, I would:
- do speed measurements on ARM (you need those tests anyway to
  validate the optimizations)
- find bottleneck functions on ARM with Oprofile + debug symbol
  packages (or, instead of installing debug packages, re-build the
  sources with "-g" added to the compiler flags)
- do the main work in performance analysis, i.e. getting an
  understanding of how the code really works :-), with Kcachegrind
  (AFAIK the best available open source perf tool, and important when
  reading lots of code written by others).
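
The speed measurements on ARM do not need anything fancy.  A rough
sketch of a wall-clock timing helper follows; time_runs is a made-up
name, and it assumes your "date" supports +%s (GNU and busybox both do).
The resolution is whole seconds, so use enough runs to get a meaningful
number.

```shell
# Run a command a given number of times and print the elapsed seconds.
# Usage: time_runs <count> <command> [args...]
time_runs() {
    runs=$1; shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the output and ignore the exit status; only time matters here
        "$@" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# Smoke test with a trivial command; replace "true" with your program
elapsed=$(time_runs 3 true)
echo "3 runs took ${elapsed}s"
```

On the device you would run e.g. "time_runs 10 ./src" before and after
each optimization, without the profiler running.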

It's important to use different methods as they can point out different
things.  Valgrind/cachegrind profiles only a single process, but
Oprofile profiles the whole system.  I.e. from Oprofile data you can
also see if you're stressing some other part of the system than your
own program, and then try optimizing your use of it.
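
To look at the system-wide picture, opreport can be run without naming
a binary, in which case it prints a summary per binary image.  A small
sketch; the line count is an arbitrary choice, and the guard only keeps
the snippet harmless where opreport is missing.

```shell
TOP_LINES=20    # arbitrary: how many lines of the summary to show

if command -v opreport >/dev/null 2>&1; then
    # With no arguments, opreport summarizes samples per binary image
    opreport 2>&1 | head -n "$TOP_LINES"
else
    echo "opreport not installed here; skipping" >&2
fi
```

If something other than your own program is at the top of that list,
that is the part of the system you are stressing.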


	- Eero

> Bruno
> 
> 
> 2008/9/5, Eero Tamminen <eero.tamminen at nokia.com>:
>> Hi,
>>
>>  ext Bruno wrote:
>>
>>>  I've been trying to profile my program on the nokia N810 for some
>>>  time now, and I'm not able to get good results. I installed the
>>>  oprofile package and the oprofile modified kernel, and then tried to
>>>  run gprof.
>>>
>>>  I tried 2 ways to get my profiling information (my program is called
>>>  src ... not really explicit I know ! ) :
>>>
>>>  compile with normal compiler parameters, no -pg. Then :
>>>
>>>  Nokia-N810:~# opcontrol --init
>>>  Nokia-N810:~# opcontrol --no-vmlinux
>>>  Nokia-N810:~# opcontrol -e=CPU_CYCLES:100000
>>>  Nokia-N810:~# opcontrol --start
>>>  Nokia-N810:~# ./src
>>>  Nokia-N810:~# opcontrol --stop
>>>  Nokia-N810:~# opgprof src
>>>  Nokia-N810:~# gprof src > lala.txt
>>>  gprof: gmon.out is file is missing call-graph data
>>>  Nokia-N810:~# gprof -Q src > lala.txt
>>>
>>  Why not just use "opreport" like suggested in the documentation:
>>
>> http://maemo.org/development/tools/doc/diablo/oprofile/
>>  ?
>>
>>  FYI: if I want callgraphs, I'll profile on x86 with Valgrind+callgrind
>>  (in Scratchbox) and view the results with Kcachegrind (outside
>>  Scratchbox).  Callgrind gives *much* better callgraphs and UI/usability
>>  than oprofile or gprof.
>>
>>  This of course assumes that your source code works the same on ARM
>>  and x86.
>>
>>
>>  Summary:
>>  - Oprofile for finding ARM bottleneck functions
>>  - Timing code to measure the performance on ARM (profiling disturbs
>>   the code functionality so it's not to be trusted too much)
>>  - Valgrind/Callgrind/Kcachegrind on x86 to _analyze_ the bottlenecks
>>   (why/how the bottlenecks are used by the running code)
>>
>>  I've found that x86 profiling results are mostly accurate even for
>>  ARM, it's rare for major bottlenecks to differ between these two
>>  architectures (although that may happen due to cache size
>>  differences and VFP vs. FPU) if you've otherwise guaranteed
>>  that the execution environments match.
>>
>>
>>         - Eero
>>

