[maemo-developers] Ogg vorbis/tremor dsp task questions

Thu Sep 6 01:40:38 EEST 2007

Hello all,

Don't get too excited: I'm writing code, but I don't have anything 
working yet. I should also add that I need to check the copyright on 
some of the ASM code I'm using before I can release much of anything, so 
this is a theoretical discussion more than anything else.

There are some bits of C/pseudo-C code here (which is me getting my 
ideas down more than anything else) if anyone wants to look at the 
structures I am/was planning to use: 
http://people.bath.ac.uk/enpsgp/nokia770/dsp/vorbisdec/

Anyway, my email is really to pick people's brains as to how to 
implement the split between the ARM and the DSP. I'm picking up the way 
that Tremor works as I go along, but am by no means an expert.

The problem is really how to split the work across the two processors. 
Using the DSP gateway one usually sends either buffers of data or 
single word data to the DSP and it then processes them (or data in 
shared memory) and returns. It is (I believe) bad form to run a 
function on the DSP that never returns. For one thing polling must be 
disabled, and for another I'm not sure that any other DSP tasks would 
be able to run at the same time (e.g. a pcm dsp sink).

Nevertheless my first try was to run the whole of the Tremor code on 
the DSP. I wrote some callbacks for use with the ov_open_callbacks() 
function so that data could be written to the shared memory buffer when 
requested by the DSP. The DSP also signals the ARM (which blocks and 
waits for signals from the DSP) when it needs more data or has pcm data 
to be read from the output buffer.
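
As a rough sketch of what such a read callback might look like: the 
function below has the same signature as the read_func member of 
ov_callbacks, but pulls bytes from a plain in-memory buffer. The 
shm_source struct and the shm_read name are made up for illustration, and 
the buffer is an ordinary array standing in for the real shared memory. 
It is not the code at the link above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the shared-memory input buffer. */
typedef struct {
    const unsigned char *data; /* bytes the ARM has written */
    size_t len;                /* number of valid bytes in data */
    size_t pos;                /* next byte to hand to the decoder */
} shm_source;

/* Same shape as ov_callbacks.read_func: copy up to nmemb items of
 * `size` bytes into ptr and return the number of whole items copied. */
static size_t shm_read(void *ptr, size_t size, size_t nmemb, void *datasource)
{
    shm_source *src = datasource;
    size_t want, avail;

    if (size == 0 || nmemb == 0)
        return 0;

    want = size * nmemb;
    avail = src->len - src->pos;
    if (want > avail)
        want = avail - (avail % size); /* only hand over whole items */

    memcpy(ptr, src->data + src->pos, want);
    src->pos += want;
    return want / size;
}
```

In this sketch an empty buffer simply returns 0, which the decoder treats 
as end of data. In the scheme described above, that is the point where 
the DSP would instead signal the ARM and wait for a refill.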

This code is not completely finished - I need to sort out the block 
allocation code on the DSP side, but it's heading the right way once I 
clean it up (see link above). But... this doesn't really seem to be the 
"right way" to do this job. The DSP is called using one of the usual 
word-receive callbacks and then enters a function from which it never 
returns (i.e. the ARM can send one message to the DSP and can then never 
send another via the DSP gateway mechanisms). I've no idea what this 
will do to other DSP tasks, but it just feels wrong.

My next thought was to try to separate out the file opening and leave 
that on the ARM and to send the DSP complete vorbis packets to process 
and use. This should work nicely for sound data as a single vorbis 
packet is processed at one time by ov_read() and output. Therefore the 
ARM could signal the DSP that a new packet is available and that it 
should process it and then return. Unfortunately it's a bit more 
complicated as sometimes extra packets are needed to set up the 
codebooks, etc. This doesn't sound too bad in theory (the DSP could 
signal that it needs 3 packets for the decoder setup and then wait for 
the ARM to send them over, then continue), but the code is pretty well 
mixed in together (in ov_read() and the functions it calls). This is 
where I'm asking for some help/advice/pointers to useful docs/different 
code. I think that this approach is probably the cleanest split, it 
just needs either more thought on my part, or some outside input from 
vorbis/tremor experts.
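
To make the packet-at-a-time idea concrete, here is a minimal sketch of a 
DSP-side entry point that accepts one packet per call and always returns. 
It relies on two facts from the Vorbis I spec: the three header packets 
start with a type byte of 1, 3 or 5 followed by the string "vorbis", and 
audio packets have the lowest bit of the first byte clear. The names 
dsp_decoder, classify_packet and dsp_submit_packet are hypothetical, and 
the actual header parsing and block decoding are left as comments.

```c
#include <stddef.h>
#include <string.h>

typedef enum { PKT_AUDIO, PKT_HEADER, PKT_INVALID } pkt_kind;

/* Decide what a raw Vorbis packet is from its first bytes. */
static pkt_kind classify_packet(const unsigned char *p, size_t len)
{
    if (len == 0)
        return PKT_INVALID;
    if ((p[0] & 1) == 0)
        return PKT_AUDIO; /* low bit clear: audio packet */
    if (len >= 7 && memcmp(p + 1, "vorbis", 6) == 0 &&
        (p[0] == 1 || p[0] == 3 || p[0] == 5))
        return PKT_HEADER; /* identification, comment or setup */
    return PKT_INVALID;
}

/* Hypothetical decoder state kept on the DSP between calls. */
typedef struct {
    int headers_seen; /* 0..3 */
} dsp_decoder;

/* Handle one packet and return:
 *    0  header consumed, no PCM yet
 *    1  audio packet accepted (PCM would be available)
 *   -1  packet rejected (bad type or wrong order) */
static int dsp_submit_packet(dsp_decoder *d, const unsigned char *p, size_t len)
{
    switch (classify_packet(p, len)) {
    case PKT_HEADER:
        if (d->headers_seen >= 3)
            return -1;
        /* real code would parse the info/comment/setup header here */
        d->headers_seen++;
        return 0;
    case PKT_AUDIO:
        if (d->headers_seen < 3)
            return -1; /* codebooks are not set up yet */
        /* real code would decode one block and write PCM here */
        return 1;
    default:
        return -1;
    }
}
```

With this shape the ARM keeps the Ogg framing and file handling, and the 
DSP never blocks inside a call: the ARM just keeps submitting packets 
until the return value says PCM is ready.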

Since trying this approach, and being thoroughly frustrated by my lack 
of understanding of how to split up the code, I thought I'd just 
implement the *_dsp_* functions on the DSP and use wrapper functions on 
the ARM side so that the code doesn't need to be altered too much/at 
all. I had thought this would work well, but am now encountering the 
wonders of needing to copy across a vorbis_info struct (and all its 
associated pointers and data) to the DSP side (this code is at the URL 
above). I suppose this is not so bad, but it is a hassle and it brings me 
back to the second idea and makes me wonder if it wouldn't be better to 
do it that way (and avoid constructing these structures on the ARM-side 
at all).

I should add that with all this copying (caused by creating things on 
the ARM and copying them to the DSP), one needs to perform endianness 
changes as the DSP is big-endian and the ARM is little-endian. This makes 
things a bit more complex and messy.
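
For reference, the byte swapping itself is small. The sketch below is 
meant for the ARM side (it assumes ordinary 8-bit bytes and <stdint.h>), 
and the function names are just illustrative. It only applies to 
fixed-width integer fields: raw byte streams such as the packet data 
must not be swapped, and pointers cannot be copied across at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Swap every element of an array of 32-bit words in place, e.g. a
 * block of integer fields about to be copied into shared memory. */
static void swap32_array(uint32_t *w, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        w[i] = swap32(w[i]);
}
```

Doing the swap once, at the point where data crosses into shared memory, 
keeps the rest of the code on both sides free of endianness checks.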

Therefore, I'm interested in any input as to what would be the best way 
to attack this problem. If anyone needs further clarification of the 
way one needs to interface with the DSP then I'm more than happy to 
help, either on email or on IRC.

Many thanks,

Simon