DTraceProbes
This is a primer on how to embed DTrace probes into the PLEX codebase. It is based on Colin Wheeler's very good tutorial (a must-read): http://cocoasamurai.blogspot.com/2009/01/debug-cocoa-with-dtrace-guide-embedding.html
This example shows how to measure the transfer rate of two image copy functions (based on memcpy) inside the DVDPlayer code. I mainly used it to get familiar with DTrace on Leopard/Plex. Since PLEX requires Mac OS X 10.5 (Leopard), which ships with DTrace, we can safely assume that DTrace will be available for years to come.
PS: this code example only works on Intel-based Macs, because it uses SSE registers and instructions.
DEFINE THE PROBES
First, you have to create a definition of the probes you want to embed in the source. This is done in a ".d" file. I added a new folder called dTraceProbes to xbmc/osx and put "DVDCodecUtilsProbes.d" into it. Here is what it looks like:
/*
 * DVDCodecUtils probes definitions
 */
provider DVDPlayerCodecUtils {
  /*
   * Whenever an image copy is performed inside DVDPlayerCodecUtils this probe is fired
   *
   * arg0 = copy type (0=complete image in one rush, 1=line per line because of different strides)
   */
  probe img_copy_start(int);
  /*
   * Whenever an image copy has been performed, this probe is fired.
   *
   * arg0 = number of bytes copied
   */
  probe img_copy_end(int);
};
Really simple: the start probe fires every time a copy operation is about to start, and the end probe fires whenever a copy operation has finished, passing the number of bytes it copied. Add this file to the folder and compile it manually via the context menu. Xcode will generate a header file (here DVDCodecUtilsProbes.h) in the DerivedSources directory inside your build directory. Now all you need to do is include the header file of the probe definition and add your probes:
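The generated header turns each probe into a macro named after the provider and the probe (DVDPLAYERCODECUTILS_IMG_COPY_START and DVDPLAYERCODECUTILS_IMG_COPY_END), and dtrace -h also emits matching *_ENABLED() "is-enabled" macros. The sketch below uses the is-enabled macro so an argument is only computed while a DTrace consumer is attached. The no-op fallback macros and the helpers count_bytes, fire_copy_end and g_evaluations are hypothetical additions of this sketch (not part of the generated header or of PLEX) that let the same file compile where the header is unavailable.

```cpp
#include <cassert>

#if defined(__has_include)
#if __has_include("DVDCodecUtilsProbes.h")
#include "DVDCodecUtilsProbes.h"  // generated from DVDCodecUtilsProbes.d
#endif
#endif

// Hypothetical fallback: no-op probes when the generated header is not available.
#ifndef DVDPLAYERCODECUTILS_IMG_COPY_END
#define DVDPLAYERCODECUTILS_IMG_COPY_START(arg0) ((void)(arg0))
#define DVDPLAYERCODECUTILS_IMG_COPY_START_ENABLED() (0)
#define DVDPLAYERCODECUTILS_IMG_COPY_END(arg0) ((void)(arg0))
#define DVDPLAYERCODECUTILS_IMG_COPY_END_ENABLED() (0)
#endif

// Counts how often the probe argument was actually computed (illustration only).
int g_evaluations = 0;

// Stand-in for a probe argument that might be costly to compute.
static int count_bytes(int w, int h) {
  assert(w >= 0 && h >= 0);
  ++g_evaluations;
  return w * h * 6;
}

// Fires img_copy_end, evaluating the byte count only if the probe is enabled.
void fire_copy_end(int w, int h) {
  if (DVDPLAYERCODECUTILS_IMG_COPY_END_ENABLED()) {
    DVDPLAYERCODECUTILS_IMG_COPY_END(count_bytes(w, h));
  }
}
```

For an argument as cheap as w*h*6 the guard hardly matters, so the unguarded call used in CopyPicture is perfectly reasonable; the pattern pays off when the argument itself costs real work.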
#include "DVDCodecUtilsProbes.h"
Then place the probes in the right functions/methods. Here is an excerpt from the attached source:
#include "stdafx.h"
#include "DVDCodecUtils.h"
#include "cores/VideoRenderers/RenderManager.h"
#include "DVDCodecUtilsProbes.h"
// define to use the SSE ("MMX") copy loop in x_memcpyx, leave undefined to use plain memcpy
#define __MMX_MCOPY__
void x_memcpy(void* d, const void* s, unsigned n);
void x_memcpyx(void* d, const void* s, unsigned n, unsigned lines, unsigned dDiff,unsigned sDiff);
...
bool CDVDCodecUtils::CopyPicture(YV12Image* pImage, DVDVideoPicture *pSrc)
{
  // Y (luma) plane, full size
  BYTE *s = pSrc->data[0];
  BYTE *d = pImage->plane[0];
  int w = pSrc->iWidth;
  int h = pSrc->iHeight;
  if ((w == pSrc->iLineSize[0]) && ((unsigned int) pSrc->iLineSize[0] == pImage->stride[0]))
  {
    // both strides equal the width: copy the whole plane in one rush
    DVDPLAYERCODECUTILS_IMG_COPY_START(0);
    x_memcpy(d, s, w*h);
  }
  else
  {
    // strides differ: copy line by line
    DVDPLAYERCODECUTILS_IMG_COPY_START(1);
    x_memcpyx(d, s, w, h, pImage->stride[0], pSrc->iLineSize[0]);
  }
  // U plane, half width and half height
  s = pSrc->data[1];
  d = pImage->plane[1];
  w = pSrc->iWidth >> 1;
  h = pSrc->iHeight >> 1;
  if ((w == pSrc->iLineSize[1]) && ((unsigned int) pSrc->iLineSize[1] == pImage->stride[1]))
  {
    x_memcpy(d, s, w*h);
  }
  else
  {
    x_memcpyx(d, s, w, h, pImage->stride[1], pSrc->iLineSize[1]);
  }
  // V plane, same size as U
  s = pSrc->data[2];
  d = pImage->plane[2];
  if ((w == pSrc->iLineSize[2]) && ((unsigned int) pSrc->iLineSize[2] == pImage->stride[2]))
  {
    x_memcpy(d, s, w*h);
  }
  else
  {
    x_memcpyx(d, s, w, h, pImage->stride[2], pSrc->iLineSize[2]);
  }
  // w and h now hold the chroma size, so w*h*6 = Y (4*w*h) + U (w*h) + V (w*h) bytes
  DVDPLAYERCODECUTILS_IMG_COPY_END(w*h*6);
  return true;
}
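The byte count passed to DVDPLAYERCODECUTILS_IMG_COPY_END can be checked with a little arithmetic. At that point w and h hold the halved chroma dimensions, so w*h*6 equals the Y plane (4*w*h) plus the U and V planes (w*h each). The sketch below uses hypothetical helper names and assumes even frame dimensions and tightly packed planes, i.e. it counts payload bytes and ignores any stride padding.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in one tightly packed YV12 (4:2:0) frame: a full-size Y plane plus
// U and V planes at half width and half height.
std::size_t yv12_frame_bytes(std::size_t width, std::size_t height) {
  assert(width % 2 == 0 && height % 2 == 0 && "sketch assumes even dimensions");
  const std::size_t luma = width * height;                // Y plane
  const std::size_t chroma = (width / 2) * (height / 2);  // U or V plane
  return luma + 2 * chroma;
}

// The value CopyPicture passes to IMG_COPY_END: w and h have already been
// halved, so w*h*6 == 4*chroma + 2*chroma == luma + 2*chroma.
std::size_t copy_end_argument(std::size_t width, std::size_t height) {
  const std::size_t w = width >> 1;
  const std::size_t h = height >> 1;
  return w * h * 6;
}
```

For a 1080p frame both functions give 1920*1080 + 2*(960*540) = 3,110,400 bytes, which also fits comfortably in the probe's int argument.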
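x_memcpy is simply the single-line case of x_memcpyx, which uses inline assembler to copy the data with SSE instructions through a single register (xmm0): it first copies single bytes until the destination is 16-byte aligned, then moves 16 bytes at a time using movaps (aligned source) or movups (unaligned source) loads and non-temporal movntps stores that bypass the cache: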
void x_memcpy(void* d, const void* s, unsigned n)
{
  // a single line of n bytes
  x_memcpyx(d, s, n, 1, 0, 0);
}
void x_memcpyx(void* d, const void* s, unsigned n, unsigned lines, unsigned dDiff, unsigned sDiff)
{
  // copies n bytes per line for `lines` lines (the SSE path requires lines >= 1);
  // after each line the destination advances by dDiff and the source by sDiff
  BYTE* dd = (BYTE*) d;
  BYTE* ss = (BYTE*) s;
#ifndef __MMX_MCOPY__
  for (unsigned i = 0; i < lines; i++)
  {
    memcpy(dd, ss, n);
    dd += dDiff;
    ss += sDiff;
  }
  return;
#else
  do
  {
    // around 50% faster than memcpy
    // only worthwhile if the destination buffer is not likely to be read back immediately
    // and the number of bytes copied is >16
    // somewhat faster if the source and destination are a multiple of 16 bytes apart
    __asm {
      mov edx, n
      mov esi, ss
      prefetchnta [esi]
      prefetchnta [esi + 32]
      mov edi, dd
      // pre align: copy single bytes until edi is 16-byte aligned
      mov eax, edi
      mov ecx, 16
      and eax, 15
      sub ecx, eax
      and ecx, 15
      cmp edx, ecx
      jb fmc_exit_main
      sub edx, ecx
      test ecx, ecx
    fmc_start_pre:
      jz fmc_exit_pre
      mov al, [esi]
      mov [edi], al
      inc esi
      inc edi
      dec ecx
      jmp fmc_start_pre
    fmc_exit_pre:
      mov eax, esi
      and eax, 15
      jnz fmc_notaligned
      // main copy, aligned source: 16 bytes per iteration
      mov ecx, edx
      shr ecx, 4
    fmc_start_main_a:
      jz fmc_exit_main
      prefetchnta [esi + 32]
      movaps xmm0, [esi]
      movntps [edi], xmm0
      add esi, 16
      add edi, 16
      dec ecx
      jmp fmc_start_main_a
    fmc_notaligned:
      // main copy, unaligned source
      mov ecx, edx
      shr ecx, 4
    fmc_start_main_u:
      jz fmc_exit_main
      prefetchnta [esi + 32]
      movups xmm0, [esi]
      movntps [edi], xmm0
      add esi, 16
      add edi, 16
      dec ecx
      jmp fmc_start_main_u
    fmc_exit_main:
      // post align: copy the remaining (edx & 15) bytes
      mov ecx, edx
      and ecx, 15
    fmc_start_post:
      jz fmc_exit_post
      mov al, [esi]
      mov [edi], al
      inc esi
      inc edi
      dec ecx
      jmp fmc_start_post
    fmc_exit_post:
    }
    dd += dDiff;
    ss += sDiff;
  } while (--lines);
#endif
}
This code was taken from memutils.c to provide better locality and to be able to change it (here: copying all lines of a plane in a single call).
Now compile everything, start your freshly built PLEX, and play a 1080p MKV file.
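For comparison, here is a hypothetical, portable take on the same strided copy that uses compiler intrinsics instead of inline assembler (the name strided_copy is made up; it is not part of PLEX or memutils.c). It keeps the idea of aligning the destination and using non-temporal stores, but, unlike the assembler version, it always uses unaligned loads, falls back to memcpy when SSE2 is not available, and ends with an sfence because non-temporal stores are weakly ordered. No claim is made about its speed; the probes above are exactly the tool for measuring that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_loadu_si128, _mm_stream_si128; pulls in _mm_sfence
#endif

// Copies n bytes per line for `lines` lines, advancing the destination by
// dDiff and the source by sDiff after each line (same contract as x_memcpyx).
void strided_copy(void* d, const void* s, std::size_t n, std::size_t lines,
                  std::size_t dDiff, std::size_t sDiff) {
  auto* dd = static_cast<std::uint8_t*>(d);
  auto* ss = static_cast<const std::uint8_t*>(s);
  for (std::size_t line = 0; line < lines; ++line) {
    std::size_t i = 0;
#if defined(__SSE2__)
    // Pre-align: single bytes until the destination is 16-byte aligned,
    // which _mm_stream_si128 requires.
    while (i < n && (reinterpret_cast<std::uintptr_t>(dd + i) & 15) != 0) {
      dd[i] = ss[i];
      ++i;
    }
    // Main copy: unaligned 16-byte loads, non-temporal aligned stores.
    for (; i + 16 <= n; i += 16) {
      const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ss + i));
      _mm_stream_si128(reinterpret_cast<__m128i*>(dd + i), v);
    }
#endif
    // Tail bytes, or the whole line when SSE2 is unavailable.
    if (i < n) std::memcpy(dd + i, ss + i, n - i);
    dd += dDiff;
    ss += sDiff;
  }
#if defined(__SSE2__)
  // Make the weakly ordered streaming stores visible before returning.
  _mm_sfence();
#endif
}
```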
Enter the following into any terminal window:
sudo dtrace -n ':Plex::{printf("HIT");}'
and you will see something like:
...
  3  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  4  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  4  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  4  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  4  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  4  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  5  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  5  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  5  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  6  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  6  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  6  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  6  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  6  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  7  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
...
OK, the DTrace probes are working. Now let's do something useful with them:
Start Instruments with a blank template.
Open the library and add a blank DTrace instrument.
Double-click the instrument and configure it as follows:
In Probe1, we match all img_copy_start probes and record the current time (in nanoseconds). We do not record any data here, so this probe will not show up in the results later.
In Probe2, we match all img_copy_end probes, calculate the total time the copy operation took, and use arg0 (the number of bytes copied) to calculate the memory throughput of the copy operation.
Save this instrument; after a short time, the figures will show up in the Instruments history.
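The calculation done by the two probes can be sketched in plain C++. This only illustrates the arithmetic under stated assumptions and is not the exported instrument: the names on_img_copy_start, on_img_copy_end and bytes_per_second are hypothetical, and the thread-local start time is an assumption that mirrors how a D clause typically keeps per-thread state (self->) between a start and an end probe.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Throughput in bytes per second, given bytes copied and elapsed nanoseconds.
// bytes * 1e9 stays within 64 bits for copies up to roughly 18 GB.
std::uint64_t bytes_per_second(std::uint64_t bytes, std::uint64_t elapsed_ns) {
  assert(elapsed_ns > 0);
  return bytes * 1000000000ULL / elapsed_ns;
}

// Per-thread start time, playing the role of a thread-local D variable.
thread_local std::chrono::steady_clock::time_point copy_start_time;

// Counterpart of Probe1: remember when the copy started, record nothing else.
void on_img_copy_start() { copy_start_time = std::chrono::steady_clock::now(); }

// Counterpart of Probe2: elapsed time since the matching start, as throughput.
std::uint64_t on_img_copy_end(std::uint64_t bytes_copied) {
  const auto elapsed = std::chrono::steady_clock::now() - copy_start_time;
  const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
  return ns > 0 ? bytes_per_second(bytes_copied, static_cast<std::uint64_t>(ns)) : 0;
}
```

For instance, a 1080p YV12 frame (3,110,400 bytes) copied in exactly 1 ms gives bytes_per_second(3110400, 1000000) == 3110400000, i.e. about 3.1 GB/s.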
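Remember to change the definition from
#define __MMX_MCOPY__1
to
#define __MMX_MCOPY__
to activate the SSE-based ("MMX") copy. The first line defines an unrelated macro named __MMX_MCOPY__1, so __MMX_MCOPY__ stays undefined and x_memcpyx uses the plain memcpy loop.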
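A quick way to check which x_memcpyx branch a given definition selects is to test the macro the same way the #ifndef in x_memcpyx does. The sketch (sse_copy_enabled is a hypothetical helper) reproduces the #define __MMX_MCOPY__1 line: the preprocessor reads __MMX_MCOPY__1 as one identifier, so __MMX_MCOPY__ stays undefined and the memcpy branch is chosen.

```cpp
#include <cassert>

// Defines a macro named __MMX_MCOPY__1, NOT __MMX_MCOPY__.
#define __MMX_MCOPY__1

// Reports which branch of x_memcpyx would be compiled in this translation unit.
bool sse_copy_enabled() {
#ifdef __MMX_MCOPY__
  return true;   // inline-assembler SSE loop
#else
  return false;  // plain memcpy loop
#endif
}
```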
Since you can export a DTrace instrument as a DTrace script for command-line use, I also added this script to the attachments.
References:
Colin's transcript
Sources for this example: Media:DTRACE_Sources.zip


