DTraceProbes

This is a primer on how to embed DTrace probes into the Plex codebase. It is based on a very good tutorial by Colin Wheeler (a must-read): http://cocoasamurai.blogspot.com/2009/01/debug-cocoa-with-dtrace-guide-embedding.html

This example shows how to measure the transfer rate of two image copy functions (based on memcpy) inside the DVDPlayer code. I used it to get familiar with DTrace on Leopard/Plex. Since Plex requires Mac OS X 10.5, we can safely assume that DTrace will be available for years to come.

PS: This code example is only valid for Intel-based Macs (it uses SSE registers and instructions).


DEFINE THE PROBES

First you have to create a definition of the probes you want to embed in the source. This is done via a ".d" file. I added a new folder called dTraceProbes to xbmc/osx and put "DVDCodecUtilsProbes.d" into it. Here is what it looks like:


/*
* DVDCodecUtils probes definitions
*/
provider DVDPlayerCodecUtils{
	
	/*
	 * Whenever an image copy is performed inside DVDPlayerCodecUtils this probe is fired
	 *
	 * arg0 = copy type (0=complete image in one rush, 1=line per line because of different strides)
	 */
	probe img_copy_start(int);
	
	/*
	 * Whenever an image copy has been performed, this probe is fired.
	 *
	 * arg0 = number of bytes copied
	 */
	probe img_copy_end(int);
	
};

Really simple: the start probe fires every time a copy operation is about to start, and the end probe fires whenever a copy operation has finished, reporting the number of bytes copied. Add this file to the folder and compile it manually via the context menu. Xcode will create a .h file (here DVDCodecUtilsProbes.h) in the DerivedSources directory inside your build directory. Now all you need to do is include the header file of the probe definition:


#include "DVDCodecUtilsProbes.h"

and place the probes in the right functions/methods:

See the attached source (an excerpt is shown here):


#include "stdafx.h"
#include "DVDCodecUtils.h"
#include "cores/VideoRenderers/RenderManager.h"
#include "DVDCodecUtilsProbes.h"

// define to use the inline-assembly copy; leave undefined to use the standard memcpy
#define __MMX_MCOPY__
void x_memcpy(void* d, const void* s, unsigned n);
void x_memcpyx(void* d, const void* s, unsigned n, unsigned lines, unsigned dDiff,unsigned sDiff);

...

bool CDVDCodecUtils::CopyPicture(YV12Image* pImage, DVDVideoPicture *pSrc)
{
  BYTE *s = pSrc->data[0];
  BYTE *d = pImage->plane[0];
  int w = pSrc->iWidth;
  int h = pSrc->iHeight;
  if ((w == pSrc->iLineSize[0]) && ((unsigned int) pSrc->iLineSize[0] == pImage->stride[0]))
  {
	  DVDPLAYERCODECUTILS_IMG_COPY_START(0);
	  x_memcpy(d, s, w*h);
  }
  else
  {
	  DVDPLAYERCODECUTILS_IMG_COPY_START(1);
	  x_memcpyx(d,s,w,h,pImage->stride[0],pSrc->iLineSize[0]);
  }
  s = pSrc->data[1];
  d = pImage->plane[1];
  w = pSrc->iWidth >> 1;
  h = pSrc->iHeight >> 1;
  if ((w==pSrc->iLineSize[1]) && ((unsigned int) pSrc->iLineSize[1]==pImage->stride[1]))
  {
    x_memcpy(d, s, w*h);
  }
  else
  {
	  x_memcpyx(d,s,w,h,pImage->stride[1],pSrc->iLineSize[1]);
  }
  s = pSrc->data[2];
  d = pImage->plane[2];
  if ((w==pSrc->iLineSize[2]) && ((unsigned int) pSrc->iLineSize[2]==pImage->stride[2]))
  {
    x_memcpy(d, s, w*h);
  }
  else
  {
	  x_memcpyx(d,s,w,h,pImage->stride[2],pSrc->iLineSize[2]);
  }
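	// w and h now hold the chroma plane size (half width, half height), so the
	// total is Y (4*w*h) + U (w*h) + V (w*h) = 6*w*h bytes
	DVDPLAYERCODECUTILS_IMG_COPY_END(w*h*6);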

	return true;
}
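
The factor of 6 passed to img_copy_end follows from the YV12 layout: one full-size Y plane plus U and V planes at half width and half height. A minimal sketch of that arithmetic, assuming even frame dimensions (both helper names are made up for illustration and are not part of the Plex sources):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in the Plex sources): total bytes copied for one
// YV12 frame, i.e. a full-size Y plane plus quarter-size U and V planes.
// Assumes even width and height.
static std::size_t yv12_copy_bytes(std::size_t width, std::size_t height)
{
  assert(width % 2 == 0 && height % 2 == 0);
  const std::size_t luma   = width * height;
  const std::size_t chroma = (width / 2) * (height / 2); // size of U or V alone
  return luma + 2 * chroma;
}

// The same value written the way CopyPicture computes it, where w and h
// have already been halved for the chroma planes.
static std::size_t bytes_as_in_copy_picture(std::size_t width, std::size_t height)
{
  const std::size_t w = width >> 1;
  const std::size_t h = height >> 1;
  return w * h * 6;
}
```

For a 1920x1080 frame both functions return 3110400 bytes.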

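The x_memcpyx function is inline assembler code that copies 16 bytes at a time through a single SSE register (xmm0), despite the MMX in the macro name. The destination is aligned to 16 bytes first so that non-temporal stores (movntps) can be used: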

void x_memcpy(void* d,const void* s,unsigned n){
	x_memcpyx(d,s,n,1,0,0);
}

void x_memcpyx(void* d, const void* s, unsigned n, unsigned lines, unsigned dDiff,unsigned sDiff)
{

	BYTE* dd = (BYTE*) d;
	BYTE* ss = (BYTE*) s;
#ifndef __MMX_MCOPY__	
	for(unsigned i = 0 ; i < lines ; i++){
		memcpy(dd, ss, n);
		dd+=dDiff;
		ss+=sDiff;
	}
	return;
#else
	do{
	// around 50% faster than memcpy
	// only worthwhile if the destination buffer is not likely to be read back immediately
	// and the number of bytes copied is >16
	// somewhat faster if the source and destination are a multiple of 16 bytes apart
	__asm {
		mov edx, n
		mov esi, ss
		prefetchnta [esi]
		prefetchnta [esi + 32]
		mov edi, dd
		
		// pre align
		mov eax, edi
		mov ecx, 16
		and eax, 15
		sub ecx, eax
		and ecx, 15
		cmp edx, ecx
		jb fmc_exit_main
		sub edx, ecx
		
		test ecx, ecx
	fmc_start_pre:
		jz fmc_exit_pre
		
		mov al, [esi]
		mov [edi], al
		
		inc esi
		inc edi
		dec ecx
		jmp fmc_start_pre
		
	fmc_exit_pre:
		mov eax, esi
		and eax, 15
		jnz fmc_notaligned
		
		// main copy, aligned
		mov ecx, edx
		shr ecx, 4
	fmc_start_main_a:
		jz fmc_exit_main
		
		prefetchnta [esi + 32]
		movaps xmm0, [esi]
		movntps [edi], xmm0
		
		add esi, 16
		add edi, 16
        dec ecx
        jmp fmc_start_main_a
		
	fmc_notaligned:
        // main copy, unaligned
        mov ecx, edx
        shr ecx, 4
	fmc_start_main_u:
        jz fmc_exit_main
		
        prefetchnta [esi + 32]
        movups xmm0, [esi]
        movntps [edi], xmm0
		
        add esi, 16
		add edi, 16
		dec ecx
		jmp fmc_start_main_u
		
	fmc_exit_main:
		
		// post align
		mov ecx, edx
		and ecx, 15
	fmc_start_post:
		jz fmc_exit_post
		
		mov al, [esi]
		mov [edi], al
		
		inc esi
		inc edi
		dec ecx
		jmp fmc_start_post
		
	fmc_exit_post:
	}
		dd+=dDiff;
		ss+=sDiff;
	}while(--lines);
#endif
}
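
For readers who prefer intrinsics over inline assembly, the same three phases (byte-wise pre-alignment of the destination, a 16-byte main loop with non-temporal stores, a byte-wise tail) could be sketched as follows. This is an illustrative rewrite, not the code from the Plex sources: the function names are hypothetical, it always uses unaligned loads, it omits the prefetch hints, and it falls back to plain memcpy when the compiler does not report SSE support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: copy n bytes of one line using streaming stores.
static void stream_copy_line(unsigned char* d, const unsigned char* s, std::size_t n)
{
#if defined(__SSE__)
  // Pre-align: copy single bytes until the destination is 16-byte aligned.
  while (n > 0 && (reinterpret_cast<std::uintptr_t>(d) & 15) != 0) {
    *d++ = *s++;
    --n;
  }
  // Main copy: unaligned 16-byte load, non-temporal (cache-bypassing) store.
  for (; n >= 16; n -= 16, d += 16, s += 16) {
    const __m128 v = _mm_loadu_ps(reinterpret_cast<const float*>(s));
    _mm_stream_ps(reinterpret_cast<float*>(d), v);
  }
  _mm_sfence(); // make the streamed stores visible before returning
#endif
  // Tail (or the whole line when SSE is unavailable).
  std::memcpy(d, s, n);
}

// Hypothetical counterpart of x_memcpyx: copy `lines` lines of n bytes each,
// advancing destination and source by their own strides.
static void stream_copy_lines(void* d, const void* s, std::size_t n,
                              unsigned lines, std::size_t dDiff, std::size_t sDiff)
{
  assert(d && s);
  unsigned char* dd = static_cast<unsigned char*>(d);
  const unsigned char* ss = static_cast<const unsigned char*>(s);
  for (unsigned i = 0; i < lines; ++i) {
    stream_copy_line(dd, ss, n);
    dd += dDiff;
    ss += sDiff;
  }
}
```

Whether this performs like the assembler version would have to be measured with the probes described on this page.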

This code was copied from memutils.c into this file for better locality and so that it could be changed (to copy all lines in one call).

Now compile everything and start your freshly built Plex. Start playing a 1080p MKV file.

Enter the following into any terminal window:

sudo dtrace -n ':Plex::{printf("HIT");}'

and you will see something like:

...
  3  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  4  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  4  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  4  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  4  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  4  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  5  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  5  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  5  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  6  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  6  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  6  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  6  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
  6  19586 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_end HIT
  7  19588 _ZN14CDVDCodecUtils11CopyPictureEP9YV12ImageP17stDVDVideoPicture:img_copy_start HIT
...
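
The long function field in this output is the C++-mangled name of the method that contains the probes. On the command line, c++filt turns it into readable form; the sketch below does the same programmatically, assuming a GCC or Clang toolchain that provides abi::__cxa_demangle in <cxxabi.h> (the demangle wrapper is a hypothetical helper, not part of the Plex sources):

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: return the demangled form of a C++ symbol name,
// or the input unchanged if it cannot be demangled.
static std::string demangle(const char* mangled)
{
  assert(mangled);
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with malloc.
  char* out = abi::__cxa_demangle(mangled, NULL, NULL, &status);
  std::string result = (status == 0 && out) ? out : mangled;
  std::free(out);
  return result;
}
```

Passing the name from the trace above yields CDVDCodecUtils::CopyPicture(YV12Image*, stDVDVideoPicture*).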

OK, the DTrace probes are working. Now let's do something useful with them:

Start Instruments with a blank template.

Open the library and add a blank DTrace instrument (screenshot: DTRACE Instruments1.jpg).

Double-click the instrument and change it as shown in the screenshot DTRACE Instruments2.jpg.

In Probe1 we match every img_copy_start probe and record the current time (in nanoseconds). We do not record any data, so this probe won't show up in the results later.

In Probe2 we match any img_copy_end probes and calculate the time the copy operation took in total and use the arg0 (number of bytes copied) argument to calculate the memory throughput of the copy operation.
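
The instrument's script is only visible in the screenshot, so the following is merely a sketch of the arithmetic Probe2 performs: bytes copied divided by elapsed nanoseconds, scaled to megabytes per second. The function name and the choice of 10^6 bytes per megabyte are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: throughput of one copy operation in MB/s
// (1 MB taken as 10^6 bytes), from the byte count reported by
// img_copy_end and the two timestamps in nanoseconds.
static double throughput_mb_per_s(std::uint64_t bytes,
                                  std::uint64_t start_ns,
                                  std::uint64_t end_ns)
{
  assert(end_ns > start_ns);
  const double seconds = static_cast<double>(end_ns - start_ns) / 1e9;
  return (static_cast<double>(bytes) / 1e6) / seconds;
}
```

For example, 3110400 bytes copied in 1 ms would give about 3110.4 MB/s (an illustrative figure, not a measurement).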

Save this instrument; after a short time the figures will show up in the Instruments history:

DTRACE Instruments3.jpg (mmx memcpy for mkv 1080p)

DTRACE Instruments4.jpg (std. memcpy for mkv 1080p)

Remember to change the definition from

#define __MMX_MCOPY__1

to

#define __MMX_MCOPY__

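to activate the inline-assembly copy path. With the trailing 1 the macro has a different name, so __MMX_MCOPY__ stays undefined and the standard memcpy loop is compiled instead.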

Since you can export your DTrace instrument as a DTrace script for command-line use, I also added this file to the attachments.


References:

Colin's transcript

Sources for this example: Media:DTRACE_Sources.zip