===========================================================================
GPU information.
===========================================================================
About this document.
---------------------------------------------------------------------------
This document is a collection of all the info on the GPU I could find, plus
my own notes. Most of it is the result of experiment, so not all of it may
be correct. The document is most probably not complete, and not all
capabilities and quirks of the GPU are documented. No responsibility is
taken for anything that might occur using the information in this document.

The K-communications text and the one by Nagra/Blackbag are the basis of
this document.

Notations and conventions
When the format of data is given it's shown as a bitwise representation
like this:

pixel|                                               |
bit  |0f|0e 0d 0c 0b 0a|09 08 07 06 05|04 03 02 01 00|
desc.|S |Blue          |Green         |Red           |

The "pixel" row shows how large the data is in the frame buffer. Each mark
on this line denotes the size of the data in frame buffer pixels, as that
is the minimum size that can be addressed. The "bit" row shows which bits
of the data are used, with separators showing where the different elements
of the data start and stop. MSB is on the left, LSB is on the right.
Notation like |0f-08| means bit $0f to bit $08. The "desc." row gives the
description of the different elements, with separators where each element
starts and ends.

---------------------------------------------------------------------------
The Graphics Processing Unit (GPU) - overview.
---------------------------------------------------------------------------
The GPU is the unit responsible for the graphical output of the PSX. It
handles display and drawing of all graphics. It has control over a 1 MB
frame buffer and contains a 2 KB texture cache. It has a command and data
port.
It has a 64 byte command FIFO buffer, which can hold up to 3 commands. The
GPU is connected to a DMA channel for transfer of image data and linked
command lists, and to a DMA channel for reverse clearing an OT.

---------------------------------------------------------------------------
The Frame Buffer.
---------------------------------------------------------------------------
The frame buffer is the memory which stores all graphic data that the GPU
can access and manipulate while drawing and displaying an image. The memory
is under the control of the GPU and cannot be accessed by the CPU directly.
It is operated solely by the GPU. The frame buffer has a size of 1 MB and is
treated as a space 1024 pixels wide and 512 pixels high. Each "pixel" has a
size of 16 bits. The memory is not treated linearly like usual memory, but
is accessed through coordinates, with an upper left corner of (0,0) and a
lower right corner of (1023,511).

When data is displayed from the frame buffer, a rectangular area is read
from the specified coordinate within this memory. The size of this area can
be chosen from several hardware defined types. Note that these hardware
sizes are only valid when the X and Y start/end registers are at their
default values. This display area can be displayed in two color formats:
15bit direct and 24bit direct. The data format of one pixel is as follows:

15bit Direct display.

pixel|                                               |
bit  |0f|0e 0d 0c 0b 0a|09 08 07 06 05|04 03 02 01 00|
desc.|M |Blue          |Green         |Red           |

This means each color has a value of 0-31. The MSB of a pixel (M) is used
to mask the pixel.

24bit Direct display.

The GPU can also be set to 24bit mode, in which case 3 bytes form one
pixel, 1 byte for each color. Data in this mode is arranged as follows:

pixel|0      |1      |2      |
bit  |F-8|7-0|F-8|7-0|F-8|7-0|
desc.|G0 |R0 |R1 |B0 |B1 |G1 |

Thus 2 display pixels are encoded in 3 frame buffer pixels.
They are displayed as follows: [R0,G0,B0] [R1,G1,B1]

---------------------------------------------------------------------------
Primitives.
---------------------------------------------------------------------------
A basic figure which the GPU can draw is called a primitive, and it can
draw the following:

* Polygon
  The GPU can draw 3 point and 4 point polygons. Each point of the polygon
  specifies a point in the frame buffer. The polygon can be Gouraud shaded.
  The correct order of vertices for 4 point polygons is as follows:

  1--2
  |  |
  3--4

  Note: A 4 point polygon is processed internally as two 3 point polygons.
  Note: When drawing a polygon the GPU will not draw the rightmost and
  bottom edge. So a (0,0)-(32,32) rectangle will actually be drawn as
  (0,0)-(31,31). Make sure adjoining polygons have the same coordinates if
  you want them to touch each other! I haven't checked how this works with
  3 point polygons.

* Polygon with texture
  A primitive of this type is the same as above, except that a texture is
  applied. Each vertex of the polygon maps to a point on a texture page in
  the frame buffer. The polygon can be Gouraud shaded.
  Note: Because a 4 point polygon is processed internally as two 3 point
  polygons, texture mapping is also done independently for both halves.
  This has some annoying consequences.

* Rectangle
  A rectangle is defined by the location of the top left corner and its
  width and height. Width and height can be either free, 8*8 or 16*16.
  It's drawn much faster than a polygon, but Gouraud shading is not
  possible.

* Sprite
  A sprite is a textured rectangle, defined as a rectangle with coordinates
  on a texture page. Like the rectangle, it is drawn much faster than the
  polygon equivalent. No Gouraud shading is possible.
  Note: Even though the primitive is called a sprite, it has nothing in
  common with the traditional sprite, other than that it's a rectangular
  piece of graphics.
  Unlike the PSX sprite, the traditional sprite is NOT drawn to the bitmap,
  but gets sent to the screen instead of the actual graphics data at that
  location at display time.

* Line
  A line is a straight line between 2 specified points. The line can be
  Gouraud shaded. A special form is the polyline, for which an arbitrary
  number of points can be specified.

* Dot
  The dot primitive draws one pixel at the specified coordinate and in the
  specified color. It is actually a special form of rectangle, with a size
  of 1*1.

---------------------------------------------------------------------------
Texture
---------------------------------------------------------------------------
A texture is an image put on a polygon or sprite. The data has to be
prepared in the frame buffer beforehand. This image is called a texture
pattern. The texture pattern is located on a texture page, which has a
standard size and is located somewhere in the frame buffer, see below. The
data of a texture can be stored in 3 different modes:

* 15bit Direct mode.

  bit  |0f|0e 0d 0c 0b 0a|09 08 07 06 05|04 03 02 01 00|
  desc.|S |Blue          |Green         |Red           |

  This means each color has a value of 0-31. The MSB of a pixel (S) is used
  to specify if the pixel is semi transparent or not. More on that later.

* 8bit CLUT mode.
  Each pixel is defined by 8 bits and the value of the pixel is converted
  to a 15bit color using the CLUT (color lookup table), much like standard
  VGA pictures. So in effect you have 256 colors which are in 15bit
  precision.

  bit  |0F-08|07-00|
  desc.|I1   |I0   |

  I0 is the index to the CLUT for the left pixel, I1 for the right.

* 4bit CLUT mode.
  Same as above except that only 16 colors can be used. Data is arranged as
  follows:

  bit  |F-C|B-8|7-4|3-0|
  desc.|I3 |I2 |I1 |I0 |

  I0 is drawn leftmost.

* Texture Pages
  Texture pages have a unit size of 256*256 pixels, regardless of color
  mode.
  This means that in the frame buffer they will be 64 pixels wide for 4bit
  CLUT, 128 pixels wide for 8bit CLUT and 256 pixels wide for 15bit direct.
  The pixels are addressed with coordinates relative to the location of the
  texture page, not the frame buffer. So the top left texture coordinate on
  a texture page is (0,0) and the bottom right one is (255,255). The pages
  can be located in the frame buffer on X multiples of 64 and Y multiples
  of 256. More than one texture page can be set up, but each primitive can
  only contain texture from one page.

* Texture Windows
  The area within a texture window is repeated throughout the texture page.
  The data is not actually stored all over the texture page, but the GPU
  reads the repeated patterns as if they were there. The X, Y, W and H of
  the window must be multiples of 8.

* CLUT (Color Lookup Table)
  The CLUT is the table where the colors are stored for the image data in
  the CLUT modes. The pixels of those images are used as indexes into this
  table. The CLUT is arranged in the frame buffer as a 256x1 image for the
  8bit CLUT mode, and a 16x1 image for the 4bit CLUT mode. Each entry is a
  16 bit value: the lower 15 bits hold a 15 bit color, and the 16th bit is
  used for semi transparency. The CLUT data can be placed in the frame
  buffer at X multiples of 16 (X=0,16,32,48,etc) and anywhere in the Y
  range of 0-511. More than one CLUT can be prepared, but only one can be
  used for each primitive.

* Texture Caching
  If polygons with texture are displayed, the GPU needs to read the texture
  data from the frame buffer. This slows down the drawing process, and as a
  result reduces the number of polygons that can be drawn in a given
  timespan. To speed up this process the GPU is equipped with a texture
  cache, so a given piece of texture does not need to be read multiple
  times in succession. The texture cache size depends on the color mode
  used for the textures. In 4bit CLUT mode it has a size of 64x64, in 8bit
  CLUT mode it's 32x64 and in 15bit Direct mode it's 32x32.
  A general speed up can be achieved by setting up textures according to
  these sizes. For further speed gain a more precise knowledge of how the
  cache works is necessary.

  - Cache blocks
    The texture page is divided into non-overlapping cache blocks, each of
    a unit size according to color mode. These cache blocks are tiled
    within the texture page.

    +-----+-----+-----+--
    |cache|     |     |
    |block|     |     |
    |    0|  1  |  2   ..
    +-----+-----+-----+--
    |     |     |
    ..

  - Cache entries
    Each cache block is divided into 256 cache entries, which are numbered
    sequentially and are 8 bytes wide. So a cache entry holds 16 4bit CLUT
    pixels, 8 8bit CLUT pixels, or 4 15bit Direct pixels.

    4bit and 8bit clut:      15bitdirect:
    +----+----+----+----+    +----+----+----+----+----+----+----+----+
    |   0|   1|   2|   3|    |   0|   1|   2|   3|   4|   5|   6|   7|
    +----+----+----+----+    +----+----+----+----+----+----+----+----+
    |   4|   5|   6|   7|    |   8|   9|   a|   b|   c|   d|   e|   f|
    +----+----+----+----+    +----+----+----+----+----+----+----+----+
    |   8|   9| ..           |  10|  11| ..
    +----+----+--            +----+----+--
    |   c|  ..|              |  18|  ..|
    +----+--                 +----+--
    | ..                     | ..

    The cache can hold only one cache entry with a given number. So if,
    for example, a piece of texture spans multiple cache blocks and it has
    data on entry 9 of block 1, but also on entry 9 of block 2, these
    cannot be in the cache at once.

---------------------------------------------------------------------------
Rendering options.
---------------------------------------------------------------------------
There are 3 modes which affect the way the GPU renders the primitives to
the frame buffer.

* Semi Transparency
  When semi transparency is set for a pixel, the GPU first reads the pixel
  it wants to write to, and then calculates the color it will write from
  the 2 pixels according to the semi transparency mode selected. Processing
  speed is lower in this mode because additional reading and calculating
  are necessary. There are 4 semi transparency modes in the GPU.
  B = the pixel read from the image in the frame buffer,
  F = the half transparent pixel

  * 0.5 x B + 0.5  x F
  * 1.0 x B + 1.0  x F
  * 1.0 x B - 1.0  x F
  * 1.0 x B + 0.25 x F

  A new semi transparency mode can be set for each primitive. For
  primitives without texture, semi transparency can be selected for the
  primitive as a whole. For primitives with texture, semi transparency is
  stored in the MSB of each pixel, so some pixels can be set to STP while
  others are drawn opaque. For the CLUT modes the STP bit is obtained from
  the CLUT. So if a color index points to a color in the CLUT with the MSB
  set, it will be drawn semi transparent.

  When the color is black (BGR=0), STP is processed differently from when
  it's not black (BGR<>0). The table below shows the differences:

                  transparency processing (bit 1 of command packet)
  BGR     STP     off                 on
  0,0,0   0       Transparent         Transparent
  0,0,0   1       Non-transparent     Non-transparent
  x,x,x   0       Non-transparent     Non-transparent
  x,x,x   1       Non-transparent     Transparent

* Shading
  The GPU has a shading function, which will scale the color of a primitive
  to a specified brightness. There are 2 shading modes: flat shading and
  Gouraud shading. Flat shading is the mode in which one brightness value
  is specified for the entire primitive. In Gouraud shading mode, a
  different brightness value can be given for each vertex of a primitive,
  and the brightness between these points is automatically interpolated.

* Mask
  The mask function prevents the GPU from writing to specific pixels when
  drawing in the frame buffer. This means that when the GPU is drawing a
  primitive to a masked area, it will first read the pixel at the
  coordinate it wants to write to, check if its masking bit is set, and if
  so refrain from writing to that particular pixel. The masking bit is the
  MSB of the pixel, just like the STP bit. To set this masking bit, the GPU
  provides a mask out mode, which will set the MSB of any pixel it writes.
  If both mask out and mask evaluation are on, the GPU will not draw to
  pixels with a set MSB, and will draw pixels with a set MSB to the others,
  these in turn becoming masked pixels.

---------------------------------------------------------------------------
Drawing Environment
---------------------------------------------------------------------------
The drawing environment specifies all global parameters the GPU needs for
drawing primitives.

* Drawing offset.
  This locates the top left corner of the drawing area. Coordinates of
  primitives are relative to this point. So if the drawing offset is
  (0,240) and a vertex of a polygon is located at (16,20), it will be drawn
  to the frame buffer at (0+16,240+20).

* Drawing clip area
  This specifies the maximum range the GPU draws primitives to. So in
  effect it specifies the top left and bottom right corner of the drawing
  area.

* Dither enable
  When dither is enabled the GPU will dither areas during shading. It will
  process internally in 24 bit and dither the colors when converting back
  to 15bit. When it is off, the lower 3 bits of each color simply get
  discarded.

* Draw to display enable.
  This will enable/disable any drawing to the area that is currently
  displayed.

* Mask enable
  When turned on, any pixel drawn to the frame buffer by the GPU will have
  a set masking bit (= set MSB).

* Mask judgement enable
  Specifies if the mask data from the frame buffer is evaluated at the time
  of drawing.

---------------------------------------------------------------------------
Display Environment.
---------------------------------------------------------------------------
This contains all information about the display, and the area displayed.

* Display area in frame buffer
  This specifies the resolution of the display. The size can be set as
  follows:

  Width:  256, 320, 384, 512 or 640 pixels
  Height: 240 or 480 pixels

  These sizes are only an indication of how many pixels will be displayed
  using the default start/end values.
  These settings only specify the resolution of the display.

* Display start/end.
  Specifies where the display area is positioned on the screen, and how
  much data gets sent to the screen. The screen sizes of the display area
  are valid only if the horizontal/vertical start/end values are default.
  By changing these you can get bigger/smaller display screens. On most TVs
  there is some black around the edge, which can be utilised by setting the
  start of the screen earlier and the end later. The size of the pixels is
  NOT changed with these settings, the GPU simply sends more data to the
  screen. Some monitors/TVs have a smaller display area and the extended
  size might not be visible on those sets. (Mine is capable of about 330
  pixels horizontal, and 272 vertical in 320*240 mode)

* Interlace enable
  When enabled the GPU will display the even and odd lines of the display
  area alternately. It is necessary to set this when using 480 lines, as
  the number of scan lines on a TV screen is not sufficient to display 480
  lines.

* 15bit/24bit direct display
  Switches between 15bit/24bit display mode.

* Video mode
  Selects which video mode to use, either PAL or NTSC.

---------------------------------------------------------------------------
Communication and OT's.
---------------------------------------------------------------------------
All data regarding drawing and the drawing environment is sent as packets
to the GPU. Each packet tells the GPU how and where to draw one primitive,
or it sets one of the drawing environment parameters. The display
environment is set up through single word commands using the control port
of the GPU.

Packets can be forwarded word by word through the data port of the GPU, or
more efficiently for large numbers of packets through DMA. A special DMA
mode was created for this so large numbers of packets can be sent and
managed easily.
In this mode a list of packets is sent, where each entry in the list
consists of a one word header, containing the address of the next entry and
the size of the packet, followed by the packet itself. A result of this is
that the packets do not need to be stored sequentially. This makes it
possible to easily control the order in which packets get processed. The
GPU processes the packets it gets in the order they are offered. So the
first entry in the list also gets drawn first. To insert a packet into the
middle of the list, simply find the packet after which you want it to be
processed, replace the address in that packet with the address of the new
packet, and let the new packet point to the address you replaced.

To aid you in finding a location in the list, the Ordering Table was
invented. At first this is basically a linked list with entries of packet
size 0, so it's a list of only list entry headers, where each entry points
to the next entry. Then, as primitives are generated by your program, you
can add them to the table at a certain index. Just read the address in the
table entry, replace it with the address of the new packet, and store the
address from the table in the packet. When all packets are generated and
you want to draw, just pass the address of the first list entry to the DMA
and the packets will get drawn in the order you entered the packets into
the table. Packets entered at a higher table index will get drawn after
those entered at a lower table index. Packets entered at the same index
will get drawn in reverse order of entry, the last one first.

In 3D drawing it's most common that you want the primitives with the
highest Z value to be drawn first, so it would be nice if the table were
drawn the other way around, so the Z value can be used as index. This is
simple: just make a table in which each entry points to the previous entry,
and start the DMA with the address of the last table entry.
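The reverse ordering table described above can be modelled in a few lines
of Python. This is only a sketch of the linking logic, not of the hardware:
the header is modelled as a (next, size) pair because the bit layout of the
header word is not given in this text, and the addresses, the END marker
and the packet names are all made up for the illustration.

```python
# Sketch of a reverse ordering table (OT) as a linked list.
# Assumptions: "memory" is a dict keyed by address, a header is modelled as
# next/size fields, and END is a hypothetical end-of-list marker.
END = None

def make_reverse_ot(base, length):
    """Build an empty OT in which each entry points to the previous one."""
    mem = {}
    for i in range(length):
        prev = base + 4 * (i - 1) if i > 0 else END
        mem[base + 4 * i] = {"next": prev, "size": 0, "name": None}
    return mem

def add_packet(mem, ot_base, index, packet_addr, size, name):
    """Link a packet in directly after OT entry `index`."""
    entry = mem[ot_base + 4 * index]
    # The packet takes over the address stored in the table entry...
    mem[packet_addr] = {"next": entry["next"], "size": size, "name": name}
    # ...and the table entry now points at the packet.
    entry["next"] = packet_addr

def draw_order(mem, start):
    """Walk the list from `start` and collect the non-empty packets."""
    order, addr = [], start
    while addr is not END:
        node = mem[addr]
        if node["size"] > 0:
            order.append(node["name"])
        addr = node["next"]
    return order

OT_BASE, OT_LEN = 0x1000, 4
mem = make_reverse_ot(OT_BASE, OT_LEN)
add_packet(mem, OT_BASE, 1, 0x2000, 4, "near")   # low Z index
add_packet(mem, OT_BASE, 3, 0x2010, 4, "far_a")  # high Z index
add_packet(mem, OT_BASE, 3, 0x2020, 4, "far_b")  # same index, added later
# Start at the last table entry so the highest index is processed first.
order = draw_order(mem, OT_BASE + 4 * (OT_LEN - 1))
```

In this model the packets at index 3 come out before the one at index 1,
and of the two packets at index 3 the one added last comes out first.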
To assist you in making such a table, a special DMA channel is available
which creates it for you.

---------------------------------------------------------------------------
GPU operation
---------------------------------------------------------------------------
* GPU control registers.
There are two 32 bit I/O ports for the GPU, which are:

$1f801810  GPU Data
$1f801814  GPU Control/Status

The data register is used to exchange data with the GPU. The control/status
register gives the status of the GPU when read, and sets the control bits
when written to.

* Control/Status Register $1f801814

Status (Read)
-----------------------------------------------------------------------------
|1f |1e 1d|1c |1b |1a  |19 18|17 |16     |15     |14   |13    |12 11 |10    |
|lcf|dma  |com|img|busy| ?  ?|den|isinter|isrgb24|Video|Height|Width0|Width1|
-----------------------------------------------------------------------------

          W0 W1
Width:    00 0   256 pixels
          01 0   320
          10 0   512
          11 0   640
          00 1   384
Height:   0      240 pixels
          1      480
Video:    0      NTSC
          1      PAL
isrgb24:  0      15 bit direct mode
          1      24 bit direct mode
isinter:  0      Interlace off
          1      Interlace on
den:      0      Display enabled
          1      Display disabled
busy:     0      GPU is busy (i.e. drawing primitives)
          1      GPU is idle
img:      0      Not ready to send image (packet $c0)
          1      Ready
com:      0      Not ready to receive commands
          1      Ready
dma:      00     DMA off, communication through GP0
          01     ?
          10     DMA CPU -> GPU
          11     DMA GPU -> CPU
lcf:      0      Drawing even lines in interlace mode
          1      Drawing odd lines in interlace mode

---------------------------------------------------
|0f 0e 0d|0c|0b|0a |09 |08 07|06 05|04|03 02 01 00|
| ?  ?  ?|me|md|dfe|dtd|tp   |abr  |ty|tx         |
---------------------------------------------------

tx:       0      0       Texture page X = tx*64
          1      64
          2      128
          3      192
          4      ...
ty:       0      0       Texture page Y
          1      256
abr:      %00    0.5xB+0.5 xF    Semi transparent state
          %01    1.0xB+1.0 xF
          %10    1.0xB-1.0 xF
          %11    1.0xB+0.25xF
tp:       %00    4bit CLUT       Texture page color mode
          %01    8bit CLUT
          %10    15bit
dtd:      0      Dither off
          1      Dither on
dfe:      0      Draw to display area prohibited
          1      Draw to display area allowed
md:       0      off
          1      on      Apply mask bit to drawn pixels.
me:       0      off
          1      on      No drawing to pixels with set mask bit.

Control (Write)
---------------------------------------------------------------------------
A control command is composed of one word as follows:

bit  |1f-18  |17-00    |
desc.|command|parameter|

The composition of the parameter is different for each command.
---------------------------------------------------------------------------
*Reset GPU
command     $00
parameter   $000000
Description Resets the GPU. Also seems to turn off the screen.
            (sets status to $14802000)
---------------------------------------------------------------------------
*Reset Command Buffer
command     $01
parameter   $000000
Description Resets the command buffer.
---------------------------------------------------------------------------
*Reset IRQ
command     $02
parameter   $000000
Description Resets the IRQ. No idea what this means.
---------------------------------------------------------------------------
*Display Enable
command     $03
parameter   $000000  Display enable
            $000001  Display disable
Description Turns the display on/off. Note that a turned off screen still
            gives the flicker of NTSC on a PAL screen if NTSC mode is
            selected.
---------------------------------------------------------------------------
*DMA setup.
command     $04
parameter   $000000  DMA disabled
            $000001  DMA ?
            $000002  DMA CPU to GPU
            $000003  DMA GPU to CPU
Description Sets the DMA direction. K-comm also mentions something about
            parameter $01, but I wasn't able to translate it.
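As a small illustration of the control word layout given above (command in
bits $1f-$18, parameter in the bits below it), the words for a few of these
commands can be composed as follows. This is a Python sketch; the function
and variable names are made up for the example, and actually writing the
word to $1f801814 is left out.

```python
def gp1_word(command, parameter=0):
    """Compose a one word control command.

    The command goes in bits $1f-$18, the parameter in the lower 24 bits.
    """
    if not 0 <= command <= 0xFF:
        raise ValueError("command must fit in 8 bits")
    if not 0 <= parameter <= 0xFFFFFF:
        raise ValueError("parameter must fit in 24 bits")
    return (command << 24) | parameter

# Values taken from the command descriptions in this section.
reset_gpu = gp1_word(0x00)
display_on = gp1_word(0x03, 0x000000)
display_off = gp1_word(0x03, 0x000001)
dma_cpu_to_gpu = gp1_word(0x04, 0x000002)

print(f"${display_off:08x} ${dma_cpu_to_gpu:08x}")
```

The range checks are there only to catch a parameter that would spill into
the command bits; the GPU itself is not known to report such errors.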
---------------------------------------------------------------------------
*Start of display area
command     $05
parameter   bit $00-$09  X (0-1023)
            bit $0a-$12  Y (0-511)
            = Y<<10 + X
Description Locates the top left corner of the display area.
---------------------------------------------------------------------------
*Horizontal Display range
command     $06
parameter   bit $00-$0b  X1 ($1f4-$cda)
            bit $0c-$17  X2
            = X1 + X2<<12
Description Specifies the horizontal range within which the display area is
            displayed. The display is relative to the display start, so X
            coordinate 0 will be at the value in X1. The display end is not
            relative to the display start. The number of pixels that get
            sent to the screen in 320 mode is (X2-X1)/8. How many are
            actually visible depends on your TV/monitor.
            (normally $260-$c56)
---------------------------------------------------------------------------
*Vertical Display range
command     $07
parameter   bit $00-$09  Y1
            bit $0a-$14  Y2
            = Y1 + Y2<<10
Description Specifies the vertical range within which the display area is
            displayed. The display is relative to the display start, so Y
            coordinate 0 will be at the value in Y1. The display end is not
            relative to the display start. The number of lines that get
            sent to the display is Y2-Y1, in 240 mode.
            (Not sure about the default values, should be something like
            NTSC $010-$100, PAL $023-$123)
---------------------------------------------------------------------------
*Display mode
command     $08
parameter   bit $00-$01  Width 0
            bit $02      Height
            bit $03      Videomode      See above
            bit $04      Isrgb24
            bit $05      Isinter
            bit $06      Width1
            bit $07      Reverseflag
Description Sets the display mode.
---------------------------------------------------------------------------
*GPU Info
command     $10
parameter   $000000
            $000001
            $000002
            $000003  Draw area top left
            $000004  Draw area bottom right
            $000005  Draw offset
            $000006
            $000007  GPU Type, should return 2 for a standard GPU.
Description Returns requested info. Read result from GP0.
            0 and 1 seem to return the draw area top left as well, and 6
            seems to return the draw offset too.
---------------------------------------------------------------------------
*Some other commands I do not know the function of:

*?????
command     $20
parameter   ???????
Description I've seen it used with value $000504. What it does?????

*?????
command     $09
parameter   $000001 ??
Description I've seen it used with value $000001. What it does?????

---------------------------------------------------------------------------
Command Packets, Data Register.
---------------------------------------------------------------------------
Primitive command packets use an 8 bit command value which is present in
all packets. It contains a 3 bit type block and a 5 bit option block, of
which the meaning of the bits depends on the type. Layout is as follows:

Type:
000  GPU command
001  Polygon primitive
010  Line primitive
011  Sprite primitive
100  Transfer command
111  Environment command

Configuration of the option blocks for the primitives is as follows:

Polygon:
| 7 6 5 | 4 | 3 | 2 | 1 | 0 |
| 0 0 1 |IIP|3/4|Tme|Abe|Tge|

Line:
| 7 6 5 | 4 | 3 | 2 | 1 | 0 |
| 0 1 0 |IIP|Pll| 0 |Abe| 0 |

Sprite:
| 7 6 5 | 4 3  | 2 | 1 | 0 |
| 0 1 1 | Size |Tme|Abe| 0 |

IIP   0   Flat shading
      1   Gouraud shading
3/4   0   3 vertex polygon
      1   4 vertex polygon
Tme   0   Texture mapping off
      1   on
Abe   0   Semi transparency off
      1   on
Tge   0   Brightness calculation at time of texture mapping on
      1   off (draw texture as is)
Size  00  Free size (specified by W/H)
      01  1 x 1
      10  8 x 8
      11  16 x 16
Pll   0   Single line (2 vertices)
      1   Polyline (n vertices)

* Color information
Color information is forwarded as 24 bit data. It is reduced to 15 bit by
the GPU. Layout as follows:

bit  |17-10|0f-08|07-00|
desc.|Blue |Green|Red  |

* Shading information.
For textured primitives, shading data is forwarded by this packet. Layout
is the same as for color data, the RGB values controlling the brightness of
the individual colors ($00-$7f). A value of $80 in a color will take the
former value as data.
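The option bits and the color layout above can be tried out with a short
Python sketch. The function names are made up for the example, and the
24 to 15 bit reduction simply drops the lower 3 bits of each color, which
is what this text describes for the case where dither is off; what the GPU
does in other cases is not modelled.

```python
def polygon_command(iip=False, quad=False, tme=False, abe=False, tge=False):
    """Build the 8 bit command value for a polygon (type bits 001)."""
    return (0x20 | (int(iip) << 4) | (int(quad) << 3) |
            (int(tme) << 2) | (int(abe) << 1) | int(tge))

def pack_bgr24(red, green, blue):
    """Pack three 8 bit colors as Blue (17-10), Green (0f-08), Red (07-00)."""
    for value in (red, green, blue):
        if not 0 <= value <= 0xFF:
            raise ValueError("each color must fit in 8 bits")
    return (blue << 16) | (green << 8) | red

def bgr24_to_15(bgr):
    """Reduce a 24 bit BGR value to 15 bit by dropping 3 bits per color."""
    red = (bgr & 0xFF) >> 3
    green = ((bgr >> 8) & 0xFF) >> 3
    blue = ((bgr >> 16) & 0xFF) >> 3
    return (blue << 10) | (green << 5) | red

# A textured 4 point polygon and a Gouraud shaded textured 4 point polygon.
textured_quad = polygon_command(quad=True, tme=True)
shaded_textured_quad = polygon_command(iip=True, quad=True, tme=True)
orange = pack_bgr24(0xFF, 0x80, 0x00)
print(f"${textured_quad:02x} ${shaded_textured_quad:02x} ${orange:06x}")
```

The two command values produced here line up with the $2c and $3c entries
of the packet list.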
* Texture Page information
The data is 16 bits wide, layout is as follows:

bit  |F E D C B A 9|8 7|6 5|4 |3 2 1 0|
desc.|0            |tp |abr|ty|tx     |

tx    0-f  X*64          texture page x coord
ty    0    0             texture page y coord
      1    256
abr   0    0.5xB+0.5 xF  Semi transparency mode
      1    1.0xB+1.0 xF
      2    1.0xB-1.0 xF
      3    1.0xB+0.25xF
tp    0    4bit CLUT
      1    8bit CLUT
      2    15bit direct

* CLUT-ID
Specifies the location of the CLUT data. The data is 16 bits wide.

F-6   Y coordinate 0-511
5-0   X coordinate X/16

---------------------------------------------------------------------------
Abbreviations in packet list
---------------------------------------------------------------------------
BGR    Color/Shading info, see above.
xn,yn  16 bit values of X and Y in frame buffer.
un,vn  8 bit values of X and Y in texture page.
tpage  texture page information packet, see above.
clut   CLUT-ID, see above.

---------------------------------------------------------------------------
Packet list.
---------------------------------------------------------------------------
The packets sent to the GPU are processed as a group of data, each one word
wide. The data must be written to the GPU data register ($1f801810)
sequentially. Once all data has been received, the GPU starts operation.
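The texture page and CLUT-ID layouts given earlier in this section can be
packed from frame buffer coordinates as in the Python sketch below. The
function and constant names are made up for the example; only the bit
positions come from this text.

```python
TP_4BIT, TP_8BIT, TP_15BIT = 0, 1, 2   # texture page color modes

def tpage(x, y, tp, abr=0):
    """Pack texture page info: tx in bits 3-0, ty 4, abr 6-5, tp 8-7."""
    if x % 64 or not 0 <= x < 1024:
        raise ValueError("x must be a multiple of 64 in 0-960")
    if y not in (0, 256):
        raise ValueError("y must be 0 or 256")
    if tp not in (TP_4BIT, TP_8BIT, TP_15BIT) or not 0 <= abr <= 3:
        raise ValueError("bad color mode or semi transparency mode")
    return (tp << 7) | (abr << 5) | ((y // 256) << 4) | (x // 64)

def clut_id(x, y):
    """Pack a CLUT-ID: Y in bits F-6, X/16 in bits 5-0."""
    if x % 16 or not 0 <= x < 1024:
        raise ValueError("x must be a multiple of 16 in 0-1008")
    if not 0 <= y < 512:
        raise ValueError("y must be in 0-511")
    return (y << 6) | (x // 16)

# An 8bit CLUT texture page at (640,256) with its CLUT stored at (0,480).
page = tpage(640, 256, TP_8BIT)
clut = clut_id(0, 480)
print(f"tpage=${page:04x} clut=${clut:04x}")
```

These two 16 bit values are what goes into the "tpage" and "clut" fields of
the textured primitive packets.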
Overview of packet commands:

Primitive drawing packets
$20  monochrome 3 point polygon
$24  textured 3 point polygon
$28  monochrome 4 point polygon
$2c  textured 4 point polygon
$30  gradated 3 point polygon
$34  gradated textured 3 point polygon
$38  gradated 4 point polygon
$3c  gradated textured 4 point polygon
$40  monochrome line
$48  monochrome polyline
$50  gradated line
$58  gradated polyline
$60  rectangle
$64  sprite
$68  dot
$70  8*8 rectangle
$74  8*8 sprite
$78  16*16 rectangle
$7c  16*16 sprite

GPU command & Transfer packets
$01  clear cache
$02  frame buffer rectangle draw
$80  move image in frame buffer
$a0  send image to frame buffer
$c0  copy image from frame buffer

Draw mode/environment setting packets
$e1  draw mode setting
$e2  texture window setting
$e3  set drawing area top left
$e4  set drawing area bottom right
$e5  drawing offset
$e6  mask setting

---------------------------------------------------------------------------
Packet Descriptions
---------------------------------------------------------------------------
Primitive Packets
---------------------------------------------------------------------------
$20  monochrome 3 point polygon
  |1f-18|17-10|0f-08|07-00|
 1|$20  |BGR              |command+color
 2|y0         |x0         |vertices
 3|y1         |x1         |
 4|y2         |x2         |
---------------------------------------------------------------------------
$24  textured 3 point polygon
  |1f-18|17-10|0f-08|07-00|
 1|$24  |BGR              |command+color
 2|y0         |x0         |vertex 0
 3|clut       |v0   |u0   |clutid + texture coords vertex 0
 4|y1         |x1         |
 5|tpage      |v1   |u1   |
 6|y2         |x2         |
 7|           |v2   |u2   |
---------------------------------------------------------------------------
$28  monochrome 4 point polygon
  |1f-18|17-10|0f-08|07-00|
 1|$28  |BGR              |command+color
 2|y0         |x0         |vertices
 3|y1         |x1         |
 4|y2         |x2         |
 5|y3         |x3         |
---------------------------------------------------------------------------
$2c  textured 4 point polygon
  |1f-18|17-10|0f-08|07-00|
 1|$2c  |BGR              |command+color
 2|y0         |x0         |vertex 0
 3|clut       |v0   |u0   |clutid + texture coords vertex 0
 4|y1         |x1         |
 5|tpage      |v1   |u1   |
 6|y2         |x2         |
 7|           |v2   |u2   |
 8|y3         |x3         |
 9|           |v3   |u3   |
---------------------------------------------------------------------------
$30  gradated 3 point polygon
  |1f-18|17-10|0f-08|07-00|
 1|$30  |BGR0             |command+color
 2|y0         |x0         |vertices
 3|     |BGR1             |
 4|y1         |x1         |
 5|     |BGR2             |
 6|y2         |x2         |
---------------------------------------------------------------------------
$34  shaded textured 3 point polygon
  |1f-18|17-10|0f-08|07-00|
 1|$34  |BGR0             |command+color
 2|y0         |x0         |vertex 0
 3|clut       |v0   |u0   |clutid + texture coords vertex 0
 4|     |BGR1             |
 5|y1         |x1         |
 6|tpage      |v1   |u1   |
 7|     |BGR2             |
 8|y2         |x2         |
 9|           |v2   |u2   |
---------------------------------------------------------------------------
$38  gradated 4 point polygon
  |1f-18|17-10|0f-08|07-00|
 1|$38  |BGR0             |command+color
 2|y0         |x0         |vertices
 3|     |BGR1             |
 4|y1         |x1         |
 5|     |BGR2             |
 6|y2         |x2         |
 7|     |BGR3             |
 8|y3         |x3         |
---------------------------------------------------------------------------
$3c  shaded textured 4 point polygon
  |1f-18|17-10|0f-08|07-00|
 1|$3c  |BGR0             |command+color
 2|y0         |x0         |vertex 0
 3|clut       |v0   |u0   |clutid + texture coords vertex 0
 4|     |BGR1             |
 5|y1         |x1         |
 6|tpage      |v1   |u1   |texture page location
 7|     |BGR2             |
 8|y2         |x2         |
 9|           |v2   |u2   |
 a|     |BGR3             |
 b|y3         |x3         |
 c|           |v3   |u3   |
---------------------------------------------------------------------------
$40  monochrome line
  |1f-18|17-10|0f-08|07-00|
 1|$40  |BGR              |command+color
 2|y0         |x0         |vertex 0
 3|y1         |x1         |vertex 1
---------------------------------------------------------------------------
$48  single color polyline
  |1f-18|17-10|0f-08|07-00|
 1|$48  |BGR              |command+color
 2|y0         |x0         |vertex 0
 3|y1         |x1         |vertex 1
 4|y2         |x2         |vertex 2
 .|yn         |xn         |vertex n
 .|$55555555              |Termination code.

Any number of points can be entered, end with the termination code.
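As an illustration of the $48 layout above, the word list for a single
color polyline can be assembled as in this Python sketch. The helper names
are made up for the example, and the coordinates are simply truncated to
16 bits; sending the words to the data register is left out.

```python
TERMINATOR = 0x55555555  # termination code from the $48 packet layout

def vertex_word(x, y):
    """Pack one vertex: y in bits 1f-10, x in bits 0f-00."""
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)

def mono_polyline(bgr, points):
    """Build the list of words for a $48 single color polyline."""
    if len(points) < 2:
        raise ValueError("a polyline needs at least 2 points")
    words = [(0x48 << 24) | (bgr & 0xFFFFFF)]
    words.extend(vertex_word(x, y) for x, y in points)
    words.append(TERMINATOR)
    return words

# A red polyline through three points.
packet = mono_polyline(0x0000FF, [(0, 0), (10, 20), (30, 5)])
print(" ".join(f"${word:08x}" for word in packet))
```

The packet is one command word, one word per vertex, and the termination
code, so its length is the number of points plus 2.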
---------------------------------------------------------------------------
$50  gradated line
  |1f-18|17-10|0f-08|07-00|
 1|$50  |BGR0             |command+color
 2|y0         |x0         |
 3|     |BGR1             |
 4|y1         |x1         |
---------------------------------------------------------------------------
$58  gradated polyline
  |1f-18|17-10|0f-08|07-00|
 1|$58  |BGR0             |command+color
 2|y0         |x0         |
 3|     |BGR1             |
 4|y1         |x1         |
 5|     |BGR2             |
 6|y2         |x2         |
 .|     |BGRn             |
 .|yn         |xn         |
 .|$55555555              |Termination code.

Any number of points can be entered, end with the termination code.
---------------------------------------------------------------------------
$60  rectangle
  |1f-18|17-10|0f-08|07-00|
 1|$60  |BGR              |command+color
 2|y          |x          |
 3|h          |w          |
---------------------------------------------------------------------------
$64  sprite
  |1f-18|17-10|0f-08|07-00|
 1|$64  |BGR              |command+color
 2|y          |x          |
 3|clut       |v    |u    |clut location, texture page y,x
 4|h          |w          |
---------------------------------------------------------------------------
$68  dot
  |1f-18|17-10|0f-08|07-00|
 1|$68  |BGR              |command+color
 2|y          |x          |
---------------------------------------------------------------------------
$70  8*8 rectangle
  |1f-18|17-10|0f-08|07-00|
 1|$70  |BGR              |command+color
 2|y          |x          |
---------------------------------------------------------------------------
$74  8*8 sprite
  |1f-18|17-10|0f-08|07-00|
 1|$74  |BGR              |command+color
 2|y          |x          |
 3|clut       |v    |u    |clut location, texture page y,x
---------------------------------------------------------------------------
$78  16*16 rectangle
  |1f-18|17-10|0f-08|07-00|
 1|$78  |BGR              |command+color
 2|y          |x          |
---------------------------------------------------------------------------
$7c  16*16 sprite
  |1f-18|17-10|0f-08|07-00|
 1|$7c  |BGR              |command+color
 2|y          |x          |
 3|clut       |v    |u    |clut location, texture page y,x
---------------------------------------------------------------------------
GPU command & Transfer packets
---------------------------------------------------------------------------
$01  clear cache
  |1f-18|17-10|0f-08|07-00|
 1|$01  |0                |clear cache.

Seems to be the same as the GP1 command.
--------------------------------------------------------------------------
$02 frame buffer rectangle draw
 |1f-18|17-10|0f-08|07-00|
1|$02  |BGR              |command+color
2|Y          |X          |Topleft corner
3|H          |W          |Width & Height

Fills the area in the frame buffer with the color value in BGR. This
command will draw without regard to drawing environment settings.
Coordinates are absolute frame buffer coordinates. Max width is $3ff, max
height is $1ff.
--------------------------------------------------------------------------
$80 move image in frame buffer
 |1f-18|17-10|0f-08|07-00|
1|$80  |                0|command
2|sY         |sX         |Source coord.
3|dY         |dX         |Destination coord.
4|H          |W          |Height+Width of transfer

Copies data within the frame buffer.
--------------------------------------------------------------------------
$01 $a0 send image to frame buffer
 |1f-18|17-10|0f-08|07-00|
 |$01  |                 |Reset command buffer (write to GP1 or GP0)
1|$A0  |                 |
2|Y          |X          |Destination coord.
3|H          |W          |Height+Width of transfer
4|pix1       |pix0       |image data
5..
?|pixn       |pixn-1     |

Transfers data from main memory to the frame buffer. If the number of
pixels to be sent is odd, an extra pixel should be sent, because each
packet is 32 bits.
--------------------------------------------------------------------------
$01 $c0 copy image from frame buffer
 |1f-18|17-10|0f-08|07-00|
 |$01  |                 |Reset command buffer (write to GP1 or GP0)
1|$C0  |                 |
2|Y          |X          |Source coord.
3|H          |W          |Height+Width of transfer
4|pix1       |pix0       |image data (read from data port)
5..
?|pixn       |pixn-1     |

Transfers data from the frame buffer to main memory. Wait for bit 27 of
the status register to be set before reading the image data.
When the number of pixels is odd, an extra pixel is read at the end
(because one packet is 32 bits).
--------------------------------------------------------------------------
Draw mode/environment setting packets
--------------------------------------------------------------------------
Some of these settings can also be set by primitive packets. In any case
it is the last packet of either kind that the GPU received that is active.
So if a primitive sets tpage info, it will overwrite the existing data,
even if it was sent by an $e? packet.
--------------------------------------------------------------------------
$e1 draw mode setting
 |1f-18|17-0b|0a |09 |08 07|06 05|04|03 02 01 00|
1|$e1  |     |dfe|dtd|tp   |abr  |ty|tx         | command + values

See above for explanations.

It seems that bits $0b-$0d of the status reg can also be passed with this
command on some GPUs other than type 2. (ie. Command $10000007 doesn't
return 2)
--------------------------------------------------------------------------
$e2 texture window setting
 |1F-18|17-14|13-0F|0E-0A|09-05|04-00|
1|$E2  |     |twy  |twx  |twh  |tww  | command + value

twx    Texture window X, (twx*8)
twy    Texture window Y, (twy*8)
tww    Texture window width, 256-(tww*8)
twh    Texture window height, 256-(twh*8)
--------------------------------------------------------------------------
$e3 set drawing area top left
 |1f-18|17-14|13-0a|09-00|
1|$e3  |     |Y    |X    |

Sets the drawing area top left corner. X&Y are absolute frame buffer
coords.
--------------------------------------------------------------------------
$e4 set drawing area bottom right
 |1f-18|17-14|13-0a|09-00|
1|$e4  |     |Y    |X    |

Sets the drawing area bottom right corner. X&Y are absolute frame buffer
coords.
--------------------------------------------------------------------------
$e5 drawing offset
 |1f-18|17-16|15-0b|0a-00|
1|$e5  |     |OffsY|OffsX| (offset Y = y << 11)

Sets the drawing offset within the drawing area. X&Y are offsets in the
frame buffer.
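
The bit layouts of the $e2-$e5 packets above can be sketched as a few
small encoders. This is Python meant to run on a host machine, and the
function names are invented for this example. Field widths are read
straight from the tables: 10 bits each for the $e3/$e4 coordinates, 11
bits each for the $e5 offsets (following the "offset Y = y << 11" note),
and 5 bits each for the $e2 fields. Values that do not fit are simply
masked, which is an assumption and not documented GPU behaviour.

```python
# Hypothetical encoders for the environment packets, not an official API.


def texture_window(twx, twy, tww, twh):
    """$e2: twy in bits 13-0f, twx in 0e-0a, twh in 09-05, tww in 04-00."""
    return ((0xE2 << 24) | ((twy & 0x1F) << 15) | ((twx & 0x1F) << 10)
            | ((twh & 0x1F) << 5) | (tww & 0x1F))


def draw_area_top_left(x, y):
    """$e3: Y in bits 13-0a, X in bits 09-00 (absolute frame buffer coords)."""
    return (0xE3 << 24) | ((y & 0x3FF) << 10) | (x & 0x3FF)


def draw_area_bottom_right(x, y):
    """$e4: same layout as $e3."""
    return (0xE4 << 24) | ((y & 0x3FF) << 10) | (x & 0x3FF)


def draw_offset(x, y):
    """$e5: offset Y shifted left by 11, offset X in bits 0a-00."""
    return (0xE5 << 24) | ((y & 0x7FF) << 11) | (x & 0x7FF)


if __name__ == "__main__":
    # A 320x240 drawing area at the top left of the frame buffer.
    for word in (draw_area_top_left(0, 0),
                 draw_area_bottom_right(319, 239),
                 draw_offset(0, 0)):
        print("$%08x" % word)
```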
--------------------------------------------------------------------------
$e6 mask setting
 |1f-18|17-02|01   |00   |
1|$e6  |     |Mask2|Mask1|

Mask1    Set mask bit while drawing. 1 = on
Mask2    Do not draw to mask areas. 1 = on

While mask1 is on, the GPU will set the MSB of all pixels it draws.
While mask2 is on, the GPU will not write to pixels whose MSB is set.
--------------------------------------------------------------------------
DMA
--------------------------------------------------------------------------
The GPU has two DMA channels allocated to it. DMA channel 2 is used to
send linked packet lists to the GPU and to transfer image data to and from
the frame buffer. DMA channel 6 sets up an empty linked list, of which
each entry points to the previous (ie. reverse clear an OT.)
--------------------------------------------------------------------------
D2_MADR    DMA base address.                                     $1f8010a0
bit |1f                   00|
desc|madr                   |

madr    pointer to the address the DMA will start reading from/writing to
--------------------------------------------------------------------------
D2_BCR     DMA block control                                     $1f8010a4
bit |1f       10|0f       00|
desc|ba         |bs         |

ba    Amount of blocks
bs    Blocksize (words)

Sets up the DMA blocks. Once started the DMA will send ba blocks of bs
words. Don't set a blocksize larger than $10 words, as the command buffer
of the GPU is 64 bytes.
--------------------------------------------------------------------------
D2_CHCR    DMA channel control                                   $1f8010a8
bit |1f-19|18|17-0c|0b|0a|09|08|07-01|00|
desc|    0|Tr|    0| 0|Li|Co| 0|    0|Dr|

Tr    0 No DMA transfer busy.
      1 Start DMA transfer/DMA transfer busy.
Li    1 Transfer linked list.
Co    1 Transfer continuous stream of data.
Dr    0 direction to memory
      1 direction to GPU

This configures the DMA channel. The DMA starts when bit $18 is set. DMA
is finished as soon as bit $18 is cleared again.
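
To make the D2_BCR and D2_CHCR layouts above concrete, here is a small
host-side sketch in Python that composes the register values from their
fields. The names d2_bcr and d2_chcr and the constant names are invented
for this example. The bit positions are taken from the tables, with the
bit numbers read as hex like everywhere else in this document. The range
check on the block size only mirrors the $10 word advice given above; it
is not something the hardware is known to enforce.

```python
# Hypothetical helpers for composing DMA channel 2 register values.

CHCR_TR = 1 << 0x18  # start transfer / transfer busy
CHCR_LI = 1 << 0x0A  # transfer linked list
CHCR_CO = 1 << 0x09  # transfer continuous stream of data
CHCR_DR = 1 << 0x00  # direction: 1 = to GPU, 0 = to memory


def d2_bcr(blocks, block_size):
    """ba (amount of blocks) in bits 1f-10, bs (words) in bits 0f-00."""
    if not 1 <= block_size <= 0x10:
        raise ValueError("block size should be 1..$10 words")
    if not 0 <= blocks <= 0xFFFF:
        raise ValueError("block count must fit in 16 bits")
    return (blocks << 16) | block_size


def d2_chcr(linked_list=False, continuous=False, to_gpu=True, start=True):
    """Compose a D2_CHCR value from the Li, Co, Dr and Tr flags."""
    value = 0
    if start:
        value |= CHCR_TR
    if linked_list:
        value |= CHCR_LI
    if continuous:
        value |= CHCR_CO
    if to_gpu:
        value |= CHCR_DR
    return value


if __name__ == "__main__":
    print("$%08x" % d2_chcr(linked_list=True))  # linked list, mem -> GPU
    print("$%08x" % d2_chcr(continuous=True))   # continuous, mem -> GPU
    print("$%08x" % d2_bcr(0x20, 0x10))         # $20 blocks of $10 words
```

The two D2_CHCR values this produces match the $01000401 and $01000201
constants used in the step by step section of this document.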

To send or receive data to/from VRAM, send the appropriate GPU packets
first ($a0/$c0).
--------------------------------------------------------------------------
D6_MADR    DMA base address.                                     $1f8010e0
bit |1f                   00|
desc|madr                   |

madr    Last table entry.
--------------------------------------------------------------------------
D6_BCR     DMA block control                                     $1f8010e4
bit |1f                   00|
desc|bc                     |

bc    Number of list entries.
--------------------------------------------------------------------------
D6_CHCR    DMA channel control                                   $1f8010e8
bit |1f-1d|1c|1b-19|18|17-02|01|00|
desc|    0|OT|    0|Tr|    0|Ot| 0|

Tr    0 No DMA transfer busy.
      1 Start DMA transfer/DMA transfer busy.
Ot    1 Set to do an OT clear.

When this register is set to $11000002, the DMA channel will create an
empty linked list of D6_BCR entries ending at the address in D6_MADR. Each
entry has a size of 0, and points to the previous. The first entry is the
terminator ($00ffffff). So if D6_MADR = $80100010, D6_BCR = $00000004, and
the DMA is kicked, this will result in a list looking like this:

$80100000 $00ffffff
$80100004 $00100000
$80100008 $00100004
$8010000c $00100008
$80100010 $0010000c
--------------------------------------------------------------------------
DPCR       DMA control register                                  $1f8010f0
 |1f 1c|1b 18|17 14|13 10|0f 0c|0b 08|07 04|03 00|
 |     |Dma6 |Dma5 |Dma4 |Dma3 |Dma2 |Dma1 |Dma0 |

Each channel has a 4 bit control block allocated in this register.
Bit 3: 1 = DMA enabled
    2: ?
    1: ?
    0: ?

Bit 3 must be set for a channel to operate.
--------------------------------------------------------------------------
Common GPU functions, step by step.
--------------------------------------------------------------------------
* Initializing the GPU.

The first thing to do when using the GPU is to initialize it. To do that
take the following steps:

1 - Reset the GPU (GP1 command $00). This turns off the display as well.
2 - Set horizontal and vertical start/end. (GP1 command $06, $07)
3 - Set display mode. (GP1 command $08)
4 - Set display offset.
    (GP1 command $05)
5 - Set draw mode. (GP0 command $e1)
6 - Set draw area. (GP0 command $e3, $e4)
7 - Set draw offset. (GP0 command $e5)
8 - Enable display. (GP1 command $03)

* Sending a linked list.

The normal way to send large numbers of primitives is by using a linked
list DMA transfer. This list is built up of entries of which each points
to the next. One entry looks like this:

     dw $nnYYYYYY  ; nn = the number of words in the list entry
                   ; YYYYYY = address of next list entry & $00ffffff
1    dw ..         ; here goes the primitive.
2    dw ..         ;
.    dw ..         ;
nn-1 dw ..         ;
nn   dw ..         ;

The last entry in the list should have $ffffff as pointer, which is the
terminator. As soon as this value is found, DMA is ended. If the entry
size is set to 0, no data will be transferred to the GPU and the next
entry is processed.

To send the list do this:

1 - Wait for the GPU to be ready to receive commands. (bit $1c == 1)
2 - Enable DMA channel 2.
3 - Set GPU to DMA cpu->gpu mode. ($04000002)
4 - Set D2_MADR to the start of the list.
5 - Set D2_BCR to zero.
6 - Set D2_CHCR to link mode, mem->GPU and dma enable. ($01000401)

* Uploading Image data through DMA.

To upload an image to VRAM take the following steps:

1 - Wait for the GPU to be idle and DMA to finish. Enable DMA channel 2
    if necessary.
2 - Send the 'Send image to VRAM' primitive. (You can send this through
    DMA if you want. Use the linked list method described above.)
3 - Set DMA to CPU->GPU ($04000002) (if you didn't do so already in the
    previous step)
4 - Set D2_MADR to the start of the image data.
5 - Set D2_BCR with:
    bits 31-16 = Number of words to send (H*W /2)
    bits 15- 0 = Block size of 1 word. ($01)
    If H*W is odd, add 1. (Pixels are 2 bytes, send an extra blank pixel
    in case of an odd amount.)
6 - Set D2_CHCR to continuous mode, mem->GPU and dma enable. ($01000201)

Note that H, W, X and Y are always in frame buffer pixels, even if you
send image data in other formats.

You can use bigger block sizes if you need more speed.
If the number of words to be sent is not a multiple of the blocksize,
you'll have to send the remainder separately, because the GPU only accepts
an extra halfword if the number of pixels is odd. (ie. of the last word
sent, only the low halfword is used.) Also take care not to use blocksizes
bigger than $10, as the buffer of the GPU is only 64 bytes (=$10 words).

* Waiting to send commands

You can send new commands as soon as DMA has ceased and the GPU is ready.

1 - Wait for bit $18 to become 0 in D2_CHCR.
2 - Wait for bit $1c to become 1 in GP1.

* Vsync

Step by step for a VSYNC counter coming up (not)soon. Meanwhile you can
init the pad driver and, as soon as you want to check for VSYNC, fill the
return buffer with 0 and wait for it to change. The pad driver checks the
pads every VSYNC. Check the greentro source for an example.
--------------------------------------------------------------------------
Missing info.
--------------------------------------------------------------------------
There's still a lot yet uncovered, so if you have/know anything that's not
in here please mail it to me. Things i'm looking for particularly are info
on the differences between the various versions and revisions of the GPU,
and something about drawing speeds and other timing.
--------------------------------------------------------------------------
History:
--------------------------------------------------------------------------
23/apr/1999  First public release.
28/apr/1999  Some bugfixes and rewrites. Info on texture pages corrected.
 8/may/1999  Detailed packet composition.
20/may/1999  DMA & Step by steps added.
25/jun/1999  More DMA, OT and lists.
30/aug/1999  Correction. ($03)
--------------------------------------------------------------------------
Maintained by doomed/padua.
Any errors, additions ->
--------------------------------------------------------------------------
                     --== http://psx.rules.org/ ==--
                     --== http://www.padua.org/ ==--
--------------------------------------------------------------------------
Thanx & Hello to:
Silpheed Groepaz Brainwalker & Hitmen, Antiloop Middy Danzig & Napalm,
K-Communications, Blackbag, TDJ Sander & Focus, Burglar LCF & SCS*TRC,
Deekay & Crest, Graham NO-XS & Oxyron, MrAlpha Fungus & F4CG,
Zealot & Wrath Design, Shape, Naphalm Jazzcat & Onslaught, Reyn Ouwehand,
WHW & WOW, all active people on PSX and C64, #psxdev, #c-64.
--------------------------------------------------------------------------