/*
0000001c l O .sdata 00000004 rwVSYNCHandlerHid
00000020 l O .sdata 00000004 rwDMAHandlerVIF1Hid
00000024 l O .sdata 00000004 rwDMAHandlerGIFHid
00000028 l O .sdata 00000001 rwDMAOwnVSYNC
00000029 l O .sdata 00000001 rwDMAOwnVIF1
0000002a l O .sdata 00000001 rwDMAOwnGIF
*/

#define MAKE64(HIGH,LOW) (((uint64)(HIGH))<<32 | ((uint64)(LOW)))

typedef struct rwTypeAddress rwTypeAddress;
struct rwTypeAddress
{
	RwUInt32 type;
	void *addr;
};

RwUInt128 *_rwDMAAsyncPktPtr;
RwUInt128 *_rwDMAPktPtr;
RwUInt128 *_rwDMAGateSlot;
RwUInt8 _rwDMAUseHalfOffset;
volatile RwUInt8 _rwDMACurrentHalfOffset;
volatile RwUInt8 _rwDMANumFlipsInQueue;
volatile RwUInt8 _rwDMAFlipPending;
RwUInt8 _rwDMAFlipId;
rwDMA_flipData _rwDMAFlipData;

// The whole buffer
RwUInt32 rwDMAbufSize = 0;
RwUInt128 *rwDMAbuf = NULL;

// VIFGate
RwUInt128 *rwDMAVIFGate = NULL;
RwUInt128 rwDMAFlushAGate;	// DMAref, FLUSHA, FLUSHA, gate slot
RwUInt128 rwDMAAFlush;	// DMAcnt, FLUSHA, FLUSHA
RwUInt128 rwDMAFlushAGateFlush;	// DMAref, FLUSHA, FLUSHA, combination of the above
RwUInt128 rwDMAANull;	// empty DMAcnt, temporary for gate slot

// PURef
RwInt32 **rwDMAPURefList;
RwUInt16 rwDMAPURefSize = 1024;
RwUInt16 rwDMAPURefStart;
RwUInt16 rwDMAPURefEnd;

// DMA buffers
RwUInt8 *rwDMADmaBufs[2];
RwUInt32 rwDMADmaBufSize;	// size of one buffer
RwUInt8 rwDMACurrentBuf;	// the buffer used for building chains
RwUInt8 *rwDMABufTopPtr;	// top of allocated memory (grows upward)
rwTypeAddress *rwDMABufCtrlPtr;	// next ctrl chain slot
RwUInt8 *rwDMABufCtrlTopPtr;	// top of current ctrl chain chunk
RwUInt8 *rwDMABufDataPtr;	// allocated data pointer (grows downward)
RwUInt8 *rwDMABufDataTopPtr;	// top of last allocated data chunk
RwUInt128 *rwDMABufAsyncTopPtr;

// DMA packet
RwUInt32 rwDMAPktType;
RwUInt128 *rwDMAPktBase;	// pointer to FIXUP qword
RwUInt8 *rwDMAPktRealBase;	// base of packet
RwUInt8 *rwDMAAsyncPktRealBase;	// base of async packet

// start of current packet in SPR and RAM
RwUInt128 *rwSPRBasePtr;
RwUInt128 *rwSPRBaseInMem;

// Pending - last closed packet
RwUInt32 rwPendingType;
rwTypeAddress *rwPendingCtrlPtr;
RwUInt8 *rwPendingBase;
RwUInt8 *rwPendingAsyncBase;

// DMA dispatching
RwUInt8 rwDMADispatching;
RwUInt8 *rwDMADispatchPtr;	// currently dispatching packet
RwUInt8 *rwDMADispatchEnd;	// end of dispatch list
RwUInt8 rwDMAWaited;

rwDMAYieldCallback rwDMAHYieldFunc;
RwUInt8 rwDMAMinVsyncCnt = 0;

/*
0000008c l .sbss 00000004 rwDMASavedGP
00000074 l .sbss 00000004 rwDMAFlipSt
00000081 l .sbss 00000001 rwDMAVsyncCnt
*/

/*
00000000 l F .text 00000570 rwDMAHandler
00000570 l F .text 00000264 rwDMAVSYNCHandler
*/

/*
VIFgate:
; see George Bain, Texture and Geometry Syncing
; DMA tag
   FLUSHA		; make sure PATH3 is idle
   FLUSHA
----			; ref data
   NOP (28)
7  FLUSH		; what's that for?
   NOP
   NOP
   MSKPATH3 (enable)	; (open gate, now PATH3 transfers a GIF packet to completion)
   NOP (96)
20 MSKPATH3 (disable)	; (PATH3 has started, close gate again)
   NOP (3)
----
....
----			; DMAcnt
21 FLUSHA		; synchronize VIF chain with GIF upload
   FLUSHA
   FLUSH
   NOP
*/

static void _flushPending(void)
{
	if(rwPendingAsyncBase){
		rwPendingCtrlPtr->type = RWDMA_PKT_AVIF;
		rwPendingCtrlPtr->addr = rwPendingBase;
		rwPendingCtrlPtr++;
		rwPendingCtrlPtr->type = RWDMA_PKT_AGIF;
		rwPendingCtrlPtr->addr = rwPendingAsyncBase;
	}else{
		rwPendingCtrlPtr->type = rwPendingType;
		rwPendingCtrlPtr->addr = rwPendingBase;
	}
}

static void _flushSPR(void)
{
	RwUInt32 pktSize = _rwDMAPktPtr - rwSPRBasePtr;
	if(pktSize == 0)
		return;

	RWDMA_SPR_WAIT_ON_FROM();
	RWDMA_SPR_CPY_FROM(rwSPRBaseInMem, rwSPRBasePtr, pktSize);

	// relocate _rwDMAGateSlot to mem
	if(_rwDMAGateSlot && ((RwUInt32)_rwDMAGateSlot & ~0x3FFF) == RWDMA_SCRATCHPAD)
		_rwDMAGateSlot = (RwUInt128*)((RwUInt32)(_rwDMAGateSlot - rwSPRBasePtr + rwSPRBaseInMem) | RWDMA_UNCACHED);

	// SPR buffer flip
	rwSPRBasePtr = (RwUInt128*)((RwUInt32)rwSPRBasePtr ^ 0x2000);
	_rwDMAPktPtr = rwSPRBasePtr;
	// advance mem base
	rwSPRBaseInMem += pktSize;
}

static void _flushFixup(void)
{
	RwUInt64 tmph, tmpl;
	RwUInt128 ltmp;
	RwUInt32 pktSize = ((_rwDMAPktPtr - rwSPRBasePtr) + (rwSPRBaseInMem - rwDMAPktBase)) - 1;

	if(pktSize == 0){
		// no qwords except fixup, adjust pointer back
		rwSPRBaseInMem--;
	}else{
		// write fixup qword
		tmph = MAKE64(0x50000000 | pktSize, 0);	// DIRECT
		tmpl = MAKE64(0, 0x10000000 | pktSize);	// DMAcnt
		MAKE128(ltmp, tmph, tmpl);
		*(RwUInt128*)((RwUInt32)rwDMAPktBase | RWDMA_UNCACHED) = ltmp;
	}
	rwDMAPktBase = NULL;
	rwDMAPktType &= ~RWDMA_FIXUP;
}

void _rwDMAYieldCallbackSet(rwDMAYieldCallback callback)
{
	rwDMAHYieldFunc = callback;
}

void _rwDMAMinVsyncCntSet(RwUInt8 minCnt)
{
	rwDMAMinVsyncCnt = minCnt;
}

RwBool _rwDMACallbackRestart(void);

void _rwDMAClosePkt(void)
{
	RwUInt64 tmph, tmpl;
	RwUInt128 ltmp;
	RwUInt16 i, end;
	RwInt32 dec;

	if(rwDMAPktType != 0){
		if(rwDMAPktType & RWDMA_FIXUP)
			_flushFixup();

		// end DMA chain
		if(_rwDMAAsyncPktPtr){
			tmph = MAKE64(0x06000000, 0x13000000);	// MSKPATH3 enable, FLUSHA
			tmpl = MAKE64(0, 0x70000000);	// DMAend
			MAKE128(ltmp, tmph, tmpl);
			if(rwDMAPktType != RWDMA_PKT_GIF)
				*_rwDMAAsyncPktPtr = ltmp;
		}else{
			tmph = MAKE64(0, 0x06000000);	// MSKPATH3 enable
			tmpl = MAKE64(0, 0x70000000);	// DMAend
			MAKE128(ltmp, tmph, tmpl);
		}
		RWDMA_ADD_TO_PKT(ltmp);
		_rwDMAGateSlot = NULL;

		_flushSPR();

		if(rwPendingType != 0)
			_flushPending();

		rwPendingCtrlPtr = rwDMABufCtrlPtr++;
		rwPendingType = rwDMAPktType;
		rwPendingBase = rwDMAPktRealBase;
		rwPendingAsyncBase = rwDMAAsyncPktRealBase;
		if(rwPendingAsyncBase)
			rwDMABufCtrlPtr++;

		_rwDMAAsyncPktPtr = NULL;
		_rwDMAPktPtr = NULL;
		rwDMAPktType = 0;
		rwDMAPktRealBase = NULL;
		rwDMAAsyncPktRealBase = NULL;
	}

	if(rwDMAPURefStart != rwDMAPURefEnd){	// same check again? loop?
		i = rwDMAPURefStart;
		end = rwDMAPURefEnd;
		rwDMAPURefStart = 0;
		rwDMAPURefEnd = 0;
		while(i != end){
			dec = rwDMAPURefList[i][1];
			rwDMAPURefList[i][1] = 0;
			for(; dec > 0; dec -= RWDMA_PKT_IMM_MAX)
				_rwDMAAddPkt(rwDMAPURefList[i][0], RWDMA_PKT_RASUREF | (min(dec,RWDMA_PKT_IMM_MAX)<type = type;
		rwDMABufCtrlPtr->addr = addr;
		rwDMABufCtrlPtr++;
	}else{
		// No space, add new packet or swap
		rwTypeAddress *nextCtrlPtr = (rwTypeAddress*)(((RwUInt32)rwDMABufTopPtr + 0x3F) & ~0x3F);
		RwUInt8 *nextTopPtr = (RwUInt8*)nextCtrlPtr + 0x80;
		if(nextTopPtr > rwDMABufDataPtr){
			// fill last slot and swap
			rwDMABufCtrlPtr->type = type;
			rwDMABufCtrlPtr->addr = addr;
			rwDMABufCtrlPtr++;
			_rwDMAForceBufferSwap();
		}else{
			// start new chunk
			rwDMABufCtrlPtr->type = RWDMA_PKT_GOTO;
			rwDMABufCtrlPtr->addr = nextCtrlPtr;
			rwDMABufCtrlPtr = nextCtrlPtr;
			rwDMABufCtrlTopPtr = (RwUInt8*)rwDMABufCtrlPtr + 0x78;
			rwDMABufTopPtr = (RwUInt8*)rwDMABufCtrlPtr + 0x80;
			rwDMABufCtrlPtr->type = type;
			rwDMABufCtrlPtr->addr = addr;
			rwDMABufCtrlPtr++;
		}
	}
}

void _rwDMAAddPkt2(void *addrVif, void *addrGif);

void _rwDMAAddPURef(RwInt32 *ptr)
{
	rwDMAPURefList[rwDMAPURefEnd++] = ptr;
	// This is pretty weird...
	if(rwDMAPURefStart == rwDMAPURefEnd){
		RwUInt16 PURefEnd1 = rwDMAPURefEnd;
		rwDMAPURefEnd = rwDMAPURefStart + rwDMAPURefSize/2;
		if(rwDMAPURefEnd >= rwDMAPURefSize)
			rwDMAPURefEnd -= rwDMAPURefSize;
		RwUInt16 PURefEnd2 = rwDMAPURefEnd;
		_rwDMAClosePkt();
		rwDMAPURefStart = PURefEnd2;
		rwDMAPURefEnd = PURefEnd1;
	}
}

void _rwDMAReqFlip(void *addr, RwUInt8 id);

void _rwDMAWaitQueue(void);

RwBool _rwDMAOpenGIFPkt(/* RwUInt32 type, */ RwUInt32 size);

// TODO: what if size is bigger than SPR buffer?
RwBool _rwDMAOpenVIFPkt(RwUInt32 type, RwUInt32 size)
{
	for(;;){
		size++;	// for the DMAend tag
		if(type & ~rwDMAPktType & RWDMA_FIXUP)
			size++;

		// Check if we can extend the current packet
		if(rwDMAPktType != 0){
			if((rwDMAPktType&~RWDMA_FIXUP) == RWDMA_PKT_VIF_TTE)
				if(_rwDMAPktPtr - rwSPRBasePtr + rwSPRBaseInMem + size < rwDMABufDataPtr)
					break;	// extend current packet
			// have to open a new one
			if(type & ~rwDMAPktType & RWDMA_FIXUP)
				size--;
			_rwDMAClosePkt();
		}

		// Open new packet

		// Need one slot in the ctrl chain before goto
		RwUInt8 *topPtr = rwDMABufTopPtr;
		if(rwDMABufCtrlPtr >= rwDMABufCtrlTopPtr-1){
			rwTypeAddress *nextCtrlPtr = (rwTypeAddress*)(((RwUInt32)rwDMABufTopPtr + 0x3F) & ~0x3F);
			RwUInt8 *nextTopPtr = (RwUInt8*)nextCtrlPtr + 0x80;
			if(nextTopPtr < rwDMABufDataPtr){
				// start new chunk
				rwDMABufCtrlPtr->type = RWDMA_PKT_GOTO;
				rwDMABufCtrlPtr->addr = nextCtrlPtr;
				rwDMABufCtrlPtr = nextCtrlPtr;
				rwDMABufCtrlTopPtr = (RwUInt8*)rwDMABufCtrlPtr + 0x78;
				rwDMABufTopPtr = (RwUInt8*)rwDMABufCtrlPtr + 0x80;
			}else
				// no space to grow
				topPtr = NULL;
		}

		if(type != 0)
			size++;
		if(topPtr){
			RwUInt128 *nextBase = (RwUInt128*)(((RwUInt32)topPtr + 0x7F) & ~0x7F);
			RwUInt128 *nextTopPtr = nextBase + size;
			if(nextTopPtr <= rwDMABufDataPtr){
				rwDMABufTopPtr = nextTopPtr;
				_rwDMAPktPtr = rwSPRBasePtr;
				rwDMAPktType = RWDMA_PKT_VIF_TTE;
				rwDMAPktRealBase = nextBase;
				rwSPRBaseInMem = nextBase;
				if(type != 0){
					rwDMAPktBase = rwSPRBaseInMem++;
					rwDMAPktType = RWDMA_PKT_VIF_TTE | RWDMA_FIXUP;
				}
				return TRUE;
			}
		}
		if(type != 0)
			size--;

		if(rwDMADmaBufSize < (size+12)*16)	// why 12?
			return FALSE;
		_rwDMAForceBufferSwap();
		size--;
	}

	// extend existing packet

	// no space in scratchpad
	if((_rwDMAPktPtr - rwSPRBasePtr + size)*16 > 0x2000)
		_flushSPR();
	// allocate
	rwDMABufTopPtr = (RwUInt8*)(_rwDMAPktPtr - rwSPRBasePtr + rwSPRBaseInMem + size);

	if((rwDMAPktType & RWDMA_FIXUP) == type)
		return TRUE;

	// different FIXUP type
	if(type & ~rwDMAPktType & RWDMA_FIXUP){
		// new packet is FIXUP
		_flushSPR();
		rwDMAPktBase = rwSPRBaseInMem++;
		rwDMAPktType |= RWDMA_FIXUP;
	}
	return TRUE;
}

RwUInt128 *_rwDMADMAAlloca(RwUInt32 size, RwBool sprFlush);

RwUInt128 *_rwDMADMAPktAllocHigh(RwUInt32 size, RwBool sprFlush)
{
	RwUInt32 align, curPktSz, pktType;
	RwUInt8 *data;

	for(;;){
		if(sprFlush){
			_flushSPR();
			RWDMA_SPR_WAIT_ON_FROM();
		}

		align = 0x7F < size ? 0x7F : 0x3F;
		size = (size+align) & ~align;
		data = (RwUInt8*)((RwUInt32)rwDMABufDataPtr & ~align) - size;
		if(rwDMABufTopPtr <= data){
			// there is space, allocate
			rwDMABufDataTopPtr = rwDMABufDataPtr;
			rwDMABufDataPtr = data;
			return (RwUInt128*)rwDMABufDataPtr;
		}

		// size of the current packet in bytes, rounded up to 0x80
		curPktSz = (((RwUInt128*)rwDMABufTopPtr-rwSPRBaseInMem + 2)*16 + 0x7F) & ~0x7F;
		// buffer size too small, so even a swap won't help
		if(0x80+curPktSz+size > rwDMADmaBufSize)
			return NULL;

		pktType = rwDMAPktType;
		_rwDMAClosePkt();
		if(rwDMADmaBufs[rwDMACurrentBuf] != rwDMABufCtrlPtr)
			_rwDMAForceBufferSwap();
		if(pktType == RWDMA_PKT_GIF)
			_rwDMAOpenGIFPkt(curPktSz/16);
		else
			_rwDMAOpenVIFPkt(pktType&RWDMA_FIXUP, curPktSz/16);
	}
}

RwBool _rwDMAAddImageUpload(RwBool parallel, RwUInt32 size)
{
	RwUInt64 tmph, tmpl;
	RwUInt128 ltmp;
	RwUInt128 *base;

	if(!_rwDMAOpenVIFPkt(0, 2))
		return FALSE;

	if(_rwDMAAsyncPktPtr == NULL || rwDMABufAsyncTopPtr - _rwDMAAsyncPktPtr <= size+3){
		// no space, allocate
		base = _rwDMADMAPktAllocHigh((size+3)*16, FALSE);
		if(base == NULL)
			return FALSE;
		rwDMABufAsyncTopPtr = (RwUInt128*)((RwUInt32)rwDMABufDataTopPtr | RWDMA_UNCACHEACCL);
	}else{
		// extend
		base = (RwUInt128*)((RwUInt32)(_rwDMAAsyncPktPtr+1) & ~RWDMA_UNCACHEACCL);
	}

	// Init channel 2 packet
	if(_rwDMAAsyncPktPtr){
		// last packet is required to end in
		//	DMAcnt(2), NOP, DIRECT(2)	why DIRECT? isn't this channel 2?
		//	GIFtag(1), NLOOP=1, EOP, A+D
		//	A: 0x7F D: 0 (NOP)

		// Overwrite DMAcnt from above with DMAcall and add one more qword
		tmpl = MAKE64((RwUInt32)(rwDMAVIFGate + 0x3F), 0x50000003);	// DMAcall
		MAKE128(ltmp, 0, tmpl);
		_rwDMAAsyncPktPtr[-3] = ltmp;
		// GIF tag, after this the called data from above, GIF nops, then return
		tmph = MAKE64(0, 0xE);	// A+D
		tmpl = MAKE64(0x10000000, 0x40);	// 0x40 qwords A+D (VIFgate+3F0)
		MAKE128(ltmp, tmph, tmpl);
		RWDMA_ADD_TO_ASYNC_PKT(ltmp);
		// link to itself? will be overwritten by next add to this packet
		tmpl = MAKE64(base, 0x20000000);	// DMAnext
		MAKE128(ltmp, 0, tmpl);
		RWDMA_ADD_TO_ASYNC_PKT(ltmp);
	}else
		rwDMAAsyncPktRealBase = (RwUInt8*)base;
	_rwDMAAsyncPktPtr = (RwUInt128*)((RwUInt32)base | RWDMA_UNCACHEACCL);

	// Make channel 1 packet
	if(parallel && _rwDMAGateSlot){
		if(((RwUInt32)_rwDMAGateSlot & ~0x3FFF) == RWDMA_SCRATCHPAD)
			RWDMA_SPR_WAIT_ON_FROM();
		*_rwDMAGateSlot = rwDMAFlushAGate;
		RWDMA_ADD_TO_PKT(rwDMAAFlush);
	}else{
		RWDMA_ADD_TO_PKT(rwDMAFlushAGateFlush);
	}
	_rwDMAGateSlot = _rwDMAPktPtr;
	RWDMA_ADD_TO_PKT(rwDMAANull);
	return TRUE;
}

RwBool _rwDMAHook(void);

void _rwDMAUnhook(void);

RwBool _rwDMADmaOpen(void)
{
	RwUInt32 bufSize;
	RwUInt64 tmph, tmpl;
	RwUInt128 ltmp;
	RwInt32 i;

	// Allocate DMA buf if there isn't one already
	bufSize = rwDMAbufSize;
	if(bufSize == 0)
		bufSize = 1*1024*1024;
	if(rwDMAbuf == NULL){
		rwDMAbuf = RwMalloc(bufSize + 0x80);
		if(rwDMAbuf == NULL)
			return FALSE;
		rwDMAbufSize = 0;
	}

	if(_rwDMAHook() == FALSE){
		if(rwDMAbufSize == 0)
			RwFree(rwDMAbuf);	// BUG?
			// rwDMAbuf not reset to NULL
		return FALSE;
	}

	// 0x800 bytes for VIF gate
	rwDMAVIFGate = (RwUInt128*)(((RwUInt32)rwDMAbuf + 0x7F) & ~0x7F);
	// after that two DMA buffers
	rwDMADmaBufs[0] = (RwUInt8*)rwDMAVIFGate + 0x800;
	rwDMACurrentBuf = 0;
	rwDMADmaBufSize = (bufSize - 0x800 - rwDMAPURefSize*sizeof(RwInt32*))/2 & ~0x7F;
	rwDMADmaBufs[1] = rwDMADmaBufs[0] + rwDMADmaBufSize;
	// after that PURefs
	rwDMAPURefList = (RwInt32**)(rwDMADmaBufs[1] + rwDMADmaBufSize);
	rwDMAPURefStart = 0;
	rwDMAPURefEnd = 0;

	rwDMABufCtrlPtr = (rwTypeAddress*)rwDMADmaBufs[0];
	rwDMABufCtrlTopPtr = (RwUInt8*)rwDMABufCtrlPtr + 0x78;
	rwDMABufTopPtr = (RwUInt8*)rwDMABufCtrlPtr + 0x80;
	rwDMABufDataPtr = rwDMADmaBufs[0] + rwDMADmaBufSize;
	rwDMABufDataTopPtr = rwDMABufDataPtr;

	_rwDMAPktPtr = NULL;
	_rwDMAAsyncPktPtr = NULL;
	rwDMAPktBase = NULL;
	rwDMAPktRealBase = NULL;
	rwDMAPktType = 0;	// TODO: define

	// Set VIFGate
	memset(rwDMAVIFGate, 0, 0x800);
	tmph = MAKE64(0x06000000, 0);	// MSKPATH3 enable
	tmpl = MAKE64(0, 0x11000000);	// FLUSH
	MAKE128(ltmp, tmph, tmpl);
	rwDMAVIFGate[7] = ltmp;
	MAKE128(ltmp, 0, 0x06008000);	// MSKPATH3 disable
	rwDMAVIFGate[0x20] = ltmp;
	tmph = MAKE64(0, 0x11000000);	// FLUSH
	tmpl = MAKE64(0x13000000, 0x13000000);	// FLUSHA, FLUSHA
	MAKE128(ltmp, tmph, tmpl);
	rwDMAVIFGate[0x21] = ltmp;
	// GIF packet
	MAKE128(ltmp, 0, 0x60000040);	// DMAret 0x40 qws
	rwDMAVIFGate[0x3F] = ltmp;
	MAKE128(ltmp, 0x7F, 0);	// ??
	// this should be a GS register
	for(i = 0; i < 0x40; i++)
		rwDMAVIFGate[0x40+i] = ltmp;

	tmph = MAKE64(0x13000000, 0x13000000);	// FLUSHA, FLUSHA
	tmpl = MAKE64(rwDMAVIFGate, 0x30000021);	// DMAref
	MAKE128(ltmp, tmph, tmpl);
	rwDMAFlushAGate = ltmp;
	tmpl = MAKE64(rwDMAVIFGate, 0x30000022);	// DMAref
	MAKE128(ltmp, tmph, tmpl);
	rwDMAFlushAGateFlush = ltmp;
	MAKE128(ltmp, tmph, 0x10000000);	// DMAcnt
	rwDMAAFlush = ltmp;
	MAKE128(ltmp, 0, 0x10000000);	// DMAcnt
	rwDMAANull = ltmp;

	SyncDCache(rwDMAVIFGate, (RwUInt8*)rwDMAVIFGate + rwDMADmaBufSize/2 + 0x7FF);

	rwDMADispatching = 0;
	rwSPRBasePtr = (RwUInt128*)RWDMA_SCRATCHPAD;
	rwSPRBaseInMem = NULL;
	rwPendingBase = NULL;
	rwPendingAsyncBase = NULL;
	rwPendingType = 0;
	_rwDMAFlipPending = 0;
	_rwDMAGateSlot = NULL;
	rwDMAWaited = 0;
	return TRUE;
}

void _rwDMADmaClose(void)
{
	_rwDMAUnhook();
	rwDMAVIFGate = NULL;
	if(rwDMAbufSize == 0)
		RwFree(rwDMAbuf);
	rwDMAbuf = NULL;
	rwDMAbufSize = 0;
	rwDMAHYieldFunc = NULL;
	rwDMAMinVsyncCnt = 0;
}

RwBool _rwDMAPreAlloc(RwUInt32 siz, RwUInt16 PURSiz, RwUInt128 *optBuf)
{
	if(rwDMAVIFGate)
		return FALSE;
	if(PURSiz == 0)
		PURSiz = rwDMAPURefSize;
	if(optBuf){
		if(((RwUInt32)optBuf & 0x7F) != 0)
			return FALSE;
		rwDMAbuf = optBuf;
	}
	rwDMAbufSize = siz;
	rwDMAPURefSize = PURSiz;
	return TRUE;
}
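
/*
 * Illustrative sketch, not part of the original source. It shows how the
 * DMA tag constants used in this file (0x10000000 DMAcnt, 0x20000000 DMAnext,
 * 0x30000000 DMAref, 0x50000000 DMAcall, 0x60000000 DMAret, 0x70000000 DMAend)
 * appear to decompose into a tag ID, a qword count and an address.
 * Assumptions: the usual EE source-chain DMAtag layout (QWC in bits 0-15,
 * ID in bits 28-30, ADDR in the upper 32 bits). All example* / EXAMPLE_*
 * names are hypothetical, and <stdint.h> types are used instead of the Rw*
 * types so that the helper stands alone.
 */
#include <stdint.h>

enum exampleDmaTagId
{
	EXAMPLE_DMA_REFE = 0,
	EXAMPLE_DMA_CNT  = 1,
	EXAMPLE_DMA_NEXT = 2,
	EXAMPLE_DMA_REF  = 3,
	EXAMPLE_DMA_REFS = 4,
	EXAMPLE_DMA_CALL = 5,
	EXAMPLE_DMA_RET  = 6,
	EXAMPLE_DMA_END  = 7
};

/* Build the low 64 bits of a tag qword, like MAKE64(addr, id<<28 | qwc). */
static uint64_t exampleMakeDmaTag(enum exampleDmaTagId id, uint32_t qwc, uint32_t addr)
{
	/* ID occupies three bits starting at bit 28, QWC the low 16 bits */
	uint32_t low = (((uint32_t)id & 7) << 28) | (qwc & 0xFFFF);
	/* the address goes into the upper word */
	return ((uint64_t)addr << 32) | (uint64_t)low;
}

/*
 * Under these assumptions, exampleMakeDmaTag(EXAMPLE_DMA_RET, 0x40, 0)
 * gives the 0x60000040 stored in rwDMAVIFGate[0x3F] by _rwDMADmaOpen, and
 * exampleMakeDmaTag(EXAMPLE_DMA_CNT, pktSize, 0) gives the
 * 0x10000000 | pktSize half of the fixup qword in _flushFixup.
 */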