A short story of C128 ports


Right after the Amaurote 64 release, people mentioned that the C128 80 col mode would be a perfect platform for this game. I contacted Brush and said that although I wouldn't complete such a port, I could have a look.

That was about a month ago and I couldn't have been more wrong. There is not just one C128 port, but two versions of the game: one for VIC (40 col) and one for VDC (80 col). The VDC version is self-explanatory: it's supposed to run in 2MHz mode. But what can be so special about the VIC version?

Well, the C128's MMU has features that, when combined with some careful coding and fortunate choices of the original designer, allowed me to significantly speed up the VIC-based game too. The 2MHz mode had nothing to do with it: it's already used in Amaurote 64 if you run it on a C128 in C64 mode. The MMU can remap the zero page and the CPU stack onto any other memory page in the whole 128K area. If you respect page boundaries and check for $00/$01 accesses (because the CPU port is still there), then any page of RAM, in any bank, can temporarily become the zero page and be accessed using the fast zero page addressing mode of the CPU.
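To make the remapping idea concrete, here is a minimal sketch in Python (the game itself is 6502 assembly). It is a simplified model, not the MMU's full behaviour: the class `ZeroPageModel` and its method names are hypothetical, and the only things it captures are that a zero page access lands in whichever bank and page is currently mapped, and that $00/$01 always go to the CPU port.

```python
RAM_SIZE = 128 * 1024  # two 64K banks


class ZeroPageModel:
    """Toy model of zero page relocation (names are illustrative only)."""

    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.cpu_port = bytearray(2)  # $00/$01 never reach RAM
        self.zp_bank = 0
        self.zp_page = 0

    def map_zero_page(self, bank, page):
        # On real hardware this would be a write to the MMU page pointer.
        self.zp_bank = bank & 1
        self.zp_page = page & 0xFF

    def _physical(self, zp_addr):
        # 17-bit address: bank bit, relocated page, offset within the page.
        return (self.zp_bank << 16) | (self.zp_page << 8) | (zp_addr & 0xFF)

    def zp_write(self, zp_addr, value):
        if zp_addr < 2:
            self.cpu_port[zp_addr] = value & 0xFF  # CPU port is still there
        else:
            self.ram[self._physical(zp_addr)] = value & 0xFF

    def zp_read(self, zp_addr):
        if zp_addr < 2:
            return self.cpu_port[zp_addr]
        return self.ram[self._physical(zp_addr)]
```

In this model, after `map_zero_page(1, 0x40)` a write to zero page offset $10 changes the byte at $4010 in bank 1, while a write to offset $00 leaves RAM alone. That second case is exactly why code using this trick has to watch out for the first two bytes of every page.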

There were two very fortunate choices in the original game design that allowed for a speedup using zero page mapping. First, there are 256 pixels in a line, so every line takes exactly 32 bytes and 8 lines fit nicely in a memory page. Second, there is a frame around the game area that covers the first and last 16 pixels. It takes away four bytes in every line and eliminates most of the checks for a possible conflict with the CPU port.

On top of that I had the luxury of an extra 64K of RAM. The obvious choice for optimisation is to trade space for speed and unroll loops. With 64K in another RAM bank it was possible to do these unrolls on a huge scale. There is no indexing at all in the most time-consuming procedures. All the indexes and offsets were computed in the assembly step. For example, copying data from the screen buffer into the VIC screen is combined with the transformation from a linear bitmap into charsets and expressed by thousands of load/store operations with absolute addressing. All those bytes are stored into zero page, constantly shifting its underlying location.
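The layout arithmetic and the unrolling can be sketched with a small Python generator that emits 6502-style source for one page of the screen buffer. This is an illustration under assumptions, not the game's actual code: the source address `$4000` is hypothetical, and so is the destination layout, which assumes that one buffer page (8 lines of 32 bytes) maps onto one page of charset data (32 characters of 8 bytes).

```python
BYTES_PER_LINE = 32   # 256 pixels / 8 pixels per byte
LINES_PER_PAGE = 8    # 8 * 32 = 256 bytes, exactly one memory page
FRAME_BYTES = 2       # 16 frame pixels on each side of a line


def is_frame(col):
    """True for the 4 bytes per line that are covered by the frame."""
    return col < FRAME_BYTES or col >= BYTES_PER_LINE - FRAME_BYTES


def linear_offset(line, col):
    """Offset of a byte inside one page of the linear screen buffer."""
    return line * BYTES_PER_LINE + col


def charset_offset(line, col):
    """ASSUMED charset layout: 8 consecutive bytes per character cell."""
    return col * 8 + line


def unrolled_page_copy(src_page_addr):
    """Emit one lda/sta pair per game-area byte, with no indexing at all."""
    out = []
    for line in range(LINES_PER_PAGE):
        for col in range(BYTES_PER_LINE):
            if is_frame(col):
                continue  # frame bytes are skipped, including $00/$01
            src = src_page_addr + linear_offset(line, col)
            dst = charset_offset(line, col)
            out.append(f"    lda ${src:04x}")  # absolute load from the buffer
            out.append(f"    sta ${dst:02x}")  # store into the mapped zero page
    return out
```

Calling `unrolled_page_copy(0x4000)` yields 448 lines (224 load/store pairs), starting with `lda $4002` and `sta $10`. Because the two leftmost byte columns belong to the frame, no generated store ever targets $00 or $01 under this assumed layout, which shows how the frame removes the CPU port conflict for such a page.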

On every screen refresh a hexagonal frame is overlaid on top of the game area. Since the frame shape is fixed, its gfx data and bitmasks are predefined in the assembler and every data access is either an immediate value or a zero page load/store, all with absolute addressing.
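As a rough illustration of how a fixed frame can be baked into code, the following Python sketch emits 6502-style source in which every mask and every graphics byte is an immediate operand. The function name, the cell format and the sample values are hypothetical; the real frame data and the exact instruction sequences used in the game are not given here.

```python
def overlay_code(cells):
    """Generate speed code for a fixed overlay.

    cells: iterable of (zp_offset, keep_mask, frame_bits), where keep_mask
    has 1 bits for game pixels that survive and frame_bits holds the frame
    graphics for that byte.
    """
    out = []
    for off, keep, bits in cells:
        if keep == 0xFF and bits == 0:
            continue  # frame does not touch this byte: emit nothing
        if keep == 0x00:
            # Byte fully covered by the frame: plain immediate store.
            out.append(f"    lda #${bits:02x}")
            out.append(f"    sta ${off:02x}")
        else:
            # Partially covered: mask the game pixels, merge the frame bits.
            out.append(f"    lda ${off:02x}")
            out.append(f"    and #${keep:02x}")
            if bits:
                out.append(f"    ora #${bits:02x}")
            out.append(f"    sta ${off:02x}")
    return out
```

For example, `overlay_code([(0x11, 0x00, 0xAA), (0x12, 0xF0, 0x05)])` produces a two-instruction store for the first byte and a load, mask, merge, store sequence for the second. Bytes the frame never touches cost no cycles at all, which is the benefit of deciding everything at assembly time.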

The C128-VIC port uses over 32K of extra RAM only for speed code. I'd say that it's the first C128 game that combines these MMU features, extra memory and zero page mapping, in this way.

The VDC port has its own tricks too. It uses a 256x200 monochrome bitmap mode with doubled pixels. On real hardware (VICE timings are off) this gives the CPU much more time to access VDC RAM: there are fewer columns and no attribute data to fetch.

The VDC port benefits a lot from having a screen with exactly the same layout as the game screen buffer. Screen copying and sprite rendering are trivial, with no translation necessary. In this version I could briefly disable IRQs, so more speedups that use stack remapping were possible.

Even with all the optimisations, access to VDC RAM remained a bottleneck. You can notice it when more objects appear on the screen at the same time. This is the reason why the radar icon doesn't swing: I had to save some CPU time.

What is black & white for me may be shades of gray for you, so feel free to use keys 1 and 2 to switch screen colors while playing the VDC port.

Finally, the intro. The C64 intro displays a hires bitmap with a layer of sprites for additional color. It turns out that there is a mode for a VDC with 64K RAM that has exactly the same color restrictions: VDC FLI 8x1, 480x252. In order to display that mode I have to switch one VDC register value just before and just after vertical blank. It's completely out of sync with 50Hz PAL. If it overlapped with the music IRQ I could miss those events completely, resulting in a flickering screen.

The solution to this was to roughly estimate how often VBLANK happens, measure how long it is, set up an NMI for this purpose and let things calibrate automatically.

Behind every story of a success there is an even longer story of failures. I had a lot of failures and hit many dead ends trying to improve VDC RAM throughput. This involved using the REU to stream data directly into the VDC data register, one write every clock cycle, or using the lightpen registers to latch the current row and estimate how much time is left until the next vertical blanking period.

I already had a short and elegant solution to push bytes into VDC faster. It turned out that it works only on emulators, not on a real machine. I also tried to keep track of 'dirty' rows, to check if rendering a lot of sprites at different Y positions can be replaced by full screen refresh. It didn't work well. So this is how I spent several evenings and nights of April 2022. I hope that by now you can see how excited I became about this project.

It was a challenge to myself to demonstrate how obscure and rarely used MMU features can be very practical. This C128 port is not just the C64 release with a different loading address. There is some additional quality to it.

I would like to thank Oziphantom for his valuable input.
Special thanks go to Tokra. VDC FLI 8x1 intro picture wouldn't be possible without his work on hacking VDC graphical modes.
Big thanks also go to the participants of the https://c-128.freeforums.net/ forum. You have gathered an incredible amount of knowledge about the inner workings of the VDC and C128s.

Thank you all!

Maciej Witkowiak, a.k.a. YTM/Elysium

Files

amaurote-c128-vic.d64 170 kB
May 29, 2022
amaurote-c128-vdc.d64 170 kB
May 29, 2022

Comments

What does "completely out of sync with 50 Hz PAL" mean? On a Commodore 1084S (Philips), the intro picture is scrolling down vertically, doubt this is intended, but what kind of display hardware do you need to display it correctly?

Nevertheless... the VDC version is the best version. IMO your heavy efforts were well spent!

I don't have the numbers right now, but VDC mode for the intro picture doesn't have the same total number of vertical lines as a PAL image. I tested it on an old CRT TV and a modern(ish) LCD and both were able to lock to the vertical sync.

Since you have the proper hardware, would you be willing to run some tests with various timings for me? 

Sure, no problem, happy to test.

IMO the exact number of lines itself shouldn't matter that much; a CRT will simply obey the vertical retrace signal when it gets it. However, changing the number of lines also means you change either the frame rate (which should be 50Hz) or the horizontal frequency (which should be 15.625kHz). Since the 1084S accepts anything between 50 and 60Hz, I wouldn't expect you to get issues with the vertical frequency that quickly; the horizontal frequency is likely the most critical.

Thanks for mentioning my "VDC FLI". As mentioned on CSDB in the production notes, this mode originated in the "Risen from Oblivion"-demo. With a lower vertical resolution or better timing it should be possible to do exactly 60 or 50 Hz in that mode, but calculating this with the necessary register-switch is a challenge on its own.

Too bad the REU can't access the VDC registers and DMA only works in 1 MHz mode, as mentioned in the REU manual. Emulators are still catching up regarding the VDC. Z64K should be the best one in that regard, as you probably know.

I know the 128-version of "Attack of the PETSCII-robots" uses MMU zero-page relocation as well. And then there is the "Doomstage" part of the "Andropolis" demo, which uses 2 MHz as well as MMU zero-page relocation for speed-up.

Will have to take a much closer look at this game :-) Thanks for keeping C128 alive.

Thank you for your kind comment. We hope that this game will also be a trigger and a good test case for the emu developers to update the VDC support. Let us know what you think after you take a closer look. Any speed improvement ideas (or any others) are always welcome.