As an example of what sort of map data I get:
It's a raw conversion of each data byte (0x00 to 0xFF) to a grey RGB value with R=G=B, so 0x00 = RGB[0,0,0] and 0xFF = RGB[255,255,255].
The above image is a composite of six screens, 3 across by 2 down. The region on the map is the Shrine of Fire, as indicated by the entrance at the top center and the lava flow on the right (both appear nearly black).
Each image is 384x240, which suggests the data is stored as CLUT indices and not as YUV (or any of its variations) or raw RGB.
I found this information in the 'over.rtf' file.
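A minimal sketch of that grey-scale conversion in Python, for anyone who wants to reproduce the image. It wraps the raw bytes in a binary PGM file, which stores one grey value per pixel and so is equivalent to R=G=B. The function name `to_pgm` is made up for illustration, and the sketch assumes the input already holds one byte per pixel for a single 384x240 screen.

```python
WIDTH, HEIGHT = 384, 240  # screen size given in the post

def to_pgm(pixels: bytes, width: int = WIDTH, height: int = HEIGHT) -> bytes:
    """Wrap raw 8-bit values in a binary PGM (P5) grey-scale image."""
    if len(pixels) < width * height:
        raise ValueError("not enough bytes for one image")
    # PGM header: magic number, dimensions, maximum grey value.
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    # Each byte is used directly as a grey level (0x00 black, 0xFF white).
    return header + pixels[: width * height]
```

Writing the returned bytes to a file opened in binary mode (for example a hypothetical `screen.pgm`) gives something most image viewers can open.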
As far as I can tell, each area (single screen) has several blocks of information:
First, there's a block of map data (suspected to be CLUT-indexed).
This block is interrupted by blobs of (audio) data that are 0x0914 bytes long and recur after every 0x3F8C bytes of map data.
So there's 0x0914 bytes of audio, then 0x3F8C bytes of data, then 0x0914 bytes of audio, etc.
The map data block is always exactly 0x019E78 bytes long (including the audio).
With the audio data removed, there are enough bytes left to fill a 384x240 pixel image at 1 byte per pixel.
This is my basis for assuming CLUT encoding.
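The numbers can be checked with a short sketch. It assumes the block starts with an audio blob and then strictly alternates audio and map data, with the last map chunk cut short by the end of the block. Under that assumption a 0x019E78-byte block holds six audio blobs, and 0x019E78 - 6 * 0x0914 = 92160 = 384 * 240. The function name `strip_audio` is hypothetical.

```python
AUDIO_LEN = 0x0914    # 2324 bytes of audio per blob
DATA_LEN = 0x3F8C     # 16268 bytes of map data between blobs
BLOCK_LEN = 0x019E78  # 106104 bytes, audio included
WIDTH, HEIGHT = 384, 240

def strip_audio(block: bytes) -> bytes:
    """Drop the interleaved audio blobs and return only the map bytes.

    Assumes the layout audio, data, audio, data, ... starting with audio.
    """
    out = bytearray()
    pos = 0
    while pos < len(block):
        pos += AUDIO_LEN                  # skip one audio blob
        out += block[pos:pos + DATA_LEN]  # keep the map data that follows
        pos += DATA_LEN
    return bytes(out)

# Sanity check of the arithmetic: six audio blobs plus one full screen.
assert BLOCK_LEN - 6 * AUDIO_LEN == WIDTH * HEIGHT
```

If the interleaving actually starts with map data instead of audio, the first `pos += AUDIO_LEN` would have to move to the end of the loop body.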
The map data block is followed by several blocks of NULL data, sometimes interrupted at the same interval as the first block.
At some point there's readable data that reads something like "play;.voice.info..tree..cycle", shortly followed by its area type "plain_stream".
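Readable fragments like these can be located by scanning for runs of printable ASCII, much like the Unix `strings` tool does. The sketch below assumes the dots in the quoted text stand for non-printable bytes as shown by a hex editor; the function name `printable_runs` and the minimum length of 4 are arbitrary choices for illustration.

```python
import re

def printable_runs(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return every run of at least min_len printable ASCII bytes."""
    # 0x20..0x7E covers space through tilde.
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_len)
    return pattern.findall(data)
```

Random binary data will also produce short accidental matches, so raising `min_len` helps cut down the noise.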
This is followed by a long block of audio data, interrupted by audio data. I know it's weird, but that is what it looks like.
After another block of data with an unknown purpose, there's more NULL data, followed by sprite data.
This data is pixel shifted (using ASM).
The data contains the sprites that can be found in this area. This (so far) includes enemies, treasures, NPCs and purchasable items (including price).
Then there's more NULL data and a block of (I think) script triggers and other behavioral data. This is backed up by strings such as "APPEAR..PLAYFLUTE.", found for example in the area where you have to play the flute to a snake to get a firewall spell.
This is once again followed by more NULL data. Then you find the CLUT itself. The CLUT is preceded by 0x000100 and usually starts with 0x323232 0x646464 0x969696 0xC8C8C8 0x000000 0x101010.
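A rough sketch of how such a CLUT could be read and applied to the map indices. Several things here are assumptions and not established facts: that each entry is 3 bytes, that the byte order is R, G, B (the grey entries quoted above cannot confirm the order), and that there are 256 entries. The names `read_clut` and `apply_clut` are hypothetical, and `start` is the offset of the first palette byte, directly after the 0x000100 marker.

```python
def read_clut(data: bytes, start: int, entries: int = 256) -> list[tuple[int, ...]]:
    """Read `entries` 3-byte colors beginning at offset `start`.

    Assumes R, G, B byte order and 3 bytes per entry.
    """
    end = start + 3 * entries
    if end > len(data):
        raise ValueError("data too short for the requested palette size")
    return [tuple(data[i:i + 3]) for i in range(start, end, 3)]

def apply_clut(indices: bytes, palette: list[tuple[int, ...]]) -> bytes:
    """Turn 1-byte palette indices into packed RGB bytes."""
    out = bytearray()
    for index in indices:
        out.extend(palette[index])  # look up the color for this pixel
    return bytes(out)
```

The result of `apply_clut` is raw RGB with 3 bytes per pixel, which can be wrapped in a binary PPM header (`P6`) in place of the grey-scale image.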
Then there's NULL data, something about 'offsets', and then the block continues with audio data interrupted by audio for a short while. After this there's essentially NULL data interrupted by audio.
This concludes one 'block' of map data (the format now repeats).
With the above information, I have been able to extract more than 70 screens of map data and have crudely assembled some of it. Because this data is extremely raw (no color, only noisy grey-scale), it is difficult to determine which piece goes where.
For those who want to try to help, here's the required data:
Full block: http://www.shikotei.com/tmp/FullBlockData.dat
Map data: http://www.shikotei.com/tmp/MapData.dat
Map processed: http://www.shikotei.com/tmp/Map.png
Sprite data: http://www.shikotei.com/tmp/SpriteData.dat
Processed sprite data (CLUT encoded): http://www.shikotei.com/tmp/SpriteDataProcessed.dat
Decoded sprite data: http://www.shikotei.com/tmp/Sprite.png
CLUT data: http://www.shikotei.com/tmp/CLUTData.dat
CLUT palette: http://www.shikotei.com/tmp/CLUTPalette.png
The above files are no longer on my site; I've succeeded in decoding them. See post below.