Decoding CD-i audio files


Post by Shikotei » Thu Aug 17, 2017 2:15 pm

First off, if what I'm posting here is breaking rules (or laws for that matter), I'll remove it.

There have been at least several posts about how to get music from CD-i games (either because it's awesome, nostalgic, or just because).
I know there's a tool or two that can do exactly that, but I've found them lacking the ability to rip ALL audio.
In many cases the sound effects are not found.

So... time to take matters into my own hands!

With the availability of the Green Book, there's plenty of information to start from.
I'll be referencing this book a lot, for those who want to help out, want detailed info, or just check if I've correctly interpreted the info.
The format will be [chapter].[sections] or [chapter].[sections].F[figure number].
The example I'm using is shown using a hex editor with character set 'DOS/IBM-ASCII' as it has the most diverse char set (compared to 'ANSI', 'Macintosh', 'EBCDIC') and is purely for visual aid.

Reading a single (audio) sector

Data is stored in sectors on the disc. Each sector can be used for various purposes, but they generally have the following format:
Image
General sector format as detailed in II.4.1.1

With the header as:
Image
II.4.4.F11
Mode=2 must be used for CD-i tracks

And the subheader as:
Image
II.4.5.1.F12 (the coding information is described in IV.3.2.4)
The values between '[]' are when the bit or bits have the value [0,1] or [00,01,10,11] respectively.
So 'Bits/sample [4/8/R/R]' makes:

Code: Select all

00 -> 4
01 -> 8
10 -> Reserved
11 -> Reserved
This information is defined in IV.3.2.4 (yes I skipped ahead a bit; the coding information can vary in purpose).

The subheader is present twice for data-integrity reasons.
II.4.5.2 describes in detail the definition of values in the subheader.
The submode has a requirement for audio sectors (defined in IV.3.2.3.F3).

With the combined information in IV.3.1 and IV.3.3, the data block can be formatted as follows:
Image
Audio block format

Each sound group is formatted as described in IV.3.4.F6:
Image
Sound group format

From here on out, things get too small to show in figures.
A sound parameter block (16 bytes) holds 16 sound parameters of 1 byte each.
A single sound parameter consists of a Range and Filter value of 4 bits each in the following format:

Code: Select all

msb FFFFRRRR lsb
So like the following:

Code: Select all

Sound parameter: 01011100
F: 0101
R: 1100
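A minimal C++ sketch of that split, assuming the layout shown above (filter in the high nibble, range in the low nibble). The names `SoundParam` and `splitSoundParam` are made up for illustration and do not come from the Green Book.

```cpp
#include <cstdint>

// Hypothetical helper type: one sound parameter byte split into its two fields.
struct SoundParam {
    std::uint8_t filter;  // F: upper 4 bits
    std::uint8_t range;   // R: lower 4 bits
};

// Splits a sound parameter byte laid out as "msb FFFFRRRR lsb".
inline SoundParam splitSoundParam(std::uint8_t byte) {
    SoundParam p;
    p.filter = static_cast<std::uint8_t>(byte >> 4);
    p.range  = static_cast<std::uint8_t>(byte & 0x0F);
    return p;
}
```

For the example byte 01011100 (0x5C) this gives F = 5 and R = 12.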
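A summary of what the audio sector looks like (byte offsets within the 2352-byte sector; a suffix such as '18.0' is a bit index, counted from the most significant bit):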

Code: Select all

   0   ..   11   : sync
  12             : header -> Minutes
  13             : header -> Seconds
  14             : header -> Sectors
  15             : header -> Mode
  16             : subheader -> fileNr
  17             : subheader -> channelNr
  18.0           : subheader -> submode -> End of File
  18.1           : subheader -> submode -> Real-Time
  18.2           : subheader -> submode -> Form
  18.3           : subheader -> submode -> Trigger
  18.4           : subheader -> submode -> Data
  18.5           : subheader -> submode -> Audio
  18.6           : subheader -> submode -> Video
  18.7           : subheader -> submode -> End of Record
  19.0           : subheader -> codingData -> zero
  19.1           : subheader -> codingData -> emphasis
  19.2 ..   19.3 : subheader -> codingData -> bitsPerSample
  19.4 ..   19.5 : subheader -> codingData -> sampleFrequency
  19.6 ..   19.7 : subheader -> codingData -> monoStereo
  20   ..   23   : subheader repeated
  24.0 ..   24.3 : dataBlock -> Sound group 0 -> Sound parameter 0 -> F
  24.4 ..   24.7 : dataBlock -> Sound group 0 -> Sound parameter 0 -> R
  25   ..   39   : dataBlock -> Sound group 0 -> Sound parameter 1..15
  40   ..  151   : dataBlock -> Sound group 0 -> Sample audio
 152   .. 2327   : dataBlock -> Sound group 1..17
2328   .. 2347   : dataBlock -> Padding
2348   .. 2351   : dataBlock -> Quality control
So far so good
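To make the summary a bit more concrete, here is a rough C++ sketch that pulls the subheader fields out of a raw 2352-byte sector. The struct and function names are hypothetical, only a subset of the submode bits is extracted, and the mono/stereo coding (00 = mono, 01 = stereo) is an assumption not spelled out in the post above.

```cpp
#include <cstdint>

// Hypothetical struct holding the subheader fields from the summary above.
struct AudioSubheader {
    std::uint8_t fileNr;
    std::uint8_t channelNr;
    bool endOfFile;
    bool realTime;
    bool form2;
    bool audio;
    bool endOfRecord;
    bool emphasis;
    int  bitsPerSample;  // 4 or 8; 0 for a reserved code
    int  sampleRateHz;   // 37800 or 18900; 0 for a reserved code
    bool stereo;
};

// Parses bytes 16..19 of a raw sector. Bit ".0" in the summary is the most
// significant bit, so it corresponds to mask 0x80 here.
inline AudioSubheader parseSubheader(const std::uint8_t* sector) {
    const std::uint8_t submode = sector[18];
    const std::uint8_t coding  = sector[19];
    const int bitsCode = (coding >> 4) & 0x03;
    const int rateCode = (coding >> 2) & 0x03;

    AudioSubheader h;
    h.fileNr        = sector[16];
    h.channelNr     = sector[17];
    h.endOfFile     = (submode & 0x80) != 0;
    h.realTime      = (submode & 0x40) != 0;
    h.form2         = (submode & 0x20) != 0;
    h.audio         = (submode & 0x04) != 0;
    h.endOfRecord   = (submode & 0x01) != 0;
    h.emphasis      = (coding & 0x40) != 0;
    h.bitsPerSample = (bitsCode == 0) ? 4 : (bitsCode == 1) ? 8 : 0;
    h.sampleRateHz  = (rateCode == 0) ? 37800 : (rateCode == 1) ? 18900 : 0;
    h.stereo        = (coding & 0x03) == 1;  // assumption: 00 = mono, 01 = stereo
    return h;
}
```

Only the first copy of the subheader (bytes 16..19) is read here; comparing it with the repeat at bytes 20..23 would be an obvious sanity check.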

More to come as I work through my code to write more documentation (if one can call it that).


Re: Decoding CD-i audio files

Post by opt_fr_ » Thu Aug 17, 2017 5:58 pm

Welcome to the audio decoders club!

If you manage to find it all, that would be great. Indeed, I've seen some sectors flagged as audio for which the decoding comes out incorrect.
Could there be some obscure asm instructions behind that?


Re: Decoding CD-i audio files

Post by cdifan » Thu Aug 17, 2017 9:35 pm

What you are describing above is helpful for understanding, but it's identical (except for the omission of Level A) to the CD-XA standards. Decoding for that audio format most certainly exists out there, see e.g. MESS, FFmpeg, and many other sources (I could provide sample code but it is already out there and not hard to find).

The problem is not so much decoding a single sector of ADPCM data (actually, decoding a single 128-byte soundgroup, of which there are 18 in a 2304-byte audio sector). It is really *finding* that data on the disc.

In my experience, sectors on disc that are actually marked as ADPCM audio almost invariably really contain such audio. There is really no reason for a developer to use such sectors for other purposes; if you want to store arbitrary data without error correction, video sectors have 20 more bytes of room!

When audio sectors are included in properly delimited real-time channel-allocated streams and records, you're lucky and extracting it should be quite easy. You'll need a real-time file parser/splitter to correctly extract the basic data, but that is not really very hard: the CD-i player does it in hardware, and it's just dispatching on the file number, coding and channel bytes while correctly handling end-of-record and end-of-file bits. See FFGB Appendix VII.3 and the CD-RTOS SS_Play service (FFGB VII.2.2.3.2) for a description; FFGB stands for "Green Book" and was often used in technical documentation.
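As a very rough illustration of such a splitter, the C++ sketch below walks an image of raw 2352-byte sectors and collects, per channel number, the offsets of the audio sectors that belong to one file number. Everything here (the function name, the raw-image assumption, stopping at the first end-of-file bit) is a simplification for illustration; end-of-record handling and coding changes are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Collects the byte offsets of audio sectors per channel number for one file
// number. `image` is assumed to hold raw 2352-byte sectors back to back.
inline std::map<int, std::vector<std::size_t> >
splitAudioByChannel(const std::vector<std::uint8_t>& image, int fileNr) {
    const std::size_t kSectorSize = 2352;
    std::map<int, std::vector<std::size_t> > channels;
    for (std::size_t off = 0; off + kSectorSize <= image.size(); off += kSectorSize) {
        const std::uint8_t* s = &image[off];
        // Skip sectors that are not Mode 2 or belong to another file.
        if (s[15] != 2 || s[16] != fileNr) continue;
        // Submode Audio bit: remember the sector under its channel number.
        if (s[18] & 0x04) channels[s[17]].push_back(off);
        // Submode End of File bit: stop after the last sector of the file.
        if (s[18] & 0x80) break;
    }
    return channels;
}
```

Each collected offset can then be handed to a sector decoder, one channel at a time.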

In some cases, developers may also have been lazy and included complete CD-I IFF audio files (typically for sound effects) within audio or data sectors. Such files should also be easy to find, as they have a very specific header format (the CD-I IFF specification can be found on ICDIA in the Authoring Manuals section).

Finding other audio data can be much more difficult. If you're lucky it is also in regular ADPCM format, but it will often be embedded with other data and/or compressed somewhat. In either case, lots of sound effects might be concatenated in a single stream, and the left/right channels of a stereo fragment might even contain different (mono) effects. If embedded with other data, there is no telling what the format may be; it will depend on the actual title. The same goes for even finding that data. In many cases, reverse engineering the format requires disassembling the code that uses it (if the data is in "correct" ADPCM form, just not delimited in a standard way, it should be possible to find it by looking for certain patterns).

There are various reasons for compression and embedding; it is possible to save a few bytes from each soundgroup anyway because the sound parameter bytes are duplicated (level B, C) or quadruplicated (level A). See FFGB IV.3.5 and IV.3.6. There are also performance optimizations that make mixing easier or faster.
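The duplicated parameter bytes also suggest one possible pattern to look for when hunting for undelimited ADPCM data. The C++ sketch below is only a heuristic under that assumption, for the Level B/C layout where parameter bytes 0..3 repeat in 4..7 and 8..11 repeat in 12..15. The function name is hypothetical, and the check will produce false positives (for example on runs of zero bytes) and will miss data whose duplicates were stripped.

```cpp
#include <cstdint>

// Heuristic check on one 128-byte candidate sound group (Level B/C layout):
// returns true when both halves of the parameter block carry the expected
// duplicates. This does not prove that the block is audio.
inline bool looksLikeSoundGroupBC(const std::uint8_t* g) {
    for (int i = 0; i < 4; ++i) {
        if (g[i] != g[i + 4]) return false;        // SU0..3 parameters repeated
        if (g[8 + i] != g[12 + i]) return false;   // SU4..7 parameters repeated
    }
    return true;
}
```

Requiring many consecutive 128-byte blocks to pass before trusting a match would cut down on false hits.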

In CD-i, mixing audio sources (usually effects with music) must be done in software; just "playing" audio from memory will very noticeably interrupt audio from disc (at least if you have a base-case title; with FMV entering the picture there are other options). In effect, a soundgroup stream must be constructed that contains the audio from all the mixed sources. The mixing is complex, in the best case involving copying specific bytes from each soundgroup from source to destination (if mixing at the ADPCM level), but it can also involve half-byte masking and shifting. CD-RTOS contains services for such mixing, SD_SMix and SD_MMix (see FFGB VII.2.2.4.2), but performance-wise they can be a bit problematic. The sound mixing library that I did for CD-i somewhere in the nineties could be called with much smaller data chunks (e.g. single soundgroups) so that the CPU cost could be amortized out better.

Technically it is also possible to set the filter and range parameters of a soundgroup so that you can actually play 8-bit PCM (although somewhat weirdly interleaved), which makes it possible to mix more than two sources (even stereo) easily. I don't know if any discs out there actually use this technique; we developed a demo disc for it but never used it in a production title. Audio in this format would be very hard to find.

Edit: There are 18 soundgroups of 128 bytes in an audio sector.
Last edited by cdifan on Fri Aug 18, 2017 4:19 pm, edited 1 time in total.


Re: Decoding CD-i audio files

Post by Shikotei » Fri Aug 18, 2017 11:50 am

cdifan wrote:
Thu Aug 17, 2017 9:35 pm
actually, decoding a single 18-byte soundgroup, of which there are 128 in a 2304-byte audio sector
I think you mean "a single 128-byte soundgroup, of which there are 18 in a 2304-byte audio sector". Ye old switcharoo, but inconsistent data is not something I'd let go uncorrected.

Finding the correct block of data could be difficult, especially when you want this to be automatic. But this is not the right moment to be worrying about that.
I want to have a working decoder first, so that any block I throw into it comes out properly if it's audio.
If, as a side effect of this project, the information in the Green Book about audio becomes easier to read and understand, then that's a good thing.

Like I said, I know there are decoder tools out there that can decode a lot of the CD-i audio files, but not everything. The format may be the same as what I've been describing, but the tool either does not recognize it, or it does not find it.
There have also been some cases where it decoded the file incorrectly, making it all choppy or slowed down by a factor of 2.
As the sayings go: "if you think you can do better, show me" or "if you want something done properly, do it yourself".

Lastly, I'm doing this because I'm interested in how the CD-i audio formats actually work. As a challenge.

As far as speed is concerned, I really don't care that much. The program so far takes 3ms to read, format, process, and output a sector. And the language is totally inappropriate for this kind of work (PHP)!

My target format is PCM wave format (that has no compression) so it can be used by a whole ton of programs that convert it into MP3.
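For reference, a minimal C++ sketch of the canonical 44-byte header of an uncompressed 16-bit PCM WAV file. This is not the code of the PHP program mentioned above; the function names are invented here, and the sketch assumes the decoded samples are written afterwards as little-endian 16-bit values, interleaved left/right for stereo.

```cpp
#include <cstdint>
#include <vector>

// Appends the low `bytes` bytes of `value` to `out` in little-endian order.
inline void putLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Builds a 44-byte RIFF/WAVE header for 16-bit PCM data of `dataBytes` bytes.
inline std::vector<std::uint8_t> makeWavHeader(std::uint32_t sampleRate,
                                               std::uint16_t channels,
                                               std::uint32_t dataBytes) {
    const std::uint32_t bitsPerSample = 16;
    const std::uint32_t blockAlign = channels * bitsPerSample / 8;
    const char riff[] = "RIFF";
    const char waveFmt[] = "WAVEfmt ";
    const char data[] = "data";

    std::vector<std::uint8_t> h;
    h.insert(h.end(), riff, riff + 4);
    putLE(h, 36 + dataBytes, 4);             // RIFF chunk size
    h.insert(h.end(), waveFmt, waveFmt + 8);
    putLE(h, 16, 4);                         // fmt chunk size
    putLE(h, 1, 2);                          // format tag 1 = PCM
    putLE(h, channels, 2);
    putLE(h, sampleRate, 4);
    putLE(h, sampleRate * blockAlign, 4);    // bytes per second
    putLE(h, blockAlign, 2);
    putLE(h, bitsPerSample, 2);
    h.insert(h.end(), data, data + 4);
    putLE(h, dataBytes, 4);                  // data chunk size
    return h;
}
```

For Level B stereo the call would be something like `makeWavHeader(37800, 2, totalSampleBytes)`.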
cdifan wrote:
Thu Aug 17, 2017 9:35 pm
the CD-I IFF specification can be found on ICDIA in the Authoring Manuals section
Well, look who just got a backup plan!

===============

Used abbreviations:
- SG: Sound Group
- SP: Sound Parameter
- SU: Sound Unit

IV.2.F2 mentions the various sound quality levels and combined with the information in IV.3.2.4 about the 'Coding Information' in the subheader we can determine which sound quality level is present in the sector.

Coding information:

Code: Select all

- Bits/sample [4/8/R/R]
- Sample frequency [37.8kHz/18.9kHz/R/R]
Sound quality levels:

Code: Select all

- ADPCM Level A, 37.8 KHz,  8 bit
- ADPCM Level B, 37.8 KHz,  4 bit
- ADPCM Level C, 18.9 KHz,  4 bit
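Combining the two lists, the level can be looked up from the decoded coding information. A small C++ sketch (the function name is made up, and the inputs are assumed to be the already decoded bits/sample and sample frequency):

```cpp
// Returns 'A', 'B' or 'C' for the combinations listed above, or '?' for
// anything else (reserved or unexpected values).
inline char soundQualityLevel(int bitsPerSample, int sampleRateHz) {
    if (bitsPerSample == 8 && sampleRateHz == 37800) return 'A';
    if (bitsPerSample == 4 && sampleRateHz == 37800) return 'B';
    if (bitsPerSample == 4 && sampleRateHz == 18900) return 'C';
    return '?';
}
```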
Level A sound group format
In IV.3.5 there's a description of Level A sound, and more details about what each SG contains.
There's talk about multiple sound units (SU) and which unit uses which SPs.

As far as I could determine:
Level A contains 4 SUs, each of which consists of 4 identical SPs and 28 sound data (SD) bytes.
The SPs are not sequential, but 1 every 4 bytes.
The data bytes are also 1 every 4.

The samples are sequential.

This is confirmed by IV.3.5.F8 and IV.3.5.F9. Yes these figures say "Levels B and C", but are wrongly named so. IV.3.6.F10 and IV.3.6.F11 confirm this (they're also named "Levels B and C").

Image
Sound group format of Level A

Mono sound has sequential SUs, while stereo has 1 SU every 2 per channel.
Decoding these units happens per stored sequential pair (SU0 and SU1), not per channel pair (SU0 and SU2).
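In code, the Level A interleaving described above could be indexed like this. It is a C++ sketch with invented helper names, assuming a 128-byte sound group whose first 16 bytes are the parameter block:

```cpp
#include <cstdint>

// Sound parameter byte of sound unit `su` (0..3) in a Level A sound group.
// Each parameter is stored four times (at su, su+4, su+8, su+12); this
// reads the first copy.
inline std::uint8_t levelAParam(const std::uint8_t* group, int su) {
    return group[su];
}

// Raw (still unsigned) data byte `k` (0..27) of sound unit `su` (0..3).
// The data bytes of the four units are interleaved, one every 4 bytes.
inline std::uint8_t levelAData(const std::uint8_t* group, int su, int k) {
    return group[16 + 4 * k + su];
}
```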

Level B and C sound group formats

In IV.3.6 there's a description of Level B and C sound, and more details about what each SG contains.
There's talk about multiple sound units (SU) and which unit uses which SPs.

As far as I could determine:
Level B and C contain 8 SUs, each of which consists of 2 identical SPs and 28 sound data (SD) nibbles (4 bits each).
The SPs are not sequential:
- the first 8 bytes contain 1 every 4 bytes of SU 0..3. (01230123)
- the second 8 bytes contain 1 every 4 bytes of SU 4..7. (45674567)

The data nibbles are 1 every 8 nibbles and in the following order:
- 10 32 54 76 etc..
Each byte contains 2 nibbles, and the bytes are described as containing the SD for SU "1 and 0, 3 and 2, 5 and 4, 7 and 6". So the least significant nibble belongs to the even SU (e.g. SU0) and the most significant nibble to the odd SU (e.g. SU1).

The samples are sequential.

Image
Sound group format of Level B and C
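The same layout expressed as index arithmetic, as a C++ sketch with invented helper names. It assumes a 128-byte sound group with the 16 parameter bytes first, and returns the nibble as an unsigned value 0..15:

```cpp
#include <cstdint>

// Sound parameter byte of sound unit `su` (0..7) in a Level B/C sound group.
// SU0..3 sit in bytes 0..3 (repeated in 4..7), SU4..7 in bytes 8..11
// (repeated in 12..15); this reads the first copy.
inline std::uint8_t levelBCParam(const std::uint8_t* group, int su) {
    return group[su < 4 ? su : su + 4];
}

// Raw (still unsigned) data nibble `k` (0..27) of sound unit `su` (0..7).
// Two units share a byte: the even unit uses the low nibble, the odd unit
// the high nibble, and each unit's next nibble is 4 bytes further on.
inline std::uint8_t levelBCData(const std::uint8_t* group, int su, int k) {
    const std::uint8_t byte = group[16 + 4 * k + su / 2];
    return (su % 2 == 0) ? static_cast<std::uint8_t>(byte & 0x0F)
                         : static_cast<std::uint8_t>(byte >> 4);
}
```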

I think that's all there is to the formatting of an audio sector. The only other format that the book mentions that could be in an audio sector is CD-DA 16bit at 44.1kHz (IV.2.F2).
Next up: ADPCM
Last edited by Shikotei on Fri Aug 18, 2017 5:55 pm, edited 1 time in total.


Re: Decoding CD-i audio files

Post by cdifan » Fri Aug 18, 2017 4:20 pm

Shikotei wrote:
Fri Aug 18, 2017 11:50 am
cdifan wrote:
Thu Aug 17, 2017 9:35 pm
actually, decoding a single 18-byte soundgroup, of which there are 128 in a 2304-byte audio sector
I think you mean "a single 128-byte soundgroup, of which there are 18 in a 2304-byte audio sector". Ye old switcharoo, but inconsistent data is not something I'd let go uncorrected.
You are right of course; I was confused with sound units. I fixed my previous post.


Re: Decoding CD-i audio files

Post by cdifan » Fri Aug 18, 2017 4:21 pm

Shikotei wrote:
Fri Aug 18, 2017 11:50 am
The only format that the book mentions that could be in an audio sector is CD-DA 16bit at 44.1kHz (IV.2.F2).
You probably mean: "The only OTHER format ..."

And even that's not really true; MPEG Audio sectors are described in IX.5.2.


Re: Decoding CD-i audio files

Post by Shikotei » Fri Aug 18, 2017 6:02 pm

cdifan wrote:
Fri Aug 18, 2017 4:21 pm
You probably mean: "The only OTHER format ..."

And even that's not really true; MPEG Audio sectors are described in IX.5.2.
Yes, I did mean 'other'.. should proofread huge posts before copy-pasting them from NPP.

I haven't read the whole book (1000 pages or so), so there could definitely be things in there that are not mentioned everywhere they are needed. I referenced the table of formats in IV.2.F2 because that's what gave me the notion that it was the full list.
The things I learn.


Re: Decoding CD-i audio files

Post by cdifan » Sat Aug 19, 2017 12:11 pm

The Green Book makes for excellent bedtime reading, though :)


Re: Decoding CD-i audio files

Post by Shikotei » Mon Aug 21, 2017 6:32 pm

Perhaps it does, but I prefer something a little lighter. Like Artemis Fowl, The Eyes Of The Overworld, or Overlord.

===============
Used abbreviations:
- SP.F: Sound Parameter.Filter
- SP.R: Sound Parameter.Range
- SD: Sample Data of a SU (there are 28 in a Level B SU)

IV.4.2 describes the ADPCM encoder, but decoding ADPCM does not require understanding that process and reversing it, so there's no need to go into details about it here.

IV.4.3 has interesting information though: it describes the values that the SP.F and SP.R represent.
The values of SP.F are shown in IV.4.3.F16, while SP.R is shown in IV.4.3.F17.
Apparently, SP.F only ranges between 0 and 3, so it does not use the full 4 bits it consists of.

Code: Select all

SP.F |   K0     |    K1
-----+----------+-----------
  0  | 0        |  0
  1  | 0.9375   |  0
  2  | 1.796875 | -0.8125
  3  | 1.53125  | -0.859375
Gain values for SP.F

The full decoder schematic in IV.5.1 is interesting in itself, but not needed for the purpose of decoding ADPCM.
The input data currently used is a 'CD-i sector', not 'CD-DA data'.

Now in IV.5.3 is where the magic is described and no matter how many words I use, an image is so much clearer:

Image
Schematic view of the ADPCM decoder Copyright by Philips Consumer Electronics B.V., May 1994

The input data indicates Level A(8bit) or B/C(4bit) audio source.
Gain G has the following function:
Level A: output = input * power(2,8-SP.R)
Level B/C: output = input * power(2,12-SP.R)

The gains of K0 and K1 are as described in IV.4.3.F16.

The T1 sample delay will cause a delay of 1 sample between the decoder's output and the input of K0 and a delay of 2 samples between the decoder's output and K1.

If by any means the output overflows the 16bit limit, it is clipped at the maximum value.
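Put together, one step of the decoder as described so far might look like the C++ sketch below. It is a floating-point illustration rather than a reference implementation: the function name is invented, the input sample is assumed to be a signed value already, and clipping is assumed to apply at both ends of the 16-bit range.

```cpp
#include <cmath>

// Filter gains K0 and K1 indexed by SP.F, as listed in the table above.
static const double kK0[4] = {0.0, 0.9375, 1.796875, 1.53125};
static const double kK1[4] = {0.0, 0.0, -0.8125, -0.859375};

// One decoder step. `sample` is assumed to be a signed value, `prev1` and
// `prev2` are the outputs delayed by one and two samples, and `shiftBase`
// is 12 for Level B/C or 8 for Level A.
inline int decodeStep(int sample, int range, int filter,
                      int prev1, int prev2, int shiftBase) {
    const double gained = sample * std::pow(2.0, shiftBase - range);   // gain G
    const double value = gained + kK0[filter] * prev1 + kK1[filter] * prev2;
    const long rounded = std::lround(value);
    if (rounded > 32767) return 32767;     // clip at the 16-bit maximum
    if (rounded < -32768) return -32768;   // assumption: clip at the minimum too
    return static_cast<int>(rounded);
}
```

The caller is expected to feed each returned value back in as `prev1` for the next sample and shift the old `prev1` into `prev2`.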

===============

Here's where I kind of lost the 100% certainty.

I'm assuming the system at rest (before using it for the first sector), all input, delayed samples, and output are '0'.
Also assumed is that there's no reset between various decoding tasks, so decoding one SU after another will have the K0 and K1 input samples from the previous SU.

The order of input data is per SD per SU per SG so (Level B/C):
- SG0:SU0:SD 0
- ..
- SG0:SU0:SD 27
- SG0:SU1:SD 0
- ..
- SG0:SU7:SD 27
- SG1:SU0:SD 0
- ..
- SG17:SU7:SD 27

The resulting dataflow would look something like this:
Image
System at rest, followed by first 4 SD of a single SU of a single SG
R0 to R3 are the to-be-quantized results of the decoder (not to be confused with SP.R0)

In a few simple formulae, I believe the output is calculated as follows:

Code: Select all

outG = SU[n] * power(2,12-SP.R)
outK0 = inQ[n-1] * gainK0[SP.F[n]]
outK1 = inQ[n-2] * gainK1[SP.F[n]]
inQ[n] = outG + outK0 + outK1
output = round(min(inQ, power(2,16)-1))
Where:
- 'n' is [0..7]
- inQ[0] is zero.
- inQ[-1] is zero.
- inQ[-2] is zero.

For a grand total of (18SGs * 8SUs * 28SDs) 4032 samples to be decoded per sector (Level B/C).
For Level B (37.8 kHz samplerate) that would be a length of (4032/37800) 0.1067 seconds.
For Level C (18.9 kHz samplerate) that would be a length of (4032/18900) 0.2133 seconds.
Half that duration when it's stereo. And half the duration of Level B for Level A.
Not very long, but it should be enough to determine what kind of audio is stored. Many sound effects are short (less than a second) and if you know the game where it's from, it'll be easy to know what you hear.
So, it could be possible that decoding a single sector would yield recognizable sound.
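The arithmetic above as a small C++ sketch, so the numbers can be rechecked for each level. The function names are made up; the constants (18 sound groups, 8 or 4 sound units, 28 samples per unit) are the ones used in this thread.

```cpp
// Samples stored in one audio sector: 18 sound groups, each with
// 8 sound units (4-bit, Level B/C) or 4 sound units (8-bit, Level A)
// of 28 samples.
inline int samplesPerSector(int bitsPerSample) {
    const int unitsPerGroup = (bitsPerSample == 8) ? 4 : 8;
    return 18 * unitsPerGroup * 28;
}

// Playing time of one sector in seconds. For stereo the samples are split
// over two channels, so the duration halves.
inline double sectorSeconds(int bitsPerSample, int sampleRateHz, bool stereo) {
    const int perChannel = samplesPerSector(bitsPerSample) / (stereo ? 2 : 1);
    return static_cast<double>(perChannel) / sampleRateHz;
}
```

Level A comes out at 2016 samples per sector, which is about 0.053 seconds in mono.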

Mono audio is encoded sequentially, while the left and right channels of stereo are encoded separately in an SU of their own.
This is suggested by IV.3.5 and IV.3.6:
IV.3.6 wrote:In mono, the sound units are encoded sequentially i.e. SU0, SU1, SU2 ... SU7.
In stereo, the left signal is given by SU0, SU2, SU4, and SU6 and the right signal is given by SU1, SU3, SU5 and SU7.
The sound units are encoded in sequential pairs i.e. SU[n] and SU[n+1] are encoded together where n = 0, 2, 4 and 6.


Re: Decoding CD-i audio files

Post by opt_fr_ » Tue Aug 22, 2017 8:36 pm

Given that you are close to decoding your first ADPCM audio samples (keep going, you're close!), I would like to add this here (as it wasn't very clear in the Green Book from my point of view):
  • Decoding ADPCM with 1 audio channel (mono) is clear (although you have to translate this scary circuit diagram into code).
    But for stereo, the decoder must be duplicated: one decoding (and thus memorizing) values for the left channel, and the other one for the right channel.
  • The values of SP.F are fixed binary values, that is, sums of negative powers of 2.
    Ex: K0[1] = 0.9375 in decimal and 0.1111 in binary
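One way to see this: every gain in the table is an exact multiple of 1/64, so the whole table can be held as small integers. The C++ sketch below uses that scale purely as an illustration (the names are invented, and a real decoder may well prefer another power-of-two scale).

```cpp
// Filter gains expressed as integer multiples of 1/64 (2^-6).
// Row index is SP.F; column 0 is K0, column 1 is K1.
static const int kGain64[4][2] = {
    {0, 0},
    {60, 0},      // 60/64  = 0.9375
    {115, -52},   // 115/64 = 1.796875, -52/64 = -0.8125
    {98, -55},    // 98/64  = 1.53125,  -55/64 = -0.859375
};

// Converts a table entry back to the decimal value from the Green Book table.
inline double gainAsDouble(int filter, int k) {
    return kGain64[filter][k] / 64.0;
}
```

Because the values are exact binary fractions, integer arithmetic avoids any rounding differences between languages.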


Re: Decoding CD-i audio files

Post by cdifan » Tue Aug 22, 2017 9:57 pm

Okay, I think you've almost got it so giving you actual code should not be considered cheating.

Here's C++ source from CD-i Emulator for playing a 4-bit sound unit:

Code: Select all

/** Plays 4-bit sound unit.
 *
 * @param pChannel Audio channel.
 * @param bParam Sound parameter byte.
 * @param pbData Sound unit data.
 * @param iBit Sound data bit index (either 0 or 4).
 */
void CCdiAudio::PlayUnit4(SCdiAudioChannel *pChannel, BYTE68K bParam, const BYTE68K *pbData, int iBit)
{
	// Get gain shift.
	int cShift = 12 - BITS(bParam, 0, 4);

	// Get filter coefficients.
	SWORD68K *pwFilter = m_awFilter[BITS(bParam, 4, 2)];
	SWORD68K wFilter0 = pwFilter[0];
	SWORD68K wFilter1 = pwFilter[1];
	
	//* Gets previous samples.
	SWORD68K wSample1 = pChannel->c_wSample1;
	SWORD68K wSample2 = pChannel->c_wSample2;

	//* Gets output sample pointer.
	SWORD68K *pwSample = pChannel->c_awSamples + pChannel->c_iSample;
	ASSERT(pChannel->c_iSample + 28 <= CHANNEL_SAMPLES);

	for (int i = 0; i < 28; i++)
	{
		// Get sample.
		SLONG68K lSample = BITS(*pbData, iBit, 4);
		if (lSample >= 8)
			lSample -= 16;
		pbData += 4;

		// Apply gain.
		lSample <<= cShift;

		// Apply filter.
		lSample += (wFilter0 * wSample1 + wFilter1 * wSample2 + 0x80) / 0x100;

		// Clamp sample.
		SWORD68K wSample = LoWord(lSample);
		if (lSample < SWORD68K_MIN)
			wSample = SWORD68K_MIN;
		else if (lSample > SWORD68K_MAX)
			wSample = SWORD68K_MAX;
		
		// Remember sample.
		wSample2 = wSample1;
		wSample1 = wSample;

		*pwSample++ = wSample;
	}

	// Save previous samples.
	pChannel->c_wSample1 = wSample1;
	pChannel->c_wSample2 = wSample2;

	// Update sample index.
	pChannel->c_iSample += 28;
}
And here's source for playing an 8-bit sound unit:

Code: Select all

/** Plays 8-bit sound unit.
 *
 * @param pChannel Audio channel.
 * @param bParam Sound parameter byte.
 * @param pbData Sound unit data.
 */
void CCdiAudio::PlayUnit8(SCdiAudioChannel *pChannel, BYTE68K bParam, const BYTE68K *pbData)
{
	// Get gain shift.
	int cShift = 8 - BITS(bParam, 0, 4);

	// Get filter coefficients.
	SWORD68K *pwFilter = m_awFilter[BITS(bParam, 4, 2)];
	SWORD68K wFilter0 = pwFilter[0];
	SWORD68K wFilter1 = pwFilter[1];
	
	//* Gets previous samples.
	SWORD68K wSample1 = pChannel->c_wSample1;
	SWORD68K wSample2 = pChannel->c_wSample2;

	//* Gets output sample pointer.
	SWORD68K *pwSample = pChannel->c_awSamples + pChannel->c_iSample;
	ASSERT(pChannel->c_iSample + 28 <= CHANNEL_SAMPLES);

	for (int i = 0; i < 28; i++)
	{
		// Get sample.
		SLONG68K lSample = *pbData;
		if (lSample >= 128)
			lSample -= 256;
		pbData += 4;

		// Apply gain.
		lSample <<= cShift;

		// Apply filter.
		lSample += (wFilter0 * wSample1 + wFilter1 * wSample2 + 0x80) / 0x100;

		// Clamp sample.
		SWORD68K wSample = LoWord(lSample);
		if (lSample < SWORD68K_MIN)
			wSample = SWORD68K_MIN;
		else if (lSample > SWORD68K_MAX)
			wSample = SWORD68K_MAX;
		
		// Remember sample.
		wSample2 = wSample1;
		wSample1 = wSample;

		*pwSample++ = wSample;
	}

	// Save previous samples.
	pChannel->c_wSample1 = wSample1;
	pChannel->c_wSample2 = wSample2;

	// Update sample index.
	pChannel->c_iSample += 28;
}
As you can see, the two delayed samples are kept in the channel data structure between sound units. I'm almost 100% sure that this is correct, but the Green Book does not seem to say anything about it.

Here's the code that plays a single sound group, calling on the above functions:

Code: Select all

 
/** Plays sound group.
 *
 * @param bCoding Audio coding byte.
 * @param pbData Sound group data.
 */
void CCdiAudio::PlayGroup(BYTE68K bCoding, const BYTE68K *pbData)
{
#define pChannel0	(m_aChannel + 0)
#define pChannel1	(m_aChannel + 1)

	switch (bCoding & (CODING_BITS | CODING_MODE))
	{
	case CODING_BITS_4 | CODING_MODE_MONO:
		PlayUnit4(pChannel0, pbData[0], pbData + 16, 0);
		PlayUnit4(pChannel0, pbData[1], pbData + 16, 4);
		PlayUnit4(pChannel0, pbData[2], pbData + 17, 0);
		PlayUnit4(pChannel0, pbData[3], pbData + 17, 4);
		PlayUnit4(pChannel0, pbData[8], pbData + 18, 0);
		PlayUnit4(pChannel0, pbData[9], pbData + 18, 4);
		PlayUnit4(pChannel0, pbData[10], pbData + 19, 0);
		PlayUnit4(pChannel0, pbData[11], pbData + 19, 4);
		break;

	case CODING_BITS_4 | CODING_MODE_STEREO:
		PlayUnit4(pChannel0, pbData[0], pbData + 16, 0);
		PlayUnit4(pChannel1, pbData[1], pbData + 16, 4);
		PlayUnit4(pChannel0, pbData[2], pbData + 17, 0);
		PlayUnit4(pChannel1, pbData[3], pbData + 17, 4);
		PlayUnit4(pChannel0, pbData[8], pbData + 18, 0);
		PlayUnit4(pChannel1, pbData[9], pbData + 18, 4);
		PlayUnit4(pChannel0, pbData[10], pbData + 19, 0);
		PlayUnit4(pChannel1, pbData[11], pbData + 19, 4);
		break;

	case CODING_BITS_8 | CODING_MODE_MONO:
		PlayUnit8(pChannel0, pbData[0], pbData + 16);
		PlayUnit8(pChannel0, pbData[1], pbData + 17);
		PlayUnit8(pChannel0, pbData[2], pbData + 18);
		PlayUnit8(pChannel0, pbData[3], pbData + 19);
		break;

	case CODING_BITS_8 | CODING_MODE_STEREO:
		PlayUnit8(pChannel0, pbData[0], pbData + 16);
		PlayUnit8(pChannel1, pbData[1], pbData + 17);
		PlayUnit8(pChannel0, pbData[2], pbData + 18);
		PlayUnit8(pChannel1, pbData[3], pbData + 19);
		break;

	default:
		TraceAudio("BAD AUDIO CODING $%02x", bCoding);
		break;
	}
}
My current implementation zeroes the delayed samples at the start of an audio sector, but I am not sure that is right. It might have to be at the start of a play, or maybe even never. If somebody can find chapter and verse to either prove or disprove that, it would be great.

OTOH, if you play a single sound unit of silence the delayed samples will be zero anyway.

Oh, and of course you need the values for the filter parameters:

Code: Select all

 
//* Filter coefficients.
SWORD68K CCdiAudio::m_awFilter[16][2] =
{
	//* Filter 0.
	{ 0x000, 0x000 },

	//* Filter 1.
	{ 0x0F0, 0x000 },

	//* Filter 2.
	{ 0x1CC, -0x0D0 },

	//* Filter 3.
	{ 0x188, -0x0DC }
};
The values correspond to 0x100 * the base 2 rational numbers in IV.4.3.F16.

The values of the other symbolic constants (CODING_BITS_* and CODING_MODE_*) are easily found in the Green Book.

This is loosely based on code I found "out there" (though the function names, comments and variable names are all mine), and it seems to match up with the Green Book description.

But as you might know, CD-i Emulator is not perfect in decoding audio. It crackles and hisses a bit. It could be that an additional postfilter is needed or some little detail that is wrong...

According to some technical notes that I've seen, the audio is also supposed to be muted for a bit after a play starts; I don't know for how long. CD-i Emulator does not currently do that.


Re: Decoding CD-i audio files

Post by Shikotei » Wed Aug 23, 2017 3:53 pm

opt_fr_ wrote:
Tue Aug 22, 2017 8:36 pm
  • The values of SP.F are fixed binary values, that is a sum of negative power of 2.
  • Decoding adpcm with 1 audio channel (mono) is clear (although you have to translate this scarying circuit diagram into code).
    But for stereo, the decoder must be duplicated : one decoding (and thus, memorizing) values for the left channel, and the other one for the right channel.
The SP.F were fixed values, just within the SU. It was written down wrongly (as 'SP.F[n]', instead of 'SP.F'), but used correctly in my program. This is rectified in the second half of this post. Thank you for pointing this out!

About the second decoder; I suspected as much, but could not find any evidence that made it clear. Using one decoder per channel (left/mono and right) would indeed make more sense.
I've adjusted my program to use a second decoder for the additional channel when it's stereo.
opt_fr_ wrote:
Tue Aug 22, 2017 8:36 pm
  • output = round(min(inQ, power(2,16)-1))
    I had trouble with this line, because I didn't know the programming language I used for this is using a particular rounding method that is unappropriated for that purpose (banker round, lol).
The line works as a clipping limiter:
  • max(value1, value2): returns highest value
  • min(value1, value2): returns lowest value
  • power(value1,value2): returns value1 to the power of value2 (the actual PHP function name is 'pow')
  • round(value): returns the nearest integer of value. Can round either way (up and down)
But since my program uses integers, not bytes, words, or short words... I've had to adjust that formula to:
output = max(1-power(2,15), min(inQ, power(2,15)-1))
output = max(1-32768 , min(inQ, 32768 -1))
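For comparison, the same clipping limiter as a C++ sketch (the function name is made up). One detail worth noting: the formula above yields a lower bound of -32767, while the signed 16-bit two's complement range runs from -32768 to 32767. This sketch clamps to the full range, which is an assumption about what the hardware does.

```cpp
#include <algorithm>
#include <cstdint>

// Clamps a wider intermediate value to the signed 16-bit range.
// The two's complement range is asymmetric: -32768 .. 32767.
inline std::int16_t clampToInt16(long value) {
    const long lo = -32768;
    const long hi = 32767;
    return static_cast<std::int16_t>(std::max(lo, std::min(value, hi)));
}
```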
cdifan wrote:
Tue Aug 22, 2017 9:57 pm
Here's C++ source from CD-i Emulator for playing a 4-bit sound unit:

Code: Select all

// Get sample.
SLONG68K lSample = BITS(*pbData, iBit, 4);
if (lSample >= 8)
    lSample -= 16;
And here's source for playing an 8-bit sound unit:

Code: Select all

// Get sample.
SLONG68K lSample = *pbData;
if (lSample >= 128)
    lSample -= 256;
This was exactly what I was missing. I had not accounted for signed data. Yes, it's a simple mistake, but very crucial!

With a little bit more tinkering I now have an operational decoder!
Turns out those were the only 2 things (second decoder & signed data) that weren't correct.

The test-audio I used was the entire file 'endmus.rtf' from The Apprentice (4802 sectors). Channel 1 (554 sectors) contains the boss-battle music of the Medieval Tower.
cdifan wrote:
Tue Aug 22, 2017 9:57 pm
My current implementation zeroes the delayed samples at the start of an audio sector, but I am not sure that is right. It might have to be at the start of a play, or maybe even never.
...
But as you might know, CD-i Emulator is not perfect in decoding audio. It crackles and hisses a bit. It could be that an additional postfilter is needed or some little detail that is wrong...

According to some technical notes that I've seen the audio is also supposed to be muted for a bit after a play starts, I don't know for how long. CD-i Emulator does not currently do that.
Perhaps these three are related?
I'd assume during a real-time file that there's no time to clean the delayed samples (because it's a real-time file) and the sectors are supposed to be decoded back-to-back.
Now that I know the nature of the decoder, I think you'd improve the audio quality if you were not to zero the delayed samples. Instead of getting startup-behavior at each sector, you'd only get it at the start of emulation (or playing a file).

The two delayed samples (combined with the filter gains) effectively behave as 1st and 2nd order low/band/high pass filters but (because of the delayed sample zeroing) will always cause the first sample to pass through unfiltered.
This behavior can cause pops and crackles, as the audio signal is forced from whatever amplitude it had to the unfiltered sample value, before recovering after perhaps 2 or 3 samples.
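A small C++ sketch of the effect, using integer arithmetic with the filter gains scaled by 0x100. The struct and function names are hypothetical and this is not CD-i Emulator code. The same input gives a different output depending on whether the history was zeroed in between.

```cpp
// Hypothetical per-channel decoder state: the two delayed output samples.
struct ChannelState {
    int prev1;
    int prev2;
    ChannelState() : prev1(0), prev2(0) {}
    void reset() { prev1 = 0; prev2 = 0; }
};

// Decodes one signed 4-bit sample with filter gains scaled by 0x100 and
// updates the channel history. Assumes range <= 12.
inline int decodeSample4(ChannelState& st, int sample, int range, int filter) {
    static const int k[4][2] = {{0, 0}, {0xF0, 0}, {0x1CC, -0xD0}, {0x188, -0xDC}};
    int value = sample * (1 << (12 - range));   // apply gain
    value += (k[filter][0] * st.prev1 + k[filter][1] * st.prev2 + 0x80) / 0x100;
    if (value > 32767) value = 32767;           // clamp to 16 bits
    if (value < -32768) value = -32768;
    st.prev2 = st.prev1;                        // shift the history
    st.prev1 = value;
    return value;
}
```

Decoding sample 7 (range 0, filter 0) and then sample 0 with filter 1 gives 28672 followed by 26880 when the history is kept, but 28672 followed by 0 when `reset()` is called in between, which is the kind of jump described above.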

In any case, since you probably know which audio crackles and hisses, it should be a simple task of trying it out.

I'll take some time to find a clean audio file (and section) to test this on. I believe the Abandoned Tower music has a clean start (just a single instrument's tone).


===============
A rectification and replacement of the uncertainties of the previous post:
With help from opt_fr_ and cdifan

I'm assuming that with the system at rest (before it is used for the first sector), all inputs, delayed samples, and outputs are '0'.
I'm also assuming that there's no reset between decoding tasks, so when one SU is decoded after another, the K0 and K1 input samples carry over from the previous SU.
The same goes for sound groups and whole sectors: no reset after decoding either of them.

The input data is ordered by SG, then by SU within each SG, then by SD within each SU, so:

Code: Select all

Level A:
- SG0:SU0:SD 0
- ..
- SG0:SU0:SD 27
- SG0:SU1:SD 0
- ..
- SG0:SU3:SD 27
- SG1:SU0:SD 0
- ..
- SG17:SU3:SD 27

Level B/C:
- SG0:SU0:SD 0
- ..
- SG0:SU0:SD 27
- SG0:SU1:SD 0
- ..
- SG0:SU7:SD 27
- SG1:SU0:SD 0
- ..
- SG17:SU7:SD 27
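The ordering listed above can be sketched as a small Python generator. This is only an illustration of the index order; the function name decode_order and the level strings are my own invention, not something from the Green Book.

```python
def decode_order(level):
    """Yield (SG, SU, SD) index triples in the order listed above.

    `level` is "A", "B" or "C" (hypothetical convention for this sketch).
    """
    # Level A has SU0..SU3 per sound group, Level B/C has SU0..SU7.
    units_per_group = 4 if level == "A" else 8
    for sg in range(18):                  # SG0 .. SG17
        for su in range(units_per_group):
            for sd in range(28):          # SD0 .. SD27
                yield sg, su, sd


if __name__ == "__main__":
    order = list(decode_order("B"))
    print(len(order), order[0], order[28], order[-1])
```

The generator only produces indices; mapping them to byte offsets inside a sector is a separate step that is not covered here.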
The resulting dataflow would look something like this:
Image
System at rest, followed by first 4 SD of a single SU of a single SG
R0 to R3 are the to-be-quantized results of the decoder (not to be confused with SP.R0)

The CD-i's hardware uses signed bytes (Level A) and signed nibbles (Level B/C).
So far, the data values I used were all unsigned, so they have to be converted to signed two's complement:

Code: Select all

Level A
From unsigned: 0x00 .. 0xFF = 0 .. 255
To     signed: 0x80 .. 0x00 .. 0x7F = -128 .. 0 .. 127
if input[n] >= 128 then
    input[n] = input[n] - 256
end if

Level B/C
From unsigned: 0x0 .. 0xF = 0 .. 15
To     signed: 0x8 .. 0x0 .. 0x7 = -8 .. 0 .. 7
if input[n] >= 8 then
    input[n] = input[n] - 16
end if
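For those who prefer real code, the same conversion as a minimal Python sketch. The helper name to_signed is hypothetical; it just generalizes the two cases above to any bit width.

```python
def to_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide value as two's complement."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in %d bits" % bits)
    # Values with the top bit set represent negative numbers.
    if value >= (1 << (bits - 1)):
        return value - (1 << bits)
    return value


if __name__ == "__main__":
    print([to_signed(v, 8) for v in (0x00, 0x7F, 0x80, 0xFF)])  # Level A bytes
    print([to_signed(v, 4) for v in (0x0, 0x7, 0x8, 0xF)])      # Level B/C nibbles
```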
In a few simple formulae, the output is calculated as follows:

Code: Select all

outG = input[n] * power(2,12-SP.R)
outK0 = inQ[n-1] * gainK0[SP.F]
outK1 = inQ[n-2] * gainK1[SP.F]
inQ[n] = outG + outK0 + outK1
output[n] = max(1-power(2,15), min(inQ[n], power(2,15)-1))
Where:
- input[] is the 28 signed samples of the sound unit
- n is [0..27]
- SP.R is defined in the Sound Unit Sound Parameters
- SP.F is defined in the Sound Unit Sound Parameters
- inQ[-1] and inQ[-2] (the two delayed samples) are zero when the system is at rest; after that they carry over from the previous SU.
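Put together, the formulae above could look roughly like this in Python. Treat it as a sketch under assumptions, not a reference decoder: the gain tables are the values commonly documented for CD-ROM XA / CD-i ADPCM (in units of 1/64) and should be checked against the Green Book, the function and variable names are mine, and only the 4-bit case (Level B/C) with the 12-SP.R exponent as written above is covered. I believe Level A scales differently, but that is not handled here.

```python
# Assumed filter gains in units of 1/64 (verify against the Green Book).
GAIN_K0 = (0, 60, 115, 98)
GAIN_K1 = (0, 0, -52, -55)


def clamp16(x):
    """Clamp to the range used in the formula above."""
    return max(1 - 2 ** 15, min(x, 2 ** 15 - 1))


def decode_sound_unit(samples, sp_r, sp_f, state=(0, 0)):
    """Decode one sound unit of signed 4-bit samples.

    `state` is (inQ[n-1], inQ[n-2]). It is returned updated so that the
    next sound unit can continue without a reset.
    """
    prev1, prev2 = state
    out = []
    for s in samples:
        out_g = s * 2 ** (12 - sp_r)
        out_k0 = prev1 * GAIN_K0[sp_f] / 64
        out_k1 = prev2 * GAIN_K1[sp_f] / 64
        in_q = out_g + out_k0 + out_k1
        out.append(clamp16(round(in_q)))
        # The unclamped value is fed back, as in the formulae above.
        prev1, prev2 = in_q, prev1
    return out, (prev1, prev2)


if __name__ == "__main__":
    pcm, state = decode_sound_unit([1, 0, 0], sp_r=6, sp_f=1)
    print(pcm, state)
```

Passing the returned state into the next call is what implements the "no reset between SUs" assumption; passing (0, 0) every time would give the reset behavior instead.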

Mono audio is encoded sequentially, while the left and right channels of stereo are encoded separately in an SU of their own.
This is suggested by IV.3.5 and IV.3.6:
IV.3.6 wrote:In mono, the sound units are encoded sequentially i.e. SU0, SU1, SU2 ... SU7.
In stereo, the left signal is given by SU0, SU2, SU4, and SU6 and the right signal is given by SU1, SU3, SU5 and SU7.
The sound units are encoded in sequential pairs i.e. SU[n] and SU[n+1] are encoded together where n = 0, 2, 4 and 6.
This indicates that there is a second ADPCM decoder available for the other stereo channel.
With a second decoder, the first one (which also handles mono) decodes SU0, SU2, SU4, and SU6, and the second one decodes SU1, SU3, SU5, and SU7.
The second decoder is therefore NOT in use for mono.
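As a small illustration of that split, here is a Python sketch that routes the sound units of one sound group to the two decoders. The names channel_for_unit and split_units are hypothetical, and "decoder 0 / decoder 1" is just my way of labelling the first and second decoder.

```python
def channel_for_unit(su_index, stereo):
    """Return 0 (first decoder: left or mono) or 1 (second decoder: right)."""
    return su_index % 2 if stereo else 0


def split_units(units, stereo):
    """Split the sound units of one sound group over the two decoders."""
    first = [u for i, u in enumerate(units) if channel_for_unit(i, stereo) == 0]
    second = [u for i, u in enumerate(units) if channel_for_unit(i, stereo) == 1]
    return first, second


if __name__ == "__main__":
    names = ["SU%d" % i for i in range(8)]
    print(split_units(names, stereo=True))
    print(split_units(names, stereo=False))
```

In mono the second list stays empty, which matches the second decoder not being used.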

The resulting output is signed 16-bit PCM. This is the simplest audio format, in that it has no encoding: it's simply one amplitude value per 16 bits.
See the Wikipedia page on PCM and take a look at the graph. The blue dots are the 16-bit output values; one dot per 16 bits.


Re: Decoding CD-i audio files

Post by Shikotei » Fri Aug 25, 2017 11:24 am

Being lazy (repetitive actions ought to be automated), I wanted to generate playable audio files instead of raw PCM data.

A little research in WAV header format yielded this page.
Either I'm creating the headers incorrectly or the data is stored incorrectly, because the audio plays back at 200% of the actual speed (playback using Winamp).
Lowering the samplerate to half the actual rate fixes it, but does not address the cause.
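For comparison, here is a minimal sketch that lets Python's standard wave module build the header instead of writing it by hand. The function name write_wav is mine, and the 37800 Hz rate in the example is what I understand Level B to use, so treat that number as an assumption. A WAV player derives the playback speed from the channel count and frame rate together, so a mismatch between the header's channel count and how the samples are actually interleaved is one thing worth checking, though that is a guess and not a diagnosis.

```python
import io
import struct
import wave


def write_wav(target, samples, sample_rate, channels):
    """Write signed 16-bit PCM to `target` (a path or a file-like object).

    `samples` is a flat list of ints; for stereo it must be interleaved
    as L, R, L, R, ...
    """
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)              # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


if __name__ == "__main__":
    buf = io.BytesIO()
    write_wav(buf, [0, 1000, -1000, 32767], 37800, 2)
    print(len(buf.getvalue()), "bytes written")
```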

Ok, so perhaps it's not fully working yet.. I've been trying to get the stereo working properly (there are two channels in my output, but one of them is silent).
According to the sector header it's supposed to contain stereo Level B audio (8 SU, 4 SP, 28 SD, 4 bits/sample).
The sound parameters indicate that only half the filter and range parameters are set.
The data samples are all ranging from 0x00 to 0x0F (using hex editor, so there's no mistaking it).. indicating that there really is no data in the other channel.
Why? Why store a single channel in a stereo format? The search for a true stereo file begins!

I know that the overworld music in Link: The Faces of Evil is stereo: the drums are distinctly different in the left and right channels.
Never mind.. the dang source file is 200MB+ and my code can't distinguish the various files in it. There are supposedly 70 of them in there, but I only find 8 (of huge lengths).
So I'm settling for the bumpers of The Vision Factory and Philips. These are audio+video files, but have a single channel of audio (channels 0 and 15 respectively, in my case).
They decode easily enough, but are still too fast.

So either:
  • The headers are correct, but the samples are played back too fast. Not enough samples? One sector with Level B audio has 18 SG * 8 SU * 28 SD = 4032 samples. That's 8064 bytes of 16-bit PCM data, which is what I produce, so there's no mistake there.
  • The headers are incorrect and I have to adjust the samplerate or one of the other values.
I'll have to do more experimenting.
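To make the arithmetic above easy to re-check for other cases, a small Python sketch. The function name sector_stats is hypothetical, and the 37800 Hz sample rate used in the example is my assumption for Level B.

```python
SOUND_GROUPS = 18        # sound groups per audio sector
SAMPLES_PER_UNIT = 28    # sound data samples per sound unit


def sector_stats(units_per_group, channels, sample_rate):
    """Return (samples, pcm_bytes, seconds) for one decoded audio sector."""
    samples = SOUND_GROUPS * units_per_group * SAMPLES_PER_UNIT
    pcm_bytes = samples * 2                        # 16-bit output samples
    seconds = samples / channels / sample_rate     # playing time per sector
    return samples, pcm_bytes, seconds


if __name__ == "__main__":
    print(sector_stats(8, 1, 37800))   # Level B mono (assumed rate)
    print(sector_stats(8, 2, 37800))   # Level B stereo (assumed rate)
```

Note that the number of channels halves the playing time per sector while the byte count stays the same, which is exactly the kind of factor-of-two that is easy to get wrong in a header.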

===============
About the uncertainty over when to clear the delayed audio samples in the decoder, I've found this tidbit:
IV.3.7 wrote:The audio sector interleaving is not necessarily applicable if the real-time bit is zero for audio sectors. It should be noted that non-real-time audio data must be sent to a soundmap.
My understanding is that the soundmap is used to store multiple sectors of audio until it is ready to be decoded back-to-back. Why else would non-real-time audio be required to be sent there?
This does not, however, state that real-time audio must be sent directly to the decoder.
IV.5.4 wrote:Normally the soundmap contains, in RAM, the whole or a part of the audio part of a real-time record.
I think that these two parts combined indicate that the decoder requires an uninterrupted stream of data to properly decode a sound record.
Either you stream it directly from the sectors of a real-time-file (RealTime bit is set in the header), or you buffer it in the soundmap before streaming.

A few experiments with the delayed samples of the decoder later made it very clear when NOT to clear these.
Below are the results of this test. Each file is a clean WAV (including generated headers, so absolutely NO audio program touched these), where I clear delayed samples at the start of each:
The audio is only roughly 13 seconds long, but that is enough to clearly hear the difference between the versions.

Is the distortion in these files comparable to the crackles and hisses of the CD-i Emulator?

User avatar
cdifan
CD-i Emulator Author
Posts: 906
Joined: Fri Jun 24, 2005 6:19 am
Location: The Netherlands
Contact:

Re: Decoding CD-i audio files

Post by cdifan » Sat Aug 26, 2017 1:40 pm

Shikotei wrote:
Fri Aug 25, 2017 11:24 am
The sound parameters indicate that only half the filter and range parameters are set.
The data samples are all ranging from 0x00 to 0x0F (using hex editor, so there's no mistaking it).. indicating that there really is no data in the other channel.
Why? Why store a single channel in a stereo format? The search for a true stereo file begins!
This particular sample is from The Apprentice. These are stereo ADPCM files with an all-zero right channel because that saves a 4-bit masking operation when mixing with sound effects. The result of mixing is made "mono" by using the audio mixing unit.

For Vision Factory games, use the CDDA audio tracks for high quality music (44.1 kHz PCM stereo). The ADPCM tracks are intended just for mixing with sound effects; The Apprentice even plays from CDDA if it isn't doing sound mixing.
Shikotei wrote:
Fri Aug 25, 2017 11:24 am
A few experiments with the delayed samples of the decoder later made it very clear when NOT to clear these.
Below are the results of this test.

Is the distortion in these files comparable to the crackles and hisses of the CD-i Emulator?
Yes, that sounds comparable (music_1_5_Se). Looks like I need to stop clearing those samples at each sector.
Strange, as I seem to remember having tried this in the past... Anyway, thanks very much!


Re: Decoding CD-i audio files

Post by cdifan » Sat Aug 26, 2017 9:18 pm

I couldn't help myself and wrote a little tool today:

Code: Select all

C:\Tmp>apprsfx
Usage: apprsfx levelN.dat
Function: Extracts The Apprentice sound effects from level data
Raw sound effect data is written to effectN_M.raw
Written by CD-i Fan <cdifan@gmail.com>
It is based on reverse-engineering the code that mixes the sound effects into the music read from disc.

The levelN.dat files can be found in the root directory of The Apprentice.
IsoBuster extracts them just fine (they are real-time files but contain only data sectors in a single channel).

When run on the extracted level1.dat file the results are as follows:

Code: Select all

C:\Tmp>apprsfx level1.dat
Extracting The Apprentice sound effects from level 1 data:
Reading block header from level1.dat @$0...
Reading sound header from level1.dat @$79800...
Extracting effect1_1.raw (194 groups) from level1.dat @$8c238...
Extracting effect1_2.raw (77 groups) from level1.dat @$89da8...
Extracting effect1_3.raw (101 groups) from level1.dat @$79908...
Extracting effect1_4.raw (198 groups) from level1.dat @$7c8d8...
Extracting effect1_5.raw (216 groups) from level1.dat @$825a8...
Extracting effect1_6.raw (122 groups) from level1.dat @$79908...
Extracting effect1_7.raw (64 groups) from level1.dat @$88ae8...
Extracting effect1_8.raw (85 groups) from level1.dat @$8b6f8...
Extracting effect1_9.raw (230 groups) from level1.dat @$831d8...
Extracting effect1_10.raw (203 groups) from level1.dat @$7d238...
Extracting effect1_11.raw (236 groups) from level1.dat @$91da0...
Extracting effect1_12.raw (29 groups) from level1.dat @$8a8e8...
Extracting effect1_13.raw (174 groups) from level1.dat @$8df48...
Extracting effect1_14.raw (164 groups) from level1.dat @$98c40...
Extracting effect1_15.raw (142 groups) from level1.dat @$9d998...
Extracting effect1_16.raw (175 groups) from level1.dat @$93150...
Extracting effect1_30.raw (433 groups) from level1.dat @$98358...
Extracting effect1_31.raw (324 groups) from level1.dat @$a1ca0...
Extracting effect1_32.raw (326 groups) from level1.dat @$a4ec8...
The output files are raw ADPCM Level C Stereo data (128 bytes per sound group) with a silent left channel.
Use your favorite tool to convert these to wav files or any other audio format (mine is called cdiconv, the CD-i File Converter).
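As a quick sanity check on such a raw file, the following Python sketch splits it into 128-byte sound groups and estimates the playing time. The helper names are hypothetical, and the duration figure rests on assumptions not stated above: 8 sound units of 28 samples per group, two channels, and an 18900 Hz sample rate for Level C.

```python
GROUP_SIZE = 128          # bytes per sound group, as stated above
SAMPLES_PER_GROUP = 8 * 28  # assumed: 8 sound units of 28 samples each


def split_groups(raw):
    """Split raw ADPCM data into a list of 128-byte sound groups."""
    if len(raw) % GROUP_SIZE:
        raise ValueError("length is not a multiple of %d bytes" % GROUP_SIZE)
    return [raw[i:i + GROUP_SIZE] for i in range(0, len(raw), GROUP_SIZE)]


def duration_seconds(group_count, channels=2, sample_rate=18900):
    """Estimated playing time, under the assumptions named above."""
    return group_count * SAMPLES_PER_GROUP / channels / sample_rate


if __name__ == "__main__":
    raw = bytes(194 * GROUP_SIZE)   # stand-in for a 194-group effect file
    groups = split_groups(raw)
    print(len(raw), len(groups), round(duration_seconds(len(groups)), 3))
```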

The source and a Windows executable can be downloaded here: http://www.cdiemu.org/download/apprsfx.zip.

If you look at the source, note that level data files consist of blocks, each of which is loaded to a separate memory buffer.
This is a generic pattern for Vision Factory discs.

For The Apprentice level data files, block 0 is the level code, blocks 2 and 3 are compiled sprites and block 4 is the sound effect data.
As of right now I do not know what's in block 1; it is not the level map (those are in separate mapN_M.dat files).

Next I will be taking a look at Dimo's Quest...
