I decided to spend a little more time on my variable bit rate DCT coder, mainly looking at a few perceptual error metrics to see whether any of them could yield better general results than PSNR. One of my main sources for general reading was Charles Bloom's blog, where he looks at a few of the metrics and tries to derive his own, which does a best fit over various metrics.
I also stumbled upon a nice little paper/presentation (HERE) which gives more detail on the SSIM metric. The paper also compares the Q algorithm with SSIM and MSSIM. The Q algorithm seems to behave better when faced with blurriness and JPEG-style artifacts, so I decided to try the Q approach in the variable bit rate version of my adaptive DCT coder.
So I modified my program to use the Q metric in the variable bit rate implementation. I ran a few tests, and some of the results are below.
Interestingly, I was expecting Q to perform better, but images encoded with the Q metric seem to perform worse than the PSNR implementation at the same compression ratio. I am unsure whether I implemented this correctly or whether something else is wrong. For reference, here is the code used to calculate the Q metric:
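For reference, the usual definition of Q (the universal image quality index) for two signals x and y is a product of three terms: correlation, luminance closeness and contrast closeness. The notation below is my own (bars for means, sigma for standard deviation and covariance); the product collapses to the single fraction that the CalculateQ code further down returns.

```latex
Q = \frac{\sigma_{xy}}{\sigma_x \sigma_y}
    \cdot \frac{2\,\bar{x}\,\bar{y}}{\bar{x}^2 + \bar{y}^2}
    \cdot \frac{2\,\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}
  = \frac{4\,\sigma_{xy}\,\bar{x}\,\bar{y}}
         {\left(\sigma_x^2 + \sigma_y^2\right)\left(\bar{x}^2 + \bar{y}^2\right)}
```

Q lies in [-1, 1] and equals 1 only when the two signals are identical. SSIM has the same shape with small stabilising constants added to the numerators and denominators; setting those constants to zero gives Q back.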
float CalculateQ( Ipp8u* dataA, int strideA, Ipp8u* dataB, int strideB, IppiSize size)
{
    float meanA = 0;
    float meanB = 0;
    float varA = 0;
    float varB = 0;
    float varAB = 0;
    float err = 0;

    // Compute mean
    for(int y = 0; y<size.height; y++)
    {
        for(int x = 0; x<size.width; x++)
        {
            meanA += (float)dataA[0 + x * 3 + y * strideA];
            meanB += (float)dataB[0 + x * 3 + y * strideB];
            meanA += (float)dataA[1 + x * 3 + y * strideA];
            meanB += (float)dataB[1 + x * 3 + y * strideB];
            meanA += (float)dataA[2 + x * 3 + y * strideA];
            meanB += (float)dataB[2 + x * 3 + y * strideB];
        }
    }
    meanA /= (size.width * size.height * 3);
    meanB /= (size.width * size.height * 3);

    // Compute Variance
    for(int y = 0; y<size.height; y++)
    {
        for(int x = 0; x<size.width; x++)
        {
            for(int c = 0; c<3; c++)
            {
                float diffA = (float)dataA[c + x * 3 + y * strideA] - meanA;
                float diffB = (float)dataB[c + x * 3 + y * strideB] - meanB;
                varA += diffA * diffA;
                varB += diffB * diffB;
                varAB += diffA * diffB;
            }
        }
    }
    varA /= (size.width * size.height * 3) - 1;
    varB /= (size.width * size.height * 3) - 1;
    varAB /= (size.width * size.height * 3) - 1;

    return (4 * varAB * meanA * meanB) / ((varA + varB) * (meanA * meanA + meanB * meanB));
}
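One thing that may be worth checking in CalculateQ is what happens on flat blocks. If both the source and the reconstructed block are constant, (varA + varB) is zero and the return expression becomes 0/0, which is NaN; and if only the source block is flat, the covariance is zero, so Q comes out as 0 however small the error is. The sketch below is not the code used for the results in this post. It is a hypothetical variant (CalculateQGuarded is a made-up name) that works on a single channel with plain pointers instead of IPP types, and falls back to the luminance term alone when the denominator is zero. That fallback is an assumption on my part, not something taken from the paper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from the original program: Q for one
// single-channel block, with a fallback when the denominator is zero.
float CalculateQGuarded(const std::uint8_t* a, int strideA,
                        const std::uint8_t* b, int strideB,
                        int width, int height)
{
    assert(width > 0 && height > 0 && width * height > 1);
    const double n = static_cast<double>(width) * height;

    // First pass: means of both blocks.
    double meanA = 0.0, meanB = 0.0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            meanA += a[x + y * strideA];
            meanB += b[x + y * strideB];
        }
    }
    meanA /= n;
    meanB /= n;

    // Second pass: unbiased variances and covariance.
    double varA = 0.0, varB = 0.0, covAB = 0.0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const double dA = a[x + y * strideA] - meanA;
            const double dB = b[x + y * strideB] - meanB;
            varA += dA * dA;
            varB += dB * dB;
            covAB += dA * dB;
        }
    }
    varA /= n - 1.0;
    varB /= n - 1.0;
    covAB /= n - 1.0;

    const double meanSq = meanA * meanA + meanB * meanB;
    const double denom = (varA + varB) * meanSq;
    if (denom == 0.0) {
        // Both blocks are constant. Assumption: score them on the
        // luminance term only; two all-black blocks count as identical.
        return meanSq == 0.0
            ? 1.0f
            : static_cast<float>(2.0 * meanA * meanB / meanSq);
    }
    return static_cast<float>(4.0 * covAB * meanA * meanB / denom);
}
```

With this guard, two identical flat blocks score 1 instead of NaN. A flat source against a slightly noisy reconstruction still scores 0, which is a property of Q itself rather than of the implementation, and is the kind of case the stabilising constants in SSIM are meant to handle.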
And here is the general main function used to compress the data at the requested rate:
void Comp::DCT_CompressToQ(Ipp8u* srcData, int stride, int width, int height, float targetQ, Ipp8u* qualityLevels, float* q)
{
    Ipp8u rawDXTCColors[4 * 4 * 4];
    Ipp8u rawDXTCDecompColors[4 * 4 * 3];
    Ipp8u rawDCTCColors[16 * 16 * 3];
    Ipp16u compDXTCColors[4 * 4][4];
    float psnrDXT[4 * 4];
    Ipp16u blockIndex = 0;

    for(int y=0; y<(height/16); y++)
    {
        for(int x=0; x<(width/16); x++, blockIndex++)
        {
            Ipp32u srcOffsetDCT = ((x * 16) * 3) + ((y * 16) * stride * 3);
            qualityLevels[blockIndex] = 0;

            while (qualityLevels[blockIndex] < 16)
            {
                IppiSize roiSizeDCT = {16, 16};
                ippiCopy_8u_C3R( srcData + srcOffsetDCT, stride * 3, rawDCTCColors, 16 * 3, roiSizeDCT );
                DCT_CompressDecompress(rawDCTCColors, 16, pQuantFwdTable[qualityLevels[blockIndex]], pQuantInvTable[qualityLevels[blockIndex]]);

                float dctQ = CalculateQ( srcData + srcOffsetDCT, stride * 3, rawDCTCColors, 16*3, roiSizeDCT );
                q[blockIndex] = dctQ;

                if (dctQ <= targetQ || qualityLevels[blockIndex] == 15)
                {
                    DCT_CompressDecompress(srcData + srcOffsetDCT, stride, pQuantFwdTable[qualityLevels[blockIndex]], pQuantInvTable[qualityLevels[blockIndex]]);
                    DXT_CompressDecompress(srcData + srcOffsetDCT, stride, 16, 16, srcData + srcOffsetDCT, stride);
                    break;
                }
                qualityLevels[blockIndex]++;
            }
        }
    }
}
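The stopping rule in that loop can be looked at in isolation. Assuming level 0 is the finest quantization and Q falls as the level rises (an assumption; the quantization tables are not shown here), the loop picks the first level whose Q is at or below targetQ, so every block ends up at or just under the target. A NaN from CalculateQ never satisfies `dctQ <= targetQ`, so such a block would run to level 15. The sketch below uses hypothetical names: qualityAt(level) stands in for "compress and decompress the block at this level and measure Q". It puts the rule used above next to a variant that keeps the last level still meeting the target.

```cpp
#include <functional>

// Hypothetical stand-alone versions of the per-block search.

// Mirrors the loop above: first level whose Q is at or below the
// target, or maxLevel if no earlier level drops that far.
int FirstLevelAtOrBelow(const std::function<float(int)>& qualityAt,
                        float targetQ, int maxLevel = 15)
{
    for (int level = 0; level < maxLevel; level++) {
        if (qualityAt(level) <= targetQ) {
            return level;
        }
    }
    return maxLevel;
}

// Variant: last level that still meets the target, or level 0 if
// even the finest level falls short.
int LastLevelMeetingTarget(const std::function<float(int)>& qualityAt,
                           float targetQ, int maxLevel = 15)
{
    int best = 0;
    for (int level = 0; level <= maxLevel; level++) {
        if (!(qualityAt(level) >= targetQ)) {
            break;  // written this way so a NaN also stops the search
        }
        best = level;
    }
    return best;
}
```

The two functions differ by one level whenever the target is crossed, so comparing their output against the PSNR version would be a quick way to see whether both versions treat the target the same way.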
So I am unsure why this is yielding worse results than I initially expected. Maybe I got the implementation of Q wrong. Or, as an alternative question: are there any better perceptual metrics than Q and SSIM?
