Wednesday, 13 November 2013

What does the H.264/avc1 codecs parameters for video/mp4 mime types mean?

The HTMLMediaElement.canPlayType() API enables you to query what video formats a user agent can play. For "video/mp4", the container for H.264/AAC, you can specify a "codecs" parameter that denotes the the H264 profile and level. Firefox doesn't currently handle MP4 codecs parameter very well, so I took it upon myself to figure out what the codecs parameters mean for H.264.

According to RFC6381 The 'Codecs' and 'Profiles' Parameters for "Bucket" Media Types, codecs paremeters for H.264 are contained in the "avc1" sample entry, and are is represented as follows:

avc1.PPCCLL

That is, the string "avc1." (or "avc2.", I'm not sure what the difference is yet), followed by 3 bytes represented in hex without the "0x" prefix, where the bytes represent the following:

PP = profile_idc
CC = constraint_set flags
LL = level_idc

These fields are defined in in Annex 1 of ITU-T H.264 and ISO/IEC 14496-10:2012  twinned standards. ITU-T H.264 can be downloaded for free.

profile_idc defines the H.264 profile. ITU-T H.264 doesn't have a single table listing what the different profile_idc values mean, but handily, Microsoft defines an eAVEncH264VProfile enumeration on the decimal values of the profile_idc in Codecapi.h (available on Win7):

enum eAVEncH264VProfile {
  eAVEncH264VProfile_unknown                    = 0,
  eAVEncH264VProfile_Simple                     = 66,
  eAVEncH264VProfile_Base                       = 66,
  eAVEncH264VProfile_Main                       = 77,
  eAVEncH264VProfile_High                       = 100,
  eAVEncH264VProfile_422                        = 122,
  eAVEncH264VProfile_High10                     = 110,
  eAVEncH264VProfile_444                        = 144,
  eAVEncH264VProfile_Extended                   = 88,
  eAVEncH264VProfile_ScalableBase               = 83,
  eAVEncH264VProfile_ScalableHigh               = 86,
  eAVEncH264VProfile_MultiviewHigh              = 118,
  eAVEncH264VProfile_StereoHigh                 = 128,
  eAVEncH264VProfile_ConstrainedBase            = 256,
  eAVEncH264VProfile_UCConstrainedHigh          = 257,
  eAVEncH264VProfile_UCScalableConstrainedBase  = 258,
  eAVEncH264VProfile_UCScalableConstrainedHigh  = 259
};


So for example, avc1.4D401E has a profile_idc of 0x4D, which is 77 in decimal, so it's main profile.

constraint_set flags are encoded as a bit flags 6 bitfields named constraint_set0_flag through to constraint_set5_flag. The meaning of a constraint_setN_flag being set depends on the profile being represented. The bits are stored in with constraint_set0_flag in the high bit, and so there are two padding 0s in the low bits. So for example, avc1.4D401E, the contraint_set flags are 0x40, so that's 01000000b, thus this means constraint_set1_flag is set.

The level number, level_idc, is defined as fixed point, from 1 to 5.2, with an oddball 1b level, as defined in ITU-T H.264 in Table A-1 - Level Limits. The level_idc is encoded in the codecs parameter as 10 times the level number, i.e. level 5.1 is represented as decimal 51, or 0x33 hex. So continuing our example avc1.4D401E, the level_idc is 1E, or 30 decimal, so level 3.0.

Level 1b is an oddball encoding, and is encoded as 11 decimal with the constraint_set3_flag equal to 1. If the constraint_set3_flag is not equal to 1, level 1.1 is encoded.

Microsoft also conveniently define this in the eAVEncH264VLevel enumeration.

enum eAVEncH264VLevel {
  eAVEncH264VLevel1    = 10,
  eAVEncH264VLevel1_b  = 11,
  eAVEncH264VLevel1_1  = 11,
  eAVEncH264VLevel1_2  = 12,
  eAVEncH264VLevel1_3  = 13,
  eAVEncH264VLevel2    = 20,
  eAVEncH264VLevel2_1  = 21,
  eAVEncH264VLevel2_2  = 22,
  eAVEncH264VLevel3    = 30,
  eAVEncH264VLevel3_1  = 31,
  eAVEncH264VLevel3_2  = 32,
  eAVEncH264VLevel4    = 40,
  eAVEncH264VLevel4_1  = 41,
  eAVEncH264VLevel4_2  = 42,
  eAVEncH264VLevel5    = 50,
  eAVEncH264VLevel5_1  = 51,
  eAVEncH264VLevel5_2  = 51

}

No comments: