Understanding WebVTT file format (draft)

Julien Villetorte <j.villetorte[at]gmail[dot]com> @delphiki
Latest update: 2011, 20 May
Thanks to HTML5Doctor, Bruce Lawson & Simon Pieters (Opera Software).

WebVTT is largely based on the SubRip file format.
Compatible player: Playr.

File specifications

Encoding: UTF-8
MIME type: text/vtt
Line terminator: \r, \n or \r\n
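
As a minimal sketch of the points above, the following Python snippet decodes a file's bytes as UTF-8 and splits them on any of the three allowed line terminators. The function name split_lines is hypothetical, and the snippet assumes the whole file fits in memory. A regular expression is used rather than str.splitlines(), because splitlines() also breaks on characters that are not listed here as terminators.

```python
import re

# The three line terminators listed above; "\r\n" is tried first so that
# it counts as a single terminator rather than two.
LINE_TERMINATOR = re.compile(r"\r\n|\r|\n")

def split_lines(data):
    """Decode raw file bytes as UTF-8 and return the list of lines."""
    text = data.decode("utf-8")
    return LINE_TERMINATOR.split(text)
```

For example, split_lines(b"WEBVTT FILE\r\n\r\n1") gives the three lines "WEBVTT FILE", "" and "1".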

Format specifications

# File header

WEBVTT FILE

[cue]

...

# Cue format

[one or more characters not containing the substring "-->" or \r, \n, \r\n]
[hh...:]mm:ss.msmsms --> [hh...:]mm:ss.msmsms [settings]
First line
Second line
...
Example:
WEBVTT FILE

1
01:23:45.678 --> 01:23:46.789
Hello world!

2
01:23:48.910 --> 01:23:49.101
Hello
world!
Millisecond separators are full stops (.), not commas (,).
Cues have to be separated by one or more blank lines.
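
To make the timing line concrete, here is a rough Python sketch that parses it into milliseconds. The names parse_timestamp and parse_timing_line are hypothetical. The sketch also assumes things this draft does not state outright: hours, when present, have at least two digits, minutes and seconds stay within 00-59, and the fraction is exactly three digits, as in the examples.

```python
import re

# [hh...:]mm:ss.mmm with a full stop before the milliseconds.
# Digit counts and the 00-59 ranges are assumptions of this sketch.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")

def parse_timestamp(text):
    """Return the timestamp as a number of milliseconds."""
    match = TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError("not a WebVTT timestamp: %r" % text)
    hours, minutes, seconds, millis = match.groups()
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)

def parse_timing_line(line):
    """Split 'start --> end [settings]' into (start_ms, end_ms, settings)."""
    start, arrow, rest = line.partition("-->")
    if not arrow:
        raise ValueError("missing '-->' in timing line: %r" % line)
    parts = rest.split(None, 1)  # end timestamp, then optional settings
    if not parts:
        raise ValueError("missing end timestamp: %r" % line)
    settings = parts[1].strip() if len(parts) > 1 else ""
    return parse_timestamp(start.strip()), parse_timestamp(parts[0]), settings
```

A SubRip-style timestamp such as 01:23:45,678 is rejected by parse_timestamp, since the comma is not accepted as a separator.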

# Cue settings

Settings have to be placed right after the timing, on the same line, separated by one or more spaces or tabs.

Vertical text
  • D:vertical (vertical growing left)
  • D:vertical-lr (vertical growing right)
Line position
  • A specific position relative to the video frame:
    L:[a number]%, where [a number] is a positive integer.
  • A line number:
    L:[a number], where [a number] is a positive or negative integer.
Text position T:[a number]%, where [a number] is a positive integer.
Text size S:[a number]%, where [a number] is a positive integer.
Text alignment A:start or A:middle or A:end

Cue setting example:
WEBVTT FILE

1
01:23:45.678 --> 01:23:46.789 D:vertical
Hello world!

2
01:23:48.910 --> 01:23:49.101 S:50%
Hello
world!
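
The settings listed in this section could be checked along the following lines. This is only a sketch: parse_settings and SETTING_PATTERNS are hypothetical names, the accepted values are taken directly from the list above, and rejecting unknown settings with an error is a choice made here rather than something this draft requires.

```python
import re

# One pattern per setting letter, following the list in this section.
SETTING_PATTERNS = {
    "D": re.compile(r"vertical(-lr)?"),   # vertical text
    "L": re.compile(r"-?\d+|\d+%"),       # line number or percentage
    "T": re.compile(r"\d+%"),             # text position
    "S": re.compile(r"\d+%"),             # text size
    "A": re.compile(r"start|middle|end"), # text alignment
}

def parse_settings(text):
    """Turn 'D:vertical S:50%' into {'D': 'vertical', 'S': '50%'}."""
    settings = {}
    for token in text.split():  # splits on runs of spaces and tabs
        name, colon, value = token.partition(":")
        pattern = SETTING_PATTERNS.get(name)
        if not colon or pattern is None or not pattern.fullmatch(value):
            raise ValueError("unrecognised cue setting: %r" % token)
        settings[name] = value
    return settings
```

The input is the part of the timing line that follows the end timestamp, for instance "D:vertical" or "S:50%" in the example above.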

# Cue text

Replacements
  • & has to be replaced with &amp;
  • < has to be replaced with &lt;
  • > has to be replaced with &gt;
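
A small Python sketch of these three replacements, for plain text that contains no cue tags (escape_cue_text is a hypothetical name). The ampersand is handled first so that the entities produced for < and > are not escaped a second time.

```python
def escape_cue_text(text):
    """Escape &, < and > in plain text before writing it into a cue."""
    # "&" must be replaced first; otherwise "&lt;" would become "&amp;lt;".
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))
```

Text that already contains tags such as <b> or <v.Name> should not be passed through this function as a whole, because the tags themselves would be escaped.
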
Voice declaration tags
  • <v.Name>
Example:
01:23:45.678 --> 01:23:46.789
- <v.John>Hey!</v>
- <v.Jane>Hey!</v>
Text tags
  • Class: <c.classname>Your text</c>
  • Bold: <b>Your text</b>
  • Italic: <i>Your text</i>
  • Underline: <u>Your text</u>
  • Ruby annotations: <ruby>base text<rt>annotation</rt></ruby>
If you want your text to appear step by step (karaoke style), insert intermediate timestamps (wrapped in <...>) into the cue text.
Example:
01:23:45.678 --> 01:23:46.789
One... <01:23:45.800>Two... <01:23:46.500>Three...
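
One way a player might handle such a cue is to split the text at each intermediate timestamp and pair every piece with the time at which it should appear. The Python sketch below illustrates this; split_karaoke is a hypothetical name, and the timestamp pattern assumes the three-digit millisecond form used in the examples.

```python
import re

# An intermediate timestamp wrapped in <...>; other tags such as <b>
# do not match and stay in the text untouched.
INLINE_TIMESTAMP = re.compile(r"<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>")

def split_karaoke(cue_start, cue_text):
    """Return a list of (timestamp, text) pairs in display order."""
    # With one capture group, re.split returns
    # [text, timestamp, text, timestamp, text, ...].
    pieces = INLINE_TIMESTAMP.split(cue_text)
    steps = [(cue_start, pieces[0])]  # text before the first timestamp
    for i in range(1, len(pieces), 2):
        steps.append((pieces[i], pieces[i + 1]))
    return steps
```

Applied to the example above with the cue start time 01:23:45.678, this yields "One... " at the cue start, "Two... " at 01:23:45.800 and "Three..." at 01:23:46.500.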