Automation 4 Subtitle Data format and functions API

This file describes the API for retrieving and manipulating subtitle data in
Automation 4, as well as the data structures generated and accepted by these
APIs.

---

Subtitle Line table

A Subtitle Line table contains various information about a line in the
subtitle file being processed. There are several classes of subtitle lines,
which all have a few fields in common, but otherwise have different fields.


Common keys for all Subtitle Line classes:

class (string)
  The class of the Subtitle Line. Must be one of:
    "clear", (empty line)
    "comment", (semicolon-style comment line)
    "head", (section heading)
    "info", (key/value pair in Script Info section)
    "format", (line format definition)
    "style", (style definition line)
    "stylex", (style extension line, tentative AS5 feature)
    "dialogue", (dialogue line or dialogue-style comment line)
    "unknown" (unknown kind of line)
  Lines of the "unknown" class should never appear in well-formed subtitle
  scripts. They will usually be a result of a line being outside the section
  it belongs in.

raw (string)
  The raw text of the line.
  You should not change this field directly, since the data in it will never
  be used for generating lines in internal representation. It will, however,
  be updated from the remaining data in the line whenever it is required for
  one thing or another by an internal function.

section (string)
  The section this line is placed in. If it is placed before the first section
  heading, this field is nil.


Key defined only for the "comment" class:

text (string)
  The text of the comment line, ie. everything after the initial semicolon,
  including any spaces.


Key defined only for the "head" class:

No special keys are defined for this class, but the "section" field will be
the name of the new section started.


Keys defined only for the "info" class:

key (string)
  The "key" part of the line, ie. everything before the first colon.

value (string)
  The "value" part of the line, ie. everything after the first colon,
  discarding any leading whitespace.


Keys defined only for the "format" class:

fields (table)
  An Array Table of strings, each being the name of a field on the line.


Keys defined only for the "style" class:

name (string)
  Name of the style.

fontname (string)
  Name of the font used.

fontsize (number)
  Size of the font used, in pixels.

color1 (string)
color2 (string)
color3 (string)
color4 (string)
  The four colors for the style. (Fill, pre-karaoke fill, border, shadow)
  In VB hexadecimal, ie. "&HAABBGGRR&"

bold (boolean/number)
  The boldness/weight of the font. This will usually be a boolean, but it
  can be a number, in which case it must be one of 0, 100, 200, ..., 900.

italic (boolean)
underline (boolean)
strikeout (boolean)
  Other properties of the font.

scale_x (number)
scale_y (number)
  Scaling of the text, in percent.

spacing (number)
  Additional spacing between letters. Always integer.

angle (number)
  Rotation of the text on the Z axis in degrees.

borderstyle (number)
  1 = outline and drop shadow; 3 = opaque box.

outline (number)
  Width of the outline.

shadow (number)
  Distance between shadow and text.

align (number)
  Numpad alignment of the text.

margin_l (number)
margin_r (number)
margin_t (number)
margin_b (number)
  Left/right/top/bottom margins of the text in pixels.
  If using a format without support for separate top and bottom margins, the
  margin_t value will be used for vertical margin when converting back to
  textual representation.

encoding (number)
  Font encoding used for text. This follows the MS Windows font encoding
  constants.

relative_to (number)
  From STS.h: "0: window, 1: video, 2: undefined (~window)"

vertical (boolean)
  Whether vertical text semantics is used or not. (Tentative AS5 field.)


Keys defined only for the "stylex" class:

Remember that this class is only for the tentative AS5 format.

name (string)
  Name of the new style defined.

basename (string)
  Name of the style the new style is based on.

overrides (string)
  String of override tags defining the new style.


Keys only defined for the "dialogue" class:

comment (boolean)
  True if the line is a comment line, otherwise false.

layer (number)
  The layer the line is rendered in.

start_time (number)
end_time (number)
  Start/end time of the line in milliseconds.

style (string)
  Name of the style assigned to this line.

actor (string)
  Name of the actor performing this line.

margin_l (number)
margin_r (number)
margin_t (number)
margin_b (number)
  Left/right/top/bottom margins of the text in pixels.
  If any of these are zero, it's considered "not overriding".
  Same comment as for style lines applies.

effect (string)
  Effect to apply to the line.

userdata (string)
  Authoring-application defined data. (Tentative AS5 field.)

text (string)
  The text for this line.

---

Subtitle File user data object

The Subtitle File object is a user data object with some of the metatable
methods overridden to provide table-like access to the subtitle lines, as
well as some functions to modify the subtitles.

The following operations are supported.

n = #subs
n = subs.n
  Retrieve the number of lines in total.
  The first syntax is preferred.

line = subs[i]
  Retrieve line i, assuming 1 <= i <= n.

subs[i] = line
  Replace line i with new data.

subs[i] = nil
subs.delete(i[, i2, ...])
  Delete line i, pushing all following lines up an index. (Ie. by repeatedly
  deleting line 1 this way, the file will eventually end up empty.)
  The function syntax for this function can also take multiple line indexes,
  in which case it deletes each of those lines. All indexes are relative to
  the line numbering before the function is called.

subs.deleterange(a, b)
  Deletes all lines from index a to index b, both inclusive. If b < a,
  nothing is done.

subs[0] = line
subs.append(line[, line2, ...])
  Append one or more lines to a file.

subs[-i] = line
subs.insert(i, line[, line2, ...])
  Insert one or more lines before index i.


Effeciency concerns

Internally in Aegisub the subtitles are stored in a linked list, meaning
random access runs in O(n) time rather than O(1), as the interface presented
here could make it seem like.

Internally, a cursor to the last accessed item is kept, to make sequential or
mostly-sequential access to items faster, but totally random access will
still be somewhat ineffecient. Accessing items near the start or end of the
subtitle file can also be done reasonable fast however.

After an item-by-item deletion, the cursor will be placed at the item that
now has the lowest id specified for the operation.
After a range deletion operation, the cursor will be placed at the item
that now has the first id in the range of the operation.
After an insert operation, the cursor will be placed after the last inserted
item.
An append operation does not move the cursor.

---

Parsing tag data

This function uses the Aegisub SSA parser to split a string into override
blocks and text, and give separate access to each tag in override blocks.

function aegisub.parse_tag_data(text)

@text (string)
  The SSA-format string to parse.

Returns: A Parsed Tag Data table.

---

Recreating a line from tag data

This function takes a Parsed Tag Data table and creates an SSA string from it.

function aegisub.unparse_tag_data(tagdata)

@tagdata (table)
  The Parsed Tag Data table to "unparse".

Returns: A string, being the "unparsed" data.

---

Parsing karaoke data

Tihs function uses the Aegisub SSA parser to split a string into karaoke
syllables with additional calculated information added.

function aegisub.parse_karaoke_data(text)

@text (string)
  The SSA-format string to parse.

Returns: A Parsed Karaoke Data table.

---

Parsed Tag Data table

The Parsed Tag Data table is an Array Table containing a number of Parsed Line
Block tables.


Parsed Line Block table

A Parsed Line Block describes part of a line. (See ass_dialogue.cpp:70 for a
more complete description of this.
There are several classes of Parsed Line Block tables, which have slightly
varying fields.


Base Parsed Line Block table class

class (string)
  One of:
    "plain",
    "override",
    "drawing"


"plain" and "drawing" Parsed Line Block table classes

text (string)
  The text contained in this block.


"override" Parsed Line Block table class

This class doesn't have any new, specifically named fields. It does, however,
have multiple integer indexed fields, ie. acts as an Array Table.
Each of these indexes refer to a table of type Parsed Override Tag.


Parsed Override Tag table

This table describes a single override-tag in an SSA line.

valid (boolean)
  Whether this tag was parsed as a valid tag, or is just used for representing
  junk in the middle of the line.

name (string)
  Name of the tag, including leading backslash character. (In the case of
  invalid tags, the leading backslash might not be present.)

paran (boolean)
  Whether this tag has parantheses in its textual representation.

params (table)
  This is an Array Table containing the parameters for this tag. It will
  always have the maximum number of entries that can be supported by the tag,
  but in case of omitted parameters, the parameters omitted will have 'false'
  for value in this table.

---

Parsed Karaoke Data table

The Parsed Karaoke Data table is simply an Array Table of Karaoke Syllable
Data tables. However, the Parsed Karaoke Data table will always have one more
item than its count shows, which is numbered 0 (zero). This item contains
everything before the first syllable.


Karaoke Syllable Data table

This table contains information about a single karaoke syllable.

duration (number)
  Duration of the syllable in milliseconds.
  (In case of syllable zero, this is always zero. Also zero for "kt" tags.)

start_time (number)
end_time (number)
  Start and end times of the syllable, relative to the start of the line,
  given in milliseconds.
  (While these can technically easily be calculated from the duration data,
  they are too convenient to leave out from the standard interface.)

tag (string)
  The tag used for marking up this syllable. Usually one of:
    "k", "K", "kf", "ko", "kt"
  (In case of syllable zero, this is always the empty string.)

text (string)
  The text of the syllable, including all additional override tags.

text_stripped (string)
  The text of the syllable, stripped of any override tags.

---

Setting undo points

This function can only be used in macro features, it will have no effect when
used in any other feature.
It sets an undo point.

You should always call this function after every operation that must be
undoable as a separate step. It is considered very bad practice to modify
the subtitles in a macro without setting at least one undo-point, which must
be at the end.

An undo step can consist of any number of subtitle operations. (Ie. inserting,
deleting and replacing subtitle file lines.)

Furthermore, this function also marks the subtitles as "modified". No other
function does this.

function aegisub.set_undo_point(description)

@description (string)
  A short description of the operation that will be undone, if this undo-point
  is used.

Returns: nothing.

---