Aegisub/specs/as5/as5.tex
Rodrigo Braz Monteiro 51bf4fce32 Updated AS5 draft.
Originally committed to SVN as r1401.
2007-07-10 03:29:54 +00:00

143 lines
No EOL
5.9 KiB
TeX

\documentclass{spec}
\newcommand{\syntax}[1]{
\subsubsection*{Syntax}
\begin{tabbing}
\hspace{2cm}\=\\[-16pt]
#1
\end{tabbing}
}
\newcommand{\secspec}[1]{Section:\>\texttt{#1}}
\newcommand{\secspecs}[2]{Sections:\>\texttt{#1}, \texttt{#2}}
\title{\LaTeX}
\date{}
\begin{document}
\title{AS5 Subtitle Format Draft}
\author{Rodrigo Braz Monteiro, Niels Martin Hansen, David Lamparter}
\spectitle
\section{Abstract}
This document specifies the \emph{AS5 subtitle format}, developed jointly by the
Aegisub\cite{Aegisub} and asa\cite{asa} teams in order to replace the old
\emph{Sub Station Alpha}\cite{SSA} subtitle format and its extensions:
\begin{itemize}
\item Advanced Sub Station Alpha (ASS) implemented by VSFilter\cite{VSFilter}
\item Advanced Sub Station Alpha 2 (ASS2), also implemented by VSFilter
\item Advanced Sub Station Alpha 3 (ASS3) implemented by equinox.
\end{itemize}
The goal is to create a flexible, easy to understand and powerful subtitle format
that can be used in hardsubs or multiplexed into Matroska Video\cite{mkv} files as
softsubs.
\section{File Structure}
\subsection{File Format}
All AS5 files are \emph{REQUIRED} to comply with the three requirements below:
\begin{itemize}
\item Be encoded with one of \emph{UTF-8}\cite{UTF-8}, \emph{UTF-16 Big Endian}
\cite{UTF-16} or \emph{UTF-16 Little Endian} Unicode Transformation Formats. UTF-8 is
preffered.
\item Not to have any character below Unicode code point U+20, except for U+09, U+0A, U+0D.
That is, it must be a plain-text file.
\item All lines must end with Windows line endings, that is, U+0D followed by U+0A.
\end{itemize}
The character set of a subtitle file can be autodetermined by its Byte-Order Mark or by
the value of the first two bytes. See below.
\subsection{File Structure}
The file is divided in \emph{sections}, which are uniquely identified by a string inside
square brackets, in a line of its own. From that point on, every next line is considered
to be part of the last found section until another section is found. There is no end-of-section
termination mark; they always end at the start of the next one or at the end of the file.
Each section is divided in lines, each line representing one command or definition. Empty
lines \emph{MUST} be ignored. It is recommended that programs generating AS5 files insert
a blank line at the end of each section to increase readability. There \emph{MUST} always
be a blank line at the end of the file (as every line is required to end in a line break).
Each line in a section takes the general form of \textit{Type: data1,data2,...,dataN}. An
unknown \textit{Type} \emph{MUST} be ignored by a parser. It is recommended that subtitle
editing programs keep such ignored lines in the file after re-saving it.
There are two sections which are required, \emph{[AS5]} and \emph{[Data]}, the equivalents of
\emph{[Script Info]} and \emph{[Events]} in previous formats. If either of those sections is
missing, the file is deemed invalid and \emph(MUST) be refused by the parser. Any other section
can be ommitted from the file, and need not be implemented by all parsers. However, any unknown
section \emph{MUST} be preserved in the file by a subtitle editing program when it re-saves a
file with sections that it does not recognize. It can, however, be removed at the user's discretion.
Finally, there is a special type of undefined group, \emph{[Private:PROGNAME]}, which
\emph{MUST} be \emph{ENTIRELY} preserved by other programs when re-saving it. This is used to
store program-specific data, for example, Aegisub would create a group called
\emph{[Private:Aegisub]} to store its data inside. This type of group should be identified
by the fact that it starts with \emph{"`[Private:"'}.
\subsubsection{[AS5]}
This must be the first section in every AS5 file. If the very first line of the file is not
[AS5], the file \emph{MUST} be rejected by the parser as invalid. Note, however, that the first
line is allowed to contain a Byte-Order Mark (BOM), which is the character U+FEFF encoded in
the encoding used for the rest of the script\cite{Unicode BOM}. The first four bytes will therefore be:
\begin{itemize}
\item 0xEF 0xBB 0xBF 0x5B - UTF-8 (with BOM)
\item 0x5B 0x41 0x53 0x53 - UTF-8 (without BOM)
\item 0xFF 0xFE 0x5B 0x00 - UTF-16 LE (with BOM)
\item 0x5B 0x00 0x41 0x00 - UTF-16 LE (without BOM)
\item 0xFE 0xFF 0x00 0x5B - UTF-16 BE (with BOM)
\item 0x00 0x5B 0x00 0x41 - UTF-16 BE (without BOM)
\end{itemize}
It is possible, therefore, to determine the encoding of the file by checking its first two bytes.
This section \emph{MUST} declare the following properties:
\addcontentsline{toc}{section}{References}
\begin{thebibliography}{1}
\bibitem{Aegisub} Rodrigo Braz Monteiro, Niels Martin Hansen, David Lamparter et al., Aegisub. Application, 2005-2007.\\
\url{http://www.aegisub.net/}
\bibitem{asa} David Lamparter, asa. Application, 2004-2007.\\
\url{http://asa.diac24.net/}
\bibitem{SSA} Kotus, Sub Station Alpha. Website, 1997-2003.\\
\url{http://web.archive.org/web/*/http://www.eswat.demon.co.uk/substation.html}
\bibitem{ASS} \#Anime-Fansubs, Advanced Sub Station Alpha.\\
\url{http://www.anime-fansubs.org}\\
\url{http://moodub.free.fr/video/ass-specs.doc}
\bibitem{VSFilter} Gabest, VSFilter. Application, 2003-2007.\\
\url{http://sourceforge.net/projects/guliverkli/}
\bibitem{ASS3} David Lamparter, Advanced Sub Station Alpha 3. Website, 2007.\\
\url{http://asa.diac24.net/ass3.pdf}
\bibitem{mkv} The Matroska project.\\
\url{http://www.matroska.org/}
\bibitem{UTF-8} The Internet Society, RFC 3629, "`UTF-8, a transformation format of ISO 10646"'. Website, 2003.\\
\url{http://tools.ietf.org/html/rfc3629}
\bibitem{UTF-16} The Internet Society, RFC 2781, "`UTF-16, an encoding of ISO 10646"'. Website, 2000.\\
\url{http://tools.ietf.org/html/rfc2781}
\bibitem{Unicode BOM} Unicode, Inc, The Unicode Standard, Chapter 13. PDF, 1991-2000.\\
\url{http://www.unicode.org/unicode/uni2book/ch13.pdf}
\end{thebibliography}
\end{document}