% Originally committed to SVN as r1401.
\documentclass{spec}
\newcommand{\syntax}[1]{
	\subsubsection*{Syntax}
	\begin{tabbing}
	\hspace{2cm}\=\\[-16pt]
	#1
	\end{tabbing}
}
\newcommand{\secspec}[1]{Section:\>\texttt{#1}}
\newcommand{\secspecs}[2]{Sections:\>\texttt{#1}, \texttt{#2}}

\title{\LaTeX}
\date{}

\begin{document}
\title{AS5 Subtitle Format Draft}
\author{Rodrigo Braz Monteiro, Niels Martin Hansen, David Lamparter}
\spectitle

\section{Abstract}
This document specifies the \emph{AS5 subtitle format}, developed jointly by the
Aegisub\cite{Aegisub} and asa\cite{asa} teams in order to replace the old
\emph{Sub Station Alpha}\cite{SSA} subtitle format and its extensions:

\begin{itemize}
\item Advanced Sub Station Alpha (ASS)\cite{ASS}, implemented by VSFilter\cite{VSFilter}
\item Advanced Sub Station Alpha 2 (ASS2), also implemented by VSFilter
\item Advanced Sub Station Alpha 3 (ASS3)\cite{ASS3}, implemented by equinox
\end{itemize}

The goal is to create a flexible, powerful and easy-to-understand subtitle format
that can be used for hardsubs or multiplexed into Matroska Video\cite{mkv} files as
softsubs.

\section{File Structure}
\subsection{File Format}
All AS5 files are \emph{REQUIRED} to comply with the three requirements below:

\begin{itemize}
\item Be encoded in one of the following Unicode Transformation Formats: \emph{UTF-8}\cite{UTF-8},
\emph{UTF-16 Big Endian}\cite{UTF-16} or \emph{UTF-16 Little Endian}. UTF-8 is
preferred.
\item Contain no character below Unicode code point U+20, except for U+09, U+0A and U+0D.
That is, it must be a plain-text file.
\item End every line with a Windows line ending, that is, U+0D followed by U+0A.
\end{itemize}

The encoding of a subtitle file can be determined automatically from its Byte-Order Mark or from
the value of its first two bytes, as described in the section on \emph{[AS5]} below.

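The three requirements above lend themselves to a mechanical check. The following Python sketch is purely illustrative and not part of the specification: the name \texttt{check\_as5\_text} is hypothetical, and the input is assumed to be the file contents after decoding with one of the permitted encodings.

\begin{verbatim}
```python
import re

# Hypothetical helper, not part of the specification.
# Any code point below U+20 other than U+09, U+0A and U+0D is forbidden.
_FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
# A bare CR or a bare LF means a line does not end in CR LF.
_BAD_EOL = re.compile(r"\r(?!\n)|(?<!\r)\n")


def check_as5_text(text):
    """Return a list of problems found in already-decoded file contents."""
    problems = []
    if _FORBIDDEN.search(text):
        problems.append("forbidden control character")
    if _BAD_EOL.search(text):
        problems.append("line ending other than CR LF")
    if text and not text.endswith("\r\n"):
        problems.append("last line is not terminated")
    return problems
```
\end{verbatim}

An empty result means that none of the checked rules was violated; whether the bytes decode cleanly in the first place has to be tested separately.
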
\subsection{File Structure}
The file is divided into \emph{sections}, each uniquely identified by a string inside
square brackets on a line of its own. From that point on, every subsequent line is considered
part of that section until another section header is found. There is no end-of-section
termination mark; a section always ends at the start of the next one or at the end of the file.

Each section is divided into lines, each line representing one command or definition. Empty
lines \emph{MUST} be ignored. It is recommended that programs generating AS5 files insert
a blank line at the end of each section to increase readability. There \emph{MUST} always
be a blank line at the end of the file (as every line is required to end in a line break).

Each line in a section takes the general form \textit{Type: data1,data2,...,dataN}. A line with an
unknown \textit{Type} \emph{MUST} be ignored by a parser. It is recommended that subtitle
editing programs keep such ignored lines in the file when re-saving it.

Two sections are required: \emph{[AS5]} and \emph{[Data]}, the equivalents of
\emph{[Script Info]} and \emph{[Events]} in previous formats. If either of those sections is
missing, the file is deemed invalid and \emph{MUST} be refused by the parser. Any other section
can be omitted from the file, and need not be implemented by all parsers. However, any
section that a subtitle editing program does not recognize \emph{MUST} be preserved when it re-saves
the file. Such a section can, however, be removed at the user's discretion.

Finally, there is a special type of undefined section, \emph{[Private:PROGNAME]}, which
\emph{MUST} be \emph{ENTIRELY} preserved by other programs when re-saving the file. It is used to
store program-specific data; for example, Aegisub would create a section called
\emph{[Private:Aegisub]} to store its data. This type of section is identified
by the fact that its name starts with \emph{"`[Private:"'}.

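As a rough illustration of the rules in this subsection, the following Python sketch splits decoded text into sections. It is not part of the specification and rests on several assumptions: the names \texttt{parse\_sections} and \texttt{is\_private} are hypothetical, a leading Byte-Order Mark has already been removed, lines without a colon are skipped, and splitting the data on every comma is a simplification. Because it drops lines it cannot parse, it models a reader only, not an editing program that has to preserve unknown content.

\begin{verbatim}
```python
def parse_sections(text):
    """Split decoded AS5 text into {section name: [(type, [fields])]}."""
    sections = {}
    current = None
    for line in text.split("\r\n"):
        if not line:
            continue  # empty lines are ignored
        if line.startswith("[") and line.endswith("]"):
            # A bracketed line opens a section that runs until the next one.
            current = sections.setdefault(line[1:-1], [])
        elif current is not None and ":" in line:
            kind, _, data = line.partition(":")
            # Assumption: fields never contain commas themselves.
            current.append((kind.strip(), data.strip().split(",")))
    for required in ("AS5", "Data"):
        if required not in sections:
            raise ValueError("missing required section [%s]" % required)
    return sections


def is_private(name):
    """True for program-specific sections such as Private:Aegisub."""
    return name.startswith("Private:")
```
\end{verbatim}

The caller remains responsible for ignoring any \textit{Type} it does not understand; the sketch simply records every line it can split.
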
\subsubsection{[AS5]}
This must be the first section in every AS5 file. If the very first line of the file is not
[AS5], the file \emph{MUST} be rejected by the parser as invalid. Note, however, that the first
line is allowed to begin with a Byte-Order Mark (BOM), which is the character U+FEFF encoded in
the encoding used for the rest of the script\cite{Unicode BOM}. The first four bytes will therefore be
one of the following:

\begin{itemize}
\item 0xEF 0xBB 0xBF 0x5B - UTF-8 (with BOM)
\item 0x5B 0x41 0x53 0x35 - UTF-8 (without BOM)
\item 0xFF 0xFE 0x5B 0x00 - UTF-16 LE (with BOM)
\item 0x5B 0x00 0x41 0x00 - UTF-16 LE (without BOM)
\item 0xFE 0xFF 0x00 0x5B - UTF-16 BE (with BOM)
\item 0x00 0x5B 0x00 0x41 - UTF-16 BE (without BOM)
\end{itemize}

It is possible, therefore, to determine the encoding of the file by checking its first two bytes.

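The two-byte check can be sketched as a small lookup table. This Python example is illustrative only and not part of the specification: the name \texttt{detect\_encoding} is hypothetical, and the codec names are those of Python's standard codecs.

\begin{verbatim}
```python
def detect_encoding(head):
    """Guess the encoding from the first bytes of an AS5 file.

    Returns (codec name, BOM length in bytes), or None when the
    first two bytes match none of the six listed patterns.
    """
    table = [
        (b"\xef\xbb", "utf-8", 3),      # UTF-8 with BOM
        (b"\xff\xfe", "utf-16-le", 2),  # UTF-16 LE with BOM
        (b"\xfe\xff", "utf-16-be", 2),  # UTF-16 BE with BOM
        (b"\x5b\x00", "utf-16-le", 0),  # "[" in UTF-16 LE, no BOM
        (b"\x00\x5b", "utf-16-be", 0),  # "[" in UTF-16 BE, no BOM
        (b"\x5b\x41", "utf-8", 0),      # "[A" in UTF-8, no BOM
    ]
    for prefix, codec, bom_length in table:
        if head[:2] == prefix:
            return codec, bom_length
    return None
```
\end{verbatim}

A caller could then skip the BOM and decode the remainder, for instance with \texttt{data[bom\_length:].decode(codec)}, and reject the file when the function returns \texttt{None}.
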
This section \emph{MUST} declare the following properties:

\addcontentsline{toc}{section}{References}
\begin{thebibliography}{1}

\bibitem{Aegisub} Rodrigo Braz Monteiro, Niels Martin Hansen, David Lamparter et al., Aegisub. Application, 2005-2007.\\
\url{http://www.aegisub.net/}

\bibitem{asa} David Lamparter, asa. Application, 2004-2007.\\
\url{http://asa.diac24.net/}

\bibitem{SSA} Kotus, Sub Station Alpha. Website, 1997-2003.\\
\url{http://web.archive.org/web/*/http://www.eswat.demon.co.uk/substation.html}

\bibitem{ASS} \#Anime-Fansubs, Advanced Sub Station Alpha.\\
\url{http://www.anime-fansubs.org}\\
\url{http://moodub.free.fr/video/ass-specs.doc}

\bibitem{VSFilter} Gabest, VSFilter. Application, 2003-2007.\\
\url{http://sourceforge.net/projects/guliverkli/}

\bibitem{ASS3} David Lamparter, Advanced Sub Station Alpha 3. Website, 2007.\\
\url{http://asa.diac24.net/ass3.pdf}

\bibitem{mkv} The Matroska project.\\
\url{http://www.matroska.org/}

\bibitem{UTF-8} The Internet Society, RFC 3629, "`UTF-8, a transformation format of ISO 10646"'. Website, 2003.\\
\url{http://tools.ietf.org/html/rfc3629}

\bibitem{UTF-16} The Internet Society, RFC 2781, "`UTF-16, an encoding of ISO 10646"'. Website, 2000.\\
\url{http://tools.ietf.org/html/rfc2781}

\bibitem{Unicode BOM} Unicode, Inc., The Unicode Standard, Chapter 13. PDF, 1991-2000.\\
\url{http://www.unicode.org/unicode/uni2book/ch13.pdf}

\end{thebibliography}

\end{document}