CME FAQ Sheet #4: .ae and .sgm
Author/Editor is not always the best tool for the job. You
may, if you like, work with the raw .sgm file to accomplish
particular ends. This is purely optional. And if you do, be
careful:
- It is quite possible to do things to the .sgm file that
will make it difficult or impossible to re-import it
into A/E. It is quite possible to make global changes
that inadvertently wipe out whole swaths of text.
(I've done both of these things, more than once.)
- So make sure that you save, make backups under other
names, whatever it takes to ensure that if you do something
fatal, you can always recover from it by using an unaffected
file.
- If you do move back and forth between the .ae and .sgm files,
make sure that you don't create a "version" problem, where
you make changes in one file that don't get incorporated
in the other. The easiest way to avoid that is to treat the
.ae file as the authoritative one: save the .ae, export an
.sgm file to do some particular task, do what you need to,
then re-import it into A/E again immediately.
- Bear in mind always that the .ae files are versions of
the .sgm file that are "imported" into A/E's native
binary format. So one "opens" and "saves" .ae files;
but one "imports" and "exports" .sgm files.
- Bear in mind that A/E is notorious for introducing
carriage returns into the .sgm files that it
exports, often in the middle of tags (between
the element name and the attribute name):
<DIV2
TYPE="chapter">
This means that some searches and some replacements
that you try may fail unless you restore some
predictability to the location of the carriage
returns.
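One way to restore that predictability is to rejoin any start tag
that has been split across lines before you do your searching.
Here is a minimal sketch in Python (not one of the tools listed
in this sheet; the function name join_broken_tags is made up for
illustration). It assumes that no attribute value contains a
literal ">", and it leaves end tags, comments and declarations
alone. Run it on a copy, not on your only .sgm file.

```python
import re

# A start tag: "<", a letter, then anything up to the next ">".
# Assumption: no attribute value contains a literal ">" character.
START_TAG = re.compile(r"<[A-Za-z][^<>]*>")

def join_broken_tags(sgml):
    """Replace line breaks inside start tags with a single space."""
    def fix(match):
        # Collapse each line break, plus any whitespace around it, to one space.
        return re.sub(r"\s*[\r\n]+\s*", " ", match.group(0))
    return START_TAG.sub(fix, sgml)

sample = '<DIV2\nTYPE="chapter">\n<HEAD>Chapter\nOne</HEAD>'
print(join_broken_tags(sample))
```

Line breaks in the text itself (between "Chapter" and "One" in the
sample) are left as they were; only the break inside the <DIV2>
tag is removed.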
Some reasons to work with the .sgm:
- Better searching, especially of attributes.
- Extraction. You can search for patterns and create a
list of matches.
- Better find-and-replace.
- Better validation, using NSGMLS.
- Ability to run search-and-replace on a selected
portion of the text instead of the whole thing
(e.g. change all the <DIV2>s in this section to
<DIV3>s).
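To make the extraction and "selected portion" ideas concrete,
here is a rough Python sketch. The function names list_matches
and retag_lines are hypothetical, invented for this example. It
assumes that tags are written in one consistent case, that each
tag you search for sits on a single line, and that you know the
line numbers of the section you want to change.

```python
import re

def list_matches(sgml, pattern):
    """Return (line_number, matched_text) pairs for every match of pattern."""
    results = []
    for number, line in enumerate(sgml.splitlines(), start=1):
        for match in re.finditer(pattern, line):
            results.append((number, match.group(0)))
    return results

def retag_lines(sgml, first, last, old, new):
    """Rename element old to new on lines first..last only (1-based, inclusive)."""
    # Match "<old" or "</old", but not a longer name such as "<old0".
    tag = re.compile(r"(</?)" + re.escape(old) + r"(?![\w.-])")
    lines = sgml.split("\n")
    for i in range(first - 1, min(last, len(lines))):
        lines[i] = tag.sub(lambda m: m.group(1) + new, lines[i])
    return "\n".join(lines)

sample = "\n".join([
    '<DIV1 TYPE="part">',
    '<DIV2 TYPE="chapter">',
    '<P>One</P>',
    '</DIV2>',
    '<DIV2 TYPE="chapter">',
    '<P>Two</P>',
    '</DIV2>',
    '</DIV1>',
])

# Extraction: list every <DIV2> start tag with its line number.
print(list_matches(sample, r"<DIV2[^>]*>"))

# Scoped replace: turn only the second <DIV2> (lines 5-7) into a <DIV3>.
print(retag_lines(sample, 5, 7, "DIV2", "DIV3"))
```

The first <DIV2> (lines 2-4) is untouched by the replace, which
is the point: the change is confined to the range you name.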
Some available tools:
- TextPad : a good basic text editor with good support
of regular expressions (pattern-matching) in both
find and search-and-replace modes. Able to extract
lists of matches using the "find-in-files" feature
(ctrl-F5).
- Other basic editors, such as Windows Notepad. Most
of these are weak in features, but they will allow
you to (for example) cut out pieces of the
DocType declaration that A/E won't allow you to touch.
Make sure that whatever editor it is, it saves as
plain (ASCII) text.
- Emacs : a venerable and powerful text editor with
a very difficult interface, good regexp support, and
available integration with an interactive parser
(PSGML) and validator (NSGMLS).
- Perl. A scripting language with very powerful abilities
to manipulate text.
- NSGMLS. A command-line validator that will often give
more useful error messages than A/E does.
Emacs, Perl and NSGMLS are not for the fainthearted, and are
not installed on many (or any?) of the machines. If we get
to a point where we need them, I'll try to install them.
In the meantime, I will try to install at least TextPad
on all the HTI machines.