
Fwd: Installation and package tools document, version 1.0

A good read...

Gregory S. Sutter                 Bureaucrats cut red tape--lengthwise.
PGP DSS public key 0x40AE3052
--- Begin Message ---

Without a lot of preamble, let me just say that all that talk of
FreeBSD needing a more active specifications and management process
finally got me motivated into writing all this down.  This being
version 1.0 of this document, I also expect it to go through multiple
versions as I get feedback on it, so please consider it merely the
start of an ongoing effort to write down all these installation and
packaging thoughts which have been rattling around my head these past
6 or so years.  See the Preface for more information, and thanks in
advance for being willing to read through a 5300 word document. :-)

- Jordan

Title: FreeBSD installation and package tools, past, present and future
Date: September 8th, 2000
Author: Jordan K. Hubbard
Version: 1.0


This document discusses FreeBSD's installation, configuration and
package management tools from the perspective of where they are and
where I think they need to go.

1. Preface

2. History and current limitations
   2.1 The package tools
   2.2 Sysinstall

3. The Future
   3.1 FreeBSD's distribution format
   3.2 User Interface
   3.3 Security
   3.4 Configuration and version control
   3.5 Installation scripting

4. Appendix: Current efforts
   4.1 libh
   4.2 lizard

1. Preface

There has been a lot of discussion throughout FreeBSD's history as to
just what purpose sysinstall and the pkg_install suite were intended
to achieve, what their shortcomings are and how we might move forward
with a design document which breaks the various challenges into more
manageable pieces which might be implemented by a number of different

It's long been my desire to sit down and do exactly that, a lack of
time being my only excuse for not having done so long ago.  I'm also
of the understanding that a new "open packages" effort was recently
started by some of the people at Daemon News, a project with parallels
to some of the existing efforts to get all the various open source
projects to standardize on existing package formats like RPM, Debian
packages, etc., and a good excuse for me to finally do this.

I'm certainly all in favor of a standardization effort based around
some viable and practical second-generation technology and can only
hope that producing this document will in some way aid the design of a
next-generation package and installation system.  Should such an
effort ultimately prove itself attractive to a large segment of the
open source community then all the better, but we have to start
somewhere and that somewhere, for me at least, is FreeBSD.  The
existing package systems (RPM, Deb, *BSD) all suffer from being
first-generation efforts and, while quite mature, do not address a
number of significant issues which I'll cover in this document.  I'll
also document some of the design decisions which went into FreeBSD's
current system, hopefully explaining some of the [mis]features which
have confused newcomers to FreeBSD or caused them to wonder just why
things were not done differently.

2. History and current limitations

2.1 The package tools

The FreeBSD package tools, located in /usr/src/usr.sbin/pkg_install,
were written in August of 1993 in response to several requirements
that we had at the time.  Most significantly, it was not possible to
easily track "extra software" that one might add to the system and
conceivably wish to easily remove again, nor was it easy to see which
versions of software had been installed on a given system for easier
troubleshooting.  Finally, any specialized installation procedures for
a given piece of software essentially had to be done manually by
reading the README file (when available) accompanying the binary
distribution tarball, assuming of course that anything other than
sources which you needed to build yourself were available.

After looking at the problem for a while, I decided that the quickest
and easiest solution would be to simply add a little extra "meta-data"
to these existing binary tarballs, something which could then be
executed and recorded for future reference by a package adding
utility.  Thus were born the pkg_install utilities we have today.

At the time, system administrators were also very mistrustful of
pre-built binary distributions of software (not that many would
actually read source code before building and installing binaries from
it, but that's another story) so that's why I decided to use an
existing archive format, namely gzipped tar files.  This approach
allowed paranoid admins to easily extract a package manually and
inspect it, and it also allowed me to leverage our existing tools
relatively easily (though one feature, --fast-read, did need to be
added to tar so that individual items could be extracted more

There were and are problems with this approach, however, the most
significant being that tar files (especially gzipped ones) are NOT
very amenable to random-access.  The directory structure of a tarfile
is distributed, e.g. the file data is interleaved with the directory
meta-data and, in order to get to a given item in a tarball,
pkg_add(1) needs to read serially through the whole thing looking for
it.  This can be an especially big problem when all it has to work
with is a file handle and not an actual file, something which is the
case when a package is coming directly from an FTP server or some
other data source which offers only serial access to the bits.
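
To make the serial-access cost concrete, here is a minimal Python
sketch.  It is not how pkg_add(1) is implemented; the file names and
the helper functions are made up for illustration.  It builds a small
gzipped tarball in memory and then reads it back in forward-only
stream mode, counting how many entries have to be passed before the
wanted one turns up.

```python
import io
import tarfile


def build_package(names):
    """Build a small gzipped tarball in memory, one tiny file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def members_scanned(package_bytes, wanted):
    """Count the entries a forward-only reader passes to reach `wanted`."""
    # Mode "r|gz" treats the input as a non-seekable stream, much like a
    # file handle fed from an FTP server: no index and no seeking back.
    stream = io.BytesIO(package_bytes)
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for count, member in enumerate(tar, start=1):
            if member.name == wanted:
                return count
    return None


# Hypothetical package contents, in archive order.
pkg = build_package(
    ["bin/tool", "man/tool.1", "share/doc/README", "etc/tool.conf"]
)
print(members_scanned(pkg, "etc/tool.conf"))
```

The last entry is only found after every earlier header (and the data
behind it) has been read and decompressed, which is the behaviour
described above.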

pkg_add "solves" this problem by first finding sufficient temporary
space on one of the available file systems and then unpacking the
tarball to be extracted into a scratch directory.  After the tarball
is extracted, pkg_add then reads through the "packing list" (one of
the meta-data files) and follows its instructions to move only those
parts of the unpacked tarball into place which are needed, thus
skipping the meta-data files and any others which might be optional
and not actually requested by the user.  During this process, it is
then possible to run any custom installation scripts the package might
have provided to ask the user configuration questions, do special
permissions/conflict checks, and run through the package's list of
dependencies on other packages to see if they should be somehow
fetched and installed as well.
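
The unpack-then-move sequence can be sketched in Python as follows.
This is a simplified illustration under loud assumptions: the
meta-data file name PACKING_LIST and its one-path-per-line layout are
invented here and are not the real packing list format, and install
scripts and dependency handling are left out entirely.

```python
import io
import os
import shutil
import tarfile
import tempfile

PACKING_LIST = "PACKING_LIST"  # hypothetical name and format


def make_package(files, listed):
    """Build a gzipped tarball holding `files` plus a packing list."""
    entries = dict(files)
    entries[PACKING_LIST] = "\n".join(listed).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def add_package(package_bytes, prefix):
    """Unpack into scratch space, then move only the listed files."""
    installed = []
    scratch = tempfile.mkdtemp(prefix="pkg_scratch_")
    try:
        # Step 1: the whole tarball lands in the scratch directory.
        # (No path sanitising here; the sketch only sees its own data.)
        with tarfile.open(fileobj=io.BytesIO(package_bytes), mode="r:gz") as tar:
            for member in tar:
                if member.isfile():
                    target = os.path.join(scratch, member.name)
                    os.makedirs(os.path.dirname(target), exist_ok=True)
                    with open(target, "wb") as out:
                        out.write(tar.extractfile(member).read())
        # Step 2: the packing list says which pieces are wanted.
        with open(os.path.join(scratch, PACKING_LIST)) as fh:
            wanted = [line.strip() for line in fh if line.strip()]
        # Step 3: move those pieces into place, skipping everything else.
        for rel in wanted:
            dest = os.path.join(prefix, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(os.path.join(scratch, rel), dest)
            installed.append(rel)
    finally:
        shutil.rmtree(scratch, ignore_errors=True)
    return installed


pkg = make_package(
    {"bin/tool": b"#!/bin/sh\n", "doc/OPTIONAL": b"extra"},
    listed=["bin/tool"],
)
prefix = tempfile.mkdtemp(prefix="pkg_prefix_")
print(add_package(pkg, prefix))
```

Note that the scratch directory briefly holds a full copy of the
package, including the optional file that never gets installed.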

All in all, it's a very general purpose and open-ended mechanism which
many packages have used to good effect, but the temporary directory
requirement would also turn around to bite me firmly on the ass when
it came time to write sysinstall, which followed in April of 1995.

2.2 Sysinstall

Sysinstall, located in /usr/src/release/sysinstall, was FreeBSD's
first attempt at doing something more elegant and user-friendly than a
simple shell script-based installation which merely asked questions in
a fixed order and gave the user little opportunity to do different
types of installation and configuration.  The "first draft" of
sysinstall was actually meant to be little more than a prototype of
the installer I really wanted to write, especially from the user
interface perspective since it used something called dialog(3).  The
dialog library began its life as a monolithic utility for writing
semi-graphical shell scripts and was pressed, with great reluctance,
into the duty of functioning as an interface library for C
programmers.  At the time, this seemed the easiest course of action
given that I wasn't overly keen on writing a new set of interface
components in curses(3) and the dialog library provided some fairly
colorful canned dialogs which looked, at least for the time,
reasonably visually impressive.

In retrospect, this was also one of my biggest mistakes given that
dialog(3) is also extremely limited in the user-friendliness
department and lacks features like the ability to put more than 2
buttons into a dialog or a Yes/No dialog which has a selectable
default (e.g. No).  The inability to put a "Back" button into various
dialogs which could really use one or the necessity for asking only
"positive" questions are outgrowths of those limitations and good
examples of how an insufficiently powerful UI library can drive the
utility-writer in undesirable but unavoidable directions.

The dialog library also features checkbox/radio menus which use the
spacebar and enter keys very, erm, creatively to essentially confuse
the heck out of users who don't pay too much attention to the Usage
instructions at the beginning and simply impulsively hit Enter through
the whole installation.  Earlier versions of the library also
completely lacked the idea of call-backs, so any form of real
"dynamism" in a menu or dialog was pretty much out of the question.
The things I had to do to this library in order to provide those
features were so hideous that I'll probably go to a special
programmer's hell when I die and be forced to do AI programming in
RPG-II, or something.  The experience also soured me on the idea of
extending dialog(3) to the point where it might have actually made
sysinstall less pathological in its interface behavior.

The user interface library has also turned out to be not the least of
sysinstall's design shortcomings.  Since it was, at least in my mind,
a prototype, there wasn't a lot of attention put into the area of
flexibility.  I provided for things like "Expert" and "Novice" (now
less-insultingly named "Standard") installs, but I didn't really do
much for people who wished to build many machines in a more
assembly-line fashion or allow the user to save their answers to its
questions for later "replay" into another installation session.
Extending sysinstall also requires a knowledge of C programming (and
the willingness to hack on a prototype) in order to customize it for
other purposes, say a university environment where special course-ware
might be part of the FreeBSD installation at the beginning of each
semester.  It's nowhere near as easy as it should be and many have
been impaled on sysinstall in their efforts to customize FreeBSD for
their unique needs.

An even more significant issue with sysinstall and FreeBSD's release
methodology in general is the distribution format of FreeBSD itself
and sysinstall's handling of packages, especially interactive ones.
FreeBSD's release methodology has really not changed all that much in
the last 8 years, the basic distribution format still being largely
influenced by the size of a 3.5" floppy.  Each chunk of a FreeBSD
distribution, e.g. the "bin" or "manpages" distributions, is nothing
more than one big gzipped tarball which has been split into 240K
chunks which can conveniently fit on floppies, 5 to a 5.25" floppy or
6 to a 3.5" one.  Back in 1992, when we first started doing this,
there were a lot of people doing floppy installs and CDs were still
uncommon and/or expensive.  Sysinstall was therefore designed to take
a lot of the hair out of the process by automagically gluing these
240K chunks together as they came along, from whatever distribution
medium was available, and feeding them to a background tar process
which would simply extract them verbatim into a directory (usually,
but not always, /).
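
The gluing step can be sketched in Python.  This is an illustration
only, not sysinstall's code: the class and function names are invented,
and random bytes stand in for a real distribution so that the gzipped
result is large enough to span several pieces.

```python
import io
import os
import tarfile

CHUNK_SIZE = 240 * 1024  # 240K pieces, as described above

# Five pieces fill a 1200K (5.25") floppy and six fill a 1440K (3.5") one.
assert 5 * 240 == 1200 and 6 * 240 == 1440


def split_chunks(blob, size=CHUNK_SIZE):
    """Cut one big gzipped tarball into fixed-size pieces."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]


class ChunkStream(io.RawIOBase):
    """Present pieces, in arrival order, as one forward-only byte stream."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._current = b""

    def readable(self):
        return True

    def readinto(self, buf):
        while not self._current:
            nxt = next(self._chunks, None)
            if nxt is None:
                return 0  # no more pieces: end of stream
            self._current = nxt
        n = min(len(buf), len(self._current))
        buf[:n] = self._current[:n]
        self._current = self._current[n:]
        return n


def make_distribution(payload):
    """Build a gzipped tarball with a single (hypothetical) member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("bin/payload")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def extract_from_chunks(chunks):
    """Feed the glued pieces to a streaming tar reader."""
    stream = io.BufferedReader(ChunkStream(chunks))
    extracted = {}
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                extracted[member.name] = tar.extractfile(member).read()
    return extracted


payload = os.urandom(600 * 1024)  # incompressible stand-in data
chunks = split_chunks(make_distribution(payload))
result = extract_from_chunks(chunks)
print(len(chunks), result["bin/payload"] == payload)
```

The reader never needs to know where one piece ends and the next
begins, which is also why nothing but raw file data comes out the
other side.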

There are lots of problems with this, one being the fact that since a
"distribution" is nothing more than a gzipped tarball split into
pieces, there is none of the nifty meta-data which packages provide to
say what has been installed, what dependencies it has, or any hooks
for providing post-installation configuration opportunities.  Even
component size information is a mystery, making sysinstall unable to
predict when you've chosen more distribution data than will fit on a
given filesystem, leading to occasionally unpleasant surprises during
installation when something fills up and simply explodes in a messy and
unhelpful fashion.

A bigger problem is the fuzzy and entirely undesirable dividing line
between packages and distributions.  What should be a distribution and
what should be a package?  Where does the ``base distribution'' stop
and the ports/packages collection begin? How should one upgrade the
respective bits?  Erasing this line of demarcation has proven to be
one of the more annoying challenges in FreeBSD's release engineering
process and I'll explain how and why later in this document.

Finally, sysinstall simply represents a conglomeration of too many
tasks.  It partitions your disk(s), it loads software, it asks you
questions about your network interfaces, it sets up your ppp
connection, etc etc.  It just tries to do too much in one place and
that's a violation of the Unix Philosophy, where each component should
do one easily recognizable task and no more than that, more complex
tasks being achieved by putting such tools together.

What we currently think of as sysinstall should essentially do nothing
more than partition your disks and get a much fancier second-stage
"configurator" onto the root partition before rebooting.  At that
stage, the configurator can give the user the option of adding the
other disks and choosing what kinds of software to put on them.  The
scope of the configurator should be such that it becomes a
general-purpose setup tool which can be used to manage all the
hardware and software in the system on an ongoing basis, not simply
run once and forgotten.

3. The Future

3.1 FreeBSD's distribution format

As I mentioned in the history section, one of the more annoying
problems with FreeBSD's current distribution format is the dividing
line between distributions and packages.  There should really only be
one type of "distribution format" and, of course, it should be the
package (There Can Be Only One).  Achieving this means we're first
going to have to grapple with several problems, however:

First, eliminating the distribution format means either teaching the
package tools how to deal with a split archive format (they currently
do not) or divorcing ourselves forever from floppies as a distribution
medium.  This is an issue which would seem an easy one to decide but
invariably becomes Highly Religious(tm) every time it's brought up.
In some dark corner of the world, there always seems to be somebody
still installing FreeBSD via floppies and even some of the Fortune 500
folks can cite FreeBSD success stories where they resurrected some old
386 box (with only a floppy drive and no networking/CD/...) and turned
it into the star of the office/saved the company/etc etc.  That's not
to say we can't still bite that particular bullet, just that it's not
a decision which will go down easily with everyone and should be well

Second, there's the issue of packages currently requiring temporary
space as part of their extraction method.  If we're going to have
things like "bin" be a package, even if we split it up into
subcomponents and make "bin" simply a package which contains a list of
dependencies and nothing more (which is desirable), there are still
going to be pieces which are non-extractable under the current scheme
because the available disk space is too small to contain both the
temporary copy and the final installed copy, which may not be on the
same file system and cannot be simply moved into place.  Since we'd
also like to retain the ability to extract a package directly over a
network connection and never have the temporary bits "hit the disk",
this means that we're almost certainly going to have to go to a
different archival format.  Fortunately, there are some existing
formats to choose from which have a lot of the required features so we
won't have to reinvent the wheel and come up with our own (yuck).  My
current favorite is the Zip archive format.

Zip is a popular archival format which gives us a wide variety of
existing tools for creating, fixing and inspecting zip files.  The
directory is also stored in one place, at the very end of the archive,
so we can quickly read it in and figure out where in the data
stream/file we need to go to get a specific item.  Since the
"configurator" stage of the installation
will also be running after we've acquired a root partition and some
swap space, it's also not inconceivable that we could buffer bits read
over a network connection in memory so that even "seeking" out to the
end of an archive file read from an FTP server socket would still
allow us to move backwards in the archive for other contents.  The zip
file format also allows for per-archive and per-file "comment" fields
which can be used to store things like MD5 checksums, pgp signatures
and all sorts of other potentially useful types of meta-data.  I'm not
wedded to the zip file format, I simply find its combination of good
compression and random-access (without having to decompress the entire
archive) to be especially attractive for what we need to do.
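
A minimal Python sketch of both ideas, using the standard zipfile
module: one member is pulled out through the archive's directory
without touching the others, and its MD5 checksum travels in the
per-file comment field.  The "md5:" comment convention, the
"pkg-name:" archive comment and the file names are assumptions made up
for this example.

```python
import hashlib
import io
import zipfile


def build_zip_package(files):
    """Create a zip in memory with a checksum in each file's comment."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Per-archive comment: hypothetical package-level meta-data.
        zf.comment = b"pkg-name: example-1.0"
        for name, data in files.items():
            info = zipfile.ZipInfo(name)
            # Per-file comment: hypothetical "md5:<hex digest>" convention.
            info.comment = ("md5:" + hashlib.md5(data).hexdigest()).encode()
            zf.writestr(info, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def extract_verified(package_bytes, name):
    """Read one member via the directory and check it against its comment."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        info = zf.getinfo(name)   # directory lookup, no serial scan
        data = zf.read(info)      # decompresses this member only
        expected = info.comment.decode().split(":", 1)[1]
        if hashlib.md5(data).hexdigest() != expected:
            raise ValueError("checksum mismatch for " + name)
        return data


pkg = build_zip_package({
    "bin/tool": b"#!/bin/sh\necho hello\n",
    "etc/tool.conf": b"verbose=yes\n",
})
print(extract_verified(pkg, "etc/tool.conf"))
```

A checksum stored this way only guards against corruption; a signature
would be needed to say anything about who produced the package.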

Finally, there's the issue of user interaction.  The bulk of
sysinstall's hard-coded features do things like make user queries
which could just as easily be part of a package's install-time
configuration script.  Sysinstall, for example, allows you to specify
which daemons will run at startup time even though this is only
pertinent to the "bin" package which actually contains those daemons.
Similarly, there have been security-related questions pertaining to
the cryptography distributions which, even though the US crypto export
and RSA issues have now been largely dealt with, may still be
pertinent in other countries.  Clearly, such interaction should be
part of the package installation procedure itself and sysinstall
should be little more than a friendly wrapper for selecting which
packages to install and running their installation procedures, and
that brings us to the question of User Interface.

3.2 User Interface

As noted in the History section, one of the biggest problems with
sysinstall is its user interface which could only be charitably
described as Evil Incarnate.  The dialog(3) interface library, as I've
already described, is insufficiently powerful to give the user a
flexible and intuitive installation experience, nor does it take
any real advantage of environments like the X Window System, should
the user be running a configurator under such an environment.

The package system also suffers significantly in the UI area since the
pkg_add(1) utility has no idea as to whether it's running at the end
of a pipe, as it currently does under sysinstall, or if it's got a
real live user at the other end who's invoked it interactively from a
shell.  This leads to real problems when a package suddenly decides it
wants to talk to the user but is being run via a front-end which will
react adversely (or not at all) to the sudden appearance of the
package's own interaction dialogs.  This is not just a hypothetical
situation but one which can, and currently does, happen whenever
sysinstall's packages menu invokes a package which is interactive. The
user dialogs all go to the 2nd VTY and leave the actual user somewhat
mystified as to why the package installation has mysteriously "hung"
on them as it waits for user input which never arrives.

To effectively solve this problem, what is needed is a flexible (e.g.
containing more basic "widgets" than canned dialogs) and generic UI
library which provides front-end utilities like sysinstall and pkg_add
with the ability to play traffic cop and direct all user interaction
through a common interface. That might be something CUI based, like
TurboVision (my current CUI favorite) or GUI based, like Qt/gtk, when
running under X.  It might even be something which talks to a
Java-enabled web browser at some point in the future - we really can't
predict all the conceivable UI scenarios.  The package system would
call into this library whenever it wanted to talk to the user, thus
sharing the screen/display non-competitively with whatever utility
invoked it.  It would be up to the outermost "caller" (be it pkg_add
or sysinstall) to decide at initialization time just what kind of
back-end UI to instantiate for the generic UI.
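
As a rough sketch of that split, assuming nothing about any real
library's API: the package-side code below talks only to an abstract
interface, and the outermost caller decides which back-end to
instantiate.  Both back-ends, their names and the sample question are
invented for illustration, and replies are injected so the example
runs without a terminal.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Generic UI: callers think in choices, not in screens or pixels."""

    @abstractmethod
    def choice(self, prompt, options, default):
        """Present the options and return the one selected."""


class TextBackend(Backend):
    """Renders a choice as a numbered menu (CUI-style)."""

    def __init__(self, reply=input):
        self.reply = reply
        self.shown = []

    def choice(self, prompt, options, default):
        lines = [prompt]
        lines += ["  %d) %s" % (i, o) for i, o in enumerate(options, 1)]
        self.shown.append("\n".join(lines))
        raw = self.reply("> ").strip()
        return options[int(raw) - 1] if raw else default


class SpokenBackend(Backend):
    """Renders the same choice as a sentence for a voice synthesizer."""

    def __init__(self, reply=input):
        self.reply = reply
        self.shown = []

    def choice(self, prompt, options, default):
        self.shown.append("%s Say one of: %s." % (prompt, ", ".join(options)))
        raw = self.reply("").strip().lower()
        return raw if raw in options else default


def configure_daemons(ui):
    """Package-side code: written once, unaware of the rendering model."""
    return ui.choice("Start inetd at boot?", ["yes", "no"], default="no")


# The outermost caller picks the back-end; canned replies stand in
# for a live user.
text_ui = TextBackend(reply=lambda prompt: "1")
spoken_ui = SpokenBackend(reply=lambda prompt: "No")
print(configure_daemons(text_ui), configure_daemons(spoken_ui))
```

Because configure_daemons() never prints or reads anything itself, it
cannot fight the invoking utility for the screen.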

Such an approach would allow us to write all of our configuration
utilities and scripts in a UI-neutral fashion which allows us to take
advantage of new UI technologies as they come along without having to
go back and re-write all of those painstakingly crafted user dialogs.
That's basically where 99% of all the work of crafting such user
interfaces goes, and we certainly don't want to have to write two
different interface definitions for CUI (serial console / remote
installer) and GUI (X Desktop) based users.  There are some operating
systems (that I won't mention) which sort of get away with this today,
but FreeBSD has always been a strongly server-centric operating system
and that means we really can't have a highly desktop-centric
installer, we have to support the idea of installation on machines
without graphics cards at all or even in situations where the user is
visually handicapped and wishes to have a customized installer whose
"interface" is a voice synthesizer.  All of this is possible when the
UI library you write directly to makes no assumptions at all about
what the ultimate rendering model is going to be, it simply thinks in
terms of objects like "buttons" and "choice lists", leaving it up to
the back-end layer to ultimately render the appropriate UI objects

3.3 Security

A major failing of most package systems, ours included, is that a
package's installation and configuration scripts can essentially be
any type of executable at all.  While this does allow the package
writer a great deal of flexibility in providing for a package's needs,
and there are packages which do have highly specialized requirements,
it also has a huge potential effect on security.

Most packages are installed as root for a variety of reasons, some
legitimate and some not, and the overall effect is that security is
essentially an "opt-in" process for whoever creates or installs a
package.  A package which is installed as root is a package which can
be either intentionally or unintentionally lethal to a user's system,
even a pgp-signed and triple-authenticated package being capable of
completely destroying a user's system, and it's not hard to see how.

Consider what might happen if an otherwise perfectly respectable
package author, overly caffeinated and partially delirious at 4am,
were to write: ``rm -rf /${MYTMPDIR}'' into a package's installation
script as part of its clean-up procedure.  Let's also say that this
removal operation is inside a failure-case check in the installation
script and the author doesn't hit that case during their testing since
they happen to drive the installation successfully each time.  Let's
finally say that the actual name of the variable in question is
"MYTEMPDIR" and the author, in a state of 4am dyslexia, does not spot
this mistake.  You get the idea.

Even if the package is pgp signed and the package author is your
personal, trusted friend, you're still going to be wondering at all
the sudden extra disk activity right after bombing out of his
package's installation script and none of the conventional security
practices have saved you from his mistake.  The author is most
embarrassed, your system is most toast, and you can both chalk it up
to another annoying conjunction of human and infra-structural
stupidity.  Clearly, it would be desirable for a package which
genuinely and truly needs to be root to do so in a manner which is in
any way safer than it is now.

One method I'm in favor of is to change a package's customization
script(s) from being any arbitrary executable to being a very specific
executable, namely a set of instructions in some tightly constrained
scripting language.  My personal favorite is Secure TCL, a useful
outgrowth of the enhancements done to TCL when it got stuffed into a
web browser and suddenly needed to worry a lot more about security
issues.  Secure TCL allows us to create highly restricted TCL
environments which can be selectively "tightened" according to an
administrator's own level of paranoia, allowing them to have a highly
customizable and final say over what level of capability will be given
to any package they install.

Thus it would be possible, just to give an example, to restrict the
``file-access'' primitive to only returning a positive "It's OK to
access this" indication for file names whose paths match "/etc/.*",
"/usr/local/.*" or "/usr/X11R6/.*".  The ``file-create'', ``file-write''
and ``file-remove'' primitives could, in turn, always validate their
arguments against ``file-access'' before proceeding.  With a properly
designed set of primitives, it would be thus possible to evolve
mechanisms for "practical security", where potentially foot-shooting
primitives can either be disallowed entirely, allowed to proceed only
upon user confirmation or go completely unhindered, all according to
the administrator's wishes.  With a little time, such package security
tweaks would also begin to float around and come into the reach of less
skilled administrators, just as standardized Cisco access-lists for
fire-walling are passed around today.
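
The idea does not depend on TCL, so here is a sketch of it in Python
rather than Secure TCL.  Everything in it is hypothetical: the Sandbox
class, the on_denied setting and the method names merely mirror the
primitives named above, and "removal" is only recorded in a list so
the example is safe to run.

```python
import posixpath
import re

# The three path patterns from the text above.
ALLOWED = ("/etc/.*", "/usr/local/.*", "/usr/X11R6/.*")


class Sandbox:
    """Primitives that validate their arguments before acting."""

    def __init__(self, allowed=ALLOWED, on_denied="deny", confirm=None):
        self._patterns = [re.compile(p) for p in allowed]
        self._on_denied = on_denied  # "deny", "confirm" or "allow"
        self._confirm = confirm      # callable(path) -> bool, for "confirm"
        self.removed = []            # stands in for real file removal

    def file_access(self, path):
        # Collapse "." and ".." first so "/etc/../bin/sh" cannot slip
        # through.  Symbolic links are not resolved in this sketch.
        clean = posixpath.normpath(path)
        return any(p.fullmatch(clean) for p in self._patterns)

    def _permitted(self, path):
        if self.file_access(path):
            return True
        if self._on_denied == "allow":
            return True
        if self._on_denied == "confirm" and self._confirm is not None:
            return bool(self._confirm(path))
        return False

    def file_remove(self, path):
        if not self._permitted(path):
            raise PermissionError("file-remove refused for " + path)
        self.removed.append(posixpath.normpath(path))


box = Sandbox()
box.file_remove("/usr/local/tmp/build")

# The 4am mistake: the misspelled variable expands to nothing, so the
# script asks to remove "/" itself.
mytmpdir = ""
try:
    box.file_remove("/" + mytmpdir)
except PermissionError as err:
    print(err)
```

With on_denied set to "confirm" the same request would instead be put
to the administrator, and with "allow" it would go through unhindered,
matching the three levels described above.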

It need not be TCL that is chosen for this purpose, naturally, it's
simply my personal preference since I happen to already know and have
working experience with TCL.  A language like Python or Ruby is also
probably capable of doing the job just as well, it only being
necessary for the interpreted language of choice to have some sort of
reasonable security model and a comparatively small footprint.  I
stipulate that the footprint needs to be small because any future
system configurator and package infrastructure will need to be wrapped
together to some extent, the resulting product being something we may
wish to bootstrap off of comparatively small media.  A properly
written package management system will be an indispensable piece of
the installation process given that the pieces of the operating system
will, of course, be packages.

3.4 Configuration and version control

Ultimately, installing the "OS networking package" or the "Apache
Server" package should be part of a seamless, "one piece",
installation experience with a common and consistent UI.  The ability
to leave "configurators" for each subsystem or tool behind should also
be an integral part of the process, these later being runnable from a
single front-end tool (let's call it ``setup'') which offers a
properly organized menu/folder hierarchy for all the available tool
configurators to drop themselves into.  None of this is rocket science
and folks like Microsoft and Apple have been doing it for ages with
their operating systems.  It's a workable model and, perhaps more
importantly, it's now the most familiar model.

Another nice thing about having a package install itself through a
carefully controlled scripting language is that each mutagenic
operation (say, a file overlay) can store "undo" information for
itself if given enough available disk space.  Also imagine that all of
the undo information for a given package, throughout its lineage, goes
onto an "undo stack" for that package.  If necessary, the package can
thus be "popped" back through its previous versions to test and see
where and if a given problem (which may be noticed only months after
the last upgrade) first appeared.  Since the changes would be stored
as deltas, files which do not change would also appear only once and
no space wasted in representing multiple redundant copies of those
pieces of a package which don't change from version to version (like
the docs :-).
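
A toy model of the undo stack, in Python, may help.  The class and
method names are invented, file contents are plain strings held in
memory, and a "delta" here is simply the previous content of each file
that changed; a real implementation would store these on disk.

```python
class PackageHistory:
    """Track one package's files plus a stack of undo deltas."""

    def __init__(self):
        self.version = None
        self.files = {}   # path -> content currently installed
        self._undo = []   # stack of (previous_version, delta)

    def upgrade(self, version, new_files):
        """Install `new_files`, recording only what actually changed."""
        delta = {}
        for path in set(self.files) | set(new_files):
            old = self.files.get(path)
            new = new_files.get(path)
            if old != new:
                delta[path] = old  # None means "did not exist before"
        self._undo.append((self.version, delta))
        self.files = dict(new_files)
        self.version = version
        return delta

    def pop(self):
        """Step back one version by replaying the newest delta."""
        previous_version, delta = self._undo.pop()
        for path, old in delta.items():
            if old is None:
                self.files.pop(path, None)
            else:
                self.files[path] = old
        self.version = previous_version


# Hypothetical package whose docs do not change between versions.
pkg = PackageHistory()
pkg.upgrade("1.0", {"bin/tool": "v1 binary", "doc/README": "docs"})
changed = pkg.upgrade("2.0", {"bin/tool": "v2 binary", "doc/README": "docs"})
print(sorted(changed))  # only the binary is stored in the delta

pkg.pop()
print(pkg.version, pkg.files["bin/tool"])
```

The unchanged README never enters the second delta, which is the space
saving described above.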

Making such a mechanism part of the basic infrastructure may strike
some as an over-kill proposal, but I would also submit that the
problem of upgrading packages and of having multiple active versions
of a single package (like gtk or TCL) are significant issues which
have received rather ad-hoc attention to date.  With the creative and
automated use of symlinks and some filename hashing, I think we could
come up with a mechanism which does for package version control what
CVS does for software version control (though hopefully even less

A genuine database of some sort containing package version meta-data
is also a requirement since, on a fully tricked-out system, many
hundreds (if not thousands) of files might eventually be involved and
keeping track of their various inter-relationships is not something
you'd generally want to do with simplistic file structures (like
/var/db/pkg) which require a lot of time to search and index.
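
To show the kind of queries an indexed database makes cheap, here is a
small sketch using Python's sqlite3 module.  The schema, the helper
functions and the sample packages are all assumptions made up for this
example; they do not describe any existing package database.

```python
import sqlite3

# Hypothetical schema: packages, the files they own, and dependencies.
SCHEMA = """
CREATE TABLE package (
    name    TEXT PRIMARY KEY,
    version TEXT NOT NULL
);
CREATE TABLE file (
    path    TEXT PRIMARY KEY,
    package TEXT NOT NULL REFERENCES package(name)
);
CREATE TABLE dependency (
    package  TEXT NOT NULL REFERENCES package(name),
    requires TEXT NOT NULL
);
CREATE INDEX dependency_requires ON dependency(requires);
"""


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def register(db, name, version, files, requires=()):
    """Record one installed package in a single transaction."""
    with db:
        db.execute("INSERT INTO package VALUES (?, ?)", (name, version))
        db.executemany(
            "INSERT INTO file VALUES (?, ?)", [(p, name) for p in files]
        )
        db.executemany(
            "INSERT INTO dependency VALUES (?, ?)",
            [(name, r) for r in requires],
        )


def owner_of(db, path):
    """Which package installed this file?  An indexed lookup."""
    row = db.execute(
        "SELECT package FROM file WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None


def required_by(db, name):
    """Which packages would break if `name` were removed?"""
    rows = db.execute(
        "SELECT package FROM dependency WHERE requires = ? ORDER BY package",
        (name,),
    )
    return [r[0] for r in rows]


db = open_db()
register(db, "tcl", "8.3", ["/usr/local/bin/tclsh"])
register(db, "expect", "5.31", ["/usr/local/bin/expect"], requires=["tcl"])
print(owner_of(db, "/usr/local/bin/expect"), required_by(db, "tcl"))
```

Both questions are answered from an index rather than by walking a
directory tree of per-package files.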

3.5 Installation scripting

Another subject I touched on earlier was the need for automated and/or
highly customized installations since the needs of everyone installing
FreeBSD aren't exactly identical.  Given access to a nice generic UI
library, as described in section 3.2, and a powerful scripting
language, as described in section 3.3, we could make what people
currently regard as sysinstall a purely script-driven affair.  This
will obviously make customization a lot easier since all anyone needs
is a text-editor and a document of available primitives (which many
would probably choose to learn simply by looking at the example
installation anyway) in order to create a customized install and/or
add their own questions to an existing package configurator.  I also
doubt that most people would need to be able to do this, but for those
very few that do, such flexibility can and will make the difference
between getting FreeBSD into some highly customized environments or
simply not making the grade.
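
As a very small sketch of what "purely script-driven" could look like,
written in Python: the install script is plain data, a tiny driver
interprets it, and the answers from one session can be saved and
replayed into another.  The step format, the primitive names and the
sample values are all hypothetical.

```python
import json

# A hypothetical install script: plain data that a text editor can change.
INSTALL_SCRIPT = [
    {"ask": "hostname", "prompt": "Host name", "default": "freebsd"},
    {"ask": "docs", "prompt": "Install documentation", "default": "yes"},
    {"do": "set_hostname", "with": "hostname"},
    {"do": "install_set", "with": "docs"},
]

# Hypothetical primitives; here they only describe what they would do.
PRIMITIVES = {
    "set_hostname": lambda value: "hostname=" + value,
    "install_set": lambda value: "docs=" + value,
}


def run_script(script, answer):
    """Interpret the script; `answer(key, prompt, default)` supplies replies."""
    answers, actions = {}, []
    for step in script:
        if "ask" in step:
            answers[step["ask"]] = answer(
                step["ask"], step["prompt"], step["default"]
            )
        else:
            actions.append(PRIMITIVES[step["do"]](answers[step["with"]]))
    return answers, actions


def accept_defaults(key, prompt, default):
    """Answer source for an unattended install."""
    return default


def replay(saved_json):
    """Answer source that replays a previously saved session."""
    saved = json.loads(saved_json)
    return lambda key, prompt, default: saved.get(key, default)


first_answers, _ = run_script(INSTALL_SCRIPT, accept_defaults)
first_answers["hostname"] = "lab01"   # pretend the user typed this
saved = json.dumps(first_answers)     # keep for the next machine

answers, actions = run_script(INSTALL_SCRIPT, replay(saved))
print(actions)
```

Adding a question then means adding one line of data, not recompiling
the installer.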

4. Appendix: Current efforts

4.1 libh

The libh project is something I started over a year ago, with input
from Mike Smith and the paid services of a Russian contract programmer
named Eugene, to fulfill many of the goals expressed in this document.

Unfortunately, managing a project of this complexity with a contractor
many thousands of miles away and a personal schedule which allowed for
very little interaction with him didn't prove to be a workable
scenario and work was stopped while partially in progress.  Since that
time, work on it has been taken over by Alexander Langer and a small
group of volunteers.  A mailing list, freebsd-libh, can also be
subscribed to via majordomo@freebsd.org, and the sources checked out
via ``:pserver:anonymous@usw4.freebsd.org:/home/libh/cvs'' using

The name ``libh'' is also something of a mystery to everyone but it
nonetheless stuck as a working title.  It probably needs to be renamed
to something sexier before this project can really succeed. :-)

Roughly speaking, libh currently contains:

   A first cut at the generic UI library, as described in section 3.2,
   with back-end renderers for TurboVision and Qt currently being
   provided.  The generic UI API it provides is available for C, C++
   and TCL.

   A complete zip file-access library written for C, C++ and TCL as
   described in section 3.1.

   Much of the security infrastructure described in section 3.3 is
   also implemented, with enough currently done to make possible a
   prototype package creation/extraction system with some test
   packages available (and used as part of the regression-test suite).

   The package information database is also written, with APIs for C,
   C++ and TCL.  It provides for package conflict, upgrade and outdate

While libh does contain a lot of the code we might ultimately use, it
should nonetheless be considered only one possible starting point for
implementing what I've described in this document.  I certainly would
be happy to see the time and investment in libh ultimately go to good
use, of course, but I also wouldn't want it to stand in the way of any
larger and more successful effort which chose a different scripting
language or UI design, for example.

4.2 lizard

Lizard is the installer currently bundled, albeit in highly modified
form, with Caldera's OpenLinux distribution and made freely available
in some of its earlier incarnations from ftp.caldera.com.  It has been
suggested that a "Desktop version" of FreeBSD could be created using
this technology as a stop-gap measure until libh or some similar
project succeeded in solving the more complex set of issues I've
outlined, that perhaps buying us a bit more time to "do things right"
(in my highly prejudicial opinion :).  As far as I'm aware from my
limited reading of the code, lizard is only applicable to graphical
installations and does not make allowances for people installing via a
serial console, hence its applicability to just a desktop-oriented
product.  Still, it might be worth looking at by people whose
interests lie solely in that direction.  Customization from the highly
Linux-centric environment lizard currently assumes is, of course,
something else which would need to be grappled with as part of such an

--- End Message ---