Linux in the Library

What can it do for you?

A presentation at the 1999 CODI conference
February 10-12, 1999
Seattle, Washington



This web page originated from a presentation I gave at the 1999 Customers of Dynix Inc. (CODI) conference, but it may interest anyone curious about what Linux can do in a library, regardless of the automation system in place. Although I have attempted to automate some of the routine Dynix system administration tasks, Linux is not "tied" to Dynix in any special way. Indeed, even when our Dynix server is unavailable, we can still provide patrons with access to CD-ROM databases and the Internet.

If you don't work for a library but are curious about what Linux can do, it might still be of interest to you. This is by no means a complete list of everything that is possible, just some of the things we've used it for.

Comments, suggestions & opinions are welcome.
Please send them to: Eric Sisler (esisler@cityofwestminster.us)


Index

Although everything listed below is on this page, I've included the index in the event you want to look at a particular section or just want to know what you're getting yourself into. ;-) Enjoy!

  1. Title & overview
  2. Introduction
  3. How did I get started with Linux?
  4. Why did I recommend Linux?
  5. Library facility networks - College Hill Library
  6. Library facility networks - 76th Avenue Library
  7. Library facility networks - Kings Mill Library
  8. What is Linux?
  9. What is a Linux distribution?
  10. What distributions are available?
  11. What comes with a Linux distribution?
  12. Will Linux work with other operating systems?
  13. Choosing Linux - direct costs
  14. Choosing Linux - indirect costs
  15. Choosing Linux - stability & performance
  16. Choosing Linux - support
  17. Choosing Linux - updates & source code
  18. What do we use Linux for?
  19. Domain Name System (DNS)
  20. Samba (network file & print services)
    1. domain logons & scripts
    2. CD-ROM databases served
    3. server shares
    4. public PC administration & security
    5. network printers
  21. Apache (Internet web server)
  22. Squid (Internet object cache)
  23. DHCP (Dynamic Host Configuration Protocol)
  24. Automating tasks with bash, expect & cron
  25. Future projects at the Library using Linux
  26. Sources & further reading
  27. Thank you's
  28. Dedication

Introduction - who am I, anyway?

I am Eric Sisler, Library Computer Technician for the City of Westminster. I have worked for the library for 13+ years in various jobs: page, circulation clerk, processor & cataloger in technical services. I am currently part of Automation Services, providing computer and network support to 3 facilities.


How did I get started with Linux?

In 1996, we were in the process of moving our Dynix system from a shared HP-UX box to its own HP-UX box. I was just beginning to learn more about Unix in general and wanted a learning tool I could experiment with without worrying about destroying it; Linux fit the bill. A distribution could be had for around $50, and it would run on my home PC. We were also planning the automation needs for a new library facility, so I began thinking about what kind of services we wanted to provide to library patrons and how we might go about doing so. I knew we would be providing some CD-ROM databases, so I began experimenting with Linux as a CD-ROM server and discovered it was easily capable of much more. Like many who like to tinker with computers, I have been accused of trying to build the Internet in my home out of spare parts!


Why did I recommend Linux?

Obviously, I felt it was the best solution to our needs, which initially included serving some CD-ROM databases, a DNS server and some basic file & print services. Seemingly on its own, this list has grown to include many other services: more file/print services, domain logons & scripts, MARC record storage & retrieval, public PC administration & security, staff & public web pages, Internet object caching, DHCP, shell scripting and others I've probably missed.

I also felt limited by other Network Operating Systems (NOS's) for a variety of reasons:

If I had to do it all over again, would I make the same decision? Absolutely! I can't imagine providing all the services currently available as efficiently & reliably any other way.


Library facility networks - College Hill Library

College Hill Library is a joint project between the City of Westminster and Front Range Community College. The 76,000 square foot facility was opened in April of 1998 and is run by both agencies, a story in itself that is beyond the scope of this document. Dynix & Internet access are via a dedicated T-1 circuit and there are 2 Linux servers at College Hill: Gromit & Preston.

Gromit
Gromit is the primary server for the facility, providing all the services described later in this document. The original Gromit hardware consisted of the following:
Compaq Deskpro 2000; 200MHz Pentium II with 64MB RAM, 8.5GB hard drive space (IDE), 1 internal CD-ROM drive, 4-drive external CD-ROM tower, 4mm DDS-2 tape drive & UPS.
Gromit was moved to new hardware in December of 1998, partly to add drive space, partly to gain a little more speed, but mostly because Linux had more than proven itself and we wanted to reduce the chance of hardware failure by moving it to "server" class hardware:
HP NetServer LH Pro; dual 200MHz Pentium Pro with 256MB RAM, 22.5GB hard drive space (SCSI), 1 internal CD-ROM drive, 4mm DDS-2 tape drive & UPS.
Why the big difference in hardware? Partly because the LH Pro is a discontinued model and we were able to get a good price on it. Initially Gromit was built from "desktop" class hardware because it was available, I had done much of the setup at home and I wanted to prove to myself (and others) that Linux was capable of what I wanted to do with it, even on "regular" hardware. If Linux failed to meet my (or others') expectations, I could re-use the hardware for a different OS. Happily, Linux has met and greatly exceeded my expectations. Two big reasons the new hardware seems much faster are the additional RAM and the SCSI hard drives. As with any server OS, RAM and fast disk drives make a big difference.
Preston
Preston is a "recycled" Gateway 486/66 with 16MB of RAM, 600MB disk space (IDE), 1 internal CD-ROM drive & a UPS. It provides some very limited access between the library & college networks. Eventually it will become a full-blown firewall between the two networks and we will be providing access to some of our CD-ROM databases through it.
College Hill Library PC configuration:
  • 46 staff PC's
  • 34 public PC's
  • 1 instructor's workstation (Library instruction classroom)
  • 22 student workstations (Library instruction classroom)
  • 8 network printers


Library facility networks - 76th Avenue Library

76th Avenue Library is the former main library for the City of Westminster. Originally built in 1961 and remodeled several times, it is 6,000 square feet in size and has a fractional T-1 circuit (256Kbps) for Dynix & Internet access. It has 1 Linux server, named Rust, that provides many of the same services found at College Hill. One reason for giving 76th Avenue its own server was the small data circuit size: 256Kbps is fine for Dynix & Internet access, but proved a little slow for serving CD-ROM databases from the College Hill server. The other reason was to provide some independence between the facilities - a downed data circuit or server at one location does not affect the other.

Rust
Rust is a Compaq DeskPro 486/66 (with Pentium 90 overdrive chip), 16MB RAM, 1.5GB hard drive space (IDE), 2 internal CD-ROM drives and UPS. It is scheduled to be upgraded early this year, mostly to add hard drive space and memory.
76th Avenue Library PC configuration:


Library facility networks - Kings Mill Library

Kings Mill is a small (4,500 square feet) branch library, built in 1978. It has a 64Kbps frame relay circuit for Dynix & Internet access and no Linux server at this time. Once the server at 76th Avenue is upgraded, the old hardware will likely become the server for Kings Mill.

Kings Mill Library PC configuration:
  • 4 staff PC's
  • 3 public PC's

By the way - if you're wondering about the naming scheme for our Linux servers, with the exception of Rust they're all named after characters from Nick Park's excellent claymation series "Wallace and Gromit". You can find out more about the series here.


What is Linux?

The Linux Kernel
When you talk about Linux, you're really talking about the kernel itself - the core software that allows other software to "talk" to the hardware. Linux was created by Linus Torvalds, then a student at the University of Helsinki, Finland. It is an open-source Unix clone that aims for POSIX compliance, developed and maintained by many Unix programmers & wizards across the Internet.
GNU General Public License (GPL) software & utilities
Linux would not exist without the many compilers, utilities, services & programs developed by the GNU project of the Free Software Foundation (FSF) and distributed under the GPL.
Berkeley Unix (BSD)
Linux also takes advantage of many Internet daemons & utilities ported from BSD, one of the original flavors of Unix.
Commercial software from vendors
Many vendors are beginning to port their software to Linux or write new software for Linux. Some distributions may include commercial software and/or demos.
A (perhaps bad) analogy using Windows might be this:
Windows = Linux kernel
Windows drivers, utilities & software = GNU & BSD


What is a Linux distribution?

When most people talk about Linux, what they're really talking about is a Linux distribution, which typically comes with the following:

A stable version of the Linux kernel
Although experimental versions of the kernel are usually included with a Linux distribution, production servers are normally installed with a stable kernel.
A collection of frequently used services & software
As a full-featured Unix, Linux comes with just about any Internet or network service you can think of. Most distributions install many of these by default using some kind of package management software.
Installation guide, documentation & media
In addition to the actual installation instructions, the installation guide usually contains general information about Linux, the particular distribution you're installing, a basic user's guide and some essential system administration information. The installation media is usually a boot floppy and 1 or more CD-ROM's.
Commercial software
Some Linux distributions come with commercial software as an added feature, or demo versions of the full package.


What distributions are available?

There are a number of Linux distributions available. This is by no means a complete list, but some of the "better-known" ones include:

Red Hat
This is our distribution of choice for the library.

Slackware
The distribution I started with. Although still a good distribution, it can be a little tough to get started with.

Caldera

Debian GNU/Linux

Linux Pro

S.u.S.E.

TurboLinux


What comes with a Linux distribution?

As a full-fledged Unix clone, Linux comes with everything you'd expect, and then some:


Will Linux work with other operating systems?

Because Linux "speaks" many other network protocols, it works well with other operating systems, including:


Choosing Linux - direct costs

Hardware
Linux will run on nearly any of Intel's family of x86 processors, from the 386 to the Pentium II and beyond. It also runs on a variety of other architectures.
Operating System Software
Most Linux distributions can be had for around $50, or for free if you download them from the Internet. Prices for other operating systems seemed to vary wildly depending on what (if any) discounts we qualified for and what services we wanted to provide. Rather than give incomplete or misleading information, I encourage you to contact your software vendor directly.
Commercial backup software
Although the release of RedHat we started with (5.0) came with the "personal" edition of EST's excellent BRU2000, we opted to purchase the commercial version for around $300; it comes with an optional renewable support contract. The commercial version includes free upgrades and supports network backups, allowing several servers to share 1 tape drive.
Licensing
With the exception of the commercial backup software, all other services currently running on our Linux servers have unlimited user licenses. Again, other OS's licensing costs varied by the number and type of services provided.
Approximate cost per client PC = $2.89
This is the total cost per PC, not including any software installed on each PC, which you'd need regardless of the server OS. Since we have more staff & users than PC's, the total cost per user account is even less. I calculated the cost using the following: $350 (RedHat CD + backup software) / 121 PC's (58 staff + 63 public) = $2.89. As the number of PC's/users increases, the cost per PC/user decreases.
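The arithmetic is easy to check. The short Python sketch below reuses the figures from the text (about $50 for the distribution, about $300 for the backup software, 58 staff and 63 public PC's); the function name is made up for illustration, and the second call simply shows the per-PC cost falling as PC's are added.

```python
def cost_per_pc(total_cost: float, pcs: int) -> float:
    """Spread a one-time server software cost evenly over the client PCs."""
    if pcs <= 0:
        raise ValueError("need at least one client PC")
    return round(total_cost / pcs, 2)


SERVER_SOFTWARE = 50.00 + 300.00  # distribution CD (approx.) + commercial backup software (approx.)
STAFF_PCS = 58
PUBLIC_PCS = 63

print(cost_per_pc(SERVER_SOFTWARE, STAFF_PCS + PUBLIC_PCS))        # 2.89
print(cost_per_pc(SERVER_SOFTWARE, 2 * (STAFF_PCS + PUBLIC_PCS)))  # 1.45
```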


Choosing Linux - indirect costs

Services
Most of the services we wanted to provide were available "out-of-the-box", and those that didn't come with the distribution were available from the Internet. Other network OS's either didn't provide all the services we wanted or provided them only at additional cost.
Time
Yes, it is Unix and it does have a steep learning curve, but I felt it was well worth the effort. If you already know one flavor of *ix, learning another isn't that difficult and since I was already trying to learn Linux to teach myself more about Unix in general, this gave me a practical reason for doing so.
System administration
With servers at 3 different locations, remote administration is a must, and Linux fits the bill nicely. All administration can be done remotely using telnet or dial-in access and the command line interface (CLI), although there are some graphical (GUI) administration tools as well. Most people prefer one or the other for system administration; you can read my musings on the subject here if you'd like.
Starting & stopping services
Nearly all services can be stopped & started as necessary without rebooting the server. I have updated software, applied patches, changed configuration files, updated/recompiled the kernel and even rebooted the server remotely, all with just shell (telnet) access. In my experience a well configured server running on decent hardware generally requires little administration.


Choosing Linux - stability & performance

Stability
Linux has proven to be a very stable server OS. The old Gromit ran continuously from February to December 1998 with only 3 interruptions: an extended power outage, a physical move of the server and the installation of some additional hardware. It currently holds the "record" for server uptime at the Library - 125 days without a reboot!
Rebooting
There are only a few circumstances when Linux must be rebooted: after upgrading/compiling a new kernel or during hardware replacement/installation. While frequent rebooting may be a necessary evil on the client end, I feel server reboots should only be necessary due to hardware failure/installation or operating system (kernel) upgrades.
The kernel & other processes
Very few things can crash the Linux kernel, faulty hardware being the #1 culprit. I have had services crash, generally due to misconfiguration on my part (oops!), but fixing the configuration and restarting the service is all that's been necessary. Nearly all processes & services can be stopped and restarted without rebooting - even networking if done correctly.
Performance
In addition to being very stable, Linux also performs well on most hardware. It uses the CPU and RAM efficiently, has one of the fastest TCP/IP implementations available and frequently outperforms other NOS's on the same or lesser hardware.


Choosing Linux - support

One thing I frequently run across concerning Linux is the "lack" of support. Generally what this means is "the lack of someone I can call and pay lots of money to who may or may not be able to solve my problem." Although some companies are beginning to offer commercial support, to quote an anonymous Linux user - "There's a bordering-on-clinically-interesting level of support from the Linux community at large." The lack of commercial support has never been an issue for us. My best sources for support and information include:

For a list of some Linux resources, click here.


Choosing Linux - updates & source code

Software updates
Software updates, especially security related ones, are released in a timely manner (sometimes days or hours) via the Internet.
Software packaging with RPM
RPM is a package management utility created by RedHat that has since been adopted by other Linux distributions. It makes software installation, upgrades and even removal quite easy. Distributions that do not use RPM generally have their own package management utility.
Source code
Source code is available for all open-source, GPL'ed software included with a distribution. This can be useful if you discover a bug, want to make changes or just want to practice your programming skills.


What do we use Linux for?

Ok, enough about all that. Let's look at what we use Linux for at the Library.


DNS

The Domain Name System or DNS is the Internet "phonebook" of hostnames & IP addresses. Any time you connect to a computer on the Internet using its hostname, DNS translates that hostname into the corresponding IP address.

Local DNS
We provide hostnames for all our PC's and other networked equipment to make configuration & troubleshooting easier. If you have or are considering getting your own domain, Linux can be an inexpensive way to administer it.
Remote DNS resolution
We have multiple DNS servers for speed & redundancy.
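To make the "phonebook" idea concrete, here is a minimal Python sketch of a local zone held in a dictionary, with forward (hostname to address) and reverse (address to hostname) lookups. It is not a DNS server: a real server such as BIND keeps the same information in zone files and answers queries over the network. The library.example domain and all of the addresses are hypothetical.

```python
import ipaddress

# Hypothetical local zone; names and private addresses are for illustration only.
LOCAL_ZONE = {
    "gromit.library.example": "192.168.10.1",
    "preston.library.example": "192.168.10.2",
    "pac01.library.example": "192.168.10.101",
}


def normalize(hostname: str) -> str:
    """DNS names are case-insensitive; a trailing dot marks a fully qualified name."""
    return hostname.lower().rstrip(".")


def forward_lookup(hostname: str) -> str:
    """Hostname -> IP address (the job of an A record)."""
    return LOCAL_ZONE[normalize(hostname)]


# Reverse map keyed by the in-addr.arpa name (the job of PTR records).
REVERSE_ZONE = {
    ipaddress.ip_address(ip).reverse_pointer: host for host, ip in LOCAL_ZONE.items()
}


def reverse_lookup(ip: str) -> str:
    """IP address -> hostname, looked up via its in-addr.arpa name."""
    return REVERSE_ZONE[ipaddress.ip_address(ip).reverse_pointer]


print(forward_lookup("PAC01.library.example."))               # 192.168.10.101
print(ipaddress.ip_address("192.168.10.1").reverse_pointer)  # 1.10.168.192.in-addr.arpa
print(reverse_lookup("192.168.10.2"))                         # preston.library.example
```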


Samba

Samba provides file & print services, much like Windows NT or NetWare. It supports domain logons, logon scripting and a "browse" list of available shares. There are many access control options, both system-wide and share specific.

Server requirements
The server requirements for Samba are simple: TCP/IP networking and the Samba software itself, which together support most clients.
Client requirements
All of our client machines are currently Windows 95, and the requirements for them are simple as well - TCP/IP networking (usually the MS TCP/IP stack) and the MS "Client for Microsoft networks" client, both of which are included with Windows 95. The NetBEUI protocol is NOT needed or helpful.
Samba - domain logons & scripts
Samba supports domain logons by username or machine name. Logon scripts are written as DOS-style batch files, with the initial script often calling many other scripts. There are currently around 150 user accounts, and rather than changing each user's script as services are added, the initial script calls other scripts as needed. If a particular service needs to be changed, only that script has to be edited, and all users will be updated the next time they reboot. Logon scripts typically perform the following functions:
Samba - CD-ROM databases currently served
We provide access to a number of CD-ROM databases to patrons & staff, including:
Although we have a number of multimedia PC's available in the children's area, multimedia CD-ROM's and DVD's are not served from the network as they tend to be bandwidth hogs.
Samba - server shares
A "share" is merely a directory on the server that is accessible to client PC's, via a mapped drive letter or UNC (Universal Naming Convention) path. The CD-ROM databases are shares as well, but read-only ones.
Domain logon scripts
Although the logon scripts can be edited from the Linux shell prompt, we found it was just as easy to provide access to all the scripts & configuration files using Samba. Editing a script is as simple as starting a text editor like notepad or wordpad and opening the file you want to change. Changes to other configuration files are made in much the same way.
Configuration files
Not all public PC's have access to the full range of services provided. Since the configuration files are stored and distributed from the server, making changes to what's available on a specific PC is usually as easy as editing its logon script and having a staff member reboot it.
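The "initial script calls other scripts" arrangement can be sketched in a few lines of Python that generate a per-machine DOS batch file. Everything here is hypothetical: the machine names, the service-to-script table, the Z: drive letter and the \\GROMIT\NETLOGON share path. In Samba itself, a per-machine script is typically selected with the `logon script` parameter and the `%m` (machine name) substitution. Editing a shared script such as NETSCAPE.BAT changes that service for every PC that calls it, while editing one machine's list changes only what that PC gets.

```python
# Hypothetical service -> shared batch file table; each file does one job.
SHARED_SCRIPTS = {
    "winselect": "WINSEL.BAT",   # copy the WinSelect configuration
    "pac": "PAC.BAT",            # copy the Dynix PAC for Windows configuration
    "netscape": "NETSCAPE.BAT",  # reset Netscape preferences, cookies and cache
    "homedir": "HOMEDIR.BAT",    # map the staff member's home directory
}

# Hypothetical machines and the services each one receives.
MACHINE_SERVICES = {
    "PAC01": ["winselect", "pac", "netscape"],  # public PC with Internet access
    "PAC02": ["winselect", "pac"],              # catalog-only public PC
    "STAFF01": ["pac", "netscape", "homedir"],  # staff PC
}


def master_script(machine: str, drive: str = "Z:") -> str:
    """Return a DOS batch file that maps the logon share and calls one script per service."""
    lines = ["@ECHO OFF", f"NET USE {drive} \\\\GROMIT\\NETLOGON"]
    for service in MACHINE_SERVICES[machine]:
        lines.append(f"CALL {drive}\\{SHARED_SCRIPTS[service]}")
    # DOS batch files use CR/LF line endings.
    return "\r\n".join(lines) + "\r\n"


print(master_script("PAC02"))
```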
Software & ghost image storage space
Also housed on the server are frequently installed software packages and ghost images of all staff & public PC's. Ghost is a drive imaging utility that takes a "snapshot" of a hard drive or partition and stores it all in one file. This allows us to easily restore a public machine that has been trashed and install periodic updates to all staff & public PC's. Rather than having to manually install software updates to each machine, the hard drive is reformatted and the new ghost image installed. This allows the installation to be clean and guarantees that each machine will at least start out with the same configuration.
Staff home directories
Staff members have their own home directory on the server where they can store their private documents. The files are backed up every night and are available from almost any PC they happen to be using. This eliminates the need to back up individual staff PC's, which we lack the staff and time to do; their software is refreshed periodically from ghost images instead.
Group directories for projects
In addition to private directories, there are group directories where staff can store documents that need to be accessible by others.
Meeting room schedule pages
College Hill's meeting room schedule is available to all via a web browser, but to update the schedule we use a mapped share and an HTML editor.
MARC record storage/retrieval
Dynix provides a way for our catalogers to load MARC (MAchine Readable Cataloging) records into the system; the only catch is that it retrieves the records using FTP. Rather than making each cataloger's PC an FTP server, the records are saved via Samba to the Linux server and Dynix retrieves them by FTP from that one central location.

Samba - public PC administration & security

WinSelect security software
WinSelect provides a way to lock down the Windows 95 desktop and most applications, as well as control access to printers, the start button and other normally accessible programs & settings on the Windows desktop. The software is installed locally on each PC but the configuration file is updated from the server during the network logon process.
Dynix PAC for Windows
Public Access Catalog (PAC) for Windows is a front-end menu system for Windows 95/NT. It provides a graphical user interface for public users and is used to search the Dynix catalog and run other applications. We use it on all our public PC's, giving patrons access to a variety of services - the online catalog, CD-ROM databases, Internet reference products and more. As with WinSelect, the configuration file is updated from the server during the network logon process. Although we do not use it as a front-end menu on the staff PC's, it is available on those PC's and updated in the same way.
Netscape
We use Netscape as our browser of choice for those PC's that are allowed to access the Internet. In addition to using WinSelect to lock down many of Netscape's menu items, the main configuration file (prefs.js) is updated, the cookies file (cookies.txt) is cleared and all items in the cache directory are removed, again during the logon process.
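On the PC's these three steps (replace prefs.js, clear cookies.txt, empty the cache) are carried out by the Windows logon scripts. Purely to illustrate the sequence, here is a Python sketch of the same reset. The profile layout (a cache subdirectory next to prefs.js and cookies.txt) and the location of the master copy are assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def reset_netscape_profile(profile: Path, master_prefs: Path) -> None:
    """Return a public PC's Netscape profile to a known state.

    Assumed layout: profile/prefs.js, profile/cookies.txt, profile/cache/.
    """
    # 1. Replace the local preferences with the locked-down master copy.
    shutil.copyfile(master_prefs, profile / "prefs.js")
    # 2. Clear the previous patron's cookies by truncating the file.
    (profile / "cookies.txt").write_text("")
    # 3. Remove everything inside the cache directory, keeping the directory.
    cache = profile / "cache"
    cache.mkdir(exist_ok=True)
    for item in cache.iterdir():
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)
        else:
            item.unlink()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        profile = Path(tmp, "profile")
        (profile / "cache").mkdir(parents=True)
        (profile / "cache" / "logo.gif").write_bytes(b"GIF89a")
        (profile / "cookies.txt").write_text("patron session cookie\n")
        master = Path(tmp, "master-prefs.js")
        master.write_text('user_pref("browser.startup.homepage", "http://pac.library.example/");\n')
        reset_netscape_profile(profile, master)
        print(sorted(p.name for p in profile.iterdir()))  # ['cache', 'cookies.txt', 'prefs.js']
```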
Samba - network printers
Samba also provides access to 9 network printers. Access to these printers can be configured by user login, group or individual PC.


Apache

Apache is the most widely used web server software on the Internet. It runs on a wide variety of operating systems and was recently "adopted" by IBM for use in their high-end web servers. Because Apache is open-source and the developers weren't interested in money from IBM, an agreement was reached essentially stating that IBM was free to use Apache as their web server, but any changes or improvements to the code must be made available to the Apache group so they can include it in future releases, thereby benefitting all those who use it.

Apache - public pages

Public PC homepage
All public PC's with Internet access have their homepage set to the Library's PAC homepage. This homepage contains a short version of the Library's Internet acceptable use policy, a link to the full policy, some recommended search engines and various other links.
Meeting room schedule & policy pages
College Hill Library has several meeting rooms and the schedule for each week is made available on the web.
Redirect of disallowed sites to error pages
We block access to some web-based e-mail and chat sites. Blocked sites are redirected to an error page explaining why access was denied.

Apache - staff pages

Dynix reports
All Dynix circulation statistics including daily, monthly, yearly and special reports are made available via the web. This allows easy distribution to library staff and permits electronic archival instead of using paper.
Special reports
Special reports like claims returned, exception items and data extracts from Dynix are also made available this way whenever possible.


Squid

Squid is an Internet object cache/proxy. It works in a similar fashion to the local cache directory used by Netscape or Internet Explorer. Anytime a web page is accessed, all "static" objects (text, graphics, background images, etc.) on the page are cached in case an object is re-used on another page or the same page is accessed again by anyone using the cache server. Instead of having to get the object from the Internet, it is retrieved from the cache server at local network speed, which is typically much faster than the connection to the Internet. This helps reduce network traffic and increases the retrieval speed for frequently accessed sites.
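The hit-or-miss behavior described above can be modeled with a small least-recently-used cache in Python. This is a toy with deliberate simplifications, not how Squid works internally: it ignores expiry times, HTTP cache headers and disk storage, and `fetch` is a stand-in for a request over the Internet connection.

```python
from collections import OrderedDict
from typing import Callable


class ObjectCache:
    """Toy object cache: answer repeat requests locally, evict least recently used."""

    def __init__(self, fetch: Callable[[str], bytes], max_objects: int = 1000) -> None:
        self.fetch = fetch              # slow path: retrieve the object from the Internet
        self.max_objects = max_objects
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)     # mark as most recently used
            return self.store[url]
        self.misses += 1
        body = self.fetch(url)
        self.store[url] = body
        if len(self.store) > self.max_objects:
            self.store.popitem(last=False)  # drop the least recently used object
        return body


if __name__ == "__main__":
    def fake_fetch(url: str) -> bytes:
        return f"contents of {url}".encode()

    cache = ObjectCache(fake_fetch, max_objects=100)
    for url in ["http://www.example.org/logo.gif",
                "http://www.example.org/logo.gif",   # second request is served locally
                "http://www.example.org/bg.gif"]:
        cache.get(url)
    print(cache.hits, cache.misses)  # 1 2
```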

Site restriction
Squid can also be used to restrict access in many different ways: remote site, local PC, browser type, time of day, user ID, etc. We have PC's that are catalog only (no Internet access) but since they have Netscape installed and some catalog records include URL's, we block access to all websites from those machines. We also block access to many web-based e-mail sites and chat rooms. Given the limited resources available at the library and the fact that the College has student PC labs for those kinds of activities, we felt e-mail & chat rooms were not a resource we should provide access to. In all cases, attempts to access sites that have been restricted are redirected to an error page. This error page explains why accessing that particular site is not allowed.
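The decision made for each request can be sketched as a small Python function: catalog-only PC's are refused outright, other PC's are refused only for blocked domains, and every refusal is redirected to an error page that says why. In squid.conf the equivalent rules are written as `acl` and `http_access` lines; the addresses, domains and error-page URL below are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical values for illustration only.
CATALOG_ONLY_PCS = {"192.168.10.150", "192.168.10.151"}
BLOCKED_DOMAINS = {"chat.example.com", "webmail.example.net"}
ERROR_PAGE = "http://gromit.library.example/blocked.html"


def in_blocked_domain(host: str) -> bool:
    """True if host is a blocked domain or any subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def check_request(client_ip: str, url: str) -> Optional[str]:
    """Return None to allow the request, or an error-page URL to redirect to."""
    if client_ip in CATALOG_ONLY_PCS:
        return ERROR_PAGE + "?reason=catalog-only"
    host = urlsplit(url).hostname or ""
    if in_blocked_domain(host):
        return ERROR_PAGE + "?reason=email-or-chat"
    return None


print(check_request("192.168.10.20", "http://www.example.org/"))   # None
print(check_request("192.168.10.150", "http://www.example.org/"))  # http://gromit.library.example/blocked.html?reason=catalog-only
```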
Cache exceptions
Some of the commercial online reference services we provide access to require a list of IP addresses that are allowed to access the site and this can pose a problem when using squid since most requests appear to come from the cache server and not the actual client machine. Fortunately, Netscape has an easy fix for this problem, allowing you to specify sites where the cache server should not be used.
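Why the IP-address check fails, and how the browser exception fixes it, can be shown with a short Python sketch: a site reached through the cache server sees the cache server's address, while a site on the browser's list of proxy exceptions is contacted directly and sees the PC's own address. The addresses (taken from a reserved documentation range) and domains are hypothetical, and the suffix-matching rule is a simplification; Netscape's own matching of its exception entries may differ in detail.

```python
# Hypothetical addresses (documentation range) and domains.
CACHE_SERVER_IP = "192.0.2.1"
NO_PROXY_DOMAINS = ["reference.example.com", "library.example"]


def use_proxy(host: str) -> bool:
    """False if host is, or is inside, a domain on the no-proxy list."""
    host = host.lower().rstrip(".")
    for domain in NO_PROXY_DOMAINS:
        domain = domain.lower().lstrip(".")
        if host == domain or host.endswith("." + domain):
            return False
    return True


def address_seen_by_site(client_ip: str, host: str) -> str:
    """The source address a remote site sees, and would check against its list."""
    return CACHE_SERVER_IP if use_proxy(host) else client_ip


print(address_seen_by_site("192.0.2.101", "www.example.org"))               # 192.0.2.1
print(address_seen_by_site("192.0.2.101", "search.reference.example.com"))  # 192.0.2.101
```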


DHCP

Dynamic Host Configuration Protocol or DHCP is a way to configure a PC's TCP/IP settings during startup, including IP address, hostname, domain name, default gateway, DNS servers, WINS servers and more. Many Internet Service Providers (ISP's) hand out temporary IP addresses in a similar way each time you connect. Note: Although hostnames can be changed with DHCP, NetBIOS (computer) names cannot. There are other ways to change the NetBIOS name remotely, and I recommend keeping the hostname & NetBIOS name the same. It just makes life a little easier. ;-)

For security reasons and to aid troubleshooting, we statically assign IP addresses to all PC's. Since every PC has the same configuration after a ghost image is restored, DHCP is used to reconfigure most of the network settings on reboot. If the domain name changes or a router fails, these settings can be changed on the server and propagated to the PC's by rebooting them.
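The DHCP server in use is not named here; assuming the ISC DHCP server (dhcpd), the per-PC part of its configuration could be generated from a simple host table, as in the Python sketch below. Each `host` block ties a network card's hardware address to a fixed IP address and hostname, while shared settings such as the domain name, default gateway and DNS servers would live once in a `subnet` block (`option domain-name`, `option routers`, `option domain-name-servers`), so a change there reaches every PC on its next reboot. The hostnames, MAC addresses and IP addresses are hypothetical.

```python
import re

# Hypothetical host table; each hostname is kept equal to the PC's NetBIOS name.
HOSTS = [
    ("pac01", "00:60:08:AA:BB:01", "192.168.10.101"),
    ("pac02", "00:60:08:AA:BB:02", "192.168.10.102"),
]

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def host_declaration(name: str, mac: str, ip: str) -> str:
    """Return an ISC dhcpd 'host' block pinning one PC to a fixed address."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"bad hardware address for {name}: {mac}")
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f'  option host-name "{name}";\n'
        f"}}\n"
    )


def dhcpd_hosts(hosts) -> str:
    """Concatenate host blocks, separated by blank lines."""
    return "\n".join(host_declaration(*entry) for entry in hosts)


print(dhcpd_hosts(HOSTS))
```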


Automating tasks

A recent and ongoing project has involved trying to automate, or at least "semi-automate" some of the more routine tasks for both the Linux servers and the Dynix server. After all, what good is having a computer or two if it can't do some of the more mundane tasks for you? Some of the tools I've used so far include:

Cron
Cron is a job scheduling daemon. It provides a way to run tasks at defined intervals - by minute, hour, date, month and day of week. A task can be anything from a simple command to a complicated script or other program.
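Cron's matching rule is easy to state in code. The Python sketch below checks whether a five-field schedule (minute, hour, day of month, month, day of week) fires at a given minute; cron itself performs this check every minute and runs the command when it matches. It is a simplification: month and weekday names and other implementation-specific extensions are not handled, and the backup path in the comment is hypothetical. It does follow the classic rule that when both day fields are restricted, a match on either one is enough.

```python
from datetime import datetime


def field_matches(spec: str, value: int, low: int, high: int) -> bool:
    """Match one cron field ('*', '5', '1-5', '*/15', '0-30/10', or a comma list)."""
    for part in spec.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            if step != 1:
                raise ValueError(f"'{part}/{step}' is not supported in this sketch")
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """True if a five-field schedule such as '30 2 * * 1-5' fires at minute `when`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7        # cron counts 0 = Sunday; Python 0 = Monday
    dom_ok = field_matches(dom, when.day, 1, 31)
    dow_ok = field_matches(dow, cron_dow, 0, 7) or (
        cron_dow == 0 and field_matches(dow, 7, 0, 7)  # 7 also means Sunday
    )
    if dom.startswith("*") or dow.startswith("*"):
        day_ok = dom_ok and dow_ok
    else:
        day_ok = dom_ok or dow_ok              # both restricted: either one is enough
    return (
        field_matches(minute, when.minute, 0, 59)
        and field_matches(hour, when.hour, 0, 23)
        and field_matches(month, when.month, 1, 12)
        and day_ok
    )


# Hypothetical crontab line: "30 2 * * 1-5  /usr/local/bin/nightly-backup"
print(cron_matches("30 2 * * 1-5", datetime(1999, 2, 10, 2, 30)))  # True (a Wednesday)
print(cron_matches("30 2 * * 1-5", datetime(1999, 2, 13, 2, 30)))  # False (a Saturday)
```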
A command interpreter or shell
The command interpreter or shell is what you normally see when you telnet in to a Linux server. There are different shells available, including bash, the Korn shell, the C shell and others. While shells are used for executing commands interactively, they also have their own scripting language, which can be used to write shell scripts. When you log in, the system assigns your user ID, group ID, home directory and default shell, and the shell then runs a series of startup scripts that set up the rest of your user environment.
Expect
Expect is another scripting language, used primarily to automate interactive programs like telnet, ftp and others. If you've ever used the scripting language built into many telnet clients to log you on to another computer automatically, that's essentially what expect does, only it's not limited to a specific piece of software.
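Expect's core loop (wait until the program's output matches a pattern, then send a reply) can be imitated in a few lines of Python. The sketch below works on a pre-recorded list of output chunks rather than a live telnet session, so it only illustrates the idea; the prompts, user name and password are hypothetical. Real expect also applies a timeout to each wait, which the sketch approximates by raising an error when the output runs out. For driving real programs from Python, the third-party pexpect module follows the same model.

```python
import re
from typing import Iterable, List, Tuple


def run_dialog(output_chunks: Iterable[str], dialog: List[Tuple[str, str]]) -> List[str]:
    """Wait for each regex in turn and record the reply that would be sent."""
    sent: List[str] = []
    buffer = ""
    steps = iter(dialog)
    step = next(steps, None)
    for chunk in output_chunks:            # text arriving from the program
        buffer += chunk
        while step is not None:
            pattern, reply = step
            match = re.search(pattern, buffer)
            if match is None:
                break                      # keep reading until the prompt appears
            sent.append(reply)             # a real tool would write this to the program
            buffer = buffer[match.end():]  # forget output that has been handled
            step = next(steps, None)
    if step is not None:
        raise TimeoutError(f"output ended before {step[0]!r} appeared")
    return sent


# Hypothetical telnet session split into arbitrary chunks, as it might arrive.
session = ["Welcome\r\nlog", "in: ", "Password: ", "\r\n$ "]
dialog = [(r"login: ", "library\r"), (r"Password: ", "secret\r"), (r"\$ ", "exit\r")]
print(run_dialog(session, dialog))  # ['library\r', 'secret\r', 'exit\r']
```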

Write a shell script that calls a couple of commands and maybe an expect script or two, schedule the script using cron and you've got an easy way to automatically complete routine tasks.

Automated (or semi-automated) tasks
Nightly backups
Nightly backups were one of the first tasks to be automated: a simple cron entry coupled with full & incremental backup scripts. Just remember to change the tape!
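The full-versus-incremental decision behind such scripts can be sketched in Python. The actual jobs use the commercial backup software driven by cron; this sketch only shows the selection logic, assuming a weekly full backup (Friday is an arbitrary choice) and, on other nights, an incremental backup of files modified since the last backup. The function names and schedule are hypothetical.

```python
import os
import tempfile
import time
from datetime import date
from pathlib import Path
from typing import List, Optional

FULL_BACKUP_WEEKDAY = 4  # hypothetical: Friday (Python counts Monday as 0)


def backup_type(day: date) -> str:
    """Weekly full backup, incremental backups on the other nights."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def files_to_back_up(root: Path, kind: str, last_backup: Optional[float]) -> List[Path]:
    """Full: every file under root. Incremental: files modified after last_backup."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    if kind == "full" or last_backup is None:
        return files                 # nothing earlier to compare against
    return [p for p in files if p.stat().st_mtime > last_backup]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "old.doc").write_text("unchanged since last night")
        (root / "new.doc").write_text("edited today")
        last_night = time.time() - 3600
        # Pretend old.doc was last modified before last night's backup.
        os.utime(root / "old.doc", (last_night - 60, last_night - 60))
        print(backup_type(date(1999, 2, 12)))  # full (a Friday)
        print([p.name for p in files_to_back_up(root, "incremental", last_night)])  # ['new.doc']
```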
Rotation of "this-week.html" meeting room schedule page
The symbolic link to this week's meeting room schedule page is updated once a week.
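A weekly job like this can be just a few lines of code. The Python sketch below assumes a hypothetical naming scheme (one page per week, named after that week's Monday) and repoints this-week.html by creating the new link under a temporary name and renaming it over the old one, so the web server never sees the link missing. The file names are assumptions, and the atomic rename relies on a POSIX filesystem such as the Linux server's.

```python
import os
import tempfile
from datetime import date, timedelta
from pathlib import Path


def week_page_name(day: date) -> str:
    """Hypothetical scheme: schedule-YYYY-MM-DD.html, dated by the week's Monday."""
    monday = day - timedelta(days=day.weekday())
    return f"schedule-{monday.isoformat()}.html"


def rotate_this_week(web_dir: Path, day: date) -> Path:
    """Point web_dir/this-week.html at the page for the week containing `day`."""
    link = web_dir / "this-week.html"
    tmp_link = web_dir / ".this-week.html.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()                       # leftover from an interrupted run
    os.symlink(week_page_name(day), tmp_link)   # relative target inside web_dir
    os.replace(tmp_link, link)                  # atomic rename over the old link
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        web = Path(tmp)
        rotate_this_week(web, date(1999, 2, 10))
        rotate_this_week(web, date(1999, 2, 17))
        print(os.readlink(web / "this-week.html"))  # schedule-1999-02-15.html
```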
Semi-automatic generation of daily & monthly circulation statistics
My latest creation, it uses several bash & expect scripts to generate statistical reports and make them available via the web. Although the "master" script must be started manually, all the rest of the work is done by the scripts. Just start the script and 10 minutes later daily statistics for the previous day (or any day) are available. Magic!


Future projects

What does the future hold for Linux at the Library? Some of the projects I have in mind for "down the road" include:


Sources & further reading


Thank you's

Patricia, my wife

Veronica Smith, my supervisor

Kathy Sullivan, Library Services Manager

All those people who make Linux possible.


Dedication

This presentation is dedicated to the memory of Judith A. Houk, my friend and mentor for many years.


This page last modified September 07, 1999 by: Eric Sisler (esisler@cityofwestminster.us)