LOCKSS option
Obviously, this is the first option to consider. JoseManuelCastillo downloaded a copy, tried to install it, and wrote to the LOCKSS team about the problems he ran into. His emails follow.
----
From: Jose Manuel Castillo <JoseManuel.Castillo@uab.cat>
Subject: Conversation with LOCKSS support
To: Ferran Jorba <Ferran.Jorba@uab.es>
Date: Fri, 23 Mar 2007 14:04:11 +0100
User-Agent: Mozilla Thunderbird 1.5.0.10 (X11/20070306)
Appreciated Lockss Team,
We are trying Lockss out and playing with it to see if it suits our needs.
We are currently deploying it on a couple of machines.
Could you, please, add our first test machine to your control list?
IP: 158.109.174.36 (marimba.uab.es)
Thank you very much!
Regards,
--
José Manuel Castillo
Servei d'Informàtica
Universitat Autònoma de Barcelona
Hello José,
Welcome to LOCKSS! I've added your IP address, but it looks like your
machine has not come up properly. Unfortunately I can't connect to it
to determine why. Is it behind a firewall? If so, several ports
(listed below) will need to be opened to allow it to function. Once
that has been done I should be able to diagnose the problem.
Also, the machine normally sends us mail when you configure it. We
didn't receive that mail, which usually means that the mail hub isn't
relaying mail from it to us. Could you check that the IP address of
the mail hub you gave it is correct, and that that mail hub is
configured to allow it to relay mail from your LOCKSS machine to any
email address?
Tom
========================================================================
If your LOCKSS box is behind an institutional or departmental
firewall, please ask your network admins to allow the following
connections to and from the LOCKSS box:
Inbound connections:
tcp to port 22 from 171.66.236.0/26 - monitoring from Stanford
tcp to port 8081 from 171.66.236.0/26 - monitoring from Stanford
tcp to port 9729 from anywhere - V3 LCAP (polling)
(The following inbound ports will be required until 1Q07)
udp to port 5554 from anywhere - inbound multicast LCAP (optional)
udp to port 5555 from anywhere - inbound LCAP
tcp to port 8080 from anywhere - repairs
Outbound connections:
tcp to anywhere, port 80 - publishers, package fetch
tcp to anywhere, port 123 - NTP time synchronization
udp to anywhere, port 123 - NTP (also allow incoming responses:
"keep-state" or equivalent)
tcp to 171.66.236.0/26, port 8001 - parameter reload
tcp to anywhere, port 9729 - V3 LCAP (polling)
tcp to 18.7.14.139, port 11371 - refresh public signing keys
(The following outbound ports will be required until 1Q07)
udp to anywhere, port 5554 - outbound multicast LCAP (optional)
udp to anywhere, port 5555 - outbound LCAP
tcp to anywhere, port 8080 - repairs
In addition, outbound UDP to port 53 (DNS) and TCP to port 25 (SMTP)
are required, but only to the machines configured as nameserver and
mail hub, respectively. Normally these servers are behind the same
firewall as the LOCKSS box, in which case no additional rules would
be needed.
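As an illustration (not part of Tom's message), the inbound TCP rules above can be spot-checked with a short script. This is a minimal sketch: the names `INBOUND_TCP_PORTS`, `tcp_port_open` and `check_box` are made up for this example, and UDP ports are left out because a plain connect cannot tell whether a UDP port is open. Ports 22 and 8081 are only meant to be reachable from Stanford, so a check run from elsewhere is expected to report them as closed.

```python
import socket

# Inbound TCP ports copied from the firewall list above.
# Ports 22 and 8081 should only accept connections from 171.66.236.0/26.
INBOUND_TCP_PORTS = {
    22: "monitoring from Stanford",
    8081: "monitoring from Stanford",
    9729: "V3 LCAP (polling)",
    8080: "repairs",
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_box(host, ports=INBOUND_TCP_PORTS, timeout=3.0):
    """Map each port number to True/False depending on whether it accepted a connection."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling `check_box("marimba.uab.es")` from a machine outside the firewall would then give a quick view of which of these ports are actually reachable.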
Tom Lipkis wrote:
> Hello José,
> Welcome to LOCKSS! I've added your IP address, but it looks like your
> machine has not come up properly. Unfortunately I can't connect to it
> to determine why. Is it behind a firewall? If so, several ports
> (listed below) will need to be opened to allow it to function. Once
> that has been done I should be able to diagnose the problem.
> Also, the machine normally sends us mail when you configure it. We
> didn't receive that mail, which usually means that the mail hub isn't
> relaying mail from it to us. Could you check that the IP address of
> the mail hub you gave it is correct, and that that mail hub is
> configured to allow it to relay mail from your LOCKSS machine to any
> email address?
>
Hello Tom,
Thank you for your quick and kind response.
I'm going to check it out and see what the problem with communications is.
At the risk of abusing your kindness, may I ask you a question?
The reason we're trying LOCKSS is because we need to preserve some
digital fixed-content files (around 4TB and growing). Specifically,
they are basically digitized images in the form of TIFF files (not
e-journals, like seems to be the "normal" use of LOCKSS). Right now
we're storing them in SATA disk in our array, but we need to find a
more feasible, complete and scalable solution.
Do you think we could use LOCKSS in this project?
Thank you very much!!
Best Regards,
--
José Manuel Castillo
Servei d'Informàtica
Universitat Autònoma de Barcelona
José,
> > The reason we're trying LOCKSS is because we need to preserve some
> > digital fixed-content files (around 4TB and growing). Specifically,
> > they are basically digitized images in the form of TIFF files (not
> > e-journals, like seems to be the "normal" use of LOCKSS). Right now
> > we're storing them in SATA disk in our array, but we need to find a
> > more feasible, complete and scalable solution.
> >
> > Do you think we could use LOCKSS in this project?
>
Yes, LOCKSS is suitable for this. We started with e-journals, but
LOCKSS has now been used for several other applications (see
http://www.lockss.org/lockss/Related_Projects). The current version
can handle pretty much any set of data that's relatively static and
has a stable naming structure (i.e., permanent URLs).
Material that's not of general interest to the library community, or
is too large to share space on machines maintained by the community
can be preserved in a private LOCKSS network. There are several
groups either doing this now or planning to start soon. Examples are
the MetaArchive Project (http://www.metaarchive.org) and the Alabama
Digital Preservation Network (http://adpn.org/wiki/Main_Page).
In most situations we recommend that a private LOCKSS network consist
of a minimum of six machines, preferably in different physical
locations. The machines can be inexpensive, see
http://www.lockss.org/lockss/Installation_Instructions#Minimum_Requirements
4TB is pushing the capacity of a single PC, so if you plan to ingest
that much content immediately you may need more machines. We've found
that most projects ramp up more slowly, and it usually makes more
sense to start smaller and buy more disks/machines as needed, to take
advantage of the constantly improving speed & size to price ratio.
LOCKSS needs some knowledge of your web site(s) and data (see
http://www.lockss.org/lockss/Plugins) and some other site-specific
configuration. For groups that are members of the LOCKSS Alliance
(http://www.lockss.org/lockss/LOCKSS_Alliance), we're happy to provide
direct support for all this (e.g., we'll write and test the plugin if
you wish).
We're working on a much more detailed description of private LOCKSS
networks and what's involved in setting them up. For now please feel
free to ask about any more details or clarifications you need.
Tom
Tom Lipkis wrote:
> We're working on a much more detailed description of private LOCKSS
> networks and what's involved in setting them up. For now please feel
> free to ask about any more details or clarifications you need.
>
Hello, Tom.
Thanks again for your kind and helpful reply.
Could you give us, please, some guidance on how to set up a proper
test installation of a private LOCKSS network?
We would like to make a complete evaluation of the solution. Due to the
fact that it's the only one that we have found that can fill our needs
and is open source, we are very interested in studying it as well as
we can. We would like not to be locked in to a proprietary
solution like EMC, SUN, HP, Caringo, Avamar, etc. if we could possibly
avoid it.
By the way, what's the difference between a "normal" and "private"
LOCKSS network? How can you make it private?
Thank you again for your help.
Best Regards,
--
José Manuel Castillo
Servei d'Informàtica
Universitat Autònoma de Barcelona
José,
> > Could you give us, please, some guidance on how to set up a proper
> > test installation of a private LOCKSS network?
>
My apologies, I won't be able to get this to you until early next week.
> > By the way, what's the difference between a "normal" and "private"
> > LOCKSS network? How can you make it private?
>
There's a public network of about 200 LOCKSS machines at various
libraries around the world, preserving several hundred titles so far
(mostly serials). Each librarian decides what to preserve on his or
her machine(s). Decisions about which new titles or publishing
platforms to bring into LOCKSS are made by the community (because this
effort is largely funded by the community). These titles are of
general interest, so the requirement for a minimum of six replicas of
each title is easily met. Anyone can bring up a LOCKSS box and join
this public LOCKSS network, though they will only be able to preserve
content to which they have rights.
In contrast, several groups, including some state agencies, have data
they wish to preserve which is either not of general interest to the
library community, or cannot be distributed outside the group because
it is sensitive. I'm assuming your data is of that nature. In this
case, the (minimum six) replicas are all operated by the group, or
perhaps by a consortium established by the group. Membership in this
private network is controlled by the group; the LOCKSS machines talk
only to other machines in the same private network. For additional
security, this can be configured so that all communication between the
machines is encrypted, so that no eavesdropping by outsiders is
possible.
Having all the machines operated by a single entity eliminates one of
the redundancies of LOCKSS: that is, not all replicas are under
control of a single organization. But that's the nature of such data,
and it's still possible and desirable to distribute the replicas
geographically.
The answer to the question above is, very briefly, that you would
purchase (or otherwise redeploy) six machines (or fewer for initial
feasibility testing) and configure them as LOCKSS boxes (probably by
using our standard platform CD, though there are other possibilities).
You'll also need to put your data where LOCKSS can collect it (e.g.,
on a web server), write one or more plugins that tell LOCKSS where to
find the data and several other characteristics of the server and the
data, and possibly operate a configuration server for the LOCKSS
boxes.
Tom
José,
I apologize again for the delay. Here is a summary of the steps
involved in constructing a Private LOCKSS Network.
Tom
========================================================================
Private LOCKSS Networks
1. Overview
LOCKSS is currently being used to preserve content in two distinct
types of environments: a public LOCKSS network holds material of
general interest to a wide community, and several private LOCKSS
networks hold material for smaller communities. These correspond to
two different models for providing enough replicas of the data to
ensure a high probability of survival.
The public network is designed for material that is generally
available on the internet, including subscription-only material.
Anyone may participate in this network, by running one or more LOCKSS
boxes. Each box may collect and preserve any content to which its
host institution has access rights. Sufficient replication is ensured
because the materials preserved in the public network are those that
the community has agreed they wish to preserve.
Nodes in the public network are owned by their host institution. The
network is maintained by the LOCKSS group with funding provided by the
LOCKSS Alliance. As of Spring 2007, the public network comprises
about 200 libraries worldwide and holds a collection of scientific and
other journals, blogs, and ETDs. LOCKSS Alliance members have more
titles available for them to preserve than libraries who have not
joined the LOCKSS Alliance.
In contrast, material that is of interest to a small community, or
that is sensitive, such as that held by government agencies, may be
preserved in a Private LOCKSS Network (PLN). Participation in these
private networks is controlled by the particular community.
Individual entities or communities with too few members for sufficient
replication may either run multiple LOCKSS boxes at each member/site,
or combine with other such entities to reciprocally preserve each
other's data.
2. Collection Evaluation
LOCKSS collects and preserves content in discrete collections called
Archival Units (AUs), which are generally between several megabytes
and several gigabytes in size. For example, for e-journals, an AU may
be one volume (year) of one title. Any convenient grouping may be
used, as long as the system can determine (via a plugin, below)
whether or not any particular item belongs in any particular AU,
usually by examining the item's name (URL).
The size of the initial collection (number and size of AUs) and the
expected rate of growth should be determined. This will influence
hardware decisions.
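
As an illustration (not part of Tom's summary), the idea of deciding AU membership from an item's URL can be sketched as follows. This is not the LOCKSS plugin format. The URL layout `/images/<collection>/<year>/<file>.tif`, the host `example.org`, and the names `AU_PATH`, `au_for_url` and `belongs_to` are all assumptions made up for this example, with one AU per collection and year.

```python
import re
from urllib.parse import urlparse

# Hypothetical layout: /images/<collection>/<year>/<file>.tif (or .tiff)
AU_PATH = re.compile(
    r"^/images/(?P<collection>[^/]+)/(?P<year>\d{4})/[^/]+\.tiff?$"
)


def au_for_url(url):
    """Return the (collection, year) AU key for a URL, or None if it fits no AU."""
    match = AU_PATH.match(urlparse(url).path)
    if match is None:
        return None
    return (match.group("collection"), int(match.group("year")))


def belongs_to(url, au):
    """True if the URL falls inside the given (collection, year) AU."""
    return au_for_url(url) == au
```

With this layout, `http://example.org/images/maps/2006/sheet01.tif` would fall in the AU `("maps", 2006)`, while a page such as `http://example.org/about.html` would belong to no AU.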
3. Hardware
Each AU should be replicated on a minimum of six LOCKSS boxes. We
recommend at least seven, if possible. If a single copy of the
collection is larger than the capacity of a single machine, additional
multiples of machines will be needed. (All machines need not preserve
the same set of AUs, as long as each AU has sufficient replicas
somewhere in the network.)
Minimum machine specs are: 1GHz CPU, 512MB memory, and sufficient disk
space. More memory is required for a large number of AUs (more than
~5000).
All the machines should be connected to a network where they will have
access to the material they are supposed to collect, and to each
other. Ideally they should be in different locations, so that a
single physical event is not likely to impact all of them.
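
As an illustration (not part of Tom's summary), the replication rule above gives a rough lower bound on the number of machines: the boxes needed to hold one copy, multiplied by the number of replicas. This is a minimal sketch; the function name `boxes_needed` and the per-box capacity are assumptions, since the message gives no capacity figure beyond saying that 4TB is pushing the limit of a single PC.

```python
import math


def boxes_needed(collection_tb, box_capacity_tb, replicas=6):
    """Estimate how many LOCKSS boxes a collection needs.

    collection_tb:   size of one copy of the collection, in TB
    box_capacity_tb: usable disk per box, in TB (an assumption, not a LOCKSS figure)
    replicas:        copies of each AU; the message asks for at least six
    """
    if collection_tb <= 0 or box_capacity_tb <= 0 or replicas <= 0:
        raise ValueError("sizes and replica count must be positive")
    # Boxes needed to hold a single copy, rounded up to whole machines.
    boxes_per_copy = math.ceil(collection_tb / box_capacity_tb)
    return boxes_per_copy * replicas
```

For the 4TB collection discussed in this thread, an assumed 2TB of usable disk per box would mean `boxes_needed(4, 2)`, that is 2 boxes per copy times 6 replicas, or 12 machines.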
4. Software
The LOCKSS team supplies a bootable CD that turns a standard PC into a
highly secure, easily configured, low maintenance preservation
appliance. The CD is a specially configured version of OpenBSD; it
runs on most standard PC hardware. We recommend this as the best
solution in most situations.
If the LOCKSS CD cannot be used, the LOCKSS preservation software (the
LOCKSS daemon) can be installed and run on Linux or other Unix systems
that support Java. Installation scripts are provided for some Linux
variants, but this method requires more work and knowledge on the part
of the machine's administrator.
5. Plugins
Several aspects of the LOCKSS collection and preservation process must
be customized for each application. This is done by building a
Publisher Plugin, which captures the knowledge necessary to define any
number of related AUs (such as those that share a common structure or
publishing platform). More information about plugins can be found at
http://www.lockss.org/lockss/Plugins .
6. Network Management
Configuring and running a PLN requires a few network services to be
set up, and some databases built. Specifically, each LOCKSS box
loads, from a network server, a set of runtime configuration
parameters, plugins, and a title database describing individual AUs.
These may be on the same or different servers, packaged in various
combinations, depending on the application.
In addition, some monitoring of the network and constituent machines
is recommended to ensure the continued health of the data.
7. Providing Access
The final step in building a PLN is to provide access to the preserved
data. The appropriate mechanism will depend on the application; the
two primary mechanisms are to configure your users or network to treat
the LOCKSS boxes as HTTP proxy servers (see
http://www.lockss.org/lockss/Proxy_Integration), or to export the data
to an external server.
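
As an illustration (not part of Tom's summary), the first mechanism amounts to sending ordinary HTTP requests through a LOCKSS box acting as a proxy. This is a minimal client-side sketch using Python's standard library; the host name `lockss.example.org` and port 8080 are placeholders, not values taken from the LOCKSS documentation, and `build_proxy_opener` is a name made up for this example.

```python
import urllib.request


def build_proxy_opener(proxy_host, proxy_port):
    """Return a urllib opener that routes HTTP requests through the given proxy."""
    proxy_url = "http://{}:{}".format(proxy_host, proxy_port)
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address for a LOCKSS box; replace with the real host and port.
opener = build_proxy_opener("lockss.example.org", 8080)
# A request such as opener.open("http://example.org/images/maps/2006/sheet01.tif")
# would then go to the proxy rather than directly to the origin server.
```

In practice the proxy setting would more likely live in the users' browsers or in an institutional proxy configuration than in a script; the sketch only shows what "treating the LOCKSS box as a proxy" means at the request level.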
8. Support for LOCKSS Alliance Members
If you join the LOCKSS Alliance
(http://www.lockss.org/lockss/LOCKSS_Alliance), you will receive
Stanford LOCKSS team support. All members of a PLN must be LOCKSS
Alliance members; group discount rates are available. Support
includes consulting on hardware and software for LOCKSS boxes, site
design and plugin design, implementation and testing, server hosting
of configuration parameters, plugins, and title database, etc., and
assistance with proxy integration or content export.
If you choose not to join the LOCKSS Alliance you are welcome to use
the open source software without support. We will answer questions
and provide advice as our resources allow; real support for the
product is provided with Alliance membership.
Tom Lipkis wrote:
> José,
> I apologize again for the delay. Here is a summary of the steps
> involved in constructing a Private LOCKSS Network.
> Tom
>
Hello, Tom.
Thank you very much again for all your help.
We will try to continue our tests with the information you just provided us.
Best Regards,
--
José Manuel Castillo
Servei d'Informàtica
Universitat Autònoma de Barcelona