Where to store all these files

See also: GestionarFitxersViaWeb and HistoricsIRepliquesAmbGit

We store the TIFF files on a SataBeast.

I recently asked a question on a mailing list about libraries and free software, and I am attaching my question and the replies here so that everyone has access to them.

In any case, there is no easy answer.

FerranJorba

Well done! This is a matter that urgently needs to get under way. NuriaGallart

JoseManuelCastillo and I have been investigating different hardware and software options. On the following pages we will collect what we find out:


 Subject: [oss4lib-discuss] Storing and keeping safe those huge digitalisation files
 Date: Thu, 25 Jan 2007 16:33:30 +0100
 From: Ferran Jorba <Ferran.Jorba@uab.es>
 Organization: Universitat Autònoma de Barcelona
 To: OSS4LIB (E-mail) <oss4lib-discuss@lists.sourceforge.net>

 Hello,

 I'm going to ask for help with something that I'd say is a common
 situation nowadays almost everywhere.

 My university is engaged in digitalisation of old material, like most
 do.  So do some of my neighbour universities.  Our libraries belong to
 a local consortium, like most libraries do.  This digitalized material
 means, among other things, lots of fat TIFF files, totaling a huge
 amount of Gigabytes.  I think most of you know that.

 There are plenty of disk array vendors willing to sell you their
 solutions.  I especially like Capricorn Tech
 (http://www.capricorn-tech.com/), due to their Archive.org pedigree,
 and Copan Systems (http://www.copansys.com/) for their MAID concept.

 But one of our most urgent problems is keeping those original TIFF
 (and their corresponding PDFs) safe beyond just storing them
 somewhere: I mean having more than one copy, doing backups, verifying
 checksums, automatically fixing the damaged files, maybe changing
 formats, etc.  This second part is already invented, and it is called
 LOCKSS (http://www.lockss.org).  It is software with everything I
 could ask for, and more.

 What I have been unable to find in the LOCKSS site is a configuration
 model where some libraries, in a local consortium, join together to
 keep jointly this material.  Ok, I understand that LOCKSS is designed
 to keep the material of [external] publishers.  But when I first
 learned about CLOCKSS (Controlled LOCKSS) I immediately thought that
 it would address the scenario we are facing in our consortium.
 However, I cannot find it in their web pages.

 May I ask how you are addressing this scenario?  If there is a better
 forum for this question, I'd gladly ask it there again.

 Thanks,

 Ferran
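
The "verifying checksums" step from the question above can be sketched in a few lines. This is only a minimal illustration, assuming SHA-256 and a directory tree of TIFF/PDF files; the function names (`sha256_of`, `build_manifest`, `find_damaged`) are made up for this page and have nothing to do with how LOCKSS works internally.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large TIFFs never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root, manifest):
    """Return relative paths that are missing or whose digest has changed."""
    root = Path(root)
    damaged = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel)
    return damaged
```

The idea would be to run `build_manifest` once after scanning, keep the result somewhere other than the disk array itself, and run `find_damaged` periodically. It only detects damage; repairing a file still needs a second copy.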

 Subject:     Re: [oss4lib-discuss] Storing and keeping safe those huge digitalisation files
 Date:     Thu, 25 Jan 2007 10:21:04 -0600
 From:     Beth Nicol <nicollb@auburn.edu>
 To:     Ferran Jorba <Ferran.Jorba@uab.es>
 References:     <45B8CDCA.2020406@uab.es>

 Ferran:

 I do believe that CLOCKSS or a private LOCKSS network is what you are
 looking for. If you contact the LOCKSS folks, I think they can help
 you out.

 I work with 2 projects that are doing what you seem to want to do: the
 MetaArchive of Southern Digital Culture (which is a part of the
 Library of Congress's National Digital Information Infrastructure and
 Preservation Program, aka NDIIPP) and another project in Alabama which
 is just starting up.

 The MetaArchive group is offering a workshop on setting up these types
 of networks. Information is online at
 http://www.metascholar.org/events/2007/ddp/

 Essentially, these projects (and others) are using private LOCKSS 
 networks to create a dark archive with the caches somewhat 
 geographically dispersed.

 Does this help?

 Beth Nicol <nicollb@auburn.edu>
 Information Technology Specialist
 Auburn University Libraries
 (334)844-1731

 Subject: Re: [oss4lib-discuss] Storing and keeping safe those huge digitalisation files
 Date: Thu, 25 Jan 2007 11:33:13 -0500 (EST)
 From: Joe Hourcle <oneiros@grace.nascom.nasa.gov>
 To: Ferran Jorba <Ferran.Jorba@uab.es>
 CC: OSS4LIB (E-mail) <oss4lib-discuss@lists.sourceforge.net>
 References: <45B8CDCA.2020406@uab.es>

 On Thu, 25 Jan 2007, Ferran Jorba wrote:

 > Hello,
 >
 > I'm going to ask for help with something that I'd say is a common
 > situation nowadays almost everywhere.
 >
 > My university is engaged in digitalisation of old material, like most
 > do.  So do some of my neighbour universities.  Our libraries belong to
 > a local consortium, like most libraries do.  This digitalized material
 > means, among other things, lots of fat TIFF files, totaling a huge

 Could you give us an idea of how big the archive is, and how fast it's
 expected to grow?

 For smaller repositories, we're currently using Apple XServe RAID.  For
 stuff that's multiple terabytes, we're using hardware from Pillar Data
 Systems <http://www.pillardata.com/>.  I know for our last big purchase,
 we had looked at Network Appliance <http://www.netapp.com/>, but I wasn't
 involved in the purchase decision, so I don't know what the determining
 factors were (performance, features, support, price, etc.), or what other
 vendors were evaluated.

 [trimmed]

 > What I have been unable to find in the LOCKSS site is a configuration
 > model where some libraries, in a local consortium, join together to
 > keep jointly this material.  Ok, I understand that LOCKSS is designed
 > to keep the material of [external] publishers.  But when I first
 > learned about CLOCKSS (Controlled LOCKSS) I immediately thought that
 > it would address the scenario we are facing in our consortium.  However,
 > I cannot find it in their web pages.

     http://www.lockss.org/clockss/Home
     (see the left-nav)
     http://www.lockss.org/clockss/FAQ

     Where does the initiative currently stand?

     The initiative, which began early in 2006, is implementing and
     evaluating both social and technical models over a two-year
     period.  During this time the initiative will work to build a
     full-scale production system using a significant portion of the
     content of the publisher members.  The work of the initiative is
     transparent and will be independently assessed, with all findings
     reported to the wider community.

 Joe Hourcle

 Subject:     Re: [oss4lib-discuss] Storing and keeping safe those huge digitalisation files
 Date:     Sat, 27 Jan 2007 15:06:22 +0930
 From:     Stephen De Gabrielle <spdegabrielle@gmail.com>
 To:     OSS4LIB (E-mail) <oss4lib-discuss@lists.sourceforge.net>
 CC:     Ferran Jorba <Ferran.Jorba@uab.es>
 References:     <45B8CDCA.2020406@uab.es> 
 <26C6B0CCB6892843849BE72624C9D12E1745C7@medusa.library.arizona.edu>

 Hi I just thought I'd point out some useful resources;

 This message talks about making LOCKSS work with Digital repository
 software to create a private LOCKSS network.
 http://www.sfu.ca/~hgmorris/openaccess-archiving/msg00006.html
 Hopefully this helps, as the repository software is getting pretty
 good at supporting the digital preservation work you mention in your
 original email.

 Also the China Digital Museum Project paper 'Building a Distributed,
 Standards-based Repository Federation' talks a fair bit about how they
 handled replication and naming of metadata and content using the
 DSpace Digital Repository software.

 -- http://www.dlib.org/dlib/july06/tansley/07tansley.html

 The RLG has some nice work that you may find interesting:
 Attributes of Trusted Digital Repositories
 - http://www.rlg.org/en/page.php?Page_ID=583
 Audit Checklist for Certifying Digital Repositories (Draft - but good 
 enough for others[MAGDIR] to consider it as a model)
 -- http://www.rlg.org/en/page.php?Page_ID=20769

 Cheers,

 Stephen De Gabrielle

 Subject: Re: [oss4lib-discuss] oss4lib-discuss Digest, Vol 8, Issue 6
 Date: Thu, 25 Jan 2007 15:54:51 -0500
 From: Bosman, Don <dbosman@mail.lib.msu.edu>
 To: oss4lib-discuss@lists.sourceforge.net

 I trimmed a lot out for brevity. 

 For archiving, and for day to day "homes" folder usage Libraries,
 Michigan State University is using Apple's XServe RAID SAN solutions
 using Fibre Channel connections to our main servers. We currently have
 two XServe setups. One large mirrored set in our main library and an
 off site (in a branch) unit. We mirror every night and backup to the
 off site unit on the weekend. Getting started was a bit rocky as we
 were one of the first institutional installations for Apple and they
 weren't quite finished with the software. The last couple of terabyte
 expansions were relatively painless. I think we are at ten to twelve
 terabytes at this time.

 I must add that we do not do "live" manipulations on the SAN
 system. Using Photoshop to crop, rotate, tweak the color, etc., on
 all the pages in a scanned journal or newspaper can slow the building
 network. Files are scanned or Bookeye'd to local drives, manipulated
 as needed for archiving, then batch copied to the SAN in the
 evening. This type of buffering has worked for us.

 I don't know what the future will bring or need, but we are happy with
 the system we have in place at this time.

 Don Bosman
 Information Technologist
 Libraries, Michigan State University
   100 Library
   East Lansing, MI 48824-1048
   dbosman@mail.lib.msu.edu
   (517) 432-6123 ext 233
   Fax (517) 432-8374
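
The "work locally, batch copy in the evening" routine Don describes could look roughly like this. It is a sketch under assumptions, not MSU's actual setup: the staging and archive directories are hypothetical paths, the function names are invented here, and comparing SHA-256 digests after the copy is an extra safety step that the message does not mention.

```python
import hashlib
import shutil
from pathlib import Path


def _digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def batch_copy(staging, archive):
    """Copy every file under staging into archive, keeping the subfolders.

    Returns the relative paths whose copy does not match the original.
    """
    staging, archive = Path(staging), Path(archive)
    failed = []
    for src in sorted(p for p in staging.rglob("*") if p.is_file()):
        rel = src.relative_to(staging)
        dst = archive / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if _digest(src) != _digest(dst):
            failed.append(str(rel))
    return failed
```

Run from a scheduled evening job, this keeps the heavy image editing off the shared storage during the day, which is the point of the buffering described above.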

 Subject: Re: [oss4lib-discuss] Storing and keeping safe those huge digitalisation files
 Date: Fri, 26 Jan 2007 15:55:25 +0100
 From: Ferran Jorba <Ferran.Jorba@uab.es>
 Organization: Universitat Autònoma de Barcelona
 CC: OSS4LIB (E-mail) <oss4lib-discuss@lists.sourceforge.net>
 References: <45B8CDCA.2020406@uab.es> <45B884A2.9A00.00D8.0@auburn.edu>

 Thank you all for your responses.

 [...]
 > Essentially, these projects (and others) are using private LOCKSS 
 > networks to create a dark archive with the caches somewhat 
 > geographically dispersed.

 Beth's reply about Auburn participating in a private LOCKSS network
 has given me hope.  I've followed your suggestion and I've
 contacted the LOCKSS people at http://www.lockss.org/clockss/Talkback

 Answering some of your other questions, I still don't know the size,
 because I have (partial) information about my own library, but less
 from the others.  Several Terabytes for sure, but again, I know that
 this is too vague.

 > Does this help?

 Sure it does.

 Thanks again,

 Ferran

 Subject:     Re: [oss4lib-discuss] Storing and keeping safe those huge digitalisation files
 Date:     Fri, 26 Jan 2007 10:38:27 -0600
 From:     Beth Nicol <nicollb@auburn.edu>
 To:     Ferran Jorba <Ferran.Jorba@uab.es>
 References:     <45B8CDCA.2020406@uab.es> <45B884A2.9A00.00D8.0@auburn.edu> 
 <45BA165D.5030704@uab.es>

 Ferran:

 This is off-list, but you can contact troberts@stanford.edu
 directly about the Private LOCKSS
 networks. I talked with him yesterday, and he can either answer your
 questions or get you hooked up with the right folks. You can tell him
 I referred you.

 Beth Nicol <nicollb@auburn.edu>
 Information Technology Specialist
 Auburn University Libraries
 (334)844-1731

 Subject: Re: [oss4lib-discuss] Storing and keeping safe those huge digitalisation files
 Date: Fri, 26 Jan 2007 14:09:53 -0700
 From: Han, Yan <hany@u.library.arizona.edu>
 To: Ferran Jorba <Ferran.Jorba@uab.es>, "OSS4LIB (E-mail)" <oss4lib-discuss@lists.sourceforge.net>
 References: <45B8CDCA.2020406@uab.es>

 This is not an easy question to answer.

 My understanding of LOCKSS is that it does not work for straight
 TIFF/PDFs. There are organizations that can take care of your
 problems. OCLC is testing the idea of preservation. There is also a
 research project going on under the NDIIPP project, which has a
 consortium to do digital preservation. (Emory U. is one of the partners).

 Or you can just buy some hard drives/tapes and save multiple copies in
 off-site storage. But in this case, you are responsible for the
 migration of formats etc.

 I like the idea of consortium preservation, but there are other issues
 to be sorted out.

 Yan Han
 The University of Arizona Libraries 

 Subject:     Re: [oss4lib-discuss] Storing and keeping safe those huge digitalisation files
 Date:     Mon, 29 Jan 2007 08:30:40 -0600
 From:     Beth Nicol <nicollb@auburn.edu>
 To:     OSS4LIB (E-mail) <oss4lib-discuss@lists.sourceforge.net>, Yan Han 
 <hany@u.library.arizona.edu>, Ferran Jorba <Ferran.Jorba@uab.es>
 References:     <45B8CDCA.2020406@uab.es> 
 <26C6B0CCB6892843849BE72624C9D12E1745C7@medusa.library.arizona.edu>

 I'm not sure what you mean by "it does not work for straight TIFF/PDF's" 
 -- you must organize your files into Archival Units, and create a 
 manifest page just as you would for a journal. I've harvested several 
 GB of TIFFs using LOCKSS.

 Beth Nicol <nicollb@auburn.edu>
 Information Technology Specialist
 Auburn University Libraries
 (334)844-1731
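
Beth's point is that a folder of TIFFs becomes harvestable once it is grouped into an Archival Unit with a manifest page that links to every file. A rough sketch of generating such a page follows. The function name and the HTML layout are assumptions made for this page, and the permission statement is passed in as a parameter on purpose: its exact wording should be taken from the LOCKSS documentation, not from this example.

```python
from html import escape
from urllib.parse import quote


def manifest_page(title, file_names, permission_statement):
    """Build a plain HTML page that links to every file in one Archival Unit."""
    items = "\n".join(
        f'    <li><a href="{quote(name)}">{escape(name)}</a></li>'
        for name in sorted(file_names)
    )
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <p>{escape(permission_statement)}</p>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )
```

One page per Archival Unit (for example, one per digitised volume) keeps each unit small enough to be collected and checked on its own.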

 Subject:     Re: [oss4lib-discuss] Storing and keeping safe those huge digitalisation files
 Date:     Mon, 29 Jan 2007 11:15:16 -0700
 From:     Han, Yan <hany@u.library.arizona.edu>
 To:     Beth Nicol <nicollb@auburn.edu>, "OSS4LIB (E-mail)" 
 <oss4lib-discuss@lists.sourceforge.net>, Ferran Jorba <Ferran.Jorba@uab.es>
 References:     <45B8CDCA.2020406@uab.es> 
 <26C6B0CCB6892843849BE72624C9D12E1745C7@medusa.library.arizona.edu> 
 <45BDB0B6.9A00.00D8.0@auburn.edu>

 Beth,

 Thanks for pointing that out.

 The algorithm in LOCKSS is to use peers in the network to 
 preserve/restore/repair files (by voting with the majority). My question 
 is: if I have an MD5/SHA signature, I know whether the file is authentic. 
 Why do I need a vote?

 As your library is a member of MetaArchive, could you explain more about 
 how you handle the digital signature? Do you do any modification of the 
 LOCKSS source code?  What about the cost? (In this case, I assume that 
 you are using a PC as a storage unit, so the cost should be lower.)

 Yan 
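
Yan's question (why vote when a checksum exists?) is easier to discuss with a toy version of "voting with the majority". The sketch below is not the real LOCKSS polling protocol, which is considerably more elaborate; the peer names and function names are invented for illustration. What it does show is that a majority vote needs no separately stored, trusted digest: the copies are only compared with each other, whereas an MD5/SHA check is only as reliable as the place where the reference digest is kept.

```python
import hashlib
from collections import Counter


def _sha(data):
    return hashlib.sha256(data).hexdigest()


def majority_digest(copies):
    """Return (digest, votes) for the content held by a strict majority.

    `copies` maps a (hypothetical) peer name to the bytes that peer holds.
    Returns None when there are no copies or no strict majority.
    """
    if not copies:
        return None
    tally = Counter(_sha(data) for data in copies.values())
    winner, votes = tally.most_common(1)[0]
    if votes * 2 <= len(copies):
        return None
    return winner, votes


def repair(copies):
    """Overwrite dissenting copies with the majority version.

    Returns the names of the peers that were repaired, in dict order.
    """
    result = majority_digest(copies)
    if result is None:
        return []
    winner, _ = result
    good = next(d for d in copies.values() if _sha(d) == winner)
    repaired = []
    for peer, data in copies.items():
        if _sha(data) != winner:
            copies[peer] = good
            repaired.append(peer)
    return repaired
```

With three peers where one holds a damaged file, `repair` replaces the odd one out. With only two disagreeing peers there is no majority and nothing is changed, which is one reason such networks want several geographically dispersed caches.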

 Subject: Universitat Autònoma de Barcelona
 Date: Fri, 26 Jan 2007 10:07:38 -0800
 From: Victoria Reich <vreich@stanford.edu>
 Reply-To: vreich@stanford.edu
 To: Ferran.Jorba@uab.es, Victoria Reich <vreich@stanford.edu>

 Dear Ferran,

 We are very pleased that you are interested in the LOCKSS and CLOCKSS
 Programs.  For your application, you will want to use the LOCKSS
 software and you will wish to set up a Private LOCKSS Network.

 The LOCKSS system can be used to preserve many TB of web-based
 content that the library holds.  If you cooperate with other libraries
 in Spain -- you can inexpensively build a very robust, distributed
 preservation network.  This is not hard to do; we support libraries
 who join the LOCKSS Alliance to do this.

 Before going further, I strongly suggest you bring a LOCKSS box
 online.  This first hand experience is the best and easiest way to
 learn how LOCKSS works.  The instructions for installing a LOCKSS box
 are here: http://www.lockss.org/lockss/Installing_LOCKSS
 Bringing a LOCKSS box online is free, and you are welcome to send us
 email if you have questions.

 Sincerely,

 Victoria Reich
 Director LOCKSS Program
 Stanford University Libraries
 011.650.725.1134
 www.lockss.org

 Libraries are using LOCKSS to build local libraries!
 www.lockss.org
 CLOCKSS, a collaborative community archive
 www.lockss.org/clockss

----

 It is not clear to me, from reading your site, that it can work in
 this scenario:

 I work for a University Library where we have to store digital
 material for the long term where, most of the time, there is no
 publisher involved: either because they are personal archives, or old
 periodicals with no live editor, etc.  Other universities around ours
 have similar projects.

 Reading your CLOCKSS pages, I see references to those editors that
 distract me.  What I'm currently seeking is advice on whether CLOCKSS
 can be used to do (mostly) unattended backups, recoveries, etc. for
 these large archives (several Terabytes, not determined yet).

 I was directed here by people involved in
 http://www.metascholar.org/events/2007/ddp/ .

 Subject: Re: Universitat Autònoma de Barcelona
 Date: Mon, 29 Jan 2007 10:17:35 +0100
 From: Ferran Jorba <Ferran.Jorba@uab.es>
 Organization: Universitat Autònoma de Barcelona
 To: vreich@stanford.edu
 References: <45BA436A.6040909@stanford.edu>

 Hello Victoria,

 thank you for your fast response.  I'll follow your suggestion and
 I'll try to bring a LOCKSS box online myself, and see what I learn.

 How do you suggest I proceed if I have more detailed questions?  I
 googled for a CLOCKSS mailing list and I didn't find it; not even a
 LOCKSS mailing list.  May I ask you for a suitable forum, or a contact
 person?

 Thanks again,

 Ferran
 

Updated by Ferran Jorba almost 14 years ago · 1 revision