Other file naming conventions

How to name files is a topic that, one way or another, occupies (and worries) the people who manage repositories. On this page we collect documentation on standards and good practices from other institutions that may shed some light for us.

Messages on the digipres list

I reproduce here some messages (with links) on this topic from the digital preservation mailing list digipres. You will see that the big names among North American libraries take part.

 From: Ann Marie Willer <amwillerala@yahoo.com>
 Subject: [digipres] file naming conventions?
 To: digipres@ala.org, padg@ala.org
 Date: Wed, 09 Jul 2008 14:22:16 -0700 (PDT)
 X-Mailer: YahooMailRC/1042.33 YahooMailWebService/0.7.199

 Colleagues,
 I am involved in discussions about file naming conventions for the products of
 digitization projects.  Could you (1) recommend guidelines recently published
 or posted and/or (2) share what you do at your institution?

 If I've missed a previous discussion, please let me know, and I will consult
 the archives as well.

 Thanks,
 Ann Marie

 Ann Marie Willer
 Preservation Services Librarian
 Massachusetts Institute of Technology
 77 Massachusetts Ave.
 Building 14-0513
 Cambridge, MA 02139
 617-253-5692 phone

 Send ALA business to: AMWillerALA@yahoo.com

 From: Jessica Branco Colati <jessica@coalliance.org>
 Subject: [digipres] RE: file naming conventions?
 To: 'Ann Marie Willer' <amwillerala@yahoo.com>, digipres@ala.org, padg@ala.org
 Date: Wed, 09 Jul 2008 15:34:55 -0600
 X-Mailer: Microsoft Office Outlook 12.0

 Hi Ann Marie,

 Members of our consortium have used CDP's Imaging guidelines when looking at
 file-naming conventions, both historically and within the context of reviewing
 the recent release of a new version late last month: 
 http://www.bcr.org/publications/bcreview/2008/06/digital-imaging-best-practices-ver2.html

 We try to accommodate *any* file-naming conventions in practice at our members'
 institution, but from a software perspective, we've had some difficulty with
 '.' (dots/periods) used in filenames other than to delineate the file
 extension, and have had better luck with '_' (underscores) when provided by our
 members (i.e. MS01.0001.00001.tif vs MS01_001_00001.tif, seems to be more
 code-friendly??)

 Best,

  Jessica

 Jessica Branco Colati
 Project Director
 Alliance Digital Repository
 Colorado Alliance of Research Libraries
 3801 E. Florida Ave., Suite 515
 Denver, CO 80210
 t: (303) 759-3399 x113
 f: (303) 759-3363
 e: jessica@coalliance.org
 w: http://www.coalliance.org
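
Jessica's point about dots can be illustrated with a small sketch. It assumes, purely for illustration, that the only dot worth keeping is the last one (the one before the extension); the function name `dots_to_underscores` is hypothetical and does not come from any of the guidelines cited.

```python
def dots_to_underscores(filename: str) -> str:
    """Replace every dot except the one before the extension with '_'."""
    stem, dot, extension = filename.rpartition(".")
    if not dot or not stem:
        # No dot at all, or a leading-dot name such as ".hidden": leave as is.
        return filename
    return stem.replace(".", "_") + "." + extension


print(dots_to_underscores("MS01.0001.00001.tif"))
```

Names with a double extension (for example `.tar.gz`) would need a different rule, so this is only a starting point.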

 From: Liz Madden <emad@loc.gov>
 Subject: [digipres] Re: RE: file naming conventions?
 To: digipres@ala.org, padg@ala.org, jessica@coalliance.org,
  amwillerala@yahoo.com
 Date: Thu, 10 Jul 2008 09:14:36 -0400
 X-Mailer: Novell GroupWise Internet Agent 7.0.2 HP

 In addition to avoiding the dot and other punctuation marks and
 spaces, we've discovered it's useful to stay away from characters that
 Windows or UNIX or other platforms use as special characters. Here's a
 list of ones that may cause problems:

     < > : " / | ? *

 Avoiding these is especially important in the event that you need to
 transfer your content elsewhere in the future (e.g., to another
 repository/institution/system that uses a different platform), where
 the file name could be misinterpreted by the new system because of the
 characters in it.

 --Liz

*******************************
 Liz Madden
 Digital Media Projects Coordinator
 Office of Strategic Initiatives
 Library of Congress
 101 Independence Ave SE
 Washington, DC  20540

 emad@loc.gov
 phone: 202-707-4578
*******************************
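
A check for the characters Liz lists could look roughly like the following. The character set is copied from her message; treating the space as a problem too is an assumption based on her opening sentence, and the function name `problem_characters` is hypothetical.

```python
# Characters listed in Liz Madden's message.
PROBLEM_CHARACTERS = set('<>:"/|?*')


def problem_characters(filename: str) -> list[str]:
    """Return the listed characters (and spaces) found in filename,
    in order of first appearance and without repeats."""
    found = []
    for ch in filename:
        # Flagging the space is an assumption, not part of her list.
        if (ch in PROBLEM_CHARACTERS or ch == " ") and ch not in found:
            found.append(ch)
    return found


print(problem_characters("report: draft?.tif"))
print(problem_characters("MS01_001_00001.tif"))
```

An empty list means the name contains none of the flagged characters; it says nothing about other platform limits such as name length.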

 From: "Walls, David" <david.walls@yale.edu>
 Subject: [digipres] File Naming Conventions?
 To: "digipres@ala.org" <digipres@ala.org>
 Date: Thu, 10 Jul 2008 10:26:24 -0400

 Ann Marie

 If there are guidelines around for file naming conventions, I haven't been able
 to find anything that offers more than the most basic suggestions.

 My advice is to not to try to make up a naming convention, but to use the
 bibliographic record identification number for the specific resource to be
 scanned that is found in the MARC record for the title in your OPAC.   Most of
 the materials that we are digitally reformatting are cataloged in our OPAC.
 Call numbers can change, several books can have the same title, and using
 truncated titles for file names frequently don't offer much information.  The
 bibliographic record number is unique, does not change, and we use this as the
 persistent identifier for the files.  Also, data from OPACs already have a
 fairly reliable track record of being migrated into the future.

 In our OPAC, the bibliographic record number is a six digit number.  When we
 send materials to be scanned, we also send the vendor an Excel spreadsheet that
 includes the bibliographic record number, the title, and other information.
 The vendor returns the digital files of the materials scanned on a portable USB
 hard drive.  The drive contains a series of folders all named by the six digit
 bibliographic id number.  Inside each of the folders are the master,
 derivative, and metadata files.  For example, the parent folder would be named
 123456 or whatever the actual number is.  Inside the parent folder are four
 other folders named 123456.tif or 123456.jp2 depending on what we've chosen for
 the master file.  The other folders are 123456.pdf and 123456.xml.

 Please let me know if you have other questions.

 David Walls

 Preservation Librarian, Yale University Library.

 Head, Reformatting and Media Preservation.
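
The folder layout David describes can be sketched as below. This covers only the subfolders he actually names (the message mentions four folders but names three kinds), and the function name `walls_layout` and its validation rules are assumptions made for the example.

```python
from pathlib import PurePosixPath


def walls_layout(bib_id: str, master_format: str = "tif") -> list[str]:
    """List the subfolders named in the message for one bibliographic id."""
    if not (bib_id.isascii() and bib_id.isdigit() and len(bib_id) == 6):
        raise ValueError("expected a six-digit bibliographic record number")
    if master_format not in ("tif", "jp2"):
        raise ValueError("master format must be 'tif' or 'jp2'")
    parent = PurePosixPath(bib_id)
    # One folder for the masters, one for the PDF derivative, one for metadata.
    return [str(parent / f"{bib_id}.{suffix}")
            for suffix in (master_format, "pdf", "xml")]


for folder in walls_layout("123456"):
    print(folder)
```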

 From: Robert Dowd <RDOWD@MAIL.NYSED.GOV>
 Subject: [digipres] Re: File Naming Conventions?
 To: "digipres@ala.org" <digipres@ala.org>
 Date: Thu, 10 Jul 2008 10:43:21 -0400
 X-Mailer: Novell GroupWise Internet Agent 7.0.2 HP

 The New York State Library employs such a convention, with OCLC number or local
 control number being the major portion of most file names (not all items imaged
 have been cataloged).  Some of our larger items are scanned in parts, some
 imaging equipment saves raw scans at one file per page (later combined for a
 use copy), and some of our imaged titles are multi-part sets or serials.  And
 so while the bib record identification number is a good start, we
 necessarily create file names that may include Volume, Number, Year, Month,
 Day, Part, Page, etc.  Previous cautions about use of 'underscore' and other
 standard characters all play into that.

 Bob Dowd
 Senior Librarian
 Documents Section
 New York State Library
 Albany, NY 12230
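
Bob does not give the actual pattern used at the New York State Library, so the following is only a hypothetical illustration of how a control number might be extended with volume, number, part and page. The prefixes (`v`, `n`, `pt`, `p`), the four-digit padding and the underscore separator are all invented for the example.

```python
# Hypothetical field order and prefixes; not the library's real convention.
FIELD_PREFIXES = (("volume", "v"), ("number", "n"), ("part", "pt"), ("page", "p"))


def compose_name(control_number: str, extension: str, **fields: int) -> str:
    """Join a control number and optional numeric fields into one file name."""
    unknown = set(fields) - {name for name, _ in FIELD_PREFIXES}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    segments = [control_number]
    for name, prefix in FIELD_PREFIXES:
        if name in fields:
            # Zero-padding keeps names in page order when sorted as text.
            segments.append(f"{prefix}{fields[name]:04d}")
    return "_".join(segments) + "." + extension


print(compose_name("12345678", "tif", volume=3, page=12))
```

Fixing the field order and padding is what keeps the names consistent and sortable, whichever concrete pattern is chosen.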

 From: Nancy <nmccrave@rochester.rr.com>
 Subject: [digipres] RE: file naming conventions?
 To: Ann Marie Willer <amwillerala@yahoo.com>, digipres@ala.org, padg@ala.org
 Date: Thu, 10 Jul 2008 19:55:33 -0400
 X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)

 When I was researching file naming some time ago, I had bookmarked these pages
 (I found the first to be particularly helpful):

 http://wiki.dlib.indiana.edu/confluence/display/INF/Filename+Requirements+for+Digital+Objects
 http://www.archives.gov/preservation/technical/guidelines.pdf  (see page 60)
 http://www.controlledvocabulary.com/imagedatabases/filename_limits.html
 http://edocs.lib.sfu.ca/projects/Doukhobor-Collection/technical.html
 http://staffweb.library.northwestern.edu/dl/adhocdigitization/storage/

 Hope this helps.

 Nancy McCrave

 From: "Casey, Michael T" <micasey@indiana.edu>
 Subject: [digipres] RE: RE: file naming conventions?
 To: "digipres@ala.org" <digipres@ala.org>, "padg@ala.org" <padg@ala.org>
 Date: Fri, 11 Jul 2008 09:14:55 -0400

 The Archives of Traditional Music updated its file naming scheme in
 2006, working with our Digital Library Program which was simultaneously
 developing the recommendations presented by the first link in Nancy's message,
 below. You can see our implementation for audio files in chapter 3 of the
 publication Sound Directions: Best Practices for Audio Preservation, available
 at

 http://www.dlib.indiana.edu/projects/sounddirections/papersPresent/index.shtml

 Mike Casey

 --
 Mike Casey
 Associate Director for Recording Services
 Archives of Traditional Music
 Indiana University

 (812)855-8090

 Co-Chair, ARSC Technical Committee

 From: Liz Bishoff <lbishoff@BCR.ORG>
 Subject: [digipres] RE: RE: RE: file naming conventions?
 To: digipres@ala.org, padg@ala.org
 Date: Fri, 11 Jul 2008 10:38:26 -0600

 The BCR-CDP Digital Imaging Best Practices version 2.0 has just been published
 and it also has a section on naming conventions.  So I think there is now
 plenty of options for those interested.  You can find it at 
 http://www.bcr.org/cdp/best/index.html

 Liz Bishoff, Director, Digital and Preservation Services

 BCR

 14394 E. Evans

 Aurora CO 80014

 lbishoff@bcr.org

 From: Ingrid Mason <Ingrid.Mason@vuw.ac.nz>
 Subject: [digipres] RE: file naming conventions?
 To: Ann Marie Willer <amwillerala@yahoo.com>, digipres@ala.org, padg@ala.org
 Date: Tue, 15 Jul 2008 14:40:44 +1200

 Hi Ann,

 I don't see anyone mentioning this, so I figure I may as well offer this as a
 side issue.  I understand the need to define how files are to be named,
 particularly in such ways that won't create system issues (characters and
 spaces).  But, having 'intelligence' built into filenames, by using naming or
 system based alpha-numeric arrangement strikes me as slightly worrying.

 Why?  Simply because I'm hoping that there is metadata associated with the
 digital file that enables it to be identified and retrieved; not using the
 'information' in the filename in a meaningful way.

 We have purposefully used 'dumb' or 'generic' filenames in loading digital
 material into the research repository, e.g. thesis.pdf; paper.pdf; form.pdf;
 report.pdf, etc.  I expect the metadata that the object is associated with to
 enable information and object retrieval.  However, in saying that, the
 filenames we use are also a means to guide/remind users of the type of file
 that they are downloading.  That in itself is 'doubling' up the load on the
 filename to act also as a resource type label.  However, if all the filenames
 change in a preservation migration or transformation, we have metadata
 associated with digital object to identify the resource type.

 I hope this thought/reminder is useful.

 Cheers, Ingrid

 Ingrid Mason

 Digital Research Repository Coordinator
 ResearchArchive@Victoria
 Victoria University of Wellington
 ph: 64-4-463 6844
 em: ingrid.mason@vuw.ac.nz
 Location: Kelburn Campus, Rankine Brown, RB501A

 From: Bruce Gordon <bgordon@fas.harvard.edu>
 Subject: [digipres] Re: RE: file naming conventions?
 To: Ann Marie Willer <amwillerala@yahoo.com>
 Cc: digipres@ala.org, padg@ala.org
 Date: Mon, 14 Jul 2008 23:19:47 -0400
 X-Mailer: Apple Mail (2.926)

 Hello Ann and Ingrid,

 Digital preservation is not dependent upon the file name being anything but
 unique. Therefore a simple number string will suffice as long as metadata is
 linked to the file. That said,  there is a lot of value in having human
 readable names that convey information about the file such as catalog number,
 role, sequence number, etc. These things make the actual preservation workflow
 easier to follow and de-bug in case of problems.

 In our workflow we use filenames that incorporate the call number, volume
 number, preservation role, face number, and file sequence number. Upon
 ingestion into the digital repository, this human-readable name is stored in
 metadata, and the file is named by the repository automatically with a unique
 number string which is more efficient for data processing. Upon retrieval from
 the digital repository, the human readable name may be restored from the
 metadata so that humans can work with the file without confusion.

 Consistency and uniqueness are most important, regardless of the method used.

 Best,

 -Bruce

 Bruce J. Gordon
 Audio Engineer
 Eda Kuhn Loeb Music Library
 Harvard University
 Cambridge, Massachusetts 02138
 U.S.A
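
The workflow Bruce outlines (human-readable name kept in metadata, opaque unique name assigned on ingest, original name restored on retrieval) can be sketched with an in-memory stand-in. The class `Repository`, the ten-digit counter and the sample file name are assumptions for illustration; nothing here reflects Harvard's actual repository software.

```python
import itertools


class Repository:
    """Toy repository: stores only the mapping, not the files themselves."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._metadata: dict[str, dict[str, str]] = {}

    def ingest(self, human_name: str) -> str:
        """Assign a unique numeric name and remember the original one."""
        repo_name = f"{next(self._counter):010d}"
        self._metadata[repo_name] = {"original_name": human_name}
        return repo_name

    def original_name(self, repo_name: str) -> str:
        """Look up the human-readable name recorded at ingest."""
        return self._metadata[repo_name]["original_name"]


repo = Repository()
stored = repo.ingest("Mus100_v02_master_f1_0001.wav")
print(stored, "->", repo.original_name(stored))
```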

 From: Trudy Levy <Trudy@dig-mar.com>
 Subject: [digipres]  Re: RE: file naming conventions?
 To: digipres@ala.org
 Date: Sat, 19 Jul 2008 13:38:07 -0700

 I am glad that management systems have reached a state of development
 where links to objects never become broken. Harking from an earlier
 time when this did occur, I always encourage my clients to develop a
 alphanumerical code, such as Bruce is describing, which gives them
 some hint of the original object's identity.

 In thinking of joining a larger collection down the line, I also
 encourage that they identify location/ownership of original object.
 In the California Local Historical Digital Resource Project, which is
 residing with the CDL, we are using the codes derived the OCLC ID
 codes to identify each library. For this project, we are embedding
 metadata in the TIFF header some descriptive metadata - title, owner,
 scanning vendor, ICC profile - for identifying purposes.

 Yours
 Trudy

 --
 Trudy Levy
 Digital Transition Consultant

 Image Integration 415 750 1274 http://www.DIG-Mar.com
 Images are information - Manage them

Other standards or documents

An intriguing proposal is Pairtrees for Object Storage (http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html), found on the California Digital Library website. It seems very well thought out, but I do not fully understand how it achieves the advantages it claims.
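
To make the idea more concrete, here is a minimal sketch of the identifier-to-path mapping as I read the Pairtree specification: some characters are hex-encoded as `^hh`, three others are swapped for single characters, and the result is cut into two-character directory names. The function names are mine, and the character tables should be checked against the specification text before relying on them.

```python
# Characters that (as I read the spec) are written as ^ plus two hex digits.
HEX_ENCODED = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-encoding.
SINGLE_CHAR_MAP = {"/": "=", ":": "+", ".": ","}


def clean_identifier(identifier: str) -> str:
    """Make an identifier safe to use as a sequence of directory names."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in HEX_ENCODED:
            out.append(f"^{byte:02x}")
        else:
            out.append(SINGLE_CHAR_MAP.get(ch, ch))
    return "".join(out)


def pairtree_path(identifier: str) -> str:
    """Split the cleaned identifier into two-character path segments."""
    cleaned = clean_identifier(identifier)
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join(pairs)


print(pairtree_path("ark:/13030/xt12t3"))
```

One visible consequence is that identifiers sharing a prefix share the upper directories, and every directory name is at most two characters long, so the path can be computed from the identifier alone without a lookup table.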

The website of HathiTrust (http://www.hathitrust.org/), a cooperative digital repository of the major North American universities, links to the University of Michigan Digitization Specifications (http://www.lib.umich.edu/lit/dlps/dcsUMichDigitizationSpecifications20070501.pdf). The requirements for directory and file names start on page 7.

Updated by Ferran Jorba almost 14 years ago · 2 revisions