Audio metadata extraction

In the .info files we store technical metadata about each file: its md5 and sha1 checksums, plus the metadata extracted by two programs, a format-specific one and Jhove. For images the format-specific tool is Imagemagick, and for PDFs it is xpdf.
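As a rough illustration of the checksum part of a .info file, the following sketch computes both digests in a single pass over the file. The function name `file_digests` and the chunk size are assumptions made for this example; the page does not say how the checksums are actually produced.

```python
import hashlib


def file_digests(path, chunk_size=131072):
    """Return the MD5 and SHA-1 hex digests of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}
```

Reading in chunks keeps memory use flat for large audio or video files, and feeding both hash objects from the same loop avoids reading the file twice.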

For audio files (and probably video too) we still have to choose a format-specific tool, all the more so because Jhove cannot extract anything useful from mp3 files, for example.

Of the options I evaluated, hachoir-metadata seems preferable, because it gives the clearest and most detailed information. In any case, this page records the options I evaluated and the output each one produces.

Jhove


$ jhove --help

Jhove (Rel. 1.4, 2009-07-30)

 Date: 2011-04-28 12:49:46 CEST

 App:

  API: 1.2, 2007-05-10

  Configuration: /etc/jhove/jhove.conf

  JhoveHome: /users/stephen/projects/jhove

  Encoding: utf-8

  TempDirectory: /var/tmp

  BufferSize: 131072

  Module: AIFF-hul 1.3

  Module: ASCII-hul 1.2

  Module: BYTESTREAM 1.3

  Module: GIF-hul 1.3

  Module: HTML-hul 1.2

  Module: JPEG-hul 1.2

  Module: JPEG2000-hul 1.3

  Module: PDF-hul 1.8

  Module: TIFF-hul 1.5

  Module: UTF8-hul 1.3

  Module: WAVE-hul 1.3

  Module: XML-hul 1.3

  OutputHandler: Audit 1.1

  OutputHandler: TEXT 1.4

  OutputHandler: XML 1.5

  Usage: java Jhove [-c config] [-m module] [-h handler] [-e encoding] [-H handler] [-o output] 

         [-x saxclass] [-t tempdir] [-b bufsize] [-l loglevel] [[-krs] dir-file-or-uri [...]]

  Rights: Copyright 2004-2009 by the President and Fellows of Harvard College. 

          Released under the GNU Lesser General Public License.


$ jhove -k bbcsound_v29n4.mp3 

Jhove (Rel. 1.4, 2009-07-30)

 Date: 2011-04-28 12:38:47 CEST

 RepresentationInformation: bbcsound_v29n4.mp3

  ReportingModule: BYTESTREAM, Rel. 1.3 (2007-04-10)

  LastModified: 2003-03-17 17:11:15 CET

  Size: 3048973

  Format: bytestream

  Status: Well-Formed and valid

  MIMEtype: application/octet-stream

  Checksum: fc6d7074

   Type: CRC32

  Checksum: 90eb77e67154df47b8ccf36a2a0afb35

   Type: MD5

  Checksum: 9b0bfbbbebd58213e6baf8e8e15ae530b1958d1f

   Type: SHA-1
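The run above was reported by the generic BYTESTREAM module, which is why it contains only size and checksum fields; the module list printed by `jhove --help` has no mp3 module. A minimal sketch for detecting that fallback in Jhove's text output could look like this. The function name `jhove_reporting_module` is hypothetical, and the parsing assumes the `Key: value` layout shown in the transcript.

```python
def jhove_reporting_module(report_text):
    """Return the module name from the ReportingModule line, or None if absent."""
    for line in report_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "ReportingModule":
            # The value looks like "BYTESTREAM, Rel. 1.3 (2007-04-10)".
            return value.split(",")[0].strip()
    return None


def is_generic_fallback(report_text):
    """True when Jhove could only treat the file as an opaque byte stream."""
    return jhove_reporting_module(report_text) == "BYTESTREAM"
```

A check like this could be used to decide when the Jhove section of a .info file carries no format-specific information.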

hachoir-metadata


$ hachoir-metadata --help

Usage: hachoir-metadata [options] files

Options:

  -h, --help           show this help message and exit

  Metadata:

    Option of metadata extraction and display

    --type             Only display file type (description)

    --mime             Only display MIME type

    --level=LEVEL      Quantity of information to display from 1 to 9 (9 is

                       the maximum)

    --raw              Raw output

    --bench            Run benchmark

    --parser-list      List all parsers then exit

    --profiler         Run profiler

    --version          Display version and exit

    --quality=QUALITY  Information quality (0.0=fastest, 1.0=best, and default

                       is 0.5)

  Hachoir library:

    Configure Hachoir library

    --verbose          Verbose mode

    --log=LOG          Write log in a file

    --quiet            Quiet mode (don't display warning)

    --debug            Debug mode


$ hachoir-metadata bbcsound_v29n4.mp3 

Metadata:

- Title: Morocco, Cafe, Rabat, Audible Traffic, TV And Expresso Machine

- Author: BBC

- Album: BBC Sound Effects 29 - Africa- The Human World

- Duration: 4 min 13 sec 387 ms

- Music genre: Efectes de so

- Track number: 4

- Channel: Joint stereo

- Sample rate: 44.1 kHz

- Bits/sample: 16 bits

- Compression rate: 14.7x

- Bit rate: 96.0 Kbit/sec (constant)

- Format version: MPEG version 1 layer III

- MIME type: audio/mpeg

- Endian: Big endian
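To store this output in a structured form, the `- Key: value` lines can be split into a dictionary. This is only a sketch that assumes the default text layout shown above; `parse_hachoir_output` is a name invented for the example, and the sample text is a shortened copy of the transcript.

```python
def parse_hachoir_output(text):
    """Turn '- Key: value' lines from hachoir-metadata into a dict."""
    metadata = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue  # skips the "Metadata:" header and blank lines
        # Split on the first ": " only, so values may contain colons.
        key, sep, value = line[2:].partition(": ")
        if sep:
            metadata[key] = value
    return metadata


SAMPLE = """\
Metadata:
- Title: Morocco, Cafe, Rabat, Audible Traffic, TV And Expresso Machine
- Duration: 4 min 13 sec 387 ms
- Bit rate: 96.0 Kbit/sec (constant)
- MIME type: audio/mpeg
"""

metadata = parse_hachoir_output(SAMPLE)
print(metadata["MIME type"])
```

In practice the text would come from running the command on a file and capturing its standard output.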

extract


$ extract --help

Usage: extract [OPTIONS] [FILENAME]*

Extract metadata from files.

Arguments mandatory for long options are also mandatory for short options.

  -a, --all                  do not remove any duplicates

  -b, --bibtex               print output in bibtex format

  -B, --binary=LANG          use the generic plaintext extractor for the

                               language with the 2-letter language code LANG

  -d, --duplicates           remove duplicates only if types match

  -f, --filename             use the filename as a keyword (loads

                               filename-extractor plugin)

  -g, --grep-friendly        produce grep-friendly output (all results on one

                               line per file)

  -h, --help                 print this help

  -H, --hash=ALGORITHM       compute hash using the given ALGORITHM (currently

                               sha1 or md5)

  -l, --library=LIBRARY      load an extractor plugin named LIBRARY

  -L, --list                 list all keyword types

  -n, --nodefault            do not use the default set of extractor plugins

  -p, --print=TYPE           print only keywords of the given TYPE (use -L to

                               get a list)

  -r, --remove-duplicates    remove duplicates even if keyword types do not

                               match

  -s, --split                use keyword splitting (loads split-extractor

                               plugin)

  -v, --version              print the version number

  -V, --verbose              be verbose

  -x, --exclude=TYPE         do not print keywords of the given TYPE


$ extract -f bbcsound_v29n4.mp3 

duration - 4m14

format - MPEG-1 Layer III audio, 96 kbps (CBR), 44100 Hz, joint stereo, no copyright, original

resource-type - MPEG-1

mimetype - audio/mpeg

description - BBC: Morocco, Cafe, Rabat, Audible (BBC Sound Effects 29 - Africa-)

track number - 4

album - BBC Sound Effects 29 - Africa-

artist - BBC

title - Morocco, Cafe, Rabat, Audible

album - BBC Sound Effects 29 - Africa- The Human World

track number - 04

content type - Efectes de so

title - Morocco, Cafe, Rabat, Audible Traffic, TV And Expresso Machine

filesize - 3.05 MB

filename - bbcsound_v29n4.mp3
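Unlike hachoir-metadata, extract reports some keywords more than once (title, album and track number each appear twice above, with one copy truncated). A sketch of one way to handle this is to collect every value per keyword and then keep the longest. The function names are hypothetical, and preferring the longest value is an assumption made for this example, not something extract does itself.

```python
from collections import defaultdict


def parse_extract_output(text):
    """Group 'keyword - value' lines by keyword, keeping every value in order."""
    keywords = defaultdict(list)
    for line in text.splitlines():
        # Split on the first " - " only; values such as album names contain it too.
        key, sep, value = line.strip().partition(" - ")
        if sep:
            keywords[key].append(value)
    return dict(keywords)


def longest_values(keywords):
    """Pick the longest value for each keyword (assumed to be the least truncated)."""
    return {key: max(values, key=len) for key, values in keywords.items()}
```

Keeping the full list alongside the chosen value makes it possible to review the cases where the duplicates actually disagree.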
