Muscat status reports

This page collects the status reports we have prepared to summarise the progress:


Some comments on the slides by @aw-bib

A single bibliographic record

As I did not follow all of the discussion, I am a bit confused. If I get https://github.com/rism-digital/muscat/discussions/1115 right, Muscat uses a full Marc bibliographic record to describe an item.

Then, for some reason that is not apparent (at least to me), they introduced a distinction between bibliographic record types. They created a thing they call `source` and one they call `publications`. If I get it right, both are full bibliographic records. However, they treat `source` as a sort of primary record type and `publications` as a secondary record type. I am unclear why and where this has any relevance. I do understand that it is intended that `publications` refer to a source. (But I don't get why this makes them something so fundamentally different that one needs a new name and a new record type.)

Conceptually, this sounds like `source` being a journal (or book) and `publication` being an article in a journal (or a chapter). Both, however, are complete bibliographic items that could stand on their own, and the relation in Muscat is talksAbout instead of appearsIn.

We do something like that in join2 with one single record type. So, I do not get why they introduced the two types in the first place. We do have publication types in join2, but they are only relevant for the end-user submission masks, where they determine which fields are proposed and which are mandatory. IOW, it is only a property of the input UI, not of the record as such.
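
The "property of the input UI" idea can be sketched roughly as follows. This is only an illustration under assumptions, not join2 code: the type names, the field names and the `FORM_PROFILES` table are all hypothetical.

```python
# Hypothetical sketch: the publication type only selects which fields a
# submission form offers and which are mandatory. The stored record has the
# same shape whatever form produced it.
FORM_PROFILES = {
    "article": {
        "offered": ["title", "authors", "journal", "volume", "pages"],
        "mandatory": ["title", "authors", "journal"],
    },
    "book": {
        "offered": ["title", "authors", "publisher", "isbn"],
        "mandatory": ["title", "publisher"],
    },
}


def missing_mandatory(pub_type, form_input):
    """Return the mandatory fields the user left empty for this form type."""
    profile = FORM_PROFILES[pub_type]
    return [name for name in profile["mandatory"] if not form_input.get(name)]


def to_record(form_input):
    """Build the stored record; the form profile is not consulted here."""
    return {name: value for name, value in form_input.items() if value}
```

The form profile is used only for validation at input time; `to_record` never looks at it, so there is a single record type behind all the forms.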

From the discussion in issue 1115 I now understand that `source` is really tailored to music, including some hard-coded stuff. I understand that this is the reason all the requested changes were made for `publications`. But from the issue mentioned I still read that they keep both; they would just suggest that the UAB sample implementations do everything that was done in `publications` in an actual `source` record.

Blacklight removed from Muscat

This sounds reasonable, but it is about the same idea the Invenio people had when they abandoned `search_engine.py` in favour of ElasticSearch.

More and more I wonder whether, either way, we won't just end up with a host of micro services replacing the current single stack that packages everything. And indeed whether we wouldn't be better off that way anyway. (The most extensive use of micro services I have seen is Folio. I understand that it fires up some 20+ docker containers to get going, and it still doesn't have a UI for mere mortals...)

Two systems is more complex than one

I agree with this disadvantage.

However, I wouldn't care at all if there is a concept of distribution wrapped around it. Nor do I care that my Linux currently consists of 3876 "systems" from `acl` to `zygrib-maps`.

Side note: I think one of the worst points is that CERN does not understand the need for a distribution for a beast like Invenio. (join2 spent quite a bit of time building this around Invenio1 by means of `InstallInvenio`, which was recently even dockerized. Still quite an endeavour of its own, but at least doable given that `InstallInvenio` existed.)

Using standard ids for authority ids

"UAB person ids" should most likely read "some sort of ID that is resolved to a person record (on the system)". IMHO this would give the most flexibility and can easily adopt any scheme, including ORCiD.

Note also that our authors may have a host of ids. E.g. we need to store InspireIDs to make author association work (without too much pain) for papers from Atlas or CMS. We currently assign only the internal one to our record but, if we have it, we export the ORCiD, e.g. to DataCite. So some juggling between them will stay with us for some years to come. (Still, not every record has a DOI, and this id has quite a history compared to ORCiD.)
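
As a rough sketch of "an ID that resolves to a person record", assuming a hypothetical `PersonRecord` structure (the class, its field names and the export shape are illustrative and not taken from join2 or Muscat):

```python
from dataclasses import dataclass, field


@dataclass
class PersonRecord:
    """Hypothetical authority record; all field names are illustrative."""
    internal_id: str
    name: str
    # scheme name -> identifier, e.g. "orcid" or "inspire"
    external_ids: dict = field(default_factory=dict)


def export_creator(person):
    """Prefer the ORCiD on export and fall back to the internal id."""
    orcid = person.external_ids.get("orcid")
    if orcid:
        return {"name": person.name, "scheme": "orcid", "id": orcid}
    return {"name": person.name, "scheme": "internal", "id": person.internal_id}
```

The bibliographic record would then only carry `internal_id`; which external id is sent to which consumer is decided at export time, so adding a new scheme does not touch the bibliographic records.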

Journal titles: numerical part of ISSN

While ISSNs are not as bad as ISBNs used to be, quite a few periodicals still have a host of them. (E.g. https://pubdb.desy.de/record/302363; yes, they assigned a new one at basically every subtitle change. For mere mortals all of them are "Phys Rev D". Full stop.) And not all ISSNs are unique, although this is indeed quite rare.

Did you check whether you could work with, say, ZDB-IDs? ZDB is the largest periodicals database (journals and series) today, and we have used it as the basis for modelling journals in join2 for the last decade. Recently we adopted it for our Russian friends as well. See our subset of ZDB e.g. at

https://pubdb.desy.de/collection/Periodicals

If a journal is missing at ZDB, we usually create it there first and then build our record based on the ZDB record. This is rarely the case. However, we went that way for a few small Russian journals that are published in Russian only, and we sometimes do so when one of our authors publishes in the very first issue of a new journal. I don't know about the coverage of e.g. specialized journals in Catalan, but it might be well worth a try.

(The general concept for journals within join2 is the same as for authors above: some ID that points to a record so said record can hold a lot more information, like not only one ISSN.)
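
A minimal sketch of that concept, assuming journal records keyed by some record id with a list of ISSNs each. The ids, titles and ISSNs below are made-up placeholders, and the record layout is hypothetical:

```python
# Placeholder data: one record per journal, each holding any number of ISSNs.
JOURNALS = {
    "J1": {"title": "Example Journal A", "issns": ["0000-0001", "0000-0002"]},
    "J2": {"title": "Example Journal B", "issns": ["0000-0002"]},
}


def build_issn_index(journals):
    """Map every ISSN to the ids of the journal records that carry it."""
    index = {}
    for journal_id, journal in journals.items():
        for issn in journal["issns"]:
            index.setdefault(issn, []).append(journal_id)
    return index
```

The index values are lists on purpose: an ISSN that is not unique shows up as an entry with more than one journal id, which can then be resolved by hand.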

Note that we regenerate our records every year to reflect changes in database coverage and to fix mistakes that can creep even into ZDB. The regeneration is based on all journals covered by Scopus, WoS, Pubmed, Academic Source, DOAJ and a few others, and allows for missing journals to be added manually if required. This process is entirely independent of Invenio's infrastructure. It just spits out Marc21 and, if need be, could create any other format with a reasonable amount of work (basically writing a mapper for Marc->something). Also note that we have some non-English journals with their proper names (e.g. https://pubdb.desy.de/record/462456), so in general Catalan should not be an issue. For the Russian journals, however, we supplied a list of proper names in Cyrillic to ZDB.

You may check whether it could serve you at https://zdb-katalog.de/list.xhtml (we might not have many Catalan journals on our end).

BTW: our enriched journal records would be available for harvesting, if you're interested, as well.

[p.12 onwards]

I miss a mandatory requirement for minting DOIs (or other IDs like Handle) and the general idea of full text handling for an OA repository.

Internal format must be Marc21.

I am unsure whether the internal format has to be Marc21. The system needs the richness of Marc21 and should be able to digest it. But if there is a 1:1 lossless conversion of Marc21 -> some internal format -> Marc21, I would not care too much. (Note that e.g. Folio does not store in Marc21 either, even though it is designed as an ILS suitable for hooking up with union catalogues. Also, most German union catalogues use PICA or Aleph's version of MAB2 as internal formats.)
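
What "lossless" would mean in practice can be checked with a round trip. The following is a sketch under assumptions: a Marc field is modelled as a plain `(tag, ind1, ind2, subfields)` tuple and the internal format as a list of dicts; both representations are hypothetical stand-ins, not what Muscat or Folio actually store.

```python
def marc_to_internal(fields):
    """Convert (tag, ind1, ind2, [(code, value), ...]) tuples to dicts."""
    return [
        {
            "tag": tag,
            "ind": ind1 + ind2,  # two one-character indicators
            "subfields": [{"code": c, "value": v} for c, v in subs],
        }
        for tag, ind1, ind2, subs in fields
    ]


def internal_to_marc(internal):
    """Inverse mapping; keeps field and subfield order and repetitions."""
    return [
        (f["tag"], f["ind"][0], f["ind"][1],
         [(s["code"], s["value"]) for s in f["subfields"]])
        for f in internal
    ]


def is_lossless(fields):
    """True if Marc -> internal -> Marc reproduces the input exactly."""
    return internal_to_marc(marc_to_internal(fields)) == fields
```

Running `is_lossless` over a representative sample of real records (repeated fields, repeated subfields, blank indicators) would be one way to turn the requirement into a test rather than a statement about the storage format.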

The system should have forms for different types of documents.

I do not believe that we can send the average user(tm) to the Muscat cataloguing backend. Therefore, this requirement includes a way to upload files, probably created by some different, form-based interface that can be handled by mere mortals (non-librarians). Imagining the submission interface as one service and the upload as another, one ends up in a sort of mini-service world.

Maybe this is what "The system should have forms for self-archiving" refers to. But then note that this is the default frontend for all workflows in join2, including STAFF processes.

The system should provide an url checker for the 856 links

Would be nice. However, most web servers don't return 404 any more but send you somewhere unrelated. IOW, maybe this is just not worth too much effort. (I admit that I personally treat the cataloguing of URLs in 856 as a waste of time, unless they are the URL expressions of some persistent ID. And then I'd use a dedicated field for that id so I don't have to rewrite thousands of records if http://dx.doi.org changes to https://doi.org.)
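
The dedicated-field idea can be sketched as follows: store only the bare DOI and derive the URL at display or export time. The function names and the example DOI are made up for illustration, and the sketch ignores URL-encoding of unusual characters in DOIs.

```python
# Single place to change if the resolver address moves again.
DOI_RESOLVER = "https://doi.org/"

# Resolver prefixes that may already sit in existing 856 fields.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_to_url(doi, resolver=DOI_RESOLVER):
    """Build the link from the stored DOI when the record is rendered."""
    return resolver + doi


def url_to_doi(url):
    """Extract the bare DOI from a resolver URL, or None if it is not one."""
    for prefix in KNOWN_PREFIXES:
        if url.startswith(prefix):
            return url[len(prefix):]
    return None
```

With this split, `url_to_doi` is only needed once, to migrate existing 856 links into the id field; afterwards a resolver change is a one-line edit to `DOI_RESOLVER` instead of a rewrite of thousands of records.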

The system should provide standard statistics.

Most of the "standard" statistics (how many downloads, etc.) are irrelevant for join2. OTOH our bean counting is very detailed.

Updated by Ferran Jorba almost 3 years ago · 6 revisions