How to create Invenio collections in Muscat¶
Collections, in Invenio, are several things:
- The result of a search, usually a 980 $a or $b value.
- The sum of the records of the subcollections, when a collection has subcollections.
- A permanent URL for this group of records, optionally with a portalbox containing some informative text.
- Given that this group of records is somewhat fixed, it allows for other features, like statistics.
The first one can be easily solved via facets, but not the second or the third. At UAB, for example, we have more than 400 collections for document types, but also research groups or special collections. We use hierarchical collections and virtual collections.
(aw-bib) Note: to model the second case one would need an authority-based search. This has been on join2's wish list for Invenio since join2 started out almost 10 years ago; it is tackled in Invenio 3 (Elasticsearch) by expansions, and one should check how it is done in the other systems under consideration. If this kind of search exists, the second point becomes a facet again. (As a side note, one might wish for some parameter like noexplode that does not fetch the children when an ID is called up, in other words a way to switch off the authority expansion. This is a more advanced concept that could be handled with another index that does not expand the authorities.) As this kind of search would solve a lot of (one might even be bold: almost all) problems with respect to searches and statistics/bean counting, and the way to work around its absence is quite long (as join2 can tell), one should make sure that the successor of Invenio 1 has this capability on board. aw-bib believes that points 1, 2 and 4 are solved with authority-based searches, while for 3 the only thing lacking is the display of some informative text. A route to tackle this could be to use the 680 field of the authority in question, which may hold the text and/or give a pointer to where to get it. At join2 we recently implemented this kind of thing quite successfully for the library system, e.g. to produce canned replies. (Details go well beyond this discussion, though.)
I think that each of these issues deserves a specific solution. For all of them, the goal is to minimize modifications to stock Muscat, ideally with no modification at all.
(aw-bib) fully agrees on aiming for no modification if possible.
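To make the idea of an authority-based search with an optional noexplode switch concrete, here is a minimal sketch. It is not Invenio or Muscat code: the authority hierarchy, the record links and the function names `expand` and `search` are all hypothetical and only illustrate the expansion step.

```python
from typing import Dict, List, Set

# Hypothetical authority hierarchy: authority id -> ids of its direct children.
CHILDREN: Dict[str, List[str]] = {
    "research": ["group-a", "group-b"],
    "group-a": [],
    "group-b": [],
}

# Hypothetical bibliographic records: record id -> authority ids it links to.
RECORDS: Dict[int, Set[str]] = {
    1: {"group-a"},
    2: {"group-b"},
    3: {"research"},
}


def expand(authority_id: str) -> Set[str]:
    """Return the authority id itself plus all of its descendants."""
    seen: Set[str] = set()
    stack = [authority_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited, also protects against loops
        seen.add(current)
        stack.extend(CHILDREN.get(current, []))
    return seen


def search(authority_id: str, noexplode: bool = False) -> List[int]:
    """Find records linked to an authority; skip the children if noexplode."""
    wanted = {authority_id} if noexplode else expand(authority_id)
    return sorted(rid for rid, ids in RECORDS.items() if ids & wanted)
```

With this toy data, `search("research")` also collects the records of both groups, while `search("research", noexplode=True)` returns only the record attached directly to the parent.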
In the bibliographic record¶
Which tag in Muscat is the equivalent of 980 in Invenio in Sources bibliographic record? (deprecated)¶
In order to have exhaustive knowledge of which tags RISM uses, I've examined the whole RISM sources catalog, which can be downloaded from https://opac.rism.info/main-menu-/kachelmenu/data.
980 is used for cataloging level, for example:
- 980 __ $a import
- 980 __ $a import $b brief
- 980 __ $a import $b brief $c examined
- 980 __ $a import $b full
- 980 __ $a import $b full $c examined
- 980 __ $a import $b standard
- 980 __ $a import $b standard $c examined
- 980 __ $a import $c examined
- 980 __ $a retro $b full $c examined
- 980 __ $a retro $b standard
- 980 __ $a retro $b standard $c examined
- 980 __ $a retro $b standard $c not_examined
- 980 __ $a RISM
- 980 __ $a RISM $b brief $c examined
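A tally like the one above can be produced with a few lines of Python. This is only a sketch: it assumes the downloaded dump is MARCXML in the standard MARC21 slim namespace, and the `SAMPLE` data and the function name `tally_980` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of MARCXML (MARC21 slim); assumed to match the downloaded dump.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Tiny made-up sample standing in for the real catalog file.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="980" ind1=" " ind2=" ">
      <subfield code="a">import</subfield>
      <subfield code="b">full</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="980" ind1=" " ind2=" ">
      <subfield code="a">import</subfield>
      <subfield code="b">full</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="980" ind1=" " ind2=" ">
      <subfield code="a">RISM</subfield>
    </datafield>
  </record>
</collection>"""


def tally_980(marcxml: str) -> Counter:
    """Count each distinct combination of 980 subfields in a MARCXML string."""
    root = ET.fromstring(marcxml)
    counts: Counter = Counter()
    for field in root.iterfind(".//marc:datafield[@tag='980']", MARC_NS):
        combo = " ".join(
            f"${sf.get('code')} {(sf.text or '').strip()}"
            for sf in field.iterfind("marc:subfield", MARC_NS)
        )
        counts[combo] += 1
    return counts


for combo, n in tally_980(SAMPLE).most_common():
    print(f"980 __ {combo}  ({n})")
```

For a dump of realistic size, `ET.iterparse` would be a better fit than loading the whole file into memory at once.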
The most obvious candidate is 852 (https://www.loc.gov/marc/holdings/hd852.html), because it mostly works without great modifications in Muscat. As Muscat's goal is to catalog unique manuscripts, 852 is non-repeatable. $a holds the library siglum, which is very similar to an Invenio collection id.
(aw-bib) Note that handling 852 as non-repeatable (I understand the whole field to be non-repeatable) causes quite a few issues. The immediate one would arise in an ILS context if you have more than one item. So I think one has to find a way to make it repeatable. If it is to be used for collections (which sounds sensible, as there is no difference between physical and electronic items at the cataloguing level), one should also bear in mind that (at least at join2) records always belong to many collections.
[Ferran] A possible solution would be to make some subfields repeatable, but maybe other (unexplored) alternatives exist.
(aw-bib) Note that it might be desirable to have more than one field to build collections, IOW not be restricted to have it in only one special field. E.g. at join2 we modelled a workflow by moving it through collections (in our case submission). This requires juggling with 980__. Later on we got use cases that are built orthogonal to this workflow (in our case e.g. apc handling, ils). Due to the sometimes quite complex juggling of 980 for submission we decided to do the other collection assignments on other fields. E.g. 9801_. Had we known back in the day, that invenio could build a collection on any tag, we would not have repeated publication types (journal article, book etc.) in 980__ but just used 3367_ that holds this information including unique ids already.
[Ferran] Muscat has specific workflow tools to review records, so I think that we should explore them without prejudice. I haven't mastered them yet, but my advice would be to learn what Muscat offers and evaluate it on its own terms, because these tools may work. Muscat also has some specific resources like folders, which can group any set of MARC records. They can also be a useful tool for many internal workflows. Probably not as (public) collections, because it seems that they have no relation to the (public) Blacklight interface; they are specific to the (internal) Muscat pages.
(aw-bib) It might well be worth exploring whether collections in Muscat employ what is called an index / logical field in Invenio-speak. This was the case in Invenio all along; we just thought collection:book was something magical. Maybe it is the same in Muscat, in which case one could just use appropriate index definitions and stick to the most standard-conformant MARC tags.
[Ferran] What Muscat calls collections are specific parent-to-child links in 773-774, not searches, e.g. http://www.rism-ch.org/catalog/400111146. There is a name conflict here, so we must be careful when using this word.
Here are a few examples of 852 usage in the RISM catalog. It seems to be the only tag where the ID is stored not in $0 but in $x:
- 852 __ $a US-CAe $c Mus 779.8.621 $e Harvard University, Eda Kuhn Loeb Music Library $x 30003169
- 852 __ $a US-CAe $c Mus 696.572.510 $e Harvard University, Eda Kuhn Loeb Music Library $x 30003169
- 852 __ $a F-Pn $c RES VM1-264 $e Bibliothèque nationale de France, Département de la Musique $x 30001488 $3 325369
- 852 __ $a A-LA $c 2377 $e Benediktinerstift, Musikarchiv $q No 1, 6, 7, 10 $x 30000354 $3 51017466
- 852 __ $a D-Mbs $c Mus.Schott.As 3647 $d Altes Schott-Archiv, Nr. 3669 $e Bayerische Staatsbibliothek $x 30000882
So, we see that:
- $a would be the collection siglum. It has an autocomplete feature; the resolved name is displayed in the next field.
- $x the institution authority ID.
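As an illustration of how the siglum and the institution ID could be pulled out of lines written in the notation used above, here is a small sketch. The helper names `parse_subfields` and `collection_key` are hypothetical, and the parser assumes that no subfield value itself contains the sequence " $".

```python
from typing import Dict, List, Optional, Tuple


def parse_subfields(line: str) -> Dict[str, List[str]]:
    """Split a line like '852 __ $a F-Pn $c ...' into {code: [values]}."""
    subfields: Dict[str, List[str]] = {}
    # Everything before the first ' $' is the tag and the indicators.
    for chunk in line.split(" $")[1:]:
        code, _, value = chunk.partition(" ")
        subfields.setdefault(code, []).append(value.strip())
    return subfields


def collection_key(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (siglum from $a, institution authority ID from $x)."""
    sf = parse_subfields(line)
    siglum = sf.get("a", [None])[0]
    institution_id = sf.get("x", [None])[0]
    return siglum, institution_id


EXAMPLE = (
    "852 __ $a F-Pn $c RES VM1-264 "
    "$e Bibliothèque nationale de France, Département de la Musique "
    "$x 30001488 $3 325369"
)
print(collection_key(EXAMPLE))
```

Grouping records by the first element of that pair would give one candidate collection per library siglum.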
Which tag in Muscat is the equivalent of 980 in Invenio in Secondary literature bibliographic record? (preferred)¶
The current tags used for secondary literature in the whole RISM database are:
- 001 36303
- 003 36303
- 005 36303
- 020 540
- 022 172
- 024 114
- 041 10528
- 044 1660
- 100 6960
- 210 35375
- 240 12959
- 250 35
- 260 31065
- 264 91
- 300 4448
- 337 11881
- 500 9563
- 502 196
- 520 328
- 650 2177
- 651 2261
- 700 13010
- 710 2632
- 730 167
- 760 5952
- 780 48
- 785 39
- 856 891
So, 980 is not used here. As secondary literature is a different bibliographic record from sources, we could, in principle, reuse 980. (To expand)
Virtual collections¶
In principle, 852 $b seems the easiest one, with some caveats:
- in stock Muscat, it is non-repeatable. To make it repeatable, change "?" to "*" in :occurrences: of $b in https://github.com/rism-ch/muscat/blob/develop/config/marc/tag_config_source.yml#L757.
- it is not indexed.
- it does not have the autocomplete feature like $a.
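The effect of switching the occurrence marker can be sketched as follows. This is not Muscat's validation code: it assumes only what the note above implies, namely that "?" means at most one occurrence and "*" means any number, and the function name and sample values are invented for the example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Assumed meaning of the markers (inferred from the note above, not from
# Muscat's source): "?" = at most one occurrence, "*" = any number.
MAX_OCCURRENCES: Dict[str, Optional[int]] = {"?": 1, "*": None}


def repeated_too_often(
    subfields: List[Tuple[str, str]], occurrences: Dict[str, str]
) -> List[str]:
    """Return the subfield codes that occur more often than allowed."""
    counts = Counter(code for code, _ in subfields)
    offenders = []
    for code, n in counts.items():
        # Codes without a configured marker are treated as unrestricted.
        limit = MAX_OCCURRENCES[occurrences.get(code, "*")]
        if limit is not None and n > limit:
            offenders.append(code)
    return sorted(offenders)


# A made-up 852 carrying two virtual collection codes in $b.
FIELD_852 = [("a", "CH-Zz"), ("b", "rare-books"), ("b", "research-group-x")]

print(repeated_too_often(FIELD_852, {"a": "?", "b": "?"}))  # stock setting
print(repeated_too_often(FIELD_852, {"a": "?", "b": "*"}))  # after the change
```

Under these assumptions the stock configuration flags $b, and the modified one accepts the field.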
(aw-bib) This hints at an IMHO important point that probably needs to be explored in more depth. Why doesn't $b have autocomplete? What would be required for it to get it? In other words: do we have autocomplete if and only if stock Muscat built it into the code in some uncharted backwaters we never want to dive into, or is this configurable? I think this question is quite crucial, as (at least at join2) we have a host of fields that can be authority controlled in principle, but hardly ever are in other systems. E.g. 773__ (journal pub note) is authority controlled in our case with respect to subfields 0, 2, t and x; in addition, setting 773__ triggers the addition of fully authority-controlled fields 9151_ (statistics keys, derived from the journal record) and some other non-controlled fields like 0247_$2ISSN. This kind of "if I set X, something more happens to the record" behaviour (join2 has this only in websubmit, not in bibedit, to use some Invenio slang again) helps a lot with data consistency.
[Ferran] It seems that $b has no autocomplete feature because Muscat currently accepts only a single authority-based subfield per tag; so, if $a has it, no other subfield can. I saw that in a quite explicit error message; sorry, I don't recall the exact wording.
Parent collections¶
The previous paragraph deals with the leaf collections. How do we create parent collections?
One option would be to explicitly set those collections in some 852 subfield. In a way, Invenio webcoll does that: it calculates and stores the recids that belong to each collection. We could store them in some 852 subfield other than $a or $b, like $c (Shelving location (R); Former shelving location (R) is $d, see: https://www.loc.gov/marc/bibliographic/bd852.html). That way we could easily know which collections a record belongs to.
(aw-bib) Note that this would drop what is IMHO a good reason to have collections in Invenio: that the parents collect all their children. Couldn't this be done in indexing? Say we use the proposed outline with $a and $b, where those two are not just random strings but IDs of authority records: couldn't we just employ vertical links in the authority records to create the collection tree and associate records accordingly? (Note that at join2 we use this strategy to build the major part of the collection tree. That each parent collects the records in its child collections is then done by Invenio, but this is just what would be solved by an authority-based search: the parent would explode, get all its children and collect their records.)
[Ferran] It seems sensible; but at UAB we are mostly happy with how collections work in classic Invenio. Our setup is: the primary collection (980 $a) is for document type (article, book, musical score, picture, etc.), and virtual collections (980 $b) are for research groups or specific material groupings. We quite happily add (and sometimes remove) 980 $b codes, and records magically appear in or disappear from (virtual) collections.
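The UAB arrangement just described can be modelled in a few lines. This is a sketch with made-up record ids and codes, not Invenio's webcoll: it only shows that membership is derived from the 980 $a and $b values each time the collections are rebuilt.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Made-up records: record id -> list of (subfield code, value) from 980.
RECORDS: Dict[int, List[Tuple[str, str]]] = {
    101: [("a", "article"), ("b", "group-x")],
    102: [("a", "book")],
    103: [("a", "article"), ("b", "group-x"), ("b", "special-y")],
}


def build_collections(
    records: Dict[int, List[Tuple[str, str]]]
) -> Tuple[Dict[str, Set[int]], Dict[str, Set[int]]]:
    """Derive primary ($a) and virtual ($b) collections from 980 values."""
    primary: Dict[str, Set[int]] = defaultdict(set)
    virtual: Dict[str, Set[int]] = defaultdict(set)
    for recid, subfields in records.items():
        for code, value in subfields:
            if code == "a":
                primary[value].add(recid)
            elif code == "b":
                virtual[value].add(recid)
    return dict(primary), dict(virtual)


primary, virtual = build_collections(RECORDS)
print(primary)
print(virtual)
```

Removing the `("b", "group-x")` entry from record 101 and rebuilding makes that record drop out of the virtual collection, which is the "magic" mentioned above.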
(aw-bib) Note also that we found (over time) that some of the "trees" we were asked to model are actually thickets. That is, of course one parent has many children, but the children also have more than one parent. Plus, recently some imaginative minds invented entities that don't live on the leaves, but only on some junction. (If they take this seriously it will break a lot of IT systems, though ;)
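A "thicket" of this kind is a graph in which a node may have several parents, so the set of collections a record belongs to has to be computed as an upward closure rather than by following a single path. The sketch below uses a hypothetical `PARENTS` mapping and invented collection names.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical structure: collection -> its direct parents (possibly several).
PARENTS: Dict[str, List[str]] = {
    "group-x": ["faculty-a", "institute-b"],
    "faculty-a": ["university"],
    "institute-b": ["university"],
    "university": [],
}


def all_collections(assigned: Iterable[str]) -> Set[str]:
    """Return the assigned collections plus every ancestor, each one once."""
    result: Set[str] = set()
    stack = list(assigned)
    while stack:
        current = stack.pop()
        if current in result:
            continue  # reached via another parent already
        result.add(current)
        stack.extend(PARENTS.get(current, []))
    return result


print(sorted(all_collections(["group-x"])))
print(sorted(all_collections(["faculty-a"])))  # entity living on a junction
```

Because visited nodes are skipped, a record reaches "university" only once even though two paths lead there, and records attached to a junction rather than a leaf are handled by the same function.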
[Ferran] I don't know, sorry.
(aw-bib) How is access handled? In other words, do we have different rights on collections? Does someone who has access to some collection automatically get access to its children? Or is this / can this be manual? (We found some limitations here in our current use at join2. Again, one could model access on authorities and base everything on people authority records, but when we started out we didn't do this. "Short cuts make long delays", as Tolkien put it in the mouth of Pippin.)
[Ferran] As far as I know, a bibrecord is either public (published) or private (not published). Published records are public for everybody; they are accessible in the (public) Blacklight interface. Unpublished records are accessible only in the (private) Muscat interface. Records have owners (cataloguers), and only cataloguers of the same workgroup (library), or with more privileges can edit records. But I haven't properly seen any access restriction like Invenio when records are published.
Collection management¶
Collection hierarchies¶
In Invenio, there are collections that have subcollections, and so on. Those top-level collections are the sum of the records of their subcollections. How do we solve this in Muscat?
How do we use them? Are they just a visualization issue?
(aw-bib) Given an extensive tree there is also the question: How do we create it? How do we manage changes? As outlined above, at join2 we use authority records that give the structure, plus a not too complex but also not entirely trivial piece of code that climbs through the authority tree and triggers creation etc.
Portalboxes¶
There are several aspects about portalboxes.
Where do we store the texts and images? In Muscat itself? As is now, or do we extend it? Or do we store them externally?
In my tests, I have successfully created multilingual institution authority records using this information. I have even filled the 678 field with multilingual portalboxes. I chose 678 because it is repeatable and can be as long as we want. The only problem is that Muscat seems to reformat the text, so all formatting (lists, links, etc.) is lost. Maybe this can be solved with some wiki-like markup. I'll attach an example.
(aw-bib) It might prove worthwhile to fetch the content from some system-independent source where it is easy for STAFF to manage. The 678 idea outlined above is the obvious one. However, curating HTML in a MARC record (while keeping it valid ;) is a bit cumbersome. A simple idea might be to store just the URL where the info can be fetched and point to some wiki: keep the info in the wiki, where it is also easy to curate, and pull it in via XHR or HTML5 imports. (I played with the latter recently; it might be a bit too new for most browsers, so XHR in JavaScript is probably the sweet spot for now and could be replaced later.) Ideally, the wiki would have a way to serve only the payload and not the decoration.

Another aspect is how we display them on screen.
- One option could be http://spotlight.projectblacklight.org/
- Another one, http://thoughtbot.github.io/high_voltage/
- Generic Rails solutions (including high_voltage): https://stackoverflow.com/questions/3992760/static-pages-in-rails
End user interface¶
Collection tree display¶
One distinctive feature of classic Invenio installations is the collection tree that is displayed on the front page, or in each branch of real or virtual collections.
In Invenio, the order of this tree is done manually, so each administrator can decide which collections appear at top or bottom of the screen. How do we solve it in Muscat?
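One possible way to keep a manually chosen order is to store an explicit position next to each collection and sort siblings by it when rendering. The sketch below is only an illustration: the `Collection` class, its `position` attribute and the sample tree are hypothetical and not taken from Muscat or Blacklight.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collection:
    name: str
    position: int = 0  # set by hand by the administrator; lower comes first
    children: List["Collection"] = field(default_factory=list)


def render(node: Collection, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, siblings sorted by position."""
    lines = ["  " * depth + "- " + node.name]
    for child in sorted(node.children, key=lambda c: c.position):
        lines.extend(render(child, depth + 1))
    return lines


TREE = Collection(
    "Home",
    children=[
        Collection("Books", position=2),
        Collection(
            "Articles",
            position=1,
            children=[Collection("Preprints", position=1)],
        ),
    ],
)

print("\n".join(render(TREE)))
```

Here "Articles" is shown before "Books" because of its lower position, regardless of the order in which the children were entered.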
Updated by Ferran Jorba more than 5 years ago · 15 revisions