Task #3816
closed
Create a sitemap to improve access from Google (not Google Scholar)
Added by Ferran Jorba about 10 years ago. Updated almost 7 years ago.
Description
We have been contacted personally by Google Scholar, and for now they are asking us to provide a sitemap. The email reads as follows:
Subject: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Tue, 19 Jan 2016 16:22:46 -0800
From: Darcy Dapra <darcyd@google.com>
To: digital.utp@uab.cat
Dear UAB Digital Library folks,
I hope that this message finds you well. My name is Darcy Dapra, and I work
closely with the Google Scholar engineering team on outreach to the
scholarly-publishing and library communities.
The Scholar engineers noticed that our system is having some trouble
discovering new articles/papers within the UAB repository (ddd.aub.cat) for
indexing, and they're wondering if it might be possible to add a sitemap/
htmlmap to the individual records' landing pages, e.g.
http://ddd.uab.cat/record/144996
within your robots.txt file? And then, once added, would it be possible to
send me email?
The sitemap would help to ensure that papers, as they're posted to the
repository, are able to be findable as quickly as possible in Google Scholar
search results.
If you have questions or need more information, then please do not hesitate to
let me know.
Many thanks, and I look forward to hearing from you,
Darcy
Darcy Dapra | Product Partnerships | Google Scholar | darcyd@google.com
Files
| MetatagsforBibliographicMetadataGoogleScholar-GUIDELINES.pdf (369 KB) | Ferran Jorba, 26-01-2016 11:33 |
Related tasks 6 (0 open, 6 closed)
FJ Updated by Ferran Jorba about 10 years ago Actions #1
From: Ferran Jorba <Ferran.Jorba@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Wed, 20 Jan 2016 16:55:00 +0100
To: Darcy Dapra <darcyd@google.com>
CC: Cristina Azorín <Cristina.Azorin@uab.cat>
Dear Darcy,
thank you for contacting us regarding UAB institutional repository
http://ddd.uab.cat (please note: not http://ddd.aub.cat).
You are right to point us to the recommendation about a sitemap file.
This is well documented in your recommendations
(https://scholar.google.com/intl/en/scholar/inclusion.html), that we
are keen to follow. Please give us some time to generate it and
we'll keep you informed. We may have some questions afterwards.
Best regards,
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
Ferran.Jorba@uab.cat
FJ Updated by Ferran Jorba about 10 years ago Actions #2
From: Darcy Dapra <darcyd@google.com>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Wed, 20 Jan 2016 10:26:59 -0800
To: Ferran Jorba <Ferran.Jorba@uab.cat>
Cc: Cristina Azorín <Cristina.Azorin@uab.cat>
Thank you very much for your kind, quick response, Ferran--they're much
appreciated! Please let me know if you have any questions as you generate
your sitemap; I'm glad to help in any way that I can.
Thanks again, and I look forward to hearing from you soon,
Darcy
Darcy Dapra | Product Partnerships | Google Scholar | darcyd@google.com
FJ Updated by Ferran Jorba about 10 years ago Actions #3
From: Ferran Jorba <Ferran.Jorba@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Thu, 21 Jan 2016 09:50:33 +0100
To: Darcy Dapra <darcyd@google.com>
CC: Cristina Azorín <Cristina.Azorin@uab.cat>
Dear Darcy,
as I found out yesterday, the software we use for our institutional
repository, Invenio (http://invenio-software.org/, written mainly at
CERN), has an utility to generate sitemaps, but for version 1.1.3,
while we are still at 1.1.2. I'll try to backport this utility for
our site.
Thanks again for your interest,
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
Ferran.Jorba@uab.cat
FJ Updated by Ferran Jorba about 10 years ago Actions #4
- Status changed from Created to In progress
I had a hard time configuring Invenio's sitemap generation program. It kept giving me an error, which I was finally able to solve because someone else had hit the same problem, something I did not remember. It is all explained at:
https://www.mail-archive.com/project-invenio-general%40cern.ch/msg00624.html
That is, the expJOB table must first be filled with some values for the job to work. For example, before running it the table is empty:
ddd@homs:~$ echo "select * from expJOB;" | dbexec
ddd@homs:~$
According to Tibor, it should contain these values:
$ grep expJOB ~/download/invenio/invenio-1.1.2/modules/miscutil/demo/democfgdata.sql
INSERT INTO expJOB (jobname) VALUES ('sitemap');
INSERT INTO expJOB (jobname) VALUES ('googlescholar');
INSERT INTO expJOB (jobname) VALUES ('marcxml');
So let's insert them into the table:
$ grep expJOB ~/download/invenio/invenio-1.1.2/modules/miscutil/demo/democfgdata.sql | dbexec
$ echo "select * from expJOB;" | dbexec
id jobname jobfreq output_format deleted lastrun output_directory
1 sitemap 0 0 0 0000-00-00 00:00:00 NULL
2 googlescholar 0 0 0 0000-00-00 00:00:00 NULL
3 marcxml 0 0 0 0000-00-00 00:00:00 NULL
The collections to export must also be configured. The default values are:
$ cat ~/invenio/etc/bibexport/sitemap.cfg
[export_job]
export_method = sitemap
collection1 = Articles
collection2 = Preprints
collection3 = Reports
#fulltext_status = restricted_picture
fulltext_status = restricted_picture
I don't quite understand this fulltext_status setting; I ran a few tests on ddd-test and I still don't really know what it means, although the job does not work without this field: it reports it as mandatory. For now I have generated it like this:
$ cat ~/invenio/etc/bibexport/sitemap.cfg
[export_job]
export_method = sitemap
collection1 = artpub
collection2 = docrec
collection3 = jorcon
fulltext_status = restricted_picture
With that, it works:
ddd@homs:~/invenio$ bibexport -w sitemap -s1d -u admin
BibExport Task Submission
=========================
Username: admin
Password:
2016-01-21 16:39:47 --> Task #291933 submitted.
The job ran in less than a minute and generated three files:
- http://ddd.uab.cat/sitemap-index.xml (this one is very small)
- http://ddd.uab.cat/sitemap-1.xml (careful, this one is very large!)
- http://ddd.uab.cat/sitemap-2.xml (this one is medium-sized)
Right now, then, these three files will be updated every day, and I suppose their size will keep growing. We can discuss which collections we export via the sitemap, but for now I have included the ones that seemed most reasonable for Google Scholar (artpub, docrec and jorcon).
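For reference, the layout of the generated files (one small index pointing at numbered sitemap files) can be illustrated with a minimal sketch. This is not Invenio's bibexport code: the function name build_sitemaps and its parameters are hypothetical, and the only thing assumed about the real files is that they follow the public sitemaps.org format (a sitemapindex file referencing urlset files).

```python
from xml.sax.saxutils import escape

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemaps(urls, base_url, max_urls_per_file=50000):
    """Return {filename: xml_text}: numbered sitemap files plus one index.

    Hypothetical helper for illustration only; max_urls_per_file is the
    number of record URLs written to each sitemap-N.xml file.
    """
    base = base_url.rstrip("/")
    files = {}
    names = []
    for start in range(0, len(urls), max_urls_per_file):
        name = "sitemap-%d.xml" % (len(names) + 1)
        body = "".join(
            "  <url><loc>%s</loc></url>\n" % escape(url)
            for url in urls[start:start + max_urls_per_file]
        )
        files[name] = '%s<urlset xmlns="%s">\n%s</urlset>\n' % (
            XML_HEADER, SITEMAP_NS, body)
        names.append(name)
    # The index lists every numbered file by absolute URL.
    index_body = "".join(
        "  <sitemap><loc>%s/%s</loc></sitemap>\n" % (base, name)
        for name in names
    )
    files["sitemap-index.xml"] = '%s<sitemapindex xmlns="%s">\n%s</sitemapindex>\n' % (
        XML_HEADER, SITEMAP_NS, index_body)
    return files
```

Lowering max_urls_per_file produces more, smaller files, while the index stays the single entry point for crawlers.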
FJ Updated by Ferran Jorba about 10 years ago Actions #5
From: Ferran Jorba <Ferran.Jorba@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Thu, 21 Jan 2016 17:53:40 +0100
To: Darcy Dapra <darcyd@google.com>
CC: Cristina Azorín <Cristina.Azorin@uab.cat>
Dear Darcy,
I hope I've completed our sitemap configuration. Invenio sitemap
utility has generated three files:
http://ddd.uab.cat/sitemap-index.xml (a very small file)
http://ddd.uab.cat/sitemap-1.xml (warning! this one is quite large!)
http://ddd.uab.cat/sitemap-2.xml (you can peek better this one)
Those xml files will be updated daily. Please note that at this
moment we've exported only the more clearly scholarly collections of
our site, but it may change as we discuss it with the library.
If I understand well the (sparse) instructions, and hoping that the
CERN people have digged into the details of those conventions, we only
need to reference the sitemap-index.xml in our robots.txt file, like
this:
Could you please confirm us that this is correct?
We have also found a googlescholar export utility that I haven't been
able to successfully run yet, but I'll try to understand the following
days:
Thanks again,
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
Ferran.Jorba@uab.cat
FJ Updated by Ferran Jorba about 10 years ago Actions #6
From: Darcy Dapra <darcyd@google.com>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Thu, 21 Jan 2016 10:39:04 -0800
To: Ferran Jorba <Ferran.Jorba@uab.cat>
Cc: Cristina Azorín <Cristina.Azorin@uab.cat>
Hi Ferran,
Thanks very much for your notes, and I'm glad to hear that you were able to
generate the sitemaps in spite of the Invenio versioning! That's great. I've
asked the Scholar engineers to confirm that access is good, etc. (it looks like
it to me, but they'll double-check), and I hope to have confirmation for you
soon.
Regarding the sitemap in the robots.txt file, would it be possible to include
the absolute URL for the sitemap index file rather than the current relative
one? So, ideally, it would be:
Sitemap: http://ddd.uab.cat/sitemap-index.xml
Thanks again for your help, and I'll be back in touch soon,
Darcy
Darcy Dapra | Product Partnerships | Google Scholar | darcyd@google.com
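A quick way to check the point about absolute versus relative sitemap references is to read the Sitemap lines out of robots.txt and test each value. The sketch below is only an illustration: the helper names sitemap_urls and is_absolute are hypothetical, and the robots.txt content is passed in as a string rather than fetched from the site.

```python
from urllib.parse import urlparse


def sitemap_urls(robots_txt):
    """Return the values of all 'Sitemap:' lines in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def is_absolute(url):
    """True when the URL carries both an http(s) scheme and a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For example, a line such as "Sitemap: /sitemap-index.xml" would be reported as relative, while "Sitemap: http://ddd.uab.cat/sitemap-index.xml" passes the check.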
FJ Updated by Ferran Jorba about 10 years ago Actions #7
From: Darcy Dapra <darcyd@google.com>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Thu, 21 Jan 2016 11:15:26 -0800
To: Ferran Jorba <Ferran.Jorba@uab.cat>
Cc: Cristina Azorín <Cristina.Azorin@uab.cat>
Hi again, Ferran,
The Scholar engineers confirmed just now that all looks good (thank you!), but
you're right, the sitemap-1.xml file is very, very large (8.9MB), and they're
wondering if you can break it apart into two additional files, that are each
less than 5MB?
Otherwise, the Scholar system will have difficulty making its way through the
file, and might miss some of the URLs included.
(This recommendation is in addition to adding the absolute URL for your sitemap
index file to the robots.txt.)
Many thanks, and I look forward to hearing from you,
Darcy
Darcy Dapra | Product Partnerships | Google Scholar | darcyd@google.com
FJ Updated by Ferran Jorba about 10 years ago Actions #8
From: Ferran Jorba <Ferran.Jorba@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Fri, 22 Jan 2016 08:37:37 +0100
To: Darcy Dapra <darcyd@google.com>
CC: Cristina Azorín <Cristina.Azorin@uab.cat>
Dear Darcy,
I hope both requests are now completed: there is an absolute URL to
the sitemap-index in the http://ddd.uab.cat/robots.txt file, and the
individual files are smaller: I've reduced the number of records for
each file from 50000 to 10000, and now they are about 1.7-1.8 MB each.
As a matter of fact, I didn't have to do any backport from this
utility. We already have it in our installed version, but it was not
documented, or I missed to spot it before your request.
On the other site, I'll investigate the other googlescholar export
method that Invenio provides. Which is more relevant for Google
Scholar, sitemap or googlescholar xml export?
Again, thanks very much for your request and support.
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
Ferran.Jorba@uab.cat
FJ Updated by Ferran Jorba about 10 years ago Actions #9
- Status changed from In progress to Closed
From: Darcy Dapra <darcyd@google.com>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Fri, 22 Jan 2016 16:27:44 -0800
To: Ferran Jorba <Ferran.Jorba@uab.cat>
Cc: Cristina Azorín <Cristina.Azorin@uab.cat>
Hi Ferran,
Thanks very much for your note! Please see inline below:
On Thu, Jan 21, 2016 at 11:37 PM, Ferran Jorba <Ferran.Jorba@uab.cat> wrote:
Dear Darcy,
I hope both requests are now completed: there is an absolute URL to
the sitemap-index in the http://ddd.uab.cat/robots.txt file, and the
individual files are smaller: I've reduced the number of records for
each file from 50000 to 10000, and now they are about 1.7-1.8 MB each.
Fantastic--thanks very much for making these changes so quickly! I've asked
the Scholar engineers to confirm, and if they notice anything that might need
adjusting, then I'll let you know.
As a matter of fact, I didn't have to do any backport from this
utility. We already have it in our installed version, but it was not
documented, or I missed to spot it before your request.
Ah, interesting and no worries! :)
On the other site, I'll investigate the other googlescholar export
method that Invenio provides. Which is more relevant for Google
Scholar, sitemap or googlescholar xml export?
The key setups for Scholar indexing are (1) the sitemap, and (2) bibliographic
metadata for each publication added as "citation_XX" metatags in the HTML
record landing pages. As long as these are in place for your repository (and
it appears that they are!), then indexing should proceed without issue, and
nothing else is needed.
Again, thanks very much for your request and support.
Likewise, and I hope that you have a great weekend!
Darcy
CA Updated by Cristina Azorin about 10 years ago Actions #10
- Keyword set to JR
CA Updated by Cristina Azorin about 10 years ago Actions #11
- Due date set to 25-01-2016
FJ Updated by Ferran Jorba about 10 years ago Actions #12
From: Ferran Jorba <Ferran.Jorba@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Mon, 25 Jan 2016 13:11:08 +0100
To: Darcy Dapra <darcyd@google.com>
CC: Cristina Azorín <Cristina.Azorin@uab.cat>
Dear Darcy,
while we were discussing this with the library, we have a couple of
doubts that, if you could answer us, we would be very grateful.
First, is there an authoritative list of Highwire Press meta tags as
understood by Google Scholar? This question arises often among
digital repository admins, and there is no definitive answer:
https://www.google.cat/search?q=highwire+press+meta+tags
The second is: we have some bibliographic records in our repository
with the fulltext embargoed for some time. It is not currently
possible to take those records out the automatically generated sitemap
list. Does it harm, from Google Scholar point of view. If it does,
what should we preferably do with them?
Thanks again,
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
Ferran.Jorba@uab.cat
FJ Updated by Ferran Jorba about 10 years ago Actions #13
- File MetatagsforBibliographicMetadataGoogleScholar-GUIDELINES.pdf added
From: Darcy Dapra <darcyd@google.com>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Mon, 25 Jan 2016 13:02:59 -0800
To: Ferran Jorba <Ferran.Jorba@uab.cat>
Cc: Cristina Azorín <Cristina.Azorin@uab.cat>
Hi Ferran,
Thanks for your note, and I'm glad to help!
Please see comments inline below:
On Mon, Jan 25, 2016 at 4:11 AM, Ferran Jorba <Ferran.Jorba@uab.cat> wrote:
Dear Darcy,
while we were discussing this with the library, we have a couple of
doubts that, if you could answer us, we would be very grateful.
First, is there an authoritative list of Highwire Press meta tags as
understood by Google Scholar? This question arises often among
digital repository admins, and there is no definitive answer:
While we don't as of yet have an authoritative list published outside of our
Inclusion Guidelines, I'm attaching a document* that we recently created that
includes instructions for the "citation_XX" metatags that might be good to
include within repository scholarly research paper content.
Note that if you have other non-scholarly publications within your repository,
e.g. photos, videos, posters, etc., then please remove the metatags from those
items' record landing pages. This includes not only the "citation_XX" tags,
but also any DC or other type of metatag containing bibliographic metadata.
This ensures that those records still can be indexed in main Google (but not
in Scholar, which focuses on research papers, dissertations, chapters from
edited book volumes, technical reports, meeting abstracts, etc.).
*Because the "citation_XX" metatag document is evolving/not quite ready for
public primetime, it would be great if you could keep it within UAB for now.
Thanks!
The second is: we have some bibliographic records in our repository
with the fulltext embargoed for some time. It is not currently
possible to take those records out the automatically generated sitemap
list. Does it harm, from Google Scholar point of view. If it does,
what should we preferably do with them?
If the full text for any of the UAB repository content is not viewable to all
users, then please block it to the Google crawler, either via robots or by
adding an "X-Robots-Tag: noindex" directive to the HTTP header of the file
(this works well for PDFs). And then once the content is available, you could
remove the block so that it can be fully indexed.
I hope that this helps, but if you have questions or need more information,
then please do not hesitate to let me know.
Thanks again and take care,
Darcy
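The "X-Robots-Tag: noindex" suggestion for embargoed full text can be sketched as a small decision function. This is an illustration only: how the repository stores embargo dates is not described in this task, so the embargo_until argument and the function name crawler_headers are assumptions; the header name and value are the ones quoted in the email.

```python
from datetime import date


def crawler_headers(embargo_until, today):
    """Return extra HTTP response headers for a full-text file.

    embargo_until is a hypothetical per-file date (or None when the file
    is openly available). While the embargo is active the crawler is asked
    not to index the file; afterwards no blocking header is sent.
    """
    if embargo_until is not None and today < embargo_until:
        return {"X-Robots-Tag": "noindex"}
    return {}
```

Once the embargo date has passed, the function returns no extra header, which matches the advice to remove the block when the content becomes available.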
FJ Updated by Ferran Jorba almost 10 years ago Actions #15
Thanks, Ferran. See additional comments following:
On Tue, Jan 26, 2016 at 8:44 AM, Ferran Jorba <Ferran.Jorba@uab.cat> wrote:
Hi Darcy,
thanks a lot for your draft document. We'll send our comments in a
few days.
Great--thanks!
Regarding another issue you raised, yes, I still have doubts. In our
institutional repository we have indeed scholarly and non-scholarly
articles. In the sitemap files we have exported just our scholarly
bibliographic records, that was easy (in our setup). But it is not
possible, I'm afraid, to remove citation_XXX tags for the
non-scholarly records, as they are generated by the software using the
same functions.
So, my question is: is it enough for Google Scholar to differentiate
them given that they are in the sitemap files?
Unfortunately the sitemaps are only additive and can't be used to restrict crawl to certain areas of your site.
We understand that this citation_XXX tags may be useful for other
purposes beyond Google Scholar. We expose bibliographic data in a
format easy to parse, so it may be reused for any third party. It
would be a pity that external users could not import this information
just because Google Scholar flags those records for its own use.
I completely understand. It certainly is up to you and your library needs to keep the tags within non-scholarly content; note that if any non-scholarly content is flagged by the Scholar system for falling outside of indexing guidelines, then it might cause removal of your repository from search results--this is why it would be good to remove the tags from the landing pages for video, ppt., images, etc. But again, this would be your decision. Do you perhaps have a separate interface, e.g. OAI-PMH where other services might glean the data?
We appreciate very much the time you are taking to help us.
My pleasure! Please let me know if you have further questions--I'm glad to help.
Cheers,
Darcy
FJ Updated by Ferran Jorba almost 10 years ago Actions #16
From: Darcy Dapra <darcyd@google.com>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Tue, 9 Feb 2016 21:25:59 -0800
To: Ferran Jorba <Ferran.Jorba@uab.cat>
Cc: Cristina Azorín <Cristina.Azorin@uab.cat>
Hi Ferran,
I hope that you're doing well and that you had a nice weekend!
The Scholar engineers were reviewing your repository sitemaps recently, and they noticed that there seem to be a number of article landing pages missing; see for example the following URLs:
http://ddd.uab.cat/record/136370
http://ddd.uab.cat/record/141983
http://ddd.uab.cat/record/135067
http://ddd.uab.cat/record/135556
http://ddd.uab.cat/record/134013
http://ddd.uab.cat/record/135635
http://ddd.uab.cat/record/134560
http://ddd.uab.cat/record/134609
http://ddd.uab.cat/record/134977
http://ddd.uab.cat/record/135224
http://ddd.uab.cat/record/135317
http://ddd.uab.cat/record/134974
http://ddd.uab.cat/record/134954
http://ddd.uab.cat/record/135125
http://ddd.uab.cat/record/136369
http://ddd.uab.cat/record/136067
http://ddd.uab.cat/record/135301
http://ddd.uab.cat/record/136042
http://ddd.uab.cat/record/135783
Would it be possible to have a look, and to ensure that all of your papers' URLs are included in the sitemaps?
Also, if you're not doing so already, perhaps you could automatically generate the sitemap on a daily basis?
This would help a lot to ensure that the newest papers are indexed in Scholar as quickly as possible, within a few days' time.
Many thanks, and I look forward to hearing from you,
Darcy
FJ Updated by Ferran Jorba almost 10 years ago Actions #17
From: Ferran Jorba <Ferran.Jorba@uab.cat>
Organization: Universitat Autonoma de Barcelona
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Tue, 29 Mar 2016 11:40:15 +0200
To: Darcy Dapra <darcyd@google.com>
CC: Cristina Azorín <Cristina.Azorin@uab.cat>
Dear Darcy,
thanks for asking us about those records. The explanation is that our
repository has scholarly records and non scholarly records, like
personal papers, old digitalized books, journal or magazines, political
posters, photos, UAB administrative documents, etc.
We have selected some of the collections of our repository and we have
configured the sitemap tool to expose only records belonging to the
scholarly collections. We know that this decision may affect robots
other than Google Scholar, but at this moment we are not aware of other
robots that use the sitemap files for reviewing our site.
Summing it up: our sitemap files expose at this moment more or less
half of our records (53,278 out 134,881), those that belong to scholarly
collections.
Do you know if the other Google robots (web, images and videos) limit
the crawl to our site if it has a sitemap file that shows only part of
our records?
Thanks and best regards,
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
Ferran.Jorba@uab.cat
FJ Updated by Ferran Jorba almost 10 years ago Actions #18
- Status changed from Closed to In progress
- Priority changed from Normal to High
We have created an OAI set named scholar with the content of the collections that may interest Google Scholar and similar robots. Once it has been generated, we will write to Darcy in case she can use it, as one of her replies seems to suggest.
CA Updated by Cristina Azorin almost 10 years ago Actions #19
21/04/2016
Hi Ferran,
Thanks for your note! For the most comprehensive Scholar indexing of your repository, as previously mentioned, it would be great if you might be able to include the "citation_XX" metatags in the scholarly research papers' records (those that fall within the indexing guidelines for Scholar), and then to remove metatags (all) completely from the pages that contain 'non-scholarly' content, e.g. videos, PPTs, images, etc.
And then, if you were able to open up the metadata for all records--no exclusions necessary!--via the OAI-PMH interface, then other services might be able to download the data that they would need for the entire collection. Does that sound like a feasible/reasonable thing to do?
That way, you would also be able to include the complete set of URLs from your repository in the current sitemap, too (and you wouldn't need to restrict it to certain segments of content). Our system would then be able to handle all of the URLs without issue--with the scholarly papers well-indexed in Scholar, and the other content indexed well in main Google.
I hope that this helps, but if you have further questions or need more information, then please do not hesitate to let me know.
Thanks again, and I look forward to hearing from you!
Darcy
CA Updated by Cristina Azorin almost 10 years ago Actions #20
We should reply to Darcy explaining that we had misunderstood the OAI matter (and it had turned out so nicely :-) and also write to the CERN list to see whether anyone else has run into problems or knows how to restrict the generation of metatags. The 024 scholar is useful to us now in case some action has to be taken on what is strictly academic, as Google wants.
FJ Updated by Ferran Jorba almost 10 years ago Actions #21
- Added related to Task #4007: Make the Highwire Press meta tags appear only in research records (Google Scholar)
CA Updated by Cristina Azorin almost 10 years ago Actions #22
- Due date changed from 25-01-2016 to 25-06-2016
- Priority changed from High to Normal
From: Ferran Jorba García
Sent: Thursday, 5 May 2016 12:58
To: Darcy Dapra <darcyd@google.com>
Cc: Cristina Azorín Millaruelo <Cristina.Azorin@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Dear Darcy,
I'm glad to inform you that we have found a way to restrict the "citation_XX" metatags to our scholarly records. We are polishing it in our test installation before publishing in production. We'll write you again shortly.
As you have suggested, after this change we'll update our sitemap.xml files to include all records.
Best regards,
Ferran
CA Updated by Cristina Azorin almost 10 years ago Actions #23
From: Ferran Jorba García
Sent: Tuesday, 10 May 2016 14:13
To: Darcy Dapra <darcyd@google.com>
Cc: Cristina Azorín Millaruelo <Cristina.Azorin@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Dear Darcy,
as requested (by you) and promised (by us), we have restricted the citation_XXX meta tags to the scholarly records, approximately half of our site. This is valid since a few minutes.
In the following days, we'll update our sitemap.xml files to include all records.
We have one doubt about our doctoral theses and dissertations. We understand that they are of scholar content and, for the time being, they include the citation_XXX meta tags. Reading to your Google Scholar Inclusion Guidelines (https://scholar.google.com/intl/en/scholar/inclusion.html#content),
it seems that there is some (Google's) automatic discrimination according to the size.
So, our question is: should we exclude (manually) our doctoral theses from the records with the citation_XXX meta tags?
Thanks again for your help.
Best regards,
Ferran
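The idea of emitting the citation_XX meta tags only for scholarly records can be sketched as follows. This is not the actual Invenio change (that work is tracked in the related task): the function name, the record dictionary layout and the choice of artpub, docrec and jorcon as the scholarly set are assumptions made for illustration. The tag names citation_title, citation_author and citation_publication_date are taken from the Google Scholar Inclusion Guidelines.

```python
from html import escape

# Assumption: collection identifiers treated as scholarly, borrowed from
# the first sitemap configuration in this task.
SCHOLARLY_COLLECTIONS = {"artpub", "docrec", "jorcon"}


def citation_meta_tags(record):
    """Return the citation_* meta tags for a record, or [] if not scholarly.

    record is a hypothetical dict with 'collections', 'title', and
    optionally 'authors' and 'year'.
    """
    if not SCHOLARLY_COLLECTIONS.intersection(record.get("collections", [])):
        return []  # non-scholarly landing pages get no bibliographic tags
    tags = [("citation_title", record["title"])]
    tags += [("citation_author", author) for author in record.get("authors", [])]
    if record.get("year"):
        tags.append(("citation_publication_date", record["year"]))
    return [
        '<meta name="%s" content="%s" />' % (name, escape(value, quote=True))
        for name, value in tags
    ]
```

Records outside the scholarly set simply produce no tags, so their landing pages remain indexable by the general crawler without being presented as research papers.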
CA Updated by Cristina Azorin almost 10 years ago Actions #24
From: Darcy Dapra [mailto:darcyd@google.com]
Sent: Tuesday, 10 May 2016 18:54
To: Ferran Jorba García <Ferran.Jorba@uab.cat>
Cc: Cristina Azorín Millaruelo <Cristina.Azorin@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Hi Ferran,
Thanks for your note--and for the great news about the metatags & sitemap! That's fantastic.
Please do go ahead and include the dissertations and theses if you can; we have recommendations for suitable file sizes in our Guidelines, but there is some leeway. :)
Thanks again and take care,
Darcy
FJ Updated by Ferran Jorba almost 10 years ago Actions #25
- Status changed from In progress to Closed
As agreed, I have just reconfigured the sitemap so that it exports all records. This may have some (positive) effect on the number of records that Google (the generic one, not Scholar) finds in our DDD. Having discussed it with Cristina, I list each of the 9 main DDD collections; the file now looks like this:
$ cat ~/invenio/etc/bibexport/sitemap.cfg
[export_job]
export_method = sitemap
collection1 = matcur
collection2 = llicol
collection3 = docrec
collection4 = pubper
collection5 = artinf
collection6 = jorcon
collection7 = docgra
collection8 = multimedia
collection9 = fonper
fulltext_status = restricted_picture
And, since Darcy from Google Scholar considers that we now have it right, I am closing the task.
FJ Updated by Ferran Jorba almost 10 years ago Actions #26
Darcy wrote to us to make sure we had finished our part:
From: Darcy Dapra <darcyd@google.com>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Mon, 23 May 2016 12:34:31 -0700
To: Ferran Jorba <Ferran.Jorba@uab.cat>
Cc: Cristina Azorín <Cristina.Azorin@uab.cat>
Hello, Ferran,
I hope that you're doing well today! I wanted to touch base on the following
note to find out if the changes to your sitemaps & repository landing pages is
now complete or close to being complete?
Many thanks for your help, as always, and I look forward to hearing from you!
Darcy
Darcy Dapra | Product Partnerships | Google Scholar | darcyd@google.com
FJ Updated by Ferran Jorba almost 10 years ago Actions #27
From: Ferran Jorba <Ferran.Jorba@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Tue, 24 May 2016 09:31:39 +0200
To: Darcy Dapra <darcyd@google.com>
CC: Cristina Azorín <Cristina.Azorin@uab.cat>
Dear Darcy,
yes, as far as we understand, the sitemaps and repository landing
pages changes are completed. As you suggested us, doctoral
dissertations pages do have the scholar meta tags.
I'll answer your other request, regarding authors affiliations, as
reply to your other mail.
And, of course, if you notice anything that we should improve, please
feel welcome to contact us again.
Best regards,
Ferran Jorba
Institutional repository computer manager
Universitat Autònoma de Barcelona
FJ Updated by Ferran Jorba almost 10 years ago Actions #28
These Google Scholar people remind me more and more of the Men in Black... luckily Darcy herself is so kind:
From: Darcy Dapra <darcyd@google.com>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Tue, 24 May 2016 17:07:00 -0700
To: Ferran Jorba <Ferran.Jorba@uab.cat>
Cc: Cristina Azorín <Cristina.Azorin@uab.cat>
Fantastic, and thanks again for all of your help, Ferran! You're wonderful to
work with and we much appreciate your adding the suggested setups for improved
indexing of your repository in Scholar (we realize that the work to implement
them is non-trivial!).
Given that the sitemap & metatags have been adjusted, I've asked the Scholar
engineers to take a look and verify, and I will be back to you shortly with
their comments/confirmation.
Cheers,
Darcy
Darcy Dapra | Product Partnerships | Google Scholar | darcyd@google.com
CA Updated by Cristina Azorin over 9 years ago Actions #29
More messages from Darcy, regarding the metatags
On Mon, 12 Dec 2016 16:53:44 -0800
Darcy Dapra <darcyd@google.com> wrote:
Many thanks for your kind note, Ferran! And thanks for putting up with my periodic pings! :)
Sorry to bother you with these, but I noticed that in some of your repository records, some changes might have reverted?
See for example: http://ddd.uab.cat/record/158140
<meta content="10.1029/2009GC002603" name="citation_doi" />
<meta content="Bendle, James A. P." name="citation_author" />
<meta content="Alkenones, alkenoates, and organic matter in coastal environments of NW Scotland: Assessment of potential application for sea level reconstruction" name="citation_title" />
<meta content="2009" name="citation_date" />
<meta content="2009" name="citation_date" />
<meta content="Alkenones" name="citation_keywords" />
<meta content="Biomarker proxies" name="citation_keywords" />
<meta content="C/N" name="citation_keywords" />
<meta content="Organic geochemistry" name="citation_keywords" />
<meta content="Sea level change" name="citation_keywords" />
<meta content="info:eu-repo/semantics/article" name="citation_type" />
<meta content="info:eu-repo/semantics/publishedVersion" name="citation_type" />
<meta content="Rosell Melé, Antoni" name="citation_author" />
<meta content="Cox, Nicholas J." name="citation_author" />
<meta content="Shennan, Ian" name="citation_author" />
<meta content="Geochemistry, geophysics, geosystems" name="citation_conference" />
<meta content="Geochemistry, geophysics, geosystems" name="citation_journal_title" />
<meta content="http://ddd.uab.cat/pub/artpub/2009/158140/geogeogeo_a2009v10n12p1-21.pdf" name="citation_pdf_url" />
<meta content="10" name="citation_volume" />
Recommendations for adjustment:
(1) It looks like a duplicate "citation_date" metatag is appearing in all records. Can you remove one of them?
(2) Also, this paper is a journal article, but it's marked as a conference paper with a "citation_conference" metatag. Can the conference-tag be removed from papers that come from journal articles?
Also, for conference papers specifically, it would be great if you could include the following tag set:
citation_title
citation_conference_title (full name of the conference)
citation_author (one author per field)
citation_publication_date
citation_firstpage
citation_lastpage
citation_pdf_url
If you have questions on any of this information, then please do not hesitate to let me know.
Many thanks for your help, and I look forward to hearing from you!
Darcy
Dear Darcy,
last month we upgraded our Invenio software, but although I keep track of our local patches, somehow this one wasn't applied so yes, changes had been reverted. Now it's back again. Thanks for pointing it to us.
I'll do a systematic review.
Summarising: no more double citation_date, and no citation_conference for a journal article. Also, no irrelevant pagination. We are back to the previous status, plus this pagination fix.
However, thanks to your note, now we've noticed that with our current setup for those Google Scholar citation tags, conference papers are indistinguishable from journal articles (but no longer vice-versa). We are discussing internally how to differentiate them.
Best regards,
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
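As a note on the fixes summarised in this reply, here is a minimal Python sketch of how the tag set could be generated so that a date tag appears only once and a journal article never carries a conference tag. It is not the actual Invenio patch: the function name `scholar_meta_tags` and the layout of the `record` dictionary are assumptions made for illustration, and only the `citation_*` tag names come from the correspondence above.

```python
from html import escape


def scholar_meta_tags(record):
    """Build Google Scholar <meta> lines for one record.

    `record` is a hypothetical dict with keys: type, title, authors,
    date, container, and optionally firstpage, lastpage, pdf_url.
    """
    tags = [("citation_title", record["title"])]
    # One citation_author tag per author, in the original order.
    tags += [("citation_author", author) for author in record["authors"]]
    tags.append(("citation_publication_date", record["date"]))
    # Exactly one container tag, chosen by record type, so a journal
    # article never gets a conference tag and vice versa.
    if record["type"] == "conference":
        tags.append(("citation_conference_title", record["container"]))
    else:
        tags.append(("citation_journal_title", record["container"]))
    # Optional tags are emitted only when the record has a value for them.
    for key in ("firstpage", "lastpage", "pdf_url"):
        if record.get(key):
            tags.append(("citation_" + key, record[key]))
    # Drop exact duplicates (e.g. a date exported twice), keeping order.
    seen = set()
    lines = []
    for name, content in tags:
        if (name, content) in seen:
            continue
        seen.add((name, content))
        lines.append('<meta content="%s" name="%s" />'
                     % (escape(content, quote=True), name))
    return lines
```

Selecting the container tag from a single record type field is one possible way to keep conference papers and journal articles distinguishable; how that type is actually stored in the repository is not stated in this thread.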
FJ Updated by Ferran Jorba over 9 years ago Actions #30
I think we are back in good shape, where we were:
From: Ferran Jorba <Ferran.Jorba@uab.cat>
To: Darcy Dapra <darcyd@google.com>
CC: Cristina Azorin <Cristina.Azorin@uab.cat>
Subject: Re: Inquiry regarding ddd.aub.cat (Google Scholar)
Date: Wed, 21 Dec 2016 09:32:44 +0100
Organization: Universitat Autonoma de Barcelona
Dear Darcy,
sitemap export is working again, and scheduled to be updated daily:
https://ddd.uab.cat/sitemap-index.xml
I've also reviewed all our local patches and I've fixed a few other
details that we tuned for Google Scholar, like that your metatags only
appear in scholarly records.
Next updates (like Orcid) will come later.
Hope you have a nice Christmas and New Year days!
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
On Tue, 20 Dec 2016 09:56:08 -0800
Darcy Dapra <darcyd@google.com> wrote:
Absolutely and understand completely, Ferran!
Looking forward to hearing from you when you've had opportunity to
review the patches.
Cheers,
Darcy
Darcy Dapra | Product Partnerships | Google Scholar |
darcyd@google.com
On Tue, Dec 20, 2016 at 9:52 AM, Ferran Jorba <Ferran.Jorba@uab.cat>
wrote:
Dear Darcy,
you are right. Again, the problem is that not all our local patches
got applied during the last migration, and sitemap was another one.
Unfortunately, I haven't reviewed them all yet.
My fault; please allow me some more time.
Best regards,
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
On Mon, 19 Dec 2016 17:18:37 -0800
Darcy Dapra <darcyd@google.com> wrote:
Hi Ferran,
It's me again! :) The Scholar engineers are wondering if you
might be able to update your sitemap weekly? It looks like the
last update was about 3 weeks ago; it would be great to make sure
that new publications are picked up quickly for indexing.
Thanks for your help, and I look forward to hearing from you!
Darcy
Darcy Dapra | Product Partnerships | Google Scholar |
darcyd@google.com
On Tue, Dec 13, 2016 at 1:26 PM, Darcy Dapra <darcyd@google.com>
wrote:
Many thanks for your kind note, Ferran (all is good to know),
and thanks for updating the metatags!
If you have questions, or if there are additional data added to
your tags soon (e.g. the ORCIDs, etc.), then do let me know.
We're happy to have a look & schedule a recrawl to pick up the
changes.
Thanks again and take care,
Darcy
Darcy Dapra | Product Partnerships | Google Scholar |
darcyd@google.com
On Tue, Dec 13, 2016 at 1:43 AM, Ferran Jorba
<Ferran.Jorba@uab.cat> wrote:
Dear Darcy,
last month we upgraded our Invenio software, but although I
keep track of our local patches, somehow this one wasn't
applied so yes, changes had been reverted. Now it's back
again. Thanks for pointing it to us. I'll do a systematic
review.
Summarising: no more double citation_date, and no
citation_conference for a journal article. Also, no irrelevant
pagination. We are back to the previous status, plus this
pagination fix.
However, thanks to your note, now we've noticed that with our
current setup for those Google Scholar citation tags,
conference papers are indistinguishable from journal articles
(but no longer vice-versa). We are discussing internally how
to differentiate them.
Best regards,
Ferran Jorba
Institutional Repository computer admin
Universitat Autònoma de Barcelona
On Mon, 12 Dec 2016 16:53:44 -0800
Darcy Dapra <darcyd@google.com> wrote:
Many thanks for your kind note, Ferran! And thanks for
putting up with my periodic pings! :)
Sorry to bother you with these, but I noticed that in some of
your repository records, some changes might have reverted?
See for example:
view-source:http://ddd.uab.cat/record/158140?ln=ca
<meta content="10.1029/2009GC002603" name="citation_doi" />
<meta content="Bendle, James A. P." name="citation_author" />
<meta content="Alkenones, alkenoates, and organic matter in coastal environments of NW Scotland: Assessment of potential application for sea level reconstruction" name="citation_title" />
<meta content="2009" name="citation_date" />
<meta content="2009" name="citation_date" />
<meta content="Alkenones" name="citation_keywords" />
<meta content="Biomarker proxies" name="citation_keywords" />
<meta content="C/N" name="citation_keywords" />
<meta content="Organic geochemistry" name="citation_keywords" />
<meta content="Sea level change" name="citation_keywords" />
<meta content="info:eu-repo/semantics/article" name="citation_type" />
<meta content="info:eu-repo/semantics/publishedVersion" name="citation_type" />
<meta content="Rosell Melé, Antoni" name="citation_author" />
<meta content="Cox, Nicholas J." name="citation_author" />
<meta content="Shennan, Ian" name="citation_author" />
<meta content="Geochemistry, geophysics, geosystems" name="citation_conference" />
<meta content="Geochemistry, geophysics, geosystems" name="citation_journal_title" />
<meta content="http://ddd.uab.cat/pub/artpub/2009/158140/geogeogeo_a2009v10n12p1-21.pdf" name="citation_pdf_url" />
<meta content="10" name="citation_volume" />
Recommendations for adjustment:
(1) It looks like a duplicate "citation_date" metatag is
appearing in all records. Can you remove one of them?
(2) Also, this paper is a journal article, but it's marked
as a conference paper with a "citation_conference" metatag.
Can the conference-tag be removed from papers that come from
journal articles?
Also, for conference papers specifically, it would be great
if you could include the following tag set:
citation_title
citation_conference_title (full name of the conference)
citation_author (one author per field)
citation_publication_date
citation_firstpage
citation_lastpage
citation_pdf_url
If you have questions on any of this information, then
please do not hesitate to let me know.
Many thanks for your help, and I look forward to hearing from
you!
Darcy
Darcy Dapra | Product Partnerships | Google Scholar |
darcyd@google.com
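As a note on the sitemap update frequency discussed in this exchange, the following Python sketch shows one way to detect that a sitemap index has gone stale, which is how a missed export (like the three-week gap mentioned above) could be caught locally. It is only an illustration: the function name `stale_sitemaps`, the seven-day threshold, and the sample entries under example.org are assumptions, not the repository's real sitemap files or tooling.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample index; the URLs and dates are placeholders.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.org/sitemap-1.xml</loc>
    <lastmod>2016-11-28</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.org/sitemap-2.xml</loc>
    <lastmod>2016-12-20</lastmod>
  </sitemap>
</sitemapindex>"""


def stale_sitemaps(index_xml, now, max_age_days=7):
    """Return (loc, age_in_days) for index entries older than max_age_days.

    Entries without a <lastmod> are reported with an age of None.
    """
    root = ET.fromstring(index_xml)
    stale = []
    for entry in root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None:
            stale.append((loc, None))
            continue
        # Only the date part (YYYY-MM-DD) of the timestamp is used.
        modified = datetime.strptime(lastmod.strip()[:10], "%Y-%m-%d")
        modified = modified.replace(tzinfo=timezone.utc)
        age = (now - modified).days
        if age > max_age_days:
            stale.append((loc, age))
    return stale
```

Calling `stale_sitemaps(SAMPLE_INDEX, datetime(2016, 12, 21, tzinfo=timezone.utc))` on the sample reports only the first entry, which is 23 days old. A check like this could run from the same scheduler as the daily export.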
CA Updated by Cristina Azorin about 9 years ago Actions #31
- Keyword deleted (JR)
FJ Updated by Ferran Jorba almost 7 years ago Actions #32
- Subject changed from Create a sitemap to improve access from Google Scholar to Create a sitemap to improve access from Google (not Google Scholar)
I have just realized that the sitemap URLs were still http; from now on they will be https: