Task #5192: Repository ranking - checking the harvesting data with Google

Added by Cristina Azorin more than 7 years ago. Updated approximately 6 years ago.

Status: Closed
Priority: Normal
Category: Technology
Start: 13-11-2018
Due date: 13-11-2019
Keyword: JR

Description

Tell Darcy where things stand with the indexing of the DDD in Google Scholar.

Dear Darcy,

In one of your latest emails you announced a new global harvest of the repository for the month of May. We have observed that over time there have been some changes in how the DDD is displayed in Google Scholar, but it is very hard for us to work out whether this is due to some defect in our records, because the results are "personalized" by the searches we perform most often, or whether they are simply the correct ones for a platform like ours.

According to our calculations, about 80,000 records should be collected in Google Scholar, but if we search for "ddd.uab.cat", it returns about 20,000 results. Is this normal?
Search in the DDD: https://ddd.uab.cat/search?ln=en&sc=1&p=scholar
Search in Google Scholar: https://scholar.google.es/scholar?hl=ca&as_sdt=0%2C5&q=%22ddd+uab+cat%22&btnG=


Related tasks 2 (0 open, 2 closed)

related to DDD - Task #5734: Fix and extend the metadata of the records' HTML page - Closed - Cristina Azorin - 30-04-2020 - 17-12-2020
related to DDD - Task #6647: Fix errors that keep the DDD from showing up well enough in Google Scholar - Closed - Anna Florensa - 28-09-2021 - 31-03-2023

Updated by Cristina Azorin approximately 7 years ago #1

  • Due date changed from 20-12-2018 to 21-01-2019

Updated by Cristina Azorin approximately 7 years ago #2

  • Due date changed from 21-01-2019 to 30-04-2019
  • Priority changed from Normal to High

Updated by Cristina Azorin almost 7 years ago #3

  • Due date changed from 30-04-2019 to 05-06-2019
  • Priority changed from High to Urgent

Updated by Cristina Azorin almost 7 years ago #4

  • Priority changed from Urgent to Immediate

Updated by Cristina Azorin almost 7 years ago #5

Find out what exactly 'Include citations' means

Updated by Ferran Jorba almost 7 years ago #6

I'd say that, to find out what Google Scholar holds from the DDD, the search should be:

And on our side, searching for scholar (the pseudo-collection that Google Scholar harvests, in principle) returns 80,800:

However, I don't know how Mr. Aguillo gets 42,600:

Updated by Cristina Azorin almost 7 years ago #7

Ferran, I can't find the email right now, but Mercè Pi already asked Aguillo where the 40,000 came from, and it was due to unchecking the 'Include citations' option, which he considers to be self-citations.

Updated by Cristina Azorin almost 7 years ago #8

At the time we wrote it down in this other task, #5039

Updated by Ferran Jorba almost 7 years ago #9

At last, Ferran, it was about time!

From: Ferran Jorba <>
To: Darcy Dapra <>
CC: Cristina Azorín <>
Subject: What does 'Include citations' mean in Google Scholar?
Date: Thu, 23 May 2019 15:22:01 +0200

Dear Darcy,

we have noticed that there have been changes in how Google Scholar sees
our institutional repository (https://ddd.uab.cat).

According to our figures, Google Scholar should collect about 80,000
records (https://ddd.uab.cat/search?p=scholar). However, we see fewer
than 70,000 records:

https://scholar.google.cat/scholar?q=site:ddd.uab.cat

Then, there are a couple of flags ('Include patents' and 'Include
citations') that we are not sure we understand. Patents are easy,
because we don't have them, and checking the flag or not doesn't change
the results (that's good). However, 'Include citations' is a mystery.
As we have not been able to find out what it is, may we ask you where it
is explained or, if it is not explained anywhere, could you please
explain to us what it means?

Sincerely,

Ferran

--
Ferran Jorba
Administrator of https://ddd.uab.cat
Servei d'Informàtica
Universitat Autònoma de Barcelona

Tel. 93.581.42.40

Updated by Ferran Jorba almost 7 years ago #10

Our friend Darcy no longer works for Google Scholar, but she has put us in touch with Monica Westin, who has very kindly replied:

From: Monica Westin <>
To: Darcy Dapra <>
Cc: Ferran Jorba García <>, Cristina Azorín Millaruelo <>
Subject: Re: What does 'Include citations' mean in Google Scholar?
Date: Thu, 23 May 2019 16:12:41 -0700

Thank you, Darcy! Ferran, I'm happy to explain why you're seeing lower
numbers than expected.

The fact that there are 70K records from your repository in Scholar search
results means that Google Scholar is able to index it well. However, when
you search Google Scholar for a site ("site:ddd.uab.cat"), the resulting
number only includes the items for which the repository is the primary
link. It doesn't include results where your site is providing a
non-primary version. For repositories, this is very common, as the
publisher version is often primary.

I recommend checking a number of randomly selected items across the
repository and searching for titles in Google Scholar (make sure to click
"All XXX versions" from the search results page). If you do find items not
included in Scholar that should be, feel free to email me again.

My very best,
Monica

On Thu, May 23, 2019 at 8:35 AM Darcy Dapra <> wrote:

Many thanks for your note, Ferran -- it's nice to hear from you! I've
changed teams and am now working on digitization partnerships with
libraries for Google Books, so by way of cc: I'd like to introduce you to
Monica Westin, who is now heading up outreach for Google Scholar, and who
would be able to point you in the right direction / provide information on
your repository coverage as well as the search-restricts / filters that you
mention. You'll be in great hands with Monica. :)

Thanks again,

Darcy

Darcy Dapra | Library Partnerships | Google Books |

--
Monica Westin | Partnerships | Google Scholar |
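
Monica's suggestion of checking randomly selected items by title could be sketched roughly as below. This is only an illustration, not anything Google or the DDD provides: the function names are hypothetical, the sample titles are the ones that appear in search URLs elsewhere in this ticket, and the resulting links are meant to be opened by hand (then clicking "All XXX versions"), not fetched automatically.

```python
import random
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_title_query(title):
    """Return a Google Scholar URL that searches for the quoted title."""
    return SCHOLAR_SEARCH + "?" + urlencode({"q": f'"{title}"'})


def sample_for_manual_check(titles, k, seed=None):
    """Pick up to k titles at random and pair each with its search URL."""
    rng = random.Random(seed)
    titles = list(titles)
    chosen = rng.sample(titles, min(k, len(titles)))
    return [(t, scholar_title_query(t)) for t in chosen]


if __name__ == "__main__":
    # Assumption: in practice the titles would come from a repository export.
    titles = [
        "Deep Sequencing Reveals Early Reprogramming of Arabidopsis",
        "Us de tecnologies per a la tracabilitat al sector textil",
    ]
    for title, url in sample_for_manual_check(titles, k=2, seed=0):
        print(title)
        print("  " + url)
```

Sampling across the whole repository, rather than picking records by hand, keeps the check from being biased towards recent or well-known items.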

Updated by Cristina Azorin almost 7 years ago #11

  • Due date changed from 05-06-2019 to 19-09-2019

Thank her for the reply, but ask again about the second part, the explanation of 'Include citations'.

Give examples from the UPC: https://scholar.google.cat/scholar?as_vis=0&q=site:upcommons.upc.edu&hl=ca&as_sdt=0,5 (72,400 results);
without 'Include citations' there are 64,500.
It doesn't vary as much as the DDD.

Updated by Ferran Jorba more than 6 years ago #12

For the reply to Monica Westin: she asks us for records that we export as scholar in the DDD and that do not appear in Google Scholar. This is a manual search, and we have found these:

Presentations: Articles: Data management plan:

Book chapters: *

Bachelor's theses (TFG):

Updated by Ferran Jorba more than 6 years ago #13

From: Ferran Jorba García <>
To: Monica Westin <>
CC: Cristina Azorín Millaruelo <>
Subject: Re: What does 'Include citations' mean in Google Scholar?
Date: Thu, 11 Jul 2019 14:04:20 +0200

Dear Monica,

thank you for your explanations. However, there is something that we
don't understand: what exactly is the behaviour of the 'Include
citations' option on the left side of the search results page? All our
records include full text, we don't have 'citations' as such. So, why do
the results change when 'Include citations' is checked on and off?

Referring to the second part of your answer: we have found some
examples of records from our repository that have the citation metadata
schema and thus, should be in Google Scholar, but they aren't:

May we have an explanation of why they are not in Scholar?

Thanks again,

Ferran Jorba

Updated by Ferran Jorba more than 6 years ago #14

From: Monica Westin <>
To: Ferran Jorba García <>
Cc: Cristina Azorín Millaruelo <>
Subject: Re: What does 'Include citations' mean in Google Scholar?
Date: Fri, 19 Jul 2019 15:11:43 -0700

Answers in-line:

what exactly is the behaviour of the 'Include
citations' option on the left side of the search results page? All our
records include full text, we don't have 'citations' as such. So, why do
the results change when 'Include citations' is checked on and off?

-- "Include citations" is a search option for end users. It includes
records of citations to your publications, not the citations within your
publications.

May we have an explanation of why they are not in Scholar?

-- I will take a look at these. I apologize in advance that this will take
some time. It would be helpful to know when these were added to the
repository.

Updated by Ferran Jorba more than 6 years ago #15

From: Ferran Jorba García <>
To: Monica Westin <>
CC: Cristina Azorín Millaruelo <>
Subject: Re: What does 'Include citations' mean in Google Scholar?
Date: Thu, 25 Jul 2019 12:09:18 +0200

Dear Monica,

sorry for the delay, but thanks for your answer.

We understand now the 'Include citations', but certainly it is not
obvious, even for us, librarians working in an academic environment.
We don't have a better phrasing either, though ;-)

Regarding your question about when our records had been added to our
repository, they appear at the end of the record (search for 'Record
created'), for example:

https://ddd.uab.cat/record/206906?ln=en

Hope it helps and again, thanks for your attention,

Ferran Jorba
Universitat Autònoma de Barcelona

Updated by Cristina Azorin more than 6 years ago #16

  • Status changed from Created to In progress

Ferran, have you replied???

From: Monica Westin <>
Sent: Tuesday, 30 July 2019 0:48
To: Ferran Jorba García <>
Cc: Cristina Azorín Millaruelo <>
Subject: Re: What does 'Include citations' mean in Google Scholar?

Dear Ferran,

I started researching this issue today, and I noticed that we have a lot of internal logs showing that individual publications have historically been missing from the sitemap at http://ddd.uab.cat/sitemap-index.xml.

Could you confirm whether the URLs listed in your email are indeed included in the sitemap at http://ddd.uab.cat/sitemap-index.xml?

That will help me narrow down causes for these items not being indexed.

My very best,
Monica

Updated by Ferran Jorba more than 6 years ago #17

Thanks for the heads-up, I had missed it (the last days before the holidays are always chaotic). I have just replied to her:

From: Ferran Jorba García <>
To: Monica Westin <>
CC: Cristina Azorín Millaruelo <>
Subject: Re: What does 'Include citations' mean in Google Scholar?
Date: Mon, 2 Sep 2019 10:56:36 +0200

Dear Monica,

I've just come back from holidays, sorry for the delay.

You are right about sitemap records. When we reorganized the collection
tree I forgot to update it. Instead of 189,907 records, only 189,027
were published (880 were missing).

Anyway, the records we were claiming in our Jul 11 mail now exist in
our sitemap files, and they were not affected by the misconfiguration I
have just fixed (I will update the sitemap files tomorrow).

However, we understand that the sitemap file is for the general Google
engine, not for Google Scholar. Is that right?

In other words, after some helpful mails with Darcy Dapra, we
understood that Google Scholar selects its records because the html page
includes the citation_XX meta tags (ex: citation_title,
citation_author, citation_author_institution,
citation_abstract_html_url, citation_publication_date, etc). As a
matter of fact, Darcy was kind enough to send us a draft internal
list. I've just checked
https://scholar.google.com/intl/en/scholar/inclusion.html and I haven't
been able to find it. In our repository, we only use those tags for
the records of a scholarly nature, the ones we believe should be
included in Google Scholar.

So, we may rephrase the question: is that (still) the criterion Google
Scholar follows to include a pdf in its indexes?

Thanks again,

Ferran Jorba
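
Monica's question (are the reported URLs actually listed in the sitemap?) can be checked along these lines. This is a minimal sketch under stated assumptions: the sitemap files follow the standard sitemaps.org XML schema, they have already been downloaded as text, and the helper names are hypothetical, not part of the DDD software.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return every <loc> value of a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iterfind(".//sm:loc", NS) if el.text]


def missing_from_sitemap(expected_urls, sitemap_xml_texts):
    """Return the expected URLs that appear in none of the sitemap files."""
    published = set()
    for text in sitemap_xml_texts:
        published.update(sitemap_locs(text))
    return [u for u in expected_urls if u not in published]


if __name__ == "__main__":
    # Tiny inline sitemap standing in for one downloaded file (made-up content).
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://ddd.uab.cat/record/206906</loc></url>"
        "</urlset>"
    )
    wanted = [
        "https://ddd.uab.cat/record/206906",
        "https://ddd.uab.cat/record/208072",
    ]
    print(missing_from_sitemap(wanted, [sample]))
```

Comparing the total number of distinct `<loc>` entries with the record count of the repository is how a gap such as 189,907 versus 189,027 would show up.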

Updated by Cristina Azorin more than 6 years ago #18

From: Monica Westin <>
Sent: Monday, 16 September 2019 20:47
To: Ferran Jorba García <>
Cc: Cristina Azorín Millaruelo <>
Subject: Re: What does 'Include citations' mean in Google Scholar?

Hi Ferran,

Replies in-line below.

You are right about sitemap records. When we reorganized the collection
tree I forgot to update it. Instead of 189,907 records, only 189,027
were published (880 were missing).

Anyway, the records we were claiming in our Jul 11 mail now exist in
our sitemap files, and they did not belong to the misconfiguration I
just fixed a while ago (and will update the sitemap files tomorrow).

However, we understand that sitemap file is for the general Google
engine, not for Google Scholar. Is that right?

The sitemap is used by Google Scholar for indexing. Updating the sitemap with URLs for item-level landing pages helps ensure the indexing system can find new items as they are added and index them as quickly as possible.

In other words, after some helpful mails with Darcy Dapra, we
understood that Google Scholar selects its records because the html page
includes the citation_XX meta tags (ex: citation_title,
citation_author, citation_author_institution,
citation_abstract_html_url, citation_publication_date, etc). As a
matter of fact, Darcy was kind enough to send us a draft internal
list. I've just checked
https://scholar.google.com/intl/en/scholar/inclusion.html and I haven't
been able to find it. In our repository, we only use those tags for
the records of a scholarly nature, the ones we believe should be
included in Google Scholar.

So, we may rephrase the question: is that (still) the criterion Google
Scholar follows to include a pdf in its indexes?

Google Scholar does index metadata-only records, but these are not prioritized in terms of the order they appear in Scholar search results. In order to index repository PDFs, it is necessary to have the citation_pdf_url metatag in place for each item, with the URL of the location of this PDF.

Thanks again,

Ferran Jorba
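
To verify that a record's landing page carries the citation_* meta tags discussed in this exchange, something like the following could be used. It is a sketch only: the class and function names are hypothetical, the page HTML is assumed to be already available as a string, and the default set of "required" tags (citation_title plus the citation_pdf_url that Monica mentions) is an assumption made for illustration, not Google Scholar's official list.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name.startswith("citation_"):
            # A tag such as citation_author may legitimately repeat.
            self.tags.setdefault(name, []).append(attr.get("content") or "")


def citation_tags(html_text):
    """Return a dict mapping each citation_* tag name to its list of values."""
    parser = CitationMetaParser()
    parser.feed(html_text)
    return parser.tags


def missing_required(tags, required=("citation_title", "citation_pdf_url")):
    """Return the required tag names that are absent or empty (assumed list)."""
    return [name for name in required if not any(tags.get(name, []))]


if __name__ == "__main__":
    # Made-up page fragment that lacks the PDF link tag.
    page = (
        '<head><meta name="citation_title" content="An example title">'
        '<meta name="citation_author" content="Author, A."></head>'
    )
    print(missing_required(citation_tags(page)))
```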

Updated by Cristina Azorin more than 6 years ago #19

  • Priority changed from Immediate to High

Hello Monica,

thank you for your time and your answers. Of the four examples we sent you in our original mail, one is already in Scholar:

https://ddd.uab.cat/record/208072
https://scholar.google.cat/scholar?q=Us+de+tecnologies+per+a+la+tracabilitat+al+sector+textil

However, this other one appears in Scholar but our pdf is not among the versions cited:

https://ddd.uab.cat/record/206906
https://scholar.google.cat/scholar?q=%22Deep+Sequencing+Reveals+Early+Reprogramming+of+Arabidopsis%22

We have noted, however, that our record didn't have a DOI. We have just added it. Could that be the cause?

Best regards,

Ferran Jorba

Updated by Cristina Azorin more than 6 years ago #20

Let's see whether this sheds some light on Scholar: https://demo.openrepository.com/handle/2384/582854

Updated by Ferran Jorba more than 6 years ago #21

Reply from Monica Westin, of Google Scholar.

From: Monica Westin <>
To: Ferran Jorba García <>
Cc: Cristina Azorín Millaruelo <>
Subject: Re: What does 'Include citations' mean in Google Scholar?
Date: Tue, 1 Oct 2019 11:19:36 -0700

Dear Ferran,

Thanks for the email.

Short answer, this is a timing issue and will be cleared up on the next
index build which would be later this winter. The Scholar indexing system
has crawled this item and is grouping it appropriately-- it just isn't
appearing in search results yet.

Details below:
Scholar indexing is designed to fit the archival nature of scholarly
publishing - new articles appear frequently and are of very high interest;
older articles are archival - they are usually of steady interest and
change rarely if at all. Accordingly, we scan for newly published articles
daily and add them to the index several times a week. Since newly published
articles often have new versions or may transition from preprint to
formally published or ahead-of-print version to final-version, we recrawl
and reindex recently published articles at a much higher rate than older
articles. This usually handles the version transitions that occur early in
the life of an article. In addition, we recrawl all articles and rebuild
the entire index periodically to deal with all the changes that happen for
older articles. This includes changes in article presentation, platform
transitions, host transitions, grouping updates with new versions (eg if
the article now also appears in an anthology etc). This approach also
optimizes the use of server resources on publisher sites since archival
articles are recrawled less frequently.

Cheers,
Monica

Updated by Cristina Azorin more than 6 years ago #22

  • Due date changed from 19-09-2019 to 13-11-2019
  • Assignee changed from Ferran Jorba to Núria Casaldaliga

Núria, we think the DDD configurations are now correct. On our side we have no further queries to make, even though the answers have not cleared up all the doubts we had. Can we close the task?

Updated by Ferran Jorba more than 6 years ago #23

My courtesy reply to Monica:

From: Ferran Jorba García <>
To: Monica Westin <>
CC: Cristina Azorín Millaruelo <>
Subject: Re: What does 'Include citations' mean in Google Scholar?
Date: Thu, 3 Oct 2019 09:46:46 +0200

Dear Monica,

again, thanks a lot for your time and patience explaining those details
to us. We hope that with this information we can answer ourselves when
our staff asks why their papers (still) don't appear in Google Scholar.

Best wishes,

Ferran Jorba
Universitat Autònoma de Barcelona

Updated by Cristina Azorin more than 6 years ago #24

The recording of the webinar on the indexing of DSpace repositories by Google Scholar, held on 3 October 2019, is now available on the page of DSpace users in Spain, https://wiki.duraspace.org/pages/viewpage.action?pageId=142540835.

Slides 23 and 24 argue that searching by 'site' is not an effective way to find out everything you have indexed in Google Scholar. The ranking is based on a flawed methodology.

We went over the topic with Ferran, and we see that everything Monica explains is already correctly applied in the DDD.

Updated by Cristina Azorin more than 6 years ago #25

I've just seen that in the July edition we have improved in the ranking ;-) We are now in position 42, above the CSIC but behind UPC (16) and UPV (35). I'm leaving this here just as a reminder, Núria: the rankings people at the rector's office told me that we had to say what we wanted to appear on the rankings page... they are still pointing to the old one, I think.

Updated by Cristina Azorin approximately 6 years ago #26

  • Status changed from In progress to Closed
  • Priority changed from High to Normal

Cristina will contact the rector's office area to ask them to update the data with the new ranking

Updated by Cristina Azorin almost 6 years ago #27

  • Added related to Task #5734: Fix and extend the metadata of the records' HTML page

Updated by Cristina Azorin more than 3 years ago #28

  • Added related to Task #6647: Fix errors that keep the DDD from showing up well enough in Google Scholar