Task #1246 » mysqlconfig-16gb_a2011m2d28.txt

Example configuration on a server with 16 GB of RAM - Ferran Jorba, 08-09-2011 17:37

 
From: Cornelia Plott <c.plott@fz-juelich.de>
Subject: Re: runtime experience for bibrank citation calculation?
To: Tibor Simko <tibor.simko@cern.ch>
Cc: "project-cdsware-users@cern.ch" <project-cdsware-users@cern.ch>,
"Haustein, Stefanie" <s.haustein@fz-juelich.de>,
"Tunger, Dirk" <d.tunger@fz-juelich.de>,
"Holzke, Christoph" <c.holzke@fz-juelich.de>
Date: Mon, 28 Feb 2011 10:44:54 +0100
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de;
rv:1.9.2.13)Gecko/20101207 Lightning/1.0b2 Thunderbird/3.1.7

Hi Tibor,

Thanks for your answer and your hints.

>> We have loaded and indexed about 1.8 million records into our only
>> local open Invenio instance. Normally the records have a large
>> reference block (like below).
> Do you know how many citer-citee pairs your records generate? How
> many references do you have in total for these 1.8M records? Do
> references usually refer to other existing records in your system, or do
> they refer to outside records that you do not store?

In total, these 1.8 million records contain 41.6 million references. Yes,
some references point to outside records that we do not store. We do not
know how many citer-citee pairs our records will generate.

>> 2011-02-22 03:03:39 --> d_report_numbers done 0 of 15000
>> 2011-02-23 10:14:24 --> d_report_numbers done fully
> Citation ranking method works with big citation dictionaries that are
> usually held in memory. Do you have enough RAM on your box to hold
> them, or did your box start to swap perhaps? Have you tuned your MySQL
> DB settings and do you have large enough max_allowed_packet and friends
> in your /etc/my.cnf?

This Invenio instance does not run on a virtual machine; it has 16 GB of
physical RAM.

MemTotal: 16627700 kB
MemFree: 5153924 kB
Buffers: 327200 kB
Cached: 9792016 kB
SwapCached: 0 kB
Active: 2613668 kB
Inactive: 8401064 kB
HighTotal: 15854912 kB
HighFree: 5144616 kB
LowTotal: 772788 kB
LowFree: 9308 kB
SwapTotal: 5144568 kB
SwapFree: 5144476 kB
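
A minimal sketch of how the swap question can be read off the figures above: it parses a few of the quoted /proc/meminfo lines and subtracts SwapFree from SwapTotal. The helper name `parse_meminfo` is an assumption made for illustration, and the snapshot only describes the moment it was taken, not the state during the long bibrank run.

```python
# Values copied from the /proc/meminfo excerpt above (all in kB).
MEMINFO = """\
MemTotal: 16627700 kB
MemFree: 5153924 kB
Buffers: 327200 kB
Cached: 9792016 kB
SwapTotal: 5144568 kB
SwapFree: 5144476 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its value in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])
    return fields


info = parse_meminfo(MEMINFO)

# Swap currently in use: total swap minus free swap.
swap_used_kb = info["SwapTotal"] - info["SwapFree"]

# Free memory plus buffers and page cache: a rough estimate of memory
# that is not held by processes.
not_held_kb = info["MemFree"] + info["Buffers"] + info["Cached"]

print("swap in use: %d kB" % swap_used_kb)          # swap in use: 92 kB
print("free + buffers + cached: %d kB" % not_held_kb)
```

With only 92 kB of swap in use, this snapshot does not suggest that the box was swapping when it was taken.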

We have also tuned our MySQL DB settings like this:
[mysqld]
...
#key_buffer = 384M
key_buffer = 2G
#key_buffer_size = 2M
key_buffer_size = 512M
max_allowed_packet = 16M
table_cache = 512
#sort_buffer_size = 2M
sort_buffer_size = 16M
#read_buffer_size = 2M
read_buffer_size = 64M
#read_rnd_buffer_size = 8M
read_rnd_buffer_size = 128M
#myisam_sort_buffer_size = 64M
myisam_sort_buffer_size = 256M
thread_cache_size = 8
query_cache_size = 32M
...
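
As a rough check on the values above, the sketch below adds up the three buffers that MySQL allocates per session when a query needs them (sort_buffer_size, read_buffer_size, read_rnd_buffer_size). The helper `to_bytes` and the session count `ASSUMED_SESSIONS` are assumptions made for illustration; the number of concurrent connections on this server is not stated in the message.

```python
import re

# Per-session buffers, values copied from the [mysqld] excerpt above.
SESSION_BUFFERS = {
    "sort_buffer_size": "16M",
    "read_buffer_size": "64M",
    "read_rnd_buffer_size": "128M",
}

UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value):
    """Convert a my.cnf size such as '16M' or '512' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if match is None:
        raise ValueError("unrecognised size: %r" % value)
    number, suffix = match.groups()
    return int(number) * UNITS[suffix]


# Worst case for one session that needs all three buffers at once.
per_session_mib = sum(to_bytes(v) for v in SESSION_BUFFERS.values()) // 1024 ** 2
print(per_session_mib)  # 208

# Hypothetical number of busy sessions, chosen only for illustration.
ASSUMED_SESSIONS = 20
print(per_session_mib * ASSUMED_SESSIONS)  # 4160
```

Under these assumptions each busy session could claim up to 208 MiB on top of the global key buffer, so the per-session values deserve a second look on a 16 GB machine.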

We changed the settings following the recommendations in Baron Schwartz's
"High Performance MySQL: Optimization, Backups, Replication, and More". We
did not change max_allowed_packet. What would be a good size?


> Moreover, it would be helpful if you could also run bibrank for say ~100
> sample records via Python profiler so that we'd know where the inside
> bottlenecks are. Here is an example of how to submit such a profiled
> bibrank task:
>
Here is our result from the profiled bibrank task:

./bibrank -u admin -w citation -a -i 1-100 --profile=t

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.000    0.000   12.029   12.029  bibtask.py:755(_task_run)
     1    0.000    0.000   12.025   12.025  bibrank.py:128(task_run_core)
     1    0.000    0.000   12.025   12.025  bibrank_tag_based_indexer.py:482(citation)
     1    0.043    0.043   12.025   12.025  bibrank_tag_based_indexer.py:329(bibrank_engine)
     1    0.016    0.016   11.737   11.737  bibrank_tag_based_indexer.py:86(citation_exec)
     1    0.001    0.001   11.656   11.656  bibrank_citation_indexer.py:60(get_citation_weight)
     1    0.118    0.118   11.310   11.310  bibrank_citation_indexer.py:570(ref_analyzer)
 17303    0.330    0.000    9.560    0.001  dbquery.py:121(run_sql)
  2141    0.021    0.000    9.300    0.004  search_engine.py:1988(search_unit)
 17303    0.545    0.000    8.506    0.000  cursors.py:127(execute)
  1360    0.059    0.000    8.293    0.006  search_engine.py:2032(search_unit_in_bibwords)
 17303    0.099    0.000    7.480    0.000  cursors.py:308(_query)
 17303    6.400    0.000    7.045    0.000  cursors.py:270(_do_query)
  2725    0.011    0.000    6.139    0.002  data_cacher.py:71(recreate_cache_if_needed)
  2720    0.012    0.000    6.130    0.002  search_engine.py:320(get_index_stemming_language)
  2729    0.056    0.000    6.117    0.002  dbquery.py:256(get_table_update_time)
  2720    0.011    0.000    6.108    0.002  search_engine.py:310(timestamp_verifier)
  6193    0.083    0.000    2.499    0.000  search_engine.py:536(get_index_id_from_field)
     8    0.001    0.000    1.186    0.148  bibrank_citation_indexer.py:947(insert_into_cit_db)
   892    0.005    0.000    1.044    0.001  bibrank_citation_indexer.py:47(__call__)
   781    0.015    0.000    1.039    0.001  bibrank_citation_indexer.py:54(get_recids_matching_query)
   782    0.023    0.000    1.023    0.001  search_engine.py:1726(search_pattern)
     9    0.936    0.104    0.936    0.104  dbquery.py:315(serialize_via_marshal)
   666    0.025    0.000    0.725    0.001  search_engine.py:2091(search_unit_in_idxphrases)
 17287    0.353    0.000    0.608    0.000  cursors.py:105(_do_get_result)
  2113    0.028    0.000    0.575    0.000  bibrank_citation_indexer.py:997(insert_into_missing)
 17303    0.074    0.000    0.481    0.000  cursors.py:55(__del__)
 17303    0.086    0.000    0.408    0.000  cursors.py:60(close)
     1    0.000    0.000    0.398    0.398  bibrank_citation_indexer.py:921(insert_cit_ref_list_intodb)
...
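
To make the profile easier to read, the following sketch takes a few rows from the listing above and expresses each cumulative time as a share of the total task time (12.029 s for `_task_run`). The selection of rows and the helper `share` are choices made for illustration. Cumulative times include nested calls, so the shares overlap and do not add up to 100%.

```python
# Rows copied from the profile above: (ncalls, cumtime in seconds, function).
ROWS = [
    (1, 12.029, "bibtask.py:755(_task_run)"),
    (17303, 9.560, "dbquery.py:121(run_sql)"),
    (17303, 7.045, "cursors.py:270(_do_query)"),
    (2729, 6.117, "dbquery.py:256(get_table_update_time)"),
    (6193, 2.499, "search_engine.py:536(get_index_id_from_field)"),
]

# The first row is the whole task, so its cumtime is the total run time.
total = ROWS[0][1]


def share(cumtime):
    """Return cumtime as a percentage of the total, rounded to 0.1."""
    return round(100.0 * cumtime / total, 1)


for ncalls, cumtime, name in ROWS[1:]:
    per_call_ms = 1000.0 * cumtime / ncalls
    print("%-48s %5.1f%%  %7.3f ms/call" % (name, share(cumtime), per_call_ms))
```

Read this way, about 79.5% of the run is spent inside `run_sql`, and about 50.9% falls under `get_table_update_time`, which is called 2729 times for only 100 records.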

Do you already have any optimisation hints, or do you need more
information about our system?

Thanks & Kind Regards
Cornelia

Cornelia Plott
Zentralbibliothek
Forschungszentrum Jülich
D-52425 Jülich
GERMANY

Tel: ++49-2461-616206
Email: c.plott@fz-juelich.de
Web: http://www.fz-juelich.de/zb



