# $Id$

Issues

- folder generation?
  - adding up the item generations does not work, because records can
    also be deleted

- separate table for "proposed" appointments?
  - so that we do not have to send out the appointments and they only
    show up in the scheduler once they become current


Move Databases
==============

HOST="localhost"
USER="agenor"

DB="blah2"
DB="agenor%i" % ( i, )
DB="agenortabledb"
  NEWTABLE="agenor_tab_%i" % ( i, )
DB="agenor_fldinfodb"
DB="agenor_testhugeperf"


Schemas
=======

CREATE TABLE SOGo_folder_info (
  c_foldername VARCHAR(255) NOT NULL,
  c_tablename  VARCHAR(255) NOT NULL,
  c_dbname     VARCHAR(255) NOT NULL
);
INSERT INTO SOGo_folder_info (c_foldername, c_tablename, c_dbname)
  VALUES ('hh calendar', 'agenor_tab_105', 'agenortabledb');
CREATE UNIQUE INDEX SOGo_folder_info_idx
  ON SOGo_folder_info USING BTREE(c_foldername);
(ca 5s)

---

CREATE TABLE SOGo_huge_ical (
  c_pkey INT PRIMARY KEY,
  c_data VARCHAR(32000) NOT NULL
);
CREATE TABLE SOGo_huge_quick (
  c_pkey       INT PRIMARY KEY,
  c_sourceid   VARCHAR(255)  NOT NULL,
  c_startdate  INT           NOT NULL,
  c_enddate    INT           NOT NULL,
  c_title      VARCHAR(255)  NOT NULL,
  c_attendees  VARCHAR(4096) NOT NULL,
  c_isallday   INT           NOT NULL,
  c_sequenceid INT           NOT NULL,
  c_generation INT           NOT NULL
);
CREATE INDEX SOGo_huge_quick_idx
  ON SOGo_huge_quick USING BTREE(c_startdate);
(ca 15s on 1,000,000)
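To make the two-level layout concrete, here is a small sketch (not taken
from the actual SOGo code) of how these tables are presumably queried;
the folder name and timestamps are just illustrative values, and the
exact range predicate is an assumption. These are essentially the
queries timed in the Performance section below.

-- 1) resolve a folder name to the table/database holding its records
--    (same lookup as the c_foldername='privcal_99827' timing below)
SELECT c_tablename, c_dbname
  FROM SOGo_folder_info
 WHERE c_foldername = 'privcal_99827';

-- 2) fetch the appointments of that folder falling into a time range;
--    in practice the table name comes from step 1 (e.g. agenor_tab_105),
--    and the timestamps here are arbitrary example values
SELECT c_pkey, c_sourceid, c_startdate, c_enddate, c_title
  FROM SOGo_huge_quick
 WHERE c_startdate < 1099999999
   AND c_enddate   > 1099000000;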
---

Performance
===========

agenor_fldinfodb:

  10000 Folder Info Entries, no index: 71s

  100000 Folder Info Entries: 12:09m, 729s, 137 inserts/s
    5992424 => 6001088, diff 8664KB data size (from ca 16000 to 110000)
    ~94 byte per row (raw: ~12+14+13 = 39 byte)

  110001-250000 Folder Info Entries: 15:59m, 959s, 145 inserts/s
    6001088 - 6014316, diff 13228KB data size (~96 byte per row)

  without index, via Python:
    COUNT(*)                                              =>  360ms
    c_tablename, c_dbname | *                             => 1128ms
    c_tablename, c_dbname | c_foldername='privcal_99827'  =>  345ms

  with index, via Python:
    COUNT(*)                                              =>  350ms
    c_tablename, c_dbname | *                             => 1124ms
    c_tablename, c_dbname | c_foldername='privcal_99827'  =>  18, 5, 5ms

agenor_testhugeperf:

  1,000,000 entries, no index:
    10000=79.37s, 20000=162s, 30000=245s, 340000=2831s, 790000=6670s
    (~120 rows per sec)
    => ca 2h for 1,000,000, 20h for 10,000,000
    30000=65MB => ~2KB per record => ~2GB for 1,000,000
    220000=440MB, 810000=1.55GB, 1,000,000 = ~1.92GB
    x 20 = 40GB
    duration: ~2:50h

  without index, via Python:
    COUNT(*)                                   => 20.8s
    pkey,sourceid,start,end | c_start>x&c_end  => 13.4s
    c_sourceid | *                             => 10.5s

  with index, via Python:
    COUNT(*)                                   => 9.7s, 3s, 2.5s
    pkey,sourceid,start,end | c_start>x&c_end  => 86ms, 59ms, 17ms, 17ms
    c_sourceid | *                             => 9.3s, 4.8s, 4.8s

  vacuum analyze: 30s?

  without fsync, 2 in parallel - 1108 rows per sec in a single connection!
  (1,000,000 = ca 15min)
    53:59m
    53:52m
  ~8 in parallel: ~20:00

Tuning links:
  http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
  http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
  http://www.lyris.com/lm_help/7.8/tuning_postgresql.html
  http://www.linuxjournal.com/article.php?sid=4791
  http://www.argudo.org/postgresql/soft-tuning.php#s2


PostgreSQL Index
================

http://www.postgresql.org/docs/current/static/sql-createindex.html

http://postgis.refractions.net/docs/x511.html
  "After building an index, it is important to force PostgreSQL to
   collect table statistics, which are used to optimize query plans:
   VACUUM ANALYZE;"

PostgreSQL provides the index methods B-tree, R-tree, hash, and GiST
- only the B-tree and GiST index methods support multicolumn indexes

---snip---
An index field can be an expression computed from the values of one or
more columns of the table row. This feature can be used to obtain fast
access to data based on some transformation of the basic data. For
example, an index computed on upper(col) would allow the clause
WHERE upper(col) = 'JIM' to use an index.
---snap---

---snip---
Note: Because of the limited utility of hash indexes, a B-tree index
should generally be preferred over a hash index. We do not have
sufficient evidence that hash indexes are actually faster than B-trees
even for = comparisons. Moreover, hash indexes require coarser locks;
see Section 9.7.
---snap---


PostgreSQL Arrays
=================

---snip---
It all depends, I have found array operations to be slow. So if you have
just a few elements, like less than 10, then arrays are ok, but with
more, e.g. 30+ elements as in my case, imho the whole exercise is not
really feasible. I am going to re-design the schema to get rid of the
arrays as soon as I have a moment.

Also the code around that part of PostgreSQL has not been visited for a
fair while and needs some polishing up.

I'd avoid them.
---snap---
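A common way to act on the advice above (and one possible shape for the
schema re-design the poster mentions) is to normalize the array into a
child table. The following is only an illustrative sketch; the table,
column and sample values are made up and not part of the SOGo schema:

-- hypothetical: one row per attendee instead of an array column (or the
-- packed c_attendees VARCHAR used in SOGo_huge_quick above)
CREATE TABLE SOGo_huge_attendee (
  c_pkey  INT          NOT NULL,   -- refers to SOGo_huge_quick.c_pkey
  c_email VARCHAR(255) NOT NULL
);
CREATE INDEX SOGo_huge_attendee_idx
  ON SOGo_huge_attendee USING BTREE(c_email);

-- "which appointments does this attendee take part in?" then becomes an
-- indexable join instead of an array scan:
SELECT q.c_pkey, q.c_title, q.c_startdate
  FROM SOGo_huge_quick q, SOGo_huge_attendee a
 WHERE a.c_email = 'hh@example.com'
   AND a.c_pkey  = q.c_pkey;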