首页 > 开发 > 综合 > 正文

PostgreSQL的磁盘空间占用问题

2024-07-21 02:13:47
字体:
来源:转载
供稿:网友

  我知道通过vacuum可以把一个表的内容整理一下,使磁盘空间的占用变小,但pgsql能不能自动利用表中的空闲空间呢?

  例如我做了这么一个测试:

  在一个表里insert十万条记录,记下它占用的磁盘空间,比如说是1000k,再删除其中的5万条记录,接着再往里插5万条记录,这时这个表的磁盘空间不是接近于先前看到的1000k,而是变成了1500k,怎么会是这样呢?删掉的那5万条记录仍然会一直占着空间,等到我来做vacuum吗?

  netkiller 回复于:2004-07-07 11:15:31

  那我建议你去用oracle,sqlserver,sybase

  也做这样的操作.你看看他们增大不???

  你以为是方本文件啊.删了在加上.大小还一样..

  里有要日志,rollback段,......

  qjlemon 回复于:2004-07-07 18:27:00

  班长兄弟请别发火:)这个问题我是真感到比较困惑,所以来这里请教解决方法的。oracle下我做过试验的,我当然没有指望哪个数据库能在一个固定大小的文件里无限制地放数据,这个是最起码的常识了。

  这个问题我后来自已找到答案了,还是要用vacuum来搞定,与max_fsm_pages这个配置的大小有关,看来这个问题太菜了,所以我也不在这里贴了:em10:

  netkiller 回复于:2004-07-08 09:49:36

  哈哈。我以前用sqlserver就发现在天天长大,长啊长啊。。我都没办法。最后得装:)

  netkiller 回复于:2004-07-08 09:50:28

  你可详细介绍一下

  max_fsm_pages

  你的这次操作经验给大家:)

  qjlemon 回复于:2004-07-08 13:58:12

  响应班长号召:em11: 

  在很久很久以前。。。。。在一篇名为<<tuningpostgresqlforperformance>>的文章,其中有这么一段:

|||

收集最实用的网页特效代码!

  max_fsm_pages:

  postgresqlrecordsfreespaceineachofitsdatapages.thisinformationisusefulforvacuumtofindouthowmanyandwhichpagestolookforwhenitfreesupthespace. 

  ifyouhaveadatabasethatdoeslotsofupdatesanddeletes,thatisgoingtogeneratedeadtuples,duetopostgresql'smvccsystem.thespaceoccupiedbydeadtuplescanbefreedwithvacuum,unlessthereismorewastedspacethaniscoveredbythefreespacemap,inwhichcasethemuchlessconvenient"vacuumfull"isrequired.byexpandingthefsmtocoverallofthosedeadtuples,youmightneveragainneedtorunvacuumfullexceptonholidays. 

  thebestwaytosetmax_fsm_pagesisinteractive;first,figureoutthevacuum(regular)frequencyofyourdatabasebasedonwriteactivity;next,runthedatabaseundernormalproductionload,andrun"vacuumverboseanalyze"insteadofvacuum,savingtheoutputtoafile;finally,calculatethemaximumtotalnumberofpagesreclaimedbetweenvacuumsbasedontheoutput,andusethat. 

  remember,thisisadatabaseclusterwidesetting.sobumpitupenoughtocoveralldatabasesinyourdatabasecluster.also,eachfsmpageuses6bytesoframforadministrativeoverhead,soincreasingfsmsubstantiallyonsystemslowonrammaybecounter-productive. 

  大意是说,postgresql.conf中的这个参数(max_fsm_pages)用于告诉postgresql申请多大的内存空间用于保存数据文件的freespace信息,按我的简单理解,如果在一个表中删除了一些记录,postgresql会把这一改动记录在"freespacemap"中,下次如果再往表里插记录时,根据freespacemap中的信息,就能利用以前删记录而腾出来的磁盘空间。不过freespacemap是存在于内存中,大小毕竟是有限的,对于大量数据的删除+插入,要么指定一个较大的max_fsm_pages,要么及时进行vacuum以整理表中的碎片,否则,postgresql只有把新插入的记录添加到文件的末尾,造成文件越来越大。我的一个程序就是意外地因为磁盘空间满了而中止的,它每次要往一个表里插500多万条记录,这之前先要delete同样条数的一批记录,可最后还是占满了整个硬盘。

|||

  我觉得postgresql的这种工作方式有它的一个好处,就是如果内存足够大,可以指定一个很大的freespacemap,对于oltp型的应用,可能会大幅提高性能(猜测,没有验证过),另外用户可以自已选择在合适的时候进行vacuum或vacuumfull,如果你确信一个表只会往里插记录(如记录操作日志),对这个表就可以永远不进行vacuumfull,是不是很灵活?

  不过,使用vacuumfull大量移动数据毕竟是件很耗时的工作,在此期间数据库性能会严重下降,大概这就是“灵活”的代价了。在这方面,oracle的block->extent->segment这种复杂的机制可能更有效一些吧。据说postgresql将引入表空间的概念了,值得期待啊!

  至于freespacemap设多大,上面的文章教了个办法,照着做就行了,只是需要弄明白,这毕竟是一个“map”,如果打算删掉300m的记录,freespacemap并不需要申请300m喔:d

  qjlemon 回复于:2004-07-08 14:09:24

  tuningpostgresqlforperformance 

  shridhardaithankar,joshberkus 

  july3,2003copyright2003shridhardaithankarandjoshberkus. 

  authorizedforre-distributiononlyunderthepostgresqllicense(seewww.postgresql.org/license). 

  tableofcontents

  1introduction

  2somebasicparameters

  2.1sharedbuffers

  2.2sortmemory

  2.3effectivecachesize

  2.4fsyncandthewalfiles

  3somelessknownparameters

  3.1random_page_cost

  3.2vacuum_mem

  3.3max_fsm_pages

  3.4max fsm_relations

  3.5wal_buffers

  4othertips

  4.1checkyourfilesystem

  4.2trytheautovacuumdaemon

|||

  4.3tryfreebsd

  5theconfsettingguide

  1introduction

  thisisaquickstartguidefortuningpostgresql'ssettingsforperformance.thisassumesminimalfamiliaritywithpostgresqladministration.inparticular,oneshouldknow, 

  howtostartandstopthepostmasterservice 

  howtotuneosparameters 

  howtotestthechanges 

  italsoassumesthatyouhavegonethroughthepostgresqladministrationmanualbeforestarting,andtohavesetupyourpostgresqlserverwithatleastthedefaultconfiguration. 

  therearetwoimportantthingsforanyperformanceoptimization: 

  decidewhatlevelofperformanceyouwant 

  ifyoudon'tknowyourexpectedlevelofperformance,youwillendupchasingacarrotalwayscoupleofmetersaheadofyou.theperformancetuningmeasuresgivediminishingreturnsafteracertainthreshold.ifyoudon'tsetthisthresholdbeforehand,youwillendupspendinglotoftimeforminusculegains. 

  knowyourload 

  thisdocumentfocusesentirelytuningpostgresql.confbestforyourexistingsetup.thisisnottheendofperformancetuning.afterusingthisdocumenttoextractthemaximumreasonableperformancefromyourhardware,youshouldstartoptimizingyourapplicationforefficientdataaccess,whichisbeyondthescopeofthisarticle. 

  pleasealsonotethatthetuningadvicesdescribedherearehints.youshouldnotimplementthemallblindly.tuneoneparameteratatimeandtestitsimpactanddecidewhetherornotyouneedmoretuning.testingandbenchmarkingisanintegralpartofdatabasetuning. 

|||注册会员,创建你的web开发资料库,

  tuningthesoftwaresettingsexploredinthisarticleisonlyaboutone-thirdofdatabaseperformancetuning,butit'sagoodstartsinceyoucanexperimentwithsomebasicsettingchangesinanafternoon,whereassomeotheraspectsoftuningcanbeverytime-consuming.theothertwo-thirdsofdatabaseapplicationtuningare: 

  hardwareselectionandsetup 

  databasesareveryboundtoyoursystem'si/o(disk)accessandmemoryusage.assuch,selectionandconfigurationofdisks,raidarrays,ram,operatingsystem,andcompetitionfortheseresourceswillhaveaprofoundeffectonhowfastyourdatabaseis.wehopetohavealaterarticlecoveringthistopic. 

  efficientapplicationdesign 

  yourapplicationalsoneedstobedesignedtoaccessdataefficiently,thoughcarefulquerywriting,plannedandtestedindexing,goodconnectionmanagement,andavoidingperformancepitfallsparticulartoyourversionofpostgresql.expectanotherguidesomedayhelpingwiththis,butreallyittakesseverallargebooksandyearsofexperiencetogetitright...orjustalotoftimeonthemailinglists. 

  2somebasicparameters

  2.1sharedbuffers

  sharedbuffersdefinesablockofmemorythatpostgresqlwillusetoholdrequeststhatareawaitingattentionfromthekernelbufferandcpu.thedefaultvalueisquitelowforanyrealworldworkloadandneedtobebeefedup.however,unlikedatabaseslikeoracle,moreisnotalwaysbetter.thereisathresholdabovewhichincreasingthisvaluecanhurtperformance. 

  thisistheareaofmemorypostgresqlactuallyusestoperformwork.itshouldbesufficientenoughtohandleloadondatabaseserver.otherwisepostgresqlwillstartpushingdatatofileanditwillhurttheperformanceoverall.hencethisisthemostimportantsettingoneneedstotuneup. 

|||

  thisvalueshouldbesetbasedonthedatasetsizewhichthedatabaseserverissupposedtohandleatpeakloadsandonyouravailableram(keepinmindthatramusedbyotherapplicationsontheserverisnotavailable).werecommendfollowingruleofthumbforthisparameter: 

  startat4mb(512)foraworkstation 

  mediumsizedatasetand256-512mbavailableram:16-32mb(2048-4096) 

  largedatasetandlotsofavailableram(1-4gb):64-256mb(8192-32768) 

  pleasenote.postgresqlcountsalotontheostocachedatafilesandhencedoesnotbotherwithduplicatingitsfilecachingeffort.thesharedbuffersparameterassumesthatosisgoingtocachealotoffilesandhenceitisgenerallyverylowcomparedwithsystemram.evenforadatasetinexcessof20gb,asettingof128mbmaybetoomuch,ifyouhaveonly1gbramandanaggressive-at-cachingoslikelinux. 

  thereisonewaytodecidewhatisbestforyou.setahighvalueofthisparameterandrunthedatabasefortypicalusage.watchusageofsharedmemoryusingipcsorsimilartools.arecommendedfigurewouldbebetween1.2to2timespeaksharedmemoryusage. 

  2.2sortmemory

  thisparametersetsmaximumlimitonmemorythatadatabaseconnectioncanusetoperformsorts.ifyourquerieshaveorder-byorgroup-byclausesthatrequiresortinglargedataset,increasingthisparameterwouldhelp.butbeware:thisparameterispersort,perconnection.thinktwicebeforesettingthisparametertoohighonanydatabasewithmanyusers.arecommendedapproachistosetthisparameterperconnectionasandwhenrequired;thatis,lowformostsimplequeriesandhigherforlarge,complexqueriesanddatadumps. 

|||注册会员,创建你的web开发资料库,

  2.3effectivecachesize

  thisparameterallowspostgresqltomakebestpossibleuseoframavailableonyourserver.ittellspostgresqlthesizeofosdatacache.sothatpostgresqlcandrawdifferentexecutionplanbasedonthatdata. 

  saythereis1.5gbraminyourmachine,sharedbuffersaresetto32mbandeffectivecachesizeissetto800mb.soifaqueryneeds700mbofdataset,postgresqlwouldestimatethatallthedatarequiredshouldbeavailableinmemoryandwouldoptformoreaggressiveplanintermsofoptimization,involvingheavierindexusageandmergejoins.butifeffectivecacheissettoonly200mb,thequeryplannerisliabletooptforthemorei/oefficientsequentialscan. 

  whilesettingthisparametersize,leaveroomforotherapplicationsrunningontheservermachine.theobjectiveistosetthisvalueatthehighestamountoframwhichwillbeavailabletopostgresqlallthetime. 

  2.4fsyncandthewalfiles

  thisparameterssetswhetherornotwritedatatodiskassoonasitiscommitted,whichisdonethroughwriteaheadlogging(wal).ifyoutrustyourhardware,yourpowercompany,andyourbatterypowersupplyenough,yousetthistonoforanimmediateboosttodatawritespeed.butbeveryawarethatanyunexpecteddatabaseshutdownwillforceyoutorestorethedatabasefromyourlastbackup. 

  ifthat'snotanoptionforyou,youcanstillhavetheprotectionofwalandbetterperformance.simplymoveyourwalfiles,usingeitheramountorasymlinktothepg_xlogdirectory,toaseparatediskorarrayfromyourmaindatabasefiles.inhigh-write-activitydatabases,walshouldhaveitsowndiskorarraytoensurecontinuoushigh-speedaccess.verylargeraidarraysandsan/nasdevicesfrequentlyhandlethisforyouthroughtheirinternalmanagementsystems. 

|||

  3somelessknownparameters

  3.1random_page_cost

  thisparametersetsthecosttofetcharandomtuplefromthedatabase,whichinfluencestheplanner'schoiceofindexvs.tablescan.thisissettoahighvalueasthedefaultdefaultbasedontheexpectationofslowdiskaccess.ifyouhavereasonablyfastdiskslikescsiorraid,youcanlowerthecostto2.youneedtoexperimenttofindoutwhatworksbestforyoursetupbyrunningavarietyofqueriesandcomparingexecutiontimes. 

  3.2vacuum_mem

  thisparametersetsthememoryallocatedtovacuum.normally,vacuumisadiskintensiveprocess,butraisingthisparameterwillspeeditupbyallowingpostgresqltocopylargerblocksintomemory.justdon'tsetitsohighittakessignificantmemoryawayfromnormaldatabaseoperation.thingsbetween16-32mbshouldbegoodenoughformostsetups. 

  3.3max_fsm_pages

  postgresqlrecordsfreespaceineachofitsdatapages.thisinformationisusefulforvacuumtofindouthowmanyandwhichpagestolookforwhenitfreesupthespace. 

  ifyouhaveadatabasethatdoeslotsofupdatesanddeletes,thatisgoingtogeneratedeadtuples,duetopostgresql'smvccsystem.thespaceoccupiedbydeadtuplescanbefreedwithvacuum,unlessthereismorewastedspacethaniscoveredbythefreespacemap,inwhichcasethemuchlessconvenient"vacuumfull"isrequired.byexpandingthefsmtocoverallofthosedeadtuples,youmightneveragainneedtorunvacuumfullexceptonholidays. 

  thebestwaytosetmax_fsm_pagesisinteractive;first,figureoutthevacuum(regular)frequencyofyourdatabasebasedonwriteactivity;next,runthedatabaseundernormalproductionload,andrun"vacuumverboseanalyze"insteadofvacuum,savingtheoutputtoafile;finally,calculatethemaximumtotalnumberofpagesreclaimedbetweenvacuumsbasedontheoutput,andusethat. 

|||

商业源码热门下载www.html.org.cn

  remember,thisisadatabaseclusterwidesetting.sobumpitupenoughtocoveralldatabasesinyourdatabasecluster.also,eachfsmpageuses6bytesoframforadministrativeoverhead,soincreasingfsmsubstantiallyonsystemslowonrammaybecounter-productive. 

  3.4max_fsm_relations

  thissettingdictateshowmanynumberofrelations(tables)willbetrackedinfreespacemap.againthisisadatabasecluster-widesetting,sosetitaccordingly.inversion7.3.3andlater,thisparametershouldbesetcorrectlyasadefault.inolderversions,bumpitupto300-1000. 

  3.5wal_buffers

  thissettingdecidesthenumberofbufferswal(writeaheadlog)canhave.ifyourdatabasehasmanywritetransactions,settingthisvaluebithigherthandefaultcouldresultbetterusageofdiskspace.experimentanddecide.agoodstartwouldbearound32-64correspondingto256-512kmemory. 

  4othertips

  4.1checkyourfilesystem

  onoslikelinux,whichoffersmultiplefilesystems,oneshouldbecarefulaboutchoosingtherightonefromaperformancepointofview.thereisnoagreementbetweenpostgresqlusersaboutwhichoneisbest. 

  contrarytopopularbelief,today'sjournalingfilesystemsarenotnecessarilyslowercomparedtonon-journalingones.ext2canbefasteronsomesetupsbuttherecoveryissuesgenerallymakeitsuseprohibitive.differentpeoplehavereportedwidelydifferentexperienceswiththespeedofext3,reiserfs,andxfs;quitepossiblythiskindofbenchmarkdependsonacombinationoffilesystem,disk/arrayconfiguration,osversion,anddatabasetablesizeanddistribution.assuch,youmaybebetteroffstickingwiththefilesystembestsupportedbyyourdistribution,suchasreiserfsforsuselinuxorext3forredhatlinux,nottoforgetxfsknownforit'slargefilesupport.ofcourse,ifyouhavetimetoruncomprehensivebenchmarks,wewouldbeinterestedinseeingtheresults! 

|||

  asaneasyperformanceboostwithnodownside,makesurethefilesystemonwhichyourdatabaseiskeptismounted"noatime",whichturnsofftheaccesstimebookkeeping. 

  4.2trytheautovacuumdaemon

  thereisalittleknownmoduleinpostgresqlcontribdirectorycalledaspgavd.itworksinconjunctionwithstatisticscollector.itperiodicallyconnectstoadatabaseandchecksifithasdoneenoughoperationssincethelastcheck.ifyes,itwillvacuumthedatabase. 

  essentiallyitwillvacuumthedatabasewhenitneedsit.itwouldgetridofplayingwithcronsettingsforvacuumfrequency.itshouldresultinbetterdatabaseperformancebyeliminatingoverduevacuumissues. 

  4.3tryfreebsd

  largeupdates,deletes,andvacuuminpostgresqlareverydiskintensiveprocesses.inparticular,sincevacuumgobblesupiobandwidth,therestofthedatabaseactivitiescouldbeaffectedadverselywhenvacuumingverylargetables. 

  os'sfromthebsdfamily,suchasfreebsd,dynamicallyaltertheiopriorityofaprocess.soifyoulowerthepriorityofavacuumprocess,itshouldnotchewasmuchbandwidthandwillbetterallowthedatabasetoperformnormally.ofcoursethismeansthatvacuumcouldtakelonger,whichwouldbeproblematicfora"vacuumfull." 

  ifyouarenotdonewithyourchoiceofosforyourserverplatform,considerbsdforthisreason. 

  5theconfsettingguide

  availablehereisanannotatedguidetothepostgresqlconfigurationfilesettings,inbothopenoffice.organdpdfformat.thisguideexpandsontheofficialdocumentationandmayeventuallybeincorporatedintoit. 

发表评论 共有条评论
用户名: 密码:
验证码: 匿名发表