Guarded Software Upgrading
Ann T. Tai, IA Tech, Inc.
Los Angeles, CA 90024, USA
William H. Sanders, University of Illinois, Urbana, IL 61801, USA
1 Introduction
In his survey paper presented at the First International Workshop on Performability Modeling of Computer and Communication Systems and subsequently published in Performance Evaluation [1], Prof. John F. Meyer remarked that “it is important to note that the object of a performability evaluation study can take the form of a ‘process’ as well as a ‘product,’ e.g., a software development process, an automobile assembly process, etc. Indeed, interesting prospects for future consideration are object systems which are combinations of both.” Meyer further pointed out that a performability evaluation of a “product-in-process” object system would enable us to assess the influence of process quality on product quality and/or the overall system service quality. The intent of this extended abstract is to address and foster the “product-in-process” performability evaluation by presenting an example application.
In particular, we conduct a “product-in-process” performability study for a methodology called guarded software upgrading (GSU) [2]. The objective of the methodology is to guard an evolvable, distributed embedded system for long-life deep-space missions against the adverse effects of design faults introduced by an onboard software upgrade. The GSU methodology is supported by a message-driven confidence-driven (MDCD) protocol that enables effective and efficient use of checkpointing and acceptance test (AT) techniques for error containment and recovery. More specifically, the MDCD protocol is responsible for ensuring that the system functions properly after a software component is replaced by an updated version, while allowing the updated component to interact freely with other components in the system. The period during which the system is under the protection of the MDCD protocol is called “guarded operation.” Our (separate) model-based studies have shown that the MDCD protocol significantly improves system reliability during an onboard software upgrade and effectively reduces performance cost [2, 3]. Nonetheless, when we want to identify the optimal duration of guarded operation with respect to minimizing the expected total performance degradation (including that due to both design-fault-caused failure and the performance overhead of fault tolerance), the effects of dependability gain and performance cost must be considered jointly. Accordingly, the effort presented in this extended abstract addresses that question by constructing and solving a product-in-process performability model. More precisely, the newly upgraded component in which we have not yet established enough confidence is viewed as the “product,” while the guarded operation enabled by the MDCD protocol is regarded as the “process” that is intended to safeguard the initial onboard use of the product. The base model is constructed using stochastic activity networks (SANs) and solved by UltraSAN [4].
2 Performability Variable
We aim to define a performability variable that will help us to determine how long we should apply the MDCD protocol after an upgraded software component starts its onboard execution. In other words, this interval, which we call the “duration of the guarded operation (G-OP) mode,” will be determined based on the value of the performability variable that reflects the reduction of the performance degradation. We let the time between onboard software upgrades and the duration of the G-OP mode be denoted by θ and φ, respectively, as shown in Figure 1.
[Figure 1: Duration of the Guarded-Operation Mode. The time between upgrades (θ) is divided into a G-OP mode of duration φ followed by a normal mode of duration θ − φ.]
As mentioned in Section 1, we consider two types of performance degradation, namely, 1) the performance degradation caused by the performance overhead of checkpoint establishment and AT-based validation, and 2) the performance degradation due to design-fault-caused failure.
Clearly, a greater value of φ implies decreased expected performance degradation due to potential system failure caused by residual design faults in the upgraded software component, and increased performance degradation due to the overhead of checkpointing and AT. If we let $D_\phi$ denote the amount of total performance degradation when the duration of the G-OP mode is φ, then $D_0$ refers to the total performance degradation for the boundary case in which the G-OP mode is completely absent (having a zero duration). At the other extreme, if the performance overhead of checkpointing and AT-based validation is negligible, we may let the system be under the G-OP mode until the next upgrade. We view this extreme case as the “ideal” case in the sense that checkpointing and AT-based validation are applied throughout θ to reduce failure-caused performance degradation without introducing overhead-caused performance degradation. Accordingly, we let $D_\theta^I$ denote the total performance degradation for this extreme case.
It follows that a value of φ that makes the expected total performance degradation closer to the expected value of $D_\theta^I$ can be regarded as a better choice. Since our goal is to minimize the total performance degradation via choosing an optimal φ, we let the performability variable take the form of a performability index Y, which evaluates how effectively a G-OP duration φ reduces the total performance degradation, relative to the case in which the G-OP mode is completely absent. More succinctly, Y is the ratio of the difference between $E[D_0]$ and $E[D_\theta^I]$ to that between $E[D_\phi]$ and $E[D_\theta^I]$:

$$Y = \frac{E[D_0] - E[D_\theta^I]}{E[D_\phi] - E[D_\theta^I]} \tag{1}$$

Based on the definitions of $D_0$, $D_\phi$, and $D_\theta^I$ and the above discussion, we can anticipate performability benefit from a guarded operation that is characterized by a duration φ when $(E[D_\phi] - E[D_\theta^I])$ is less than $(E[D_0] - E[D_\theta^I])$. More precisely, Y > 1 implies that the application of guarded operation will yield performability benefit with respect to the reduction of total performance degradation. On the other hand, Y ≤ 1 suggests that guarded operation will not be effective for total performance degradation reduction.
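As a minimal sketch of Equation (1), the index can be computed from the three expected degradations once they are known. The function name, the argument names, and the numbers below are illustrative assumptions and are not results from the study. The guard against a non-positive denominator is likewise an added assumption: the interpretation of Y given above presumes that $E[D_\phi]$ exceeds $E[D_\theta^I]$.

```python
def performability_index(e_d0: float, e_dphi: float, e_dideal: float) -> float:
    """Equation (1): Y = (E[D_0] - E[D_theta^I]) / (E[D_phi] - E[D_theta^I]).

    e_d0     -- expected degradation with no G-OP mode, E[D_0]
    e_dphi   -- expected degradation with G-OP duration phi, E[D_phi]
    e_dideal -- expected degradation in the ideal case, E[D_theta^I]
    """
    denominator = e_dphi - e_dideal
    if denominator <= 0:
        # Assumption of this sketch: the ratio is only interpreted when
        # E[D_phi] is strictly above the ideal-case degradation.
        raise ValueError("E[D_phi] must exceed E[D_theta^I]")
    return (e_d0 - e_dideal) / denominator


# Made-up values for illustration only.
y = performability_index(e_d0=300.0, e_dphi=200.0, e_dideal=100.0)
print(y)  # 2.0, i.e. Y > 1: guarded operation reduces total degradation
```

With these made-up inputs, guarded operation halves the gap to the ideal case, so Y = 2. Setting `e_dphi` equal to `e_d0` gives Y = 1, the break-even point.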
The development of the MDCD protocol assumes that the underlying embedded system consists of three computing nodes and two functionally different interacting application software components, one of which has two versions, namely, a better-performance less-reliable version (upgraded version) running in the foreground, and a poorer-performance more-reliable version (earlier version) running in the background to enable error recovery. A long-life mission typically consists of critical and non-critical phases. To minimize the risk, onboard software upgrades are supposed to take place during the non-critical mission phases, during which the spacecraft does not require full computation power. Therefore, we are able to make use of a processor that otherwise would be idle, to allow concurrent execution of the new and old versions of the application software component that is undergoing an upgrade. The use of non-dedicated resource redundancy is highly desirable for a fault-tolerant avionics system. Nonetheless, it could still cause performance degradation in the sense that decreasing a processor’s idle time may result in a reduction of its lifetime in deep-space environments. Accordingly, the cost of using an otherwise idle processor for fault tolerance is accounted for in the derivation of the solution for Y.
We first let ρ denote the steady-state fraction of time that a process will make “forward progress” (i.e., perform its application tasks rather than checkpointing and AT). Then, when the duration of the G-OP mode is φ and no failure occurs during θ, the performance degradation that is due to overhead can be expressed as $((1-\rho)2\phi + \phi)$, where the second term represents the performance cost of using the otherwise idle processor. On the other hand, if a failure occurs during θ, then the performance degradation will be the penalty from the failure and will be accounted as 3θ, since we consider the computation capacity of each of the three processors to be wasted throughout the duration θ. Accordingly, the expected value of $D_\phi$ can be formulated as

$$E[D_\phi] = R_\phi^{\mathrm{MDCD}} R_{\theta-\phi}^{\mathrm{base}} \bigl((1-\rho)2\phi + \phi\bigr) + \bigl(1 - R_\phi^{\mathrm{MDCD}} R_{\theta-\phi}^{\mathrm{base}}\bigr) 3\theta \tag{2}$$

The first term evaluates the performance degradation for the failure-free case and reflects the costs of the times during which the active processes are not making forward progress and the idle processor is exploited to host the shadow process. The second term evaluates the performance degradation for the failure case and represents the penalty from wasting all three processors throughout the duration θ.
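Equation (2) can be sketched as a small function that takes the two reliability values as plain numbers. In the study those values come from the SAN submodels, so the function and parameter names, the range check on φ, and the sample inputs below are hypothetical placeholders chosen only to show the arithmetic.

```python
def expected_degradation(phi: float, theta: float, rho: float,
                         r_mdcd_phi: float, r_base_rest: float) -> float:
    """Equation (2): expected total performance degradation E[D_phi].

    phi         -- duration of the G-OP mode
    theta       -- time between onboard software upgrades
    rho         -- steady-state fraction of time spent on forward progress
    r_mdcd_phi  -- R^MDCD_phi, reliability over the G-OP interval
    r_base_rest -- R^base_(theta - phi), reliability over the normal mode
    """
    if not 0.0 <= phi <= theta:
        raise ValueError("phi must lie between 0 and theta")
    survive = r_mdcd_phi * r_base_rest            # no failure during theta
    overhead = (1.0 - rho) * 2.0 * phi + phi      # lost progress + idle CPU
    failure_penalty = 3.0 * theta                 # all three processors wasted
    return survive * overhead + (1.0 - survive) * failure_penalty


# Placeholder inputs, not values from the study.
d = expected_degradation(phi=3000.0, theta=10000.0, rho=0.95,
                         r_mdcd_phi=0.99, r_base_rest=0.90)
print(round(d, 1))  # 6210.3 = 0.891 * 3300 + 0.109 * 30000
```

Passing the reliabilities in as numbers keeps the sketch independent of how they are obtained; any solver that yields $R_t^{\mathrm{MDCD}}$ and $R_t^{\mathrm{base}}$ could supply them.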
For the extreme case in which the G-OP mode is completely absent, φ will be equal to zero. And since $R_0^{\mathrm{MDCD}}$ is equal to unity, the solution for $E[D_0]$ can be obtained from Equation (2) in a straightforward manner:

$$E[D_0] = \bigl(1 - R_\theta^{\mathrm{base}}\bigr) 3\theta \tag{3}$$
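For the other extreme case in which the G-OP mode spans the whole duration of θ and we assume that checkpointing and AT take a negligible amount of time, ρ will be equal to one. Furthermore, since $R_0^{\mathrm{base}}$ is equal to unity, again we can obtain the solution of $E[D_\theta^I]$ from Equation (2):

$$E[D_\theta^I] = R_\theta^{\mathrm{MDCD}}\,\theta + \bigl(1 - R_\theta^{\mathrm{MDCD}}\bigr) 3\theta \tag{4}$$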
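For readers who want the intermediate step, both boundary cases follow from Equation (2) by direct substitution. The block below writes out that substitution using only the symbols already defined.

```latex
\begin{align*}
\phi = 0:\quad
E[D_0] &= R^{\mathrm{MDCD}}_{0} R^{\mathrm{base}}_{\theta}
          \bigl((1-\rho)\cdot 2\cdot 0 + 0\bigr)
          + \bigl(1 - R^{\mathrm{MDCD}}_{0} R^{\mathrm{base}}_{\theta}\bigr)\,3\theta \\
       &= \bigl(1 - R^{\mathrm{base}}_{\theta}\bigr)\,3\theta
          \qquad \text{since } R^{\mathrm{MDCD}}_{0} = 1, \\[4pt]
\phi = \theta,\ \rho = 1:\quad
E[D^{I}_{\theta}] &= R^{\mathrm{MDCD}}_{\theta} R^{\mathrm{base}}_{0}
          \bigl((1-1)\,2\theta + \theta\bigr)
          + \bigl(1 - R^{\mathrm{MDCD}}_{\theta} R^{\mathrm{base}}_{0}\bigr)\,3\theta \\
       &= R^{\mathrm{MDCD}}_{\theta}\,\theta
          + \bigl(1 - R^{\mathrm{MDCD}}_{\theta}\bigr)\,3\theta
          \qquad \text{since } R^{\mathrm{base}}_{0} = 1.
\end{align*}
```

Note that the ideal case still carries the term θ in the failure-free branch: even with negligible checkpointing and AT overhead, the otherwise idle processor is in use throughout θ.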
Thus, the performability index can be evaluated by solving Equations (2), (3), and (4). The values of $R_t^{\mathrm{MDCD}}$, $R_t^{\mathrm{base}}$, and ρ are provided by the SAN submodels, which are described in the following section.
3 SAN Submodels
Although SANs’ rich syntax and marking-dependent specification capability allow us to specify every aspect of the protocol precisely, the resulting state space may become unmanageable if we attempt to make the SAN model a procedural specification of the MDCD protocol. To avoid this difficulty, our approach is to minimize explicit representation of the algorithmic details, while ensuring that every aspect of their impact on the particular measure we seek to solve is captured. For example, in the SAN model for solving $R_t^{\mathrm{MDCD}}$, we avoid modeling details about checkpoint establishment and rollback recovery. Rather, by exploiting the relations among the markings of the places that represent whether a process is actually error-contaminated and the process’s knowledge about its state contamination, we are able to characterize the system’s failure behavior precisely with respect to whether messages sent by potentially contaminated processes will cause system failure.
Likewise, in the SAN submodel for solving ρ, the performance overhead during failure-free operation, we omit those failure-behavior-related aspects, such as fault manifestation, undetected error, and dormant error conditions that remain in a process state after error recovery. Instead, we focus on representing those conditions that would require a process to take actions that do not belong to the category of “forward progress.”
Due to the nature of the measure, the SAN submodel for solving $R_t^{\mathrm{MDCD}}$ emphasizes the effects on system reliability of the interactions between the non-ideal environment conditions and the behavior of the MDCD protocol. Accordingly, in the model construction, we relaxed the design assumptions for an ideal execution environment. In contrast, the purpose of the SAN submodel for solving ρ is to evaluate the performance overhead resulting from processes’ error containment activities. Since those fault tolerance mechanisms are directly influenced by the design assumptions, the ideal environment assumptions are preserved in this SAN submodel.
In summary, in order to prevent a state space from becoming unnecessarily large, we take a “measure-adaptive” approach in submodel construction.
4 Discussion
Using the performability index, we determine the optimal duration of the G-OP mode. The numerical results of our first study are displayed in Figure 2(a) (the curve with solid dots). The optimal G-OP mode duration is about 3000 hours, which yields the best reduction of expected performance degradation. This implies that for this particular setting, a φ smaller than 3000 hours would lead to a greater expected performance degradation due to increased risk of potential design-fault-caused failure. On the other hand, if we let φ be larger, then the increased performance degradation due to performance overhead for fault tolerance would more than negate the benefit from extended guarded operation. In the next study, we decrease the checkpoint-establishment completion rate and AT completion rate, implying that the performance costs for checkpoint establishment and AT-based validation become higher. The evaluation results are shown by the curve with hollow dots in Figure 2(a). For this example case, the optimal duration of the G-OP mode is 2000 hours, as revealed by the curve. This is because the increased performance overhead tends to further negate reliability benefits, and thus results in an earlier cutoff for the guarded operation.
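The search for an optimal φ can be sketched as a grid sweep over Equation (1). The sketch below is not the model used in the study: the reliability curves there are obtained from SAN submodels, whereas here `r_base` and `r_mdcd` are invented exponential stand-ins, and `THETA`, `MU_NEW`, and `COVERAGE` are hypothetical parameters. The optimum it reports therefore illustrates the shape of the trade-off only and does not reproduce Figure 2.

```python
import math

THETA = 10_000.0   # hypothetical time between upgrades (hours)
RHO = 0.95         # forward-progress fraction
MU_NEW = 5e-5      # hypothetical fault manifestation rate (per hour)
COVERAGE = 0.9     # hypothetical fraction of failures averted under G-OP


def r_base(t):
    """Stand-in for R^base_t: unguarded exponential reliability."""
    return math.exp(-MU_NEW * t)


def r_mdcd(t):
    """Stand-in for R^MDCD_t: guarded operation averts part of the failures."""
    return math.exp(-MU_NEW * (1.0 - COVERAGE) * t)


def expected_degradation(phi, rho=RHO, theta=THETA):
    """Equation (2) evaluated with the stand-in reliability curves."""
    survive = r_mdcd(phi) * r_base(theta - phi)
    overhead = (1.0 - rho) * 2.0 * phi + phi
    return survive * overhead + (1.0 - survive) * 3.0 * theta


def performability_index(phi):
    """Equation (1); Equations (3) and (4) are boundary cases of (2)."""
    e_d0 = expected_degradation(0.0)                 # Equation (3)
    e_ideal = expected_degradation(THETA, rho=1.0)   # Equation (4)
    return (e_d0 - e_ideal) / (expected_degradation(phi) - e_ideal)


# Sweep phi in steps of 500 hours and keep the duration with the largest Y.
grid = range(0, int(THETA) + 1, 500)
best_phi = max(grid, key=performability_index)
print(best_phi, round(performability_index(best_phi), 2))
```

Under these stand-in curves the index equals 1 at φ = 0, rises to an interior maximum, and then falls as overhead outweighs the reliability gain, which is the qualitative behavior described above.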
Figure 2(b) illustrates the impact of reliability of the upgraded software component on the optimality of φ. Specifically, we increase the fault manifestation rate of the upgraded software component (i.e., $\mu_{new}$). Here we see that the higher $\mu_{new}$ favors a longer duration of the G-OP mode. For example, when ρ is equal to 0.90, the optimal φ suggested by the value of the performability index is 4000; while a greater ρ (i.e., 0.95) requires the G-OP mode to span the entire duration of θ for the maximum reduction of total performance degradation.

[Figure 2: Optimal Duration of Guarded Operation. Panels (a) and (b) each plot the performability index (Y) against the guarded operation duration (φ) over the range 0 to 5000. Panel (a) shows curves for ρ = 0.95 and ρ = 0.92; panel (b) shows curves for ρ = 0.95 and ρ = 0.90.]
As mentioned in the opening section, the purpose of this study has been to investigate the feasibility of “product-in-process” performability modeling for applications such as guarded software upgrading, and to illustrate the type of results that can be obtained. The analytic results presented here suggest that the “product-in-process” performability evaluation indeed provides a means of assessing product-process interaction for degradable systems, and would be useful with regard to decision-making for various engineering processes, such as software upgrade and system maintenance, as surmised by Prof. Meyer. Indeed, we have just begun to investigate the analytic methods for “product-in-process” performability evaluation. We intend to continue our study in several respects, including the determination of criteria for defining a performability variable, methods for analyzing the interactions between a product and a process, and techniques for identifying the optimal value(s) of the parameter(s) of the process.
References
[1] J. F. Meyer, “Performability: A retrospective and some pointers to the future,” Performance Evaluation, vol. 14, pp. 139–156, Feb. 1992.
[2] A. T. Tai, K. S. Tso, L. Alkalai, S. N. Chau, and W. H. Sanders, “Low-cost error containment and recovery for onboard guarded software upgrading and beyond,” IEEE Trans. Computers, (To appear).
[3] A. T. Tai, K. S. Tso, L. Alkalai, S. N. Chau, and W. H. Sanders, “On the effectiveness of a message-driven confidence-driven protocol for guarded software upgrading,” Performance Evaluation, vol. 44, pp. 211–236, Apr. 2001.
[4] W. H. Sanders, W. D. Obal II, M. A. Qureshi, and F. K. Widjanarko, “The UltraSAN modeling environment,” Performance Evaluation, vol. 24, no. 1, pp. 89–115, 1995.