Earlier  
Posted Nick Remark
#openstack-nova - 2019-06-26
17:04:45 mriedem which resize can do
17:04:52 sean-k-mooney the instace should become active on the source host but it might not fix the allocation in placment properly
17:05:24 mriedem auto-reverting a failed resize could be all sorts of f'ed up
17:05:29 mriedem because rollbacks are near impossible
17:05:40 mriedem hard to test
17:05:58 mriedem i'm fairly certain our live migration rollback code is also quite janky in several ways
17:06:03 mriedem because we don't test it in the gate
17:08:07 sean-k-mooney just looking at that bug the live migration failed right?
17:09:07 sean-k-mooney so we would exepct there to be an error in the instance action log?
17:10:06 mriedem no
17:10:10 mriedem read my comments on the bug
17:10:29 sean-k-mooney maybe im missreading it as its kind of hard to read the initilal bug
17:10:31 mriedem a pre-check on one of the candidate dest hosts failed
17:10:44 mriedem which triggers a reschedule to another dest host in the conductor live migration task
17:10:49 mriedem the 2nd host works
17:11:17 mriedem but b/c the pre-check failed on the first dest host the instance action event for that one is error which sets the overall action message to 'Error'
17:11:36 sean-k-mooney ah ok
17:11:46 mriedem iow, actions aren't reschedule aware
17:11:56 mriedem or what is a non-fatal error
17:12:05 sean-k-mooney right
17:12:27 sean-k-mooney should we be loging the prechecks as events?
17:12:59 sean-k-mooney i was not exepcting to see compute_check_can_live_migrate_destination events
17:13:25 mriedem hard to say
17:13:33 mriedem if you configure nova to not have alternate hosts for reschedules
17:13:46 mriedem then you'd likely want to know it's that dest pre-check that failed right?
17:14:18 sean-k-mooney maybe or jsut that you had no valid hosts?
17:14:29 sean-k-mooney / exausted teh list of alternates
17:14:57 sean-k-mooney i thikn we would still log the failure right
17:15:02 openstackgerrit Merged openstack/nova master: Remove orphaned comment from _get_group_details https://review.opendev.org/667135
17:15:07 mriedem sure, if you set https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.migrate_max_retries to 0 for now retries, or you don't have any alternate hosts
17:15:44 mriedem idk, anyway, it's tangential to the auto-revert failed resize thing mel was asking about
17:15:45 sean-k-mooney for me this feels like we are leaking an implemenation detail as an event
17:16:04 sean-k-mooney ya it is
17:16:11 mriedem instance action events are basically entirely leaked implementation details :)
17:16:17 mriedem the event names come from the methods they decorate
17:16:27 mriedem there is no guarantee on api stability for those things
17:17:17 sean-k-mooney ok personally i would prefer not to decorate that function
17:17:45 sean-k-mooney but as you said its tangental to melwitt's topic
17:36:44 openstackgerrit Lee Yarwood proposed openstack/nova master: libvirt: Add a rbd_connect_timeout configurable https://review.opendev.org/667421
17:36:57 openstackgerrit Eric Fried proposed openstack/nova-specs master: grammar fix for show-server-numa-topology spec https://review.opendev.org/667487
17:38:12 Nick_A any idea why metadata would send all /24 routes in a region to each instance? http://paste.openstack.org/show/y0lE42EA59yhnu7G1KnY/
17:38:25 openstackgerrit Matt Riedemann proposed openstack/nova master: Fix AttributeError in RT._update_usage_from_migration https://review.opendev.org/667687
17:38:26 openstackgerrit Matt Riedemann proposed openstack/nova master: Fix RT init arg order in test_unsupported_move_type https://review.opendev.org/667688
17:48:03 openstackgerrit Ghanshyam Mann proposed openstack/nova master: Multiple API cleanup changes https://review.opendev.org/666889
17:59:52 sean-k-mooney yonglihe: efried just reviewing https://review.opendev.org/#/c/648912/14 but why are we looking up instance by name?
18:01:58 efried sean-k-mooney: I haven't the foggiest. I'm involved here in an administrative capacity :)
18:02:43 sean-k-mooney the domain xml has the instance uuid stored in the uuid field for several release now so im wondering why we would use the instance domain name instead
18:03:43 melwitt sean-k-mooney: been meaning to get to that review. after what we briefly discussed at the ptg, that change would be best to piggyback onto the existing periodic for handling deleted instances. I don't know why it would be looking up instances by name
18:03:45 efried k, hopefully yonglihe will be able to answer on the review. Thanks sean-k-mooney.
18:04:20 sean-k-mooney melwitt: it is piggybacking on that task
18:04:42 melwitt but I see a lot of new methods
18:04:42 sean-k-mooney efried: ok im trying to find where it gets teh list of suspected instances
18:05:16 sean-k-mooney melwitt: ya there are. im not sure if they are all needed.
18:05:36 melwitt yeah, I would think there should be none new needed
18:06:18 sean-k-mooney well we need a new method to query the driver for the instance that are runnin on the host but not in the db
18:07:07 sean-k-mooney and then we can call the old meethods that implement the policy. e.g. reap or log or do nothing whatever you have set in the config
18:07:31 melwitt why? there's a self._get_instances_on_driver method already
18:08:08 sean-k-mooney that is a good question :)
18:08:46 sean-k-mooney i have only properly looked at https://review.opendev.org/#/c/648912/14 whic is doing driver change but i need to look at how that is being used in https://review.opendev.org/#/c/627765/28
18:08:51 melwitt anyway, just saying skimming those patches I don't see why they're so large
18:09:29 melwitt or rather, I expected not such a large patch for this
18:13:16 sean-k-mooney melwitt: this seam oververly complex https://review.opendev.org/#/c/627765/28/nova/compute/manager.py@8884
18:14:32 sean-k-mooney also _destroy_orphan_instances is not a greate name for that since it might not destroy anything .
18:15:02 sean-k-mooney melwitt: i was wrong however its adding a new periodici task not extending the existing tasks
18:41:20 melwitt sean-k-mooney: yeah, that's what I had thought. it should get much simpler if it's changed to extend the existing periodic. and I think the suggestion at the ptg from dansmith was to add another enumerated choice to the existing conf option that is something like "reap-unknown" which will reap both known deleted and unknown guests. and otherwise just log unknowns
18:43:20 openstackgerrit Merged openstack/nova master: Fix update_provider_tree signature in reference docs https://review.opendev.org/667408
18:43:52 openstackgerrit Eric Fried proposed openstack/nova-specs master: grammar fix for show-server-numa-topology spec https://review.opendev.org/667487
18:44:18 mriedem sean-k-mooney: it's named _destroy* because it's similar to the existing _destroy_running_instances
18:44:21 mriedem which might also not destroy anything
18:44:38 sean-k-mooney mriedem: ah ok
18:45:12 sean-k-mooney my main issue with this is it will only really work for the libvirt dirver
18:45:17 sean-k-mooney and the fake driver
18:53:52 mriedem yup
18:54:01 mriedem i believe i noted something along those lines last time i did a deep review on it
18:54:07 openstackgerrit Miguel Ángel Herranz Trillo proposed openstack/nova master: Fix type error on call to mount device https://review.opendev.org/659780
18:54:13 mriedem i seem to remember why they needed to lookup by name at the time, and it was libvirt-specific
18:54:35 sean-k-mooney ya im looking at it again
18:55:02 sean-k-mooney its because we can have libvirt domains that are for instance that nova created but have been deleted form the db
18:55:40 sean-k-mooney so to destory the running guest they need to use the libvirt domain name
18:55:58 mriedem having said all that, i'm not opposed to starting with something that could eventually be implemented by other drivers
18:56:07 mriedem as long as it's graceful about other drivers not implementing the necessary interface
18:56:31 sean-k-mooney ya it handels the not implmetned excetion correctly in the manager
18:56:58 sean-k-mooney im think of asking them to use the uuid instead of the name however
18:57:36 sean-k-mooney the instance uuid is set in the libvirt domains uuid field
18:57:57 mriedem yeah uuid is ideal if we can use it
18:58:14 sean-k-mooney but i libvirt allows us to deleate by uuid
18:58:32 sean-k-mooney me might just be pushing the translation into the dirver
18:59:48 sean-k-mooney im also wondering how to deal with the fact we could be leaking plugged interfaces
19:03:55 mriedem heh,
19:04:01 mriedem well we could be leaking all sorts of things
19:04:07 mriedem storage connections
19:04:42 sean-k-mooney will undefineing the domin delete the root disk?
19:04:55 sean-k-mooney or other disks
19:05:26 mriedem i doubt it
19:05:46 sean-k-mooney if we are reaping the orpahn vms that were created by nova but are deleted in the db we really should be cleaning up all there local resouces in that case which this current patch does not attemtp to do
19:06:03 mriedem otherwise we wouldn't need separate calls during driver.destroy to cleanup the volumes via brick and unplug vifs via os-vif
19:06:19 mriedem at some point this could also be cyborg devices and such couldn't it?
19:06:20 sean-k-mooney ya that is a good point
19:06:26 sean-k-mooney ya
19:07:19 sean-k-mooney so if we are going to reap these we need to try and clean up as much of the resouces as we can or just support powering down the instance but not reaping them?

Earlier   Later