Earlier  
Posted Nick Remark
#openstack-nova - 2019-06-28
14:50:10 gibi mriedem: I don't get your comment at https://review.opendev.org/#/c/668184/1/nova/cmd/manage.py@1891
14:50:11 sean-k-mooney yep. so we whould never need to heal offloaded instances
14:50:47 mriedem now i'm not so sure where we unbind the port on shelve offload,
14:50:49 gibi mriedem: is it about trying the rollback for each port even if one of it fails?
14:50:51 mriedem because https://github.com/openstack/nova/blob/324da0532f3b59aa16233a93a260d289e55860fb/nova/compute/manager.py#L5168 is a noop for neutron
14:51:12 mriedem gibi: yes
14:51:22 mriedem right now you'd stop on the first port update that fails
14:51:31 mriedem i'm saying we should try to clean all and fail at the end
14:51:51 gibi mriedem: OK got it, that could be done
14:52:29 mriedem https://github.com/openstack/nova/blob/324da0532f3b59aa16233a93a260d289e55860fb/nova/network/neutronv2/api.py#L3183
14:52:39 mriedem so are ports unbound when the instance is shelved offloaded....
14:53:12 sean-k-mooney maybe in driver.destroy
14:53:15 sean-k-mooney ill go check
14:53:18 sean-k-mooney but they might not be
14:53:50 sean-k-mooney destroy should at least unplug them
14:54:07 mriedem right
14:54:14 sean-k-mooney acttuly it cant unbind
14:54:18 mriedem driver.destroy just does the unplug, the virt drivers never mess with port bindings
14:54:20 sean-k-mooney because we use destoy in reboot
14:54:28 sean-k-mooney and we dont ubind there
14:54:57 mriedem anyway for gibi's heal_allocation changes it doesn't really matter b/c we wouldn't get that far anyway for a shelved offloaded instance b/c we won't try to heal allocations on an instance that doesn't have a host est
14:55:00 mriedem *set
14:55:46 gibi ack
14:56:08 sean-k-mooney right its just something to look into to see if we have a latent bug
14:56:45 sean-k-mooney i would expect it to happen as part of moving it ot offloaded
14:57:08 sean-k-mooney but i dont think it would break anything either
14:57:10 gibi so in summary. I will add the rollback of neutron updates. But I only add a todo to possible rollback-retry
14:57:36 gibi mriedem: is that works for you ?
14:57:53 gibi mriedem: or you are a bit more on the paranoid side :)
14:58:50 mriedem the backoff retry could be a follow up
14:58:58 gibi mriedem: OK
14:59:00 mriedem try to stagger these changes as much as possible
14:59:04 mriedem since this is a large change
14:59:05 sean-k-mooney oh other edgecase. do we currently allow attaching ports with resouce requests
14:59:13 mriedem sean-k-mooney: no
14:59:17 sean-k-mooney cool
14:59:20 gibi :)
14:59:46 gibi mriedem: I can split out the helper functions from the current commit if that helps
15:00:16 mriedem gibi: sure
15:00:39 mriedem sean-k-mooney: this is where that would fail https://github.com/openstack/nova/blob/324da0532f3b59aa16233a93a260d289e55860fb/nova/compute/api.py#L4408
15:01:25 sean-k-mooney cool. AttachInterfaceWithQoSPolicyNotSupported is slightly missleading but it makes sense
15:01:46 sean-k-mooney we allow attaching ports with QoS polices jut not with min bandwith qos policies
15:01:59 sean-k-mooney e.g. DSCP is fine as is max bandwith
15:02:40 sean-k-mooney its really we dont allow attaching with resource requests.
15:02:42 gibi sean-k-mooney: file a bug, I'm happy to amend the message of AttachInterfaceWithQoSPolicyNotSupported exception with a more specific message
15:04:08 mriedem efried: the reason i didn't jump on https://storyboard.openstack.org/#!/story/2005258 was because of the osc semantic patterns to follow for other osc commands as mentioned in the comments - i gets a bit hairy
15:04:30 mriedem rather than just like a simple --append or --remove
15:04:31 sean-k-mooney gibi: sure i could but we will also be changing that at some point too right
15:04:53 sean-k-mooney e.g. we will eventrually allow it if the host can fulfil the request but like not until U+
15:04:58 gibi sean-k-mooney: based on my current speed of progress with the server move operation patches it will be far in the future
15:05:29 gibi sean-k-mooney: but yes, in theory I would like to get the support for attach interface if the request fitts on the current host
15:05:52 sean-k-mooney well i know that neutorn want to start modeing ip pools in placment and use resouce resquest in the future
15:06:08 sean-k-mooney so i expect it to become more common
15:06:15 gibi yeah
15:06:31 mriedem are you talking about for routed networks?
15:06:35 mriedem modeling routed networks
15:06:49 sean-k-mooney not just routed netwrok but that is one usecase
15:06:51 mriedem i.e. https://review.opendev.org/#/c/656885/
15:07:19 mriedem that's been kicked around for many years and is going to be complicated i imagine
15:07:42 mriedem we don't even have integration testing for the routed networks support that is documented in the neutron docs
15:08:08 mriedem https://docs.openstack.org/neutron/latest/admin/config-routed-networks.html
15:08:11 sean-k-mooney i think they also wanted to supprot using sharing aggrates to model things like floating ips and what host can recive an ip form which subnet pool
15:08:44 mriedem well the good news is pop up teams officially exist now so all of this work can get done!
15:09:14 sean-k-mooney ya without a cross project effort this is hard to make progress on
15:09:52 mriedem i was being a sarcastic jerk but yeah
15:11:09 sean-k-mooney hehe well i have tried to do enough stuff that crosses the nova neutron boundry that i know without getting buyin form both core teams up front and finding people to care about it it will fail
15:11:33 sean-k-mooney anyway thats a tangent
15:11:56 mriedem lyarwood: i guess you got busy https://review.opendev.org/#/q/topic:bug/1653953+(status:open+OR+status:merged)
15:13:11 lyarwood mriedem: busy spamming yes, hopefully that's what you had in mind.
15:13:24 mriedem it is
15:14:01 lyarwood wonderful, there are also DNM changes actually testing things on each branch btw
15:14:08 lyarwood I'll add them to the same topic now to make it clear
15:14:45 mriedem oh i see
15:15:05 mriedem nova DNM changes that depend on the related devstack-plugin-ceph change per branch that depeds on the backport per nova branch....
15:15:28 lyarwood and around and around we go
15:16:08 lyarwood but that should allow the fix to land then the change to enable the test without everything blowing up like it did with my cinder migrate series.
15:16:08 mriedem this is going to be a pain for cross-cell resize, i'm not sure how i'm going to handle that
15:17:37 lyarwood I really need to go over the spec and code for that, are you using shelve/unshelve there?
15:17:44 mriedem no,
15:18:02 mriedem but for non-volume-backed servers, like shelve i'm creating a snapshot to get the disk from the source cell to the target cell
15:18:16 mriedem and then spawning from that snapshot in the target cell and then delete the temporary snapshot, like unshelve
15:18:32 mriedem so the fix here about checking the vm_state wont' work for cross-cell resize
15:19:26 lyarwood yeah true
15:19:30 mriedem we could check something like the task_state used during that part of the cross-cell resize spawn, but that's going to be used during a normal resize as well, and i'm not sure if flatten should be used there
15:19:45 mriedem we could temporarily hack something into the image meta or the instance system_metadata to be read by the driver, but again that's super hacky
15:19:59 lyarwood You could make it optional for resize and spawn tbh
15:20:14 mriedem optional how?
15:20:44 lyarwood with a configurable, but it would slightly slow down spawns and resizes etc in any env that used it
15:20:54 lyarwood so maybe not
15:20:59 mriedem i think i know a bit better solution,
15:21:20 mriedem the instance.migration_context should be set during a cross-cell resize, and from that we can get the Migration record which will have a cross_cell_move boolean on it so i'd just check that
15:21:22 mriedem from the driver
15:21:54 mriedem it's not fun but it's better than driver interface changes or hacking metadata temporarily or checking vm/task state
15:22:12 lyarwood ack that works
15:22:36 lyarwood we don't have a migration record for shelve/unshelve right?
15:22:49 lyarwood IOW I haven't missed an easier way of fixing my issue?
15:23:19 lyarwood ignore me, of course we don't.
15:23:20 mriedem nope
15:24:33 dansmith that's interesting,
15:24:36 dansmith I've never thought about that

Earlier   Later