| Posted | Nick | Remark | |
|---|---|---|---|
| #openstack-nova - 2019-06-28 | |||
| 14:52:29 | mriedem | https://github.com/openstack/nova/blob/324da0532f3b59aa16233a93a260d289e55860fb/nova/network/neutronv2/api.py#L3183 | |
| 14:52:39 | mriedem | so are ports unbound when the instance is shelved offloaded.... | |
| 14:53:12 | sean-k-mooney | maybe in driver.destroy | |
| 14:53:15 | sean-k-mooney | ill go check | |
| 14:53:18 | sean-k-mooney | but they might not be | |
| 14:53:50 | sean-k-mooney | destroy should at least unplug them | |
| 14:54:07 | mriedem | right | |
| 14:54:14 | sean-k-mooney | acttuly it cant unbind | |
| 14:54:18 | mriedem | driver.destroy just does the unplug, the virt drivers never mess with port bindings | |
| 14:54:20 | sean-k-mooney | because we use destoy in reboot | |
| 14:54:28 | sean-k-mooney | and we dont ubind there | |
| 14:54:57 | mriedem | anyway for gibi's heal_allocation changes it doesn't really matter b/c we wouldn't get that far anyway for a shelved offloaded instance b/c we won't try to heal allocations on an instance that doesn't have a host est | |
| 14:55:00 | mriedem | *set | |
| 14:55:46 | gibi | ack | |
| 14:56:08 | sean-k-mooney | right its just something to look into to see if we have a latent bug | |
| 14:56:45 | sean-k-mooney | i would expect it to happen as part of moving it ot offloaded | |
| 14:57:08 | sean-k-mooney | but i dont think it would break anything either | |
| 14:57:10 | gibi | so in summary. I will add the rollback of neutron updates. But I only add a todo to possible rollback-retry | |
| 14:57:36 | gibi | mriedem: is that works for you ? | |
| 14:57:53 | gibi | mriedem: or you are a bit more on the paranoid side :) | |
| 14:58:50 | mriedem | the backoff retry could be a follow up | |
| 14:58:58 | gibi | mriedem: OK | |
| 14:59:00 | mriedem | try to stagger these changes as much as possible | |
| 14:59:04 | mriedem | since this is a large change | |
| 14:59:05 | sean-k-mooney | oh other edgecase. do we currently allow attaching ports with resouce requests | |
| 14:59:13 | mriedem | sean-k-mooney: no | |
| 14:59:17 | sean-k-mooney | cool | |
| 14:59:20 | gibi | :) | |
| 14:59:46 | gibi | mriedem: I can split out the helper functions from the current commit if that helps | |
| 15:00:16 | mriedem | gibi: sure | |
| 15:00:39 | mriedem | sean-k-mooney: this is where that would fail https://github.com/openstack/nova/blob/324da0532f3b59aa16233a93a260d289e55860fb/nova/compute/api.py#L4408 | |
| 15:01:25 | sean-k-mooney | cool. AttachInterfaceWithQoSPolicyNotSupported is slightly missleading but it makes sense | |
| 15:01:46 | sean-k-mooney | we allow attaching ports with QoS polices jut not with min bandwith qos policies | |
| 15:01:59 | sean-k-mooney | e.g. DSCP is fine as is max bandwith | |
| 15:02:40 | sean-k-mooney | its really we dont allow attaching with resource requests. | |
| 15:02:42 | gibi | sean-k-mooney: file a bug, I'm happy to amend the message of AttachInterfaceWithQoSPolicyNotSupported exception with a more specific message | |
| 15:04:08 | mriedem | efried: the reason i didn't jump on https://storyboard.openstack.org/#!/story/2005258 was because of the osc semantic patterns to follow for other osc commands as mentioned in the comments - i gets a bit hairy | |
| 15:04:30 | mriedem | rather than just like a simple --append or --remove | |
| 15:04:31 | sean-k-mooney | gibi: sure i could but we will also be changing that at some point too right | |
| 15:04:53 | sean-k-mooney | e.g. we will eventrually allow it if the host can fulfil the request but like not until U+ | |
| 15:04:58 | gibi | sean-k-mooney: based on my current speed of progress with the server move operation patches it will be far in the future | |
| 15:05:29 | gibi | sean-k-mooney: but yes, in theory I would like to get the support for attach interface if the request fitts on the current host | |
| 15:05:52 | sean-k-mooney | well i know that neutorn want to start modeing ip pools in placment and use resouce resquest in the future | |
| 15:06:08 | sean-k-mooney | so i expect it to become more common | |
| 15:06:15 | gibi | yeah | |
| 15:06:31 | mriedem | are you talking about for routed networks? | |
| 15:06:35 | mriedem | modeling routed networks | |
| 15:06:49 | sean-k-mooney | not just routed netwrok but that is one usecase | |
| 15:06:51 | mriedem | i.e. https://review.opendev.org/#/c/656885/ | |
| 15:07:19 | mriedem | that's been kicked around for many years and is going to be complicated i imagine | |
| 15:07:42 | mriedem | we don't even have integration testing for the routed networks support that is documented in the neutron docs | |
| 15:08:08 | mriedem | https://docs.openstack.org/neutron/latest/admin/config-routed-networks.html | |
| 15:08:11 | sean-k-mooney | i think they also wanted to supprot using sharing aggrates to model things like floating ips and what host can recive an ip form which subnet pool | |
| 15:08:44 | mriedem | well the good news is pop up teams officially exist now so all of this work can get done! | |
| 15:09:14 | sean-k-mooney | ya without a cross project effort this is hard to make progress on | |
| 15:09:52 | mriedem | i was being a sarcastic jerk but yeah | |
| 15:11:09 | sean-k-mooney | hehe well i have tried to do enough stuff that crosses the nova neutron boundry that i know without getting buyin form both core teams up front and finding people to care about it it will fail | |
| 15:11:33 | sean-k-mooney | anyway thats a tangent | |
| 15:11:56 | mriedem | lyarwood: i guess you got busy https://review.opendev.org/#/q/topic:bug/1653953+(status:open+OR+status:merged) | |
| 15:13:11 | lyarwood | mriedem: busy spamming yes, hopefully that's what you had in mind. | |
| 15:13:24 | mriedem | it is | |
| 15:14:01 | lyarwood | wonderful, there are also DNM changes actually testing things on each branch btw | |
| 15:14:08 | lyarwood | I'll add them to the same topic now to make it clear | |
| 15:14:45 | mriedem | oh i see | |
| 15:15:05 | mriedem | nova DNM changes that depend on the related devstack-plugin-ceph change per branch that depeds on the backport per nova branch.... | |
| 15:15:28 | lyarwood | and around and around we go | |
| 15:16:08 | lyarwood | but that should allow the fix to land then the change to enable the test without everything blowing up like it did with my cinder migrate series. | |
| 15:16:08 | mriedem | this is going to be a pain for cross-cell resize, i'm not sure how i'm going to handle that | |
| 15:17:37 | lyarwood | I really need to go over the spec and code for that, are you using shelve/unshelve there? | |
| 15:17:44 | mriedem | no, | |
| 15:18:02 | mriedem | but for non-volume-backed servers, like shelve i'm creating a snapshot to get the disk from the source cell to the target cell | |
| 15:18:16 | mriedem | and then spawning from that snapshot in the target cell and then delete the temporary snapshot, like unshelve | |
| 15:18:32 | mriedem | so the fix here about checking the vm_state wont' work for cross-cell resize | |
| 15:19:26 | lyarwood | yeah true | |
| 15:19:30 | mriedem | we could check something like the task_state used during that part of the cross-cell resize spawn, but that's going to be used during a normal resize as well, and i'm not sure if flatten should be used there | |
| 15:19:45 | mriedem | we could temporarily hack something into the image meta or the instance system_metadata to be read by the driver, but again that's super hacky | |
| 15:19:59 | lyarwood | You could make it optional for resize and spawn tbh | |
| 15:20:14 | mriedem | optional how? | |
| 15:20:44 | lyarwood | with a configurable, but it would slightly slow down spawns and resizes etc in any env that used it | |
| 15:20:54 | lyarwood | so maybe not | |
| 15:20:59 | mriedem | i think i know a bit better solution, | |
| 15:21:20 | mriedem | the instance.migration_context should be set during a cross-cell resize, and from that we can get the Migration record which will have a cross_cell_move boolean on it so i'd just check that | |
| 15:21:22 | mriedem | from the driver | |
| 15:21:54 | mriedem | it's not fun but it's better than driver interface changes or hacking metadata temporarily or checking vm/task state | |
| 15:22:12 | lyarwood | ack that works | |
| 15:22:36 | lyarwood | we don't have a migration record for shelve/unshelve right? | |
| 15:22:49 | lyarwood | IOW I haven't missed an easier way of fixing my issue? | |
| 15:23:19 | lyarwood | ignore me, of course we don't. | |
| 15:23:20 | mriedem | nope | |
| 15:24:33 | dansmith | that's interesting, | |
| 15:24:36 | dansmith | I've never thought about that | |
| 15:25:01 | dansmith | a migration record would let us provide some unshelve (whilst offloaded) host affinity | |
| 15:25:14 | dansmith | not sure how important that is really, but... | |
| 15:25:59 | efried | dansmith, mriedem: How do y'all feel about reorganizing how request filters are enabled via conf? | |
| 15:25:59 | efried | As it stands, it looks like we're headed for random options scattered around various groups. | |
| 15:25:59 | efried | Wondering if it would make sense e.g. to collect them into [request_filter] $name = {enabled|disabled} | |
| 15:25:59 | efried | or [scheduler] request_filters_enabled = $name[,...] | |
| 15:25:59 | efried | or ... | |
| 15:25:59 | efried | My hesitation is that that's really more of a dev-oriented structure; I don't know that the filters are really similar enough in spirit that that would make sense to the operator. | |
| 15:25:59 | efried | Thoughts? | |