| Posted | Nick | Remark | |
|---|---|---|---|
| #openstack-nova - 2019-06-27 | |||
| 13:26:57 | mriedem | https://github.com/openstack/nova/blob/7b769ad403751268c60b095f722437cbed692071/nova/network/neutronv2/api.py#L1672 | |
| 13:27:00 | mriedem | looks like it does... | |
| 13:27:08 | gibi | mriedem: in the other hand, is there any way to trigger a info_cache refresh or we need to do it manually? | |
| 13:28:33 | mriedem | there is a periodic task in the compute service that will forcefully rebuild the cache from info in neutron, | |
| 13:28:44 | mriedem | but i don't think that code is doing anything with the resource request / allocation information yet | |
| 13:28:55 | mriedem | might have been something i brought up when reviewing the series in stein | |
| 13:29:30 | mriedem | this is the call from the compute task https://github.com/openstack/nova/blob/7b769ad403751268c60b095f722437cbed692071/nova/compute/manager.py#L7445 | |
| 13:30:20 | mriedem | this is the neutronv2 api method that builds the vif object in the cache https://github.com/openstack/nova/blob/7b769ad403751268c60b095f722437cbed692071/nova/network/neutronv2/api.py#L2842 | |
| 13:30:35 | mriedem | we copy the binding:profile directly off the port https://github.com/openstack/nova/blob/7b769ad403751268c60b095f722437cbed692071/nova/network/neutronv2/api.py#L2887 | |
| 13:31:23 | mriedem | maybe that's enough? the 'allocation' is in the binding:profile yeah? and that maps to the resource provider uuid on which the allocation belongs | |
| 13:31:53 | gibi | copying binding:profile is enough as it has the 'allocaton' key | |
| 13:32:06 | gibi | and that points to the RP | |
| 13:32:10 | gibi | we allocate from | |
| 13:32:54 | gibi | mriedem: this is where the heal port allocation makes sure that neutron binding:profile is updated https://review.opendev.org/#/c/637955/28/nova/cmd/manage.py@1857 | |
| 13:33:19 | openstackgerrit | Merged openstack/nova stable/stein: libvirt: Rework 'EBUSY' (SIGKILL) error handling code path https://review.opendev.org/667389 | |
| 13:34:22 | mriedem | ok so going back to my original question, cache vs neutron directly, i guess heal_allocations is going to the source of truth first (neutron) and working based on that, which is more fool-proof if possibly less performant | |
| 13:34:53 | mriedem | if we healed based on the cache, and the cache was stale or missing information somehow, then we would fail to heal allocations for the ports in that busted cache | |
| 13:35:16 | gibi | mriedem: there is no way to figure out what to heal if we don't hit neutron. the info cache only good to see if there is allocation or not for a port, but only neutron knows if the port needs allocation | |
| 13:35:19 | mriedem | with the way you have it, heal_allocations would heal the allocations and the port binding profile and the _heal_instance_info_cache task in the compute service running that instance would heal the cache | |
| 13:35:31 | mriedem | sure, i realize that, | |
| 13:35:40 | mriedem | i was thinking of using the cache to pre-filter the list of ports we're asking neutron about | |
| 13:36:05 | mriedem | i.e. if we're iterating over 50 intances that's 50 calls to neutron | |
| 13:36:21 | gibi | mriedem: you mean if we see allocaton key in the cache then we assume that such port doesn't need to be healed? | |
| 13:36:22 | mriedem | i do like that you added an option to skip this though | |
| 13:37:04 | mriedem | gibi: something like that | |
| 13:37:41 | mriedem | if we would have stored some flag in the vif cache originally that indicated the port had resource requests (not necessarily the resource request itself), heal_allocations could have just checked that flag in the cache and then call neutron for that port | |
| 13:38:28 | mriedem | and if ifs and buts were candy and nuts we'd all have a very fine christmas | |
| 13:38:55 | gibi | mriedem: these vifs are created _before_ there was any logic in nova about bandwidth resource. So I don't see how to have a flag in nova about it | |
| 13:39:20 | mriedem | oh right this is also for healing the case of ports attached with qos before stein... | |
| 13:39:22 | mriedem | i forgot about that | |
| 13:39:41 | gibi | this is the case exaclty that the port was attached without allocation | |
| 13:39:54 | mriedem | anyway, i'm trying to optimize something that we just want to work on the first go, so i can drop it | |
| 13:39:57 | gibi | because neutron had support way earlier for bandiwdth | |
| 13:40:50 | mriedem | the other good thing is that heal_allocations now has the option to heal a single instance so if an operator is trying to fix one specific instance (b/c of some user ticket or something) then they can just target that one rather than all instances to heal 1 | |
| 13:41:40 | gibi | yeah, I remember I had to adapt to that improvement at some point | |
| 13:42:52 | gibi | one extra complexity to note. as we need to update both placement and neutron the update cannot be made atomic | |
| 13:43:04 | mriedem | and i'm assuming the earlier patches to refactor the code didn't really require changes to the functional tests the validate the interface (unlike you'd need with unit tests) | |
| 13:43:31 | mriedem | i wrote all of the heal_allocations stuff with functional tests b/c i didn't trust unit tests for testing the interface | |
| 13:43:46 | gibi | mriedem: yes, no functional test was harmed in the refactoring process | |
| 13:43:52 | mriedem | if we fail to update one (placement or neutron) we could rollback the other... | |
| 13:44:03 | mriedem | heh, "no functional test was harmed in the making of this patch" | |
| 13:44:37 | mriedem | ok this has been a useful discussion before i dive in | |
| 13:44:39 | mriedem | thanks | |
| 13:45:06 | gibi | mriedem: yes, rollback is one option. I went for just printing what failed and how to update neutron manually | |
| 13:46:05 | gibi | mriedem: if you think rollback is better then I have to temprarly store the original allocation of the instance before the heal, and put that back if neutron update fails | |
| 13:50:05 | openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Multiple API cleanup changes https://review.opendev.org/666889 | |
| 13:51:14 | mriedem | we could also retry the port updates with a backoff loop, like 3 retries or something if it's a temporary network issue | |
| 13:51:19 | mriedem | could be follow ups though | |
| 13:51:35 | mriedem | i'll keep it in mind when i review and leave a comment | |
| 13:54:42 | gibi | mriedem: ack | |
| 14:02:36 | mriedem | nova meeting happening | |
| 14:10:21 | mriedem | gibi: also in the back of my mind i've been thinking of adding something to the nova-next job post_test_hook, like creating a server, deleting it's allocatoins in placement and then running heal_allocations just to make sure we have integration test coverage as well, but it's lower priority | |
| 14:11:22 | gibi | mriedem: I did similar manual testing for my patch in devstack and discovered bugs. So I agree that it would be useful | |
| 14:20:27 | jrosser | guilhermesp: this lot https://review.opendev.org/#/q/topic:fix-octavia+(status:open+OR+status:merged) | |
| 14:20:54 | jrosser | guilhermesp: there are some jobs running with some depends-on, but it's difficult to see whats going on | |
| 14:21:34 | jrosser | oops -ECHAN | |
| 14:50:10 | openstackgerrit | Surya Seetharaman proposed openstack/nova stable/stein: Grab fresh power state info from the driver https://review.opendev.org/667948 | |
| 14:54:35 | mriedem | http://status.openstack.org/reviews/#nova sure is fun for abandon fodder | |
| 14:55:05 | openstackgerrit | Vrushali Kamde proposed openstack/nova master: Support filtering of hosts by forbidden aggregates https://review.opendev.org/667952 | |
| 14:58:33 | efried | abandon away | |
| 15:09:07 | openstackgerrit | Surya Seetharaman proposed openstack/nova stable/rocky: Grab fresh power state info from the driver https://review.opendev.org/667955 | |
| 15:24:55 | openstackgerrit | Matt Riedemann proposed openstack/nova master: RT: replace _instance_in_resize_state with _is_trackable_migration https://review.opendev.org/560467 | |
| 15:25:18 | mriedem | efried: another rebase on that one ^ | |
| 15:25:39 | mriedem | dansmith: can you come back on this instance.hidden patch https://review.opendev.org/#/c/631123/ ? | |
| 15:26:06 | dansmith | <insert hidden pun here> | |
| 15:26:13 | mriedem | dansmith.hidden = True | |
| 15:27:40 | efried | mriedem: I can't even see where the conflict was on that one. | |
| 15:28:46 | efried | oh, never mind | |
| 15:30:52 | dansmith | mriedem: is the hidden check in the api really required? aren't you filtering out hidden instances from the list query? | |
| 15:31:35 | mriedem | dansmith: i'd have to look again to confirm but i believe there is a window where both are not hidden | |
| 15:31:38 | mriedem | while swapping over | |
| 15:31:50 | mriedem | which, | |
| 15:31:52 | dansmith | mriedem: in that case, the check for hidden-ness doesn't help right? | |
| 15:32:00 | mriedem | arguably the api will still filter - it might pick the "wrong" one but... | |
| 15:32:13 | mriedem | the comment in there mentions something about that right? updated_at and such | |
| 15:32:14 | dansmith | right, but the check of the hidden field would be pointless in that case | |
| 15:32:56 | dansmith | the comment is talking about the case where they're both not hidden | |
| 15:33:10 | dansmith | I'm talking about the case where one is.. what's the point of checking it if instance_list doesn't return them? | |
| 15:34:14 | mriedem | ok so this is the point where we could have 2 copies where hidden=False https://review.opendev.org/#/c/635646/32/nova/conductor/tasks/cross_cell_migrate.py@593 so the DB API would return both from each cell while listing | |
| 15:34:37 | mriedem | now that DB API isn't returning hidden=True by default... | |
| 15:34:42 | dansmith | ...and so checking for instance.hidden in compute/api does what? | |
| 15:35:58 | mriedem | yeah in this case now b/c db api won't return the hidden one, "or instance.hidden" will always be false | |
| 15:36:06 | dansmith | right | |
| 15:36:22 | mriedem | the note above still applies, but the logical or doesn't | |
| 15:36:42 | mriedem | so are you ok with removing the or condition and leaving the comment? | |
| 15:37:02 | dansmith | yep, just said that in the review | |
| 15:37:33 | mriedem | yup, thanks | |
| 15:48:24 | openstackgerrit | melanie witt proposed openstack/nova master: Require at least cryptography>=2.7 https://review.opendev.org/667765 | |
| 16:09:17 | mriedem | dansmith: i'm going to drop https://review.opendev.org/#/c/631123/34/nova/tests/unit/compute/test_compute_api.py then since it's not really valid since the db api wouldn't return a hidden=True instance | |
| 16:09:27 | dansmith | yeah | |
| 16:13:42 | openstackgerrit | Miguel Ángel Herranz Trillo proposed openstack/nova master: Add support for cloud-init on LXC instances https://review.opendev.org/667976 | |
| 16:13:42 | openstackgerrit | Miguel Ángel Herranz Trillo proposed openstack/nova master: Add support for 'initenv' elements https://review.opendev.org/667975 | |
| 16:15:40 | mriedem | looks like someone is trying to make libvirt+lxc work again | |
| 16:16:33 | openstackgerrit | Miguel Ángel Herranz Trillo proposed openstack/nova master: Add support for cloud-init on LXC instances https://review.opendev.org/667976 | |
| 16:21:24 | openstackgerrit | Merged openstack/nova-specs master: support virtual persistent memory https://review.opendev.org/601596 | |
| 16:30:48 | efried | mriedem: At some point recently you wrote a functional test that used a weigher to prefer host1 so that the assertion that we landed on host2 was provably valid every time (instead of just by chance). Can you put your finger on that easily? | |
| 16:31:30 | mriedem | efried: look for HostNameWeigher in the nova/tests/functional | |
| 16:31:32 | mriedem | there are several examples | |
| 16:31:39 | efried | thanks | |
| 16:31:54 | mriedem | ima push this big ass series b/c i've been rebasing it locally for weeks and want to flush it | |