Search Results

Posted	Nick	Remark
#openstack-nova - 2019-06-26
18:56:07	mriedem	as long as it's graceful about other drivers not implementing the necessary interface
18:56:31	sean-k-mooney	ya it handels the not implmetned excetion correctly in the manager
18:56:58	sean-k-mooney	im think of asking them to use the uuid instead of the name however
18:57:36	sean-k-mooney	the instance uuid is set in the libvirt domains uuid field
18:57:57	mriedem	yeah uuid is ideal if we can use it
18:58:14	sean-k-mooney	but i libvirt allows us to deleate by uuid
18:58:32	sean-k-mooney	me might just be pushing the translation into the dirver
18:59:48	sean-k-mooney	im also wondering how to deal with the fact we could be leaking plugged interfaces
19:03:55	mriedem	heh,
19:04:01	mriedem	well we could be leaking all sorts of things
19:04:07	mriedem	storage connections
19:04:42	sean-k-mooney	will undefineing the domin delete the root disk?
19:04:55	sean-k-mooney	or other disks
19:05:26	mriedem	i doubt it
19:05:46	sean-k-mooney	if we are reaping the orpahn vms that were created by nova but are deleted in the db we really should be cleaning up all there local resouces in that case which this current patch does not attemtp to do
19:06:03	mriedem	otherwise we wouldn't need separate calls during driver.destroy to cleanup the volumes via brick and unplug vifs via os-vif
19:06:19	mriedem	at some point this could also be cyborg devices and such couldn't it?
19:06:20	sean-k-mooney	ya that is a good point
19:06:26	sean-k-mooney	ya
19:07:19	sean-k-mooney	so if we are going to reap these we need to try and clean up as much of the resouces as we can or just support powering down the instance but not reaping them?
19:07:50	openstackgerrit	Merged openstack/nova-specs master: grammar fix for show-server-numa-topology spec https://review.opendev.org/667487
19:07:57	sean-k-mooney	if we just delete the domain the operator has noting ot go on to figure out what they need to clean up manually
19:09:11	sean-k-mooney	its tricky however as from the domain we dont know what nova create or not but i think we should at lesast try to do some cleanup
19:12:52	sean-k-mooney	anyway im going to get something to eat
19:13:05	sean-k-mooney	o/
19:19:29	mriedem	there is meta in the domain that tells us if nova created it
19:19:42	sean-k-mooney	yes there is
19:19:48	mriedem	for libvirt anyway
19:20:05	sean-k-mooney	which i asked yonglihe to check when deleteing the domian
19:20:13	sean-k-mooney	in the libvirt case that is
19:20:48	efried	mriedem: +1 on the ML note, thanks for that
19:21:24	mriedem	np
19:22:36	sean-k-mooney	that is what https://review.opendev.org/#/c/648912/14/nova/virt/libvirt/driver.py@9695 is doing and its used to filer the domains here https://review.opendev.org/#/c/627765/28/nova/compute/manager.py@8886
19:59:33	efried	mriedem: took me three days to go through all the specs, but http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007381.html
20:10:01	mriedem	efried: huh https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.*+reviewedby:self seems to not work properly, it's only showing me https://review.opendev.org/#/c/631154/ but i've clearly commented on https://review.opendev.org/#/c/648686/ as well
20:10:17	mriedem	maybe reviewedby is only on the latest PS?
20:10:55	mriedem	aha
20:10:56	mriedem	https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.*+reviewer:self
20:11:00	mriedem	reviewer:self, not reviewedby:self
20:14:14	efried	whoops, thanks
21:31:07	melwitt	mnaser, imacdonn: hi, as responders to the ML thread awhile back, I have a spec up for showing server status as UNKNOWN if host status is UNKNOWN that has been receiving some reviews. your reviews would be helpful for deciding whether it goes forward https://review.opendev.org/666181
21:33:10	mriedem	dansmith: you may want to drop your +2 to a +1 or 0 https://review.opendev.org/#/c/457886/ until i get the ceph job results on it
21:41:44	openstackgerrit	Miguel Ángel Herranz Trillo proposed openstack/nova master: Fix type error on call to mount device https://review.opendev.org/659780
21:44:11	mriedem	dansmith: nvm, lyarwood already had a patch up to test that
22:14:40	mriedem	efried: are you ok with me just pushing up this test change and +2ing? https://review.opendev.org/#/c/659780/3/nova/tests/unit/virt/disk/mount/test_nbd.py
22:14:51	efried	...
22:15:58	efried	mriedem: yes, and I'll +A
22:16:52	openstackgerrit	Matt Riedemann proposed openstack/nova master: Fix type error on call to mount device https://review.opendev.org/659780
22:17:37	mriedem	done
22:18:58	efried	+A
22:25:39	mnaser	melwitt: left a comment thanks :D
22:43:44	melwitt	thanks
#openstack-nova - 2019-06-27
13:13:51	gibi	mriedem: the previous patches just refactored the heal allocation code path
13:14:10	gibi	to make it easy to add the port healing
13:15:30	gibi	mriedem: the basic structure of final patch
13:18:00	gibi	mriedem: during _heal_allocations_for_instance first we get if the instance has ports that needs healing
13:18:26	gibi	this _get_port_allocations_to_heal is the majority of the added code there
13:19:01	gibi	as it detect is there is any port that needs healing and generates the allocation fragment for that port
13:19:33	mriedem	ok i'm assuming just checking if the port has a resource request (if hitting neutron directly) or checking the info cache for a bound port with the allocatoin in the binding profile?
13:19:52	gibi	we are hitting neutron directly
13:20:02	mriedem	hmm, that's kind of expensive isn't it?
13:20:19	mriedem	why can't we use the info cache to tell if the vif has a port allocation?
13:20:49	mriedem	or is that some kind of chicken and egg issue? or we don't trust the cache?
13:20:55	gibi	we need to hit neutron to see if there is resource request on the port
13:21:00	gibi	that is not part of the vif cache
13:21:23	mriedem	sure but the binding profile in the vif cache has a thing that tells us if it had a resource request right?
13:21:44	gibi	mriedem: now, the binding profile in the cache only could tell us if there is allocation for the port
13:21:52	openstackgerrit	Eric Fried proposed openstack/nova-specs master: WIP: Physical TPM passthrough https://review.opendev.org/667926
13:21:57	gibi	but if there is no allocation then we don't know if there is a need for such allocation
13:22:48	mriedem	so let's say the binding profile in the vif cache doesn't have that allocation marker, but we find a port with a resource request - we're going to heal the allocation by putting allocations to placement for that instance right?
13:23:03	mriedem	are we also going to update the cache to indicate it's now got an allocatoin?
13:23:14	gibi	mriedem: right, and also updating neutron with the rp uuid
13:23:27	gibi	mriedem: hm, we are not updating the info cache
13:23:29	gibi	:/
13:24:16	mriedem	yeah, so i'm wondering about the scenario that our cache is busted, we go tell placement there is an allocation, but then what happens when we teardown that port/server in the compute service? are we basing that on the cache? b/c if so, we'll leak the allocation correcT?
13:24:25	openstackgerrit	Eric Fried proposed openstack/nova master: WIP: Physical TPM passthrough https://review.opendev.org/667928
13:25:07	mriedem	similarly are we only creating the port allocation in heal_allocations if the port is bound?
13:25:13	gibi	mriedem: we cannot leak allocation during delete as delete deletes everything for the consumer
13:25:28	mriedem	gibi: what about when we detach the port though?
13:25:31	mriedem	but don't delete the server
13:26:31	gibi	I hope that code path works based on the allocation key in the binding profile, but let me check
13:26:57	mriedem	https://github.com/openstack/nova/blob/7b769ad403751268c60b095f722437cbed692071/nova/network/neutronv2/api.py#L1672
13:27:00	mriedem	looks like it does...
13:27:08	gibi	mriedem: in the other hand, is there any way to trigger a info_cache refresh or we need to do it manually?
13:28:33	mriedem	there is a periodic task in the compute service that will forcefully rebuild the cache from info in neutron,
13:28:44	mriedem	but i don't think that code is doing anything with the resource request / allocation information yet
13:28:55	mriedem	might have been something i brought up when reviewing the series in stein
13:29:30	mriedem	this is the call from the compute task https://github.com/openstack/nova/blob/7b769ad403751268c60b095f722437cbed692071/nova/compute/manager.py#L7445
13:30:20	mriedem	this is the neutronv2 api method that builds the vif object in the cache https://github.com/openstack/nova/blob/7b769ad403751268c60b095f722437cbed692071/nova/network/neutronv2/api.py#L2842
13:30:35	mriedem	we copy the binding:profile directly off the port https://github.com/openstack/nova/blob/7b769ad403751268c60b095f722437cbed692071/nova/network/neutronv2/api.py#L2887
13:31:23	mriedem	maybe that's enough? the 'allocation' is in the binding:profile yeah? and that maps to the resource provider uuid on which the allocation belongs
13:31:53	gibi	copying binding:profile is enough as it has the 'allocaton' key
13:32:06	gibi	and that points to the RP
13:32:10	gibi	we allocate from
13:32:54	gibi	mriedem: this is where the heal port allocation makes sure that neutron binding:profile is updated https://review.opendev.org/#/c/637955/28/nova/cmd/manage.py@1857
13:33:19	openstackgerrit	Merged openstack/nova stable/stein: libvirt: Rework 'EBUSY' (SIGKILL) error handling code path https://review.opendev.org/667389
13:34:22	mriedem	ok so going back to my original question, cache vs neutron directly, i guess heal_allocations is going to the source of truth first (neutron) and working based on that, which is more fool-proof if possibly less performant
13:34:53	mriedem	if we healed based on the cache, and the cache was stale or missing information somehow, then we would fail to heal allocations for the ports in that busted cache
13:35:16	gibi	mriedem: there is no way to figure out what to heal if we don't hit neutron. the info cache only good to see if there is allocation or not for a port, but only neutron knows if the port needs allocation
13:35:19	mriedem	with the way you have it, heal_allocations would heal the allocations and the port binding profile and the _heal_instance_info_cache task in the compute service running that instance would heal the cache

Earlier Later