
Posted	Nick	Remark
#openstack-nova - 2019-06-26
15:10:57	dansmith	why do we need a memcache? it's in the database
15:10:59	efried	it's due to a new rule that recently merged.
15:11:31	sean-k-mooney	dansmith: i was suggesting not keeping it in the database and only having a dict cache or maybe use memcache
15:11:47	dansmith	sean-k-mooney: ...why?
15:11:50	sean-k-mooney	mriedem: and ya it would be a blueprint or spec not a bug fix
15:11:51	mriedem	sean-k-mooney: we can just as easily f that up
15:12:24	sean-k-mooney	well if its in process as a dict cache then if we f it up it's fixed by restarting the compute agent
15:12:42	sean-k-mooney	memcache is probably not going to help with anything
15:12:45	dansmith	we store some stuff in nwinfo that isn't anywhere else, IIRC, like which ports we created vs. the user, so that has to be persisted somewhere if we were going to use memcache
15:12:49	dansmith	...yeah ;)
15:13:02	dansmith	what problem is being solved here?
15:13:22	mriedem	i don't think that overhauling to use an external cache service and restarting the compute is the giant hammer we really need for what we're trying to solve
15:13:29	sean-k-mooney	nothing at the moment; reworking it is unrelated to what we are trying to fix
15:13:40	openstackgerrit	Eric Fried proposed openstack/nova master: Clean up orphan instances virt driver https://review.opendev.org/648912
15:13:40	openstackgerrit	Eric Fried proposed openstack/nova master: clean up orphan instances https://review.opendev.org/627765
15:13:44	mriedem	so this is a....thought exercise?
15:13:49	efried	sean-k-mooney, gibi: Would y'all please have another look at these --^
15:13:50	sean-k-mooney	yes
15:14:11	sean-k-mooney	its on my todo list to figure out if it makes sense to even do
15:14:13	gibi	efried: I have it open
15:15:17	efried	thanks gibi
15:15:25	efried	thanks sean-k-mooney
15:15:34	efried	sean-k-mooney: fyi it's apparently a thing stx cares about
15:15:51	efried	thus presumably it "makes sense" in some capacity :)
15:17:05	mriedem	efried: hyperv ci is happy with the update_provider_tree patch https://review.opendev.org/#/c/667417/
15:17:17	efried	mriedem: thanks for the reminder
15:17:49	mriedem	efried: fwiw that cleanup orphan instances thing is also something that the public cloud SIG (and huawei public cloud ops) care about as well, which i was initially reviewing it awhile back
15:18:01	mriedem	*why i was
15:18:34	mriedem	the concern at the last ptg was how much duplication there was with the existing periodic to cleanup running deleted (but not orphaned) instances
15:20:24	efried	okay, thanks for that background.
15:21:13	mriedem	something something live migration fails and you've got untracked guests on the host consuming resources (which aren't tracked obviously) so then trying to schedule things to those hosts fails b/c you're out of resources
15:21:39	efried	sounds like we need a patch to clean up those orphaned instances
15:22:31	mriedem	i'm sure lots of operators have already just written scripts to detect and clean those types of thing sup
15:22:33	mriedem	*up
15:22:38	mriedem	but yeah it's better to have it native probably
15:42:22	efried	mriedem: We don't have a way to prove the xen one is being hit, do we? (update_provider_tree)
15:42:25	efried	since their CI is dead?
15:43:54	efried	mriedem: also, if you haven't already, there should be a note to the ML warning of this (and another before we remove the code path, obvsly)
15:44:06	efried	...for oot folk
15:44:43	mriedem	sorry was just doing tech support with my mom
15:44:53	efried	(I know nova_powervm is copacetic fwiw)
15:45:09	mriedem	i was waiting to send the oot ML email until we were more sure about what i've proposed
15:45:22	mriedem	and idk about the xen one if their CI is dead, though it's pretty damn basic
15:45:25	mriedem	just a port of get_inventory
15:46:05	bauzas	efried: mriedem: heh, the reportclient doesn't of course support all placement API queries, so I wonder whether I should add something like "get_resource_providers()" method in the reportclient just for nova-manage caller, or calling directly the Placement API
15:46:12	bauzas	thoughts on that ?
15:47:03	efried	bauzas: If it's something simple like GET /resource_providers (you really want all of them?) then yeah, just call SchedulerReportClient.get()
15:47:17	bauzas	zactly
15:47:57	efried	sfine
15:48:06	bauzas	efried: but then I don't have a safe_connect connection
15:48:14	mriedem	if you're not going to page, you could be listing 14K providers in the case of cern...
15:48:15	efried	bauzas: We don't want @safe_connect
15:48:19	efried	ever, anywhere
15:48:30	efried	Handle ksa.ClientException at the caller instead.
15:48:45	efried	And if you see @safe_connect anywhere in your travels and want to kill it and do that ^, I will buy your drivks.
15:48:47	efried	drinks
15:48:58	efried	true story
15:49:08	bauzas	it's 40°C here, I'm all for a drink
15:49:15	efried	bauzas: what are you trying to do with the master list?
15:49:32	bauzas	efried: looking up all allocations to see whether they're orphaned
15:49:39	bauzas	mriedem: ah shit, excellent point
15:50:08	mriedem	you could instead page the compute nodes in the cells and hit this api https://developer.openstack.org/api-ref/placement/?expanded=#list-resource-provider-allocations
15:50:13	bauzas	we could possibly need to look at all allocations per resource provider, which would be given by a list of compute services (which is paginated AFAIK)
15:50:31	bauzas	heh, jinxed
15:50:32	mriedem	compute service != compute node == resource provider
15:50:43	bauzas	shit, typo, nodes indeed
15:50:52	bauzas	tell me about my Kilo bp
15:51:38	mriedem	so once you get the allocations for a given provider, what are you going to do?
15:51:50	mriedem	check if an instance (or migration) exists with the given consumer uuid?
15:51:55	mriedem	and if not, consider the allocation orphaned?
15:52:11	mriedem	iff the allocation has resources that nova "owns" like VCPU
15:52:26	mriedem	without consumer types in the allocations response we have to rely on the resource class
15:52:58	bauzas	exactly this, I was about to say which resource classes where nova-related
15:53:07	bauzas	were*
15:53:53	efried	ugh, relying on resource class...
15:54:04	efried	this is where the concept of provider owner would be handy.
15:54:17	bauzas	yeah I know
15:54:32	efried	hopefully we're not allowing allocations from different owners against the same provider anywhere
15:54:39	bauzas	we could also add an argument asking for the resource class we wanna check
15:54:49	efried	no, we shouldn't do it by resource class
15:55:02	efried	because same resource class may be managed by different owners in different providers
15:55:20	efried	think VF (nova-PCI vs cyborg vs neutron)
15:55:53	efried	but we (need to make sure we) have a rule that a provider as a whole is only managed by a single owner.
15:56:25	bauzas	hmmm
15:57:10	bauzas	actually, I'm checking consumer_id
15:57:39	bauzas	so I guess all resource providers corresponding to compute nodes (and children associated) should have allocations against consumer_id that
15:57:53	bauzas	that is either a migration object or a nova instance
15:58:05	bauzas	even cyborg, right?
15:58:31	openstackgerrit	Nate Johnston proposed openstack/nova stable/stein: [DNM] Test change to check for port/instance project mismatch https://review.opendev.org/667663
15:59:20	bauzas	efried: ^?
16:00:45	efried	bauzas: If what you're looking to do is clean up allocations against orphaned instances, I think it's legit to remove all the allocations associated with that consumer, even if they're on providers you don't own. That's symmetrical with what we do when we schedule (we claim all of those atomically from nova).
16:00:51	efried	and
16:01:14	efried	if there's an allocation against a compute node RP, you can legitimately assume it's in that category
16:01:15	efried	but
16:01:33	efried	that will break eventually if we ever have resourceless roots
16:01:34	efried	because
16:01:48	efried	you can not assume that all children of the compute node RP also belong to nova.
16:01:51	bauzas	baby steps here :)
16:02:06	efried	yeah, just leave a note/todo I guess.
16:02:21	bauzas	at least if I can support nested rps, it would be cool
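
[Editor's sketch, not part of the log] The orphan check discussed above (list allocations per compute node resource provider, then treat any consumer uuid that matches neither an instance nor a migration as orphaned) could look roughly like the following. This is a minimal in-memory sketch under stated assumptions: `find_orphaned_consumers` and its arguments are hypothetical names, the input dict is only loosely modelled on the placement "list resource provider allocations" response, and no placement or nova API is called. Per efried's caveat, it only looks at providers the caller passes in and does not decide who owns child providers.

```python
from typing import Dict, List, Set


def find_orphaned_consumers(
    allocations_by_provider: Dict[str, Dict[str, dict]],
    known_consumer_uuids: Set[str],
) -> Dict[str, List[str]]:
    """Map each unknown consumer uuid to the providers it holds allocations on.

    allocations_by_provider: hypothetical structure,
        {provider_uuid: {consumer_uuid: {"resources": {...}}}}
    known_consumer_uuids: uuids of instances and migrations nova still tracks.
    """
    orphans: Dict[str, List[str]] = {}
    for provider_uuid, allocations in allocations_by_provider.items():
        for consumer_uuid in allocations:
            if consumer_uuid not in known_consumer_uuids:
                # No instance or migration matches: candidate orphan.
                orphans.setdefault(consumer_uuid, []).append(provider_uuid)
    return orphans


if __name__ == "__main__":
    # Made-up example data for illustration only.
    allocations = {
        "rp-compute-1": {
            "instance-a": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
            "ghost-1": {"resources": {"VCPU": 1}},
        },
        "rp-compute-2": {
            "ghost-1": {"resources": {"DISK_GB": 10}},
        },
    }
    print(find_orphaned_consumers(allocations, {"instance-a"}))
```

Grouping by consumer rather than by provider follows efried's point that cleanup would remove all allocations for an orphaned consumer at once. Paging through compute nodes, as mriedem suggests, would happen in whatever code builds `allocations_by_provider`.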
