
Posted	Nick	Remark
#openstack-nova - 2019-06-28
13:33:31	kashyap	KeithMnemonic: Your best bet is GKotton (who is not here on IRC)
13:34:04	efried	mriedem, dansmith: placement agg sync is automatic now, right? No need to run nova-manage placement sync-aggregates? https://review.opendev.org/#/c/667952/1/doc/source/reference/forbidden-aggregates.rst@44
13:34:16	kashyap	KeithMnemonic: You might want to try his e-mail (gkotton@vmware.com).
13:34:36	dansmith	efried: yeah, the manage sync is for fixup and upgrades at this point I think
13:34:37	kashyap	[And of course, Cc the list, so others could learn, too.]
13:34:45	efried	thanks dansmith
13:35:32	KeithMnemonic	yes i was looking for him, we know each other so hopefully i can find him next week
13:35:43	KeithMnemonic	thanks kashyap
13:36:14	kashyap	KeithMnemonic: By "the list" I meant: openstack-discuss@lists.openstack.org
13:36:20	mriedem	efried: right what dan said,
13:36:32	mriedem	except if the api fails to remove a provider from an aggregate, sync_aggregates won't fix that
13:36:42	mriedem	sync_aggregates is only additive
13:36:53	KeithMnemonic	ok, i thought you meant wait for him to show up back here. i will send an email today
13:38:02	efried	also, mriedem, /me Remind / ping / harass re https://review.opendev.org/662881 (sdk spec)
13:39:26	mriedem	d'oh!
13:39:39	mriedem	can i hit snooze on that until after i take my kid to camp?
13:39:57	efried	of course. I can hit you up hourly
13:40:20	dansmith	cron
13:40:34	dansmith	it's the only way he'll learn.
13:46:57	openstackgerrit	Merged openstack/nova master: Add missing tests for flavor extra_specs mv 2.61 https://review.opendev.org/667600
14:08:28	efried	ugh, do we not have `openstack resource provider trait add` ?
14:24:29	mriedem	efried: https://docs.openstack.org/osc-placement/latest/cli/index.html#trait-create
14:24:46	mriedem	osc verbs are create/set/unset/delete/list
14:24:52	mriedem	and /show
14:25:06	efried	Right, I mean "add this trait to resource provider X without fing with any of its existing traits"
14:25:22	efried	so I don't have to do trait list + add to that + trait set
14:25:32	mriedem	oh, yeah we have a few gaps in ux like that in osc-placement
14:25:50	mriedem	it's annoying, especially for things like adding inventory with a new resource class to a provider or allocations
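[Editor's sketch] The workaround efried describes (trait list, add to that, trait set) might be sketched as below. It assumes the Placement REST shape in which both GET and PUT on /resource_providers/{uuid}/traits carry `traits` and `resource_provider_generation`. The helper name `build_trait_append_payload` is hypothetical, and no HTTP call is made; only the merge step is shown.

```python
def build_trait_append_payload(current, new_traits):
    """Build a PUT body that adds traits without dropping existing ones.

    `current` is assumed to be the decoded body of
    GET /resource_providers/{uuid}/traits.
    """
    # Union keeps every existing trait and avoids duplicates.
    merged = sorted(set(current["traits"]) | set(new_traits))
    return {
        "traits": merged,
        # Echo the generation back so placement can reject a stale write.
        "resource_provider_generation": current["resource_provider_generation"],
    }


if __name__ == "__main__":
    existing = {"traits": ["HW_CPU_X86_AVX2"], "resource_provider_generation": 5}
    print(build_trait_append_payload(existing, ["CUSTOM_FOO"]))
```

If the PUT is rejected because the generation changed in the meantime, the list would have to be fetched and merged again.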
14:26:00	efried	meanwhile, how tf do I get a compute node UUID?
14:26:13	mriedem	openstack --os-compute-api-version 2.53 hypervisor list
14:26:35	efried	by name?
14:26:52	efried	got it
14:26:53	efried	phew
14:26:54	mriedem	https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/hypervisor.html#hypervisor-list
14:26:54	mriedem	--matching <hostname>
14:27:55	efried	openstack --os-compute-api-version 2.53 hypervisor show my-compute-name -f value -c id
14:27:56	efried	?
14:28:20	mriedem	to get the id, probably -c ID
14:28:46	mriedem	node_uuid=$(openstack --os-compute-api-version 2.53 hypervisor show <hostname> -f value -c ID)
14:29:03	efried	`ID` wasn't working for me, but `id` does.
14:29:06	mriedem	i'm not totally sure osc is working with 2.53 everywhere yet
14:29:09	mriedem	ah ok
14:29:19	mriedem	i've noticed some inconsistencies with ID vs id in osc
14:29:24	efried	yeah, totally
14:29:26	mriedem	for image and server it's ID i think
14:29:37	openstackgerrit	Balazs Gibizer proposed openstack/nova master: WIP: Add rollback to heal port allocation https://review.opendev.org/668184
14:29:39	mriedem	open a story bug
14:30:28	gibi	mriedem, efried: I hacked up the rollback code for heal port allocation. https://review.opendev.org/668184 Based on the code I feel I'm just pushing the human-interaction-needed problem one level deeper, when the rollback fails
14:30:33	mriedem	efried: there is a story for the trait append thing https://storyboard.openstack.org/#!/story/2005258
14:30:40	mriedem	i knew it sounded familiar....
14:30:47	efried	thanks
14:33:30	mriedem	gibi: i left some comments,
14:34:13	mriedem	but i haven't fully thought through which is worse - the port with the binding:profile.allocation set to something when the allocation doesn't exist in neutron vs the allocation existing in neutron but the port binding profile not mapped to that provider
14:34:26	mriedem	*doesn't exist in placement
14:35:42	gibi	mriedem: if the rollback retry fails then is it OK to ask for the human to help?
14:35:57	gibi	I feel at the end we need the human anyhow
14:37:55	gibi	if we set the allocation key in neutron without having the allocation in placement then we tell neutron to use a resource that is not really allocated. But the physical bandwidth was used anyhow, even before we started to heal
14:38:27	mriedem	so the risk there is over-committing the resource right?
14:38:32	mriedem	b/c placement isn't tracking the allocation
14:38:54	gibi	yes, but the overcommit situation can already exist (hence the need for healing)
14:40:04	mriedem	then isn't that better than potentially having the allocations in placement w/o the neutron port binding profile tracking the allocation and if the admin screws up the manual steps, doubling the allocation by re-running the command? iow, it's no different than the situation they could already be in
14:40:32	mriedem	if you tried to run the command again we wouldn't heal that instance / port combo b/c the port would already say it's allocated when really it might not be
14:40:55	mriedem	i agree there is some amount of "we failed our main objective, and we failed to rollback, you need to step in now" if we get there
14:41:28	mriedem	but i would rather we at least try to rollback if possible
14:41:36	sean-k-mooney	i have not been following too closely but how do you currently determine that a port needs healing?
14:41:40	mriedem	and it sounds like rolling back the allocation changes is harder since we merged the resources
14:41:59	mriedem	sean-k-mooney: it's a port with a resource_request and doesn't have an allocation set in the binding profile
14:42:17	sean-k-mooney	that could be a problem
14:42:22	mriedem	that makes me think,
14:42:30	mriedem	we should also be making sure the port is actually bound to a host right?
14:42:33	gibi	mriedem: rolling back the allocations can be done by saving what was the original allocation to restore
14:42:34	sean-k-mooney	what about cases where we set the qos policy on a network
14:43:17	mriedem	gibi: ...yeah but that could also get messy right b/c we could lose a race and our generation is off
14:43:20	sean-k-mooney	we only create the allocation if you pass in the port right
14:43:23	mriedem	then what do we do?
14:43:30	mriedem	rolling back the port binding profile allocation field seems easier to me
14:43:40	mriedem	sean-k-mooney: yes
14:43:49	gibi	mriedem: correct, if something else updates the allocation in between then we are rolling back to a wrong allocation
14:43:54	mriedem	we don't support creating ports on networks with a qos policy
14:44:11	sean-k-mooney	at all?
14:44:11	gibi	mriedem: rolling back the neutron update seems easy to me too
14:44:18	gibi	easier
14:44:25	sean-k-mooney	or we create the ports but don't request the allocation
14:44:51	sean-k-mooney	because we created the port in the compute node
14:45:14	mriedem	sean-k-mooney: this is the code that determines if we need to heal allocations for the port https://review.opendev.org/#/c/637955/28/nova/cmd/manage.py@1783
14:45:26	mriedem	sean-k-mooney: we fail
14:45:52	mriedem	sean-k-mooney: https://github.com/openstack/nova/blob/master/nova/network/neutronv2/api.py#L468
14:46:26	mriedem	gibi: so i think we agree that rolling back the port binding update is simpler than the allocation
14:46:33	gibi	mriedem: good point about port bound to a host. But can it be a port with device_id=instance_uuid that is not bound?
14:46:35	mriedem	and i'd prefer we include a rollback
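[Editor's sketch] The rollback preference stated here could look roughly like the following. It assumes the heal updates the port binding profile first and writes the placement allocation second, which the log does not state outright. The three callables and the function name are hypothetical stand-ins, not nova's actual code.

```python
def heal_port_allocation(update_port, put_allocation, rollback_port):
    """Run a two-step heal and undo the port update if placement fails.

    All three arguments are hypothetical zero-argument callables.
    Returns "healed" or "rolled_back"; raises RuntimeError when the
    rollback itself fails and a human has to step in.
    """
    update_port()
    try:
        put_allocation()
    except Exception:
        # The main objective failed, so try to restore the port.
        try:
            rollback_port()
        except Exception as exc:
            raise RuntimeError(
                "rollback failed; manual intervention needed") from exc
        return "rolled_back"
    return "healed"
```

The RuntimeError branch is the "we failed to rollback, you need to step in now" case from the discussion; the sketch does not retry.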
14:47:01	mriedem	gibi: "But can it be a port with device_id=instance_uuid that is not bound?" that i'm not sure about
14:47:03	mriedem	sean-k-mooney: ^
14:47:05	sean-k-mooney	... ok was an api breakage on upgrade but i understand why it was done
14:47:12	mriedem	sean-k-mooney: oh i think we can,
14:47:15	mriedem	because of shelve offload
14:47:31	mriedem	a shelved instance still has its ports and volumes
14:47:37	mriedem	but those ports and volumes aren't "bound" to a host
14:47:40	sean-k-mooney	yes shelve offloaded would still have the device id set
14:47:47	gibi	ack
14:47:55	gibi	then I have to check for boundness as well
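[Editor's sketch] Putting the criteria from this exchange together: a port needs healing when it has a resource_request, has no allocation in its binding profile, and (per the last few lines) is actually bound to a host. The function name is hypothetical and the port is assumed to be a plain dict shaped like a Neutron API port; this is not the code under review.

```python
def port_needs_healing(port):
    """Return True if the port looks like a candidate for allocation healing."""
    has_request = bool(port.get("resource_request"))
    # binding:profile can be missing or None, so normalise it to a dict.
    profile = port.get("binding:profile") or {}
    has_allocation = bool(profile.get("allocation"))
    # A shelved-offloaded instance keeps its ports, but they are unbound.
    is_bound = bool(port.get("binding:host_id"))
    return has_request and not has_allocation and is_bound
```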
