| Posted | Nick | Remark | |
|---|---|---|---|
| #openstack-nova - 2019-06-26 | |||
| 14:12:51 | sean-k-mooney | ya i think on reading their bug both would be needed | |
| 14:13:58 | sean-k-mooney | mriedem: amorin is fixing the fact we might be using an outdated network_info object from the instance, and stephenfin is fixing the case where, if we fail due to the db update, we never even try to clean up the vifs | |
| 14:14:35 | sean-k-mooney | so to fix the downstream bug we will need to backport both. | |
| 14:14:47 | sean-k-mooney | ok this makes more sense to me now. | |
| 14:20:28 | amorin | hey all | |
| 14:22:04 | amorin | the bug I faced 2 days ago was not fixed by stephenfin's patch | |
| 14:22:16 | amorin | I found that it was something else in our code | |
| 14:23:01 | amorin | cc mriedem sean-k-mooney | |
| 14:23:43 | mriedem | mnaser: i think you just hit something like this nw info cache lost thing, so you might have input here http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007363.html | |
| 14:23:45 | amorin | by the way, I faced another one, related to the patch I did: | |
| 14:23:46 | amorin | https://review.opendev.org/#/c/667294/ | |
| 14:23:48 | mriedem | maciejjozefczyk: sean-k-mooney: ^ | |
| 14:24:43 | mriedem | amorin: one step forward, two steps back :( | |
| 14:25:00 | amorin | yup | |
| 14:25:06 | mriedem | i remember a similar check was added here https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/network/base_api.py#L35 | |
| 14:25:27 | amorin | exact | |
| 14:26:20 | sean-k-mooney | ok so we might need all 3 patches | |
| 14:27:04 | sean-k-mooney | amorin: stephenfin's patch is a generalised fix to a very specific edge case | |
| 14:28:12 | sean-k-mooney | amorin: what you originally tried to fix was more subtle as we were passing stale data in some cases | |
| 14:29:14 | maciejjozefczyk | ehh, instance_info_cache :) | |
| 14:29:41 | openstackgerrit | Martin Midolesov proposed openstack/nova master: Implementing graceful shutdown. https://review.opendev.org/666245 | |
| 14:30:05 | sean-k-mooney | maciejjozefczyk: yep it's awesome... | |
| 14:30:41 | sean-k-mooney | mriedem: out of interest why do we store the instance info cache in the db? | |
| 14:31:14 | sean-k-mooney | i feel like we would have fewer bugs related to it if we actually just made it an in-process dict cache | |
| 14:31:42 | mriedem | sean-k-mooney: i'll direct your question to the people that worked on nova back in 2011 or something | |
| 14:32:26 | sean-k-mooney | well my next question was going to be "i assume this is because of nova-network's legacy choices" | |
| 14:32:33 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove no longer required "inner" methods. https://review.opendev.org/655282 | |
| 14:32:34 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove unused FP device creation and deletion methods. https://review.opendev.org/635433 | |
| 14:32:34 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Privsepify ipv4 forwarding enablement. https://review.opendev.org/635431 | |
| 14:32:35 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Move adding vlans to interfaces to privsep. https://review.opendev.org/635436 | |
| 14:32:35 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Privsep the ebtables modification code. https://review.opendev.org/635435 | |
| 14:32:36 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Move dnsmasq restarts to privsep. https://review.opendev.org/639280 | |
| 14:32:36 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Move iptables rule fetching and setting to privsep. https://review.opendev.org/636508 | |
| 14:32:37 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Move calls to ovs-vsctl to privsep. https://review.opendev.org/639282 | |
| 14:32:37 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Move router advertisement daemon restarts to privsep. https://review.opendev.org/639281 | |
| 14:32:38 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Move setting of device trust to privsep. https://review.opendev.org/639283 | |
| 14:32:39 | openstackgerrit | Stephen Finucane proposed openstack/nova master: We no longer need rootwrap. https://review.opendev.org/554438 | |
| 14:32:39 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Cleanup the _execute shim in nova/network. https://review.opendev.org/639581 | |
| 14:32:39 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Move final bridge commands to privsep. https://review.opendev.org/639580 | |
| 14:32:40 | openstackgerrit | Stephen Finucane proposed openstack/nova master: Cleanup no longer required filters and add a release note. https://review.opendev.org/639826 | |
| 14:33:00 | mriedem | sean-k-mooney: idk, you'd have to do some digging to find out when the network info cache was introduced, i don't know if it was before quantum or not | |
| 14:33:20 | mriedem | but we also store bdms in the db which are essentially the same thing - a cache of volume information for the server | |
| 14:33:28 | mriedem | which was probably before cinder existed | |
| 14:33:41 | sean-k-mooney | im seeing a pattern there | |
| 14:34:17 | sean-k-mooney | ok well let's fix the current issue first, but i think i might look into whether we could stop storing it in the db | |
| 14:34:31 | amorin | I would love that | |
| 14:34:35 | amorin | :p | |
| 14:34:46 | sean-k-mooney | caching in memory in the compute agent would likely be enough | |
| 14:35:18 | sean-k-mooney | we would have to rebuild it every time the compute agent restarts but i think that is fine | |
| 14:37:22 | sean-k-mooney | actually we could use memcache to cache it too, which would mean all the services would have access to it anyway; it's now on my todo list | |
| 14:38:02 | sean-k-mooney | messing up the neutron policy and corrupting the network info cache is what caused our ci cloud production outage at the weekend | |
| 14:45:46 | mriedem | TheJulia: is this a known busted job? http://logs.openstack.org/17/667417/1/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa/db33ba3/controller/logs/devstacklog.txt.gz#_2019-06-26_05_47_14_168 | |
| 14:46:53 | mriedem | sean-k-mooney: redoing how the nw info cache works is hopefully wayyyyyyy down on your todo list | |
| 14:47:30 | shilpasd | efried: mriedem: can you tell me how to trigger live migration sync and async way, any CLI commands? | |
| 14:48:00 | mriedem | shilpasd: i don't know what you mean, sync and async way | |
| 14:48:58 | shilpasd | mriedem: i mean `nova live-migration <instance_id>` triggers a live migration, but is there any other way to live migrate, like a periodic call or something? | |
| 14:49:33 | mriedem | no nova doesn't auto-live migrate things for you | |
| 14:49:51 | shilpasd | mriedem: i am in the process of verifying all move operations with the NFS changes done in https://review.opendev.org/#/c/650188/ | |
| 14:50:05 | shilpasd | so i want to cover all move operations | |
| 14:50:37 | shilpasd | so just want to know about it | |
| 14:51:27 | mriedem | all move operations are user-initiated | |
| 14:51:56 | mriedem | as far as i know anyway | |
| 14:52:00 | shilpasd | ok, as of now verifying SHELVE + SHELVE with offload + UNSHELVE + REBUILD + RESIZE + RESIZE REVERT + EVACUATION + COLD MIGRATION + COLD MIGRATION REVERT + LIVE MIGRATION | |
| 14:52:10 | shilpasd | let me know if i missed anything | |
| 14:52:17 | mriedem | by rebuild i assume you mean evacuate | |
| 14:52:27 | mriedem | rebuild (the server action in the api) isn't a move, | |
| 14:52:29 | mriedem | but evacuate is | |
| 14:52:32 | efried | brinzhang: I'm here now, what's up? | |
| 14:52:43 | mriedem | evacuate = rebuild on another host | |
| 14:52:51 | shilpasd | rebuild using another image | |
| 14:53:00 | mriedem | rebuild + a new image is not a move | |
| 14:53:10 | mriedem | it's rebuilding the server's root disk image on the same host | |
| 14:53:11 | bauzas | mriedem: not sure I understood your point in https://bugs.launchpad.net/nova/+bug/1793569/comments/5 | |
| 14:53:12 | openstack | Launchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed] - Assigned to Sylvain Bauza (sylvain-bauza) | |
| 14:53:37 | mriedem | also, shelve w/o offload and then unshelve is also not a move operation, | |
| 14:53:45 | bauzas | mriedem: do you want heal_allocations to support this, or rather the "placement audit"? | |
| 14:53:48 | mriedem | if the instance is shelved but not offloaded, and then the user unshelves, it's just unshelved on the same host | |
| 14:53:52 | shilpasd | mriedem: ok, noted | |
| 14:54:31 | shilpasd | mriedem: what about resize? | |
| 14:55:14 | shilpasd | it's a move operation, right, since it can also resize onto another host | |
| 14:55:26 | mriedem | shilpasd: maybe :) | |
| 14:55:43 | mriedem | unless nova is configured with `allow_resize_to_same_host` and the scheduler picks the same host the instance is already on, | |
| 14:55:55 | mriedem | which is possible in a small edge site or if the server is in a strict affinity group and can't be moved | |
| 14:56:51 | shilpasd | got it | |
| 14:56:52 | mriedem | https://bugs.launchpad.net/nova/+bug/1790204 is all about that problem | |
| 14:56:53 | openstack | Launchpad bug 1790204 in OpenStack Compute (nova) "Allocations are "doubled up" on same host resize even though there is only 1 server on the host" [High,Triaged] | |
| 14:58:00 | mriedem | bauzas: i think i meant to say "nova-manage placement audit" there, | |
| 14:58:18 | mriedem | since heal_allocations doesn't report on things really, nor does it delete allocations, it only adds allocations for instances (not migrations) that are missing | |
| 14:58:51 | bauzas | mriedem: ack, will add this there then | |
| 15:00:27 | mriedem | i went on to continue talking about heal_allocations but idk, it's a blur | |
| 15:00:47 | shilpasd | mriedem: one more query, i have an NFS configuration, and when performing a resize to another host, it goes on to create an instance data file on the dest system via SSH | |
| 15:00:59 | shilpasd | refer code at https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L8861 | |
| 15:01:37 | shilpasd | mriedem: during shared resource provider check, why this check is necessary? | |
| 15:02:05 | shilpasd | `_is_storage_shared_with()` | |
| 15:03:26 | mriedem | shilpasd: it may be ssh or rsync, it depends on config https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.remote_filesystem_transport | |
| 15:03:34 | mriedem | the default is ssh | |
| 15:04:23 | mriedem | i'm less familiar with this code, but for one we don't have shared storage provider support in the libvirt driver anyway, | |
| 15:04:58 | mriedem | but this is presumably one of the things we could replace if we had compute nodes modeled in a shared storage aggregate and we could avoid the "temp file create" tests and such for shared storage | |
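
mriedem's last point refers to the "temp file create" test used to decide whether the source and destination hosts share instance storage. Below is a minimal Python sketch of that idea, not Nova's actual `_is_storage_shared_with()` code. It assumes both hosts' views of the instance directory are reachable as local paths so the example is self-contained; in a real deployment the create step would presumably run on the destination host (e.g. over ssh). The function name `is_storage_shared` is hypothetical.

```python
import os
import tempfile
import uuid


def is_storage_shared(local_dir: str, remote_view_of_dir: str) -> bool:
    """Return True if a file created through the destination's view of the
    instance directory is visible through the source's view.

    Hypothetical sketch: both "views" are local paths here; a real
    implementation would create the marker on the remote host instead.
    """
    marker = f"shared-check-{uuid.uuid4().hex}.tmp"
    remote_path = os.path.join(remote_view_of_dir, marker)
    local_path = os.path.join(local_dir, marker)
    # Step 1: create an empty marker file through the "destination" view.
    with open(remote_path, "w"):
        pass
    try:
        # Step 2: if the source can see the marker, the storage is shared.
        return os.path.exists(local_path)
    finally:
        # Step 3: always remove the marker, whatever the result.
        os.remove(remote_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        print(is_storage_shared(a, a))  # same directory -> True
        print(is_storage_shared(a, b))  # separate directories -> False
```

Creating the marker through one view and checking for it through the other is what distinguishes shared storage (for example both hosts mounting the same NFS export) from local disks; modeling compute nodes in a shared storage aggregate, as suggested above, would make this runtime probe unnecessary.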