Posted Nick Remark
#openstack-nova - 2019-06-26
14:12:51 sean-k-mooney ya i think on reading their bug both would be needed
14:13:58 sean-k-mooney mriedem: amorin is fixing the fact we might be using an outdated network_info object from the instance and stephenfin is fixing that if we fail due to the db update we never even tried to clean up the vifs
14:14:35 sean-k-mooney so to fix the downstream bug we will need to backport both.
14:14:47 sean-k-mooney ok this makes more sense to me now.
14:20:28 amorin hey all
14:22:04 amorin the bug I faced 2 days ago was not fixed by stephenfin patch
14:22:16 amorin I found that it was something else in our code
14:23:01 amorin cc mriedem sean-k-mooney
14:23:43 mriedem mnaser: i think you just hit something like this nw info cache lost thing, so you might have input here http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007363.html
14:23:45 amorin by the way, I faced another one, related to the patch I did:
14:23:46 amorin https://review.opendev.org/#/c/667294/
14:23:48 mriedem maciejjozefczyk: sean-k-mooney: ^
14:24:43 mriedem amorin: one step forward, two steps back :(
14:25:00 amorin yup
14:25:06 mriedem i remember a similar check was added here https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/network/base_api.py#L35
14:25:27 amorin exact
14:26:20 sean-k-mooney ok so we might need all 3 patches
14:27:04 sean-k-mooney amorin: stephenfin's patch is a generalised fix to a very specific edge case
14:28:12 sean-k-mooney amorin: what you originally tried to fix was more subtle as we were passing stale data in some cases
14:29:14 maciejjozefczyk ehh, instance_info_cache :)
14:29:41 openstackgerrit Martin Midolesov proposed openstack/nova master: Implementing graceful shutdown. https://review.opendev.org/666245
14:30:05 sean-k-mooney maciejjozefczyk: yep it's awesome...
14:30:41 sean-k-mooney mriedem: out of interest why do we store the instance info cache in the db?
14:31:14 sean-k-mooney i feel like we would have fewer bugs related to it if we actually just made it an in process dict cache
14:31:42 mriedem sean-k-mooney: i'll direct your question to the people that worked on nova back in 2011 or something
14:32:26 sean-k-mooney well my next question was going to be "i assume this is because of nova networks legacy choices"
14:32:33 openstackgerrit Stephen Finucane proposed openstack/nova master: Remove no longer required "inner" methods. https://review.opendev.org/655282
14:32:34 openstackgerrit Stephen Finucane proposed openstack/nova master: Remove unused FP device creation and deletion methods. https://review.opendev.org/635433
14:32:34 openstackgerrit Stephen Finucane proposed openstack/nova master: Privsepify ipv4 forwarding enablement. https://review.opendev.org/635431
14:32:35 openstackgerrit Stephen Finucane proposed openstack/nova master: Move adding vlans to interfaces to privsep. https://review.opendev.org/635436
14:32:35 openstackgerrit Stephen Finucane proposed openstack/nova master: Privsep the ebtables modification code. https://review.opendev.org/635435
14:32:36 openstackgerrit Stephen Finucane proposed openstack/nova master: Move dnsmasq restarts to privsep. https://review.opendev.org/639280
14:32:36 openstackgerrit Stephen Finucane proposed openstack/nova master: Move iptables rule fetching and setting to privsep. https://review.opendev.org/636508
14:32:37 openstackgerrit Stephen Finucane proposed openstack/nova master: Move calls to ovs-vsctl to privsep. https://review.opendev.org/639282
14:32:37 openstackgerrit Stephen Finucane proposed openstack/nova master: Move router advertisement daemon restarts to privsep. https://review.opendev.org/639281
14:32:38 openstackgerrit Stephen Finucane proposed openstack/nova master: Move setting of device trust to privsep. https://review.opendev.org/639283
14:32:39 openstackgerrit Stephen Finucane proposed openstack/nova master: We no longer need rootwrap. https://review.opendev.org/554438
14:32:39 openstackgerrit Stephen Finucane proposed openstack/nova master: Cleanup the _execute shim in nova/network. https://review.opendev.org/639581
14:32:39 openstackgerrit Stephen Finucane proposed openstack/nova master: Move final bridge commands to privsep. https://review.opendev.org/639580
14:32:40 openstackgerrit Stephen Finucane proposed openstack/nova master: Cleanup no longer required filters and add a release note. https://review.opendev.org/639826
14:33:00 mriedem sean-k-mooney: idk, you'd have to do some digging to find out when the network info cache was introduced, i don't know if it was before quantum or not
14:33:20 mriedem but we also store bdms in the db which are essentially the same thing - a cache of volume information for the server
14:33:28 mriedem which was probably before cinder existed
14:33:41 sean-k-mooney im seeing a pattern there
14:34:17 sean-k-mooney ok well lets fix the current issue first but i think i might look into if we could remove storing it to the db
14:34:31 amorin I would love that
14:34:35 amorin :p
14:34:46 sean-k-mooney caching in memory in the compute agent would likely be enough
14:35:18 sean-k-mooney we would have to rebuild it every time the compute agent restarts but i think that is fine
14:37:22 sean-k-mooney actually we could use memcache to cache it too, which would mean all the services would have access to it. anyway, it's now on my todo list
14:38:02 sean-k-mooney messing up the neutron policy and corrupting the network info cache is what caused our ci cloud production outage at the weekend
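A minimal sketch of the in-process dict cache idea floated above, assuming a caller-supplied fetch function. The class name, the method names and the fetch callable are all hypothetical and are not nova or neutron APIs; this only illustrates "cache in memory, rebuild on restart".

```python
class InMemoryNetworkInfoCache:
    """Hypothetical in-process cache of network info, keyed by instance UUID."""

    def __init__(self, fetch_network_info):
        # Assumed callable: instance UUID -> network info (e.g. a list of VIFs).
        self._fetch = fetch_network_info
        self._cache = {}

    def get(self, instance_uuid):
        # Fetch on first access, then serve from memory.
        if instance_uuid not in self._cache:
            self._cache[instance_uuid] = self._fetch(instance_uuid)
        return self._cache[instance_uuid]

    def invalidate(self, instance_uuid):
        # Drop a possibly stale entry so the next get() refetches it.
        self._cache.pop(instance_uuid, None)

    def rebuild(self, instance_uuids):
        # What a restarting compute agent would do: repopulate from scratch.
        self._cache = {uuid: self._fetch(uuid) for uuid in instance_uuids}
```

Nothing is persisted here, so every restart pays the cost of rebuild(); that is the trade-off mentioned at 14:35:18.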
14:45:46 mriedem TheJulia: is this a known busted job? http://logs.openstack.org/17/667417/1/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa/db33ba3/controller/logs/devstacklog.txt.gz#_2019-06-26_05_47_14_168
14:46:53 mriedem sean-k-mooney: redoing how the nw info cache works is hopefully wayyyyyyy down on your todo lits
14:46:55 mriedem *list
14:47:30 shilpasd efried: mriedem: can you tell me how to trigger live migration in a sync and an async way? any CLI commands?
14:48:00 mriedem shilpasd: i don't know what you mean, sync and async way
14:48:58 shilpasd mriedem: i mean nova live-migration <instance_id> triggers live migration, but is there any other way to live migrate, any periodic call or something?
14:49:33 mriedem no nova doesn't auto-live migrate things for you
14:49:51 shilpasd mriedem: i am in process of verifying all move operations on NFS changes done against https://review.opendev.org/#/c/650188/
14:50:05 shilpasd so want to take care of all move operations
14:50:37 shilpasd so just want to know about it
14:51:27 mriedem all move operations are user-initiated
14:51:56 mriedem as far as i know anyway
14:52:00 shilpasd ok, as of now verifying SHELVE + SHELVE with offload + UNSHELVE + REBUILD + RESIZE + RESIZE REVERT + EVACUATION + COLD MIGRATION + COLD MIGRATION REVERT + LIVE MIGRATION
14:52:10 shilpasd just list if i missed anything
14:52:17 mriedem by rebuild i assume you mean evacuate
14:52:27 mriedem rebuild (the server action in the api) isn't a move,
14:52:29 mriedem but evacuate is
14:52:32 efried brinzhang: I'm here now, what's up?
14:52:43 mriedem evacuate = rebuild on another host
14:52:51 shilpasd rebuild using another image
14:53:00 mriedem rebuild + a new image is not a move
14:53:10 mriedem it's rebuilding the server's root disk image on the same host
14:53:11 bauzas mriedem: not sure I understood your point in https://bugs.launchpad.net/nova/+bug/1793569/comments/5
14:53:12 openstack Launchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed] - Assigned to Sylvain Bauza (sylvain-bauza)
14:53:37 mriedem also, shelve w/o offload and then unshelve is also not a move operation,
14:53:45 bauzas mriedem: do you want heal_allocations to support this or the "placement audit' rather ?
14:53:48 mriedem if the instance is shelved but not offloaded, and then the user unshelves, it's just unshelved on the same host
14:53:52 shilpasd mriedem: ok, noted
14:54:31 shilpasd mriedem: what about resize?
14:55:14 shilpasd it's a move operation, right, since it resizes onto another host too?
14:55:26 mriedem shilpasd: maybe :)
14:55:43 mriedem unless nova is configured with allow_resize_to_same_host and the scheduler picks the same host the instance is already one,
14:55:55 mriedem which is possible in a small edge site or if the server is in a strict affinity group and can't be moved
14:56:17 mriedem *already on
14:56:51 shilpasd got it
14:56:52 mriedem https://bugs.launchpad.net/nova/+bug/1790204 is all about that problem
14:56:53 openstack Launchpad bug 1790204 in OpenStack Compute (nova) "Allocations are "doubled up" on same host resize even though there is only 1 server on the host" [High,Triaged]
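A rough sketch of the move / not-a-move distinction from the exchange above. The helper, its arguments and the operation names are hypothetical and not nova code; it only encodes what was said in the channel. Treating unshelve of an offloaded server as a move is inferred from the 14:53:37 remark, and treating cold and live migration as always moving follows shilpasd's list, so take both as assumptions.

```python
# Hypothetical classification, not part of nova.
ALWAYS_MOVE = {"evacuate", "cold_migrate", "live_migrate"}
NEVER_MOVE = {"rebuild"}  # rebuild, even with a new image, stays on the same host


def is_move(operation, *, source_host=None, dest_host=None, offloaded=False):
    """Return True if the operation counts as a move per the discussion above."""
    if operation in ALWAYS_MOVE:
        return True
    if operation in NEVER_MOVE:
        return False
    if operation == "unshelve":
        # Shelved but not offloaded: unshelved in place on the same host.
        return offloaded
    if operation == "resize":
        # With allow_resize_to_same_host the scheduler may pick the same host.
        return source_host != dest_host
    raise ValueError(f"operation not covered by this sketch: {operation}")
```

For example, is_move("resize", source_host="cn1", dest_host="cn1") is False, which is the same-host resize case that bug 1790204 is about.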
14:58:00 mriedem bauzas: i think i meant to say "nova-manage placement audit" there,
14:58:18 mriedem since heal_allocations doesn't report on things really, nor does it delete allocations, it only adds allocations for instances (not migrations) that are missing
14:58:51 bauzas mriedem: ack, will add this there then
15:00:27 mriedem i went on to continue talking about heal_allocations but idk, it's a blur
15:00:47 shilpasd mriedem: one more query, i have an NFS configuration, and when performing resize to another host, it goes and creates an instance data file on the dest system via SSH
15:00:59 shilpasd refer code at https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L8861
15:01:37 shilpasd mriedem: during the shared resource provider check, why is this check necessary?
15:02:05 shilpasd _is_storage_shared_with()
15:03:26 mriedem shilpasd: it may be ssh or rsync, it depends on config https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.remote_filesystem_transport
15:03:34 mriedem the default is ssh
15:04:23 mriedem i'm less familiar with this code, but for one we don't have shared storage provider support in the libvirt driver anyway,
15:04:58 mriedem but this is presumably one of the things we could replace if we had compute nodes modeled in a shared storage aggregate and we could avoid the "temp file create" tests and such for shared storage
