Introducing FOMOE: Fast Opportunistic Mixture Of Experts (pronounced "fomo"). The problem: large Mixture of Experts (MoE) models need a lot of memory for their weights (hundreds of GB), which are therefore typically stored in flash memory (e.g., NVMe). During inference, only a small fraction of these weights is needed, however