MoE models make local AI more accessible on hardware that most people actually have ...
GPUs are fast, but they have limited memory (VRAM). Unified-memory machines have plenty of memory, but lower memory bandwidth.
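
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes token generation is limited by memory bandwidth, so that each generated token requires reading every active weight once, and that the whole model must fit in memory. The function names and all hardware and model numbers are hypothetical placeholders chosen for illustration, not measurements of any real device or model.

```python
def fits_in_memory(memory_gb, total_params_billion, bytes_per_param):
    """True if all weights fit in memory (ignores KV cache and other overhead)."""
    return total_params_billion * bytes_per_param <= memory_gb


def decode_tokens_per_second(bandwidth_gb_s, active_params_billion, bytes_per_param):
    """Rough upper bound on decode speed, assuming each token reads every active weight once."""
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical MoE model: 100B total parameters, 10B active per token, 4-bit weights.
TOTAL_B, ACTIVE_B, BYTES_PER_PARAM = 100, 10, 0.5

# Hypothetical machines (illustrative numbers only).
gpu = {"memory_gb": 24, "bandwidth_gb_s": 1000}
unified = {"memory_gb": 128, "bandwidth_gb_s": 250}

for name, hw in [("gpu", gpu), ("unified", unified)]:
    fits = fits_in_memory(hw["memory_gb"], TOTAL_B, BYTES_PER_PARAM)
    speed = decode_tokens_per_second(hw["bandwidth_gb_s"], ACTIVE_B, BYTES_PER_PARAM)
    print(f"{name}: fits={fits}, bandwidth-bound decode ~{speed:.0f} tok/s")
```

Under these assumptions the model does not fit on the GPU at all, while the unified-memory machine holds it and only has to read the active 10B parameters per token. A dense model of the same total size would read all 100B parameters per token and be roughly ten times slower on the same machine.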