How soon before a real % of LLM queries are done via local AI models running WebGPU in-browser, and ...
Local AI models running via WebGPU in browsers could handle a large share of simple LLM queries, reducing reliance on cloud-based SOTA models, but performance and ecosystem limitations remain.
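Reason for selection: More than 70% of LLM queries are simple tasks (such as summarization and translation) that lightweight local models can handle.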