Efficiency gains on consumer hardware

According to Tether, the new framework significantly reduces both memory and computational requirements. This makes it possible to fine-tune advanced AI models on widely available consumer devices, including standard laptops, machines with AMD, Intel, or Apple GPUs, and modern smartphones.

Benchmark results highlight a substantial efficiency improvement: the BitNet-1B model requires up to 77.8% less video memory (VRAM) than traditional 16-bit models such as Gemma or Qwen. This reduction not only lowers hardware constraints but also expands the range of devices capable of handling AI workloads, effectively bringing advanced model development closer to individual developers and smaller teams.
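The intuition behind savings of this magnitude is straightforward arithmetic: 1-bit (or ternary, roughly 1.58-bit) weights simply take far less storage than 16-bit ones. The sketch below is a back-of-the-envelope estimate of weight memory only, not a reproduction of Tether's 77.8% figure, which reflects end-to-end VRAM usage including activations, the KV cache, and optimizer state.

```python
# Illustrative weight-memory estimate for a 1B-parameter model at
# different precisions. Weight storage only; real VRAM usage also
# includes activations, KV cache, and optimizer state.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed to store the model weights, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

fp16 = weight_memory_gb(1e9, 16)       # 2.00 GB for a 16-bit 1B model
ternary = weight_memory_gb(1e9, 1.58)  # ~0.20 GB for a BitNet-style model

print(f"fp16: {fp16:.2f} GB, 1.58-bit: {ternary:.2f} GB")
print(f"weight-only reduction: {100 * (1 - ternary / fp16):.1f}%")
```

On weights alone the reduction comes out near 90%; the lower headline figure for the full fine-tuning workload is consistent with the other memory consumers listed above not shrinking at the same rate.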

In practical terms, this means that tasks previously limited to data centers can now be executed on personal hardware, reducing both operational costs and reliance on centralized infrastructure. This shift could significantly impact how AI tools are built, tested, and deployed across industries.

Real-world tests on iPhone 16 and Samsung S25

Tether demonstrated the framework’s capabilities using flagship mobile devices. A BitNet model with 125 million parameters was fine-tuned on a Samsung S25 in approximately ten minutes using a biomedical dataset, showcasing both speed and efficiency.

On the iPhone 16, the team managed to fine-tune models with up to 13 billion parameters — a level typically associated with high-performance computing environments. Additionally, inference on mobile GPUs outperformed CPU-based processing by a factor of two to eleven, underlining the growing relevance of mobile hardware in AI workflows.

These results suggest that mobile devices are no longer limited to lightweight AI applications but can increasingly handle more complex model operations, opening the door to on-device AI solutions in fields such as healthcare, finance, and real-time analytics.

A step toward decentralized AI infrastructure

Tether CEO Paolo Ardoino emphasized the broader implications of this development, noting that reliance on centralized infrastructure for AI training could limit innovation and concentrate technological power in the hands of a few major players. By contrast, QVAC Fabric is designed to decentralize access to AI tools and allow users to retain control over their data locally.

Another key innovation is support for LoRA fine-tuning of 1-bit LLMs on non-Nvidia hardware. This reduces dependence on specific chip manufacturers and introduces greater flexibility into the AI ecosystem, potentially encouraging more competition and innovation across hardware platforms.
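For readers unfamiliar with LoRA, the core idea is that the large base weight matrix stays frozen (and can therefore remain quantized, as in BitNet), while training updates only a small low-rank adapter. The sketch below is a generic NumPy illustration of that technique, not Tether's implementation; the dimensions, rank, and scaling are arbitrary example values.

```python
# Minimal sketch of the LoRA idea: keep the base weight W frozen and
# learn a low-rank update B @ A, so the effective weight becomes
# W + (alpha / r) * B @ A. Only A and B would receive gradients.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8  # example sizes; r is the adapter rank

W = rng.standard_normal((d_out, d_in))     # frozen base weight (quantized in a BitNet model)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized

def forward(x: np.ndarray) -> np.ndarray:
    # Base path plus low-rank adapter path.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = forward(x)
# With B zero-initialized, the adapter contributes nothing at the start,
# so training begins from the frozen model's behavior:
assert np.allclose(y, W @ x)
```

Because only `A` and `B` (here 4 × 64 values each, versus 64 × 64 for `W`) are trained, the optimizer state stays tiny — which is what makes fine-tuning feasible on memory-constrained consumer hardware regardless of the GPU vendor.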

By enabling efficient model training across diverse devices, the framework also aligns with broader trends toward edge computing and distributed AI systems, where data processing happens closer to the source rather than in centralized servers.

Conclusion

QVAC Fabric represents a meaningful step toward democratizing artificial intelligence. By drastically lowering hardware requirements, Tether is making advanced AI development accessible to a wider range of users — from independent developers to smaller enterprises.

If widely adopted, this approach could accelerate the shift toward decentralized AI infrastructure, reduce costs across the industry, and reshape how and where large language models are trained and deployed. In the longer term, it may also contribute to a more balanced and competitive AI landscape, where innovation is not limited by access to expensive computing resources.