We contributed to Lido DAO, P2P.org, =nil; Foundation, DRPC, Neutron and invested in 150+ projects
Vitaly Kleban, Stepan Gershuni
Sep 9, 2024

This is the third article in a series on the vision and opportunities in decentralized and distributed AI.
First part: Internet of Intelligence
Second part: Frontiers of Decentralized AI

The investment strategy of Cyber.Fund is straightforward: we invest in technologies and solutions that bring the cybernetic economy closer to reality. One of our key tools is "technology tree" research, which helps us identify promising startups by applying the theory of technical constraints to the market structure. Below is a partial version of the Crypto x AI tech tree.
Retrieval-Augmented Generation (RAG) systems combine the power of information retrieval with generative models to produce more accurate and contextually relevant outputs.
As one of the most widely deployed AI technologies on the market, RAG systems handle ever more sensitive data. This has led to a focus on developing privacy-preserving techniques that protect user data throughout the entire RAG pipeline.
One critical area of development is client-side encryption for vector databases, which ensures that sensitive information remains secure even when stored externally.
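A minimal sketch of the pattern, assuming a local embedding model (here a toy bag-of-words stub) and using a plain list as a stand-in for the external store: payloads are encrypted with a client-held key before upload, so the store only ever sees vectors and ciphertext.

```python
# Client-side encryption sketch: the external store holds only
# (vector, ciphertext) pairs; the key never leaves the client.
import numpy as np
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # stays on the client, never uploaded
cipher = Fernet(key)

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words hashing embedding; a real system would call a local model."""
    v = np.zeros(256)
    for tok in text.lower().split():
        v[hash(tok) % 256] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

remote_store = []  # stand-in for an external vector database

def upsert(text: str) -> None:
    remote_store.append((embed(text), cipher.encrypt(text.encode())))

def query(text: str, k: int = 3) -> list[str]:
    q = embed(text)
    scored = sorted(remote_store, key=lambda row: -float(q @ row[0]))
    # Decryption happens only on the client, after retrieval.
    return [cipher.decrypt(ct).decode() for _, ct in scored[:k]]

upsert("The treasury rebalances every epoch.")
upsert("Validators are rotated weekly.")
print(query("How often is the treasury rebalanced?", k=1))
```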
Once client-side encryption is implemented, queryable encryption for metadata becomes necessary. This includes techniques like order-preserving and order-revealing encryption, schemes that allow range queries on encrypted data by preserving the order of data items. By making encrypted data searchable, they enable efficient information retrieval without compromising security, ensuring that metadata and relational graphs - crucial for contextual understanding in RAG systems - can be utilised without exposing the underlying data.
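A toy illustration of the order-preserving idea (a real deployment would use a vetted OPE/ORE scheme): each plaintext integer maps to a cumulative sum of keyed pseudorandom positive steps, so ciphertext order mirrors plaintext order and range queries run on ciphertexts alone.

```python
# Toy order-preserving encryption: strictly positive keyed steps make
# the encryption function strictly monotonic. Illustrative only.
import hmac, hashlib

KEY = b"client-side-secret"

def _step(i: int) -> int:
    digest = hmac.new(KEY, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % 1000 + 1   # always >= 1

def ope_encrypt(x: int) -> int:
    return sum(_step(i) for i in range(1, x + 1))

# The server-side index stores only ciphertexts of a timestamp-like field.
index = sorted(ope_encrypt(ts) for ts in [5, 17, 42, 99, 250])

# A range query "17 <= ts <= 99" is translated client-side into ciphertext bounds.
lo, hi = ope_encrypt(17), ope_encrypt(99)
print([c for c in index if lo <= c <= hi])   # three ciphertexts match
```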
Another requirement is homomorphic encryption for embeddings. Encrypting embeddings while preserving the distances between them is crucial in scenarios where you want to perform similarity searches on sensitive data without exposing the actual data. Vector databases represent data as a structured collection of multidimensional vectors called "embeddings" that are not human-readable. However, the raw data used to produce these embeddings can be partially recovered from them, making data leakage possible. This poses a significant security risk, one that could be eliminated by encrypting or salting the embeddings.
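True homomorphic schemes are heavyweight, but the required property is easy to illustrate: in the numpy sketch below, a secret random orthogonal matrix acts as the key, and the rotated embeddings keep all pairwise distances, so similarity search still works on the transformed vectors. This is a distance-preserving transform, not full homomorphic encryption.

```python
# A secret orthogonal rotation Q is an isometry: it preserves every
# pairwise Euclidean distance, so nearest-neighbour search is unaffected.
import numpy as np

rng = np.random.default_rng(0)
d = 8
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # secret key: orthogonal matrix

a, b = rng.standard_normal(d), rng.standard_normal(d)
ea, eb = Q @ a, Q @ b

print(np.linalg.norm(a - b))     # plaintext distance
print(np.linalg.norm(ea - eb))   # identical on transformed vectors
```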
By integrating advanced encryption techniques and thoughtful design, it is possible to safeguard user data while maintaining the functionality and effectiveness of these powerful AI systems.
Improving the design of context memory is essential. Effective context memory allows LLMs to retain relevant information across interactions, enhancing the continuity and relevance of responses.
The most straightforward approach here is offering larger context windows. Increasing the input token limit allows models to consider longer text sequences, which helps maintain context over extended interactions.
Additionally, systems like MemGPT use hierarchical attention and paging to make the model attend to both local and global contexts, which helps manage longer sequences efficiently. This approach was tested on the Multi-Session Chat dataset, where it outperformed fixed-context models in maintaining consistency and coherence across multiple dialogue sessions.
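A minimal sketch of the paging idea (the names are illustrative, not MemGPT's actual API): a bounded main context evicts the oldest turns into archival storage, and a keyword search pages them back in on demand.

```python
# Bounded main context + archival storage: overflow pages old turns out,
# and a retrieval call pages relevant ones back in.
from collections import deque

MAIN_CONTEXT_LIMIT = 4
main_context: deque[str] = deque()
archival: list[str] = []

def add_turn(turn: str) -> None:
    main_context.append(turn)
    while len(main_context) > MAIN_CONTEXT_LIMIT:
        archival.append(main_context.popleft())   # page out the oldest turn

def recall(query: str) -> list[str]:
    """Page archived turns that share a keyword with the query back in."""
    words = set(query.lower().split())
    return [t for t in archival if words & set(t.lower().split())]

for i in range(1, 7):
    add_turn(f"turn {i}: user mentioned project {'alpha' if i == 1 else 'misc'}")

print(list(main_context))             # only the 4 most recent turns remain
print(recall("tell me about alpha"))  # retrieves the paged-out first turn
```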
Another approach is to integrate a theory-of-mind component into context memory to enhance the model's ability to understand and predict human intentions and mental states, thereby improving performance in extended interactions. Temporal and symbolic variants of this approach will probably gain some traction.
A specialised form of external memory is also emerging: differentiable neural computers, which use a controller (typically a neural network) to interact with memory in a differentiable manner.
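The core trick is that memory access is soft: reads and writes are attention-weighted sums, so gradients flow through them. A simplified numpy sketch of content-based addressing (a full DNC adds separate erase/write vectors and temporal link tracking):

```python
# Content-based addressing: the controller emits a key, reads are a
# softmax-weighted sum over memory rows, and writes blend new content
# in with the same soft weights - everything stays differentiable.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

N, W = 6, 4                       # memory slots x word size
M = np.random.default_rng(1).standard_normal((N, W))

def address(key, beta=5.0):
    sims = M @ key / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sims)   # soft attention over slots

def read(key):
    return address(key) @ M

def write(key, vec):
    global M
    w = address(key)[:, None]
    M = M * (1 - w) + w * vec     # simplified soft erase-then-add update

write(np.ones(W), np.array([1.0, 2.0, 3.0, 4.0]))
print(read(np.ones(W)))           # recovers roughly the written vector
```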
Recurrent memory mechanisms combine the strengths of recurrent neural networks, which excel at handling sequential data, with transformers to improve context retention.
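A numpy sketch of segment-level recurrence in the style of Transformer-XL (weights and dimensions are illustrative): the hidden states of the previous segment are cached and attended to alongside the current segment, extending the effective context without growing the window.

```python
# Segment-level recurrence: each segment attends over [cached states;
# current states], then refreshes the cache for the next segment.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

d, rng = 8, np.random.default_rng(2)
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
memory = np.zeros((0, d))                    # cached states from past segments

def process_segment(h):
    global memory
    ctx = np.vstack([memory, h])             # attend over cache + current
    attn = softmax((h @ Wq) @ (ctx @ Wk).T / np.sqrt(d))
    out = attn @ (ctx @ Wv)
    memory = h[-4:].copy()                   # keep latest states as next cache
    return out

for _ in range(3):                           # stream three segments
    out = process_segment(rng.standard_normal((4, d)))
print(out.shape)                             # (4, 8): enriched by cached memory
```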
Lastly, there are MemoryLLMs, which embed a substantial, fixed-size memory pool within the latent space; the pool is recurrently updated as new knowledge arrives via a self-update procedure.
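A loose sketch of that self-update step (the paper's actual mechanism operates on latent tokens per layer): because the pool has a fixed size, absorbing new knowledge means randomly evicting a fraction of old slots and appending freshly encoded vectors.

```python
# Fixed-size memory pool with drop-and-append self-update:
# the pool never grows, old knowledge gradually decays.
import numpy as np

K, d, drop = 16, 4, 0.25
rng = np.random.default_rng(3)
pool = rng.standard_normal((K, d))           # fixed-size latent memory pool

def self_update(new_vectors: np.ndarray) -> None:
    global pool
    n_new = len(new_vectors)
    keep = rng.choice(K, size=K - n_new, replace=False)  # evict at random
    pool = np.vstack([pool[keep], new_vectors])

self_update(rng.standard_normal((int(K * drop), d)))
print(pool.shape)                            # still (16, 4)
```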
In the evolving landscape of AI, prompts have become a critical component in guiding models to produce desired outputs. As prompts grow in value, so does the necessity to protect and manage them effectively.
This has led to the development of advanced tools and methodologies aimed at ensuring the integrity, security, and efficiency of prompt usage.
The biggest problem is that prompts are not interoperable: a prompt that works well on one LLM may perform worse on another, even a more advanced one, or even on a different version of the same LLM.
Once the interoperability problem is solved, it will also be possible to implement cost-aware inference that strategically manages computational resources for prompt processing, keeping costs sustainable. This approach includes model routing - automatically selecting the most appropriate model for each task. Startups in this area are actively supported by industry veterans.
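A hypothetical router makes the idea concrete: cheap heuristics (here, prompt length and a keyword check; a production system would use a learned classifier) decide whether a request needs a frontier model or a small one. The model names and prices below are illustrative.

```python
# Cost-aware model routing: easy requests go to the cheap model,
# hard ones to the expensive frontier model.
MODELS = {
    "small": {"cost_per_1k_tokens": 0.0002},
    "frontier": {"cost_per_1k_tokens": 0.01},
}

HARD_MARKERS = ("prove", "derive", "multi-step", "plan")

def route(prompt: str) -> str:
    tokens = len(prompt.split())
    hard = tokens > 500 or any(m in prompt.lower() for m in HARD_MARKERS)
    return "frontier" if hard else "small"

def estimated_cost(prompt: str) -> float:
    return len(prompt.split()) / 1000 * MODELS[route(prompt)]["cost_per_1k_tokens"]

print(route("Summarise this paragraph in one line."))      # -> small
print(route("Prove the bound and derive the constant."))   # -> frontier
```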
Efforts to reduce inference costs continue with prompt compression: techniques are being developed to reduce the size and complexity of prompts without losing their effectiveness, speeding up processing and enabling more efficient inference.
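A toy sketch in the spirit of perplexity-based compressors such as LLMLingua, with a stop-word list standing in for the small language model that would normally score token informativeness:

```python
# Toy prompt compression: drop low-information tokens while keeping
# the instruction recoverable. Real systems score tokens with a small LM.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "are", "that", "please",
             "could", "you", "kindly", "very"}

def compress(prompt: str) -> str:
    kept = [t for t in prompt.split() if t.lower() not in STOPWORDS]
    return " ".join(kept)

prompt = "Could you please summarise the key risks of the proposal that is attached"
short = compress(prompt)
print(short)                       # "summarise key risks proposal attached"
print(len(short) / len(prompt))    # compression ratio
```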
Ensuring the privacy of prompts is crucial, especially when dealing with sensitive or proprietary information. As prompts become more and more valuable, it is inevitable that LLM inference will move into Trusted Execution Environments.
In some cases split inference can be used to obfuscate the prompt and prevent it from being stolen. Split inference distributes different parts of a prompt across multiple systems, enhancing security by preventing unauthorised access to the full prompt.
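An illustrative sketch of the idea (the providers and the recombination step are hypothetical stand-ins for a real multi-party serving setup): the prompt is partitioned so that no single party ever sees it in full.

```python
# Prompt sharding: each provider receives only one contiguous shard.
def split_prompt(prompt: str, n_parties: int) -> list[str]:
    words = prompt.split()
    shard = -(-len(words) // n_parties)            # ceiling division
    return [" ".join(words[i:i + shard]) for i in range(0, len(words), shard)]

secret_prompt = "Our Q3 strategy is to acquire the competitor before earnings"
for i, s in enumerate(split_prompt(secret_prompt, 3)):
    print(f"provider {i} sees only: {s!r}")        # no party holds the full prompt
```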
Advanced prompt-engineering techniques like FedBPT can sometimes replace traditional model fine-tuning. In FedBPT, clients and servers exchange prompts instead of model parameters. This approach allows for finding an optimal prompt to enhance the performance of frozen pre-trained language models, rather than altering the model itself through fine-tuning.
As the reliance on prompts in AI continues to grow, effective lifecycle management will be crucial for maintaining both security and cost-efficiency in AI operations.
Training models to reason effectively is a challenging aspect of advancing AI capabilities. While datasets representing static knowledge - such as factual information or domain-specific data - are relatively accessible, datasets that illustrate reasoning processes are scarce and difficult to produce. This presents a significant bottleneck in developing models that can not only know facts but also apply logic and reasoning to draw conclusions or solve problems.
Synthetic data offers a promising solution to this challenge. By generating artificial datasets that mimic complex reasoning scenarios, we can provide models with a diverse range of examples necessary for learning reasoning skills. This synthetic data can be crafted to include various logical challenges, from basic inferencing to more complex problem-solving tasks, thereby covering a broad spectrum of reasoning capabilities.
For example, the biggest dataset for mathematical reasoning contains only roughly 1M math problems and includes:
260k problems and solutions from the Chinese K-12 math exam;
6k from the AMC and AIME official documents;
150k from various international and regional math olympiads.
Creating synthetic datasets for reasoning involves careful design to ensure they adequately represent the kinds of logical steps and decision-making processes we expect from human reasoning. These datasets can simulate real-world problems, dilemmas, and hypothetical situations that require nuanced understanding and logical navigation.
By leveraging synthetic data, we can significantly enhance the reasoning capabilities of AI models. This approach not only addresses the scarcity of naturally occurring reasoning datasets but also allows for controlled experimentation and refinement of reasoning algorithms. As a result, synthetic data stands as a vital tool in the quest to develop more sophisticated, reasoning-capable AI systems.
Efficiently training and deploying these models over standard internet connections is a critical challenge for the Crypto x AI space, as it enables the decentralisation of AI. It also enables trustless model-training coalitions in a Web2 setting.
While communication-efficient training is a complex problem to solve, several promising techniques and frameworks are emerging as solutions to it.
Pipeline parallelism allows for the distribution of model training across multiple devices, minimising the communication overhead by dividing the model into segments that can be processed in parallel. This technique not only speeds up the training process but also makes better use of available hardware resources.
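The schedule is the essence of the technique. Below is a tiny sketch of a GPipe-style schedule, in which stage s works on micro-batch m at clock tick s + m, so the stages overlap instead of idling:

```python
# Pipeline schedule: with 3 stages and 4 micro-batches, the pipeline
# fills, runs all stages concurrently, then drains.
N_STAGES, N_MICRO = 3, 4

for tick in range(N_STAGES + N_MICRO - 1):
    active = [f"stage{s}:mb{tick - s}"
              for s in range(N_STAGES) if 0 <= tick - s < N_MICRO]
    print(f"tick {tick}: " + ", ".join(active))
```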
Locality-sensitive hashing (LSH) is another approach that helps in reducing the communication burden. For example, in scenarios where model updates are not significantly different from previous iterations, LSH can be used to detect these similarities. If the hash values of the current and previous updates are sufficiently close, the communication of the update can be skipped, thus saving bandwidth and reducing latency.
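A numpy sketch of the gating logic, using SimHash-style random hyperplanes as the LSH family: if the signature of the current update matches the previous round's, the near-duplicate update is simply not transmitted.

```python
# LSH-gated communication: SimHash the update vector; skip sending
# when the signature is unchanged from the previous round.
import numpy as np

d, bits, rng = 1024, 32, np.random.default_rng(4)
planes = rng.standard_normal((bits, d))            # shared random hyperplanes

def simhash(update: np.ndarray) -> int:
    sign_bits = (planes @ update) > 0
    return int("".join("1" if b else "0" for b in sign_bits), 2)

prev_sig = None

def maybe_send(update: np.ndarray) -> bool:
    global prev_sig
    sig = simhash(update)
    send = sig != prev_sig                         # skip near-duplicate updates
    prev_sig = sig
    return send

u = rng.standard_normal(d)
print(maybe_send(u))                                   # True: first update goes out
print(maybe_send(u + 1e-6 * rng.standard_normal(d)))   # likely False: tiny change
```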
Offloading allows the training of much larger models by leveraging CPU memory and compute resources, thus overcoming the limitations of GPU memory. This is particularly useful in non-datacenter settings.
Tiling is used to manage the memory requirements of very large neural network layers that cannot fit into the GPU memory all at once. By dividing these large layers into smaller tiles, the system processes each tile sequentially, allowing the training of extremely large models without the need for more GPUs or increased memory capacity. This approach helps in handling large-scale models efficiently.
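A numpy sketch of the idea applied to a single oversized matrix-vector product: only one tile of the weight matrix needs to be resident at a time, and the result is identical to the untiled computation.

```python
# Tiling: apply a large layer tile by tile so that only one slice of
# the weight matrix needs to fit in device memory at a time.
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(512)
W = rng.standard_normal((2048, 512))   # imagine this lives in host memory

def tiled_matvec(W, x, tile_rows=256):
    out = np.empty(W.shape[0])
    for r in range(0, W.shape[0], tile_rows):
        tile = W[r:r + tile_rows]      # only this tile would be on the GPU
        out[r:r + tile_rows] = tile @ x
    return out

print(np.allclose(tiled_matvec(W, x), W @ x))   # True: same result, less memory
```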
Together, these techniques do not yet provide a robust framework for the distributed training of large models over internet connections, but they lay a solid foundation for researchers working in that direction.
Ensuring the authenticity and integrity of model outputs is becoming increasingly important in the distributed setting.
This concern is especially relevant in scenarios where the results of AI models have significant consequences, such as in finance or governance. One emerging solution to this challenge is the concept of Proof of Inference.
Proof of Inference refers to methods that verify the identity of an AI model and confirm that the outputs are generated by the original, unaltered model. This is crucial for maintaining trust between solution providers and users, ensuring that the results have not been tampered with or produced by a compromised or substandard model version, such as a quantised or simplified substitute.
To implement Proof of Inference effectively, the verification process must be designed to consume minimal computational resources. This efficiency is vital not only for practical deployment but also for scalability, as models often need to operate in environments with limited computing power or bandwidth. Techniques like cryptographic hashing and digital signatures are being explored to provide lightweight yet robust verification methods.
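A lightweight sketch of the hashing-and-signature idea (HMAC with a shared key stands in for a real public-key signature scheme): the provider publishes a hash of the model weights and signs each (model hash, prompt hash, output) transcript, so a client can cheaply check which model claims to have produced a response. Note that this attests to identity claims, not to the inference computation itself.

```python
# Proof-of-inference sketch: commit to the weights with a hash,
# sign every response transcript, verify cheaply on the client.
import hashlib, hmac

SIGNING_KEY = b"provider-secret"                 # stand-in for a real keypair

def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

model_hash = sha(b"weights-of-model-v1")         # published in advance

def serve(prompt: str) -> dict:
    output = f"answer to: {prompt}"              # placeholder inference
    transcript = f"{model_hash}|{sha(prompt.encode())}|{output}".encode()
    return {"output": output,
            "model_hash": model_hash,
            "sig": hmac.new(SIGNING_KEY, transcript, hashlib.sha256).hexdigest()}

def verify(prompt: str, resp: dict) -> bool:
    transcript = f"{resp['model_hash']}|{sha(prompt.encode())}|{resp['output']}".encode()
    expected = hmac.new(SIGNING_KEY, transcript, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, resp["sig"])

resp = serve("what is the epoch length?")
print(verify("what is the epoch length?", resp))   # True
```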
As the use of AI becomes more widespread, the development and adoption of Proof of Inference mechanisms will be crucial, reinforcing the reliability of AI systems in critical and sensitive contexts.
Predicting the future state of the blockchain can provide significant advantages, particularly in financial applications where anticipating market movements is important.
Graph Neural Networks (GNNs) can model the relationships and interactions within the blockchain network, offering a way to predict how the state of the blockchain might evolve over time. This capability is particularly useful for tasks such as predicting transaction patterns, detecting anomalies, and understanding the impact of network changes.
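The basic building block is simple to sketch (the graph, features, and weights below are illustrative): node features on a toy transaction graph are updated by averaging over neighbours and applying a learned transform, the step a GNN stacks to predict future on-chain state.

```python
# One message-passing layer over a toy 4-address transaction graph.
import numpy as np

rng = np.random.default_rng(8)
A = np.array([[0, 1, 1, 0],      # adjacency: who transacts with whom
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = rng.standard_normal((4, 5))  # per-address features (balances, activity, ...)
W = rng.standard_normal((5, 5)) * 0.1

A_hat = A / A.sum(axis=1, keepdims=True)   # mean aggregation over neighbours
H = np.tanh(A_hat @ X @ W)                 # updated node embeddings
print(H.shape)                             # (4, 5)
```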
Correlation Generative Adversarial Networks (GANs) are among the leading techniques being used to generate realistic financial correlation matrices. These models can capture the complex relationships between different financial assets, providing valuable insights into market behaviour and potential risks. By simulating various scenarios, Correlation GANs help traders and analysts make more informed decisions, enhancing their ability to respond to market fluctuations.
The integration of generative models like GANs and GNNs into blockchain analytics represents a significant advancement in the field. This can lead to more effective risk management, improved decision-making, and a better understanding of the underlying dynamics driving blockchain ecosystems.
Knowledge transfer from foundation models is a blend of distillation (focusing on transferring knowledge between architectures) and transfer learning (transferring knowledge between tasks). Below are a couple of examples presented at ICML’24.
In Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models, the authors propose task-oriented knowledge transfer: a straightforward three-step scheme covering what to learn, what to freeze, and how to select an unlabelled dataset.
In Transferring Knowledge from Large Foundation Models to Small Downstream Models, the authors suggest a learnable feature selection over the features of foundation models. The idea is to have the student model learn only the necessary features from the teacher, ignoring the rest of the knowledge. This contrasts with traditional distillation, where the student must learn everything, usually in the prediction space rather than the feature space.
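A loose numpy sketch of the gating idea (the paper's actual formulation differs in detail): a per-dimension gate over teacher features is trained with an L1 penalty, so dimensions the student cannot or need not match are pruned away.

```python
# Learnable feature selection: the gate keeps only the teacher
# dimensions the student can actually reproduce.
import numpy as np

rng = np.random.default_rng(7)
T = rng.standard_normal((128, 32))     # teacher features (batch x dims)
student = T.copy()
student[:, 8:] = 0.0                   # toy student: reproduces only 8 dims
gate = np.ones(32)                     # learnable per-dimension mask

lr, l1 = 0.1, 0.01
for _ in range(100):
    diff = student - T
    grad = gate * (diff ** 2).mean(axis=0) + l1 * np.sign(gate)
    gate = np.clip(gate - lr * grad, 0.0, 1.0)

print(f"{(gate > 0.1).sum()} of 32 teacher dimensions kept")   # the 8 useful ones
```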
Model merging - where several domain-specific LLMs are merged together while skipping the training phase - has emerged as a promising approach to LLM development due to its cost-effectiveness. In Evolutionary Optimisation of Model Merging Recipes, the authors present an example of such an approach, which they use in their commercial product.
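The simplest merging recipe is a weighted average of checkpoints with identical architecture; evolutionary approaches like the one cited search over such mixing coefficients (including layer-wise variants) instead of fixing them by hand. A minimal sketch:

```python
# Weight-space merging: blend two domain-specialised checkpoints
# with the same architecture, no further training required.
import numpy as np

def merge(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

rng = np.random.default_rng(6)
math_model = {"layer1.w": rng.standard_normal((4, 4))}
code_model = {"layer1.w": rng.standard_normal((4, 4))}

merged = merge(math_model, code_model, alpha=0.6)
print(merged["layer1.w"].shape)      # same architecture, blended weights
```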
At Cyber.Fund, we are committed to supporting the development of these groundbreaking technologies. As we continue to explore the potential of AI and blockchain, we invite startups, individual researchers, and OSS contributors to join us in this journey.