Cue is one of the newest and most complex AI features developed by TRAE. It extends code completion beyond the cursor to cover the entire workspace. Cue infers the developer's intent and works in the background, suggesting edits that help complete tasks faster and more thoroughly.
Core Challenges
Building Cue posed three major research challenges:
- Understanding user intent
- Determining change locations
- Determining how to edit
Understanding User Intent
Determining what task the user is trying to complete
Challenges:
- Non-linear Editing Histories: Developers' coding workflows are typically non-linear and complex. They may copy a code block, paste it elsewhere, and immediately modify it substantially. They may also make rapid consecutive edits, switch frequently between files or functions, or undo and redo changes while experimenting with different implementations. These workflows create messy change trajectories that can mislead a model trying to infer the developer's true intent.
- Unintended Biases: Simply sampling training data by splitting the diff blocks of a given task may introduce unintended biases into the model. For example, the model might learn to avoid touching parts of the codebase that already contain recent changes, even though these areas may be precisely where further edits are needed.
- Intention Hallucination: Models may hallucinate intent, suggesting changes that are not directly related to the user's recent edits. This happens when the model is too proactive and tries to cover every possibly relevant edit (high recall), which leads to noisy and disruptive suggestions. Conversely, a model that is too conservative and offers only highly confident suggestions (high precision) becomes passive and misses opportunities to assist the developer.
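The precision/recall tension described above can be made concrete with a small sketch. The logged suggestions, the confidence scores, and the `precision_recall` helper are all hypothetical illustrations, not Cue's actual data or thresholds; the point is only that raising a display threshold trades recall for precision.

```python
from typing import List, Tuple

# Hypothetical logged suggestions: (model confidence, was it relevant to the user).
SUGGESTIONS: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, True),
]


def precision_recall(
    suggestions: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Show only suggestions with confidence >= threshold and score the result."""
    shown = [relevant for conf, relevant in suggestions if conf >= threshold]
    total_relevant = sum(1 for _, relevant in suggestions if relevant)
    hits = sum(shown)  # True counts as 1
    # Convention used here: if nothing is shown, nothing shown was wrong.
    precision = hits / len(shown) if shown else 1.0
    recall = hits / total_relevant if total_relevant else 1.0
    return precision, recall


for threshold in (0.8, 0.5, 0.2):
    p, r = precision_recall(SUGGESTIONS, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
# threshold=0.8 precision=1.00 recall=0.50
# threshold=0.5 precision=0.75 recall=0.75
# threshold=0.2 precision=0.67 recall=1.00
```

A strict threshold shows only correct suggestions but misses half of the useful ones; a loose threshold finds them all at the cost of more noise.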
Solutions:
- Simulating Common Editing Scenarios: We developed algorithms that simulate realistic editing scenarios reflecting common developer behaviors, based on an analysis of commit messages, initial commit states, and final commit states.
- Optimizing Diff Granularity: We teach the model to read fine-grained editing events while carefully tuning the granularity of the diffs presented in prompts. If the diffs are too fine-grained, the model may be distracted by noise in the user's editing history. If they are too coarse-grained, the model has difficulty distinguishing new changes from old ones.
- Avoiding Undoing User's Recent Changes: We discovered that models sometimes have a strong tendency to undo a user's recent changes, which makes for a frustrating experience. We prevent this behavior through careful curation of training data.
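One way to picture the granularity trade-off is to coalesce raw editing events into coarser groups before they are rendered as diffs. The sketch below is an illustration under stated assumptions, not TRAE's algorithm: the `EditEvent` type, the `coalesce` function, and the time and line-distance thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EditEvent:
    path: str         # file that was edited
    line: int         # line touched by the edit
    timestamp: float  # seconds since the session started
    new_text: str     # content of the line after the edit


def coalesce(
    events: List[EditEvent],
    max_gap_s: float = 2.0,
    max_line_distance: int = 3,
) -> List[List[EditEvent]]:
    """Group consecutive events that are close in time and position.

    Larger thresholds give coarser groups (less noise, but old and new
    changes blur together); smaller thresholds give finer groups.
    """
    groups: List[List[EditEvent]] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        last = groups[-1][-1] if groups else None
        if (
            last is not None
            and ev.path == last.path
            and ev.timestamp - last.timestamp <= max_gap_s
            and abs(ev.line - last.line) <= max_line_distance
        ):
            groups[-1].append(ev)  # same burst of typing: merge
        else:
            groups.append([ev])    # new file, far-away line, or long pause
    return groups


events = [
    EditEvent("a.py", 10, 0.0, "def f(x):"),
    EditEvent("a.py", 10, 0.5, "def f(x, y):"),
    EditEvent("a.py", 11, 1.0, "    return x + y"),
    EditEvent("b.py", 3, 1.5, "f(1, 2)"),
    EditEvent("a.py", 40, 10.0, "# unrelated edit"),
]
print([len(g) for g in coalesce(events)])  # [3, 1, 1]
```

Each resulting group would then be presented to the model as a single diff, so the three keystroke-level events in `a.py` appear as one change rather than three.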
Determining Change Locations
Deciding where to make changes
Challenges:
- Scalability: The localization mechanism must scale to large codebases without consuming excessive resources.
- Speed: Localization must be extremely fast to support highly interactive usage, providing instant suggestions as users make new changes.
- Relevance: The mechanism must accurately identify relevant locations, neither overwhelming users with unnecessary suggestions nor missing important ones.
Solutions:
- Fast Localization with a Trained Retriever: We balanced speed and accuracy by combining retrieval infrastructure with a retriever model designed specifically to identify code locations that might need updating.
- Efficiency and Scalability: This approach is highly scalable and runs on large monolithic codebases containing tens of thousands of files without significant performance degradation.
- Editing Surrounding Code: Code around the user's cursor is always added to the candidate location list and processed first, ensuring immediate and contextually relevant suggestions.
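The cursor-first rule can be sketched as a small ranking step. This is a minimal illustration, assuming a retriever that returns scored locations; the `Location` type, the `build_candidates` function, and the `top_k` cutoff are hypothetical names, and the scores are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Location:
    path: str
    start_line: int
    end_line: int


def build_candidates(
    cursor: Location,
    retrieved: List[Tuple[Location, float]],
    top_k: int = 5,
) -> List[Location]:
    """Return up to top_k locations to process, cursor region first.

    `retrieved` holds (location, score) pairs from a hypothetical
    retriever model; higher scores mean "more likely to need an edit".
    """
    candidates = [cursor]  # the code around the cursor is always included
    ranked = sorted(retrieved, key=lambda pair: pair[1], reverse=True)
    for loc, _score in ranked:
        if len(candidates) >= top_k:
            break
        if loc not in candidates:  # skip duplicates, including the cursor
            candidates.append(loc)
    return candidates


cursor = Location("a.py", 10, 20)
retrieved = [
    (Location("b.py", 1, 5), 0.4),
    (Location("a.py", 10, 20), 0.9),  # same region as the cursor
    (Location("c.py", 7, 9), 0.8),
    (Location("d.py", 1, 2), 0.1),
]
for loc in build_candidates(cursor, retrieved, top_k=3):
    print(loc.path, loc.start_line, loc.end_line)
# a.py 10 20
# c.py 7 9
# b.py 1 5
```

Putting the cursor region first means the edit model can start on the most likely location immediately, without waiting for retrieval to rank the rest of the workspace.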
Determining How to Edit
Executing changes accurately and efficiently
Challenges:
- Complex Edits Beyond Cursor Insertions: Existing models are not good at making large-scale changes beyond simple insertions next to the user's cursor.
- Latency Constraints: Edits must be generated quickly enough to allow real-time suggestions without significant delay.
- Codebase Awareness: Suggestions need to match the project's coding standards and conventions and use custom APIs correctly, which requires the model to understand the context of the entire codebase.
Solutions:
- Novel Diff Format (WIP): We teach the model a specialized diff format that is compact and can be applied unambiguously to the original code. This format lets the model represent complex edits concisely, minimizing the number of generated tokens and enabling efficient processing of large files. This reduces latency from seconds to hundreds of milliseconds.
- Codebase-Aware Suggestions: We leverage Retrieval-Augmented Generation (RAG) infrastructure to add codebase-specific context to Cue. By retrieving relevant parts of the codebase, the model can make suggestions that are consistent with project-specific coding standards and that use custom APIs correctly.
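The article does not specify the diff format itself, so the following is only a sketch of the general idea behind a compact, unambiguously applicable edit representation. It assumes a hypothetical format in which each edit is a `(start_line, end_line_exclusive, replacement_lines)` triple over the original file; the model would emit only the changed lines rather than regenerating the whole file.

```python
from typing import List, Tuple

# Hypothetical compact edit: (start_line, end_line_exclusive, replacement_lines),
# with 0-indexed line numbers that refer to the ORIGINAL file.
Edit = Tuple[int, int, List[str]]


def apply_edits(lines: List[str], edits: List[Edit]) -> List[str]:
    """Apply non-overlapping line-range edits to a copy of `lines`.

    Edits are applied from the bottom of the file upward so that earlier
    line numbers stay valid while later ranges change length.
    """
    result = list(lines)
    upper_bound = len(result)  # the next edit must end at or before this line
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        if not (0 <= start <= end <= upper_bound):
            raise ValueError(f"edit ({start}, {end}) is out of range or overlaps")
        result[start:end] = replacement
        upper_bound = start
    return result


original = [
    "def area(w, h):",
    "    return w * h",
    "",
    "print(area(2, 3))",
]
edits: List[Edit] = [
    (0, 1, ["def area(w, h, scale=1):"]),
    (1, 2, ["    return w * h * scale"]),
    (3, 4, ["print(area(2, 3, scale=2))"]),
]
print("\n".join(apply_edits(original, edits)))
```

Because every range refers to the original file and ranges may not overlap, there is exactly one way to apply the edits, and the output size grows with the size of the change rather than the size of the file.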
Technical Achievements
By addressing these three core AI challenges, Cue has achieved:
- Accurate Intent Understanding: Capturing developer intent in messy editing histories
- Efficient Location Identification: Accurately locating relevant change positions in large codebases
- Fast and Accurate Edit Generation: Generating accurate and contextually appropriate edits with minimal latency
Core Technical Features
Key Technical Implementation Points
1. Specially Trained Locator Model: For quickly identifying code locations that need updates
2. Retrieval-Augmented Generation (RAG): Context-aware edits based on real-time codebase indexing
3. Novel Diff Decoding Scheme: Efficiently representing complex edits, reducing latency from seconds to hundreds of milliseconds
4. Editing History Analysis: Inferring user intent from recent editing history
5. Scalable Architecture: Supporting large codebases with tens of thousands of files
Supported Application Scenarios
- Refactoring Operations: Field additions, method signature changes, class renaming, etc.
- Dependency Updates: Automatically identifying and updating dependencies in relevant files
- Test Synchronization: Ensuring test code stays consistent with main code
- API Change Propagation: Updating API calls and implementations across files
- Iterative Development Support: Providing real-time suggestions during code iteration
Future Development Plans
Although we have achieved good results with Cue, there is still significant room for improvement.
Larger Scale Modifications
- From Small Commits to Large PRs: Enhance the model's ability to understand broader contexts and dependencies, enabling it to assist with larger PRs and major refactoring tasks
- Understanding Complex Dependencies: Improve the model's ability to understand and navigate complex code dependencies across multiple files and modules
- Batch Editing Support: Enable the feature to simultaneously suggest and apply consistent changes across multiple files, reducing time spent on repetitive tasks
More Knowledge Updates and RL Based on Real User Feedback Signals
- More Efficient Knowledge Updates: In addition to traditional RAG methods, mining online user data can quickly refresh the model's knowledge of fast-moving frameworks, API updates, and other evolving details
- Reinforcement Learning from Real User Feedback: As data accumulates on which suggestions users accept, reject, or edit after accepting, reinforcement learning on these feedback signals can continuously improve the model's accuracy in Cue scenarios
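As a rough illustration of how acceptance, rejection, and post-acceptance edits could be turned into a scalar training signal, the sketch below defines a hypothetical reward function. The reward values and the `feedback_reward` name are assumptions made for illustration; the article does not describe the actual reward design. Text similarity comes from Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from typing import Optional


def feedback_reward(
    suggested: str,
    accepted: bool,
    final_text: Optional[str] = None,
) -> float:
    """Map one user interaction to a reward in [-1.0, 1.0].

    Hypothetical scheme:
      - rejected suggestion                -> -1.0
      - accepted and left untouched        ->  1.0
      - accepted, then edited by the user  -> similarity between the
        suggestion and what the user finally kept (0.0 to 1.0)
    """
    if not accepted:
        return -1.0
    if final_text is None:
        return 1.0
    return SequenceMatcher(None, suggested, final_text).ratio()


print(feedback_reward("total = a + b", accepted=False))
print(feedback_reward("total = a + b", accepted=True))
print(feedback_reward("total = a + b", accepted=True, final_text="total = a + b + c"))
```

A suggestion that is accepted but then heavily rewritten earns less than one that is kept as-is, which captures the "post-adoption edits" signal mentioned above.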
Better Integration with Chat and Agents
- Enhanced Context Understanding: Chat functionality can provide additional context to make the next edit suggestion more precise, especially when there might be multiple potential edits
- Interactive Problem Solving: Developers can use chat to ask for explanations, request alternative solutions, or clarify the intent behind suggestions
- Unified Development Environment: Combining these features creates a smarter and more supportive environment, further reducing friction and increasing productivity
Conclusion
Cue represents a significant advancement in developer AI tools, creating an intelligent assistant capable of understanding the chain reactions of code changes by addressing three core AI challenges (understanding intent, locating changes, executing edits). It not only reduces manual work but also improves development efficiency, allowing developers to focus on problem-solving rather than tedious code update tasks.
As the feature is refined and integrated more deeply with other tools, Cue is expanding what developer AI can do and becoming an essential part of the modern software development toolkit.