Codegen
Codegen is a Python library for manipulating codebases.
It provides a scriptable interface to a powerful, multi-lingual language server built on top of Tree-sitter.
Codegen handles complex refactors while maintaining correctness, enabling a broad set of advanced code manipulation programs.
Installation
What can I do with Codegen?
Codegen enables you to programmatically manipulate code with scale and precision.
Call graph visualization for modal/modal-client/_Client
View source code on modal/modal-client. View codemod on codegen.sh
Common use cases include:
Visualize Your Codebase
Generate interactive visualizations of your codebase’s structure, dependencies, and relationships.
Mine Codebase Data
Create high-quality training data for fine-tuning LLMs on your codebase.
Eliminate Feature Flags
Add, remove, and update feature flags across your application.
Organize Your Codebase
Restructure files, enforce naming conventions, and improve project layout.
Get Started
Get Started
Follow our step-by-step tutorial to start manipulating code with Codegen.
Tutorials
Learn how to use Codegen for common code transformation tasks.
View on GitHub
Star us on GitHub and contribute to the project.
Join our Slack
Get help and connect with the Codegen community.
Why Codegen?
Many software engineering tasks (refactors, enforcing patterns, analyzing control flow, and so on) are fundamentally programmatic operations. Yet the tools we use to express these transformations often feel disconnected from how we think about code.
Codegen was engineered backwards from real-world refactors we performed for enterprises at Codegen, Inc. Instead of starting with theoretical abstractions, we built a set of APIs that map directly to how humans and AI think about code changes:
- Natural Mental Model: Express transformations through high-level operations that match how you reason about code changes, not low-level text or AST manipulation.
- Clean Business Logic: Let the engine handle the complexities of imports, references, and cross-file dependencies.
- Scale with Confidence: Make sweeping, consistent changes across large codebases written in Python, TypeScript, JavaScript, and React.
As AI becomes increasingly sophisticated, we’re seeing a fascinating shift: AI agents aren’t bottlenecked by their ability to understand code or generate solutions. Instead, they’re limited by their ability to efficiently manipulate codebases. The challenge isn’t the “brain”; it’s the “hands.”
We built Codegen with a key insight: future AI agents will need to “act via code,” building their own sophisticated tools for code manipulation. Rather than generating diffs or making direct text changes, these agents will:
- Express transformations as composable programs
- Build higher-level tools by combining primitive operations
- Create and maintain their own abstractions for common patterns
This creates a shared language that both humans and AI can reason about effectively, making code changes more predictable, reviewable, and maintainable. Whether you’re a developer writing a complex refactoring script or an AI agent building transformation tools, Codegen provides the foundation for expressing code changes as they should be: through code itself.
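The idea of expressing transformations as composable programs can be illustrated with a small sketch. This is not Codegen's API: it uses only Python's standard-library `ast` module (Python 3.9+ for `ast.unparse`), and the names `rename_function`, `remove_unused_functions`, and `compose` are hypothetical helpers invented for this example. Each primitive is a function from source text to source text, so higher-level tools fall out of ordinary function composition.

```python
import ast
from functools import reduce
from typing import Callable

# Assumption for this sketch: a transformation maps source text to source text.
Transform = Callable[[str], str]


class _Renamer(ast.NodeTransformer):
    """Rename a function definition and every bare-name reference to it."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into the function body
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:
            node.id = self.new
        return node


def rename_function(old: str, new: str) -> Transform:
    """Hypothetical primitive: rename a function and its call sites in one file."""
    def apply(source: str) -> str:
        tree = _Renamer(old, new).visit(ast.parse(source))
        return ast.unparse(tree)
    return apply


def remove_unused_functions() -> Transform:
    """Hypothetical primitive: drop top-level functions never referenced by name."""
    def apply(source: str) -> str:
        tree = ast.parse(source)
        used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
        tree.body = [
            stmt for stmt in tree.body
            if not (isinstance(stmt, ast.FunctionDef) and stmt.name not in used)
        ]
        return ast.unparse(tree)
    return apply


def compose(*steps: Transform) -> Transform:
    """Build a higher-level tool by running primitives left to right."""
    return lambda source: reduce(lambda src, step: step(src), steps, source)


SOURCE = """
def helper(x):
    return x + 1

def dead():
    return 0

print(helper(2))
"""

# A composite "cleanup" tool built from the two primitives above.
cleanup = compose(remove_unused_functions(), rename_function("helper", "increment"))
RESULT = cleanup(SOURCE)
```

Here `RESULT` holds the rewritten module: the unreferenced `dead` function is gone and `helper` has been renamed at both its definition and its call site. A single-file toy like this ignores imports and cross-file references, which is the kind of complexity the text above says the engine is meant to handle.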