When working with large codebases, having clear documentation about each directory’s purpose and contents is crucial. This guide shows how to use Codegen and AI to automatically generate a hierarchy of README files, one per directory, that explains your codebase structure.

Generating Directory READMEs

Here’s how to recursively generate README files for each directory using AI:

def generate_directory_readme(directory):
    # Skip non-source directories
    if any(skip in directory.name for skip in [
        'node_modules', 'venv', '.git', '__pycache__', 'build', 'dist'
    ]):
        return
        
    # Collect directory contents for context
    files = [f for f in directory.files if f.is_source_file]
    functions = directory.functions
    classes = directory.classes
    
    # Create context for AI
    context = {
        "Directory Name": directory.name,
        "Files": [f"{f.name} ({len(f.source.splitlines())} lines)" for f in files],
        "Functions": [f.name for f in functions],
        "Classes": [c.name for c in classes]
    }
    
    # Generate directory summary using AI
    readme_content = codebase.ai(
        prompt="""Generate a README section that explains this directory's:
        1. Purpose and responsibility
        2. Key components and their roles
        3. How it fits into the larger codebase
        4. Important patterns or conventions
        
        Keep it clear and concise.""",
        target=directory,
        context=context
    )
    
    # Add file listing
    if files:
        readme_content += "\n\n## Files\n"
        for file in files:
            # Get file summary from AI
            file_summary = codebase.ai(
                prompt="Describe this file's purpose in one line:",
                target=file
            )
            readme_content += f"\n### {file.name}\n{file_summary}\n"
            
            # List key components
            if file.classes:
                readme_content += "\nKey classes:\n"
                for cls in file.classes:
                    readme_content += f"- `{cls.name}`\n"
            if file.functions:
                readme_content += "\nKey functions:\n"
                for func in file.functions:
                    readme_content += f"- `{func.name}`\n"
    
    # Create or update README.md
    readme_path = f"{directory.path}/README.md"
    if codebase.has_file(readme_path):
        readme_file = codebase.get_file(readme_path)
    else:
        readme_file = codebase.create_file(readme_path)
    readme_file.edit(readme_content)
    
    
    # Recursively process subdirectories
    for subdir in directory.subdirectories:
        generate_directory_readme(subdir)

# Generate READMEs for the entire codebase
# (assumes `codebase` is an already-initialized Codegen codebase object)
generate_directory_readme(codebase.root_directory)

This will create a hierarchy of README.md files that explain each directory’s purpose and contents. For example:

# src/
Core implementation directory containing the main business logic and data models.
This directory is responsible for the core functionality of the application.

## Key Patterns
- Business logic is separated from API endpoints
- Models follow the Active Record pattern
- Services implement the Repository pattern

## Files

### models.py
Defines the core data models and their relationships.

Key classes:
- `User`
- `Product`
- `Order`

### services.py
Implements business logic and data access services.

Key classes:
- `UserService`
- `ProductService`

Key functions:
- `initialize_db`
- `migrate_data`
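
The file-listing part of the generator can be tried out without Codegen or an AI call. The following is a minimal sketch, not part of Codegen: `render_file_section` is a hypothetical helper, and each file is represented by a plain dictionary standing in for a Codegen file object, with the one-line summary already supplied.

```python
def render_file_section(files):
    """Render the '## Files' section from plain dictionaries.

    Each entry is a hypothetical stand-in for a Codegen file object:
    {"name": str, "summary": str, "classes": [str], "functions": [str]}
    """
    if not files:
        return ""

    parts = ["\n\n## Files\n"]
    for file in files:
        parts.append(f"\n### {file['name']}\n{file['summary']}\n")

        # List key components, mirroring the loop in the generator above
        if file.get("classes"):
            parts.append("\nKey classes:\n")
            parts.extend(f"- `{name}`\n" for name in file["classes"])
        if file.get("functions"):
            parts.append("\nKey functions:\n")
            parts.extend(f"- `{name}`\n" for name in file["functions"])

    return "".join(parts)


example = render_file_section([
    {
        "name": "services.py",
        "summary": "Implements business logic and data access services.",
        "classes": ["UserService", "ProductService"],
        "functions": ["initialize_db", "migrate_data"],
    },
])
print(example)
```

Keeping the rendering separate from the AI calls makes it easier to check the Markdown layout before spending any tokens on summaries.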

Customizing the Generation

You can customize the README generation by modifying the prompts and adding more context:

def get_directory_patterns(directory):
    """Analyze common patterns in a directory"""
    patterns = []
    
    # Check for common file patterns
    if any('test_' in f.name for f in directory.files):
        patterns.append("Contains unit tests")
    if any('interface' in f.name.lower() for f in directory.files):
        patterns.append("Uses interface-based design")
    if any(c.is_dataclass for c in directory.classes):
        patterns.append("Uses dataclasses for data models")
        
    return patterns

def generate_enhanced_readme(directory):
    # Get additional context
    patterns = get_directory_patterns(directory)
    dependencies = [imp.module for imp in directory.imports]
    
    # Enhanced context for AI
    context = {
        "Common Patterns": patterns,
        "Dependencies": dependencies,
        "Parent Directory": directory.parent.name if directory.parent else None,
        "Child Directories": [d.name for d in directory.subdirectories],
        "Style": "technical but approachable"
    }
    
    # Generate README with enhanced context
    # ... rest of the generation logic

Best Practices

  1. Keep Summaries Focused: Direct the AI to generate concise, purpose-focused summaries.

  2. Include Key Information:

    • Directory purpose
    • Important patterns
    • Key files and their roles
    • How components work together

  3. Maintain Consistency: Use consistent formatting and structure across all READMEs.

  4. Update Regularly: Regenerate READMEs when directory structure or purpose changes.

  5. Version Control: Commit generated READMEs to track documentation evolution.
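
One possible way to support the "Update Regularly" practice is to regenerate a README only when the directory listing has changed. The sketch below is an assumption, not a Codegen feature: the helper names and the HTML-comment marker are hypothetical, and the fingerprint covers only file and subdirectory names, so it will not notice changes inside files.

```python
import hashlib

# Hypothetical marker stored at the end of each generated README
MARKER = "<!-- readme-fingerprint: "


def directory_fingerprint(file_names, subdirectory_names):
    """Hash the sorted directory listing so ordering does not matter."""
    listing = (
        "\n".join(sorted(file_names))
        + "\n--\n"
        + "\n".join(sorted(subdirectory_names))
    )
    return hashlib.sha256(listing.encode("utf-8")).hexdigest()[:16]


def stamp(readme_content, fingerprint):
    """Append the fingerprint to the README as an HTML comment."""
    return f"{readme_content.rstrip()}\n\n{MARKER}{fingerprint} -->\n"


def needs_regeneration(existing_readme, fingerprint):
    """Return True when the stored fingerprint is missing or different."""
    return f"{MARKER}{fingerprint} -->" not in existing_readme


current = directory_fingerprint(["models.py", "services.py"], ["utils"])
readme = stamp("# src/\nCore implementation directory.", current)
print(needs_regeneration(readme, current))
```

A README without the marker, such as a hand-written one, is always treated as needing regeneration, so you may want to check for that case before overwriting.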

The AI-generated summaries are a starting point. Review and refine them to ensure accuracy and completeness.

Be mindful of sensitive information in your codebase. Configure the generator to skip sensitive files or directories.
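
A skip filter for sensitive paths could look roughly like the following. This is a sketch under assumptions: `should_skip`, the `secrets` directory name, and the file patterns are illustrative choices rather than Codegen settings, and paths are treated as POSIX-style strings. It matches whole path components, so a directory named `rebuild` is not skipped just because its name contains `build`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical skip lists; adjust them to your project
SKIP_DIRECTORIES = {
    "node_modules", "venv", ".git", "__pycache__", "build", "dist", "secrets",
}
SKIP_FILE_PATTERNS = ["*.pem", "*.key", ".env", ".env.*", "*credentials*"]


def should_skip(path):
    """Return True if a file or directory path should be left out."""
    parts = PurePosixPath(path).parts

    # Skip anything inside (or named exactly as) an excluded directory
    if any(part in SKIP_DIRECTORIES for part in parts):
        return True

    # Skip files whose name matches a sensitive pattern
    name = parts[-1] if parts else ""
    return any(fnmatchcase(name, pattern) for pattern in SKIP_FILE_PATTERNS)


for candidate in ["config/.env.local", "src/rebuild/main.py", "deploy/server.key"]:
    print(candidate, should_skip(candidate))
```

The check can be called at the top of the generator for directories and again inside the file loop, before any file content is sent to the AI.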
