Contributing to pipen

Thank you for your interest in contributing to pipen! This document provides guidelines and instructions for contributing to the project.

Table of Contents

  • Development Setup
  • Development Workflow
  • Code Style
  • Pre-commit Hooks
  • Testing
  • Documentation
  • Pull Request Process
  • Reporting Issues
  • Getting Help
  • License

Development Setup

Prerequisites

  • Python 3.9 or higher
  • Poetry for dependency management
  • Git

Setting Up the Development Environment

  1. Fork and clone the repository
# Fork the repository on GitHub first
git clone https://github.com/YOUR_USERNAME/pipen.git
cd pipen
  2. Install development dependencies
# Install using Poetry
poetry install --all-extras

# Or install the development group specifically
poetry install --with dev,docs,example
  3. Activate the virtual environment
# Using Poetry shell
poetry shell

# Or use the virtual environment path
source $(poetry env info --path)/bin/activate
  4. Install pre-commit hooks
pre-commit install
  5. Verify the setup
# Run tests to ensure everything is working
pytest tests/

# Build documentation
cd docs && mkdocs build

Development Workflow

# Create a new branch for your changes
git checkout -b feature/your-feature-name

# Make your changes
# ...

# Run tests
pytest tests/

# Run linting
flake8 pipen
mypy -p pipen

# Format code
black pipen

# Commit changes
git add .
git commit -m "Add your feature"

# Push to your fork
git push origin feature/your-feature-name

Code Style

Formatting

We use Black for code formatting:

# Format code
black pipen

# Check formatting without making changes
black --check pipen

Configuration:

  • Line length: 88 characters
  • Target Python versions: 3.9, 3.10, 3.11, 3.12
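
These settings are normally read from the project configuration, so plain black pipen is enough. If you need to reproduce them explicitly (for example when running Black from outside the repository root), a roughly equivalent invocation with a recent Black release would be:

# Roughly equivalent explicit flags (normally read from the project config)
black --check --line-length 88 \
    --target-version py39 --target-version py310 \
    --target-version py311 --target-version py312 pipen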

Linting

We use flake8 for code linting:

flake8 pipen

Type Checking

We use mypy for static type checking:

mypy -p pipen

Configuration:

  • Ignore missing imports from external packages
  • Allow redefinition in some cases
  • Strict optional mode is disabled for flexibility
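
These options come from the project configuration; if you want to reproduce them explicitly on the command line (for instance when checking a single module from another directory), the roughly equivalent flags are:

# Roughly equivalent explicit flags (normally read from the project config)
mypy -p pipen --ignore-missing-imports --allow-redefinition --no-strict-optional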

Docstring Format

We use Google-style docstrings with Args, Returns, Raises, and Attributes sections:

def process_data(input_file: str, output_dir: str, verbose: bool = False) -> dict:
    """Process a data file and save results to output directory.

    This function reads the input file, processes the data, and saves
    the results to the specified output directory.

    Args:
        input_file: Path to the input data file.
        output_dir: Directory where processed results will be saved.
        verbose: If True, print detailed progress information.

    Returns:
        Dictionary containing processing statistics and output file paths.

    Raises:
        FileNotFoundError: If input_file does not exist.
        ValueError: If input_file is malformed.

    Examples:
        >>> result = process_data("data.csv", "output")
        >>> result['processed_count']
        100
    """
    pass

For classes, include a description and list important attributes:

class DataProcessor:
    """Process and transform data files.

    This class provides methods for reading, transforming, and saving
    data in various formats.

    Attributes:
        processed_count: Number of files processed.
        errors: List of errors encountered during processing.
        config: Configuration dictionary for processing parameters.
    """
    pass

Pre-commit Hooks

We use pre-commit hooks to automatically run checks before committing:

  • trailing-whitespace: Remove trailing whitespace
  • end-of-file-fixer: Ensure files end with a newline
  • check-yaml: Validate YAML syntax
  • check-added-large-files: Prevent large files from being committed
  • versionchecker: Ensure version consistency between pyproject.toml and pipen/version.py
  • mypy: Run type checking
  • pytest: Run tests
  • flake8: Run linting

Pre-commit hooks are configured in .pre-commit-config.yaml and run automatically on commit for files in the pipen/ directory (tests/, examples/, and docs/ are excluded).
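
You can also trigger the hooks manually without committing, which is useful after touching many files. The hook ids below are assumed to match the names listed above:

# Run every configured hook against the whole repository
pre-commit run --all-files

# Run a single hook, e.g. only flake8
pre-commit run flake8 --all-files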

Testing

Running Tests

We use pytest for testing:

# Run all tests
pytest tests/

# Run tests with coverage
pytest --cov=pipen --cov-report=term-missing

# Run specific test file
pytest tests/test_pipen.py

# Run with verbose output
pytest -vv tests/

# Run specific test
pytest tests/test_pipen.py::test_pipen_init

Test Configuration

Our test configuration (from pyproject.toml) is summarized below, with a roughly equivalent explicit command sketched after the list:

  • Parallel execution: pytest-xdist with -n auto for automatic parallelization
  • Distribution mode: --dist loadgroup to run dependent tests together
  • Coverage: pytest-cov for code coverage reporting
  • Async support: pytest-asyncio for async test cases
  • Warnings: Treat UserWarning as errors (-W error::UserWarning)
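
Because these options live in pyproject.toml, a plain pytest tests/ already applies them. For reference, this is roughly what they expand to when spelled out by hand:

# Roughly equivalent explicit invocation (the configured defaults already cover this)
pytest -n auto --dist loadgroup --cov=pipen --cov-report=term-missing \
    -W error::UserWarning tests/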

Writing Tests

Place tests in the tests/ directory following the structure:

# tests/test_pipen.py
import pytest
from pipen import Pipen, Proc

def test_pipen_init():
    """Test that Pipen initializes correctly."""
    pipeline = Pipen()
    assert pipeline.name == "Pipen"

@pytest.mark.asyncio
async def test_async_pipeline():
    """Test async pipeline execution."""
    pipeline = Pipen()
    result = await pipeline.run_async()
    assert result is True

Test Coverage

We aim for high test coverage. The current coverage is tracked on Codacy.

To check coverage locally:

pytest --cov=pipen --cov-report=html
open htmlcov/index.html  # macOS
# or
xdg-open htmlcov/index.html  # Linux

Documentation

Building Documentation

Documentation is built with MkDocs:

cd docs
mkdocs build        # Build to site/
mkdocs serve        # Serve at http://127.0.0.1:8000
mkdocs gh-deploy    # Deploy to GitHub Pages

Documentation Structure

docs/
├── index.md              # Symlink to ../README.md
├── basics.md             # Pipeline layers and folder structure
├── defining-proc.md      # Process definition guide
├── running.md            # Pipeline execution guide
├── configurations.md     # Configuration documentation
├── caching.md            # Job caching mechanism
├── channels.md           # Channel system documentation
├── input-output.md      # Input/output specification
├── error.md             # Error handling strategies
├── templating.md        # Template engine documentation
├── script.md            # Script configuration
├── scheduler.md         # Scheduler backends
├── cloud.md             # Cloud support
├── proc-group.md        # Process groups
├── plugin.md            # Plugin development
├── cli.md               # CLI tool documentation
├── examples.md          # Example documentation
├── CHANGELOG.md         # Version history
├── style.css            # Custom styling
└── script.js            # Custom JavaScript

API Documentation

API documentation is auto-generated from docstrings using the mkapi-fix plugin.

To ensure your API documentation is properly generated:

  1. Write Google-style docstrings for all public classes, functions, and methods
  2. Include Args, Returns, Raises, and Attributes sections where applicable
  3. Add Examples sections for complex functions
  4. Ensure type hints are present in function signatures

Adding New Documentation

  1. Create a new .md file in the docs/ directory
  2. Update the nav section in mkdocs.yml to include your new page
  3. Add cross-references using [](#anchor) syntax
  4. Use code blocks with language identifiers: python, bash, etc.
  5. Use admonition blocks for notes, warnings, and tips:
!!! note
    This is a note block.

!!! warning
    This is a warning.

!!! tip
    This is a tip.

Documentation Requirements

  • All new public APIs must have docstrings
  • Breaking changes must be documented in CHANGELOG.md
  • New features should include examples in the documentation
  • Visual diagrams should have descriptive alt text for accessibility

Pull Request Process

Before Submitting a PR

  1. Update documentation
     • Add or update docstrings for changed code
     • Update relevant documentation files
     • Add examples for new features

  2. Run all tests

    pytest tests/
    

  3. Run linting and type checking

    flake8 pipen
    mypy -p pipen
    black --check pipen
    

  4. Build documentation

    cd docs && mkdocs build
    

  5. Update CHANGELOG.md

     • Add an entry under the appropriate version section
     • Use the format: [<type>] <description> ([#issue]) (see the example entry below)
     • Types: added, changed, deprecated, removed, fixed, security
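
A hypothetical entry in that format (the description and issue number here are made up purely for illustration) might look like:

[fixed] Handle a missing output directory more gracefully ([#123])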

Submitting a PR

  1. Push your branch to your fork
  2. Open a pull request on GitHub
  3. Fill in the PR template with:
     • A clear description of changes
     • Related issues (if any)
     • Screenshots for UI changes (if applicable)
     • Testing performed
     • Documentation updates

PR Review Process

  • Maintainers will review your PR
  • Address review comments by pushing additional commits
  • Keep the PR focused on a single change
  • Squash commits if requested by maintainers
  • Update based on review feedback

Merge Criteria

A PR can be merged when:

  • [ ] All tests pass
  • [ ] Code is properly formatted (Black)
  • [ ] No linting errors (flake8)
  • [ ] No type checking errors (mypy)
  • [ ] Documentation is updated
  • [ ] CHANGELOG.md is updated for breaking changes
  • [ ] At least one maintainer approves

Reporting Issues

Bug Reports

When reporting a bug, include:

  1. Python version: python --version
  2. pipen version: pipen --version
  3. Minimal reproducible example: Code that demonstrates the issue
  4. Expected behavior: What you expected to happen
  5. Actual behavior: What actually happened (with error messages)
  6. Environment details: OS, scheduler used, etc.

Feature Requests

When requesting a feature:

  1. Use case: Explain what problem this feature solves
  2. Proposed solution: How you envision the feature working
  3. Alternatives considered: Other approaches you've thought of
  4. Additional context: Any relevant context about the request

Documentation Issues

For documentation issues:

  1. Page location: Which documentation page has the issue
  2. Problem: What is incorrect, unclear, or missing
  3. Suggestion: How it should be improved (if you have ideas)

Getting Help

  • GitHub Issues: For bug reports and feature requests
  • GitHub Discussions: For questions and general discussion
  • Documentation: https://pwwang.github.io/pipen
  • Examples: See the examples/ directory for usage examples

License

By contributing to pipen, you agree that your contributions will be licensed under the MIT License.