Basic Concepts¶
Understanding PanPath's architecture and design principles will help you use it effectively.
Core Principles¶
1. Unified Interface¶
PanPath provides a single, consistent API that works across different storage backends:
from panpath import PanPath
# All these use the same interface
local = PanPath("/tmp/file.txt")
s3 = PanPath("s3://bucket/file.txt")
gcs = PanPath("gs://bucket/file.txt")
azure = PanPath("az://container/file.txt")
# Same operations work on all
for path in [local, s3, gcs, azure]:
path.write_text("Same API")
content = path.read_text()
print(path.exists())
2. Pathlib Compatibility¶
For local files, PanPath is a drop-in replacement for pathlib.Path:
from pathlib import Path
from panpath import PanPath
# These work identically for local paths
pathlib_path = Path("/tmp/file.txt")
pan_path = PanPath("/tmp/file.txt")
# Same operations
pathlib_path.write_text("Hello")
pan_path.write_text("Hello")
# Same properties
assert pathlib_path.name == pan_path.name
assert pathlib_path.suffix == pan_path.suffix
3. Unified Sync and Async¶
Every path class provides both synchronous and asynchronous methods:
from panpath import PanPath
# Create a path - same class for both sync and async
path = PanPath("s3://bucket/file.txt")
# Synchronous methods (blocks until complete)
content = path.read_text() # Blocks
# Asynchronous methods use a_ prefix (non-blocking)
content = await path.a_read_text() # Non-blocking
# Both sync and async methods available on same instance
path.write_text("sync write") # Sync
await path.a_write_text("async write") # Async
Architecture¶
Path Resolution¶
PanPath uses URI schemes to route to the appropriate backend:
graph TD
A[PanPath] --> B{Parse URI}
B -->|No scheme or file://| C[LocalPath]
B -->|s3://| D[S3Path]
B -->|gs://| E[GSPath]
B -->|az:// or azure://| F[AzurePath]
C --> G["Sync methods: read_text(), write_text(), ..."]
D --> G
E --> G
F --> G
C --> H["Async methods: a_read_text(), a_write_text(), ..."]
D --> H
E --> H
F --> H
Client Management¶
Cloud clients are created lazily and reused:
from panpath import PanPath
# First S3 path creates a client
path1 = PanPath("s3://bucket1/file.txt")
# Second S3 path reuses the same client
path2 = PanPath("s3://bucket2/file.txt")
# Different backend creates a different client
path3 = PanPath("gs://bucket/file.txt")
Performance
Client reuse means you don't pay the initialization cost for each path instance.
Registry System¶
Path classes are registered by URI scheme:
from panpath.registry import register_path_class, get_path_class
from panpath.s3_path import S3Path
# Registration (automatic for built-in backends)
register_path_class("s3", S3Path)
# Retrieval (used internally)
path_class = get_path_class("s3") # Returns S3Path
Path Classes¶
Unified Path Classes¶
from panpath import PanPath
from panpath.s3_path import S3Path
# Factory pattern - PanPath returns appropriate class
path = PanPath("s3://bucket/file.txt")
print(type(path)) # <class 'panpath.s3_path.S3Path'>
# All methods available on same instance
content = path.read_text() # Sync
content = await path.a_read_text() # Async
# Type checking
isinstance(path, PanPath) # True (inherits from PanPath)
isinstance(path, S3Path) # True (actual type)
Async Methods
All async methods are prefixed with a_ for easy identification:
read_text()→a_read_text()write_bytes()→a_write_bytes()exists()→a_exists()iterdir()→a_iterdir()
Storage Concepts¶
Local Paths¶
Local paths represent files and directories on the filesystem:
from panpath import PanPath
path = PanPath("/tmp/file.txt")
# or
path = PanPath("file:///tmp/file.txt")
# Supports all pathlib operations
path.resolve()
path.absolute()
path.expanduser()
Cloud Paths¶
Cloud paths use URI schemes to represent cloud storage objects:
from panpath import PanPath
# Format: scheme://bucket_or_container/key_or_path
s3 = PanPath("s3://my-bucket/path/to/object.txt")
gcs = PanPath("gs://my-bucket/path/to/object.txt")
azure = PanPath("az://my-container/path/to/blob.txt")
Key differences from local paths:
- No absolute/relative distinction: All cloud paths are absolute
- No symlinks: Cloud storage doesn't support symbolic links
- Different permissions model: Uses cloud IAM instead of filesystem permissions
- No hard links: Each object is independent
Buckets and Containers¶
The first component after the scheme is the bucket (S3/GCS) or container (Azure):
from panpath import PanPath
s3_path = PanPath("s3://my-bucket/folder/file.txt")
print(s3_path.parts) # ('s3://my-bucket', 'folder', 'file.txt')
# Cloud-specific properties
print(s3_path.cloud_prefix) # s3://my-bucket
print(s3_path.key) # folder/file.txt
Operations¶
Synchronous Operations¶
Block until completion:
from panpath import PanPath
path = PanPath("s3://bucket/file.txt")
# These block the thread
content = path.read_text()
path.write_text("new content")
exists = path.exists()
items = list(path.iterdir())
Use when:
- Writing simple scripts
- Working in synchronous frameworks (Flask, Django)
- Operations are infrequent
- Code simplicity is more important than concurrency
Asynchronous Operations¶
Return coroutines that can be awaited:
from panpath import PanPath
import asyncio
async def main():
path = PanPath("s3://bucket/file.txt")
# These are non-blocking
content = await path.a_read_text()
await path.a_write_text("new content")
exists = await path.a_exists()
items = await path.a_iterdir()
asyncio.run(main())
Use when:
- Building async applications (FastAPI, aiohttp)
- Need high concurrency
- Performing many I/O operations
- Want better resource utilization
Parallel Async Operations¶
Async mode enables concurrent operations:
import asyncio
from panpath import PanPath
async def download_all(urls):
# Create async paths
paths = [PanPath(url) for url in urls]
# Download all concurrently
contents = await asyncio.gather(*[p.a_read_text() for p in paths])
return contents
urls = [
"s3://bucket/file1.txt",
"s3://bucket/file2.txt",
"s3://bucket/file3.txt",
]
asyncio.run(download_all(urls))
Error Handling¶
Common Exceptions¶
from panpath import PanPath
from panpath.exceptions import (
PanPathException,
PathNotFoundError,
PermissionError,
)
path = PanPath("s3://bucket/nonexistent.txt")
try:
content = path.read_text()
except PathNotFoundError:
print("File not found")
except PermissionError:
print("Access denied")
except PanPathException as e:
print(f"Other error: {e}")
Backend-Specific Errors¶
Cloud backends may raise provider-specific errors:
from panpath import PanPath
import botocore.exceptions
path = PanPath("s3://bucket/file.txt")
try:
content = path.read_text()
except botocore.exceptions.NoCredentialsError:
print("AWS credentials not configured")
except botocore.exceptions.ClientError as e:
print(f"AWS error: {e}")
Best Practices¶
1. Use Type Hints¶
from panpath import PanPath
from pathlib import Path
def process_file(path: PanPath) -> str:
"""Can be sync or async path."""
return path.read_text()
async def process_async(path: PanPath) -> str:
"""Must be async path."""
return await path.a_read_text()
def process_local(path: Path | PanPath) -> str:
"""Accept pathlib.Path or PanPath."""
if isinstance(path, Path):
path = PanPath(str(path))
return path.a_read_text()
2. Handle Optional Dependencies¶
from panpath import PanPath
def get_data(uri: str) -> str:
try:
path = PanPath(uri)
return path.read_text()
except ImportError as e:
raise RuntimeError(
f"Cloud backend not installed: {e}\n"
f"Install with: pip install panpath[s3]"
)
3. Use Context Managers¶
from panpath import PanPath
# Good: File is automatically closed
with PanPath("s3://bucket/file.txt").open("r") as f:
content = f.read()
# Also good for async
from panpath import PanPath
async def read_file():
async with PanPath("s3://bucket/file.txt").a_open("r") as f:
content = await f.a_read()
4. Prefer Bulk Operations¶
from panpath import PanPath
# Less efficient: Individual copies
src_dir = PanPath("s3://bucket/data/")
dst_dir = PanPath("gs://other/data/")
for item in src_dir.iterdir():
item.copy(dst_dir / item.name)
# More efficient: Use copytree
src_dir.copytree(dst_dir)
Next Steps¶
- Local Paths Guide - Learn about local filesystem operations
- Cloud Storage Guide - Cloud-specific features
- Async Operations Guide - Deep dive into async
- API Reference - Complete API documentation