Outcome focus: Specified a repo baseline with src layout, uv locking, Ruff, typed checks, pytest, dependency groups, and CI gates so Python projects begin with executable architecture.
The first feature is usually too early.
By the time the first endpoint, notebook, CLI command, or agent workflow lands, the project already has gravity. Imports work a certain way. Tests discover files a certain way. The package manager has a shape. The CI job either proves the installed artifact or it proves whatever happened to be on sys.path. Dependency groups are either clean or already leaking into runtime metadata.
I used to tolerate a loose first week and "clean it up later."
Later rarely came cleanly.
The flat-layout failure is the one that changed my mind. Tests passed locally because import billing resolved to the source directory sitting at the repo root. The built wheel was missing a data file and one subpackage. The test suite never noticed because it had not tested the installed package. The repo looked small enough to be casual. The packaging boundary was already broken.
The Python Packaging User Guide explains why the src/ layout helps: Python includes the current working directory on the import path, and a flat layout can accidentally import the in-development tree instead of the installed artifact. That is not theoretical. It is the kind of quiet mistake that only appears after the package leaves your laptop.
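One way to catch this shadowing early is a guard that reports when a package resolves straight from the repo root. The following is a minimal sketch, not something taken from the Packaging User Guide: the helper name `imported_from_flat_root` and the example paths are hypothetical.

```python
from pathlib import Path


def imported_from_flat_root(module_file: str, repo_root: str) -> bool:
    """Return True when the package directory sits directly under the repo root.

    That placement is what lets `import billing` pick up the in-development
    tree simply because the working directory is on the import path.
    """
    package_dir = Path(module_file).resolve().parent
    return package_dir.parent == Path(repo_root).resolve()


# Hypothetical paths, for illustration only.
print(imported_from_flat_root("/repo/billing/__init__.py", "/repo"))      # flat layout
print(imported_from_flat_root("/repo/src/billing/__init__.py", "/repo"))  # src layout
```

In a real test suite, the first argument would be the package's own `__file__`, so the check fails as soon as the import stops going through `src/` or the installed location.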
The Skeleton
For a serious Python application or package in 2026, I want this skeleton before feature work starts:
```text
project/
├── pyproject.toml
├── README.md
├── LICENSE
├── uv.lock
├── .python-version
├── .pre-commit-config.yaml
├── src/
│   └── app/
│       ├── __init__.py
│       ├── __main__.py
│       ├── py.typed
│       ├── api/
│       ├── domain/
│       ├── services/
│       ├── repositories/
│       ├── infrastructure/
│       ├── config/
│       └── main.py
├── tests/
│   ├── unit/
│   ├── integration/
│   └── contract/
├── docs/
├── scripts/
└── migrations/
```

Libraries are smaller, but the same principle holds:
```text
library/
├── pyproject.toml
├── README.md
├── uv.lock
├── src/
│   └── mylib/
│       ├── __init__.py
│       ├── py.typed
│       ├── core.py
│       ├── models.py
│       └── exceptions.py
├── tests/
└── docs/
```

The specific folder names should follow the domain. The anti-pattern is defaulting to utils/, helpers/, shared/, and common/ because nobody wants to name the boundary.
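If the team wants that anti-pattern to be visible rather than a matter of taste, a small check can list the catch-all directories. This is only a sketch: the function name `find_catch_all_dirs` is hypothetical, and the name list is just the four folders mentioned above.

```python
from pathlib import PurePosixPath

# The four names called out above as boundary-avoiding defaults.
CATCH_ALL_NAMES = frozenset({"utils", "helpers", "shared", "common"})


def find_catch_all_dirs(paths: list[str]) -> list[str]:
    """Return the directories whose final path component is a catch-all name."""
    flagged = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.name in CATCH_ALL_NAMES:
            flagged.append(str(path))
    return flagged


# Hypothetical package directories, for illustration only.
print(find_catch_all_dirs(["src/app/domain", "src/app/utils", "src/app/services"]))
```

Whether a flagged directory should block a merge or only start a naming conversation is a team decision; the sketch only reports.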
uv As the Project Tool
uv has become the default I would choose for new Python work because it compresses a previously scattered workflow into one tool: Python installation, virtual environments, dependency resolution, lockfiles, script execution, tool execution, and publishing. The docs describe it as a Rust-based package and project manager with a universal lockfile and Cargo-style workspaces.
The important habit is not "uv is fast," although it is. The habit is using one project entry point for environment behavior.
```bash
uv python pin 3.14
uv init --package
uv add fastapi pydantic-settings
uv add --dev ruff pytest pyright hypothesis coverage
uv lock
uv run pytest
```

I do not want engineers hand-installing into .venv during normal work. The uv project docs recommend using uv run and uv add instead of manually modifying the managed environment. That convention gives CI and local development the same dependency story.
Dependency Groups Are Not Extras
Dependency groups are one of the most useful packaging clarifications in the last few years. PEP 735 standardizes a [dependency-groups] table for local development environments such as tests, linting, docs, and tooling. These groups are not included in built package metadata.
That is exactly what we want.
Extras are for package users. Dependency groups are for contributors and CI.
```toml
[project]
name = "billing-platform"
version = "0.1.0"
requires-python = ">=3.14"
dependencies = [
    "fastapi>=0.116",
    "pydantic-settings>=2.10",
]

[dependency-groups]
test = [
    "pytest>=8",
    "coverage[toml]>=7",
    "hypothesis>=6",
]
lint = [
    "ruff>=0.13",
]
typecheck = [
    "pyright>=1.1",
]
dev = [
    { include-group = "test" },
    { include-group = "lint" },
    { include-group = "typecheck" },
]
```

The tradeoff is that dependency groups require the team to learn one more packaging distinction. The benefit is that runtime metadata stops advertising contributor tooling as installable product features.
Ruff Should Be Boring
Ruff is the linting and formatting default I want unless the repo has strong historical reasons to keep a different stack. It replaces a large amount of old toolchain sprawl: pyflakes-style errors, pycodestyle-style rules, import sorting, pyupgrade, many flake8 plugins, and Black-compatible formatting through the Ruff formatter.
The config should be strict enough to keep entropy down and small enough that people can remember what it means.
```toml
[tool.ruff]
target-version = "py314"
line-length = 100

[tool.ruff.lint]
select = [
    "E",
    "F",
    "I",
    "B",
    "UP",
    "SIM",
]
ignore = []

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
```

If the rule set becomes a theological document, the tool is no longer doing boring work. Start with correctness, imports, modernization, bugbear-style bug patterns, and simplification. Add security and project-specific rules when the team is ready to own the noise.
The CI Contract
The CI job should prove four things:
- The code formats and lints.
- The type checker can understand the project.
- Tests pass against the installed package shape.
- The package, container, or deployment artifact can be built.
```yaml
name: ci

on:
  pull_request:
  push:
    branches: ["main"]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Install Python
        run: uv python install

      - name: Sync dependencies
        run: uv sync --frozen --all-groups

      - name: Lint
        run: uv run ruff check .

      - name: Format
        run: uv run ruff format --check .

      - name: Typecheck
        run: uv run pyright

      - name: Test
        run: uv run pytest
```

If this is a library, add a build job and publish through Trusted Publishers instead of long-lived PyPI tokens.
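A build job can also assert that the wheel contains what the package needs, which is exactly the failure from the flat-layout story: a missing data file and a missing subpackage. A minimal sketch using only the standard library; the helper name `missing_from_wheel` and the file names in the usage note are hypothetical.

```python
import zipfile


def missing_from_wheel(wheel_path: str, expected: list[str]) -> list[str]:
    """Return the expected archive paths that are absent from the wheel.

    A wheel is a zip archive, so the standard library can list its contents
    without installing anything.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        present = set(wheel.namelist())
    return [path for path in expected if path not in present]
```

A build step could call it as `missing_from_wheel("dist/billing-0.1.0-py3-none-any.whl", ["billing/rates.json", "billing/tax/__init__.py"])` and fail the job if the returned list is not empty. The wheel name and both paths here are made up for illustration.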
Tests Are Architecture Too
pytest remains the testing default because plain functions, fixtures, parametrization, and rich assertions scale well from small units to integration tests. Hypothesis belongs where invariants matter more than examples: parsers, normalizers, pricing logic, feature transforms, deduplication, and state transitions.
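To show what "invariants rather than examples" means for deduplication, the sketch below checks properties over many generated inputs. It deliberately uses the standard library's `random` module as a stand-in so the example runs without extra dependencies; Hypothesis would generate the inputs through its own strategies and shrink failing cases, which this stand-in does not do. The function names are hypothetical.

```python
import random


def dedupe_preserving_order(items: list[str]) -> list[str]:
    """Drop repeated items, keeping the first occurrence of each."""
    seen: set[str] = set()
    result: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def check_dedupe_invariants(trials: int = 200, seed: int = 0) -> None:
    """Assert properties that must hold for every input, not just chosen examples."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.choice("abcde") for _ in range(rng.randint(0, 12))]
        result = dedupe_preserving_order(items)
        # No duplicates survive.
        assert len(result) == len(set(result))
        # Nothing is lost and nothing is invented.
        assert set(result) == set(items)
        # Items appear in first-occurrence order.
        assert result == sorted(set(items), key=items.index)
        # Running it twice changes nothing (idempotence).
        assert dedupe_preserving_order(result) == result


check_dedupe_invariants()
```

The same four properties carry over to a Hypothesis test nearly unchanged; only the input generation is replaced.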
The folder split should describe risk:
| Test type | Purpose | Example |
|---|---|---|
| Unit | Pure logic and small services | pricing, parsing, domain rules |
| Integration | Real adapters against local dependencies | database repository, queue client |
| Contract | Cross-system payload shape | partner API response, event schema |
| E2E | One critical path through the app | signup, scoring, invoice generation |
Coverage targets are useful until they become theater. I would rather have 82 percent coverage that exercises the boundaries than 96 percent coverage built from mock-heavy tests that prove implementation details.
pre-commit Is a Local Brake, Not CI
pre-commit is useful because it makes the cheap checks happen before review. It is not a replacement for CI. Developers can skip hooks. CI cannot be skipped by accident.
```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.13.0
    hooks:
      - id: ruff-check
        args: ["--fix"]
      - id: ruff-format
```

Use local hooks for fast hygiene. Use CI for release truth.
The Tradeoff
The tradeoff is that this skeleton feels heavy for a prototype.
That is why I separate throwaway scripts from projects. A one-file analysis can stay one file. A public package, production API, AI workflow, data pipeline, or backend system should not pretend it is still a scratchpad after it gains users, schedules, secrets, and incidents.
The skeleton buys options:
- a package can be installed and tested as a package;
- dependencies can be reproduced;
- linting and formatting stay one command;
- type checking can become stricter over time;
- tests have an obvious home;
- CI knows what "ready" means.
The first feature should enter a repo that already knows how to say no.