// Architecture Field Handbook

Jupyter Notebook

// Project Jupyter · v7.x · JupyterLab 4.x

// "The interactive computing interface for exploration, analysis, and communication."

The complete operational guide to Jupyter Notebooks — from first cell to production deployment. Covers architecture, kernel management, magic commands, secret handling, .ipynb vs .py decision framework, and scaling to production via Papermill, NBConvert, and JupyterHub.

Interactive Computing Data Science Machine Learning JupyterLab Papermill Production

What Is Jupyter

// ARCHITECTURE & CORE CONCEPTS

Jupyter Notebook is an open-source, browser-based interactive computing environment. It combines live code, rich text (Markdown), equations (LaTeX), visualizations, and narrative prose in a single shareable document called a notebook — stored as a .ipynb (IPython Notebook) JSON file.

The notebook interface first shipped as part of IPython in 2011; Project Jupyter was spun out of IPython in 2014 and now supports over 40 programming languages through interchangeable kernels. The name Jupyter is a tribute to three core scientific computing languages: Julia, Python, and R.

Architecture

Browser

Frontend UI (HTML/JS)

→

Server

Jupyter Server (Python)

→

Protocol

ZeroMQ Messaging

→

Kernel

IPython / IRkernel / etc.

→

Execution

Code runs here
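The arrows in the diagram above carry messages defined by the Jupyter messaging protocol. As a rough sketch, this is the dictionary shape of an `execute_request` message before it is signed and sent over ZeroMQ. The user name, the helper name `make_execute_request`, and the protocol version string are assumptions for illustration; real clients such as `jupyter_client` build and sign these for you.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session: str) -> dict:
    """Build the dict form of an execute_request message (unsigned, never sent)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo",            # hypothetical user name
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",              # assumed protocol version
        },
        "parent_header": {},               # empty: this message starts a conversation
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


msg = make_execute_request("1 + 1", session=uuid.uuid4().hex)
print(json.dumps(msg["content"], indent=2))
```

The kernel answers with its own messages (status, results, errors) whose `parent_header` points back at this request, which is how the frontend knows which cell an output belongs to.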

Cells

The fundamental unit. Three types: Code (executable), Markdown (formatted text/LaTeX), and Raw (unprocessed). Cells run independently and share a kernel state.

Kernel

A separate process that executes code. The kernel maintains all variable state between cell executions. Kernels can be restarted, interrupted, or swapped independently of the notebook UI.
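A toy model can make the "kernel holds the state" idea concrete. The sketch below is not how a real kernel is implemented; it only assumes that each cell is executed against one shared namespace, and that a restart throws that namespace away.

```python
# One shared namespace plays the role of the kernel's memory.
namespace = {}
cells = ["x = 10", "y = x * 2", "total = x + y"]

for source in cells:
    exec(source, namespace)        # each "cell" sees what earlier cells defined

total_before_restart = namespace["total"]
print("total:", total_before_restart)

# "Restart the kernel": the namespace is gone, the cell text is not.
namespace = {}
restart_error = None
try:
    exec("total = x + y", namespace)   # run the last cell alone
except NameError as err:
    restart_error = err

print("after restart:", restart_error)
```

This is why a notebook that works in a long-lived session can fail after a restart: the cell text survives, the variables do not.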

.ipynb Format

Notebooks are JSON files. They store cell source, metadata, outputs (including images as base64), and kernel info. Being plain text makes them versionable, but embedded outputs and execution counts make diffs noisy. Run nbstripout as a pre-commit hook to clear outputs before they reach git.
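To see why outputs dominate diffs, here is a minimal hand-written notebook dictionary and a small function that clears outputs in the spirit of nbstripout. It is a sketch using only the standard library; the real tool handles more cases (metadata, widgets state), and the cell ids and contents are made up.

```python
import copy
import json

notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "cells": [
        {"cell_type": "markdown", "id": "intro", "metadata": {}, "source": "# Demo"},
        {
            "cell_type": "code",
            "id": "calc",
            "metadata": {},
            "execution_count": 1,
            "source": "1 + 1",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": "2"},
                }
            ],
        },
    ],
}


def strip_outputs(nb: dict) -> dict:
    """Return a copy of the notebook with outputs and execution counts cleared."""
    clean = copy.deepcopy(nb)
    for cell in clean["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


stripped = strip_outputs(notebook)
print(len(json.dumps(notebook)), "->", len(json.dumps(stripped)), "bytes")
```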

ℹ

Classic vs JupyterLab vs JupyterLite: Classic Notebook (v6 and earlier) is the original single-document interface; Notebook 7 keeps that layout but is rebuilt on JupyterLab components. JupyterLab (v4+) is the modern IDE-like interface with multi-panel layout, file browser, terminal, and extension ecosystem, and is the better default for new work. JupyterLite runs entirely in the browser via WASM with no server required, which makes it ideal for education and demos.

When To Use Jupyter

// USE CASES & ANTI-PATTERNS

Jupyter excels at exploratory, narrative, and iterative workflows. It is not a general-purpose application runtime. Choosing Jupyter for the wrong task creates maintenance debt and security risk.

Use Jupyter When (Good Fit)

Exploratory data analysis (EDA) — rapid iteration on unknown datasets
Data visualization prototyping — matplotlib, plotly, seaborn, altair
Teaching and presenting — inline outputs + Markdown narrative
Statistical analysis and hypothesis testing
Model training experimentation — comparing hyperparameters interactively
Documenting research workflows with reproducible code
One-off data transformations and ad-hoc SQL query analysis
Generating reports that mix prose, code, tables, and charts

Don't Use Jupyter When (Bad Fit)

Building production REST APIs or microservices (use FastAPI, Flask)
Writing shared library code intended for import by other modules
Long-running background jobs or daemons
CLI tools or scripts with argument parsing
Code that needs proper unit testing as a primary artifact
Multi-developer collaborative coding (merge conflicts on .ipynb JSON)
Anything requiring strict execution-order guarantees (cells can be run out of order, so kernel state may not match the code on screen)

Domain Use Case Map

Domain	Jupyter Fits?	Typical Notebook Role
Data Engineering	PARTIAL	Pipeline prototyping, data quality checks — not production ETL
Data Science / ML	YES	EDA, feature engineering, model experimentation, evaluation reports
MLOps	PARTIAL	Parameterized training notebooks via Papermill; not inference serving
Scientific Research	YES	Reproducible analysis, publication figures, computational supplements
Business Analytics	YES	Ad-hoc analysis, executive reports, dashboard prototypes
Web Development	NO	Not applicable — use proper frameworks
DevOps Automation	NO	Use .py scripts, Ansible, or purpose-built CLI tools
Education	YES	Interactive tutorials, exercises with inline feedback via widgets

Setup & Installation

// LOCAL, VENV, CONDA, DOCKER

Always install Jupyter inside a virtual environment — never into the system Python. This prevents dependency conflicts between projects and makes environments reproducible.

bash — Installation Patterns

# ── Option 1: pip + venv (recommended for most projects)
python -m venv .venv
source .venv/bin/activate                     # Windows: .venv\Scripts\activate
pip install jupyterlab                        # installs JupyterLab (add "notebook" for the Notebook 7 UI)
jupyter lab                                   # start server → opens browser

# ── Option 2: Conda (data science / ML — handles non-Python deps)
conda create -n myproject python=3.11
conda activate myproject
conda install -c conda-forge jupyterlab pandas numpy scikit-learn matplotlib
jupyter lab

# ── Option 3: uv (fast, modern — recommended for new projects)
uv venv
source .venv/bin/activate
uv pip install jupyterlab
jupyter lab

# ── Option 4: Docker (isolated, reproducible, no local Python needed)
docker run -p 8888:8888 \
  -v "$(pwd)":/home/jovyan/work \
  jupyter/datascience-notebook                # newer builds are published as quay.io/jupyter/datascience-notebook

# ── Verify installation
jupyter --version
jupyter kernelspec list               # shows installed kernels
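A quick way to confirm that the interpreter behind a notebook really is the project environment is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; the helper name `in_virtual_env` is made up, and the check applies to venv-style environments only (a conda environment is a full installation, so the two prefixes are equal there).

```python
import sys


def in_virtual_env() -> bool:
    """True when running inside a venv/virtualenv-style environment."""
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside venv:", in_virtual_env())
```

Running this in the first cell of a notebook shows which Python the selected kernel actually uses, which is the usual source of "package installed but import fails" confusion.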

Kernel per Project (Best Practice)

Register your project venv as a named kernel so you can switch between projects without restarting the server. Each notebook then explicitly selects its environment.

# Register current venv as a named kernel
pip install ipykernel
python -m ipykernel install --user \
  --name myproject \
  --display-name "Python (myproject)"

Useful Companion Packages

nbstripout — strip outputs before git commit
nbconvert — export to HTML, PDF, script
nbformat — programmatic notebook manipulation
jupytext — sync .ipynb ↔ .py/.md files
papermill — parameterize and execute notebooks
nbval — validate notebook outputs in CI

How To Use

// KEYBOARD SHORTCUTS, CELL MODES, OUTPUTS

Jupyter has two modes: Command Mode (blue border — navigate cells) and Edit Mode (cursor active — type code). Press Esc for command mode, Enter or click to enter edit mode.

Essential Keyboard Shortcuts

Shortcut	Mode	Action
`Shift+Enter`	Both	Run cell, move to next
`Ctrl+Enter`	Both	Run cell, stay on current
`Alt+Enter`	Both	Run cell, insert new below
`A`	Command	Insert cell above
`B`	Command	Insert cell below
`DD`	Command	Delete cell
`M`	Command	Convert to Markdown
`Y`	Command	Convert to Code
`Z`	Command	Undo cell deletion
`Ctrl+S`	Both	Save notebook
`0 0`	Command	Restart kernel
`I I`	Command	Interrupt kernel
`Tab`	Edit	Autocomplete
`Shift+Tab`	Edit	Tooltip / docstring

Output Types

Python — Output Examples

# ── Last expression = displayed automatically (no print needed)
df.head()             # renders as an HTML table (static, not interactive)

# ── Rich display protocol — any object with _repr_html_() renders richly
from IPython.display import display, HTML, Image, Markdown, IFrame

display(HTML("<b style='color:gold'>Hello</b>"))
display(Markdown("## Section Header"))
display(Image("chart.png"))

# ── Suppress output with semicolon
plt.plot([1,2,3]);   # trailing ; suppresses the matplotlib object repr

# ── Multiple outputs in one cell (display was imported above)
display(df.describe())
display(df.dtypes)
# Both tables print — display() is explicit, last-expr is implicit

# ── print vs display
print(df)             # plain text output
display(df)           # rich HTML table output
df                    # same as display() if last expression
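The rich display protocol mentioned above is easy to opt into for your own classes: define `_repr_html_()` and Jupyter uses it when the object is the last expression in a cell. The class below is a hypothetical example (the name `MetricsTable` and its fields are made up) that runs without Jupyter, so the HTML can be inspected directly.

```python
from html import escape


class MetricsTable:
    """Small container that renders as an HTML table in Jupyter."""

    def __init__(self, metrics):
        self.metrics = dict(metrics)

    def _repr_html_(self):
        # Jupyter calls this method and renders the returned HTML string.
        rows = "".join(
            f"<tr><td>{escape(str(name))}</td><td>{escape(str(value))}</td></tr>"
            for name, value in self.metrics.items()
        )
        return f"<table><tr><th>metric</th><th>value</th></tr>{rows}</table>"

    def __repr__(self):
        # Plain-text fallback used by print() and by non-HTML frontends.
        return f"MetricsTable({self.metrics!r})"


table = MetricsTable({"accuracy": 0.93, "f1": 0.88})
html = table._repr_html_()
print(html)
```

In a notebook, ending a cell with `table` shows the rendered table, while `print(table)` still gives the plain-text form.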

Magic Commands

// LINE MAGIC, CELL MAGIC, AUTOMAGIC

Magic commands are special IPython directives prefixed with % (line magic) or %% (cell magic). They provide superpowers not available in plain Python — timing, profiling, shell access, multi-language execution, and more.

IPython — Magic Command Reference

## ── TIMING & PROFILING ──────────────────────────────────────
%time   df.groupby('user_id').sum()      # single run timing
%timeit df.groupby('user_id').sum()      # repeated runs with stats (default 7 runs; loop count chosen automatically)
%timeit -n 1000 -r 5 some_function()      # custom loops/repeats

%%timeit                                   # time entire cell
result = []
for i in range(10000):
    result.append(i**2)

%prun my_function()                        # cProfile: per-function call counts and times
%lprun -f my_function my_function()        # line-by-line (pip install line_profiler, then %load_ext line_profiler)
%memit my_function()                       # memory usage (pip install memory_profiler, then %load_ext memory_profiler)

## ── SHELL & FILESYSTEM ──────────────────────────────────────
%ls                                        # list directory (same as !ls)
%cd /path/to/dir                           # change directory (persistent!)
%pwd                                       # print working directory
!pip install pandas                        # ! prefix = shell command
files = !ls *.csv                          # capture shell output as list

## ── CODE INSPECTION ─────────────────────────────────────────
%who                                       # list all variables in namespace
%whos                                      # list variables with type and value
%reset                                     # clear namespace (nuclear option)
%history                                   # show input history
%history -n -l 20                          # last 20 commands with line numbers
obj?                                        # inspect object (docstring + type)
obj??                                       # inspect object (full source code)

## ── CELL MAGIC (entire cell, not just line) ─────────────────
%%bash                                     # run cell as bash script
echo "Hello from bash"
ls -la

%%html                                     # render cell as HTML
<marquee>Hello</marquee>

%%javascript                               # run cell as JS in browser
console.log("Hello from JS")

%%writefile mymodule.py                   # write cell contents to file
def hello(): return "Hello"

%%capture output                          # capture stdout/stderr/display
print("this won't show but is in output.stdout")

## ── DISPLAY & MATPLOTLIB ────────────────────────────────────
%matplotlib inline                        # render plots inline (static)
%matplotlib widget                        # interactive plots (ipympl)
%config InlineBackend.figure_format='retina'  # hi-DPI plots

## ── AUTO FEATURES ───────────────────────────────────────────
%load_ext autoreload                      # load the extension first
%autoreload 2                             # then auto-reload imported modules on change
%automagic on                             # use magic without % prefix (risky)

## ── ENVIRONMENT & MISC ──────────────────────────────────────
%env                                      # show all env vars
%env MY_VAR=hello                         # set env var for session
%lsmagic                                  # list ALL available magic commands
%magic                                    # full magic system docs
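Outside IPython the timing magics are not available, but the standard library `timeit` module does the same kind of measurement. The sketch below mirrors `%timeit -n 1000 -r 7` for a hypothetical function `squares`; the summary format only approximates what IPython prints.

```python
import statistics
import timeit


def squares(n=200):
    """Hypothetical workload to time."""
    return [i ** 2 for i in range(n)]


LOOPS, RUNS = 1000, 7

# Each entry is the total time for LOOPS calls; there are RUNS entries.
totals = timeit.repeat(squares, number=LOOPS, repeat=RUNS)
per_loop = [total / LOOPS for total in totals]

mean_us = statistics.mean(per_loop) * 1e6
std_us = statistics.stdev(per_loop) * 1e6
print(f"{mean_us:.1f} µs ± {std_us:.1f} µs per loop ({RUNS} runs, {LOOPS} loops each)")
```

This form is useful when a benchmark has graduated from a notebook into a script or a CI job.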

▸

%autoreload trick: Add %load_ext autoreload + %autoreload 2 at the top of any notebook that imports local modules. This automatically reloads modified modules when you re-execute cells — eliminates the "restart kernel to see changes" cycle during development.
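What autoreload automates is, roughly, a call to `importlib.reload` whenever a module's source has changed. The sketch below does that by hand with a throwaway module written to a temporary directory; the module name `demo_helpers` is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip .pyc caching so quick successive edits are never masked by stale bytecode.
sys.dont_write_bytecode = True

workdir = Path(tempfile.mkdtemp())
module_path = workdir / "demo_helpers.py"          # hypothetical local module
module_path.write_text('def greet():\n    return "v1"\n')

sys.path.insert(0, str(workdir))
demo_helpers = importlib.import_module("demo_helpers")
before = demo_helpers.greet()

# Simulate editing the file in an editor while the kernel keeps running.
module_path.write_text('def greet():\n    return "version 2"\n')
importlib.invalidate_caches()
importlib.reload(demo_helpers)                      # the step autoreload performs for you
after = demo_helpers.greet()

print(before, "->", after)
```

Note that names imported with `from module import func` before a manual reload still point at the old function object; autoreload tries to patch those as well, which is part of why it is more convenient than reloading by hand.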

Kernels

// MULTI-LANGUAGE EXECUTION

A kernel is a language runtime that Jupyter communicates with over ZeroMQ. Any language with a Jupyter kernel can be used as a first-class citizen in notebooks. The kernel maintains state between cell executions: variables, imports, and functions persist until the kernel is restarted.

IPython

Python 3.x

Default and most popular. Install: pip install ipykernel. Register with python -m ipykernel install --user --name myenv

IRkernel

R 4.x

Full R environment. Install from R console: install.packages('IRkernel'); IRkernel::installspec()

IJulia

Julia 1.x

Julia REPL in Jupyter. Install: using Pkg; Pkg.add("IJulia") from Julia REPL.

xeus-cling

C++ 17

Interactive C++ via cling interpreter. Install via conda: conda install -c conda-forge xeus-cling

ijavascript

Node.js

Node.js kernel. Install: npm install -g ijavascript; ijsinstall

SoS

Polyglot

Script of Scripts — mix multiple languages in one notebook with variable passing between them. Install: pip install sos-notebook

bash — Kernel Management

# List installed kernels
jupyter kernelspec list

# Remove a kernel
jupyter kernelspec remove myoldenv

# View kernel JSON spec
cat ~/.local/share/jupyter/kernels/myproject/kernel.json

# kernel.json structure (JSON allows no comments; "env" injects env vars into this kernel)
{
  "argv": ["/path/to/python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python (myproject)",
  "language": "python",
  "env": {
    "PYTHONPATH": "/my/extra/path"
  }
}
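The same kernel spec can be generated from Python, which is handy when provisioning environments in a script. This is a sketch that writes to a temporary directory rather than the real Jupyter data directory; `write_kernelspec` is a made-up helper, and in practice `python -m ipykernel install` does this job.

```python
import json
import sys
import tempfile
from pathlib import Path


def write_kernelspec(kernels_dir, name, display_name, env=None):
    """Write <kernels_dir>/<name>/kernel.json for the current interpreter."""
    spec = {
        # {connection_file} is a placeholder that Jupyter fills in at launch time.
        "argv": [sys.executable, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
        "env": env or {},
    }
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    path = kernel_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


spec_path = write_kernelspec(
    tempfile.mkdtemp(),                 # stand-in for the real kernels directory
    name="myproject",
    display_name="Python (myproject)",
    env={"PYTHONPATH": "/my/extra/path"},
)
loaded = json.loads(spec_path.read_text())
print(spec_path)
print(loaded["argv"])
```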

.ipynb vs .py

// THE DECISION FRAMEWORK

This is one of the most important architectural decisions in any data-heavy project. Using the wrong format creates technical debt, breaks automation, or makes collaboration harder. The answer depends on who runs the code, how often, and why.

📓 Use .ipynb (Notebook)

Exploratory analysis where you need to see intermediate outputs
Communicating results to non-technical stakeholders
Tutorials, documentation with executable examples
One-time or infrequent analyses on a specific dataset
Prototyping a model before productionizing
Research with narrative reasoning between code blocks
Interactive visualization with widgets
Parameterized reports via Papermill

🐍 Use .py (Script / Module)

Code that will be imported by other modules
Production pipelines run automatically on a schedule
Code requiring unit tests and test coverage metrics
CLI tools with argument parsing (argparse, Click, Typer)
Shared utility functions used across multiple notebooks
Application code — web servers, data validation, business logic
Code in version control with meaningful diffs
Anything run in a CI/CD pipeline unattended

The Graduation Path

Most successful projects use both. The standard pattern is to prototype in a notebook, then graduate mature, reusable code to .py modules that the notebook imports.

Python — Notebook + Module Pattern

# ── Project structure: notebooks use modules, not the reverse
project/
├── notebooks/
│   ├── 01_eda.ipynb                # exploration — messy, disposable
│   ├── 02_feature_engineering.ipynb
│   └── 03_model_evaluation.ipynb   # final report — clean, presentable
├── src/
│   ├── __init__.py
│   ├── features.py                 # graduated from notebook → tested module
│   ├── models.py
│   └── utils.py
├── tests/
│   └── test_features.py            # .py modules are testable; .ipynb are not (easily)
└── data/

# ── In notebook: import your own modules
%load_ext autoreload
%autoreload 2

import sys; sys.path.insert(0, '../')
from src.features import build_feature_matrix
from src.utils import load_config

# Notebook handles: exploration + visualization + narrative
# .py handles: reusable logic + testing + production execution
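As an illustration of what "graduating" code looks like, here is a sketch of a function that could live in `src/features.py` together with the test that would sit in `tests/test_features.py`. The implementation is hypothetical (the original does not show `build_feature_matrix`), and it uses plain dictionaries instead of pandas so it runs anywhere.

```python
from datetime import date


def build_feature_matrix(rows, cutoff, dormant_after_days=90):
    """Derive inactivity features from rows with 'user_id' and 'last_login'."""
    features = []
    for row in rows:
        days_inactive = (cutoff - row["last_login"]).days
        features.append(
            {
                "user_id": row["user_id"],
                "days_inactive": days_inactive,
                "is_dormant": days_inactive > dormant_after_days,
            }
        )
    return features


def test_build_feature_matrix_flags_dormant_users():
    rows = [
        {"user_id": 1, "last_login": date(2024, 10, 25)},
        {"user_id": 2, "last_login": date(2024, 6, 1)},
    ]
    result = build_feature_matrix(rows, cutoff=date(2024, 11, 1))
    assert result[0] == {"user_id": 1, "days_inactive": 7, "is_dormant": False}
    assert result[1]["is_dormant"] is True


# pytest would discover the test by name; calling it directly also works.
test_build_feature_matrix_flags_dormant_users()
```

Once the logic lives in a module like this, the notebook shrinks to an import, a call, and the plots that explain the result.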

Converting Between Formats

bash — Format Conversion

# Convert .ipynb → .py script (strips outputs)
jupyter nbconvert --to script analysis.ipynb
# → outputs analysis.py with # In[N]: cell markers

# Jupytext: bidirectional sync — edit .py, see changes in .ipynb
pip install jupytext
jupytext --to py:percent analysis.ipynb          # convert with "# %%" cell markers
jupytext --to notebook analysis.py               # back to notebook
jupytext --sync analysis.ipynb                   # sync paired files

# Pair a notebook for dual-format version control
jupytext --set-formats ipynb,py:percent analysis.ipynb
# Now both files stay in sync — commit .py for clean diffs, .ipynb optionally

▸

Jupytext + nbstripout workflow: Use Jupytext to maintain a paired .py version of every notebook (commit the .py, not the .ipynb). Add nbstripout as a pre-commit hook to strip outputs from any .ipynb files that do get committed. This gives you readable git diffs and avoids 10MB notebook blobs in your repo history.
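The reason the paired `.py` file diffs cleanly is that the percent format is just a script with `# %%` comment lines marking cell boundaries. The sketch below splits such a script into cells; it is a deliberately simplified reader (Jupytext itself also handles the header, metadata, and uncommenting Markdown), and the sample script is made up.

```python
PERCENT_SCRIPT = '''\
# %% [markdown]
# # Churn analysis

# %%
x = 1

# %%
print(x + 1)
'''


def split_percent_cells(text):
    """Split a percent-format script into a list of {cell_type, source} dicts."""
    cells = []
    for line in text.splitlines():
        if line.startswith("# %%"):
            kind = "markdown" if "[markdown]" in line else "code"
            cells.append({"cell_type": kind, "lines": []})
        elif cells:
            cells[-1]["lines"].append(line)
    for cell in cells:
        cell["source"] = "\n".join(cell.pop("lines")).strip()
    return cells


cells = split_percent_cells(PERCENT_SCRIPT)
for cell in cells:
    print(cell["cell_type"], "|", cell["source"])
```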

Notebook Structure

// OPINIONATED TEMPLATE FOR PRODUCTION-GRADE NOTEBOOKS

A well-structured notebook is reproducible from top-to-bottom (kernel restart → run all with no errors), self-documenting, and separates configuration from logic. These are the conventions that make notebooks maintainable months later.

Python — Canonical Notebook Template

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 1 — Title (Markdown)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
"""
# Customer Churn Analysis — Q4 2024

**Purpose**: Identify leading indicators of churn in the enterprise segment.
**Author**: Data Science Team
**Last Updated**: 2024-12-01
**Data**: `data/crm_export_2024-12.csv`
"""

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 2 — Parameters (top of notebook = easy to find)
#           Papermill injects values into this cell tag: "parameters"
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DATA_PATH   = "data/crm_export_2024-12.csv"
OUTPUT_DIR  = "outputs/"
CUTOFF_DATE = "2024-11-01"
CHURN_DAYS  = 90

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 3 — Imports (all at top, organized: stdlib → 3rd party → local)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import os, sys, warnings
from pathlib import Path
from datetime import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sys.path.insert(0, "../")
from src.utils import load_config

warnings.filterwarnings('ignore')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 4 — Data Loading
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
df = pd.read_csv(DATA_PATH, parse_dates=['created_at', 'last_login'])
print(f"Loaded {len(df):,} rows, {df.shape[1]} columns")
df.head()

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Sections: 1. EDA  2. Feature Engineering  3. Modeling  4. Results
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
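The "kernel restart → run all" property described at the top of this section leaves a trace in the saved file: code cells end up numbered 1, 2, 3, … in document order. The sketch below checks that on a notebook dictionary (as loaded with `json.load`); the function name and the two sample notebooks are made up, and the check is a heuristic rather than proof of reproducibility.

```python
def execution_order_problems(nb):
    """Return human-readable problems with the execution counts of a notebook."""
    counts = [
        cell.get("execution_count")
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
    ]
    problems = []
    if any(count is None for count in counts):
        problems.append("some code cells were never executed")
    executed = [count for count in counts if count is not None]
    if executed != list(range(1, len(executed) + 1)):
        problems.append(f"execution counts {executed} are not 1..{len(executed)} in order")
    return problems


clean = {
    "cells": [
        {"cell_type": "code", "execution_count": 1},
        {"cell_type": "markdown"},
        {"cell_type": "code", "execution_count": 2},
    ]
}
messy = {
    "cells": [
        {"cell_type": "code", "execution_count": 5},
        {"cell_type": "code", "execution_count": 3},
    ]
}

print(execution_order_problems(clean))
print(execution_order_problems(messy))
```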

Advanced Features

// PAPERMILL, NBCONVERT, PARALLEL EXECUTION

Papermill — Parameterized Notebook Execution

Papermill executes notebooks programmatically with injected parameters. It's the backbone of notebook-based ML pipelines — run the same notebook against different datasets, date ranges, or model configs.

Python + bash — Papermill

# ── Setup: tag the parameters cell in JupyterLab
# Select the cell → Property Inspector (gear icon, right sidebar) → Cell Tags → add "parameters"
# (classic Notebook: View → Cell Toolbar → Tags)

# ── CLI execution
pip install papermill
papermill input.ipynb output_2024-12.ipynb \
    -p DATA_PATH "data/dec.csv" \
    -p CUTOFF_DATE "2024-12-01" \
    -p CHURN_DAYS 60

# ── Python API — run programmatically in a pipeline
import papermill as pm

month = "2024-12"                   # example values; normally supplied by the scheduler
cutoff_str = "2024-12-01"

pm.execute_notebook(
    input_path="templates/churn_analysis.ipynb",
    output_path=f"outputs/churn_{month}.ipynb",
    parameters={
        "DATA_PATH": f"data/crm_{month}.csv",
        "CUTOFF_DATE": cutoff_str,
    },
    kernel_name="python3",
    execution_timeout=600,          # 10 min timeout
    progress_bar=False,              # suppress in CI
)

# ── Run multiple notebooks in parallel
from concurrent.futures import ThreadPoolExecutor

months = ["2024-10", "2024-11", "2024-12"]

def run_month(month):
    pm.execute_notebook(
        "templates/monthly_report.ipynb",
        f"outputs/report_{month}.ipynb",
        parameters={"MONTH": month}
    )

with ThreadPoolExecutor(max_workers=3) as ex:
    list(ex.map(run_month, months))   # consume the iterator so worker exceptions are raised here
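Conceptually, parameter injection is a small edit to the notebook JSON: Papermill adds a new code cell, tagged `injected-parameters`, right after the cell tagged `parameters`, so the injected assignments override the defaults. The sketch below models that step with plain dictionaries. It is a simplified illustration, not Papermill's code; the function name and the sample notebook are made up.

```python
import copy


def inject_parameters(nb, parameters):
    """Return a copy of nb with an injected-parameters cell added."""
    result = copy.deepcopy(nb)
    source = "# Parameters\n" + "\n".join(
        f"{name} = {value!r}" for name, value in parameters.items()
    )
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    for index, cell in enumerate(result["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            result["cells"].insert(index + 1, injected)   # right after the defaults
            break
    else:
        result["cells"].insert(0, injected)               # no tagged cell: put it first
    return result


template = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "source": "CHURN_DAYS = 90"},
        {"cell_type": "code", "metadata": {}, "source": "print(CHURN_DAYS)"},
    ]
}
executed_input = inject_parameters(template, {"CHURN_DAYS": 60, "DATA_PATH": "data/dec.csv"})
print(executed_input["cells"][1]["source"])
```

Because the defaults cell still runs first, the template notebook stays runnable by hand with its original values.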

NBConvert — Export to Any Format

bash — NBConvert Export

# Export to HTML (inline CSS + JS — fully self-contained)
jupyter nbconvert analysis.ipynb --to html --no-input   # hide code cells
jupyter nbconvert analysis.ipynb --to html              # include code

# Export to PDF (requires LaTeX → install texlive)
jupyter nbconvert analysis.ipynb --to pdf

# Export to slides (uses Reveal.js)
jupyter nbconvert analysis.ipynb --to slides --post serve

# Export to Markdown (good for docs; outputs are kept, images go to an analysis_files/ folder)
jupyter nbconvert analysis.ipynb --to markdown

# Execute THEN convert in one command
jupyter nbconvert analysis.ipynb \
    --to html \
    --execute \
    --ExecutePreprocessor.timeout=300 \
    --output report.html

✓

Automated reports pattern: Schedule a cron job (or Airflow task) that runs papermill to execute the notebook with fresh data, then nbconvert to generate an HTML report, then emails it or uploads it to S3. With nothing beyond a scheduler, a notebook becomes a fully automated, reproducible report.
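The execute-then-convert chain can be wrapped in a small script that a scheduler calls. The sketch below only assembles the two command lines, using the papermill and nbconvert flags shown earlier in this section, and runs them when `run=True`. Paths, the `MONTH` parameter, and the function names are assumptions for illustration; delivery (email or S3 upload) is left out.

```python
import subprocess


def papermill_command(template, executed, parameters):
    """Command line that executes the template with injected parameters."""
    cmd = ["papermill", template, executed]
    for name, value in parameters.items():
        cmd += ["-p", name, str(value)]
    return cmd


def nbconvert_command(executed, report_name):
    """Command line that renders the executed notebook to HTML without code cells."""
    return [
        "jupyter", "nbconvert", executed,
        "--to", "html", "--no-input",
        "--output", report_name,
    ]


def build_report(template, month, run=False):
    executed = f"outputs/report_{month}.ipynb"
    steps = [
        papermill_command(template, executed, {"MONTH": month}),
        nbconvert_command(executed, f"report_{month}.html"),
    ]
    if run:
        for step in steps:
            subprocess.run(step, check=True)   # stop the chain on the first failure
    return steps


steps = build_report("templates/monthly_report.ipynb", "2024-12")
for step in steps:
    print(" ".join(step))
```

Keeping the commands as lists avoids shell quoting problems, and `check=True` makes a failed execution stop the job before a stale report is converted.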

Widgets & Interactivity

// IPYWIDGETS, INTERACT, VOILÀ

ipywidgets turns static notebooks into interactive dashboards — sliders, dropdowns, text inputs, and progress bars that trigger Python callbacks without writing any JavaScript.

Python — ipywidgets Patterns

# %pip installs into the environment of the running kernel
%pip install ipywidgets
# JupyterLab: extensions are auto-enabled in v3.x+

import ipywidgets as widgets
from IPython.display import display

# ── Quick interactive plot with @interact decorator
from ipywidgets import interact, fixed
import matplotlib.pyplot as plt
import numpy as np

@interact(
    amplitude=(0.1, 2.0, 0.1),       # (min, max, step) → slider
    frequency=(1, 10),
    wave_type=['sin', 'cos', 'tan']  # list → dropdown
)
def plot_wave(amplitude=1.0, frequency=3, wave_type='sin'):
    x = np.linspace(0, 2*np.pi, 500)
    y = amplitude * getattr(np, wave_type)(frequency * x)
    plt.figure(figsize=(10, 3))
    plt.plot(x, y)
    plt.title(f"{wave_type}(x), A={amplitude}, f={frequency}")
    plt.tight_layout()
    plt.show()

# ── Manual widget construction (more control)
slider   = widgets.FloatSlider(value=1.0, min=0.1, max=5.0, description='α')
dropdown = widgets.Dropdown(options=['linear','log','sqrt'], description='Scale')
button   = widgets.Button(description='Run Analysis', button_style='success')
output   = widgets.Output()

def on_button_click(b):
    with output:
        output.clear_output()
        print(f"Running with α={slider.value}, scale={dropdown.value}")
        # ... your analysis code here ...

button.on_click(on_button_click)
display(widgets.VBox([slider, dropdown, button, output]))

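# ── Progress bar for long operations
chunks = [range(i, i + 10) for i in range(0, 100, 10)]   # placeholder work items
progress = widgets.IntProgress(value=0, min=0, max=len(chunks), description='Processing:')
display(progress)
for i, chunk in enumerate(chunks, start=1):
    sum(chunk)                     # placeholder for your per-chunk processing
    progress.value = i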

Voilà — Notebook to Dashboard Deployment

Voilà renders a Jupyter notebook as a standalone web app — strips the code, shows only outputs and widgets. No code visible to end users. One command to serve:

pip install voila
voila analysis.ipynb          # serve locally
voila analysis.ipynb --port 8866 --no-browser

Panel & Plotly Dash Alternatives

For complex dashboards, consider Panel (works inside and outside notebooks) or Dash (Flask-based, production-grade). Both are better choices than Voilà when you need routing, multi-page apps, or custom layouts beyond widget composition.

Extensions & JupyterLab

// ESSENTIAL EXTENSIONS & LAB SETUP

Extension / Tool	Purpose	Install
jupyterlab-git	Git GUI inside JupyterLab — stage, commit, diff, branch	`pip install jupyterlab-git`
jupyterlab-lsp	Language Server Protocol — autocomplete, hover docs, go-to-definition	`pip install jupyterlab-lsp python-lsp-server`
jupyterlab-variableinspector	Live variable inspector sidebar — see all variables, types, shapes	`pip install lckr-jupyterlab-variableinspector`
aquirdturtle_collapsible_headings	Collapse/expand notebook sections via Markdown headers	`pip install aquirdturtle_collapsible_headings`
jupyterlab-spellchecker	Spellcheck in Markdown cells	`pip install jupyterlab-spellchecker`
nbdime	Human-readable notebook diffs and merges for git	`pip install nbdime; nbdime config-git --enable --global`
ipympl	Interactive matplotlib widgets (pan, zoom, update data)	`pip install ipympl` → use `%matplotlib widget`
jupyterlab-code-formatter	Format code cells with Black/isort on save	`pip install jupyterlab_code_formatter black isort`

Managing Secrets

// NEVER HARDCODE CREDENTIALS IN NOTEBOOKS

Notebooks are particularly dangerous for secret leakage because outputs (including printed API keys) are stored in the .ipynb JSON and committed to git. A secret hardcoded in a notebook cell may live in git history forever even after deletion.

⚠

Critical: Jupyter outputs are stored in the .ipynb file. If you print(api_key) even accidentally, that secret is now in your notebook's JSON. If committed to git, it's in history. Even if you clear the output and commit again — it's still in the previous commit. Assume it's leaked. Rotate immediately.
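Because outputs are ordinary JSON, a notebook can be checked for secret-looking strings before it is committed. The sketch below scans stream and rich outputs with two example regular expressions. The patterns, the function name, and the obviously fake key are assumptions for illustration; a dedicated scanner covers far more formats.

```python
import re

# Example patterns only; real scanners ship much larger rule sets.
SECRET_PATTERNS = {
    "AWS access key id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "sk- style API key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
}


def _as_text(value):
    # Notebook text fields may be a string or a list of lines.
    return "".join(value) if isinstance(value, list) else str(value)


def find_secrets_in_outputs(nb):
    """Return (cell_index, pattern_label) for every suspicious output."""
    hits = []
    for index, cell in enumerate(nb.get("cells", [])):
        for output in cell.get("outputs", []):
            chunks = [_as_text(output.get("text", ""))]
            chunks += [_as_text(v) for v in output.get("data", {}).values()]
            for label, pattern in SECRET_PATTERNS.items():
                if any(pattern.search(chunk) for chunk in chunks):
                    hits.append((index, label))
    return hits


leaky = {
    "cells": [
        {
            "cell_type": "code",
            "source": "print(api_key)",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["AKIAABCDEFGHIJKLMNOP\n"]}
            ],
        }
    ]
}
print(find_secrets_in_outputs(leaky))
```

A hit means the key should be treated as leaked and rotated, even if the output is cleared afterwards.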

Secret Patterns (Safest First)

Python — Secret Management Patterns

## ── PATTERN 1: .env file + python-dotenv (recommended for local dev)
# .env file (NEVER commit this — add to .gitignore)
# DATABASE_URL=postgresql://user:pass@host:5432/db
# OPENAI_API_KEY=sk-proj-...
# AWS_ACCESS_KEY_ID=AKIA...

%pip install python-dotenv

from dotenv import load_dotenv
import os

load_dotenv()                                  # loads .env from cwd or parent dirs
db_url = os.environ["DATABASE_URL"]          # raises if missing — fail fast
api_key = os.getenv("OPENAI_API_KEY", "")    # returns "" if missing

## ── PATTERN 2: OS environment variables (CI/CD / containers)
# Set in terminal before launching jupyter:
# export DATABASE_URL="postgresql://..."
# jupyter lab

import os
db_url = os.environ["DATABASE_URL"]   # already available if set before launch

## ── PATTERN 3: getpass (interactive prompt — never stored)
from getpass import getpass
api_key = getpass("Enter API key: ")    # prompts silently, not stored in output

## ── PATTERN 4: HashiCorp Vault (production / team environments)
import hvac

client = hvac.Client(url="https://vault.internal:8200")
client.auth.approle.login(role_id=os.environ["VAULT_ROLE_ID"],
                          secret_id=os.environ["VAULT_SECRET_ID"])
secret = client.secrets.kv.read_secret_version(path="data-team/postgres")
db_password = secret["data"]["data"]["password"]

## ── PATTERN 5: AWS Secrets Manager / cloud-native
import boto3, json

client = boto3.client("secretsmanager", region_name="us-east-1")
# Auth via instance profile / IAM role — no credentials in code
response = client.get_secret_value(SecretId="prod/myapp/db")
secret   = json.loads(response["SecretString"])
password = secret["password"]

## ── WHAT NOT TO DO
# ❌ api_key = "sk-proj-abc123..."          hardcoded in cell
# ❌ password = "mypassword"               hardcoded
# ❌ %env DATABASE_URL=postgresql://...    stored in notebook JSON
# ❌ print(api_key)                        output saved to .ipynb
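Two small helpers cover most day-to-day use of these patterns: fail fast when a variable is missing, and never print a secret in full. This is a sketch with made-up helper names (`require_env`, `mask`) and a demo variable that is set inline only so the example runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value or raise with a hint about how to fix it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it or add it to .env before starting Jupyter"
        )
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Show only the last few characters, e.g. for confirming which key is loaded."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


os.environ["DEMO_API_KEY"] = "demo-key-1234"   # demo value only, not a real secret
api_key = require_env("DEMO_API_KEY")
print(mask(api_key))                           # safe to leave in saved output
```

Printing `mask(api_key)` instead of `api_key` keeps the confirmation step while ensuring the saved `.ipynb` never contains the full value.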

Pre-commit Hook — nbstripout (Essential)

Strip outputs (and therefore any accidentally printed secrets) before every commit. One-time setup, works for the entire team.

pip install nbstripout pre-commit
# .pre-commit-config.yaml:
repos:
- repo: https://github.com/kynan/nbstripout
  rev: 0.7.1
  hooks:
  - id: nbstripout

pre-commit install

Scan History for Leaks (If In Doubt)

If you suspect a secret was committed, scan immediately and rotate regardless.

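# gitleaks is a standalone Go binary, not a pip package
brew install gitleaks                  # or download a release binary from the gitleaks GitHub releases
# Scan entire git history
gitleaks detect --source . --verbose
# Remove the file from all history (nuclear option; rewrites commits, requires a force-push)
pip install git-filter-repo
git filter-repo --invert-paths \
  --path sensitive_notebook.ipynb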

Advanced Configuration

// JUPYTER_SERVER, JUPYTERHUB, RESOURCE LIMITS

bash + Python — Server Configuration

# ── Generate default config files
jupyter lab --generate-config
# → ~/.jupyter/jupyter_lab_config.py

jupyter server --generate-config
# → ~/.jupyter/jupyter_server_config.py

## ── jupyter_server_config.py — production settings

# Security: token-based auth is the default; never disable it in production
# c.ServerApp.token = 'fixed-value'        # leave unset to auto-generate a token; an EMPTY string disables auth
# c.ServerApp.password = ''                # or set a hashed password via: jupyter server password
c.ServerApp.open_browser = False

# Network: bind to localhost only (never 0.0.0.0 without auth + TLS)
c.ServerApp.ip = '127.0.0.1'              # default — local only
c.ServerApp.port = 8888
c.ServerApp.allow_remote_access = False

# Root dir for file browser
c.ServerApp.root_dir = '/home/jupyter/work'

# Kernel management: kill idle kernels
c.MappingKernelManager.cull_idle_timeout = 3600  # 1 hour idle → kill
c.MappingKernelManager.cull_interval = 300       # check every 5 min
c.MappingKernelManager.cull_connected = False    # don't kill if browser tab open

# Uploads: cap the size of request bodies
c.ServerApp.max_body_size = 512 * 1024 * 1024  # 512MB upload limit

## ── Remote server with SSL (production setup)
# Generate self-signed cert (use proper cert in prod)
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout jupyter.key -out jupyter.crt

c.ServerApp.certfile = '/path/to/jupyter.crt'
c.ServerApp.keyfile  = '/path/to/jupyter.key'
c.ServerApp.ip       = '0.0.0.0'           # only with SSL + auth

## ── Custom kernel memory limits (via systemd or cgroups)
# kernel.json: wrap python with memory-limited systemd-run
{
    "argv": ["systemd-run", "--scope", "-p", "MemoryMax=4G",
             "/usr/bin/python", "-m", "ipykernel_launcher", "-f", "{connection_file}"]
}

ℹ

JupyterHub is the multi-user server — it manages spawning per-user notebook servers, authentication (PAM, LDAP, OAuth, GitHub), and resource allocation. Use JupyterHub when you need a shared Jupyter environment for a team. Run it behind a reverse proxy (nginx/Caddy) with TLS. Use Zero to JupyterHub with Kubernetes for scalable cloud deployments.

Deploy to Production

// PAPERMILL PIPELINES, VOILÀ, JUPYTERHUB, DOCKER

There are four deployment patterns for notebooks in production. Choose based on whether you're scheduling analysis, serving dashboards, or providing a shared compute environment.

Pattern	Use Case	Technology	Complexity
Scheduled Execution	Automated reports, ML retraining, data pipelines	Papermill + cron / Airflow / Prefect	LOW
Dashboard App	Business users need interactive outputs, no code	Voilà or Panel + nginx	MEDIUM
Team Compute	Data science team needs shared GPU/CPU environment	JupyterHub + Kubernetes	HIGH
ML Pipeline Step	Notebook is a training/evaluation step in a DAG	Papermill + MLflow / Kedro	MEDIUM

Pattern 1: Airflow + Papermill Pipeline

Python — Airflow DAG with Papermill

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
import papermill as pm

def run_analysis_notebook(**context):
    execution_date = context['ds']              # Airflow execution date string
    pm.execute_notebook(
        input_path="notebooks/templates/daily_report.ipynb",
        output_path=f"outputs/daily_report_{execution_date}.ipynb",
        parameters={
            "REPORT_DATE":    execution_date,
            "DATA_SOURCE":    "s3://my-bucket/data/",
            "OUTPUT_BUCKET":  "s3://my-bucket/reports/",
        },
        kernel_name="python3",
        execution_timeout=1800,
    )

with DAG(
    dag_id="daily_analysis_report",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=10)}
) as dag:

    run_notebook = PythonOperator(
        task_id="run_analysis",
        python_callable=run_analysis_notebook,   # Airflow 2+ passes the context automatically; provide_context is no longer needed
    )

Pattern 2: Docker Deployment

Dockerfile — Production Jupyter Container

FROM jupyter/datascience-notebook:python-3.11

# Switch to root to install system packages
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Back to jovyan (non-root user — required)
USER jovyan

# Install Python packages
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy notebook templates (read-only)
COPY --chown=jovyan:users notebooks/ /home/jovyan/work/notebooks/

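# Configure: disable token auth and accept any origin.
# Only safe when the pod is reachable solely through an authenticating proxy
# or a restrictive k8s network policy; never expose this container directly.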
ENV JUPYTER_ENABLE_LAB=yes
RUN jupyter lab --generate-config && \
    echo "c.ServerApp.token = ''" >> ~/.jupyter/jupyter_lab_config.py && \
    echo "c.ServerApp.allow_origin = '*'" >> ~/.jupyter/jupyter_lab_config.py

EXPOSE 8888
# CMD inherited from base image: starts jupyter lab

Pattern 3: Voilà as a Web App

bash + nginx — Voilà Production Deployment

# ── Start Voilà server (bind to localhost — nginx handles TLS)
voila dashboard.ipynb \
    --port 8866 \
    --no-browser \
    --VoilaConfiguration.file_allowlist="['.*\.(png|jpg|gif|svg|mp4|avi)']"   # named file_whitelist on older Voilà releases

# ── nginx config (TLS termination + proxy)
server {
    listen 443 ssl;
    server_name dashboard.internal.co;

    ssl_certificate     /etc/ssl/certs/dashboard.crt;
    ssl_certificate_key /etc/ssl/private/dashboard.key;

    location / {
        proxy_pass         http://127.0.0.1:8866;
        proxy_http_version 1.1;
        proxy_set_header   Upgrade $http_upgrade;
        proxy_set_header   Connection "upgrade";  # required for WebSockets
        proxy_set_header   Host $host;
        proxy_read_timeout 86400;                 # long timeout for kernel sessions
    }
}

# ── systemd service for Voilà
[Unit]
Description=Voila Dashboard
After=network.target

[Service]
User=jupyter
WorkingDirectory=/home/jupyter/app
Environment="PATH=/home/jupyter/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/home/jupyter/.venv/bin/voila dashboard.ipynb --port 8866 --no-browser
Restart=always

[Install]
WantedBy=multi-user.target

CI/CD & Testing

// NBVAL, PYTEST, GITHUB ACTIONS

Notebooks can be tested in CI at two levels: nbmake verifies that each notebook executes top-to-bottom without errors, and nbval optionally checks that cell outputs match the ones stored in the file. Pairing notebooks with Jupytext .py files lets the same code be tested with standard pytest.

YAML — GitHub Actions CI Pipeline

name: Notebook CI

on: [push, pull_request]

jobs:
  test-notebooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with: { python-version: "3.11" }

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install nbval nbmake pytest papermill

      # ── Test 1: notebooks execute without error (nbmake)
      - name: Execute notebooks
        run: pytest --nbmake notebooks/ --nbmake-timeout=300
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}  # inject secrets via GitHub Secrets
          API_KEY:      ${{ secrets.API_KEY }}

      # ── Test 2: validate outputs haven't changed (nbval)
      - name: Validate notebook outputs
        run: pytest --nbval-lax notebooks/validated/
        # --nbval-lax: run every cell, but compare outputs only for cells marked # NBVAL_CHECK_OUTPUT
        # --nbval: compare the stored outputs of every cell (opt out per cell with # NBVAL_IGNORE_OUTPUT)

      # ── Test 3: run via Papermill and check exit code
      - name: Parameterized execution test
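        run: |
          # The CLI converts "True"/"False" to booleans; lowercase "true" would stay a string
          papermill notebooks/report.ipynb /tmp/report_out.ipynb \
            -p TEST_MODE True \
            -p SAMPLE_SIZE 100
          # A failing cell makes papermill exit non-zero, which fails this step (run uses bash -e)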
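
Alongside the pipeline above, a cheap static check can catch notebooks that were saved after out-of-order execution, without starting a kernel. The following is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes the nbformat 4 layout in which each code cell stores an execution_count. Notebooks cleaned by nbstripout have null counts and pass trivially, so the check is mainly useful for notebooks that keep their outputs, such as those validated with nbval.

```python
import json
from pathlib import Path
from typing import List


def execution_order_problems(notebook: dict) -> List[str]:
    """Report executed code cells whose counts are not 1, 2, 3, ... in document order."""
    problems = []
    expected = 1
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:
            continue  # never executed, or outputs were stripped
        if count != expected:
            problems.append(
                f"cell {index}: execution_count is {count}, expected {expected}"
            )
        expected += 1
    return problems


def check_notebook_file(path: Path) -> List[str]:
    """Load an .ipynb file and return its execution-order problems."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    return execution_order_problems(notebook)
```

A CI step or pre-commit hook could call check_notebook_file on each changed notebook and fail when the returned list is non-empty. A clean "restart and run all" always yields an empty list.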

Best Practices Summary Checklist

All notebooks run top-to-bottom (restart + run all) without error
nbstripout pre-commit hook strips outputs before commit
Jupytext paired .py files for readable git diffs
Secrets loaded from env / .env / vault — never hardcoded
Parameters cell tagged for Papermill injection
Imports and config at top, not scattered through cells
Kernel registered per project environment
%autoreload for local module development
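
The secrets item in the checklist above can be sketched as a small helper placed in the first cell of a notebook. This is one possible approach using only the standard library; require_env, redact, and MissingSecretError are hypothetical names, and loading a .env file or querying a vault would happen before these calls.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing fast if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(
            f"Environment variable {name} is not set; export it before starting the kernel"
        )
    return value


def redact(value: str, visible: int = 4) -> str:
    """Mask all but the last `visible` characters so the value can be logged safely."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

Failing in the first cell is deliberate: a scheduled run stops immediately with a clear message instead of failing halfway through with a connection error. Printing redact(secret) rather than the secret itself keeps credentials out of saved cell outputs.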

Common Pitfalls to Avoid

Running cells out of order — hidden state is the #1 notebook bug
Hardcoded file paths (use pathlib, env vars, or config)
Installing packages with !pip install inside notebooks that run in CI (declare them in requirements.txt instead)
Committing large binary outputs or base64 images in .ipynb
Opening the same notebook in multiple browser tabs (both tabs share one kernel and can overwrite each other's saves)
Using %cd — it changes working dir globally for the kernel
Long-running notebooks without checkpointing intermediate results
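
The last pitfall in the list above can be mitigated by caching each expensive step on disk, so that a kernel restart resumes from the last finished step. The sketch below is one simple way to do this with pickle; cached_step is a hypothetical helper, and it assumes the cache files are written and read only by your own code, since unpickling untrusted files is unsafe.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def cached_step(path: Path, compute: Callable[[], Any]) -> Any:
    """Return the result stored at `path`, computing and storing it on first use."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as handle:
            return pickle.load(handle)

    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary file first, then rename, so an interrupted kernel
    # never leaves a half-written checkpoint behind.
    temporary = path.with_name(path.name + ".tmp")
    with temporary.open("wb") as handle:
        pickle.dump(result, handle)
    temporary.replace(path)
    return result
```

In a notebook this might look like features = cached_step(Path("checkpoints/features.pkl"), build_features), where build_features is whatever function performs the slow computation. Deleting the checkpoint file forces the step to run again.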


Reference resources: jupyter.org/documentation · papermill.readthedocs.io · nbconvert.readthedocs.io · jupytext.readthedocs.io · voila.readthedocs.io · jupyterhub.readthedocs.io · zero-to-jupyterhub.readthedocs.io · nbval.readthedocs.io