// "The interactive computing interface for exploration, analysis, and communication."
The complete operational guide to Jupyter Notebooks — from first cell to production deployment. Covers architecture, kernel management, magic commands, secret handling, .ipynb vs .py decision framework, and scaling to production via Papermill, NBConvert, and JupyterHub.
Jupyter Notebook is an open-source, browser-based interactive computing environment. It combines live code, rich text (Markdown), equations (LaTeX), visualizations, and narrative prose in a single shareable document called a notebook — stored as a .ipynb (IPython Notebook) JSON file.
The notebook began as the IPython Notebook in 2011 and was spun off as the language-agnostic Project Jupyter in 2014. It now supports over 40 programming languages through interchangeable kernels. The name Jupyter is a tribute to three core scientific computing languages: Julia, Python, and R.
Architecture
Browser (Frontend UI, HTML/JS) → Server (Jupyter Server, Python) → Protocol (ZeroMQ Messaging) → Kernel (IPython / IRkernel / etc.) → Execution (code runs here)
Cells
The fundamental unit. Three types: Code (executable), Markdown (formatted text/LaTeX), and Raw (unprocessed). Cells can be run individually and in any order, but they all share one kernel state.
Kernel
A separate process that executes code. The kernel maintains all variable state between cell executions. Kernels can be restarted, interrupted, or swapped independently of the notebook UI.
.ipynb Format
Notebooks are JSON files. They store cell source, metadata, outputs (including images as base64), and kernel info. This makes them versionable but diff-noisy. Use nbstripout pre-commit to clear outputs.
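To make the JSON structure concrete, here is a minimal sketch that builds a one-cell notebook as a plain dictionary and clears its outputs, roughly what an output-stripping tool does. The helper name strip_outputs is hypothetical, and the notebook dictionary is a hand-written nbformat 4 example rather than a file saved by Jupyter.

```python
import json

# A hand-written, minimal nbformat-4 notebook: one executed code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "cells": [
        {
            "cell_type": "code",
            "id": "cell-1",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('hello')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        }
    ],
}


def strip_outputs(nb: dict) -> dict:
    """Return a copy of the notebook with code-cell outputs and counters cleared."""
    cleaned = json.loads(json.dumps(nb))  # cheap deep copy of JSON-compatible data
    for cell in cleaned["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


stripped = strip_outputs(notebook)
print(json.dumps(stripped["cells"][0], indent=2))
```

Because outputs live next to the source in the same file, clearing them is what keeps diffs small; the source and metadata are left untouched.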
ℹ
Classic vs JupyterLab vs JupyterLite: Classic Notebook (v6) is the original single-document interface. JupyterLab (v4+) is the modern IDE-like successor with multi-panel layout, file browser, terminal, and extension ecosystem — use JupyterLab for all new work. JupyterLite runs entirely in the browser via WASM — no server required, ideal for education and demos.
02
When To Use Jupyter
// USE CASES & ANTI-PATTERNS
Jupyter excels at exploratory, narrative, and iterative workflows. It is not a general-purpose application runtime. Choosing Jupyter for the wrong task creates maintenance debt and security risk.
Use Jupyter When Good Fit
Exploratory data analysis (EDA) — rapid iteration on unknown datasets
Data visualization prototyping — matplotlib, plotly, seaborn, altair
Teaching and presenting — inline outputs + Markdown narrative
Statistical analysis and hypothesis testing
Model training experimentation — comparing hyperparameters interactively
Documenting research workflows with reproducible code
One-off data transformations and ad-hoc SQL query analysis
Generating reports that mix prose, code, tables, and charts
Don't Use Jupyter When Bad Fit
Building production REST APIs or microservices (use FastAPI, Flask)
Writing shared library code intended for import by other modules
Long-running background jobs or daemons
CLI tools or scripts with argument parsing
Code that needs proper unit testing as a primary artifact
Multi-developer collaborative coding (merge conflicts on .ipynb JSON)
Anything requiring strict execution order guarantees without cell-by-cell control
Domain Use Case Map
Domain | Jupyter Fits? | Typical Notebook Role
Data Engineering | PARTIAL | Pipeline prototyping, data quality checks; not production ETL
Data Science / ML | YES | EDA, feature engineering, model experimentation, evaluation reports
MLOps | PARTIAL | Parameterized training notebooks via Papermill; not inference serving
 | | Use .py scripts, Ansible, or purpose-built CLI tools
Education | YES | Interactive tutorials, exercises with inline feedback via widgets
03
Setup & Installation
// LOCAL, VENV, CONDA, DOCKER
Always install Jupyter inside a virtual environment — never into the system Python. This prevents dependency conflicts between projects and makes environments reproducible.
Register your project venv as a named kernel so you can switch between projects without restarting the server. Each notebook then explicitly selects its environment.
# Register current venv as a named kernel
pip install ipykernel
python -m ipykernel install --user \
--name myproject \
--display-name "Python (myproject)"
Useful Companion Packages
nbstripout — strip outputs before git commit
nbconvert — export to HTML, PDF, script
nbformat — programmatic notebook manipulation
jupytext — sync .ipynb ↔ .py/.md files
papermill — parameterize and execute notebooks
nbval — validate notebook outputs in CI
04
How To Use
// KEYBOARD SHORTCUTS, CELL MODES, OUTPUTS
Jupyter has two modes: Command Mode (blue border — navigate cells) and Edit Mode (cursor active — type code). Press Esc for command mode, Enter or click to enter edit mode.
Essential Keyboard Shortcuts
Shortcut | Mode | Action
Shift+Enter | Both | Run cell, move to next
Ctrl+Enter | Both | Run cell, stay on current
Alt+Enter | Both | Run cell, insert new below
A | Command | Insert cell above
B | Command | Insert cell below
D, D | Command | Delete cell
M | Command | Convert to Markdown
Y | Command | Convert to Code
Z | Command | Undo cell deletion
Ctrl+S | Both | Save notebook
0, 0 | Command | Restart kernel
I, I | Command | Interrupt kernel
Tab | Edit | Autocomplete
Shift+Tab | Edit | Tooltip / docstring
Output Types
Python — Output Examples
# Assumes df is a pandas DataFrame and plt is matplotlib.pyplot
# ── Last expression = displayed automatically (no print needed)
df.head()  # renders as an HTML table when it is the last expression in a cell

# ── Rich display protocol: any object with _repr_html_() renders richly
from IPython.display import display, HTML, Image, Markdown, IFrame
display(HTML("<b style='color:gold'>Hello</b>"))
display(Markdown("## Section Header"))
display(Image("chart.png"))

# ── Suppress output with semicolon
plt.plot([1,2,3]);  # trailing ; suppresses the matplotlib object repr

# ── Multiple outputs in one cell
display(df.describe())
display(df.dtypes)
# Both tables print: display() is explicit, last-expr is implicit

# ── print vs display
print(df)    # plain text output
display(df)  # rich HTML table output
df           # same as display() if last expression
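The rich display protocol mentioned above can be tried on your own classes. The following is a minimal sketch, assuming a hypothetical Metric class; the only Jupyter-specific part is the method name _repr_html_, which the frontend looks for when an object is displayed.

```python
import html


class Metric:
    """Tiny value object that renders as a styled HTML snippet in Jupyter."""

    def __init__(self, name: str, value: float) -> None:
        self.name = name
        self.value = value

    def __repr__(self) -> str:
        # Plain-text fallback used by print() and by non-HTML frontends.
        return f"Metric({self.name!r}, {self.value!r})"

    def _repr_html_(self) -> str:
        # Called by Jupyter when the object is the last expression in a cell
        # or is passed to IPython.display.display().
        return f"<b>{html.escape(self.name)}</b>: <code>{self.value:.2f}</code>"


m = Metric("accuracy", 0.9137)
print(repr(m))
print(m._repr_html_())
```

Outside a notebook the object falls back to __repr__, so the same class stays usable in scripts and tests.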
05
Magic Commands
// LINE MAGIC, CELL MAGIC, AUTOMAGIC
Magic commands are special IPython directives prefixed with % (line magic) or %% (cell magic). They provide superpowers not available in plain Python — timing, profiling, shell access, multi-language execution, and more.
IPython — Magic Command Reference
# Note: comments sit on their own line wherever a magic would otherwise read them as arguments.

## ── TIMING & PROFILING ──────────────────────────────────────
%time df.groupby('user_id').sum()      # single run timing
%timeit df.groupby('user_id').sum()    # repeated runs with stats (default 7 runs; loop count chosen automatically)
%timeit -n 1000 -r 5 some_function()   # custom loops/repeats

%%timeit
# time entire cell
result = []
for i in range(10000):
    result.append(i**2)

%prun my_function()                    # cProfile (function-level profile)
# line_profiler: pip install line_profiler, then %load_ext line_profiler
%lprun -f my_function my_function()    # line-by-line timings
# memory_profiler: pip install memory_profiler, then %load_ext memory_profiler
%memit my_function()                   # memory usage

## ── SHELL & FILESYSTEM ──────────────────────────────────────
# list directory (same as !ls)
%ls
# change directory (persistent!)
%cd /path/to/dir
# print working directory
%pwd
# ! prefix = shell command
!pip install pandas
# capture shell output as list
files = !ls *.csv

## ── CODE INSPECTION ─────────────────────────────────────────
# list all variables in namespace
%who
# list variables with type and value
%whos
# clear namespace (nuclear option)
%reset
# show input history
%history
# last 20 commands with line numbers
%history -n -l 20
# inspect object (docstring + type)
obj?
# inspect object (full source code)
obj??

## ── CELL MAGIC (entire cell, not just line) ─────────────────
%%bash
# run cell as bash script
echo "Hello from bash"
ls -la

%%html
<!-- render cell as HTML -->
<marquee>Hello</marquee>

%%javascript
// run cell as JS in browser
console.log("Hello from JS")

%%writefile mymodule.py
# the contents of this cell are written to mymodule.py
def hello(): return "Hello"

%%capture output
# capture stdout/stderr/display into the variable "output"
print("this won't show but is in output.stdout")

## ── DISPLAY & MATPLOTLIB ────────────────────────────────────
# render plots inline (static)
%matplotlib inline
# interactive plots (ipympl)
%matplotlib widget
# hi-DPI plots
%config InlineBackend.figure_format = 'retina'

## ── AUTO FEATURES ───────────────────────────────────────────
# must load extension first
%load_ext autoreload
# auto-reload imported modules on change
%autoreload 2
# use magic without % prefix (risky)
%automagic on

## ── ENVIRONMENT & MISC ──────────────────────────────────────
# show all env vars
%env
# set env var for session
%env MY_VAR=hello
# list ALL available magic commands
%lsmagic
# full magic system docs
%magic
▸
%autoreload trick: Add %load_ext autoreload + %autoreload 2 at the top of any notebook that imports local modules. This automatically reloads modified modules when you re-execute cells — eliminates the "restart kernel to see changes" cycle during development.
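To see why a reload is needed at all, the sketch below reproduces the underlying problem with the standard library only: an imported module stays cached in the running process until it is explicitly reloaded. The module name demo_features is hypothetical, and importlib.reload is a much simpler mechanism than what the autoreload extension does (which also patches existing objects), so treat this as an illustration of the idea rather than of the extension's internals.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # avoid stale cached bytecode in this demo

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_features.py"  # hypothetical local module
    module_path.write_text("def label():\n    return 'v1'\n")
    sys.path.insert(0, tmp)
    try:
        import demo_features

        before = demo_features.label()

        # Simulate editing the module on disk while the kernel keeps running.
        module_path.write_text("def label():\n    return 'version 2'\n")
        stale = demo_features.label()      # still the old code: module is cached

        importlib.invalidate_caches()
        importlib.reload(demo_features)    # the step autoreload automates
        after = demo_features.label()
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_features", None)

print(before, stale, after)
```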
06
Kernels
// MULTI-LANGUAGE EXECUTION
A kernel is a language runtime that Jupyter communicates with over ZeroMQ. Any language with a Jupyter kernel can be used as a first-class citizen in notebooks. The kernel maintains state between cell executions: variables, imports, and functions persist until the kernel is restarted.
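As a rough illustration of what travels between server and kernel, the sketch below builds the dictionary form of an execute_request message as described by the Jupyter messaging protocol. The function name make_execute_request and the user name are hypothetical. Real clients such as jupyter_client also sign the message and serialize it into multipart ZeroMQ frames, which is omitted here.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session: str) -> dict:
    """Build the dictionary form of an execute_request message."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo-user",  # hypothetical user name
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


msg = make_execute_request("x = 1 + 1", session=uuid.uuid4().hex)
print(json.dumps(msg, indent=2))
```

The kernel answers with further messages (status, stream output, execute_reply) whose parent_header points back at this request, which is how the frontend knows which cell an output belongs to.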
IPython
Python 3.x
Default and most popular. Install: pip install ipykernel. Register with python -m ipykernel install --user --name myenv
IRkernel
R 4.x
Full R environment. Install from R console: install.packages('IRkernel'); IRkernel::installspec()
IJulia
Julia 1.x
Julia REPL in Jupyter. Install: using Pkg; Pkg.add("IJulia") from Julia REPL.
xeus-cling
C++ 17
Interactive C++ via cling interpreter. Install via conda: conda install -c conda-forge xeus-cling
SoS (Script of Scripts): mix multiple languages in one notebook with variable passing between them. Install: pip install sos-notebook
bash — Kernel Management
# List installed kernels
jupyter kernelspec list
# Remove a kernel
jupyter kernelspec remove myoldenv
# View kernel JSON spec
cat ~/.local/share/jupyter/kernels/myproject/kernel.json
# kernel.json structure ("env" injects env vars into this kernel; JSON itself allows no comments)
{
  "argv": ["/path/to/python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python (myproject)",
  "language": "python",
  "env": {
    "PYTHONPATH": "/my/extra/path"
  }
}
07
.ipynb vs .py
// THE DECISION FRAMEWORK
This is one of the most important architectural decisions in any data-heavy project. Using the wrong format creates technical debt, breaks automation, or makes collaboration harder. The answer depends on who runs the code, how often, and why.
📓 Use .ipynb (Notebook)
Exploratory analysis where you need to see intermediate outputs
Communicating results to non-technical stakeholders
Tutorials, documentation with executable examples
One-time or infrequent analyses on a specific dataset
Prototyping a model before productionizing
Research with narrative reasoning between code blocks
Interactive visualization with widgets
Parameterized reports via Papermill
🐍 Use .py (Script / Module)
Code that will be imported by other modules
Production pipelines run automatically on a schedule
Code requiring unit tests and test coverage metrics
CLI tools with argument parsing (argparse, Click, Typer)
Shared utility functions used across multiple notebooks
Application code — web servers, data validation, business logic
Code in version control with meaningful diffs
Anything run in a CI/CD pipeline unattended
The Graduation Path
Most successful projects use both. The standard pattern is to prototype in a notebook, then graduate mature, reusable code to .py modules that the notebook imports.
Python — Notebook + Module Pattern
# ── Project structure: notebooks use modules, not the reverse
project/
├── notebooks/
│ ├── 01_eda.ipynb # exploration — messy, disposable
│ ├── 02_feature_engineering.ipynb
│ └── 03_model_evaluation.ipynb # final report — clean, presentable
├── src/
│ ├── __init__.py
│ ├── features.py # graduated from notebook → tested module
│ ├── models.py
│ └── utils.py
├── tests/
│ └── test_features.py # .py modules are testable; .ipynb are not (easily)
└── data/
# ── In notebook: import your own modules
%load_ext autoreload
%autoreload 2
import sys; sys.path.insert(0, '../')
from src.features import build_feature_matrix
from src.utils import load_config
# Notebook handles: exploration + visualization + narrative
# .py handles: reusable logic + testing + production execution
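As an illustration of what "graduating" a cell looks like, here is a minimal sketch of a function that could live in src/features.py together with a pytest-style test for tests/test_features.py. The body of build_feature_matrix and its row schema are assumptions made for this example (the source names the function but does not show it), and plain dictionaries stand in for a DataFrame so the snippet has no third-party dependencies.

```python
from datetime import date


def build_feature_matrix(rows: list[dict], cutoff: date) -> list[dict]:
    """Derive per-customer features from raw rows (hypothetical schema)."""
    features = []
    for row in rows:
        days_inactive = (cutoff - row["last_login"]).days
        features.append(
            {
                "customer_id": row["customer_id"],
                "days_inactive": days_inactive,
                "is_dormant": days_inactive > 90,
            }
        )
    return features


def test_build_feature_matrix() -> None:
    """A pytest-style test that could live in tests/test_features.py."""
    rows = [
        {"customer_id": 1, "last_login": date(2024, 10, 31)},
        {"customer_id": 2, "last_login": date(2024, 6, 1)},
    ]
    result = build_feature_matrix(rows, cutoff=date(2024, 11, 1))
    assert result[0] == {"customer_id": 1, "days_inactive": 1, "is_dormant": False}
    assert result[1]["is_dormant"] is True


test_build_feature_matrix()
```

Once the logic is a plain function with explicit inputs, the notebook only calls it, and the test runs in CI without a kernel.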
Converting Between Formats
bash — Format Conversion
# Convert .ipynb → .py script (strips outputs)
jupyter nbconvert --to script analysis.ipynb
# → outputs analysis.py with # In[N]: cell markers

# Jupytext: bidirectional sync (edit .py, see changes in .ipynb)
pip install jupytext
jupytext --to py:percent analysis.ipynb   # convert with # %% cell markers
jupytext --to notebook analysis.py        # back to notebook
jupytext --sync analysis.ipynb            # sync paired files

# Pair a notebook for dual-format version control
jupytext --set-formats ipynb,py:percent analysis.ipynb
# Now both files stay in sync: commit .py for clean diffs, .ipynb optionally
▸
Jupytext + nbstripout workflow: Use Jupytext to maintain a paired .py version of every notebook (commit the .py, not the .ipynb). Add nbstripout as a pre-commit hook to strip outputs from any .ipynb files that do get committed. This gives you readable git diffs and avoids 10MB notebook blobs in your repo history.
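The paired .py file diffs well because the percent format is ordinary Python with comment markers between cells. The sketch below shows a sample of that format and a deliberately simplified splitter; split_percent_cells is a hypothetical helper that ignores cell metadata and is not how Jupytext itself parses files.

```python
PERCENT_SCRIPT = '''\
# %% [markdown]
# # Churn analysis
# Load the data and look at the first rows.

# %%
import math
radius = 2.0

# %%
area = math.pi * radius ** 2
'''


def split_percent_cells(text: str) -> list[dict]:
    """Split percent-format text into cells (simplified: ignores cell metadata)."""
    cells: list[dict] = []
    for line in text.splitlines():
        if line.startswith("# %%"):
            kind = "markdown" if "[markdown]" in line else "code"
            cells.append({"cell_type": kind, "lines": []})
        elif cells:
            cells[-1]["lines"].append(line)
    for cell in cells:
        # Drop trailing blank lines that only separate cells in the file.
        while cell["lines"] and not cell["lines"][-1].strip():
            cell["lines"].pop()
    return cells


cells = split_percent_cells(PERCENT_SCRIPT)
for cell in cells:
    print(cell["cell_type"], len(cell["lines"]))
```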
08
Notebook Structure
// OPINIONATED TEMPLATE FOR PRODUCTION-GRADE NOTEBOOKS
A well-structured notebook is reproducible from top-to-bottom (kernel restart → run all with no errors), self-documenting, and separates configuration from logic. These are the conventions that make notebooks maintainable months later.
Python — Canonical Notebook Template
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 1 — Title (Markdown)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"""
# Customer Churn Analysis — Q4 2024
**Purpose**: Identify leading indicators of churn in the enterprise segment.
**Author**: Data Science Team
**Last Updated**: 2024-12-01
**Data**: `data/crm_export_2024-12.csv`
"""# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 2 — Parameters (top of notebook = easy to find)
# Papermill injects values into this cell tag: "parameters"
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DATA_PATH = "data/crm_export_2024-12.csv"
OUTPUT_DIR = "outputs/"
CUTOFF_DATE = "2024-11-01"
CHURN_DAYS = 90# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 3 — Imports (all at top, organized: stdlib → 3rd party → local)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━import os, sys, warnings
from pathlib import Path
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sys.path.insert(0, "../")
from src.utils import load_config
warnings.filterwarnings('ignore')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 4 — Data Loading
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
df = pd.read_csv(DATA_PATH, parse_dates=['created_at', 'last_login'])
print(f"Loaded {len(df):,} rows, {df.shape[1]} columns")
df.head()
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Sections: 1. EDA 2. Feature Engineering 3. Modeling 4. Results
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
09
Advanced Features
// PAPERMILL, NBCONVERT, PARALLEL EXECUTION
Papermill — Parameterized Notebook Execution
Papermill executes notebooks programmatically with injected parameters. It's the backbone of notebook-based ML pipelines — run the same notebook against different datasets, date ranges, or model configs.
Python + bash — Papermill
# ── Setup: tag the parameters cell in JupyterLab
# Select the config cell → open the Property Inspector (gear icon, right sidebar) → add tag "parameters"
# (Classic Notebook: View → Cell Toolbar → Tags)

# ── CLI execution
pip install papermill
papermill input.ipynb output_2024-12.ipynb \
  -p DATA_PATH "data/dec.csv" \
  -p CUTOFF_DATE "2024-12-01" \
  -p CHURN_DAYS 60

# ── Python API: run programmatically in a pipeline
import papermill as pm

month = "2024-12"          # example values; supply these from your pipeline
cutoff_str = "2024-12-01"

pm.execute_notebook(
    input_path="templates/churn_analysis.ipynb",
    output_path=f"outputs/churn_{month}.ipynb",
    parameters={
        "DATA_PATH": f"data/crm_{month}.csv",
        "CUTOFF_DATE": cutoff_str,
    },
    kernel_name="python3",
    execution_timeout=600,   # 10 min timeout
    progress_bar=False,      # suppress in CI
)

# ── Run multiple notebooks in parallel
from concurrent.futures import ThreadPoolExecutor

months = ["2024-10", "2024-11", "2024-12"]

def run_month(month):
    pm.execute_notebook(
        "templates/monthly_report.ipynb",
        f"outputs/report_{month}.ipynb",
        parameters={"MONTH": month}
    )

with ThreadPoolExecutor(max_workers=3) as ex:
    list(ex.map(run_month, months))  # consume results so worker exceptions are raised
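Conceptually, parameter injection is a small edit to the notebook JSON: a new cell with the overriding assignments is placed right after the cell tagged "parameters", so the defaults still run first and are then overwritten. The sketch below imitates that on a plain dictionary. inject_parameters is a hypothetical helper written for this example, not Papermill's code, and its fallback of inserting at the top when no tagged cell exists is a choice made for the sketch.

```python
import copy


def inject_parameters(nb: dict, params: dict) -> dict:
    """Insert a cell of overrides right after the cell tagged 'parameters'."""
    result = copy.deepcopy(nb)
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    for index, cell in enumerate(result["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            result["cells"].insert(index + 1, injected)
            break
    else:
        # No tagged cell found: this sketch falls back to the top of the notebook.
        result["cells"].insert(0, injected)
    return result


template = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "CHURN_DAYS = 90", "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {},
         "source": "print(CHURN_DAYS)", "outputs": [], "execution_count": None},
    ]
}
run = inject_parameters(template, {"CHURN_DAYS": 60, "CUTOFF_DATE": "2024-12-01"})
print(run["cells"][1]["source"])
```

Because the template itself is never modified, the output notebook doubles as a record of exactly which parameter values a given run used.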
NBConvert — Export to Any Format
bash — NBConvert Export
# Export to HTML (inline CSS + JS, fully self-contained)
jupyter nbconvert analysis.ipynb --to html --no-input   # hide code cells
jupyter nbconvert analysis.ipynb --to html              # include code

# Export to PDF (requires LaTeX → install texlive)
jupyter nbconvert analysis.ipynb --to pdf
# Export to slides (uses Reveal.js)
jupyter nbconvert analysis.ipynb --to slides --post serve
# Export to Markdown (good for docs; image outputs are written to a companion _files directory)
jupyter nbconvert analysis.ipynb --to markdown
# Execute THEN convert in one command
jupyter nbconvert analysis.ipynb \
--to html \
--execute \
--ExecutePreprocessor.timeout=300 \
--output report.html
✓
Automated reports pattern: Schedule a cron job (or Airflow task) that runs papermill to execute the notebook with fresh data, then nbconvert to generate an HTML report, then emails it or uploads to S3. Zero infrastructure — a notebook becomes a fully automated, reproducible report.
10
Widgets & Interactivity
// IPYWIDGETS, INTERACT, VOILÀ
ipywidgets turns static notebooks into interactive dashboards — sliders, dropdowns, text inputs, and progress bars that trigger Python callbacks without writing any JavaScript.
Python — ipywidgets Patterns
# Install first (shell): pip install ipywidgets
# JupyterLab 3+: the widget extension is enabled automatically
import ipywidgets as widgets
from IPython.display import display

# ── Quick interactive plot with @interact decorator
from ipywidgets import interact
import matplotlib.pyplot as plt
import numpy as np

@interact(
    amplitude=(0.1, 2.0, 0.1),        # (min, max, step) → slider
    frequency=(1, 10),
    wave_type=['sin', 'cos', 'tan']   # list → dropdown
)
def plot_wave(amplitude=1.0, frequency=3, wave_type='sin'):
    x = np.linspace(0, 2*np.pi, 500)
    y = amplitude * getattr(np, wave_type)(frequency * x)
    plt.figure(figsize=(10, 3))
    plt.plot(x, y)
    plt.title(f"{wave_type}(x), A={amplitude}, f={frequency}")
    plt.tight_layout()
    plt.show()

# ── Manual widget construction (more control)
slider = widgets.FloatSlider(value=1.0, min=0.1, max=5.0, description='α')
dropdown = widgets.Dropdown(options=['linear','log','sqrt'], description='Scale')
button = widgets.Button(description='Run Analysis', button_style='success')
output = widgets.Output()

def on_button_click(b):
    with output:
        output.clear_output()
        print(f"Running with α={slider.value}, scale={dropdown.value}")
        # ... your analysis code here ...

button.on_click(on_button_click)
display(widgets.VBox([slider, dropdown, button, output]))

# ── Progress bar for long operations
# chunks and process() are placeholders for your own data and function
progress = widgets.IntProgress(value=0, min=0, max=len(chunks), description='Processing:')
display(progress)
for i, chunk in enumerate(chunks):
    process(chunk)
    progress.value = i + 1
Voilà — Notebook to Dashboard Deployment
Voilà renders a Jupyter notebook as a standalone web app: it strips the code and shows only outputs and widgets, so no code is visible to end users. One command serves it: voila notebook.ipynb (after pip install voila).
For complex dashboards, consider Panel (works inside and outside notebooks) or Dash (Flask-based, production-grade). Both are better choices than Voilà when you need routing, multi-page apps, or custom layouts beyond widget composition.
Notebooks are particularly dangerous for secret leakage because outputs (including printed API keys) are stored in the .ipynb JSON and committed to git. A secret hardcoded in a notebook cell may live in git history forever even after deletion.
⚠
Critical: Jupyter outputs are stored in the .ipynb file. If you print(api_key) even accidentally, that secret is now in your notebook's JSON. If committed to git, it's in history. Even if you clear the output and commit again — it's still in the previous commit. Assume it's leaked. Rotate immediately.
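Since leaked values sit in plain JSON, they can be searched for before a commit. The sketch below scans both cell sources and saved outputs for two secret-like patterns. The patterns and the helper name find_leaked_secrets are illustrative assumptions, the sample key is assembled from filler characters, and a dedicated secret scanner covers far more cases than this.

```python
import json
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "sk- style API key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
}


def find_leaked_secrets(nb: dict) -> list[tuple[int, str]]:
    """Return (cell_index, pattern_name) for each match in sources or outputs."""
    findings = []
    for index, cell in enumerate(nb.get("cells", [])):
        # Serialising the whole cell covers source, stream text, and rich outputs.
        blob = json.dumps(cell)
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(blob):
                findings.append((index, name))
    return findings


notebook = {
    "cells": [
        {"cell_type": "code", "source": ["print(api_key)"],
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["sk-" + "x" * 24 + "\n"]}]},
        {"cell_type": "code", "source": ["1 + 1"], "outputs": []},
    ]
}
print(find_leaked_secrets(notebook))
```

Note that the source of the first cell is harmless; the finding comes entirely from the stored output, which is the case the warning above describes.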
Secret Patterns (Safest First)
Python — Secret Management Patterns
## ── PATTERN 1: .env file + python-dotenv (recommended for local dev)
# .env file (NEVER commit this; add to .gitignore)
# DATABASE_URL=postgresql://user:pass@host:5432/db
# OPENAI_API_KEY=sk-proj-...
# AWS_ACCESS_KEY_ID=AKIA...
# Install first (shell): pip install python-dotenv
from dotenv import load_dotenv
import os
load_dotenv()                               # loads .env from cwd or parent dirs
db_url = os.environ["DATABASE_URL"]         # raises if missing: fail fast
api_key = os.getenv("OPENAI_API_KEY", "")   # returns "" if missing

## ── PATTERN 2: OS environment variables (CI/CD / containers)
# Set in terminal before launching jupyter:
#   export DATABASE_URL="postgresql://..."
#   jupyter lab
import os
db_url = os.environ["DATABASE_URL"]         # already available if set before launch

## ── PATTERN 3: getpass (interactive prompt, never stored)
from getpass import getpass
api_key = getpass("Enter API key: ")        # prompts silently, not stored in output

## ── PATTERN 4: HashiCorp Vault (production / team environments)
import os
import hvac
client = hvac.Client(url="https://vault.internal:8200")
client.auth.approle.login(role_id=os.environ["VAULT_ROLE_ID"],
                          secret_id=os.environ["VAULT_SECRET_ID"])
secret = client.secrets.kv.read_secret_version(path="data-team/postgres")
db_password = secret["data"]["data"]["password"]

## ── PATTERN 5: AWS Secrets Manager / cloud-native
import boto3, json
client = boto3.client("secretsmanager", region_name="us-east-1")
# Auth via instance profile / IAM role: no credentials in code
response = client.get_secret_value(SecretId="prod/myapp/db")
secret = json.loads(response["SecretString"])
password = secret["password"]

## ── WHAT NOT TO DO
# ❌ api_key = "sk-proj-abc123..."          hardcoded in cell
# ❌ password = "mypassword"                hardcoded
# ❌ %env DATABASE_URL=postgresql://...     stored in notebook JSON
# ❌ print(api_key)                         output saved to .ipynb
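The fail-fast idea from Pattern 1 can be wrapped in a small helper so every notebook reports a missing secret the same way and never echoes the full value. This is a minimal sketch: require_env, mask, MissingSecretError, and the DEMO_API_KEY variable are all hypothetical names introduced here.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "")
    if not value:
        raise MissingSecretError(f"Environment variable {name} is not set")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Show only the last few characters so the value is safe to display."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


os.environ["DEMO_API_KEY"] = "not-a-real-key-1234"  # demo value only
api_key = require_env("DEMO_API_KEY")
print(mask(api_key))
```

Printing mask(api_key) instead of the key itself means that even a saved output cell reveals only the last four characters.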
Pre-commit Hook — nbstripout Essential
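Strip outputs (and therefore any accidentally printed secrets) before every commit. Setup is a one-time step: after pip install nbstripout, either run nbstripout --install inside each clone to register the git filter, or add nbstripout to a shared pre-commit configuration so it applies to the entire team.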
# ── Generate default config files (shell)
jupyter lab --generate-config
# → ~/.jupyter/jupyter_lab_config.py
jupyter server --generate-config
# → ~/.jupyter/jupyter_server_config.py

## ── jupyter_server_config.py: production settings
# Security: token-based auth is on by default; never disable it in production.
# Leave c.ServerApp.token unset to auto-generate a token on each start.
# Setting it to '' DISABLES token auth. Set a fixed, secret value only where needed (e.g. CI):
# c.ServerApp.token = '<fixed-secret-token>'
c.ServerApp.password = ''                    # leave empty with token auth (use token OR password, not both)
c.ServerApp.open_browser = False

# Network: bind to localhost only (never 0.0.0.0 without auth + TLS)
c.ServerApp.ip = '127.0.0.1'                 # default: local only
c.ServerApp.port = 8888
c.ServerApp.allow_remote_access = False

# Root dir for file browser
c.ServerApp.root_dir = '/home/jupyter/work'

# Kernel management: kill idle kernels
c.MappingKernelManager.cull_idle_timeout = 3600   # 1 hour idle → kill
c.MappingKernelManager.cull_interval = 300        # check every 5 min
c.MappingKernelManager.cull_connected = False     # don't kill if browser tab open

# Requests: cap the request body size
c.ServerApp.max_body_size = 512 * 1024 * 1024     # 512MB upload limit

## ── Remote server with SSL (production setup)
# Generate a self-signed cert in a shell (use a proper cert in prod):
#   openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
#     -keyout jupyter.key -out jupyter.crt
c.ServerApp.certfile = '/path/to/jupyter.crt'
c.ServerApp.keyfile = '/path/to/jupyter.key'
c.ServerApp.ip = '0.0.0.0'                   # only with SSL + auth

## ── Custom kernel memory limits (via systemd or cgroups)
# kernel.json: wrap python with memory-limited systemd-run
{
  "argv": ["systemd-run", "--scope", "-p", "MemoryMax=4G",
           "/usr/bin/python", "-m", "ipykernel_launcher", "-f", "{connection_file}"]
}
ℹ
JupyterHub is the multi-user server — it manages spawning per-user notebook servers, authentication (PAM, LDAP, OAuth, GitHub), and resource allocation. Use JupyterHub when you need a shared Jupyter environment for a team. Run it behind a reverse proxy (nginx/Caddy) with TLS. Use Zero to JupyterHub with Kubernetes for scalable cloud deployments.
14
Deploy to Production
// PAPERMILL PIPELINES, VOILÀ, JUPYTERHUB, DOCKER
There are four deployment patterns for notebooks in production. Choose based on whether you're scheduling analysis, serving dashboards, or providing a shared compute environment.
Pattern | Use Case | Technology | Complexity
Scheduled Execution | Automated reports, ML retraining, data pipelines | Papermill + cron / Airflow / Prefect | LOW
Dashboard App | Business users need interactive outputs, no code | Voilà + Panel + nginx | MEDIUM
Team Compute | Data science team needs shared GPU/CPU environment | JupyterHub + Kubernetes | HIGH
ML Pipeline Step | Notebook is a training/evaluation step in a DAG | Papermill + MLflow / Kedro | MEDIUM
Pattern 1: Airflow + Papermill Pipeline
Python — Airflow DAG with Papermill
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
import papermill as pm

def run_analysis_notebook(**context):
    execution_date = context['ds']  # Airflow execution date string
    pm.execute_notebook(
        input_path="notebooks/templates/daily_report.ipynb",
        output_path=f"outputs/daily_report_{execution_date}.ipynb",
        parameters={
            "REPORT_DATE": execution_date,
            "DATA_SOURCE": "s3://my-bucket/data/",
            "OUTPUT_BUCKET": "s3://my-bucket/reports/",
        },
        kernel_name="python3",
        execution_timeout=1800,
    )

with DAG(
    dag_id="daily_analysis_report",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=10)}
) as dag:
    run_notebook = PythonOperator(
        task_id="run_analysis",
        python_callable=run_analysis_notebook,
        # Airflow 2 passes the context automatically; provide_context is no longer needed
    )
Pattern 2: Docker Deployment
Dockerfile — Production Jupyter Container
FROM jupyter/datascience-notebook:python-3.11

# Switch to root to install system packages
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Back to jovyan (non-root user, required)
USER jovyan

# Install Python packages
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy notebook templates (read-only)
COPY --chown=jovyan:users notebooks/ /home/jovyan/work/notebooks/

# Configure: no token in internal cluster (protected by k8s network policy)
ENV JUPYTER_ENABLE_LAB=yes
RUN jupyter lab --generate-config && \
    echo "c.ServerApp.token = ''" >> ~/.jupyter/jupyter_lab_config.py && \
    echo "c.ServerApp.allow_origin = '*'" >> ~/.jupyter/jupyter_lab_config.py

EXPOSE 8888
# CMD inherited from base image: starts jupyter lab
Notebooks can be tested in CI — verifying they execute top-to-bottom without errors, and optionally validating that cell outputs match expected values. Combine with Jupytext to keep .py versions that work with standard pytest.
YAML — GitHub Actions CI Pipeline
name: Notebook CI
on: [push, pull_request]
jobs:
  test-notebooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install nbval nbmake pytest papermill
      # ── Test 1: notebooks execute without error (nbmake)
      - name: Execute notebooks
        run: pytest --nbmake notebooks/ --nbmake-timeout=300
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}   # inject secrets via GitHub Secrets
          API_KEY: ${{ secrets.API_KEY }}
      # ── Test 2: validate outputs haven't changed (nbval)
      # --nbval-lax: fail on errors, compare outputs only for cells marked NBVAL_CHECK_OUTPUT
      # --nbval: strict equality on all outputs
      - name: Validate notebook outputs
        run: pytest --nbval-lax notebooks/validated/
      # ── Test 3: run via Papermill; the step fails if papermill exits non-zero
      - name: Parameterized execution test
        run: |
          papermill notebooks/report.ipynb /tmp/report_out.ipynb \
            -p TEST_MODE true \
            -p SAMPLE_SIZE 100
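What "executes top-to-bottom without errors" means can be shown with a toy runner that executes the code cells of a notebook dictionary in order inside one fresh namespace and reports the first failing cell. run_code_cells is a hypothetical helper for illustration; unlike nbmake or Papermill it starts no kernel, understands no magics, and enforces no timeout.

```python
import traceback
from typing import Optional


def run_code_cells(nb: dict) -> tuple[bool, Optional[int], str]:
    """Execute code cells in order in one fresh namespace.

    Returns (ok, failing_cell_index, error_text).
    """
    namespace: dict = {}
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception:
            return False, index, traceback.format_exc()
    return True, None, ""


good = {"cells": [
    {"cell_type": "code", "source": ["x = 2\n"]},
    {"cell_type": "markdown", "source": ["# notes"]},
    {"cell_type": "code", "source": ["y = x * 21\n", "assert y == 42\n"]},
]}
bad = {"cells": [
    {"cell_type": "code", "source": ["print(undefined_name)\n"]},
]}

print(run_code_cells(good)[:2])
print(run_code_cells(bad)[:2])
```

The fresh namespace is the important part: it is the scripted equivalent of restarting the kernel before running all cells.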
Best Practices Summary Checklist
All notebooks run top-to-bottom (restart + run all) without error
nbstripout pre-commit hook strips outputs before commit
Jupytext paired .py files for readable git diffs
Secrets loaded from env / .env / vault — never hardcoded
Parameters cell tagged for Papermill injection
Imports and config at top, not scattered through cells
Kernel registered per project environment
%autoreload for local module development
Common Pitfalls Avoid
Running cells out of order — hidden state is the #1 notebook bug
Hardcoded file paths (use pathlib, env vars, or config)
Installing packages inside notebooks with !pip install in CI
Committing large binary outputs or base64 images in .ipynb
Opening the same notebook in multiple browser tabs (kernel conflicts)
Using %cd — it changes working dir globally for the kernel
Long-running notebooks without checkpointing intermediate results
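The hidden-state pitfall at the top of this list is easy to reproduce without Jupyter. In the sketch below, three strings stand in for notebook cells and a shared dictionary stands in for the kernel namespace; the cell contents and the run helper are made up for this example.

```python
# Three "cells" that share one namespace, like cells sharing a kernel.
cells = {
    1: "rate = 0.10",
    2: "price = 200\ntotal = price * (1 + rate)",
    3: "rate = 0.25",
}


def run(order: list[int]) -> float:
    """Execute cells in the given order in a fresh namespace; return total."""
    namespace: dict = {}
    for number in order:
        exec(cells[number], namespace)
    return namespace["total"]


top_to_bottom = run([1, 2, 3])   # what "restart and run all" produces
out_of_order = run([1, 3, 2])    # cell 3 was run before re-running cell 2

print(top_to_bottom, out_of_order)
```

The same three cells give two different totals depending on execution order, and nothing in the saved notebook records which order was used. That is why the checklist insists on restart and run all before sharing results.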