Field Manual
Jupyter Notebook
DOC-JNB-2024
// Architecture Field Handbook

Jupyter
Notebook

// Project Jupyter · v7.x · JupyterLab 4.x
// "The interactive computing interface for exploration, analysis, and communication."

The complete operational guide to Jupyter Notebooks — from first cell to production deployment. Covers architecture, kernel management, magic commands, secret handling, .ipynb vs .py decision framework, and scaling to production via Papermill, NBConvert, and JupyterHub.

Interactive Computing · Data Science · Machine Learning · JupyterLab · Papermill · Production
01

What Is Jupyter

// ARCHITECTURE & CORE CONCEPTS

Jupyter Notebook is an open-source, browser-based interactive computing environment. It combines live code, rich text (Markdown), equations (LaTeX), visualizations, and narrative prose in a single shareable document called a notebook — stored as a .ipynb (IPython Notebook) JSON file.

Project Jupyter spun out of IPython in 2014 (the IPython Notebook itself first shipped in 2011) and now supports over 40 programming languages through interchangeable kernels. The name Jupyter is a tribute to three core scientific computing languages: Julia, Python, and R.

Architecture

Browser: Frontend UI (HTML/JS)
Server: Jupyter Server (Python)
Protocol: ZeroMQ Messaging
Kernel: IPython / IRkernel / etc.
Execution: Code runs here

Cells

The fundamental unit. Three types: Code (executable), Markdown (formatted text/LaTeX), and Raw (unprocessed). Cells run independently and share a kernel state.

Kernel

A separate process that executes code. The kernel maintains all variable state between cell executions. Kernels can be restarted, interrupted, or swapped independently of the notebook UI.
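
To make the shared-state idea concrete, here is a minimal sketch that models a kernel as a single namespace dictionary against which every cell is executed. It is an illustration only, not how ipykernel is implemented; `kernel_namespace`, `run_cell`, and `restart_kernel` are hypothetical names.

```python
# Toy model of a kernel: one namespace shared by every cell execution.
kernel_namespace = {}

def run_cell(source: str) -> None:
    """Execute a cell's source against the shared namespace."""
    exec(source, kernel_namespace)

def restart_kernel() -> None:
    """Restarting discards all state, like Kernel > Restart in the UI."""
    kernel_namespace.clear()

run_cell("x = 10")       # cell 1 defines x
run_cell("y = x * 2")    # cell 2 reads the x left behind by cell 1
run_cell("x = 99")       # editing and re-running cell 1 mutates state in place

# y still reflects the old x: this is the hidden-state problem
print(kernel_namespace["x"], kernel_namespace["y"])

restart_kernel()
print("x" in kernel_namespace)
```

After the restart nothing defined earlier survives, which is why "restart and run all" is the usual reproducibility check.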

.ipynb Format

Notebooks are JSON files. They store cell source, metadata, outputs (including images as base64), and kernel info. This makes them versionable but diff-noisy. Use nbstripout pre-commit to clear outputs.
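
As a rough illustration of the file format, the sketch below builds a one-cell notebook as a plain Python dictionary and clears its outputs. The `strip_outputs` helper is hypothetical and far simpler than nbstripout, which also handles metadata and git filter integration.

```python
import json

# A minimal nbformat-4 notebook written out by hand (one code cell, one output).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "cells": [
        {
            "cell_type": "code",
            "id": "cell-1",
            "metadata": {},
            "execution_count": 1,
            "source": ["print('hello')"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hello\n"]}],
        }
    ],
}

def strip_outputs(nb: dict) -> dict:
    """Return a copy with outputs and execution counts cleared from code cells."""
    clean = json.loads(json.dumps(nb))  # cheap deep copy of JSON-compatible data
    for cell in clean["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean

stripped = strip_outputs(notebook)
print(json.dumps(stripped["cells"][0], indent=2))
```

Because outputs live inside the same JSON as the source, any re-run changes the file even when the code did not change, which is where the noisy diffs come from.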

Classic vs JupyterLab vs JupyterLite: Classic Notebook (v6) is the original single-document interface. JupyterLab (v4+) is the modern IDE-like successor with multi-panel layout, file browser, terminal, and extension ecosystem — use JupyterLab for all new work. JupyterLite runs entirely in the browser via WASM — no server required, ideal for education and demos.
02

When To Use Jupyter

// USE CASES & ANTI-PATTERNS

Jupyter excels at exploratory, narrative, and iterative workflows. It is not a general-purpose application runtime. Choosing Jupyter for the wrong task creates maintenance debt and security risk.

Use Jupyter When (Good Fit)
  • Exploratory data analysis (EDA) — rapid iteration on unknown datasets
  • Data visualization prototyping — matplotlib, plotly, seaborn, altair
  • Teaching and presenting — inline outputs + Markdown narrative
  • Statistical analysis and hypothesis testing
  • Model training experimentation — comparing hyperparameters interactively
  • Documenting research workflows with reproducible code
  • One-off data transformations and ad-hoc SQL query analysis
  • Generating reports that mix prose, code, tables, and charts
Don't Use Jupyter When (Bad Fit)
  • Building production REST APIs or microservices (use FastAPI, Flask)
  • Writing shared library code intended for import by other modules
  • Long-running background jobs or daemons
  • CLI tools or scripts with argument parsing
  • Code that needs proper unit testing as a primary artifact
  • Multi-developer collaborative coding (merge conflicts on .ipynb JSON)
  • Anything requiring strict execution-order guarantees (notebook cells can be run in any order)

Domain Use Case Map

Domain | Jupyter Fits? | Typical Notebook Role
Data Engineering | PARTIAL | Pipeline prototyping, data quality checks — not production ETL
Data Science / ML | YES | EDA, feature engineering, model experimentation, evaluation reports
MLOps | PARTIAL | Parameterized training notebooks via Papermill; not inference serving
Scientific Research | YES | Reproducible analysis, publication figures, computational supplements
Business Analytics | YES | Ad-hoc analysis, executive reports, dashboard prototypes
Web Development | NO | Not applicable — use proper frameworks
DevOps Automation | NO | Use .py scripts, Ansible, or purpose-built CLI tools
Education | YES | Interactive tutorials, exercises with inline feedback via widgets
03

Setup & Installation

// LOCAL, VENV, CONDA, DOCKER

Always install Jupyter inside a virtual environment — never into the system Python. This prevents dependency conflicts between projects and makes environments reproducible.

bash — Installation Patterns
# ── Option 1: pip + venv (recommended for most projects)
python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install jupyterlab             # installs JupyterLab; add the `notebook` package for the Notebook 7 UI
jupyter lab                        # start server → opens browser

# ── Option 2: Conda (data science / ML — handles non-Python deps)
conda create -n myproject python=3.11
conda activate myproject
conda install -c conda-forge jupyterlab pandas numpy scikit-learn matplotlib
jupyter lab

# ── Option 3: uv (fast, modern — recommended for new projects)
uv venv
source .venv/bin/activate
uv pip install jupyterlab
jupyter lab

# ── Option 4: Docker (isolated, reproducible, no local Python needed)
# Newer builds of this image are published as quay.io/jupyter/datascience-notebook
docker run -p 8888:8888 \
  -v "$(pwd)":/home/jovyan/work \
  jupyter/datascience-notebook

# ── Verify installation
jupyter --version
jupyter kernelspec list            # shows installed kernels

Kernel per Project (Best Practice)

Register your project venv as a named kernel so you can switch between projects without restarting the server. Each notebook then explicitly selects its environment.

# Register current venv as a named kernel
pip install ipykernel
python -m ipykernel install --user \
  --name myproject \
  --display-name "Python (myproject)"

Useful Companion Packages
  • nbstripout — strip outputs before git commit
  • nbconvert — export to HTML, PDF, script
  • nbformat — programmatic notebook manipulation
  • jupytext — sync .ipynb ↔ .py/.md files
  • papermill — parameterize and execute notebooks
  • nbval — validate notebook outputs in CI
04

How To Use

// KEYBOARD SHORTCUTS, CELL MODES, OUTPUTS

Jupyter has two modes: Command Mode (blue border — navigate cells) and Edit Mode (cursor active — type code). Press Esc for command mode, Enter or click to enter edit mode.

Essential Keyboard Shortcuts

Shortcut | Mode | Action
Shift+Enter | Both | Run cell, move to next
Ctrl+Enter | Both | Run cell, stay on current
Alt+Enter | Both | Run cell, insert new below
A | Command | Insert cell above
B | Command | Insert cell below
D D | Command | Delete cell
M | Command | Convert to Markdown
Y | Command | Convert to Code
Z | Command | Undo cell deletion
Ctrl+S | Both | Save notebook
0 0 | Command | Restart kernel
I I | Command | Interrupt kernel
Tab | Edit | Autocomplete
Shift+Tab | Edit | Tooltip / docstring

Output Types

Python — Output Examples
# ── Last expression = displayed automatically (no print needed)
df.head()                          # renders as an HTML table

# ── Rich display protocol — any object with _repr_html_() renders richly
from IPython.display import display, HTML, Image, Markdown, IFrame
display(HTML("<b style='color:gold'>Hello</b>"))
display(Markdown("## Section Header"))
display(Image("chart.png"))

# ── Suppress output with semicolon
plt.plot([1, 2, 3]);               # trailing ; suppresses the matplotlib object repr

# ── Multiple outputs in one cell
display(df.describe())
display(df.dtypes)
# Both tables render — display() is explicit, last-expr is implicit

# ── print vs display
print(df)                          # plain text output
display(df)                        # rich HTML table output
df                                 # same as display() if last expression
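
The rich display protocol mentioned above can be sketched with a small class. `Metric` is a hypothetical example type; the only real convention used is the `_repr_html_` method name, which IPython looks for when displaying an object.

```python
class Metric:
    """Small value object that knows how to render itself in a notebook."""

    def __init__(self, name: str, value: float):
        self.name = name
        self.value = value

    def __repr__(self) -> str:
        # Plain-text fallback used by print() and non-HTML frontends
        return f"Metric({self.name!r}, {self.value!r})"

    def _repr_html_(self) -> str:
        # Used by the notebook frontend when the object is displayed
        return f"<b>{self.name}</b>: <code>{self.value:.2f}</code>"

m = Metric("accuracy", 0.9173)
print(repr(m))
print(m._repr_html_())
```

In a notebook, ending a cell with `m` would render the HTML form, while `print(m)` would fall back to the plain `__repr__`.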
05

Magic Commands

// LINE MAGIC, CELL MAGIC, AUTOMAGIC

Magic commands are special IPython directives prefixed with % (line magic) or %% (cell magic). They provide superpowers not available in plain Python — timing, profiling, shell access, multi-language execution, and more.

IPython — Magic Command Reference
# NOTE: trailing "# ..." comments are annotations for this reference. Many magics
# treat the rest of the line as arguments, so drop the comments when running them.

## ── TIMING & PROFILING ──────────────────────────────────────
%time df.groupby('user_id').sum()        # single run timing
%timeit df.groupby('user_id').sum()      # repeated runs with stats (default 7 runs; loop count chosen automatically)
%timeit -n 1000 -r 5 some_function()     # custom loops/repeats

# time entire cell
%%timeit
result = []
for i in range(10000):
    result.append(i**2)

%prun my_function()                      # cProfile (per-function statistics)
%load_ext line_profiler                  # pip install line_profiler
%lprun -f my_function my_function()      # line-by-line profile
%load_ext memory_profiler                # pip install memory_profiler
%memit my_function()                     # memory usage

## ── SHELL & FILESYSTEM ──────────────────────────────────────
%ls                                      # list directory (same as !ls)
%cd /path/to/dir                         # change directory (persistent!)
%pwd                                     # print working directory
%pip install pandas                      # installs into the kernel's environment (prefer over !pip)
!ls -la                                  # ! prefix = shell command
files = !ls *.csv                        # capture shell output as list

## ── CODE INSPECTION ─────────────────────────────────────────
%who                                     # list all variables in namespace
%whos                                    # list variables with type and value
%reset                                   # clear namespace (nuclear option)
%history                                 # show input history
%history -n -l 20                        # last 20 commands with line numbers
obj?                                     # inspect object (docstring + type)
obj??                                    # inspect object (full source code)

## ── CELL MAGIC (entire cell, not just line) ─────────────────
# run cell as bash script
%%bash
echo "Hello from bash"
ls -la

# render cell as HTML
%%html
<marquee>Hello</marquee>

# run cell as JS in browser
%%javascript
console.log("Hello from JS")

# write cell contents to file
%%writefile mymodule.py
def hello():
    return "Hello"

# capture stdout/stderr/display into the variable `output`
%%capture output
print("this won't show but is in output.stdout")

## ── DISPLAY & MATPLOTLIB ────────────────────────────────────
%matplotlib inline                       # render plots inline (static)
%matplotlib widget                       # interactive plots (ipympl)
%config InlineBackend.figure_format='retina'   # hi-DPI plots

## ── AUTO FEATURES ───────────────────────────────────────────
%load_ext autoreload                     # must load extension first
%autoreload 2                            # auto-reload imported modules on change
%automagic on                            # use magic without % prefix (risky)

## ── ENVIRONMENT & MISC ──────────────────────────────────────
%env                                     # show all env vars
%env MY_VAR=hello                        # set env var for session
%lsmagic                                 # list ALL available magic commands
%magic                                   # full magic system docs

%autoreload trick: Add %load_ext autoreload + %autoreload 2 at the top of any notebook that imports local modules. This automatically reloads modified modules when you re-execute cells — eliminates the "restart kernel to see changes" cycle during development.
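
For context on why the reload step matters, the following plain-Python sketch shows that an imported module stays cached after its file changes, and that `importlib.reload` picks up the edit. This is roughly the step `%autoreload 2` automates; the extension does more, such as updating existing objects. The module name `demo_features` is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module to a temp directory and make it importable.
tmp_dir = Path(tempfile.mkdtemp())
module_path = tmp_dir / "demo_features.py"
module_path.write_text("def scale(x):\n    return x * 2\n")
sys.path.insert(0, str(tmp_dir))
sys.dont_write_bytecode = True   # avoid stale .pyc files in this demo

import demo_features
before = demo_features.scale(10)

# Simulate editing the file on disk, as you would in an editor.
module_path.write_text("def scale(x):\n    return x * 100\n")
stale = demo_features.scale(10)  # still the old code: the module is cached

importlib.invalidate_caches()
importlib.reload(demo_features)  # re-executes the module source
after = demo_features.scale(10)

print(before, stale, after)
```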
06

Kernels

// MULTI-LANGUAGE EXECUTION

A kernel is a language runtime that Jupyter communicates with over ZeroMQ. Any language with a Jupyter kernel can be used as a first-class citizen in notebooks. The kernel maintains state between cell executions: variables, imports, and functions persist until the kernel is restarted.

  • IPython (Python 3.x): Default and most popular. Install: pip install ipykernel. Register with python -m ipykernel install --user --name myenv
  • IRkernel (R 4.x): Full R environment. Install from R console: install.packages('IRkernel'); IRkernel::installspec()
  • IJulia (Julia 1.x): Julia REPL in Jupyter. Install: using Pkg; Pkg.add("IJulia") from Julia REPL.
  • xeus-cling (C++17): Interactive C++ via cling interpreter. Install via conda: conda install -c conda-forge xeus-cling
  • ijavascript (Node.js): Node.js kernel. Install: npm install -g ijavascript; ijsinstall
  • SoS (Polyglot): Script of Scripts — mix multiple languages in one notebook with variable passing between them. Install: pip install sos-notebook
bash — Kernel Management
# List installed kernels
jupyter kernelspec list

# Remove a kernel
jupyter kernelspec remove myoldenv

# View kernel JSON spec (Linux per-user location)
cat ~/.local/share/jupyter/kernels/myproject/kernel.json

# kernel.json structure. JSON does not allow comments, so note here that
# the "env" block injects env vars into this kernel.
{
  "argv": ["/path/to/python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python (myproject)",
  "language": "python",
  "env": {
    "PYTHONPATH": "/my/extra/path"
  }
}
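
To show how a kernel spec turns into a process launch, here is a minimal sketch that parses a kernel.json and fills in the `{connection_file}` placeholder. `build_launch_command` and the connection file path are hypothetical; the real launch logic lives in jupyter_client and does more (environment merging, other placeholders).

```python
import json

# Example kernel.json content (paths are placeholders).
kernel_json = """
{
  "argv": ["/path/to/python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python (myproject)",
  "language": "python",
  "env": {"PYTHONPATH": "/my/extra/path"}
}
"""

def build_launch_command(spec: dict, connection_file: str) -> list[str]:
    """Substitute the {connection_file} placeholder in the spec's argv."""
    return [arg.replace("{connection_file}", connection_file) for arg in spec["argv"]]

spec = json.loads(kernel_json)
cmd = build_launch_command(spec, "/tmp/kernel-1234.json")  # hypothetical path
print(cmd)
```

The connection file is where the server writes the ZeroMQ ports and signing key that the kernel process then reads on startup.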
07

.ipynb vs .py

// THE DECISION FRAMEWORK

This is one of the most important architectural decisions in any data-heavy project. Using the wrong format creates technical debt, breaks automation, or makes collaboration harder. The answer depends on who runs the code, how often, and why.

📓 Use .ipynb (Notebook)
  • Exploratory analysis where you need to see intermediate outputs
  • Communicating results to non-technical stakeholders
  • Tutorials, documentation with executable examples
  • One-time or infrequent analyses on a specific dataset
  • Prototyping a model before productionizing
  • Research with narrative reasoning between code blocks
  • Interactive visualization with widgets
  • Parameterized reports via Papermill
🐍 Use .py (Script / Module)
  • Code that will be imported by other modules
  • Production pipelines run automatically on a schedule
  • Code requiring unit tests and test coverage metrics
  • CLI tools with argument parsing (argparse, Click, Typer)
  • Shared utility functions used across multiple notebooks
  • Application code — web servers, data validation, business logic
  • Code in version control with meaningful diffs
  • Anything run in a CI/CD pipeline unattended

The Graduation Path

Most successful projects use both. The standard pattern is to prototype in a notebook, then graduate mature, reusable code to .py modules that the notebook imports.

Python — Notebook + Module Pattern
# ── Project structure: notebooks use modules, not the reverse
project/
├── notebooks/
│   ├── 01_eda.ipynb                   # exploration — messy, disposable
│   ├── 02_feature_engineering.ipynb
│   └── 03_model_evaluation.ipynb      # final report — clean, presentable
├── src/
│   ├── __init__.py
│   ├── features.py                    # graduated from notebook → tested module
│   ├── models.py
│   └── utils.py
├── tests/
│   └── test_features.py               # .py modules are testable; .ipynb are not (easily)
└── data/

# ── In notebook: import your own modules
%load_ext autoreload
%autoreload 2
import sys; sys.path.insert(0, '../')
from src.features import build_feature_matrix
from src.utils import load_config

# Notebook handles: exploration + visualization + narrative
# .py handles: reusable logic + testing + production execution
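
As a sketch of what "graduating" code looks like, the example below shows a pure function of the kind that might move into `src/features.py`, plus the pytest-style test that becomes possible once it lives in a module. The function bodies and field names (`user_id`, `last_login`, `days_inactive`) are assumptions for illustration, not the handbook's actual implementation.

```python
from datetime import date

def days_since_last_login(last_login: date, as_of: date) -> int:
    """Pure function: easy to import from a notebook and to unit-test."""
    return (as_of - last_login).days

def build_feature_matrix(rows: list[dict], as_of: date) -> list[dict]:
    """Hypothetical stand-in for src/features.py: one feature dict per input row."""
    return [
        {
            "user_id": row["user_id"],
            "days_inactive": days_since_last_login(row["last_login"], as_of),
        }
        for row in rows
    ]

# What tests/test_features.py might contain (pytest collects test_* functions).
def test_build_feature_matrix():
    rows = [{"user_id": 1, "last_login": date(2024, 11, 1)}]
    features = build_feature_matrix(rows, as_of=date(2024, 12, 1))
    assert features == [{"user_id": 1, "days_inactive": 30}]

test_build_feature_matrix()
print("feature test passed")
```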

Converting Between Formats

bash — Format Conversion
# Convert .ipynb → .py script (strips outputs)
jupyter nbconvert --to script analysis.ipynb
# → outputs analysis.py with # In[N]: cell markers

# Jupytext: bidirectional sync — edit .py, see changes in .ipynb
pip install jupytext
jupytext --to py:percent analysis.ipynb     # convert with "# %%" cell markers
jupytext --to notebook analysis.py          # back to notebook
jupytext --sync analysis.ipynb              # sync paired files

# Pair a notebook for dual-format version control
jupytext --set-formats ipynb,py:percent analysis.ipynb
# Now both files stay in sync — commit .py for clean diffs, .ipynb optionally
Jupytext + nbstripout workflow: Use Jupytext to maintain a paired .py version of every notebook (commit the .py, not the .ipynb). Add nbstripout as a pre-commit hook to strip outputs from any .ipynb files that do get committed. This gives you readable git diffs and avoids 10MB notebook blobs in your repo history.
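
To illustrate why the percent format diffs cleanly, here is a simplified sketch that splits a `py:percent` script into cells at `# %%` markers. It is not Jupytext's parser (which also handles cell metadata, titles, and Markdown comment stripping); `split_percent_script` is a hypothetical helper.

```python
def split_percent_script(text: str) -> list[dict]:
    """Split a py:percent script into cells at lines starting with '# %%'."""
    cells = []
    current = None
    for line in text.splitlines():
        if line.startswith("# %%"):
            # '# %% [markdown]' opens a Markdown cell; a bare '# %%' opens a code cell
            cell_type = "markdown" if "[markdown]" in line else "code"
            current = {"cell_type": cell_type, "source": []}
            cells.append(current)
        elif current is not None:
            current["source"].append(line)
    return cells

script = """\
# %% [markdown]
# # Churn analysis

# %%
import math
print(math.sqrt(16))
"""

for cell in split_percent_script(script):
    print(cell["cell_type"], cell["source"])
```

The script form is ordinary Python text with comment markers, so git sees line-level changes instead of rewritten JSON.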
08

Notebook Structure

// OPINIONATED TEMPLATE FOR PRODUCTION-GRADE NOTEBOOKS

A well-structured notebook is reproducible from top-to-bottom (kernel restart → run all with no errors), self-documenting, and separates configuration from logic. These are the conventions that make notebooks maintainable months later.

Python — Canonical Notebook Template
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 1 — Title (Markdown)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
"""
# Customer Churn Analysis — Q4 2024
**Purpose**: Identify leading indicators of churn in the enterprise segment.
**Author**: Data Science Team
**Last Updated**: 2024-12-01
**Data**: `data/crm_export_2024-12.csv`
"""

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 2 — Parameters (top of notebook = easy to find)
# Tag this cell "parameters"; Papermill adds its overrides in a new cell right after it
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DATA_PATH = "data/crm_export_2024-12.csv"
OUTPUT_DIR = "outputs/"
CUTOFF_DATE = "2024-11-01"
CHURN_DAYS = 90

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 3 — Imports (all at top, organized: stdlib → 3rd party → local)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import os, sys, warnings
from pathlib import Path
from datetime import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sys.path.insert(0, "../")
from src.utils import load_config

warnings.filterwarnings('ignore')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Cell 4 — Data Loading
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
df = pd.read_csv(DATA_PATH, parse_dates=['created_at', 'last_login'])
print(f"Loaded {len(df):,} rows, {df.shape[1]} columns")
df.head()

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Sections: 1. EDA  2. Feature Engineering  3. Modeling  4. Results
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
09

Advanced Features

// PAPERMILL, NBCONVERT, PARALLEL EXECUTION

Papermill — Parameterized Notebook Execution

Papermill executes notebooks programmatically with injected parameters. It's the backbone of notebook-based ML pipelines — run the same notebook against different datasets, date ranges, or model configs.

Python + bash — Papermill
# ── Setup: tag the parameters cell in JupyterLab
# Select the config cell → Property Inspector (gear icon, right sidebar) → add tag "parameters"
# (Classic Notebook 6: View → Cell Toolbar → Tags)

# ── CLI execution
pip install papermill
papermill input.ipynb output_2024-12.ipynb \
  -p DATA_PATH "data/dec.csv" \
  -p CUTOFF_DATE "2024-12-01" \
  -p CHURN_DAYS 60

# ── Python API — run programmatically in a pipeline
import papermill as pm

month = "2024-12"              # example values; supply these from your scheduler
cutoff_str = "2024-12-01"

pm.execute_notebook(
    input_path="templates/churn_analysis.ipynb",
    output_path=f"outputs/churn_{month}.ipynb",
    parameters={
        "DATA_PATH": f"data/crm_{month}.csv",
        "CUTOFF_DATE": cutoff_str,
    },
    kernel_name="python3",
    execution_timeout=600,     # 10 min timeout
    progress_bar=False,        # suppress in CI
)

# ── Run multiple notebooks in parallel
from concurrent.futures import ThreadPoolExecutor

months = ["2024-10", "2024-11", "2024-12"]

def run_month(month):
    pm.execute_notebook(
        "templates/monthly_report.ipynb",
        f"outputs/report_{month}.ipynb",
        parameters={"MONTH": month},
    )

with ThreadPoolExecutor(max_workers=3) as ex:
    list(ex.map(run_month, months))   # list() consumes the results so worker exceptions are raised
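
Conceptually, parameter injection works by adding a new code cell after the cell tagged `parameters`, so the injected assignments override the defaults. The sketch below mimics that on a plain dictionary; it is a simplified model under that assumption, not Papermill's code, and `inject_parameters` is a hypothetical helper.

```python
import copy

def inject_parameters(nb: dict, parameters: dict) -> dict:
    """Insert an overrides cell right after the first cell tagged 'parameters'."""
    result = copy.deepcopy(nb)
    source = "\n".join(f"{name} = {value!r}" for name, value in parameters.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    for index, cell in enumerate(result["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            result["cells"].insert(index + 1, injected)
            break
    else:
        # No tagged cell found: fall back to putting the overrides first
        result["cells"].insert(0, injected)
    return result

template = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "CHURN_DAYS = 90", "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {},
         "source": "print(CHURN_DAYS)", "outputs": [], "execution_count": None},
    ]
}

run = inject_parameters(template, {"CHURN_DAYS": 60, "DATA_PATH": "data/dec.csv"})
print(run["cells"][1]["source"])
```

Because the defaults cell still runs first, the notebook remains usable interactively without any injected values.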

NBConvert — Export to Any Format

bash — NBConvert Export
# Export to HTML (inline CSS + JS — fully self-contained)
jupyter nbconvert analysis.ipynb --to html --no-input    # hide code cells
jupyter nbconvert analysis.ipynb --to html               # include code

# Export to PDF (requires pandoc and a LaTeX distribution with xelatex, e.g. TeX Live)
jupyter nbconvert analysis.ipynb --to pdf

# Export to slides (uses Reveal.js)
jupyter nbconvert analysis.ipynb --to slides --post serve

# Export to Markdown (outputs are kept; images are written to a companion folder)
jupyter nbconvert analysis.ipynb --to markdown

# Execute THEN convert in one command
jupyter nbconvert analysis.ipynb \
  --to html \
  --execute \
  --ExecutePreprocessor.timeout=300 \
  --output report.html
Automated reports pattern: Schedule a cron job (or Airflow task) that runs papermill to execute the notebook with fresh data, then nbconvert to generate an HTML report, then emails it or uploads it to S3. With nothing beyond a scheduler, a notebook becomes a fully automated, reproducible report.
10

Widgets & Interactivity

// IPYWIDGETS, INTERACT, VOILÀ

ipywidgets turns static notebooks into interactive dashboards — sliders, dropdowns, text inputs, and progress bars that trigger Python callbacks without writing any JavaScript.

Python — ipywidgets Patterns
%pip install ipywidgets            # JupyterLab 3+ picks up the widget extension automatically

import ipywidgets as widgets
from IPython.display import display

# ── Quick interactive plot with @interact decorator
from ipywidgets import interact, fixed
import matplotlib.pyplot as plt
import numpy as np

@interact(
    amplitude=(0.1, 2.0, 0.1),           # (min, max, step) → slider
    frequency=(1, 10),
    wave_type=['sin', 'cos', 'tan'],     # list → dropdown
)
def plot_wave(amplitude=1.0, frequency=3, wave_type='sin'):
    x = np.linspace(0, 2*np.pi, 500)
    y = amplitude * getattr(np, wave_type)(frequency * x)
    plt.figure(figsize=(10, 3))
    plt.plot(x, y)
    plt.title(f"{wave_type}(x), A={amplitude}, f={frequency}")
    plt.tight_layout()
    plt.show()

# ── Manual widget construction (more control)
slider = widgets.FloatSlider(value=1.0, min=0.1, max=5.0, description='α')
dropdown = widgets.Dropdown(options=['linear', 'log', 'sqrt'], description='Scale')
button = widgets.Button(description='Run Analysis', button_style='success')
output = widgets.Output()

def on_button_click(b):
    with output:
        output.clear_output()
        print(f"Running with α={slider.value}, scale={dropdown.value}")
        # ... your analysis code here ...

button.on_click(on_button_click)
display(widgets.VBox([slider, dropdown, button, output]))

# ── Progress bar for long operations
# `chunks` and `process` are placeholders for your own data and function
progress = widgets.IntProgress(value=0, min=0, max=len(chunks), description='Processing:')
display(progress)
for i, chunk in enumerate(chunks):
    process(chunk)
    progress.value = i + 1

Voilà — Notebook to Dashboard Deployment

Voilà renders a Jupyter notebook as a standalone web app — strips the code, shows only outputs and widgets. No code visible to end users. One command to serve:

pip install voila
voila analysis.ipynb                            # serve locally
voila analysis.ipynb --port 8866 --no-browser

Panel & Plotly Dash Alternatives

For complex dashboards, consider Panel (works inside and outside notebooks) or Dash (Flask-based, production-grade). Both are better choices than Voilà when you need routing, multi-page apps, or custom layouts beyond widget composition.

11

Extensions & JupyterLab

// ESSENTIAL EXTENSIONS & LAB SETUP
Extension / Tool | Purpose | Install
jupyterlab-git | Git GUI inside JupyterLab — stage, commit, diff, branch | pip install jupyterlab-git
jupyterlab-lsp | Language Server Protocol — autocomplete, hover docs, go-to-definition | pip install jupyterlab-lsp python-lsp-server
jupyterlab-variableinspector | Live variable inspector sidebar — see all variables, types, shapes | pip install lckr-jupyterlab-variableinspector
aquirdturtle_collapsible_headings | Collapse/expand notebook sections via Markdown headers | pip install aquirdturtle_collapsible_headings
jupyterlab-spellchecker | Spellcheck in Markdown cells | pip install jupyterlab-spellchecker
nbdime | Human-readable notebook diffs and merges for git | pip install nbdime; nbdime config-git --enable --global
ipympl | Interactive matplotlib widgets (pan, zoom, update data) | pip install ipympl (then use %matplotlib widget)
jupyterlab-code-formatter | Format code cells with Black/isort on save | pip install jupyterlab_code_formatter black isort
12

Managing Secrets

// NEVER HARDCODE CREDENTIALS IN NOTEBOOKS

Notebooks are particularly dangerous for secret leakage because outputs (including printed API keys) are stored in the .ipynb JSON and committed to git. A secret hardcoded in a notebook cell may live in git history forever even after deletion.

Critical: Jupyter outputs are stored in the .ipynb file. If you print(api_key) even accidentally, that secret is now in your notebook's JSON. If committed to git, it's in history. Even if you clear the output and commit again — it's still in the previous commit. Assume it's leaked. Rotate immediately.
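
A quick way to see the risk is to scan a notebook's JSON for secret-shaped strings in both source and stored outputs. The sketch below is illustrative only: the two regular expressions are rough assumptions about key shapes, `find_secrets` is a hypothetical helper, and a dedicated scanner covers far more cases.

```python
import json
import re

# Illustrative patterns only; dedicated scanners ship far more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_sk_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
}

def find_secrets(nb: dict) -> list[tuple[int, str, str]]:
    """Return (cell_index, location, pattern_name) for each match in source or outputs."""
    hits = []
    for index, cell in enumerate(nb.get("cells", [])):
        source = "".join(cell.get("source", []))
        outputs = json.dumps(cell.get("outputs", []))  # flatten outputs to one string
        for location, text in (("source", source), ("outputs", outputs)):
            for name, pattern in SECRET_PATTERNS.items():
                if pattern.search(text):
                    hits.append((index, location, name))
    return hits

leaky = {
    "cells": [{
        "cell_type": "code",
        "source": ["print(key)"],
        "outputs": [{"output_type": "stream", "name": "stdout",
                     "text": ["AKIAABCDEFGHIJKLMNOP\n"]}],  # fake key for the demo
    }]
}
print(find_secrets(leaky))
```

Note that the source cell here is clean; the leak exists only in the saved output, which is exactly the case that is easy to miss in review.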

Secret Patterns (Simplest First)

Python — Secret Management Patterns
## ── PATTERN 1: .env file + python-dotenv (recommended for local dev)
# .env file (NEVER commit this — add to .gitignore)
#   DATABASE_URL=postgresql://user:pass@host:5432/db
#   OPENAI_API_KEY=sk-proj-...
#   AWS_ACCESS_KEY_ID=AKIA...
# Install once in the environment: pip install python-dotenv
from dotenv import load_dotenv
import os

load_dotenv()                                  # loads .env from cwd or parent dirs
db_url = os.environ["DATABASE_URL"]            # raises if missing — fail fast
api_key = os.getenv("OPENAI_API_KEY", "")      # returns "" if missing

## ── PATTERN 2: OS environment variables (CI/CD / containers)
# Set in terminal before launching jupyter:
#   export DATABASE_URL="postgresql://..."
#   jupyter lab
import os
db_url = os.environ["DATABASE_URL"]            # already available if set before launch

## ── PATTERN 3: getpass (interactive prompt — never stored)
from getpass import getpass
api_key = getpass("Enter API key: ")           # prompts silently, not stored in output

## ── PATTERN 4: HashiCorp Vault (production / team environments)
import hvac
client = hvac.Client(url="https://vault.internal:8200")
client.auth.approle.login(
    role_id=os.environ["VAULT_ROLE_ID"],
    secret_id=os.environ["VAULT_SECRET_ID"],
)
secret = client.secrets.kv.read_secret_version(path="data-team/postgres")
db_password = secret["data"]["data"]["password"]

## ── PATTERN 5: AWS Secrets Manager / cloud-native
import boto3, json
client = boto3.client("secretsmanager", region_name="us-east-1")
# Auth via instance profile / IAM role — no credentials in code
response = client.get_secret_value(SecretId="prod/myapp/db")
secret = json.loads(response["SecretString"])
password = secret["password"]

## ── WHAT NOT TO DO
# ❌ api_key = "sk-proj-abc123..."          hardcoded in cell
# ❌ password = "mypassword"                hardcoded
# ❌ %env DATABASE_URL=postgresql://...     stored in notebook JSON
# ❌ print(api_key)                         output saved to .ipynb

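
Two small helpers can make the patterns above harder to misuse: one that fails fast with a readable message, and one that masks a value so a debugging print does not write the secret into the notebook's outputs. Both `require_env` and `mask` are hypothetical helpers sketched for illustration, and the environment variable is a demo value.

```python
import os

def require_env(name: str) -> str:
    """Return the variable's value or raise with an actionable message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it before launching Jupyter"
        )
    return value

def mask(secret: str, visible: int = 4) -> str:
    """Show only the last few characters, so the result is safe to print."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]

os.environ["DEMO_API_KEY"] = "sk-demo-not-a-real-key"   # demo value only
api_key = require_env("DEMO_API_KEY")
print(mask(api_key))
```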
Pre-commit Hook — nbstripout (Essential)

Strip outputs (and therefore any accidentally printed secrets) before every commit. One-time setup, works for the entire team.

pip install nbstripout pre-commit

# .pre-commit-config.yaml:
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1
    hooks:
      - id: nbstripout

pre-commit install

Scan History for Leaks (If In Doubt)

If you suspect a secret was committed, scan immediately and rotate regardless.

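# gitleaks is a standalone Go binary, not a pip package
brew install gitleaks              # or download a release binary for your platform

# Scan entire git history
gitleaks detect --source . --verbose

# Clean history (nuclear option; requires: pip install git-filter-repo)
git filter-repo --invert-paths \
  --path sensitive_notebook.ipynb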
13

Advanced Configuration

// JUPYTER_SERVER, JUPYTERHUB, RESOURCE LIMITS
bash + Python — Server Configuration
# ── Generate default config files
jupyter lab --generate-config        # → ~/.jupyter/jupyter_lab_config.py
jupyter server --generate-config     # → ~/.jupyter/jupyter_server_config.py

## ── jupyter_server_config.py — production settings
# Security: token-based auth is on by default. Leave the token unset so a random
# one is generated at startup. Setting it to '' DISABLES authentication.
# (Recent Jupyter Server releases prefer c.IdentityProvider.token; the ServerApp name is the older spelling.)
# c.ServerApp.token = 'fixed-token-for-ci'    # only when a known token is needed, e.g. CI
# c.ServerApp.password = ''                   # hashed password; use token OR password, not both
c.ServerApp.open_browser = False

# Network: bind to localhost only (never 0.0.0.0 without auth + TLS)
c.ServerApp.ip = '127.0.0.1'                  # local only (the default is 'localhost')
c.ServerApp.port = 8888
c.ServerApp.allow_remote_access = False

# Root dir for file browser
c.ServerApp.root_dir = '/home/jupyter/work'

# Kernel management: kill idle kernels
c.MappingKernelManager.cull_idle_timeout = 3600   # 1 hour idle → kill
c.MappingKernelManager.cull_interval = 300        # check every 5 min
c.MappingKernelManager.cull_connected = False     # don't kill if browser tab open

# Uploads: cap the HTTP request body size
c.ServerApp.max_body_size = 512 * 1024 * 1024     # 512MB upload limit

## ── Remote server with SSL (production setup)
# Generate self-signed cert (use proper cert in prod)
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout jupyter.key -out jupyter.crt

c.ServerApp.certfile = '/path/to/jupyter.crt'
c.ServerApp.keyfile = '/path/to/jupyter.key'
c.ServerApp.ip = '0.0.0.0'                        # only with SSL + auth

## ── Custom kernel memory limits (via systemd or cgroups)
# kernel.json: wrap python with memory-limited systemd-run
# (a non-root server will likely need systemd-run --user)
{
  "argv": ["systemd-run", "--scope", "-p", "MemoryMax=4G",
           "/usr/bin/python", "-m", "ipykernel_launcher",
           "-f", "{connection_file}"]
}

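
The three culling settings interact, which the following sketch tries to make explicit. It is a simplified model of the decision, not Jupyter Server's actual code (the real manager also considers whether a kernel is busy); `should_cull` and its parameters are hypothetical names that mirror the config options.

```python
from datetime import datetime, timedelta

def should_cull(last_activity: datetime, now: datetime, *,
                idle_timeout: int, connections: int, cull_connected: bool) -> bool:
    """Simplified model of the idle-kernel culling decision."""
    if idle_timeout <= 0:
        return False                  # a timeout of 0 disables culling
    if connections > 0 and not cull_connected:
        return False                  # an open browser tab keeps the kernel alive
    return now - last_activity > timedelta(seconds=idle_timeout)

now = datetime(2024, 12, 1, 12, 0, 0)
two_hours_ago = now - timedelta(hours=2)

# Idle for two hours with no client attached: eligible for culling
print(should_cull(two_hours_ago, now, idle_timeout=3600, connections=0, cull_connected=False))

# Same idle time, but a tab is still connected and cull_connected is False
print(should_cull(two_hours_ago, now, idle_timeout=3600, connections=1, cull_connected=False))
```

The check itself only runs every `cull_interval` seconds, so a kernel can outlive its timeout by up to one interval.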
JupyterHub is the multi-user server — it manages spawning per-user notebook servers, authentication (PAM, LDAP, OAuth, GitHub), and resource allocation. Use JupyterHub when you need a shared Jupyter environment for a team. Run it behind a reverse proxy (nginx/Caddy) with TLS. Use Zero to JupyterHub with Kubernetes for scalable cloud deployments.
14

Deploy to Production

// PAPERMILL PIPELINES, VOILÀ, JUPYTERHUB, DOCKER

There are four deployment patterns for notebooks in production. Choose based on whether you're scheduling analysis, serving dashboards, or providing a shared compute environment.

Pattern | Use Case | Technology | Complexity
Scheduled Execution | Automated reports, ML retraining, data pipelines | Papermill + cron / Airflow / Prefect | LOW
Dashboard App | Business users need interactive outputs, no code | Voilà + Panel + nginx | MEDIUM
Team Compute | Data science team needs shared GPU/CPU environment | JupyterHub + Kubernetes | HIGH
ML Pipeline Step | Notebook is a training/evaluation step in a DAG | Papermill + MLflow / Kedro | MEDIUM

Pattern 1: Airflow + Papermill Pipeline

Python — Airflow DAG with Papermill
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
import papermill as pm

def run_analysis_notebook(**context):
    execution_date = context['ds']    # Airflow execution date string (YYYY-MM-DD)
    pm.execute_notebook(
        input_path="notebooks/templates/daily_report.ipynb",
        output_path=f"outputs/daily_report_{execution_date}.ipynb",
        parameters={
            "REPORT_DATE": execution_date,
            "DATA_SOURCE": "s3://my-bucket/data/",
            "OUTPUT_BUCKET": "s3://my-bucket/reports/",
        },
        kernel_name="python3",
        execution_timeout=1800,
    )

with DAG(
    dag_id="daily_analysis_report",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow 2.4+; older 2.x releases use schedule_interval
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=10)},
) as dag:
    # Airflow 2 passes the context to the callable automatically,
    # so the old provide_context=True argument is not needed.
    run_notebook = PythonOperator(
        task_id="run_analysis",
        python_callable=run_analysis_notebook,
    )

Pattern 2: Docker Deployment

Dockerfile — Production Jupyter Container
FROM jupyter/datascience-notebook:python-3.11

# Switch to root to install system packages
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Back to jovyan (non-root user — required)
USER jovyan

# Install Python packages
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy notebook templates (owned by jovyan)
COPY --chown=jovyan:users notebooks/ /home/jovyan/work/notebooks/

# Configure: no token in internal cluster (protected by k8s network policy)
# WARNING: an empty token disables authentication and allow_origin='*' accepts
# requests from any origin. Use only behind a strict network policy or an
# authenticating proxy, never on a reachable network.
ENV JUPYTER_ENABLE_LAB=yes
RUN jupyter lab --generate-config && \
    echo "c.ServerApp.token = ''" >> ~/.jupyter/jupyter_lab_config.py && \
    echo "c.ServerApp.allow_origin = '*'" >> ~/.jupyter/jupyter_lab_config.py

EXPOSE 8888
# CMD inherited from base image: starts jupyter lab

Pattern 3: Voilà as a Web App

bash + nginx — Voilà Production Deployment
# ── Start Voilà server (listens on localhost by default — nginx handles TLS)
# Newer Voilà releases may name the option below file_allowlist.
voila dashboard.ipynb \
  --port 8866 \
  --no-browser \
  --VoilaConfiguration.file_whitelist="['.*\.(png|jpg|gif|svg|mp4|avi)']"

# ── nginx config (TLS termination + proxy)
server {
    listen 443 ssl;
    server_name dashboard.internal.co;

    ssl_certificate     /etc/ssl/certs/dashboard.crt;
    ssl_certificate_key /etc/ssl/private/dashboard.key;

    location / {
        proxy_pass http://127.0.0.1:8866;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";   # required for WebSockets
        proxy_set_header Host $host;
        proxy_read_timeout 86400;                # long timeout for kernel sessions
    }
}

# ── systemd service for Voilà
[Unit]
Description=Voila Dashboard
After=network.target

[Service]
User=jupyter
WorkingDirectory=/home/jupyter/app
Environment="PATH=/home/jupyter/.venv/bin:/usr/bin:/bin"
ExecStart=/home/jupyter/.venv/bin/voila dashboard.ipynb --port 8866 --no-browser
Restart=always

[Install]
WantedBy=multi-user.target
15

CI/CD & Testing

// NBVAL, PYTEST, GITHUB ACTIONS

Notebooks can be tested in CI — verifying they execute top-to-bottom without errors, and optionally validating that cell outputs match expected values. Combine with Jupytext to keep .py versions that work with standard pytest.

YAML — GitHub Actions CI Pipeline
name: Notebook CI
on: [push, pull_request]

jobs:
  test-notebooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with: { python-version: "3.11" }

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install nbval nbmake pytest papermill

      # ── Test 1: notebooks execute without error (nbmake)
      - name: Execute notebooks
        run: pytest --nbmake notebooks/ --nbmake-timeout=300
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}   # inject secrets via GitHub Secrets
          API_KEY: ${{ secrets.API_KEY }}

      # ── Test 2: validate outputs haven't changed (nbval)
      # --nbval: compare every cell's stored output against a fresh run
      # --nbval-lax: run every cell, but compare only cells marked # NBVAL_CHECK_OUTPUT
      - name: Validate notebook outputs
        run: pytest --nbval-lax notebooks/validated/

      # ── Test 3: run via Papermill; a non-zero exit code fails the step
      - name: Parameterized execution test
        run: |
          papermill notebooks/report.ipynb /tmp/report_out.ipynb \
            -p TEST_MODE true \
            -p SAMPLE_SIZE 100

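
When a pipeline keeps the executed notebook as an artifact, a small post-run check can report which cells failed. The sketch below walks the notebook JSON looking for outputs of type `error`, as defined by the notebook format; `collect_errors` is a hypothetical helper and the sample notebook is hand-written for the demo.

```python
def collect_errors(nb: dict) -> list[tuple[int, str, str]]:
    """Return (cell_index, ename, evalue) for every error output in an executed notebook."""
    errors = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename", ""), output.get("evalue", "")))
    return errors

executed = {
    "cells": [
        {"cell_type": "markdown", "source": "# Report"},
        {"cell_type": "code", "source": "1 / 0", "outputs": [
            {"output_type": "error", "ename": "ZeroDivisionError",
             "evalue": "division by zero", "traceback": []},
        ]},
    ]
}
print(collect_errors(executed))
```

In practice the dictionary would come from `json.load` on the output notebook path, and a non-empty result would fail the job.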
Best Practices Summary (Checklist)
  • All notebooks run top-to-bottom (restart + run all) without error
  • nbstripout pre-commit hook strips outputs before commit
  • Jupytext paired .py files for readable git diffs
  • Secrets loaded from env / .env / vault — never hardcoded
  • Parameters cell tagged for Papermill injection
  • Imports and config at top, not scattered through cells
  • Kernel registered per project environment
  • %autoreload for local module development
Common Pitfalls (Avoid)
  • Running cells out of order — hidden state is the #1 notebook bug
  • Hardcoded file paths (use pathlib, env vars, or config)
  • Installing packages inside notebooks with !pip install in CI
  • Committing large binary outputs or base64 images in .ipynb
  • Opening the same notebook in multiple browser tabs (kernel conflicts)
  • Using %cd — it changes working dir globally for the kernel
  • Long-running notebooks without checkpointing intermediate results
Reference resources: jupyter.org/documentation · papermill.readthedocs.io · nbconvert.readthedocs.io · jupytext.readthedocs.io · voila.readthedocs.io · jupyterhub.readthedocs.io · zero-to-jupyterhub.readthedocs.io · nbval.readthedocs.io