Google Cloud (GCP) Engineer Handbook
A practical, data-focused guide to Google Cloud architecture — compute, BigQuery, networking, cost management, and security patterns with Terraform and Google Cloud Python Client Libraries.
Table of Contents
Module 1: Compute & Containers
GCP offers compute across the full abstraction spectrum — from Compute Engine VMs you fully control, to Cloud Functions where you write a single handler and GCP manages everything else. GKE sits in the middle as the premier managed Kubernetes offering in any cloud.
Compute Engine
Compute Engine provides virtual machines running on Google's infrastructure. You select a Machine Family (general-purpose, compute-optimized, memory-optimized, accelerator-optimized), choose an OS image, and pick a zone. Unlike AWS, GCP automatically applies Sustained Use Discounts — no upfront commitment needed.
Machine Families
| Family | Series | Use Case | Example |
|---|---|---|---|
| General Purpose | E2, N2, N2D, T2D, C3 | Web servers, dev/test, small databases, microservices | e2-medium (2 vCPU / 4 GB) |
| Compute-Optimized | C2, C2D, H3 | Batch processing, gaming servers, HPC, CI/CD | c2-standard-8 (8 vCPU / 32 GB) |
| Memory-Optimized | M1, M2, M3 | SAP HANA, large in-memory databases, real-time analytics | m2-ultramem-208 (208 vCPU / 5.8 TB) |
| Accelerator-Optimized | A2, A3, G2 | ML training/inference, video transcoding, GPU workloads | a2-highgpu-1g (12 vCPU / 85 GB + A100) |
Pricing Models
Sustained Use Discounts (SUD)
GCP automatically applies discounts when a VM runs for more than 25% of a month. No action required — no reservations, no upfront payment. The discount deepens in tiers:
| Monthly Usage | Effective Discount | You Pay |
|---|---|---|
| 0–25% of month | 0% | Full on-demand rate |
| 25–50% of month | ~20% | 80% of on-demand rate on incremental usage |
| 50–75% of month | ~40% | 60% of on-demand rate on incremental usage |
| 75–100% of month | ~60% | 40% of on-demand rate on incremental usage |
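The tier table above can be turned into a quick cost model. This is a minimal sketch, assuming each tier's discount applies only to the usage that falls inside that tier (which is how the incremental rates above compose); run for a full month, the tiers blend to a ~30% net discount.

```python
# Effective sustained-use discount, per the tier table above.
# Each tuple is (width of tier as fraction of month, discount in that tier).
SUD_TIERS = [(0.25, 0.00), (0.25, 0.20), (0.25, 0.40), (0.25, 0.60)]

def effective_cost(on_demand_monthly: float, fraction_of_month: float) -> float:
    """Cost after SUD when the VM runs `fraction_of_month` of the month."""
    remaining = fraction_of_month
    cost = 0.0
    for width, discount in SUD_TIERS:
        used = min(remaining, width)          # usage billed at this tier's rate
        cost += on_demand_monthly * used * (1 - discount)
        remaining -= used
        if remaining <= 0:
            break
    return cost

full_month = effective_cost(280.0, 1.0)  # VM at ~$280/month on demand
print(f"Full-month cost: ${full_month:.2f}")  # $196.00 — a 30% net discount
```

The blended full-month rate is 0.25 × (1.0 + 0.8 + 0.6 + 0.4) = 70% of on-demand, which is why GCP advertises "up to 30%" for sustained use.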
Example: an n2-standard-8 costs ~$280/month at the full on-demand rate, before sustained use discounts are applied.
Console Navigation
Terraform — Launch a Compute Engine VM
```hcl
# terraform/compute.tf — Production-ready Compute Engine instance

provider "google" {
  project = "my-gcp-project-id"
  region  = "us-central1"
}

resource "google_compute_instance" "web_server" {
  name         = "web-server-prod"
  machine_type = "e2-medium"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
      size  = 30 # GB
      type  = "pd-balanced"
    }
  }

  network_interface {
    network    = "default"
    subnetwork = "default"

    # Omit access_config block for private-only VM (no external IP)
    access_config {
      # Ephemeral external IP — use for testing only
    }
  }

  # Use a Service Account instead of user credentials
  service_account {
    email  = google_service_account.vm_sa.email
    scopes = ["cloud-platform"]
  }

  # Enable Shielded VM features for security
  shielded_instance_config {
    enable_secure_boot          = true
    enable_vtpm                 = true
    enable_integrity_monitoring = true
  }

  metadata = {
    # Block project-wide SSH keys — use OS Login instead
    block-project-ssh-keys = "true"
  }

  labels = {
    environment = "production"
    managed_by  = "terraform"
  }
}

resource "google_service_account" "vm_sa" {
  account_id   = "web-server-sa"
  display_name = "Web Server Service Account"
}
```
Setting enable_secure_boot = true protects against rootkits and bootkits. Combined with block-project-ssh-keys and OS Login, you get a hardened VM baseline.
Cloud Functions
Cloud Functions is Google's serverless compute platform for event-driven code. You write a function, attach it to a trigger (HTTP, Cloud Storage, Pub/Sub, Firestore, Cloud Scheduler), and GCP handles provisioning, scaling, and patching. Cloud Functions (2nd gen) is built on Cloud Run, giving you longer timeouts (up to 60 minutes) and concurrency support.
Gen 1 vs Gen 2
| Feature | Gen 1 | Gen 2 (Recommended) |
|---|---|---|
| Max timeout | 9 minutes | 60 minutes (HTTP) / 9 min (event) |
| Concurrency | 1 request per instance | Up to 1,000 concurrent requests per instance |
| Min instances | Supported | Supported — eliminates cold starts |
| Traffic splitting | Not available | Supported via Cloud Run revisions |
| Built on | Custom runtime | Cloud Run + Eventarc |
Python — Cloud Function Triggered by GCS Upload
```python
# main.py — Cloud Function (Gen 2) triggered by a Cloud Storage object upload
import functions_framework
from google.cloud import storage


@functions_framework.cloud_event
def process_gcs_upload(cloud_event):
    """Triggered when a new object is created in a GCS bucket.

    The event payload contains bucket name, object name, and metadata.
    """
    data = cloud_event.data
    bucket_name = data["bucket"]
    file_name = data["name"]
    content_type = data.get("contentType", "unknown")
    size_bytes = data.get("size", 0)

    print(f"New file uploaded: gs://{bucket_name}/{file_name}")
    print(f"Content type: {content_type}, Size: {size_bytes} bytes")

    # Example: Read the file and process it
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(file_name)

    # Only process CSV files
    if file_name.endswith(".csv"):
        content = blob.download_as_text()
        line_count = len(content.strip().split("\n"))
        print(f"CSV has {line_count} lines — sending to BigQuery...")
        # Insert into BigQuery, call another service, etc.
    else:
        print(f"Skipping non-CSV file: {file_name}")
```
Deploy with gcloud CLI
```shell
# Deploy the Gen 2 Cloud Function with a GCS trigger
gcloud functions deploy process-gcs-upload \
  --gen2 \
  --runtime python312 \
  --region us-central1 \
  --source . \
  --entry-point process_gcs_upload \
  --trigger-event-filters="type=google.cloud.storage.object.v1.finalized" \
  --trigger-event-filters="bucket=my-data-bucket" \
  --memory 256Mi \
  --timeout 120s \
  --service-account my-cf-sa@my-gcp-project-id.iam.gserviceaccount.com
```
Terraform — Cloud Function (Gen 2)
```hcl
# terraform/cloud_function.tf — Gen 2 Cloud Function with GCS trigger

resource "google_storage_bucket" "source_code" {
  name     = "cf-source-${var.project_id}"
  location = "US"
}

resource "google_storage_bucket_object" "function_zip" {
  name   = "function-source.zip"
  bucket = google_storage_bucket.source_code.name
  source = "${path.module}/function-source.zip"
}

resource "google_cloudfunctions2_function" "processor" {
  name     = "process-gcs-upload"
  location = "us-central1"

  build_config {
    runtime     = "python312"
    entry_point = "process_gcs_upload"
    source {
      storage_source {
        bucket = google_storage_bucket.source_code.name
        object = google_storage_bucket_object.function_zip.name
      }
    }
  }

  service_config {
    max_instance_count    = 10
    min_instance_count    = 0
    available_memory      = "256Mi"
    timeout_seconds       = 120
    service_account_email = google_service_account.cf_sa.email
  }

  event_trigger {
    trigger_region = "us-central1"
    event_type     = "google.cloud.storage.object.v1.finalized"
    event_filters {
      attribute = "bucket"
      value     = google_storage_bucket.data_bucket.name
    }
  }
}

resource "google_service_account" "cf_sa" {
  account_id   = "cloud-function-sa"
  display_name = "Cloud Function Service Account"
}
```
Google Kubernetes Engine (GKE)
GKE is widely considered the best managed Kubernetes service in any cloud. Google created Kubernetes, drawing on lessons from its internal Borg system, and GKE reflects that heritage — it supports the latest K8s versions first, has the tightest integration with GCP services, and offers an Autopilot mode that eliminates node management entirely.
Autopilot vs Standard Mode
- Fastest K8s upgrades: GKE supports new Kubernetes versions weeks before EKS/AKS.
- Release channels: Rapid, Regular, and Stable channels with automatic upgrades.
- GKE Enterprise: Multi-cluster management, service mesh (Anthos Service Mesh), and fleet-level policy enforcement.
- Binary Authorization: Only deploy signed container images — built-in supply chain security.
Console Navigation
Terraform — GKE Autopilot Cluster
```hcl
# terraform/gke.tf — GKE Autopilot cluster

resource "google_container_cluster" "autopilot" {
  name     = "prod-autopilot-cluster"
  location = "us-central1"

  # Enable Autopilot mode
  enable_autopilot = true

  # Use a release channel for automatic upgrades
  release_channel {
    channel = "REGULAR"
  }

  # Private cluster — nodes have no external IPs
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false # Keep API server public for kubectl access
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Network configuration
  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.gke_subnet.name

  # IP allocation for Pods and Services
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Workload Identity — maps K8s SAs to GCP SAs
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }
}
```
Module 2: Data, Storage & Analytics
Data is GCP's strongest domain. BigQuery is arguably the most important service in all of cloud computing for analytics workloads. Cloud Storage is the universal data lake foundation. This module covers the storage and database services that underpin modern data architectures.
Cloud Storage (GCS)
Cloud Storage (GCS) is Google's object storage service — the equivalent of AWS S3. It stores unstructured data (files, images, backups, ML training data) in buckets. Bucket names are globally unique, and objects live in the bucket's configured location — a single region, or dual/multi-region configurations for high availability.
Storage Classes
| Class | Min Storage Duration | Use Case | Storage $/GB/month | Retrieval $/GB |
|---|---|---|---|---|
| Standard | None | Frequently accessed data, hot data, serving website assets | $0.020 | Free |
| Nearline | 30 days | Data accessed <1x/month — backups, long-tail content | $0.010 | $0.01 |
| Coldline | 90 days | Data accessed <1x/quarter — disaster recovery | $0.004 | $0.02 |
| Archive | 365 days | Data accessed <1x/year — regulatory compliance, long-term retention | $0.0012 | $0.05 |
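The class table above reduces to simple arithmetic once you know how much you store and how much you read back each month. A minimal sketch using the table's per-GB prices (operation charges and minimum storage durations are ignored here, so treat results as lower bounds):

```python
# (storage $/GB/month, retrieval $/GB) per class, from the table above.
CLASSES = {
    "standard": (0.020, 0.00),
    "nearline": (0.010, 0.01),
    "coldline": (0.004, 0.02),
    "archive":  (0.0012, 0.05),
}

def monthly_cost(klass: str, stored_gb: float, read_gb: float) -> float:
    """Storage plus retrieval cost for one month."""
    storage, retrieval = CLASSES[klass]
    return stored_gb * storage + read_gb * retrieval

# 1 TiB of backups with ~5% read back per month: Nearline beats Standard.
for klass in CLASSES:
    print(f"{klass:8} ${monthly_cost(klass, 1024, 51.2):7.2f}")
```

The crossover matters: for data that is read back heavily, retrieval fees can make a "colder" class more expensive than Standard, which is also why Autoclass (shown in the Terraform example below) is often the safer default.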
Location Types
- Region (e.g., us-central1): Lowest cost. Best for compute co-location — store data in the same region as your VMs/GKE.
- Dual-region (e.g., us-east1 + us-central1): Automatic replication with turbo mode (<15 min RPO). Good balance of HA and cost.
- Multi-region (US, EU, ASIA): Highest availability and geo-redundancy. Best for serving content worldwide.
Python — Upload and Download Objects
```python
# gcs_operations.py — Upload and download objects using Google Cloud Storage client
from google.cloud import storage

# Client uses Application Default Credentials (ADC)
# On GCE/GKE: automatically uses the attached Service Account
# Locally: uses `gcloud auth application-default login`
client = storage.Client()

# ── Upload a file with custom metadata ──
bucket = client.bucket("my-data-bucket")
blob = bucket.blob("uploads/2026/03/report.csv")

# Set custom metadata (searchable, useful for tracking)
blob.metadata = {
    "uploaded_by": "data-pipeline-v2",
    "source_system": "salesforce",
    "record_count": "45230",
}

blob.upload_from_filename(
    "./local-data/report.csv",
    content_type="text/csv",
)
print(f"Uploaded to gs://{bucket.name}/{blob.name}")

# ── Download a file ──
download_blob = bucket.blob("uploads/2026/03/report.csv")
download_blob.download_to_filename("./downloads/report.csv")
print("Downloaded successfully")

# ── List objects with a prefix ──
blobs = client.list_blobs("my-data-bucket", prefix="uploads/2026/")
for b in blobs:
    print(f"  {b.name} ({b.size} bytes, {b.storage_class})")
```
Terraform — Secure GCS Bucket
```hcl
# terraform/gcs.tf — Production GCS bucket with lifecycle and access control

resource "google_storage_bucket" "data_lake" {
  name          = "data-lake-${var.project_id}"
  location      = "US"
  storage_class = "STANDARD"

  # Enable Autoclass to automatically optimize storage costs
  autoclass {
    enabled = true
  }

  # Prevent accidental deletion
  force_destroy = false

  # Enable uniform bucket-level access (recommended over legacy ACLs)
  uniform_bucket_level_access = true

  # Versioning for data recovery
  versioning {
    enabled = true
  }

  # Lifecycle rule: delete old versions after 90 days
  lifecycle_rule {
    condition {
      age        = 90
      with_state = "ARCHIVED"
    }
    action {
      type = "Delete"
    }
  }

  # Block public access
  public_access_prevention = "enforced"
}
```
BigQuery
BigQuery is Google's "killer app" — the service that differentiates GCP from every other cloud. It's a fully managed, serverless, petabyte-scale data warehouse with built-in ML, geospatial analysis, and BI Engine for caching. There are no indexes to tune, no clusters to manage, and no vacuum operations. You write SQL, and Google handles the rest.
Architecture: Columnar & Serverless
BigQuery separates storage and compute. Data is stored in Google's Capacitor columnar format on Colossus (Google's distributed file system). Queries are executed by Dremel, a multi-tenant execution engine that distributes work across thousands of workers. This separation means:
- Storage scales independently — you only pay for data at rest, and it's cheap ($0.02/GB/month, dropping to $0.01 after 90 days of no edits).
- Compute scales on demand — Dremel allocates slots (units of compute) dynamically. No cluster sizing.
- Columnar format — queries only read the columns referenced in your SELECT statement. A query on 3 columns of a 200-column table reads ~1.5% of the data.
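The "~1.5% of the data" figure above is just columns-read arithmetic. A simplistic sketch, assuming equal bytes per column (real Capacitor columns vary with type and compression, so treat this as a first-order estimate); the $6.25/TB on-demand rate is the one used elsewhere in this guide:

```python
# Fraction of table bytes a columnar query reads, assuming equal-width columns.
def scan_fraction(cols_selected: int, total_cols: int) -> float:
    return cols_selected / total_cols

PRICE_PER_TB = 6.25  # on-demand analysis rate
table_tb = 10

select_star_cost = table_tb * PRICE_PER_TB                        # full scan
three_col_cost = table_tb * scan_fraction(3, 200) * PRICE_PER_TB  # 3 of 200 cols

print(f"3 of 200 columns → {scan_fraction(3, 200):.1%} of bytes scanned")
print(f"SELECT * ${select_star_cost:.2f} vs 3 columns ${three_col_cost:.2f}")
```

This is the mechanical reason "never SELECT *" appears in every BigQuery cost guide: column selection is not a style preference, it is the billing unit.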
Google has cited figures of over 110 TB of data per second processed across its fleet, with a 1 PB table scannable in under 30 seconds. No other cloud service at any price point matches this throughput for ad-hoc analytics; AWS Redshift Serverless and Azure Synapse are the closest competitors, but both require more tuning and have steeper cost curves at scale.
Pricing Models
On-demand analysis is billed at $6.25 per TB scanned — a SELECT * on a 10 TB table costs ~$62.50 per query. Use bq query --dry_run to preview cost before executing. Other cost drivers — Slots (capacity pricing): slot-hours consumed × edition rate. Storage: $0.02/GB/month active, $0.01/GB/month long-term (90+ days unmodified). Streaming inserts: $0.01 per 200 MB.
Console Navigation
Python — Query a Public Dataset
```python
# bigquery_demo.py — Query a public dataset using the BigQuery Python client
from google.cloud import bigquery

# Client uses Application Default Credentials
client = bigquery.Client()

# ── Query the public GitHub dataset ──
# This scans ~6 GB on-demand = ~$0.04
query = """
    SELECT
      language.name AS language,
      COUNT(*) AS repo_count
    FROM `bigquery-public-data.github_repos.languages`,
      UNNEST(language) AS language
    GROUP BY language
    ORDER BY repo_count DESC
    LIMIT 20
"""

# Use QueryJobConfig for cost control
job_config = bigquery.QueryJobConfig(
    # Set a maximum bytes billed to prevent runaway costs
    maximum_bytes_billed=10 * 1024 ** 3,  # 10 GB limit
    # Use standard SQL (default, but explicit is good)
    use_legacy_sql=False,
)

# Dry run first to check cost
dry_run_config = bigquery.QueryJobConfig(dry_run=True, use_legacy_sql=False)
dry_run_job = client.query(query, job_config=dry_run_config)
mb_scanned = dry_run_job.total_bytes_processed / (1024 ** 2)
print(f"This query will scan {mb_scanned:.1f} MB")

# Execute the actual query
query_job = client.query(query, job_config=job_config)
results = query_job.result()

print("\nTop programming languages on GitHub:")
for row in results:
    print(f"  {row.language:20} {row.repo_count:>12,} repos")

print(f"\nTotal bytes billed: {query_job.total_bytes_billed:,}")
```
Always set maximum_bytes_billed. It acts as a safety net — if the query would scan more than the limit, BigQuery rejects it instead of running. This single line of code can prevent a $500 mistake.
Cost Optimization Strategies
- Partition tables by date/timestamp — queries that filter on the partition column only scan relevant partitions.
- Cluster tables by frequently filtered columns (up to 4).
- Never use SELECT * — always specify columns.
- Use materialized views for repeated queries.
- Enable BI Engine for sub-second dashboard queries (cached in memory).
- Set per-user and per-project query quotas to prevent accidental cost spikes.
Terraform — BigQuery Dataset and Table
```hcl
# terraform/bigquery.tf — Partitioned and clustered BigQuery table

resource "google_bigquery_dataset" "analytics" {
  dataset_id = "analytics"
  location   = "US"

  # Default table expiration: 180 days (auto-cleanup for temp data)
  default_table_expiration_ms = 15552000000

  labels = {
    environment = "production"
    managed_by  = "terraform"
  }
}

resource "google_bigquery_table" "events" {
  dataset_id = google_bigquery_dataset.analytics.dataset_id
  table_id   = "events"

  # Partition by ingestion time (or a specific TIMESTAMP/DATE column)
  time_partitioning {
    type          = "DAY"
    field         = "event_timestamp"
    expiration_ms = 7776000000 # 90-day partition expiry
  }

  # Cluster by frequently filtered columns — up to 4
  clustering = ["user_id", "event_type"]

  schema = jsonencode([
    { name = "event_id", type = "STRING", mode = "REQUIRED" },
    { name = "event_timestamp", type = "TIMESTAMP", mode = "REQUIRED" },
    { name = "user_id", type = "STRING", mode = "REQUIRED" },
    { name = "event_type", type = "STRING", mode = "REQUIRED" },
    { name = "properties", type = "JSON", mode = "NULLABLE" },
  ])

  labels = {
    environment = "production"
  }
}
```
Cloud SQL & Firestore
GCP offers both managed relational databases (Cloud SQL) and a serverless NoSQL document store (Firestore). Choose based on your data model — if you need joins, transactions, and schemas, use Cloud SQL. If you need flexible documents with real-time sync, use Firestore.
Cloud SQL
Cloud SQL is a fully managed service for MySQL, PostgreSQL, and SQL Server. Google handles replication, backups, encryption, and patching. It supports High Availability with automatic failover (regional instance with a standby in another zone).
| Feature | Cloud SQL | AlloyDB (Premium) |
|---|---|---|
| Engine | MySQL, PostgreSQL, SQL Server | PostgreSQL-compatible only |
| Performance | Standard managed DB | 4x faster than standard PostgreSQL (Google claims) |
| HA | Regional with zonal failover | Regional with <1 sec failover |
| Best for | Standard OLTP workloads, lift-and-shift | High-performance OLTP, hybrid OLAP/OLTP |
| Pricing | vCPU/hour + storage/GB | vCPU/hour + storage/GB (higher base) |
Firestore
Firestore is a serverless, NoSQL document database with real-time syncing and offline support. Documents are organized in collections, and queries are indexed automatically. Firestore operates in two modes:
- Native mode: Full Firestore features including real-time listeners, offline cache, and mobile SDK support. Best for web/mobile apps.
- Datastore mode: Backward-compatible with the legacy Datastore API. No real-time features, but supports server-side workloads with higher throughput.
Terraform — Cloud SQL with HA
```hcl
# terraform/cloud_sql.tf — Cloud SQL PostgreSQL with HA and private IP

resource "google_sql_database_instance" "postgres" {
  name             = "prod-postgres"
  database_version = "POSTGRES_16"
  region           = "us-central1"

  settings {
    tier = "db-custom-4-16384" # 4 vCPU, 16 GB RAM

    # High Availability — automatic failover to another zone
    availability_type = "REGIONAL"

    # Disk configuration
    disk_size       = 100 # GB
    disk_type       = "PD_SSD"
    disk_autoresize = true

    # Private IP only — no public exposure
    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.vpc.id
    }

    # Automated backups
    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true
      start_time                     = "03:00"
      transaction_log_retention_days = 7
      backup_retention_settings {
        retained_backups = 14
      }
    }

    # Maintenance window — Sunday 4AM
    maintenance_window {
      day          = 7
      hour         = 4
      update_track = "stable"
    }
  }

  deletion_protection = true
}
```
Module 3: Networking & IAM
GCP's networking model is fundamentally different from AWS and Azure. VPCs are global, subnets are regional, and firewall rules are centralized. The IAM hierarchy (Organization → Folders → Projects) is how Google expects you to model your organization.
VPC & Shared VPC
A GCP Virtual Private Cloud (VPC) is a global resource. Unlike AWS (where a VPC is regional) or Azure (where a VNet is regional), a single GCP VPC spans all regions. Subnets, however, are regional — each subnet belongs to one region and one VPC.
GCP VPC vs AWS VPC vs Azure VNet
| Feature | GCP VPC | AWS VPC | Azure VNet |
|---|---|---|---|
| Scope | Global | Regional | Regional |
| Subnets | Regional | Zonal (AZ-bound) | Regional |
| Firewall | VPC-level rules with tags/SAs | Security Groups per ENI | NSGs per subnet/NIC |
| Peering | Global (cross-region peering built in) | Regional (cross-region costs extra) | Global |
| Cross-VPC | Shared VPC (centralized) | Transit Gateway | Virtual WAN |
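Because a GCP VPC is one global address space carved into regional subnets, CIDR planning is where mistakes happen. A minimal sketch using only the stdlib `ipaddress` module — the subnet names and ranges are illustrative examples, not required values:

```python
import ipaddress

# Candidate regional subnets carved from one global VPC (example ranges).
subnets = {
    "public-us-central1":  "10.10.0.0/24",
    "private-us-central1": "10.20.0.0/24",
    "gke-pods":            "10.100.0.0/16",
    "gke-services":        "10.200.0.0/20",
}

def find_overlaps(ranges: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named CIDR ranges that overlap."""
    nets = [(name, ipaddress.ip_network(cidr)) for name, cidr in ranges.items()]
    return [
        (a, b)
        for i, (a, na) in enumerate(nets)
        for b, nb in nets[i + 1:]
        if na.overlaps(nb)
    ]

print(find_overlaps(subnets))  # [] — no conflicts in this plan
```

Running a check like this in CI before `terraform apply` catches the IP-overlap problem that Shared VPC exists to prevent organizationally.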
Shared VPC
Shared VPC lets you designate a host project that owns the VPC and subnets, while service projects use those subnets for their resources (VMs, GKE clusters, etc.). This centralizes network management and security while allowing teams to manage their own compute resources.
- Enterprise standard: If you have more than 2–3 projects, use Shared VPC. It prevents IP overlap, centralizes firewall rules, and gives the networking team a single pane of glass.
- Without Shared VPC: Each project creates its own VPC, leading to IP conflicts, duplicated firewall rules, and peering nightmares.
- Shared VPC is free — there's no reason not to use it.
Console Navigation
Terraform — VPC with Public and Private Subnets
```hcl
# terraform/vpc.tf — Custom VPC with public and private subnets

resource "google_compute_network" "vpc" {
  name                    = "prod-vpc"
  auto_create_subnetworks = false # Custom mode — we define our own subnets
  routing_mode            = "GLOBAL"
}

# ── Public subnet (VMs can have external IPs) ──
resource "google_compute_subnetwork" "public" {
  name          = "public-us-central1"
  ip_cidr_range = "10.10.0.0/24"
  region        = "us-central1"
  network       = google_compute_network.vpc.id

  # Enable Flow Logs for network monitoring
  log_config {
    aggregation_interval = "INTERVAL_5_SEC"
    flow_sampling        = 0.5
    metadata             = "INCLUDE_ALL_METADATA"
  }
}

# ── Private subnet (no external IPs, uses Cloud NAT for outbound) ──
resource "google_compute_subnetwork" "private" {
  name                     = "private-us-central1"
  ip_cidr_range            = "10.20.0.0/24"
  region                   = "us-central1"
  network                  = google_compute_network.vpc.id
  private_ip_google_access = true # Access Google APIs without external IP

  # Secondary ranges for GKE Pods and Services
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.100.0.0/16"
  }
  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.200.0.0/20"
  }
}

# ── Cloud NAT for private subnet outbound access ──
resource "google_compute_router" "router" {
  name    = "nat-router"
  region  = "us-central1"
  network = google_compute_network.vpc.id
}

resource "google_compute_router_nat" "nat" {
  name                               = "cloud-nat"
  router                             = google_compute_router.router.name
  region                             = "us-central1"
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"

  subnetwork {
    name                    = google_compute_subnetwork.private.id
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }
}

# ── Firewall: Allow SSH via IAP (no public SSH port) ──
resource "google_compute_firewall" "allow_iap_ssh" {
  name    = "allow-iap-ssh"
  network = google_compute_network.vpc.id

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  # IAP's IP range — only IAP can initiate SSH
  source_ranges = ["35.235.240.0/20"]
  target_tags   = ["allow-ssh"]
}

# ── Firewall: Allow internal traffic between subnets ──
resource "google_compute_firewall" "allow_internal" {
  name    = "allow-internal"
  network = google_compute_network.vpc.id

  allow {
    protocol = "tcp"
    ports    = ["0-65535"]
  }
  allow {
    protocol = "udp"
    ports    = ["0-65535"]
  }
  allow {
    protocol = "icmp"
  }

  source_ranges = ["10.10.0.0/24", "10.20.0.0/24"]
}
```
IAM & Resource Hierarchy
GCP's Identity and Access Management is built around a resource hierarchy. Permissions granted at a higher level are inherited by all children. This is fundamentally different from AWS (where IAM is account-flat) and gives GCP a powerful organizational model.
The GCP Hierarchy
```text
# GCP Resource Hierarchy — permissions flow downward
Organization (example.com)
├── Folder: Engineering
│   ├── Folder: Backend
│   │   ├── Project: backend-prod        ← VMs, GKE, Cloud SQL live here
│   │   └── Project: backend-staging
│   └── Folder: Data
│       ├── Project: data-warehouse-prod ← BigQuery datasets live here
│       └── Project: data-warehouse-dev
├── Folder: Marketing
│   └── Project: marketing-analytics
└── Folder: Shared Services
    ├── Project: shared-networking       ← Shared VPC host project
    └── Project: shared-monitoring
```
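Inheritance down this tree can be modeled in a few lines. This is a toy sketch of the mechanism only — the folder names, role names, and members below are hypothetical, and real IAM also involves deny policies and conditions not modeled here:

```python
# Toy model of IAM policy inheritance: the effective policy at a node is
# the union of bindings on the node and on every ancestor.
HIERARCHY = {  # child -> parent
    "backend-prod": "folder:Backend",
    "backend-staging": "folder:Backend",
    "folder:Backend": "folder:Engineering",
    "folder:Engineering": "org:example.com",
}
BINDINGS = {  # resource -> {(role, member)}
    "org:example.com": {("roles/viewer", "group:sre@example.com")},
    "folder:Backend": {("roles/compute.admin", "group:backend@example.com")},
    "backend-prod": {("roles/cloudsql.client", "sa:app@backend-prod.iam")},
}

def effective_bindings(resource: str) -> set[tuple[str, str]]:
    """Union of bindings on `resource` and all of its ancestors."""
    acc: set[tuple[str, str]] = set()
    node = resource
    while node is not None:
        acc |= BINDINGS.get(node, set())
        node = HIERARCHY.get(node)  # walk up; None at the org root
    return acc

for role, member in sorted(effective_bindings("backend-prod")):
    print(f"{member:35} {role}")
```

The practical consequence: a role granted at the organization or folder level cannot be revoked on a child project, so grant broad roles as high in the tree as you truly intend, and nowhere higher.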
Key IAM Concepts
| Concept | What It Is | Example |
|---|---|---|
| Principal | Who is making the request (user, group, service account) | user:alice@example.com |
| Role | Collection of permissions (predefined or custom) | roles/bigquery.dataViewer |
| Policy Binding | Attaches a role to a principal at a resource level | "Alice gets BigQuery Viewer on project X" |
| Service Account | Identity for applications and VMs (not humans) | my-app@project.iam.gserviceaccount.com |
| Workload Identity | Maps Kubernetes SAs to GCP Service Accounts | GKE pods authenticate as GCP SAs |
Service Accounts — The GCP Way
In GCP, Service Accounts are the primary way applications authenticate. Unlike AWS IAM Users (with Access Keys), GCP Service Accounts use short-lived tokens automatically rotated by the platform. Never download service account key files — use attached service accounts on Compute Engine, GKE (Workload Identity), or Cloud Functions.
Python — Authenticate and List Resources
```python
# iam_demo.py — Authenticate with a Service Account and list resources
import google.auth
from google.auth import impersonated_credentials
from google.cloud import bigquery
from google.cloud import compute_v1

# ── Application Default Credentials (ADC) ──
# On GCE/GKE: automatically uses the attached Service Account
# Locally: uses credentials from `gcloud auth application-default login`
credentials, project = google.auth.default()
print(f"Authenticated as project: {project}")

# ── List Compute Engine instances ──
instance_client = compute_v1.InstancesClient()
request = compute_v1.AggregatedListInstancesRequest(project=project)

print("\nCompute Engine instances:")
for zone, instances_scoped_list in instance_client.aggregated_list(request=request):
    if instances_scoped_list.instances:
        for instance in instances_scoped_list.instances:
            print(f"  {instance.name:30} {instance.status:10} {zone}")

# ── Impersonate a Service Account (no key file needed) ──
target_sa = "data-pipeline@my-project.iam.gserviceaccount.com"
target_scopes = ["https://www.googleapis.com/auth/cloud-platform"]

# Create impersonated credentials — requires iam.serviceAccountTokenCreator role
impersonated_creds = impersonated_credentials.Credentials(
    source_credentials=credentials,
    target_principal=target_sa,
    target_scopes=target_scopes,
    lifetime=3600,  # 1 hour max
)

# Use impersonated credentials with any Google Cloud client
bq_client = bigquery.Client(credentials=impersonated_creds, project=project)
datasets = list(bq_client.list_datasets())
print(f"\nDatasets accessible as {target_sa}: {len(datasets)}")
```
Terraform — IAM Bindings
```hcl
# terraform/iam.tf — Project-level and resource-level IAM bindings

# ── Grant a group BigQuery Data Viewer on the project ──
resource "google_project_iam_member" "bq_viewer" {
  project = var.project_id
  role    = "roles/bigquery.dataViewer"
  member  = "group:data-analysts@example.com"
}

# ── Grant a Service Account storage access on a specific bucket ──
resource "google_storage_bucket_iam_member" "pipeline_writer" {
  bucket = google_storage_bucket.data_lake.name
  role   = "roles/storage.objectCreator"
  member = "serviceAccount:${google_service_account.pipeline_sa.email}"
}

# ── Create a custom role with minimal permissions ──
resource "google_project_iam_custom_role" "log_reader" {
  role_id     = "customLogReader"
  title       = "Custom Log Reader"
  description = "Can read logs but not modify anything"
  permissions = [
    "logging.logEntries.list",
    "logging.logs.list",
    "logging.logServices.list",
  ]
}
```
Cloud Identity-Aware Proxy (IAP)
IAP lets you control access to your web applications and VMs without a VPN. It acts as a reverse proxy that verifies the user's identity (via Google Sign-In or external IdP) and checks IAM permissions before forwarding the request. This is Google's implementation of the BeyondCorp zero-trust security model.
How IAP Works
Use Cases
- Internal dashboards: Protect Grafana, admin panels, or internal tools without exposing them to the internet or requiring VPN access.
- SSH/RDP via IAP tunnels: Replace bastion hosts. Use gcloud compute ssh --tunnel-through-iap to SSH into private VMs through IAP without any external IP.
- Context-aware access: Combine with Access Context Manager to require device posture (managed device, screen lock enabled) before granting access.
Run gcloud compute ssh --tunnel-through-iap VM_NAME to reach a private VM. Traffic is encrypted end-to-end, access is IAM-controlled, and there's no public IP to expose. IAP TCP tunneling is free.
Terraform — IAP-Protected Backend
```hcl
# terraform/iap.tf — Enable IAP on a backend service

# Enable IAP on the backend service
resource "google_iap_web_backend_service_iam_member" "access" {
  web_backend_service = google_compute_backend_service.app.name
  role                = "roles/iap.httpsResourceAccessor"
  member              = "group:developers@example.com"
}

# IAP OAuth consent — required for web app protection
resource "google_iap_brand" "default" {
  support_email     = "admin@example.com"
  application_title = "Internal Tools"
}

resource "google_iap_client" "default" {
  display_name = "IAP Client"
  brand        = google_iap_brand.default.name
}
```
Module 4: Pricing & Billing Management
GCP's pricing model rewards data-heavy, long-running workloads with automatic discounts (SUD, CUDs). But without visibility, costs can spiral quickly — especially with BigQuery, egress, and over-provisioned VMs. This module covers the tools to estimate, monitor, and control spend.
GCP Pricing Calculator
The Google Cloud Pricing Calculator helps estimate monthly costs for multi-service architectures before deployment. For BigQuery specifically, you need to estimate both storage and analysis costs separately.
Estimating BigQuery Costs (Example)
Scenario: Your team runs 50 queries/day, each scanning an average of 20 GB, on a 5 TB dataset.
| Component | Calculation | Monthly Cost |
|---|---|---|
| Storage (Active) | 5 TB × $0.02/GB = 5,000 GB × $0.02 | $100.00 |
| On-Demand Analysis | 50 queries/day × 20 GB × 30 days = 30 TB × $6.25/TB | $187.50 |
| Free tier offset | First 1 TB/month is free | -$6.25 |
| Total | | $281.25 |
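The table above is reproducible in a few lines, which is handy for plugging in your own query volumes. A minimal sketch using the on-demand rates from this module (the free tier is applied to scanned bytes, so the $187.50 gross analysis cost nets to $181.25):

```python
# Reproduce the estimate above: 50 queries/day × 20 GB each, 5 TB stored.
STORAGE_PER_GB = 0.02      # active storage, $/GB/month
ON_DEMAND_PER_TB = 6.25    # analysis, $/TB scanned
FREE_TB_PER_MONTH = 1.0    # on-demand free tier

storage = 5 * 1000 * STORAGE_PER_GB                  # 5 TB = 5,000 GB → $100.00
scanned_tb = 50 * 20 / 1000 * 30                     # 30 TB scanned per month
analysis = max(scanned_tb - FREE_TB_PER_MONTH, 0) * ON_DEMAND_PER_TB  # $181.25

print(f"storage ${storage:.2f} + analysis ${analysis:.2f} "
      f"= ${storage + analysis:.2f}/month")
```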
Console Navigation
Hidden Costs Checklist
- Cloud NAT: $0.045/hour = ~$32/month per gateway, plus per-GB processing fees.
- Load Balancers: $0.025/hour = ~$18/month even with zero traffic.
- Egress: $0.12/GB to the internet (first 1 GB/month free).
- Static IPs: $0.01/hour when not attached.
- Persistent Disk snapshots: charged per GB stored.
- Log ingestion (Cloud Logging): first 50 GB/month free, then $0.50/GB.
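These hidden costs add up to a fixed monthly floor before any real traffic flows. A minimal sketch totaling the checklist rates above (730 hours/month average; the 50 GB egress figure is an illustrative assumption):

```python
# Rough idle-infrastructure floor from the hidden-costs checklist above.
HOURS = 730  # average hours in a month

fixed = {
    "cloud_nat_gateway": 0.045 * HOURS,  # ~$32.85, before per-GB processing
    "load_balancer":     0.025 * HOURS,  # ~$18.25, even with zero traffic
    "idle_static_ip":    0.010 * HOURS,  # ~$7.30 when unattached
}

egress_gb = 50                            # assumed monthly internet egress
egress = max(egress_gb - 1, 0) * 0.12     # first 1 GB/month is free

total = sum(fixed.values()) + egress
print(f"Idle-infrastructure floor: ${total:.2f}/month")
```

Log ingestion and snapshot storage scale with usage rather than time, so they are omitted from the fixed floor here.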
Billing Export to BigQuery
GCP's most powerful cost analysis feature is Billing Export to BigQuery. Once enabled, every line item on your invoice is exported to a BigQuery table in near-real-time. You can then run SQL queries to find cost anomalies, break down spend by project/service/label, and build custom dashboards.
Setup
SQL — Top 10 Costliest Services This Month
```sql
-- billing_analysis.sql — Find top cost drivers in your billing export
SELECT
  service.description AS service_name,
  SUM(cost) + SUM(IFNULL(
    (SELECT SUM(c.amount) FROM UNNEST(credits) c), 0
  )) AS net_cost
FROM `my-billing-project.billing_dataset.gcp_billing_export_v1_XXXXXX`
WHERE invoice.month = FORMAT_DATE('%Y%m', CURRENT_DATE())
GROUP BY service_name
ORDER BY net_cost DESC
LIMIT 10
```
SQL — Daily Spend by Project (for Anomaly Detection)
```sql
-- daily_by_project.sql — Track spend trends per project per day
SELECT
  DATE(usage_start_time) AS usage_date,
  project.id AS project_id,
  SUM(cost) AS daily_cost
FROM `my-billing-project.billing_dataset.gcp_billing_export_v1_XXXXXX`
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY usage_date, project_id
ORDER BY usage_date DESC, daily_cost DESC
```
Label every resource with team, environment, and cost_center labels — the billing export includes labels, making per-team chargeback trivial.
Quotas
Every GCP API has quotas — limits on how many requests per minute, how many resources per project, or how much capacity you can use. Quotas exist to protect both Google and you (preventing a runaway script from creating 10,000 VMs). Unlike AWS service limits, GCP quotas are granular and can block you silently.
Common Quota Traps
| Quota | Default Limit | What Happens |
|---|---|---|
| CPUs per region | 24 (new projects) | Cannot create VMs — API returns QUOTA_EXCEEDED |
| GKE nodes per zone | 1,000 | Node pool can't scale |
| BigQuery concurrent slots | 2,000 (on-demand) | Queries queue and slow down |
| Cloud Functions per region | 1,000 | Deployment fails |
| External IP addresses | 8 per region | Cannot attach public IPs |
How to Request Quota Increases
In the Cloud Console, open IAM & Admin → Quotas, filter to the quota you need, select it, and click Edit Quotas to submit a request with a brief justification. Small increases are often granted automatically; larger ones are reviewed and can take a couple of business days, so request headroom before a launch, not during one.
Python — Check Quotas Programmatically
```python
# check_quotas.py — List quotas and usage for a project/region
from google.cloud import compute_v1

client = compute_v1.RegionsClient()
project = "my-gcp-project-id"
region = "us-central1"

region_info = client.get(project=project, region=region)
print(f"Quotas for {region}:")
for quota in region_info.quotas:
    usage_pct = (quota.usage / quota.limit * 100) if quota.limit > 0 else 0
    if usage_pct > 50:  # Only show quotas above 50% usage
        print(f"  ⚠ {quota.metric:35} {quota.usage:.0f}/{quota.limit:.0f} ({usage_pct:.0f}%)")
```
Module 5: Common Pitfalls
Every cloud platform has traps. GCP's are unique because of its global VPC model, BigQuery's scan-based pricing, and the ease of creating new projects. This module covers the mistakes that cost real money and create real security incidents.
Default VPC Rules — The Silent Security Risk
Every GCP project comes with a default VPC and two dangerously permissive firewall rules:
| Rule Name | What It Allows | Why It's Dangerous |
|---|---|---|
| default-allow-internal | All TCP, UDP, ICMP within 10.128.0.0/9 | Any VM can talk to any other VM across all subnets in the VPC — no segmentation |
| default-allow-ssh | TCP:22 from 0.0.0.0/0 | SSH is open to the entire internet |
| default-allow-rdp | TCP:3389 from 0.0.0.0/0 | RDP is open to the entire internet |
| default-allow-icmp | ICMP from 0.0.0.0/0 | Enables reconnaissance via ping sweeps |
The default-allow-ssh rule alone means every VM you create has SSH open to the internet by default. This is the GCP equivalent of leaving your front door open. Always create a custom VPC with explicit, least-privilege firewall rules.

Fix: Delete Default VPC Firewall Rules
```bash
# Delete the dangerous default rules (run once per project)
gcloud compute firewall-rules delete default-allow-ssh \
    --project=my-gcp-project-id --quiet
gcloud compute firewall-rules delete default-allow-rdp \
    --project=my-gcp-project-id --quiet
gcloud compute firewall-rules delete default-allow-icmp \
    --project=my-gcp-project-id --quiet

# Or better: delete the entire default VPC and create a custom one
gcloud compute networks delete default \
    --project=my-gcp-project-id --quiet
```
Terraform — Organization Policy to Block Default Network
```hcl
# terraform/org_policy.tf — Prevent default VPC creation in all new projects
resource "google_organization_policy" "skip_default_network" {
  org_id     = var.org_id
  constraint = "constraints/compute.skipDefaultNetworkCreation"

  boolean_policy {
    enforced = true
  }
}

# This ensures every new project starts with NO default VPC.
# Teams must create custom VPCs with proper firewall rules.
```
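Beyond prevention, existing projects should be audited for world-open rules. The sketch below operates on plain dicts shaped like Compute Engine firewall resources; the `sourceRanges`/`allowed` field names mirror the API, but the helper itself is illustrative (in practice you would fetch live rules with `compute_v1.FirewallsClient`).

```python
# firewall_audit.py — Flag firewall rules open to the whole internet.
# Works on plain dicts shaped like Compute Engine firewall resources;
# in a real audit you would fetch them per project with
# google.cloud.compute_v1.FirewallsClient().list().
def find_open_rules(rules):
    """Return names of ingress allow-rules reachable from 0.0.0.0/0."""
    open_rules = []
    for rule in rules:
        is_ingress = rule.get("direction", "INGRESS") == "INGRESS"
        allows = bool(rule.get("allowed"))  # deny rules have no "allowed"
        world_open = "0.0.0.0/0" in rule.get("sourceRanges", [])
        if is_ingress and allows and world_open:
            open_rules.append(rule["name"])
    return open_rules

if __name__ == "__main__":
    sample = [
        {"name": "default-allow-ssh", "direction": "INGRESS",
         "sourceRanges": ["0.0.0.0/0"],
         "allowed": [{"IPProtocol": "tcp", "ports": ["22"]}]},
        {"name": "allow-internal", "direction": "INGRESS",
         "sourceRanges": ["10.128.0.0/9"],
         "allowed": [{"IPProtocol": "tcp"}]},
    ]
    print(find_open_rules(sample))
```

Run a check like this on a schedule and alert on any non-empty result: the default rules have a way of reappearing in newly created projects.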
BigQuery Query Costs — The $100 SELECT *
BigQuery's on-demand pricing charges $6.25 per TB scanned. A single SELECT * on a large, unpartitioned table can be shockingly expensive. This is the most common cost mistake on GCP.
The Scenario
A developer runs SELECT * FROM events on a 20 TB unpartitioned table to "just look at a few rows." BigQuery scans the entire table — all 20 TB. Cost: 20 × $6.25 = $125.00 for one query. They run it 5 times with small modifications: $625 in an afternoon. With partitioning and a WHERE event_date = '2026-03-31' filter, the same query scans 50 GB: $0.31.
Prevention Strategies
- Always partition tables: Use `time_partitioning` on a date/timestamp column. Queries with partition filters scan only matching partitions.
- Always cluster tables: Cluster by frequently filtered columns (`user_id`, `event_type`). BigQuery skips irrelevant blocks.
- Never use `SELECT *`: Specify only the columns you need. BigQuery is columnar — fewer columns = less data scanned.
- Use `--dry_run`: Preview bytes scanned before executing. In the Console, check the green badge showing "This query will process X GB."
- Set `maximum_bytes_billed`: Add `maximumBytesBilled` to every query config as an automatic guard.
- Set per-user quotas: Limit each user to X TB/day of on-demand scanning.
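The `--dry_run` and `maximum_bytes_billed` strategies combine naturally into a pre-flight guard. In the sketch below, `within_budget` is pure arithmetic, while `dry_run_bytes` shows the `google-cloud-bigquery` dry-run pattern (`QueryJobConfig(dry_run=True)` and `total_bytes_processed` are real API surface; the function names themselves are illustrative):

```python
# dry_run_guard.py — Preview a query's bytes scanned and refuse to run
# it past a byte budget.
def within_budget(estimated_bytes: int, maximum_bytes_billed: int) -> bool:
    """True if a dry-run estimate fits under the byte cap."""
    return estimated_bytes <= maximum_bytes_billed

def dry_run_bytes(sql: str, project: str) -> int:
    """Ask BigQuery how many bytes a query WOULD scan, without running it."""
    # Import kept local so the pure helper above stays dependency-free;
    # requires the google-cloud-bigquery package.
    from google.cloud import bigquery
    client = bigquery.Client(project=project)
    cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=cfg)
    return job.total_bytes_processed
```

A CI hook that dry-runs every committed query and fails the build when `within_budget` returns False catches the $125 `SELECT *` before it ever executes.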
gcloud — BigQuery Per-User Quota
```bash
# Set a custom per-user query quota (not directly supported in Terraform).
# This caps each user's daily on-demand scanning; the value is in MiB,
# so 1,048,576 MiB = 1 TiB/user/day. The command is alpha — verify the
# metric and unit names against the current BigQuery "custom query
# quotas" docs before relying on it.
gcloud alpha services quota update \
    --service=bigquery.googleapis.com \
    --consumer=projects/my-project-id \
    --metric=bigquery.googleapis.com/quota/query/usage \
    --unit=1/d/{project}/{user} \
    --value=1048576

# For programmatic enforcement, use the BigQuery Reservations API
# to set per-project slot caps in capacity mode.
```
Project Proliferation
GCP makes it easy to create projects — too easy. Without governance, teams create ad-hoc projects for experiments, POCs, and one-off demos. These accumulate. Each project may have running resources (VMs, Cloud SQL instances, GKE clusters) that no one monitors. This is how a $5K/month GCP bill becomes $50K/month.
The Problem
| Symptom | Root Cause | Impact |
|---|---|---|
| 50+ projects in the org | No naming convention or folder structure | Impossible to track ownership or costs |
| Projects with no labels | No enforcement of labeling at creation | Cannot attribute costs to teams |
| Projects with no billing budget | No org-level budget policy | Runaway spend goes unnoticed for weeks |
| "Zombie" projects | POC completed but project not deleted | VMs, Cloud SQL, GKE clusters running 24/7 |
Prevention: Centralized Project Factory
Use a Project Factory pattern (via Terraform) to standardize project creation. Every project gets a naming convention, required labels, a billing budget, and folder placement.
```hcl
# terraform/project_factory.tf — Standardized project creation
resource "google_project" "managed" {
  name            = "${var.team}-${var.environment}"
  project_id      = "${var.org_prefix}-${var.team}-${var.environment}"
  folder_id       = var.folder_id
  billing_account = var.billing_account_id

  labels = {
    team        = var.team
    environment = var.environment
    cost_center = var.cost_center
    created_by  = "terraform"
  }
}

# ── Automatically enable required APIs ──
resource "google_project_service" "required_apis" {
  for_each = toset([
    "compute.googleapis.com",
    "container.googleapis.com",
    "bigquery.googleapis.com",
    "logging.googleapis.com",
    "monitoring.googleapis.com",
  ])

  project = google_project.managed.project_id
  service = each.value
}

# ── Create a billing budget for the project ──
resource "google_billing_budget" "project_budget" {
  billing_account = var.billing_account_id
  display_name    = "Budget: ${google_project.managed.name}"

  budget_filter {
    projects = ["projects/${google_project.managed.number}"]
  }

  amount {
    specified_amount {
      currency_code = "USD"
      units         = var.monthly_budget
    }
  }

  threshold_rules {
    threshold_percent = 0.5 # Alert at 50%
  }
  threshold_rules {
    threshold_percent = 0.8 # Alert at 80%
  }
  threshold_rules {
    threshold_percent = 1.0 # Alert at 100%
  }
  threshold_rules {
    threshold_percent = 1.5 # Alert at 150% (overspend)
    spend_basis       = "CURRENT_SPEND"
  }

  all_updates_rule {
    monitoring_notification_channels = [var.notification_channel_id]
  }
}
```
1. Use the Terraform Project Factory for all project creation — no ad-hoc Console or gcloud projects.
2. Enforce required labels via Organization Policy.
3. Attach a billing budget to every project at creation.
4. Quarterly audit: list all projects, check for running resources, delete zombies.
5. Use a Folder structure (Engineering, Data, Shared) to group projects logically.
6. Enable the Recommender API to surface idle VMs, overprovisioned instances, and unused IPs.
Run `gcloud projects list --filter="lifecycleState=ACTIVE"` monthly. Cross-reference the result with the billing export to find projects with non-zero spend but no recent code deployments. These are your "zombie" projects.
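The quarterly zombie audit can be semi-automated. A sketch of the cross-reference step, with all three example inputs illustrative; in practice they would come from `gcloud projects list`, the billing export query, and your deployment logs.

```python
# zombie_projects.py — Cross-reference active projects with billing spend.
# A sketch: `active_projects` would come from `gcloud projects list` (or
# the Resource Manager API), `spend_by_project` from the billing export,
# and `recently_deployed` from your CI/CD or deploy logs.
def find_zombie_candidates(active_projects, spend_by_project,
                           recently_deployed, min_monthly_spend=10.0):
    """Projects that are active and spending money but have seen no
    recent deployments — prime candidates for a zombie audit."""
    return sorted(
        p for p in active_projects
        if spend_by_project.get(p, 0.0) >= min_monthly_spend
        and p not in recently_deployed
    )

if __name__ == "__main__":
    active = ["data-prod", "poc-recsys", "demo-2023"]
    spend = {"data-prod": 8200.0, "poc-recsys": 640.0, "demo-2023": 75.0}
    deployed = {"data-prod"}
    print(find_zombie_candidates(active, spend, deployed))
```

Anything this flags deserves a human look before deletion, but the list turns a vague "audit everything" task into a short, prioritized queue.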