Fine-Tuning Machine Learning and LLM Models
A beginner-to-intermediate handbook focused on applied fine-tuning: what to tune, when to tune, how to structure data, which PEFT method to choose, and how to run a practical Hugging Face training loop without drowning in theory.
Introduction to Fine-Tuning
Fine-tuning means taking a model that already learned general patterns and continuing training on a smaller, task-specific dataset. Instead of teaching the model everything from zero, you adapt what it already knows so it performs better on your domain, format, labels, or response style.
Where pretraining builds broad, general capability from massive corpora, fine-tuning is a smaller follow-up step that adapts a pretrained model to a target task like classification, summarization, chat, ranking, or domain adaptation.
Transfer learning starts from a pretrained model and reuses existing knowledge. Fine-tuning is the most common transfer-learning pattern for modern ML and LLM work.
When Fine-Tuning Is Required vs Not Required
Use it for: domain terminology, consistent output format, label prediction, product-specific assistant behavior, ranking objectives, or local deployment where a smaller model must perform better on a narrow task.
Skip it when: prompt engineering, retrieval-augmented generation (RAG), better evaluation prompts, or simple application logic can already solve the problem. Fine-tuning is slower to iterate and easier to get wrong if your dataset is weak.
Key Parameters in Fine-Tuning
Most failed fine-tuning runs come from a small set of knobs: learning rate, batch size, epochs, regularization, and scheduler behavior. You do not need to memorize every formula, but you should know what each parameter changes so you can debug training runs quickly.
from transformers import TrainingArguments
training_args = TrainingArguments(
learning_rate=2e-5,
per_device_train_batch_size=8,
num_train_epochs=3,
weight_decay=0.01,
warmup_steps=500,
logging_steps=100
)
w_{t+1} = w_t - \eta g_t. The learning rate \eta controls how large each parameter update is.
effective_batch = per_device_batch * grad_accumulation * num_devices. The effective batch size matters more than per-device batch size alone.
lr_t = lr_max * t / warmup_steps during warmup. This prevents unstable early updates.
g_clipped = g * min(1, max_norm / ||g||). Large gradient spikes get scaled down.
| Parameter | What it does | If you increase it | Overfitting | Convergence / Stability |
|---|---|---|---|---|
| Learning rate | Controls update size each step. | Training moves faster, but can overshoot good solutions. | Indirect effect. Too high can look like poor generalization because training becomes noisy. | Most sensitive knob. Too high causes divergence or oscillation. Too low wastes time and may underfit. |
| Batch size | Number of samples used for one gradient estimate. | Gradient becomes smoother and training often allows a slightly higher LR. | Very large batches can reduce helpful noise and sometimes generalize worse. | Usually improves stability, but uses more VRAM. |
| Epochs | How many full passes over the training data. | The model fits the training set more aggressively. | Main direct driver of overfitting if validation stops improving. | Enough epochs are needed to learn; too many memorize noise. |
| Weight decay | Penalizes large weights and acts as regularization. | Can reduce overfitting, but too much hurts fit. | Often helps if the model memorizes small datasets. | Moderate values improve robustness. Too high slows learning. |
| Gradient clipping | Caps very large gradient norms. | More aggressive clipping if the threshold is lower. | Not a direct anti-overfitting tool. | Strongly improves stability on transformer runs that spike in loss. |
| Warmup steps | Starts with a small LR and ramps up. | Safer early training, but too much warmup slows progress. | No major direct effect. | Useful for preventing early instability, especially with AdamW and large models. |
| Scheduler | Changes LR over time. | Depends on schedule type. | Good schedules reduce late-stage overfitting by lowering LR. | Linear and cosine are common because they balance speed and smooth finishing. |
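The batch-size row above refers to the effective batch, which combines per-device batch, gradient accumulation, and device count. A minimal sketch of the arithmetic (function name is illustrative):

```python
def effective_batch_size(per_device_batch: int,
                         grad_accumulation: int,
                         num_devices: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * grad_accumulation * num_devices

# 8 samples per GPU, 4 accumulation steps, 2 GPUs -> 64 samples per update
print(effective_batch_size(8, 4, 2))  # 64
```

This is why gradient accumulation lets a small GPU imitate a large-batch run: the optimizer only steps after the accumulated gradients are summed.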
Scheduler Types
| Scheduler | How it behaves | When to use it |
|---|---|---|
| Linear decay | LR ramps up, then decays steadily toward zero. | Safe default for many Hugging Face runs and classification tasks. |
| Cosine decay | LR drops slowly at first, then more smoothly near the end. | Common for LLM fine-tuning when you want gentle late-stage updates. |
| Constant with warmup | LR warms up, then stays fixed. | Useful when you want predictable tuning behavior or short runs. |
| Polynomial decay | LR shrinks using a curve you can shape. | More control-heavy setups; less common for beginners. |
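The linear-decay row can be made concrete with a small sketch of the schedule shape. This mirrors what `get_scheduler("linear", ...)` in transformers produces, but it is a hand-rolled illustration, not the library implementation:

```python
def linear_schedule_lr(step: int, max_lr: float,
                       warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to max_lr, then linear decay toward zero."""
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return max_lr * remaining / max(1, total_steps - warmup_steps)

# LR ramps up over the first 100 steps, peaks at 2e-5, then decays to zero
lrs = [linear_schedule_lr(s, 2e-5, 100, 1000) for s in range(0, 1001, 100)]
```

Plotting `lrs` shows the triangle shape: a ramp during warmup and a straight line down afterward.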
Data Formats for Fine-Tuning
Your dataset format should match the behavior you want the model to learn. Fine-tuning works best when the training examples already look like the prompts and outputs you expect at inference time.
Instruction Tuning Format
Use this when you want the model to follow task instructions and produce a single target answer.
{
"instruction": "Translate English to French",
"input": "Hello",
"output": "Bonjour"
}
Chat Format
Use this when the model should behave like an assistant in multi-turn conversations.
[
{"role": "user", "content": "What is AI?"},
{"role": "assistant", "content": "AI stands for Artificial Intelligence..."}
]
Plain Text Format
Use this for language modeling, continued pretraining, or next-token prediction on raw text. The model learns to continue text rather than answer explicit instruction fields.
Artificial intelligence is a branch of computer science focused on building systems that can perform tasks requiring human-like reasoning.
| Format | Best for | Pros | Cons |
|---|---|---|---|
| Instruction tuning | Single-turn tasks, extraction, summarization, QA, task transfer | Simple to build, easy to inspect, strong for SFT | Can miss conversational context if your real product is multi-turn |
| Chat format | Assistants, copilots, support bots, role-conditioned dialogue | Matches production chat behavior, preserves turns and tone | Formatting mistakes around roles can silently hurt training quality |
| Plain text | Continued pretraining, domain adaptation, language modeling | Easy to collect at scale, good for domain vocabulary | Does not directly teach instruction following or answer structure |
Dataset Preparation Tips
- Keep the output shape consistent. If production answers should be JSON, tables, or short bullets, teach exactly that format during training.
- Deduplicate aggressively. Duplicates make the model memorize frequent samples and distort evaluation.
- Clean bad labels first. A smaller high-quality dataset usually beats a larger noisy one.
- Split train, validation, and test early. Do this before heavy preprocessing to reduce leakage.
- Watch sequence length. Overly long samples waste tokens and can hide the actual signal.
- Preserve edge cases. Include difficult examples so the model learns boundary behavior, not only easy cases.
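The dedup and early-split tips above can be sketched in plain Python. Field names follow the instruction format shown earlier; real pipelines often add near-duplicate detection on top of exact matching:

```python
import random

def dedup_and_split(examples, test_frac=0.1, seed=42):
    """Exact-match dedup, then a shuffled train/test split."""
    seen, unique = set(), []
    for ex in examples:
        key = (ex["instruction"], ex["input"], ex["output"])
        if key not in seen:
            seen.add(key)
            unique.append(ex)
    rng = random.Random(seed)          # fixed seed keeps splits reproducible
    rng.shuffle(unique)
    n_test = max(1, int(len(unique) * test_frac))
    return unique[n_test:], unique[:n_test]

data = [{"instruction": "a", "input": "b", "output": "c"}] * 3 + \
       [{"instruction": "x", "input": "y", "output": "z"}]
train, test = dedup_and_split(data, test_frac=0.5)  # 3 duplicates collapse to 1
```

Splitting immediately after dedup, before any heavier preprocessing, is what keeps near-identical samples from leaking across the train/test boundary.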
Fine-Tuning Approaches
Not all fine-tuning means updating every weight. The main approaches differ in cost, memory, speed, and how much task-specific capacity you gain.
| Method | Memory | Speed | Performance | Use Case |
|---|---|---|---|---|
| Full fine-tuning | Highest | Slowest setup and heaviest checkpoints | Best ceiling when you have strong data and compute | High-value tasks, model ownership, strong hardware budget |
| Feature extraction | Lowest | Fast | Good for simpler supervised tasks, limited for generation | Classification, regression, small labeled datasets |
| PEFT | Low to medium | Fast training with small trainable state | Often close to full tuning on narrow tasks | LLMs on limited GPUs, rapid iteration, multiple adapters |
LoRA, QLoRA, and Other Efficient Methods
Parameter-efficient fine-tuning lets you adapt large models without paying the full memory cost of updating every weight. This is why consumer-GPU fine-tuning became practical.
Concept: instead of rewriting a full weight matrix W, LoRA learns a low-rank update \Delta W = BA and applies W' = W + \Delta W. Only the small adapter matrices A and B are trained.
Why it is efficient: trainable parameter count drops dramatically, optimizer state becomes much smaller, and you can store many task adapters on top of one base model.
When to use it: instruction tuning, domain adaptation, lightweight experiments, or multiple task-specific adapters.
from peft import LoraConfig, get_peft_model
config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.1
)
model = get_peft_model(model, config)
Short intuition: r controls adapter capacity, lora_alpha scales the update, and target_modules decides where adapters are inserted.
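To see why the adapter is cheap, compare parameter counts for a full update of a d x d weight against the rank-r factors B (d x r) and A (r x d). A back-of-the-envelope sketch with illustrative sizes:

```python
def lora_param_counts(d: int, r: int):
    """Trainable parameters: full d x d update vs rank-r LoRA factors."""
    full = d * d                 # updating W directly
    lora = d * r + r * d         # B is d x r, A is r x d
    return full, lora

full, lora = lora_param_counts(d=4096, r=8)
print(full, lora, full / lora)  # 16777216 65536 256.0
```

At r=8 on a 4096-wide projection, the adapter trains roughly 1/256 of the parameters a full update would touch, which is where most of the memory savings come from.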
QLoRA concept: load the base model in low precision, typically 4-bit, and train LoRA adapters on top. The frozen base stays quantized while adapter weights remain trainable in higher precision.
Why it is memory efficient: the frozen base weights occupy far less VRAM, which makes 7B to 13B models accessible on much smaller GPUs.
When to prefer it over LoRA: when VRAM is your main constraint, especially for larger models or laptops and single-GPU systems.
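A typical 4-bit loading setup looks like the sketch below, assuming transformers with bitsandbytes support and a CUDA device; the model name is illustrative, and this is a configuration sketch rather than a complete recipe:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights
    bnb_4bit_quant_type="nf4",              # NormalFloat4, used in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # higher-precision compute for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative base model
    quantization_config=bnb_config,
)
# LoRA adapters are then attached with peft's get_peft_model, as shown earlier.
```

Only the adapter weights receive gradients; the quantized base is read-only during training.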
| Method | Approx VRAM for 7B training | Quality ceiling | Best use |
|---|---|---|---|
| Full fine-tuning | 60 GB or more, depending on precision and optimizer | Highest | High-budget, high-stakes adaptation |
| LoRA | Roughly 16-24 GB | Very strong for many narrow tasks | Single-GPU adaptation with minimal performance loss |
| QLoRA | Roughly 10-16 GB | Slightly lower than LoRA in some setups, often close enough | Tight VRAM budgets and fast experimentation |
Other Techniques
| Technique | Main tradeoff | Good fit |
|---|---|---|
| LoRA | Best balance of simplicity, quality, and memory | Default PEFT choice |
| QLoRA | Lowest memory, slightly more complexity around quantization support | Limited VRAM environments |
| Adapters | More parameters than prompt-based methods | Reusable modular adapters |
| Prefix tuning | Very compact but less expressive for some tasks | Controlling style or task framing |
| Prompt tuning | Smallest footprint, but often weakest adaptation power | Resource-constrained experiments |
Optimizers With Focus on Adam
An optimizer decides how parameter updates are applied. For transformer fine-tuning, Adam and especially AdamW are the common defaults because they adapt learning rates per parameter and handle noisy gradients well.
import torch.optim as optim
# Adam baseline; for transformer fine-tuning, AdamW (optim.AdamW) is usually preferred.
optimizer = optim.Adam(model.parameters(), lr=2e-5)
Simple Explanation of Adam
- Momentum: smooths noisy gradients by remembering recent directions. This helps the optimizer keep moving in a useful direction instead of reacting too hard to every mini-batch.
- RMS scaling: uses a running estimate of squared gradients so parameters with consistently large gradients get smaller updates.
- Bias correction: early moving averages are corrected so the optimizer does not underestimate them at the start.
m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
w_{t+1} = w_t - \eta \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon), where \hat{m}_t and \hat{v}_t are the bias-corrected moments.
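The three pieces combine into one update per parameter. A single-parameter sketch with the usual defaults (beta1=0.9, beta2=0.999, eps=1e-8); variable names are illustrative:

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter w with gradient g."""
    m = beta1 * m + (1 - beta1) * g          # momentum (first moment)
    v = beta2 * v + (1 - beta2) * g * g      # RMS scaling (second moment)
    m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First step: bias correction makes the update magnitude close to lr itself
w, m, v = adam_step(w=0.0, g=2.0, m=0.0, v=0.0, t=1)
```

Note how on step 1 the raw moments are tiny, and the bias correction rescales them so the first update is not artificially small.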
| Choice | Use it when | Why |
|---|---|---|
| Adam | You want a simple adaptive optimizer baseline | Works well on many deep learning tasks, especially transformers |
| SGD | You have a very large dataset, vision-style training, or well-tuned classical pipelines | Can generalize well, but usually needs more careful tuning for LLM fine-tuning |
| AdamW | You are fine-tuning transformers | Decouples weight decay from gradient updates, which usually makes regularization cleaner and more stable |
Ranking and Training Objectives
Your training objective defines what “better” means during optimization. If you choose the wrong objective, the model can improve on paper while failing the real task.
Cross-Entropy Loss
Use this for supervised fine-tuning, text classification, sequence labeling, and next-token prediction. It is usually the first objective to try because it aligns directly with “predict the correct answer.”
Ranking Loss
Use ranking objectives when the order of results matters more than a single label. Common examples include search, recommendation, reranking, and preference learning.
| Type | What it compares | Typical use |
|---|---|---|
| Pairwise | One positive item vs one negative item | Search ranking, reward modeling, binary preference comparison |
| Listwise | A full ranked list | Recommendation systems, search pages, complex ranking pipelines |
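A pairwise objective can be written directly as a margin loss: the positive item should score higher than the negative item by at least a margin, and the loss is zero once it does. This mirrors the idea behind torch.nn.MarginRankingLoss, sketched here in plain Python:

```python
def pairwise_margin_loss(pos_score: float, neg_score: float,
                         margin: float = 1.0) -> float:
    """Zero loss once pos_score beats neg_score by at least the margin."""
    return max(0.0, margin - (pos_score - neg_score))

print(pairwise_margin_loss(3.0, 1.0))  # 0.0  already separated by >= margin
print(pairwise_margin_loss(1.2, 1.0))  # 0.8  not separated enough yet
```

Listwise losses generalize this by scoring an entire ranked list at once instead of one pair at a time.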
RLHF - Reinforcement Learning from Human Feedback
RLHF usually means: supervised fine-tune first, gather human preferences, train a reward model, then optimize the main model so it produces higher-reward responses. In practice, many teams now use simpler preference-optimization methods like DPO, but the high-level idea is the same: learn from what humans prefer, not only from fixed labels.
Pretraining vs Fine-Tuning
Fine-tuning is usually enough, but not always. The harder question is whether your problem needs continued pretraining or even training from scratch before fine-tuning.
Continued pretraining (or pretraining from scratch) tends to help for: very domain-specific data such as medical records, legal corpora, scientific literature, protein sequences, specialized codebases, or any setting where the base model repeatedly fails because the underlying vocabulary and distribution are too different.
Fine-tuning alone is usually enough for: general-purpose chat, classification, summarization, extraction, translation-like formatting, and many enterprise assistants where the base model already has the underlying capabilities.
Decision Checklist
- Does the base model already understand the task? If yes, start with fine-tuning or even prompting.
- Is your data highly domain-specific? If yes, continued pretraining may help before SFT.
- Do you have millions or billions of tokens? If no, full pretraining is probably not worth it.
- Do you mainly need style, format, or label adaptation? Fine-tuning is usually enough.
- Do facts change frequently? Prefer RAG or retrieval over baking transient facts into weights.
Evaluation of Models
Evaluation is critical because training loss alone can lie. A model can memorize training data, look great on training metrics, and still fail on real users, new domains, or edge cases.
Common Evaluation Methods
| Metric | Best for | What to watch |
|---|---|---|
| Accuracy / F1 | Classification, extraction, labeling | F1 is better than raw accuracy when classes are imbalanced |
| Perplexity | Language modeling, continued pretraining, causal LM tuning | Lower is better, but only meaningful when comparing on the same data distribution |
| BLEU / ROUGE | Translation, summarization | Helpful for automation, but can miss semantic quality and factuality |
| Human evaluation | Chat quality, safety, tone, helpfulness | Slow but often necessary for assistant behavior |
| Benchmark datasets | Standardized comparison | Useful for baselines, but make sure they match your real task |
import evaluate
metric = evaluate.load("accuracy")
result = metric.compute(predictions=preds, references=labels)
The example uses the evaluate package, which replaced the now-deprecated datasets.load_metric helper; the core idea is unchanged: compute task metrics from predictions and reference labels.
Offline vs Online Evaluation
| Mode | What it answers | Example |
|---|---|---|
| Offline | Did the model improve on held-out data? | F1 on a validation split, perplexity on dev text, human review set |
| Online | Did the model improve real product outcomes? | CTR, resolution rate, conversion, user preference, latency, cost |
Overfitting Detection
- Training loss keeps dropping while validation loss rises.
- Generated outputs repeat training phrasing too closely.
- Benchmark gains do not transfer to manual test prompts or business data.
- Edge cases degrade even though average metrics improve.
Best Practices and Common Pitfalls
Most practical wins come from disciplined data and monitoring, not from exotic algorithms. These are the habits that consistently improve fine-tuning quality.
- Monitor validation loss. It is your early warning system for overfitting and bad hyperparameters.
- Use early stopping. Stop when validation no longer improves.
- Tune one thing at a time. Change learning rate or epochs first before chasing harder explanations.
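The early-stopping rule amounts to patience over validation loss; transformers provides EarlyStoppingCallback for the same idea inside Trainer, but the logic is simple enough to sketch directly:

```python
def should_stop(val_losses, patience=2, min_delta=0.0):
    """Stop once validation loss fails to improve for `patience` evals in a row."""
    best = float("inf")
    stale = 0
    for loss in val_losses:
        if loss < best - min_delta:
            best, stale = loss, 0   # new best: reset the patience counter
        else:
            stale += 1
            if stale >= patience:
                return True
    return False

print(should_stop([1.0, 0.8, 0.81, 0.82]))  # True: two evals without improvement
print(should_stop([1.0, 0.8, 0.6]))         # False: still improving
```

The `min_delta` threshold guards against counting tiny, noise-level improvements as real progress.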
Practical Tuning Tips
- Start with a strong baseline. Compare prompt-only, RAG, and a simple fine-tune before scaling the project.
- Log everything. Save configs, seeds, metrics, model revisions, and dataset versions.
- Use early stopping and checkpointing. This saves compute and prevents losing a good run.
- Inspect actual outputs every run. Metrics alone do not reveal formatting mistakes, hallucinations, or tone issues.
- Keep a small gold set. Use a hand-curated evaluation set for fast regression checks.
Hardware Considerations
Compute planning matters because the same method can feel easy or impossible depending on model size, precision, and optimization choices.
| Setup | Typical fit | Notes |
|---|---|---|
| Single 12-16 GB GPU | Small models, LoRA, QLoRA, short sequences | Use 4-bit loading, gradient accumulation, and possibly gradient checkpointing |
| Single 24 GB GPU | 7B-scale LoRA or QLoRA, moderate sequence lengths | Good sweet spot for practical experimentation |
| 48 GB+ GPU | Larger LoRA runs, faster batch sizes, some full tuning experiments | Much more room for sequence length and optimizer state |
| Multi-GPU | Large models, distributed training, full fine-tuning | Needed when model states and activations no longer fit on one device |
Use fp16 or bf16 mixed precision when supported. This usually reduces memory use and speeds up training.
Real-World Use Cases
Fine-tuning is most useful when you need repeatable behavior that generic prompting cannot guarantee.
End-to-End Example
The example below shows a compact Hugging Face Trainer workflow for instruction-style fine-tuning on a small causal language model. It covers data loading, prompt formatting, tokenization, training, and evaluation.
import math
from datasets import Dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
DataCollatorForLanguageModeling,
Trainer,
TrainingArguments,
)
samples = [
{
"instruction": "Summarize the sentence",
"input": "Fine-tuning adapts a pretrained model to a target task.",
"output": "Fine-tuning specializes a pretrained model."
},
{
"instruction": "Answer the question",
"input": "What does GPU stand for?",
"output": "GPU stands for Graphics Processing Unit."
},
{
"instruction": "Rewrite in a formal tone",
"input": "The launch failed because the config was wrong.",
"output": "The launch failed due to an incorrect configuration."
},
{
"instruction": "Extract the sentiment",
"input": "The onboarding experience was smooth and very helpful.",
"output": "positive"
}
]
dataset = Dataset.from_list(samples).train_test_split(test_size=0.25, seed=42)
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)
model.config.pad_token_id = tokenizer.pad_token_id
def format_example(example):
instruction = example["instruction"].strip()
input_text = example["input"].strip()
output_text = example["output"].strip()
return (
f"### Instruction:\n{instruction}\n\n"
f"### Input:\n{input_text}\n\n"
f"### Response:\n{output_text}"
)
def add_text(example):
example["text"] = format_example(example)
return example
dataset = dataset.map(add_text)
def tokenize(example):
tokens = tokenizer(
example["text"],
truncation=True,
max_length=256,
padding="max_length"
)
tokens["labels"] = tokens["input_ids"].copy()
return tokens
tokenized = dataset.map(tokenize)
tokenized = tokenized.remove_columns(["instruction", "input", "output", "text"])
training_args = TrainingArguments(
output_dir="./ft-output",
learning_rate=2e-5,
per_device_train_batch_size=2,
per_device_eval_batch_size=2,
num_train_epochs=3,
weight_decay=0.01,
warmup_steps=20,
logging_steps=10,
evaluation_strategy="epoch",
save_strategy="epoch",
report_to="none"
)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized["train"],
eval_dataset=tokenized["test"],
data_collator=data_collator
)
trainer.train()
metrics = trainer.evaluate()
print(metrics)
if "eval_loss" in metrics:
print("perplexity:", math.exp(metrics["eval_loss"]))
What each step is doing
- Load dataset: build or read examples with instruction, input, and output fields.
- Format text: turn structured fields into the exact prompt style you want the model to learn.
- Tokenize: convert text to token IDs and create labels for next-token learning.
- Train with Trainer: use TrainingArguments to control learning rate, batch size, epochs, warmup, and logging.
- Evaluate: inspect metrics and sample outputs before calling the run successful.
How to convert this into a PEFT run
from peft import LoraConfig, get_peft_model
lora_config = LoraConfig(
r=8,
lora_alpha=16,
lora_dropout=0.1,
target_modules=["c_attn"]
)
model = get_peft_model(model, lora_config)
For larger instruction-tuning projects, many teams switch from raw Trainer to specialized trainers such as SFTTrainer, but the core workflow remains the same.
Reference Links
Core Docs
- Hugging Face Transformers docs
- Trainer API reference
- Hugging Face Datasets docs
- PEFT documentation
- PyTorch docs
Applied Reading
- LoRA paper
- QLoRA paper
- TRL documentation
- Evaluate library docs