

Metadata and Information Extraction for AP CSP


Data about data—EXIF GPS risks, filtering spreadsheets into information, cleaning, and correlation vs causation.

Metadata describes other data—timestamps, authors, camera models, GPS tags. AP CSP Unit 2 ties metadata to privacy risks, filtering spreadsheets into answers, and cleaning messy tables before charts or models run. Distinguish payload from labels, strip risky EXIF before publishing photos, and use precise words: data, information, metadata, filtering, cleaning.

Updated May 21, 2026 · Reviewed by APScore5 Editorial Team · AP CSP Unit 2 · Data

Step 1 · Definitions: payload vs metadata.
Step 2 · EXIF risk: photos can leak location.
Step 3 · Filtering: turn data into information.
Step 4 · Practice: eight MCQs.

Definitions

What Is Metadata?

How is it different from payload data?

Payload is the content you mean to share—the photo pixels, essay text, or audio samples. Metadata is the background label: who created it, when, with which device, sometimes where.

Sorting essay files by “date modified” uses metadata, not anything written in the essays themselves. Searching a music library by artist also reads metadata fields.

Students sometimes think metadata is optional decoration. For computers, metadata drives organization and search, and on the exam it anchors many privacy scenarios.

Quick example: A CSV of survey results includes answers (payload) and column headers plus survey date (metadata). Keep the vocabulary straight in written responses.
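To make the split concrete, here is a minimal Python sketch that reads a tiny survey CSV with the standard `csv` module. The survey contents, the column names, and keeping the survey date in a separate variable are all hypothetical choices for illustration, not a standard export format.

```python
import csv
import io

# Hypothetical survey export: the first row holds column headers (metadata);
# every later row holds respondents' answers (payload).
RAW_CSV = """name,favorite_console,hours_per_week
Ava,Switch,5
Ben,PlayStation,8
Cy,Xbox,3
"""

# Hypothetical survey date kept alongside the file, not inside the answers.
SURVEY_DATE = "2026-05-01"


def split_metadata_and_payload(text):
    """Return (headers, rows): headers label the data, rows are the data."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)                 # metadata: what each column means
    rows = [row for row in reader if row]  # payload: the actual answers
    return headers, rows


headers, rows = split_metadata_and_payload(RAW_CSV)
print("Metadata:", headers, "collected on", SURVEY_DATE)
print("Payload rows:", len(rows))  # Payload rows: 3
```

The headers and the date never change when a new respondent is added; only the payload grows.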

Compare with big data and privacy in the same unit.

EXIF

Why Does Photo EXIF Matter for Privacy?

What real cases appear in class?

EXIF is camera metadata embedded in many photos. It can list lens settings, timestamps, and GPS coordinates captured at shutter time.

If location tags stay attached, a team photo posted after practice can reveal where the team meets, and a photo taken at home can reveal a home address. Journalism classes now teach “strip location before publish” alongside composition rules.

Removing metadata is not the same as blurring faces. Both can protect privacy, but EXIF removal stops map pins while blur hides identity in the pixels.

Field | Example | Risk
Date/time | 2026-05-21 14:03 | Reveals when you were present
Device | Phone model | May hint at socioeconomic status
GPS | Latitude/longitude | Shows precise location
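The idea of stripping location before publishing can be sketched in Python by modeling a photo as pixels (payload) plus a metadata dictionary. This is a simplified stand-in, not a real EXIF reader: the record layout is hypothetical, and real EXIF stores GPS values in a separate GPS block as rational numbers rather than the floats used here. Notice that the pixels are untouched, which is why stripping metadata does not replace blurring faces.

```python
# Hypothetical photo record: "pixels" stands in for the payload,
# "exif" for the embedded metadata (simplified, not real EXIF encoding).
photo = {
    "pixels": [[0, 255], [255, 0]],
    "exif": {
        "DateTime": "2026:05:21 14:03:00",
        "Model": "Example Phone",
        "GPSLatitude": 40.7128,
        "GPSLongitude": -74.0060,
    },
}

# Keys treated as location data in this sketch (an assumption; real EXIF
# has more GPS tags than these two).
LOCATION_KEYS = {"GPSLatitude", "GPSLongitude"}


def strip_location(record):
    """Return a copy with location tags removed; pixels are left unchanged."""
    clean_exif = {k: v for k, v in record["exif"].items() if k not in LOCATION_KEYS}
    return {"pixels": record["pixels"], "exif": clean_exif}


safe = strip_location(photo)
print(sorted(safe["exif"]))  # ['DateTime', 'Model']
```

Returning a copy instead of editing in place keeps the original available for private archives while only the cleaned version is published.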


Filtering

How Does Filtering Produce Information?

What is an example filter?

Data becomes information when it answers a question. Filtering selects rows that meet criteria so you do not read a million-line spreadsheet manually.

Example: keep orders over $100 from a club fundraiser sheet to see which booths earned the most. The remaining rows answer a focused question.
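A minimal Python sketch of that fundraiser example follows. The booth names and amounts are made up; the point is that filtering first, then summarizing the remaining rows, turns a raw sheet into an answer to one question: among orders over $100, which booth earned the most?

```python
# Hypothetical fundraiser sheet: one dict per order row.
orders = [
    {"booth": "Bake Sale", "amount": 120.00},
    {"booth": "Car Wash", "amount": 45.00},
    {"booth": "Bake Sale", "amount": 150.00},
    {"booth": "Raffle", "amount": 200.00},
    {"booth": "Car Wash", "amount": 110.00},
]

# Filtering: keep only rows that meet the condition (amount over $100).
big_orders = [row for row in orders if row["amount"] > 100]

# Summarizing the filtered rows: total per booth.
totals = {}
for row in big_orders:
    totals[row["booth"]] = totals.get(row["booth"], 0) + row["amount"]

top_booth = max(totals, key=totals.get)
print(totals)                    # {'Bake Sale': 270.0, 'Raffle': 200.0, 'Car Wash': 110.0}
print("Top booth:", top_booth)   # Top booth: Bake Sale
```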

Filters differ from sorting. Sorting rearranges; filtering hides non-matching rows. AP stems may test both—read carefully.
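The difference shows up clearly in row counts. In this short Python sketch with hypothetical quiz scores, sorting keeps all five values in a new order, while filtering keeps only the two that match the condition.

```python
scores = [72, 95, 88, 61, 99]  # hypothetical quiz scores

# Sorting rearranges: every original value is still present.
sorted_scores = sorted(scores, reverse=True)

# Filtering hides non-matching values: only scores of 90 or above remain.
high_scores = [s for s in scores if s >= 90]

print(sorted_scores)  # [99, 95, 88, 72, 61]
print(high_scores)    # [95, 99]
print(len(scores), len(sorted_scores), len(high_scores))  # 5 5 2
```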

In pseudocode, a loop with an if-statement can filter a list the same way a spreadsheet filter does. In Create Task projects, the conditions you code can test metadata fields such as dates or authors.
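Here is a hedged Python translation of that loop-plus-if pattern, written to mirror the shape of AP-style `FOR EACH` and `APPEND` pseudocode rather than the shortest Python. The file records and the function name `recent_files_by` are hypothetical; each record holds only metadata, never the file contents.

```python
# Hypothetical file records: metadata only, no file contents.
files = [
    {"name": "essay1.docx", "author": "Ava", "modified": "2026-05-18"},
    {"name": "essay2.docx", "author": "Ben", "modified": "2026-05-02"},
    {"name": "notes.txt", "author": "Ava", "modified": "2026-05-20"},
]


def recent_files_by(author, cutoff, records):
    """Keep names of files by `author` modified on or after `cutoff`."""
    result = []
    for record in records:  # like FOR EACH record IN records
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if record["author"] == author and record["modified"] >= cutoff:
            result.append(record["name"])  # like APPEND(result, name)
    return result


print(recent_files_by("Ava", "2026-05-15", files))  # ['essay1.docx', 'notes.txt']
```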


Cleaning

What Is Data Cleaning?

Why clean before analyzing?

Cleaning fixes typos, duplicate rows, inconsistent date formats, and missing codes. Charts built on messy tables mislead clubs and scientists alike.

If half the rows spell “PS” and half spell “PlayStation,” your game survey chart splits incorrectly until you standardize entries.
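Standardizing entries can be sketched in Python with a lookup table of known spellings. The responses and the `CANONICAL` mapping are hypothetical; in real cleaning you build the mapping after inspecting the variants that actually appear. This sketch also drops exact duplicate rows (after standardizing), keeping the first copy.

```python
from collections import Counter

# Hypothetical raw survey answers: inconsistent spellings plus a duplicate row.
responses = [
    ("r1", "PS"),
    ("r2", "PlayStation"),
    ("r3", "playstation "),
    ("r4", "Switch"),
    ("r4", "Switch"),  # duplicate from a double submission
]

# Assumed mapping from lowercase variants to one canonical label.
CANONICAL = {"ps": "PlayStation", "playstation": "PlayStation", "switch": "Switch"}


def clean(rows):
    """Standardize labels, then drop exact duplicate rows (first copy kept)."""
    seen = set()
    cleaned = []
    for respondent, answer in rows:
        key = answer.strip().lower()
        label = CANONICAL.get(key, answer.strip())  # unknown answers kept as typed
        row = (respondent, label)
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


counts = Counter(label for _, label in clean(responses))
print(dict(counts))  # {'PlayStation': 3, 'Switch': 1}
```

Without the mapping, the same chart would show three separate PlayStation bars and count one Switch answer twice.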

Cleaning is slow, unglamorous work, but AP highlights it because real analysis starts only after sanity checks.

Lab habit: Before graphing survey results, sort each column for outliers and blank cells. Note fixes in a changelog so teammates trust the final chart.
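The same habit can be automated with a short Python pass that flags blank cells and out-of-range values while writing a changelog entry for each fix. The age column and the 13 to 19 range are assumptions for a high-school survey; this sketch drops flagged rows, but flagging them and keeping them is equally valid as long as the choice is documented.

```python
# Hypothetical "age" column from a class survey; None marks a blank cell.
ages = [16, 17, None, 15, 170, 16, None]

# Assumed plausible range for high-school respondents.
MIN_AGE, MAX_AGE = 13, 19

changelog = []
kept = []
for row_number, age in enumerate(ages, start=1):
    if age is None:
        changelog.append(f"row {row_number}: blank age, dropped")
    elif not MIN_AGE <= age <= MAX_AGE:
        changelog.append(f"row {row_number}: age {age} outside {MIN_AGE}-{MAX_AGE}, dropped")
    else:
        kept.append(age)

print(kept)  # [16, 17, 15, 16]
for entry in changelog:
    print(entry)
```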
Analysis

How Is Correlation Different From Causation?

What is a classic example?

Correlation means two variables move together; causation means one directly produces the other. Summer weather raises both ice cream sales and drowning incidents—heat is the shared cause, not ice cream.
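A correlation coefficient can be computed directly, and doing so shows what it cannot tell you. The Python sketch below uses made-up numbers in which both ice cream sales and drownings rise with temperature; the Pearson coefficient `r` between sales and drownings comes out high, yet the only link built into the data is the shared dependence on heat.

```python
import math

# Made-up weekly summer data for illustration only.
temps_f = [60, 65, 70, 75, 80, 85, 90, 95]
ice_cream_sales = [3 * t - 150 for t in temps_f]  # rises with heat
drownings = [1, 1, 2, 2, 3, 3, 4, 5]             # also rises with heat


def pearson(xs, ys):
    """Pearson correlation coefficient r, between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_sales_drownings = pearson(ice_cream_sales, drownings)
r_temp_drownings = pearson(temps_f, drownings)
print(round(r_sales_drownings, 2), round(r_temp_drownings, 2))  # 0.97 0.97
```

The two values match because, in this invented data, sales are an exact linear function of temperature. The arithmetic reports that the variables move together; only the study design can say why.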

Fitness tracker step counts may correlate with sleep hours without one causing the other; a busy schedule might drive both.

AP may show a scatter plot and ask what conclusion is valid. “Related” is safer than “causes” unless a controlled experiment supports causation.

When writing, name a possible hidden variable. That single sentence often strengthens a societal-impact response.

Exam

What Metadata Questions Show Up?

Which vocabulary wins partial credit?

Use data, information, metadata, filtering, and cleaning precisely—avoid swapping terms mid-paragraph.

EXIF GPS scenarios appear often. Mention concrete measures such as metadata-removal tools or publishing policies, not just “be careful online.”

Correlation vs causation pairs with big-data bias later; study both guides if you miss societal-impact items.

Answer the eight MCQs here, then read the big-data privacy guide for PII and re-identification depth.

Practice

AP-style practice on Metadata

For a full mixed set, try the 50 Unit 2 practice questions.

Q1. Metadata is best defined as:

Q2. EXIF in a photo may include:

Q3. Filtering a dataset means:

Q4. An example of correlation without causation is:

Q5. Data cleaning fixes:

Q6. Sorting essays by “date modified” relies on:

Q7. Information differs from raw data because:

Q8. A privacy risk of GPS metadata in a photo is:

Quick answers

Frequently asked questions

What is metadata in AP Computer Science Principles?

Metadata is data about data—information describing a file or record without being the main content. A photo's pixels are data; the camera model, timestamp, and GPS tag are metadata. Programs use metadata for sorting, searching, and privacy decisions. Exams expect precise vocabulary, not vague "extra info."

Why is photo EXIF metadata a privacy risk?

EXIF tags can record GPS coordinates, device serial numbers, and exact capture time. Sharing a cropped image online may still leak location if EXIF remains. Journalists and activists learn to strip metadata before posting. AP scenarios often ask you to identify location leakage, not to recite every EXIF field name.

What is data filtering?

Filtering selects records that meet conditions, such as orders over $100 or temperatures above 90°F. Filtering turns a large table into a smaller answer set aligned with a question. It is not the same as sorting; sorting reorders rows without necessarily removing any.

What is data cleaning and why does it matter?

Cleaning fixes typos, duplicate rows, wrong units, and missing values before analysis. Charts built on dirty data mislead decisions even when the code runs. A class survey with blank ages might drop those rows or flag them—either way, the step should be intentional and documented.

How is correlation different from causation?

Correlation means two variables move together; causation means one directly produces change in the other. Ice cream sales and drowning deaths both rise in summer because warm weather drives both, not because ice cream causes drowning. AP prompts reward naming a lurking variable, not just saying "correlation is not causation."

What is the difference between data and information?

Data are raw facts or symbols. Information is data processed to answer a question, such as average commute time after filtering rush-hour logs. Metadata sits beside both, describing context. Classification questions give a short scenario—decide what the user is trying to learn first.

How do search engines use metadata?

HTML title tags, meta descriptions, and file types help search engines rank and display results. Music libraries read artist tags embedded in audio files. The pattern is the same: structured fields speed lookup without opening every byte of content. Connecting metadata to apps you use makes written answers concrete.
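The music-library case can be sketched as a small index built from tags alone. In this hypothetical Python example, each track carries placeholder audio bytes that the index never reads; lookup by artist touches only the metadata fields. The track titles, artist names, and the `build_artist_index` helper are invented for illustration.

```python
# Hypothetical library: tags (metadata) plus placeholder audio bytes (payload).
tracks = [
    {"title": "Song A", "artist": "The Lows", "audio": b"\x00\x01"},
    {"title": "Song B", "artist": "Neon Fox", "audio": b"\x02\x03"},
    {"title": "Song C", "artist": "The Lows", "audio": b"\x04\x05"},
]


def build_artist_index(library):
    """Map each artist tag to the titles tagged with it, reading metadata only."""
    index = {}
    for track in library:
        index.setdefault(track["artist"], []).append(track["title"])
    return index


index = build_artist_index(tracks)
print(index["The Lows"])  # ['Song A', 'Song C']
```

Once built, the index answers "songs by this artist" with one dictionary lookup instead of a pass over every file.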

Can metadata be wrong or misleading?

Yes. Clocks on cameras can be unset, tags can be edited, and labels can be copied from the wrong file. Analysts verify critical fields instead of trusting them blindly. Mentioning error sources shows mature thinking on data-quality questions.
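Verifying a critical field can be as simple as checking that it falls in a plausible range. This Python sketch flags photo timestamps that are too early or in the future. The file names, the timestamps, the 2010 lower bound, and the fixed check date are all assumptions chosen for the example; a camera whose clock was never set often reports some default date, but the exact default varies by device.

```python
from datetime import datetime

# Hypothetical capture times read from photo metadata.
capture_times = {
    "img1.jpg": "2026-05-21 14:03",
    "img2.jpg": "2000-01-01 00:00",  # suspicious: possibly an unset camera clock
    "img3.jpg": "2031-07-04 09:15",  # in the future relative to the check date
}

# Assumed plausibility window; choose bounds that fit your own collection.
EARLIEST = datetime(2010, 1, 1)
CHECK_DATE = datetime(2026, 5, 21, 23, 59)


def suspicious(times):
    """Return file names whose timestamps fall outside the plausible window."""
    flagged = []
    for name, text in times.items():
        when = datetime.strptime(text, "%Y-%m-%d %H:%M")
        if when < EARLIEST or when > CHECK_DATE:
            flagged.append(name)
    return flagged


print(suspicious(capture_times))  # ['img2.jpg', 'img3.jpg']
```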

What is an example of turning data into information?

A school stores raw attendance clicks (data). An administrator filters to students below 90% attendance this month (information answering a policy question). The filter choice defines the information. Exams may ask which step creates information versus which step only stores data.
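The attendance example can be sketched in a few lines of Python. The click log below is hypothetical raw data; the information appears only after the program computes each student's rate and applies the 90% filter that the policy question defines.

```python
# Hypothetical raw attendance clicks: (student, day, present?).
clicks = [
    ("Ava", 1, True), ("Ava", 2, True), ("Ava", 3, True), ("Ava", 4, True), ("Ava", 5, True),
    ("Ben", 1, True), ("Ben", 2, False), ("Ben", 3, True), ("Ben", 4, True), ("Ben", 5, True),
    ("Cy", 1, False), ("Cy", 2, True), ("Cy", 3, False), ("Cy", 4, True), ("Cy", 5, True),
]

THRESHOLD = 0.90  # the policy question: who is below 90% attendance?

# Processing: count days present and total days per student.
present = {}
total = {}
for student, _day, was_present in clicks:
    total[student] = total.get(student, 0) + 1
    present[student] = present.get(student, 0) + (1 if was_present else 0)

rates = {s: present[s] / total[s] for s in total}

# Filtering: the step that produces information answering the question.
below = [s for s, rate in rates.items() if rate < THRESHOLD]
print(below)  # ['Ben', 'Cy']
```

Changing `THRESHOLD` changes the answer without changing a single stored click, which is the sense in which the filter choice defines the information.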

How should I practice metadata skills for the exam?

Answer the eight MCQs on this page, then read one news story about location leaks from photos and summarize metadata involved. Pair with the big-data guide for privacy law context. Vocabulary flashcards help, but scenario practice prevents mixing up filtering and compression.
