Unit 2 · Metadata
Data about data—EXIF GPS risks, filtering spreadsheets into information, cleaning, and correlation vs causation.
Metadata describes other data—timestamps, authors, camera models, GPS tags. AP CSP Unit 2 ties metadata to privacy risks, filtering spreadsheets into answers, and cleaning messy tables before charts or models run. Distinguish payload from labels, strip risky EXIF before publishing photos, and use precise words: data, information, metadata, filtering, cleaning.
Payload is the content you mean to share—the photo pixels, essay text, or audio samples. Metadata is the background label: who created it, when, with which device, sometimes where.
Sorting files by “date modified” uses metadata, not the essay’s thesis. Searching a music library by artist also reads metadata fields.
Students sometimes think metadata is optional decoration. For computers, metadata drives organization, search, and—on the exam—privacy stories.
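To make the payload/metadata split concrete, here is a minimal Python sketch. It assumes each file is modeled as a plain dictionary with a `payload` field and a `meta` field; both names are hypothetical and do not come from a real file-system API. Sorting by the `modified` field reads only the metadata, just as a file browser's "date modified" column never looks at the essay text.

```python
from datetime import datetime

# Hypothetical records: each pairs a payload (essay text) with metadata labels.
files = [
    {"payload": "Essay on tides...", "meta": {"name": "tides.txt", "modified": datetime(2026, 3, 2, 9, 15)}},
    {"payload": "Essay on volcanoes...", "meta": {"name": "volcano.txt", "modified": datetime(2026, 1, 20, 18, 40)}},
    {"payload": "Essay on comets...", "meta": {"name": "comets.txt", "modified": datetime(2026, 2, 11, 7, 5)}},
]

def sort_by_modified(records):
    """Order records by the 'modified' metadata field, oldest first.
    The payload is never read."""
    return sorted(records, key=lambda r: r["meta"]["modified"])

for record in sort_by_modified(files):
    print(record["meta"]["modified"].date(), record["meta"]["name"])
```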
Compare with big data and privacy in the same unit.
EXIF is camera metadata embedded in many photos. It can list lens settings, timestamps, and GPS coordinates captured at shutter time.
Posting a team photo taken at a teammate's house after practice might leak a home address if location tags stay attached. Journalism classes now teach “strip location before publish” alongside composition rules.
Removing metadata is not the same as blurring faces. Both can protect privacy, but removing EXIF keeps the file from revealing where it was taken, while blurring hides identity in the pixels themselves.
| Field | Example | Risk |
|---|---|---|
| Date/time | 2026-05-21 14:03 | Reveals when you were present |
| Device | Phone model | Hints at socioeconomic status |
| GPS | Lat/long | Shows precise location |
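Stripping location before publishing can be sketched as removing fields from a metadata dictionary. This is a simplified model, not a real imaging library: actual EXIF stores tags as numeric IDs (GPS values in a separate GPS section, usually as degree/minute/second fractions), and you would read and rewrite them with a tool or library. The rule "drop every field whose name starts with GPS" is an assumption chosen for illustration.

```python
# Hypothetical photo record: pixel payload plus an EXIF-like metadata dict.
# Plain string keys and decimal coordinates keep the sketch self-contained.
photo = {
    "pixels": b"\x00\x01\x02",  # stand-in for image data
    "exif": {
        "DateTime": "2026:05:21 14:03:00",
        "Model": "Phone model",
        "GPSLatitude": 40.7128,
        "GPSLongitude": -74.0060,
    },
}

RISKY_PREFIXES = ("GPS",)  # assumption: treat every GPS* field as location data

def strip_location(record):
    """Return a copy of the record with location fields removed.
    The pixels are untouched, so faces are still visible; blurring is a separate step."""
    clean_exif = {
        key: value
        for key, value in record["exif"].items()
        if not key.startswith(RISKY_PREFIXES)
    }
    return {"pixels": record["pixels"], "exif": clean_exif}

published = strip_location(photo)
print(sorted(published["exif"]))
```

Note that the original `photo` is left unchanged, so you can compare the before and after versions.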
Compare with compressed file sizes in the same unit.
Data becomes information when it answers a question. Filtering selects rows that meet criteria so you do not read a million-line spreadsheet manually.
Example: keep orders over $100 from a club fundraiser sheet to see which booths earned the most. The remaining rows answer a focused question.
Filters differ from sorting. Sorting rearranges; filtering hides non-matching rows. AP stems may test both—read carefully.
In pseudocode, a loop with an if-statement can filter a list the same way a spreadsheet filter does—connect metadata fields to conditions you code in Create Task projects.
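The loop-plus-if pattern translates directly into Python. In this sketch the fundraiser rows are made up for illustration, and "over $100" is read as strictly greater than 100. The last lines contrast filtering with sorting: the sort keeps all five rows and only changes their order.

```python
# Hypothetical fundraiser rows: (booth, order amount in dollars).
orders = [
    ("Bake sale", 45.00),
    ("Car wash", 120.00),
    ("Raffle", 310.50),
    ("Bake sale", 102.25),
    ("Car wash", 60.00),
]

def filter_orders(rows, minimum):
    """Keep only rows whose amount is strictly greater than minimum,
    using the same loop-plus-if pattern as AP pseudocode."""
    kept = []
    for booth, amount in rows:
        if amount > minimum:
            kept.append((booth, amount))
    return kept

big_orders = filter_orders(orders, 100)
print(big_orders)  # only the rows over $100 remain

# Sorting, by contrast, keeps every row and only changes the order.
by_amount = sorted(orders, key=lambda row: row[1], reverse=True)
print(len(by_amount))
```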
Compare with lossless document scans in the same unit.
Cleaning fixes typos, duplicate rows, inconsistent date formats, and missing codes. Charts built on messy tables mislead clubs and scientists alike.
If half the rows spell “PS” and half spell “PlayStation,” your game survey chart splits incorrectly until you standardize entries.
Cleaning is slow, unglamorous work, and AP highlights it because real analysis only starts after these sanity checks.
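The “PS” versus “PlayStation” problem can be sketched in a few lines. The survey rows and the `CANONICAL` mapping below are hypothetical; a real cleaning step would need a mapping built from the actual responses. This version trims whitespace, standardizes labels, and keeps only the first submission per student.

```python
from collections import Counter

# Hypothetical survey responses: (student_id, favorite console).
responses = [
    (1, "PS"),
    (2, "PlayStation"),
    (3, "switch"),
    (4, " Switch "),
    (2, "PlayStation"),  # duplicate submission
]

# Assumed mapping from messy spellings to one canonical label.
CANONICAL = {"ps": "PlayStation", "playstation": "PlayStation", "switch": "Switch"}

def clean(rows):
    """Trim whitespace, standardize labels, and keep one row per student."""
    seen = set()
    cleaned = []
    for student_id, answer in rows:
        key = answer.strip().lower()
        label = CANONICAL.get(key, answer.strip())  # unknown labels pass through trimmed
        if student_id not in seen:
            seen.add(student_id)
            cleaned.append((student_id, label))
    return cleaned

print(Counter(answer for _, answer in responses))         # counts split across spellings
print(Counter(answer for _, answer in clean(responses)))  # counts after cleaning
```

Keeping the first submission is a design choice; a survey might instead keep the latest one, and either rule should be stated.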
Correlation means two variables move together; causation means one directly produces the other. Summer weather raises both ice cream sales and drowning incidents—heat is the shared cause, not ice cream.
Fitness tracker step counts may correlate with sleep hours without one causing the other; a busy schedule might drive both.
AP may show a scatter plot and ask what conclusion is valid. “Related” is safer than “causes” unless a controlled experiment supports causation.
When writing, name a possible hidden variable. That single sentence often strengthens a societal-impact answer.
Use data, information, metadata, filtering, and cleaning precisely—avoid swapping terms mid-paragraph.
EXIF GPS stories appear often. Mention removal tools or publishing policies, not just “be careful online.”
Correlation vs causation pairs with big-data bias later; study both guides if you miss societal-impact items.
Answer the eight MCQs here, then read the big-data privacy guide for PII and re-identification depth.
For a full mixed set, try the 50 Unit 2 practice questions.
Q1. Metadata is best defined as:
Explanation: Context like time, author, GPS.
Q2. EXIF in a photo may include:
Explanation: EXIF is photo metadata.
Q3. Filtering a dataset means:
Explanation: Filtering narrows the rows to those that match a condition.
Q4. An example of correlation without causation:
Explanation: The hidden variable is temperature.
Q5. Data cleaning fixes:
Explanation: Cleaning prepares data for analysis.
Q6. Sorting essays by date modified relies on:
Explanation: The modified timestamp is metadata.
Q7. Information differs from raw data because:
Explanation: Information is data processed to answer a question.
Q8. A risk of photo GPS metadata:
Explanation: Location tags can deanonymize the people in or behind the photo.
Metadata is data about data—information describing a file or record without being the main content. A photo's pixels are data; the camera model, timestamp, and GPS tag are metadata. Programs use metadata for sorting, searching, and privacy decisions. Exams expect precise vocabulary, not vague "extra info."
EXIF tags can record GPS coordinates, device serial numbers, and exact capture time. Sharing a cropped image online may still leak location if EXIF remains. Journalists and activists learn to strip metadata before posting. AP scenarios often ask you to identify location leakage, not to recite every EXIF field name.
Filtering selects records that meet conditions, such as orders over $100 or temperatures above 90°F. Filtering turns a large table into a smaller answer set aligned with a question. It is not the same as sorting; sorting reorders rows without necessarily removing any.
Cleaning fixes typos, duplicate rows, wrong units, and missing values before analysis. Charts built on dirty data mislead decisions even when the code runs. A class survey with blank ages might drop those rows or flag them—either way, the step should be intentional and documented.
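Both options for blank ages, dropping or flagging, can be sketched side by side. The survey rows, the `None` convention for a blank answer, and the `age_missing` flag name are all assumptions for illustration. The one-line log at the end is one simple way to make the choice documented rather than silent.

```python
# Hypothetical class survey; None marks a blank age.
survey = [
    {"name": "Ana", "age": 16},
    {"name": "Ben", "age": None},
    {"name": "Cai", "age": 17},
    {"name": "Dee", "age": None},
]

def drop_missing(rows, field):
    """Option 1: remove rows where the field is blank."""
    return [row for row in rows if row[field] is not None]

def flag_missing(rows, field):
    """Option 2: keep every row but add a flag so later steps can decide."""
    return [{**row, f"{field}_missing": row[field] is None} for row in rows]

dropped = drop_missing(survey, "age")
flagged = flag_missing(survey, "age")

# Record the decision so the cleaning step is documented, not silent.
cleaning_log = [f"drop_missing(age): removed {len(survey) - len(dropped)} of {len(survey)} rows"]
print(cleaning_log[0])
```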
Correlation means two variables move together; causation means one directly produces change in the other. Ice cream sales and drowning deaths both rise in summer because warm weather drives both, not because ice cream causes drowning. AP prompts reward naming a lurking variable, not just saying "correlation is not causation."
Data are raw facts or symbols. Information is data processed to answer a question, such as average commute time after filtering rush-hour logs. Metadata sits beside both, describing context. Classification questions give a short scenario—decide what the user is trying to learn first.
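The commute example shows data turning into information. In this sketch the log entries are made up, and the set of rush hours is an assumption; changing that set changes the question being answered and therefore the information produced.

```python
from statistics import mean

# Hypothetical raw log: (departure hour on a 24-hour clock, commute minutes).
commute_log = [(7, 42), (8, 55), (11, 25), (17, 48), (14, 22), (18, 51), (8, 47)]

RUSH_HOURS = {7, 8, 17, 18}  # assumption: how "rush hour" is defined

# Filtering plus a summary turns raw rows (data) into an answer (information).
rush = [minutes for hour, minutes in commute_log if hour in RUSH_HOURS]
print(f"Average rush-hour commute: {mean(rush):.1f} minutes")
```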
Search engines use HTML title tags, meta descriptions, and file types to help rank and display results. Music libraries read artist tags embedded in audio files. The pattern is the same: structured fields speed lookup without opening every byte of content. Connecting metadata to apps you use makes written answers concrete.
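The "structured fields speed lookup" idea can be sketched as an index built from artist tags. The track records and the `tags` field are hypothetical stand-ins for metadata a music app would read from real audio files. After the index is built once, finding an artist's songs is a dictionary lookup instead of a scan through every file.

```python
from collections import defaultdict

# Hypothetical track records; "tags" stands in for metadata embedded in each audio file.
tracks = [
    {"file": "a.mp3", "tags": {"artist": "Nina", "title": "Tide"}},
    {"file": "b.mp3", "tags": {"artist": "Omar", "title": "Orbit"}},
    {"file": "c.mp3", "tags": {"artist": "Nina", "title": "Ember"}},
]

def build_artist_index(records):
    """Map each artist tag to its files, reading only metadata, never audio bytes."""
    index = defaultdict(list)
    for record in records:
        index[record["tags"]["artist"]].append(record["file"])
    return index

index = build_artist_index(tracks)
print(index["Nina"])  # a dictionary lookup, not a scan of every file
```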
Metadata can be wrong. Camera clocks can be unset, tags can be edited, and labels can be copied from the wrong file. Analysts verify critical fields instead of trusting them blindly. Mentioning error sources shows mature thinking on data-quality questions.
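Verifying a critical field can be as simple as a plausibility check. This sketch assumes that a capture time before 2005 suggests an unset camera clock (the cutoff is illustrative, not a standard) and that a capture time later than the upload time means the clock or tag is wrong. The function name `check_timestamp` is hypothetical.

```python
from datetime import datetime

# Assumption: a capture time earlier than this suggests an unset camera clock.
# The cutoff is illustrative, not a standard.
EARLIEST_PLAUSIBLE = datetime(2005, 1, 1)

def check_timestamp(capture_time, upload_time):
    """Return a list of reasons to distrust a capture timestamp (empty if none)."""
    problems = []
    if capture_time < EARLIEST_PLAUSIBLE:
        problems.append("before plausible range: camera clock may be unset")
    if capture_time > upload_time:
        problems.append("after upload time: clock or tag is wrong")
    return problems

upload = datetime(2026, 5, 21, 18, 0)
print(check_timestamp(datetime(2000, 1, 1, 0, 0), upload))
print(check_timestamp(datetime(2026, 5, 21, 14, 3), upload))
```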
A school stores raw attendance clicks (data). An administrator filters to students below 90% attendance this month (information answering a policy question). The filter choice defines the information. Exams may ask which step creates information versus which step only stores data.
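The attendance scenario can be sketched in the same two steps: store raw check-ins, then filter. The check-in records and the 20-day month are made up for illustration. One limitation worth noting: a student with no check-ins at all never appears in the counts, so a real system would also compare against the full class roster.

```python
from collections import Counter

# Hypothetical raw data: one check-in record per (student, day present).
SCHOOL_DAYS = 20  # assumed number of school days this month
checkins = (
    [("Ana", d) for d in range(1, 21)]    # present 20 days
    + [("Ben", d) for d in range(1, 17)]  # present 16 days
    + [("Cai", d) for d in range(1, 20)]  # present 19 days
)

# Storing clicks is data; the filter below creates information.
days_present = Counter(student for student, _ in checkins)
below_90 = [
    student
    for student, days in days_present.items()
    if days / SCHOOL_DAYS < 0.90
]
print(below_90)
```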
Answer the eight MCQs on this page, then read one news story about location leaks from photos and summarize the metadata involved. Pair this with the big-data guide for privacy law context. Vocabulary flashcards help, but scenario practice keeps you from mixing up filtering and compression.