Binary and storage
Bits, bytes, and place values
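A bit is one binary digit (0 or 1). A byte is eight bits and provides 256 distinct patterns, often shown as the unsigned range 0–255. Binary place values double from right to left: 1, 2, 4, 8, 16, and so on. Convert binary to decimal by summing the place values where the bit is 1; convert decimal to binary by repeatedly subtracting the largest power of two that fits, writing 1 for each power used and 0 for each power skipped.
Overflow occurs when a result needs more bits than the storage allows: 255 + 1 in eight unsigned bits wraps to 0. The Ariane 5 failure (1996), in which a 64-bit floating-point value was converted to a 16-bit signed integer that could not hold it, is a reminder that real systems fail when width is ignored.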
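The two conversions described above can be sketched in Python. This is a minimal illustration, assuming non-negative whole numbers; the function names binary_to_decimal and decimal_to_binary are made up for this example.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum the place values (1, 2, 4, 8, ...) wherever the bit is 1."""
    total = 0
    place = 1  # place value of the rightmost bit
    for bit in reversed(bits):
        if bit == "1":
            total += place
        place *= 2  # place values double moving left
    return total


def decimal_to_binary(n: int) -> str:
    """Repeatedly subtract the largest power of two that still fits.

    Assumes n is a non-negative whole number.
    """
    if n == 0:
        return "0"
    # Find the largest power of two that fits in n.
    power = 1
    while power * 2 <= n:
        power *= 2
    digits = []
    while power >= 1:
        if power <= n:
            digits.append("1")  # this power is used
            n -= power
        else:
            digits.append("0")  # this power is skipped
        power //= 2
    return "".join(digits)


print(binary_to_decimal("10110"))  # 16 + 4 + 2 = 22
print(decimal_to_binary(22))       # 10110
```

Python's built-ins int("10110", 2) and format(22, "b") do the same work; the loops are spelled out here only to mirror the by-hand method.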
- Bit = single switch; byte = 8 bits
- 2ⁿ distinct values for n bits
- ASCII: 'A' = 65 (one byte in simple English text)
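The wraparound and the 2ⁿ rule above can be checked with a short sketch. Python integers do not overflow on their own, so this example simulates a fixed width by keeping only the lowest n bits; the names distinct_values and add_unsigned are assumptions for illustration.

```python
BITS = 8  # width of one byte


def distinct_values(n_bits: int) -> int:
    """Number of different patterns n bits can hold (2 to the power n)."""
    return 2 ** n_bits


def add_unsigned(a: int, b: int, n_bits: int = BITS) -> int:
    """Add two values, keeping only the lowest n_bits of the result.

    This imitates fixed-width unsigned storage, where extra bits are lost.
    """
    return (a + b) % distinct_values(n_bits)


print(distinct_values(8))        # 256
print(add_unsigned(255, 1))      # 0 (overflow wraps around)
print(add_unsigned(200, 100))    # 44 (300 - 256)
print(ord("A"))                  # 65
print(format(ord("A"), "08b"))   # 01000001
```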
Compression
Ratios, RLE, and format families
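Compression shrinks files for storage and transfer. Compression ratio = original size : compressed size, with both sizes in the same units. Compressing 100 MB to 25 MB gives a 4:1 ratio and saves 75% of the space; these are answers to different questions, so do not swap them.
Run-length encoding (RLE) stores count + symbol for repeats (AAAA → 4A). It helps with simple graphics that contain long runs of one value, not noisy photos where neighboring values rarely repeat.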
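The ratio and the percent saved are separate calculations, which a small sketch makes visible. The function names compression_ratio and percent_saved are hypothetical, and both sizes are assumed to be in the same units.

```python
def compression_ratio(original: float, compressed: float) -> float:
    """Original size divided by compressed size (same units)."""
    return original / compressed


def percent_saved(original: float, compressed: float) -> float:
    """Share of the original size that was removed, as a percentage."""
    return (original - compressed) / original * 100


print(f"{compression_ratio(100, 25):g}:1")     # 4:1
print(f"{percent_saved(100, 25):g}% saved")    # 75% saved
```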
- Lossless: ZIP, PNG, FLAC — exact rebuild
- Lossy: JPEG, MP3 — smaller, some detail discarded
- Random data compresses poorly; repetitive data compresses well
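Run-length encoding as described above can be sketched in a few lines. This is a minimal version, assuming the input contains no digit characters (otherwise the counts would be ambiguous); rle_encode is a name chosen for this example.

```python
def rle_encode(text: str) -> str:
    """Replace each run of a repeated symbol with count + symbol."""
    if not text:
        return ""
    pieces = []
    current = text[0]
    count = 1
    for symbol in text[1:]:
        if symbol == current:
            count += 1  # still inside the same run
        else:
            pieces.append(f"{count}{current}")  # close the finished run
            current = symbol
            count = 1
    pieces.append(f"{count}{current}")  # close the final run
    return "".join(pieces)


print(rle_encode("AAAA"))       # 4A
print(rle_encode("AAAABBBCC"))  # 4A3B2C
print(rle_encode("ABCD"))       # 1A1B1C1D
```

The last call shows why non-repetitive data compresses poorly: with no runs to merge, the encoded text is longer than the input.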
Information and metadata
From raw values to answers
Data becomes information when processing it answers a question. Filtering keeps rows that meet criteria; cleaning fixes typos, duplicates, and inconsistent formats before charts are drawn.
Metadata describes other data—timestamps, device model, GPS in EXIF photo tags. Publishing photos without removing location metadata can expose places you never typed in the caption.
Correlation shows variables moving together; causation requires evidence that one drives the other. Summer heat raises both ice cream sales and drownings, so the two correlate without ice cream causing drownings.
Big data and society
Scale, models, and responsibility
Volume, velocity, and variety characterize big data. Fitness trackers that keep producing new readings illustrate velocity; a dataset that mixes survey responses with image archives illustrates variety.
Machine learning trains on examples. Biased or incomplete training data produces biased predictions. Sampling bias misrepresents populations; measurement bias comes from broken sensors or bad labels.
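Personally identifiable information (PII) identifies people. Re-identification links “anonymous” rows back to individuals when a combination of fields narrows the match to one person. The GDPR (General Data Protection Regulation) sets strict EU rules on personal data; you may cite it on societal-impact items.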
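One way to see the re-identification risk is to count how many rows share each combination of fields. The sketch below uses entirely hypothetical rows and field choices (postcode label, birth year, gender); a combination that appears only once points at a single person even though no name is stored.

```python
from collections import Counter

# Hypothetical "anonymous" rows: (postcode label, birth year, gender).
# These values are invented for illustration only.
rows = [
    ("P1", 2007, "F"),
    ("P1", 2007, "F"),
    ("P1", 2008, "M"),
    ("P2", 2007, "M"),
]


def unique_combinations(records):
    """Return the field combinations that match exactly one row."""
    counts = Counter(records)
    return [combo for combo, n in counts.items() if n == 1]


# Each combination printed here singles out one individual.
print(unique_combinations(rows))
```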
Lists and programs with data
Filtering, cleaning, and tracing
Filtering keeps rows that meet a rule; sorting only reorders. Cleaning fixes typos, duplicate rows, or inconsistent date formats before charts are drawn. Pseudocode may loop through a list of sensor readings, and each item still occupies bytes in memory.
Example: the list of daily steps [4200, 5100, 5100, 0] needs cleaning if the 0 means the tracker failed rather than that the student did not walk. Removing bad rows changes averages, so document that choice in written responses.
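The effect of cleaning on the average can be traced with the step list from the example. This sketch assumes that 0 marks a tracker failure and that the two 5100 readings are separate days rather than a duplicated row; both are assumptions you would need to confirm for real data.

```python
steps = [4200, 5100, 5100, 0]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Filtering: keep only readings that meet the rule "not a failed reading".
# Assumption: 0 means the tracker failed, not that the student did not walk.
cleaned = [s for s in steps if s != 0]

print(mean(steps))     # 3600.0 (bad row included)
print(mean(cleaned))   # 4800.0 (bad row removed)

# Sorting only reorders; the same four values remain.
print(sorted(steps))   # [0, 4200, 5100, 5100]
```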
Network size vocabulary
Mbps versus MB
Mbps measures megabits per second on a connection; MB measures megabytes of file size. Eight bits per byte means 8 Mbps is about 1 MB per second in ideal conditions—real networks add overhead, but AP items test whether you label units correctly.
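The unit conversion above can be sketched as a pair of helper functions. The names are hypothetical, and the result is an ideal figure that ignores network overhead.

```python
BITS_PER_BYTE = 8


def mbps_to_mb_per_second(mbps: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbps / BITS_PER_BYTE


def ideal_transfer_seconds(file_mb: float, link_mbps: float) -> float:
    """Seconds to move a file of file_mb megabytes, ignoring overhead."""
    return file_mb / mbps_to_mb_per_second(link_mbps)


print(mbps_to_mb_per_second(8))         # 1.0 MB per second
print(ideal_transfer_seconds(100, 8))   # 100.0 seconds
print(ideal_transfer_seconds(100, 80))  # 10.0 seconds
```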
How to use these notes with practice
Skim this page before class, then open the matching concept guide when a topic feels fuzzy. Pair with the cheat sheet for formulas and with fifty practice MCQs after you can explain each heading without looking.
Rewrite each bullet in your own words once per week. If you cannot explain overflow without reading, spend ten minutes on binary numbers before moving to compression.