Unit 2: Data

Binary, data compression, extracting information, and metadata

📚Study Guide: Data

Overview: Data is the lifeblood of modern computing, and this unit explores how information is represented, processed, compressed, and analyzed in digital systems. At the most fundamental level, all digital data (text, images, audio, or video) is represented using binary digits (bits), which take values of 0 or 1. Groups of eight bits form a byte, and the number of possible values representable with n bits is 2^n. Students must understand the relationship between binary and decimal number systems and be able to perform basic conversions.

Beyond representation, the unit covers data compression, which reduces the number of bits needed to store or transmit information. Lossless compression algorithms, such as run-length encoding and Huffman coding, preserve every bit of original information and allow perfect reconstruction, making them essential for text files and executable programs. Lossy compression, used in JPEG images and MP3 audio, permanently discards some information that is presumed to be less perceptible to human senses, achieving much higher compression ratios at the cost of fidelity.

Metadata, or data about data, plays a critical role in organizing, searching, and interpreting digital information. The unit also addresses how different types of data are digitally represented: text through character encoding schemes like ASCII and Unicode; images as grids of pixels with color values specified by RGB triples; and sound as sequences of discrete samples capturing amplitude at regular intervals.

The analysis of data to identify patterns, correlations, and trends underpins fields from business intelligence to scientific research. However, the proliferation of data raises serious privacy and security concerns, including unauthorized access, surveillance, and the ethical implications of data mining. Big data, meaning datasets so large and complex that traditional processing software is inadequate, has transformed industries but also introduced challenges related to bias, consent, and algorithmic transparency.
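
To make the idea of lossless compression concrete, here is a minimal sketch of run-length encoding in Python. The function names and the sample string are made up for illustration; real compressors are more elaborate, but the round trip shows what "perfect reconstruction" means.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)


original = "WWWWWWBBBWWWW"  # hypothetical sample data
encoded = rle_encode(original)

print(encoded)                          # [('W', 6), ('B', 3), ('W', 4)]
print(rle_decode(encoded) == original)  # True: nothing was lost
```

Run-length encoding only saves space when the data contains long runs of repeated values; on data without runs the encoded form can be larger than the original.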

Key Concepts

  • Binary Representation: All digital information is stored as sequences of bits. A single bit has two states (0 or 1), and n bits can represent 2^n distinct values. Bytes (8 bits) are the standard unit of digital information.
  • Data Compression: Techniques to reduce file size. Lossless compression preserves all original data and allows exact reconstruction. Lossy compression achieves greater size reduction by discarding some data that is less noticeable.
  • Metadata: Information that describes other data, such as the date a photo was taken, the author of a document, or the sampling rate of an audio file. Metadata enables efficient searching, sorting, and management of datasets.
  • Digital Representation of Media: Text uses ASCII or Unicode to map characters to numeric codes. Images are grids of pixels, each described by color values such as RGB. Audio is sampled at discrete intervals, with higher sampling rates capturing more detail.
  • Data Analysis: The process of inspecting, cleansing, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making.
  • Data Privacy: The protection of personal information from unauthorized access, use, or disclosure. Concerns include identity theft, surveillance capitalism, and unauthorized data sharing.
  • Lossy vs. Lossless: Lossless (PNG, ZIP, FLAC) preserves all data. Lossy (JPEG, MP3, MPEG) removes some data permanently for smaller files.
  • Big Data: Datasets characterized by high volume, velocity, and variety that exceed the capacity of traditional data processing applications.
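
The representation ideas above can be checked with a few lines of Python. This is a rough sketch: the helper names image_bytes and audio_bytes are invented for this example, and the size estimates assume uncompressed data with no file headers or metadata.

```python
# Text: each character maps to a numeric code.
print(ord("A"))                       # 65
print(format(ord("A"), "08b"))        # 01000001
print(len("\u00e9".encode("utf-8")))  # 2 (an accented e needs two bytes in UTF-8)

# Images: an RGB pixel commonly uses 8 bits per channel (assumed here).
BITS_PER_CHANNEL = 8
bits_per_pixel = 3 * BITS_PER_CHANNEL
print(2 ** bits_per_pixel)            # 16777216 possible colors


def image_bytes(width, height, bits_per_pixel=24):
    """Estimate the uncompressed size of an image in bytes."""
    return width * height * bits_per_pixel // 8


def audio_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Estimate the uncompressed size of an audio clip in bytes."""
    return sample_rate_hz * bits_per_sample * channels * seconds // 8


print(image_bytes(1920, 1080))        # 6220800
print(audio_bytes(44100, 16, 2, 60))  # 10584000
```

Doubling the sampling rate or the pixel count doubles the estimate, which is why higher-quality media needs more storage.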

Vocabulary

  • Bit: The smallest unit of data in a computer, represented as 0 or 1.
  • Byte: A group of 8 bits.
  • Binary: A base-2 number system using only digits 0 and 1.
  • Lossy Compression: A data compression method that permanently removes some information to reduce file size.
  • Lossless Compression: A data compression method that reduces file size without losing any original information.
  • Metadata: Data that provides information about other data.
  • ASCII: A character encoding standard using 7 bits to represent text.
  • Unicode: A universal character encoding standard that supports text in most of the world's writing systems.
  • Pixel: The smallest controllable element of a digital image.
  • RGB: A color model representing colors by combining red, green, and blue light.
  • Sampling Rate: The number of audio samples captured per second, measured in hertz (Hz).
  • Big Data: Extremely large datasets that require specialized tools and techniques to store, process, and analyze.

Essential Structures

  • 1 byte = 8 bits
  • n bits can represent 2^n unique values
  • Binary to Decimal: Sum of (bit × 2^position) for each bit, where position counts from 0 at the rightmost bit
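
The conversion rule above can be sketched in Python as follows. The function name binary_to_decimal is chosen for this example, and the input is assumed to be a string containing only 0s and 1s.

```python
def binary_to_decimal(bits):
    """Sum bit * 2**position, counting positions from 0 at the rightmost bit."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


print(binary_to_decimal("1011"))  # 11, because 8 + 0 + 2 + 1
print(2 ** 8)                     # 256 distinct values fit in one byte
```

Python's built-in int("1011", 2) performs the same conversion and is a convenient way to check answers.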

Common Mistakes

  • Confusing lossy and lossless compression. Remember that lossy permanently discards data and cannot perfectly reconstruct the original.
  • Forgetting that all digital data is fundamentally binary, including text, images, audio, and video.
  • Thinking metadata is unnecessary or trivial. Metadata is essential for organization, searching, and context.
  • Confusing encryption with compression. Compression reduces size; encryption secures content by transforming it so only authorized parties can read it.
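
The difference between lossless and lossy can be seen in a short Python sketch. The lossless half uses the standard zlib module. The lossy half is a deliberately simplified stand-in (rounding sample values down to a coarse step), not how JPEG or MP3 actually work, and the sample data is made up.

```python
import zlib

# Lossless: decompressing returns exactly the original bytes.
data = b"AAAAABBBBBCCCCC" * 20  # hypothetical repetitive data
compressed = zlib.compress(data)
restored = zlib.decompress(compressed)

print(len(compressed) < len(data))  # True: fewer bytes to store
print(restored == data)             # True: perfect reconstruction

# Lossy (toy illustration): keep only which multiple of STEP each sample is near.
STEP = 10
samples = [3, 18, 27, 42, 55]  # hypothetical audio amplitudes


def quantize(values, step):
    """Store each value as a small integer by dropping the remainder."""
    return [value // step for value in values]


def dequantize(codes, step):
    """Rebuild approximate values; the dropped remainders are gone for good."""
    return [code * step for code in codes]


approximate = dequantize(quantize(samples, STEP), STEP)
print(approximate)             # [0, 10, 20, 40, 50]
print(approximate == samples)  # False: the original cannot be recovered
```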

AP Exam Strategies

  • When asked about compression, explicitly state whether the method is lossy or lossless and justify with the type of data being compressed.
  • For binary conversion questions, write out the place values (128, 64, 32, 16, 8, 4, 2, 1) and check your work by converting your answer back to the original number system.
  • Always mention privacy and security implications when discussing data collection, especially for personal or sensitive information.
  • Connect data representation to real-world constraints: higher resolution images and higher sampling rate audio require more bits and more storage.
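
The place-value strategy for conversion questions can be practiced with a small Python sketch. The function name decimal_to_binary is invented for this example, and it only handles values that fit in one byte (0 to 255).

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def decimal_to_binary(number):
    """Build an 8-bit string by subtracting place values from left to right."""
    if not 0 <= number <= 255:
        raise ValueError("this sketch only handles values from 0 to 255")
    bits = ""
    for value in PLACE_VALUES:
        if number >= value:
            bits += "1"
            number -= value
        else:
            bits += "0"
    return bits


result = decimal_to_binary(77)
print(result)          # 01001101, because 77 = 64 + 8 + 4 + 1
print(int(result, 2))  # 77: converting back confirms the answer
```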

Real-World Applications

  • Streaming Video: Netflix and YouTube use lossy compression (H.264, VP9) to deliver high-quality video over limited bandwidth while accepting some quality loss.
  • Medical Imaging: Hospitals often store medical images in the DICOM format with lossless compression to ensure that no diagnostic information is lost during storage or transmission.
  • EXIF Metadata: Digital photos contain EXIF metadata recording camera settings, GPS location, and timestamps, which can raise privacy concerns when shared online.

🎥Free Video Lessons: Data

Watch these unit review videos directly on our site.

AP Comp Sci Principles – 5 hour CRAM review all units by Fiveable

AP CSP Unit 2 The Internet Review A by Janelle Whalen

AP CS Principles Exam Review - Binary by Flavio Kuperman

📄Cheat Sheet: Data

Quick reference for Data. Print this out and review before the exam!

Unit 2 Cheat Sheet: Data

  • Bit: 0 or 1; Byte: 8 bits
  • n bits = 2^n possible values
  • Lossy: JPEG, MP3, MPEG (permanent data loss, smaller files)
  • Lossless: PNG, ZIP, FLAC (perfect reconstruction)
  • Metadata: Data about data (author, date, location, etc.)
  • Text: ASCII (7-bit) or Unicode
  • Images: Grid of pixels; RGB = Red + Green + Blue
  • Audio: Sampled amplitude; higher sampling rate = better quality = larger file
  • Big Data: Volume, Velocity, Variety
  • Privacy: Protect personal data from unauthorized access
