This free CompTIA Data+ study guide walks through every content domain the Data+ (DA0-002) exam tests, organized to the current CompTIA exam objectives.[1]
It’s interactive, not a wall of text: every module has built-in checkpoint quizzes, flashcards, and practice questions, so you learn by doing — not just reading.
Data+ tests five official domains, and this guide teaches them as five study modules organized to the official blueprint. Read a module, test yourself at each checkpoint, then drill gaps with our free practice test and flashcards. This guide is a high-yield overview that maps the official content; it is not a full data-analytics textbook.
CompTIA Data+ is one of the 14 CompTIA certifications — explore our CompTIA study guides to compare and prep across the whole family.
Data+ Exam Snapshot
| Detail | Data+ Exam |
|---|---|
| Exam code | DA0-002 (V2; current — replaced DA0-001) |
| Questions | Maximum of 90 (multiple choice + performance-based) |
| Time | 90 minutes |
| Passing score | 675 on a 100–900 scale (scaled score, not a percentage) |
| Certifying body | CompTIA (delivered by Pearson VUE) |
| Cost | About $255 (voucher; ~$304 with retake assurance) |
| Prerequisites | None required (18–24 months in a data role recommended) |
| Validity | 3 years |
| Renewal | 30 CEUs over 3 years, or pass a higher CompTIA cert |
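Data+ covers five domains. The largest, Data Analysis (24%), and the next, Data Acquisition & Preparation (22%), together make up nearly half the exam (46%), so that is where to invest first.[1] Study by weight:
| Domain | Weight |
|---|---|
| Data Analysis | 24% |
| Data Acquisition & Preparation | 22% |
| Data Concepts & Environments | 20% |
| Visualization & Reporting | 20% |
| Data Governance, Quality & Controls | 14% |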
Every analysis follows the same arc — define the question, get and clean the data, analyze it, then visualize and communicate the result. Keep this lifecycle in mind as you work through the modules:
1. Business question: Define the problem and the decision the analysis must support. Everything starts with the question, not the data.
2. Acquire data: Collect or extract data from sources (databases, APIs, files, surveys) and integrate it (ETL/ELT).
3. Prepare & clean: Profile, clean, and transform the data; handle missing values, duplicates, and outliers; recode and normalize.
4. Analyze: Apply descriptive, diagnostic, predictive, or prescriptive techniques and statistics to find patterns.
5. Visualize & report: Turn findings into the right charts, dashboards, and reports for the audience.
6. Communicate & act: Tell the data story, drive a decision, and monitor the outcome. Governance applies throughout.
Module 1 · Data Concepts & Environments
One official domain, 20% of the exam. This is the foundation — the kinds of data you work with and the systems that store it. Nail the vocabulary here and the rest of the exam reads far more clearly.
1.1 Data Types & Structures
Start by classifying data two ways. By structure: structured data fits neat rows and columns; unstructured data (text, images, audio, video) does not; and semi-structured data (JSON, XML) sits between, carrying tags without a rigid table. By measurement: qualitative data is categorical (nominal or ordinal), while quantitative data is numeric (interval or ratio) and supports math.[1]
| Type | Scale | Example | Math allowed? |
|---|---|---|---|
| Qualitative | Nominal (categories, no order) | Eye color, country | Count only |
| Qualitative | Ordinal (ordered categories) | Survey: poor/fair/good | Order, not arithmetic |
| Quantitative | Interval (no true zero) | Temperature in °C | Add / subtract |
| Quantitative | Ratio (true zero) | Sales, height, count | All arithmetic |
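To see why the scale matters, here is a minimal Python sketch that picks a meaningful "average" for each scale: the mode for nominal data, a middle rank for ordinal data, and the mean for interval and ratio data. The `summarize` helper and the sample values are hypothetical illustrations, not anything defined in the exam objectives.

```python
from statistics import mean, median_low, mode

def summarize(values, scale, order=None):
    """Hypothetical helper: return a central value that is meaningful for the scale."""
    if scale == "nominal":
        # Unordered categories: only counting makes sense, so report the mode.
        return mode(values)
    if scale == "ordinal":
        # Ordered categories: convert to ranks, then take a middle rank.
        # median_low always returns an actual rank, never an average of two.
        ranks = [order.index(v) for v in values]
        return order[median_low(ranks)]
    if scale in ("interval", "ratio"):
        # Numeric scales support arithmetic, so the mean is meaningful.
        return mean(values)
    raise ValueError(f"unknown scale: {scale}")

print(summarize(["US", "DE", "US", "FR"], "nominal"))    # US
print(summarize(["poor", "good", "fair", "good"], "ordinal",
                order=["poor", "fair", "good"]))         # fair
print(summarize([20.5, 22.0, 21.5], "interval"))
```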
1.2 Databases, Warehouses & Lakes
Data lives in different places for different jobs. A relational database runs day-to-day operations (OLTP) with normalized tables linked by primary and foreign keys. For analysis (OLAP), data flows into a data warehouse (structured and modeled, often a star schema) or a data lake (raw, any format). A data lakehouse blends both.[1]
- Operational database (OLTP): Runs the business day-to-day. Normalized, structured, and optimized for fast reads/writes of single records.
- Data warehouse (OLAP): Central analytical store. Structured and modeled (star/snowflake), schema-on-write, optimized for queries and reporting.
- Data lake: Holds vast raw data in its native format (structured, semi-structured, and unstructured). Schema-on-read; cheap and flexible.
- Data lakehouse: A hybrid that adds warehouse-style structure and management on top of a lake, giving one platform for both.
| Aspect | OLTP (operations) | OLAP (analysis) |
|---|---|---|
| Purpose | Run the business (transactions) | Analyze the business (insight) |
| Workload | Many small reads/writes | Few large, complex queries |
| Design | Normalized for integrity | Denormalized/modeled for speed |
| Example | Order-entry system | Sales data warehouse |
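The primary key/foreign key relationship and the OLTP-versus-OLAP split can be sketched with Python's built-in `sqlite3` module. This is a toy example under stated assumptions: the `customers` and `orders` tables are hypothetical, and one in-memory SQLite database stands in for both systems, whereas in practice the analytical query would usually run against a separate warehouse.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Normalized OLTP-style design: each customer is stored once; orders reference it.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- uniquely identifies each row
        region      TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
        amount      REAL NOT NULL
    )""")

# OLTP workload: many small single-record writes.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "East"), (2, "West")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0)])

# The foreign key rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 10.0)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# OLAP-style workload: one larger query that joins and aggregates.
rows = conn.execute("""
    SELECT c.region, SUM(o.amount) AS total_sales
    FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(rows)  # [('East', 200.0), ('West', 50.0)]
```

SQLite enforces foreign keys only when that pragma is enabled, which is why the sketch turns it on first.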
1.3 Big Data & the Analytics Lifecycle
Big data is data too large or complex for traditional tools, described by the V’s: volume, velocity, variety, veracity, and value.[5] Its scale and variety are exactly why data lakes and cloud platforms exist. Whatever the size, work follows the data analytics lifecycle, and the question always comes before the data.
Checkpoint · Data Concepts & Environments
In data analytics, what does the term "Data Lake" primarily refer to?
Module 2 · Data Acquisition & Preparation
One official domain, 22% of the exam. This domain — renamed from “Data Mining” in the DA0-001 era — is where raw data becomes analysis-ready. In practice it is where analysts spend most of their time, and it is heavily tested.
2.1 Acquiring & Integrating Data (ETL/ELT)
Data is acquired from databases, files, APIs, web scraping, surveys, and sensors, then combined through data integration. The two pipeline patterns to know cold are ETL (transform before loading, the classic warehouse approach) and ELT (load raw, then transform in the target, the cloud/lake/big-data approach).[1]
ETL (Extract, Transform, Load)
- Extract → Transform → Load
- Data is cleaned/shaped BEFORE loading
- Transformation on a separate engine
- Classic, structured data warehouses
- Good when targets need clean, modeled data
ELT (Extract, Load, Transform)
- Extract → Load → Transform
- Raw data loaded FIRST, transformed in place
- Transformation uses the target's compute
- Cloud warehouses, data lakes, big data
- Good for large, varied, fast-changing data
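A minimal sketch of the difference, again using Python's `sqlite3` module as a stand-in target. Everything here (the `raw` rows, the `to_amount` parser, the table names) is hypothetical, and the `GLOB` filter is a deliberately crude numeric check for illustration only. In the ETL path Python cleans the data before it is loaded; in the ELT path the raw text is loaded first and SQL inside the target does the transformation.

```python
import sqlite3

# Hypothetical raw extract: amounts arrive as text, with stray whitespace and junk.
raw = [("a1", " 19.99 "), ("a2", "5"), ("a3", "n/a")]

def to_amount(text):
    """Parse an amount; return None when the value is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None

# --- ETL: transform in Python (a separate engine) BEFORE loading ---
etl = sqlite3.connect(":memory:")
etl.execute("CREATE TABLE sales (id TEXT, amount REAL)")
clean = []
for row_id, text in raw:
    amount = to_amount(text)
    if amount is not None:
        clean.append((row_id, amount))
etl.executemany("INSERT INTO sales VALUES (?, ?)", clean)

# --- ELT: load the raw text FIRST, then transform inside the target with SQL ---
elt = sqlite3.connect(":memory:")
elt.execute("CREATE TABLE raw_sales (id TEXT, amount TEXT)")
elt.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw)
elt.execute("""
    CREATE TABLE sales AS
    SELECT id, CAST(TRIM(amount) AS REAL) AS amount
    FROM raw_sales
    WHERE TRIM(amount) GLOB '[0-9]*'   -- crude numeric filter, illustration only
""")

print(etl.execute("SELECT id, amount FROM sales ORDER BY id").fetchall())
print(elt.execute("SELECT id, amount FROM sales ORDER BY id").fetchall())
```

A practical side effect of ELT is visible here: the untouched `raw_sales` table is still available if the transformation rules change later.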
When you can’t (or shouldn’t) use a whole population, you sample it. Good sampling — random, representative, large enough — keeps conclusions valid; biased sampling quietly breaks every downstream result.
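A short Python sketch of why random selection matters, using a hypothetical, seeded population of order values. The "biased" sample is an exaggerated stand-in for selection bias (it keeps only the 500 smallest values), not a real sampling method.

```python
import random
from statistics import mean

# Hypothetical population of 10,000 order values, seeded so runs are repeatable.
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Simple random sample: every record has an equal chance of being chosen.
sample = rng.sample(population, k=500)

# Biased sample: only the 500 smallest values, an exaggerated stand-in
# for selection bias (e.g., surveying only one kind of customer).
biased = sorted(population)[:500]

print(f"population mean: {mean(population):.1f}")
print(f"random sample:   {mean(sample):.1f}")
print(f"biased sample:   {mean(biased):.1f}")
```

The random sample's mean lands close to the population mean, while the biased sample's mean is far too low, and every result computed from it would inherit that error.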
2.2 Cleansing & Preparing Data
Data cleansing fixes the problems that would otherwise poison analysis. Handle missing values (delete the record, or impute them with the mean, median, or a predicted value), remove duplicates, and investigate each outlier (error or genuine extreme?). Then make values comparable with normalization or standardization, and convert data types as needed.[1]
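The cleansing steps described above can be sketched in plain Python. The records, field names, and the choice of median imputation are illustrative assumptions; real pipelines often use a library such as pandas, but the logic is the same: deduplicate on a key, impute missing values, and convert types.

```python
from datetime import date
from statistics import median

# Hypothetical raw records: a duplicate, a missing value, and dates stored as text.
records = [
    {"id": 1, "signup": "2024-01-05", "age": 34},
    {"id": 2, "signup": "2024-02-11", "age": None},   # missing value
    {"id": 1, "signup": "2024-01-05", "age": 34},     # duplicate of id 1
    {"id": 3, "signup": "2024-03-20", "age": 41},
]

# 1. Deduplicate on the unique key, keeping the first occurrence.
seen, deduped = set(), []
for rec in records:
    if rec["id"] not in seen:
        seen.add(rec["id"])
        deduped.append(dict(rec))  # copy so the raw records stay untouched

# 2. Impute missing ages with the median of the observed ages.
fill = median(r["age"] for r in deduped if r["age"] is not None)
for r in deduped:
    if r["age"] is None:
        r["age"] = fill

# 3. Type conversion: parse ISO-format text into real date objects.
for r in deduped:
    r["signup"] = date.fromisoformat(r["signup"])

print(deduped)
```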
| Problem | Technique |
|---|---|
| Missing values | Delete the row, or impute (mean/median/predicted) |
| Duplicate records | Deduplication (often via a unique key) |
| Outliers | Investigate; cap, transform, or remove if erroneous |
| Different scales | Normalize (0–1) or standardize (z-score) |
| Wrong data type | Type conversion / casting (e.g., text to date) |
| Inconsistent formats | Parsing and standardizing (dates, units, casing) |
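A minimal sketch of the two rescaling techniques from the table, using Python's standard `statistics` module on hypothetical values. It uses the population standard deviation (`pstdev`); whether to use that or the sample version (`stdev`) is a judgment call that slightly changes the z-scores.

```python
from statistics import mean, pstdev

values = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical feature on its own scale

# Min-max normalization: rescale to the 0-1 range.
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Standardization: z-score = (value - mean) / standard deviation.
mu, sigma = mean(values), pstdev(values)
standardized = [(v - mu) / sigma for v in values]

print(normalized)                            # [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 3) for z in standardized])   # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

Min-max normalization divides by zero if every value is identical, so real code should guard against a zero range.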
2.3 Data Mining Techniques
Data mining finds patterns in large datasets. The four techniques to know are classification (assign to known categories), clustering (group similar records with no labels), regression (model a numeric relationship), and association rules (items that co-occur, as in the Apriori algorithm and market-basket analysis). Watch for overfitting, where a model memorizes the training data and fails on new data.[1]
| Technique | Learning type | Use it to… |
|---|---|---|
| Classification | Supervised | Sort records into known categories (spam / not spam) |
| Regression | Supervised | Predict a numeric value (next month's sales) |
| Clustering | Unsupervised | Group similar records with no labels (customer segments) |
| Association rules | Unsupervised | Find items bought together (market-basket analysis) |
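Association rules are usually scored with support, confidence, and lift. The sketch below computes those three metrics for item pairs in a handful of hypothetical baskets. It simply counts every pair; the Apriori algorithm adds a level-wise search that prunes candidate itemsets whose subsets are already infrequent, which matters at real data volumes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
n = len(baskets)

# Count how often each single item and each item pair occurs.
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

def rule(a, b):
    """Return (support, confidence, lift) for the rule a -> b."""
    both = pair_counts[tuple(sorted((a, b)))]
    support = both / n                          # share of baskets containing a AND b
    confidence = both / item_counts[a]          # P(b | a)
    lift = confidence / (item_counts[b] / n)    # > 1: together more often than chance
    return support, confidence, lift

print(rule("bread", "butter"))  # (0.4, 0.5, 1.25)
```

A lift above 1 (bread and butter here) means the items appear together more often than chance would predict; `rule("bread", "milk")` gives a lift below 1 for these baskets.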
Checkpoint · Data Acquisition & Preparation
Which concept in data management focuses on the use of data across different domains and formats for improved decision-making?
Module 3 · Data Analysis
One official domain, 24% of the exam — the single heaviest. This is the statistical core: summarizing data, measuring relationships, and choosing the right kind of analysis. Invest the most time here.
3.1 Descriptive Statistics
Descriptive statistics summarize a dataset. Central tendency: the mean (average, outlier-sensitive), the median (middle, robust), and the mode (most frequent). Spread: the range, variance, and standard deviation (spread around the mean, in the data’s own units). A percentile marks the value below which a given share of data falls.[1]
| Measure | What it tells you | Watch out for |
|---|---|---|
| Mean | The arithmetic average | Distorted by outliers / skew |
| Median | The middle value | Best for skewed data |
| Mode | The most common value | Can be none or several |
| Range | Max minus min (total spread) | Driven by extremes |
| Standard deviation | Typical distance from the mean | Same units as the data |
| Percentile / quartile | Position within the distribution | — |
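Python's built-in `statistics` module covers every measure in the table. The sketch below runs them on hypothetical sales data with one extreme value, so you can see the mean and median disagree. Quartile methods differ slightly between tools; `method="inclusive"` is one common choice, and the 1.5 × IQR outlier fences are the usual box-plot convention rather than a fixed rule.

```python
from statistics import mean, median, multimode, pstdev, pvariance, quantiles

# Hypothetical daily sales with one extreme value (250).
sales = [12, 15, 15, 18, 20, 22, 250]

print("mean:    ", round(mean(sales), 2))       # pulled up by the outlier
print("median:  ", median(sales))               # robust: 18
print("mode:    ", multimode(sales))            # a list: a dataset can have several modes
print("range:   ", max(sales) - min(sales))
print("variance:", round(pvariance(sales), 2))
print("std dev: ", round(pstdev(sales), 2))     # square root of the variance

# Quartiles (25th, 50th, 75th percentiles) and the 1.5 x IQR outlier fences.
q1, q2, q3 = quantiles(sales, n=4, method="inclusive")
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print("outliers:", [x for x in sales if x < low_fence or x > high_fence])
```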
3.2 Relationships & Inference
Correlation measures how two variables move together, from −1 to +1, but it never proves causation: a confounding third variable or coincidence can drive both. Hypothesis testing weighs sample evidence against a claim using a p-value (a small p-value, often below 0.05, lets you reject the null hypothesis), and a confidence interval gives a plausible range for the true value.[1]
3.3 Types of Analytics
Match the analysis to the question. Descriptive, diagnostic, predictive, and prescriptive analytics answer, in turn, what happened, why it happened, what will happen, and what to do about it. Each step up the ladder delivers more value and demands more sophisticated technique.[1]
1. Descriptive (What happened?): Summarizes past data (reports, KPIs).
2. Diagnostic (Why did it happen?): Finds causes (drill-down, correlation).
3. Predictive (What will happen?): Forecasts future outcomes (models).
4. Prescriptive (What should we do?): Recommends an action (optimization).
Checkpoint · Data Analysis
In data mining, what is the primary purpose of the Apriori algorithm?
Module 4 · Visualization & Reporting
One official domain, 20% of the exam. Analysis only matters if it’s communicated. This domain is about choosing the right chart, building clear dashboards and reports, and not misleading your audience.
4.1 Choosing the Right Chart
The single most-tested visualization skill is matching a chart to a goal. Use a bar chart to compare categories, a line chart for a trend over time, a pie or stacked bar for parts of a whole, a scatter plot for the relationship between two variables, a histogram or box plot for distribution, and a Pareto chart to find the vital few.[1]
| Chart | Best for | Example |
|---|---|---|
| Bar / column | Comparing categories | Sales by region |
| Line | Trends over time | Monthly revenue |
| Pie / stacked bar | Parts of a whole | Market share |
| Scatter plot | Relationship between two variables | Ad spend vs. sales |
| Histogram | Distribution of one variable | Customer ages |
| Box plot | Distribution + outliers | Salary spread by team |
| Pareto chart | The vital few (80/20) | Top defect causes |
| Heat map | Magnitude across two dimensions | Activity by hour/day |
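The ordering behind a Pareto chart is simple arithmetic: sort categories from largest to smallest and track the cumulative share. A minimal sketch with hypothetical defect counts; the `pareto` helper is illustrative, and a charting tool would draw the counts as bars and the cumulative share as a line.

```python
# Hypothetical defect counts by cause.
defects = {"Scratches": 12, "Misalignment": 48, "Loose screw": 30,
           "Wrong label": 6, "Cracked case": 4}

def pareto(counts):
    """Return (cause, count, cumulative share) rows, largest cause first."""
    total = sum(counts.values())
    rows, running = [], 0
    for cause, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((cause, count, running / total))
    return rows

for cause, count, share in pareto(defects):
    print(f"{cause:<13}{count:>4}{share:>8.0%}")
```

Here the top two causes account for 78% of all defects, which is the "vital few" the cumulative line makes visible.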
4.2 Dashboards & Reports
A dashboard surfaces the right KPIs for an audience at a glance, with interactivity such as filters and drill-downs. Choose the report type for the need (ad hoc for one-off questions, recurring for monitoring, or self-service for exploration) and design for clarity. Above all, never mislead: truncated axes, distorted proportions, and cherry-picked ranges are accuracy and ethics failures.[1]
| Do | Avoid |
|---|---|
| Start bar-chart axes at zero | Truncating the y-axis to exaggerate differences |
| Pick the chart that fits the data | Forcing a 3-D or fancy chart that distorts |
| Label clearly; show units and source | Clutter (chartjunk) that hides the message |
| Match KPIs to the audience's decision | Dumping every metric onto one screen |
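Why truncated axes mislead can be checked with simple arithmetic. The hypothetical `apparent_vs_actual` helper compares the true ratio between two values with the ratio of the bar heights a reader actually sees when the axis starts above zero.

```python
def apparent_vs_actual(a, b, axis_start=0.0):
    """Return (true ratio b/a, ratio of the visible bar heights).

    Bars are drawn from axis_start upward, so a truncated axis hides the
    shared base and exaggerates the difference. Hypothetical helper.
    """
    actual = b / a
    shown = (b - axis_start) / (a - axis_start)
    return actual, shown

# Hypothetical sales for two regions: 96 and 100 (about a 4% difference).
print(apparent_vs_actual(96, 100))                  # zero baseline: both ratios match
print(apparent_vs_actual(96, 100, axis_start=95))   # truncated: second ratio is 5.0
```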
Checkpoint · Visualization & Reporting
What is the primary purpose of using a box plot in data analysis?
Module 5 · Data Governance, Quality & Controls
One official domain, 14% of the exam. Smaller in weight but conceptually important — this domain is about trusting your data and using it responsibly: governance, quality, privacy, and security controls.
5.1 Governance & Master Data
Data governance is the framework of policies, roles, and standards controlling data across its lifecycle. Data owners are accountable; a data steward handles day-to-day quality. Supporting tools include a data catalog (an inventory of data assets), data lineage (where data came from and how it moved), and master data management, or MDM (one authoritative “golden record” per entity).[1]
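Master data management hinges on a survivorship rule that decides which source value wins in the golden record. The sketch below applies one hypothetical rule (the most recently updated non-empty value wins) and records which system supplied each field, a small taste of lineage. Real MDM tools use configurable match-and-merge rules; the records and systems here are invented for illustration.

```python
from datetime import date

# Hypothetical duplicate records for one customer from three source systems.
sources = [
    {"system": "CRM", "updated": date(2024, 3, 1),
     "email": "ana@example.com", "phone": None},
    {"system": "Billing", "updated": date(2024, 5, 9),
     "email": None, "phone": "555-0100"},
    {"system": "Web", "updated": date(2023, 11, 2),
     "email": "ana@old.example", "phone": "555-0199"},
]

def golden_record(records, fields):
    """Illustrative survivorship rule: for each field, keep the most recently
    updated non-empty value and note which system supplied it (lineage)."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden, lineage = {}, {}
    for field in fields:
        for rec in newest_first:
            if rec[field] is not None:
                golden[field] = rec[field]
                lineage[field] = rec["system"]
                break
    return golden, lineage

print(golden_record(sources, ["email", "phone"]))
```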
5.2 Data Quality
Data quality is measured across dimensions: accuracy, completeness, consistency, timeliness, uniqueness, validity, and integrity. A failure in any one can invalidate an entire analysis, which is exactly why cleansing (Module 2) and governance exist.[1]
- Accuracy: Values correctly reflect the real-world fact.
- Completeness: No required values are missing.
- Consistency: Values agree across systems and records.
- Timeliness: Data is current and available when needed.
- Uniqueness: No unintended duplicate records exist.
- Validity: Values conform to the defined format and rules.
- Integrity: Relationships between data are maintained.
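Several of these dimensions can be measured as simple ratios. This sketch profiles hypothetical customer rows for completeness, uniqueness, and validity; the metric definitions and the five-digit ZIP pattern are illustrative assumptions, since organizations define their own quality rules and thresholds.

```python
import re

# Hypothetical customer rows to profile.
rows = [
    {"id": 1, "email": "a@example.com", "zip": "30301"},
    {"id": 2, "email": None,            "zip": "3030"},   # missing email, bad ZIP
    {"id": 3, "email": "c@example.com", "zip": "60601"},
    {"id": 3, "email": "c@example.com", "zip": "60601"},  # duplicate id
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, key):
    """Share of key values that are distinct (1.0 means no duplicates)."""
    return len({r[key] for r in rows}) / len(rows)

def validity(rows, field, pattern):
    """Share of populated values that fully match the expected format."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in values) / len(values)

print("email completeness:", completeness(rows, "email"))      # 0.75
print("id uniqueness:     ", uniqueness(rows, "id"))           # 0.75
print("zip validity:      ", validity(rows, "zip", r"\d{5}"))  # 0.75
```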
5.3 Privacy, Security & Controls
Sensitive data demands controls. Identify PII (and PHI for health data), then protect it with data masking or anonymization, encryption (at rest and in transit), and access controls.[6] Regulations and standards dictate the rules: GDPR (EU privacy), HIPAA (U.S. health), PCI-DSS (payment cards), and CCPA (California).
| Control | What it does |
|---|---|
| Data classification | Labels data by sensitivity to apply the right controls |
| Data masking | Replaces sensitive values with realistic fakes for safe use |
| Anonymization | Removes identifiers so individuals can't be re-identified |
| Encryption | Protects data at rest and in transit from unauthorized reading |
| Access controls | Limits who can see or change data (least privilege) |
| Retention & disposal | Keeps data only as long as needed, then securely destroys it |
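Two of these controls in miniature, using only Python's standard library. `mask_ssn` is a simple partial mask for display (masking for test data would instead substitute realistic fake values), and `pseudonymize` replaces an identifier with a keyed HMAC-SHA256 token. Both helpers are hypothetical. A keyed hash is pseudonymization, not anonymization: whoever holds the key can still link tokens back to known values, and GDPR still treats pseudonymized data as personal data.

```python
import hashlib
import hmac
import secrets

# Secret key for pseudonymization; in practice it would live in a key vault.
KEY = secrets.token_bytes(32)

def mask_ssn(ssn):
    """Partial masking for display: hide all but the last four digits.
    Assumes the ddd-dd-dddd format."""
    return "***-**-" + ssn[-4:]

def pseudonymize(value, key=KEY):
    """Replace an identifier with a keyed hash (HMAC-SHA256). The same input
    always maps to the same token, so joins still work, but someone without
    the key cannot recompute tokens to match them against known values."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

print(mask_ssn("123-45-6789"))                    # ***-**-6789
print(pseudonymize("ana@example.com")[:16], "...")
```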
Checkpoint · Data Governance, Quality & Controls
What does the term "Data Governance" primarily refer to?
How to Use This Data+ Study Guide
This guide is built to be worked, not just read. The most efficient path to a pass:
- Study by weight. Data Analysis (24%) and Data Acquisition & Preparation (22%) are nearly half the exam — master statistics, correlation vs. causation, and data cleansing first.
- Check off as you go. Use the Study Guide Contents to mark each section done; it raises your exam-readiness score.
- Take every checkpoint. The end-of-module quizzes show you exactly which domains need another pass.
- Drill the weak domain. Send your weak area into the flashcards and a practice test until the score climbs.
- Practice the PBQs. Performance-based questions reward applied skill — read a dataset, pick the right chart, and interpret a statistic until it’s automatic.
Data+ Glossary
The high-yield Data+ terms in one place, in alphabetical order.
- Association rules: Finding items that frequently occur together (e.g., the Apriori algorithm for market-basket analysis).
- Bar chart: A chart that compares values across distinct categories (bars have gaps).
- Big data: Datasets too large or complex for traditional tools, characterized by the V's: volume, velocity, variety, veracity, value.
- Box plot: A chart showing a distribution's median, quartiles, and outliers.
- Causation: A relationship in which one variable directly causes a change in another; not proven by correlation alone.
- Classification: A supervised technique that assigns records to predefined categories.
- Clustering: An unsupervised technique that groups similar records without predefined labels.
- Confidence interval: A range of plausible values for a population parameter, with a stated level of confidence.
- Correlation: A measure of how two variables move together, from −1 (perfect negative) to +1 (perfect positive).
- Dashboard: An interactive display of the most important metrics and visuals for an audience, at a glance.
- Data analytics lifecycle: The end-to-end process: define the question, acquire, prepare, analyze, visualize, then communicate and act.
- Data catalog: An organized inventory of an organization's data assets with descriptions and metadata.
- Data classification: Labeling data by sensitivity (e.g., public, internal, confidential) to apply the right controls.
- Data cleansing: Detecting and correcting errors and inconsistencies (missing values, duplicates, outliers) to improve data quality.
- Data governance: The framework of policies, roles, and standards controlling how data is managed across its lifecycle.
- Data integration: Combining data from multiple sources into a unified store for analysis.
- Data lake: A repository that stores vast amounts of raw data in its native format (schema-on-read); cheap and flexible.
- Data lakehouse: A hybrid architecture that adds warehouse-style structure and management on top of a data lake.
- Data lineage: A record of data's origin and how it moves and transforms through systems.
- Data mart: A subject-specific subset of a data warehouse serving a single department or function.
- Data masking: Replacing sensitive values with realistic but fake data so it can be used without exposing the real values.
- Data mining: Discovering patterns and relationships in large datasets using techniques like classification and clustering.
- Data quality: The degree to which data is fit for purpose across dimensions like accuracy, completeness, and consistency.
- Data steward: A person responsible for the day-to-day quality and proper use of a data domain.
- Data warehouse: A central analytical store of structured, modeled data (schema-on-write) optimized for reporting and analysis.
- Descriptive analytics: Analytics that summarizes what happened (reports, KPIs).
- Diagnostic analytics: Analytics that explains why something happened (drill-down, correlation).
- ELT: Extract, Load, Transform. Raw data is loaded first, then transformed inside the target (cloud/lake/big-data pattern).
- ETL: Extract, Transform, Load. Data is cleaned and shaped before it is loaded into the target (classic data-warehouse pattern).
- Foreign key: A column that references the primary key of another table, enforcing relationships between tables.
- GDPR: General Data Protection Regulation, the EU law governing personal-data privacy and protection.
- Heat map: A chart that uses color intensity to show magnitude across two dimensions.
- HIPAA: U.S. law protecting health information (PHI, Protected Health Information).
- Histogram: A chart showing the distribution of one continuous variable by grouping values into bins (bars touch).
- Hypothesis testing: A method for deciding whether sample evidence supports a claim about a population, using a p-value.
- Imputation: Filling in missing values using a strategy such as the mean, median, or a predicted value.
- KPI: Key Performance Indicator, a measurable value that shows how well a goal is being met.
- Master Data Management (MDM): Maintaining a single authoritative version of core business entities (the 'golden record').
- Mean: The arithmetic average of all values; sensitive to outliers.
- Median: The middle value of sorted data; robust to outliers.
- Mode: The most frequently occurring value in a dataset.
- Normalization: Rescaling numeric values to a fixed range, typically 0 to 1, so features are comparable.
- OLAP: Online Analytical Processing, systems optimized for complex queries and aggregations over large, historical datasets.
- OLTP: Online Transaction Processing, systems optimized for many fast, small reads and writes that run daily operations.
- Outlier: A value far outside the typical range of a dataset; may be an error or a genuine extreme to investigate.
- Overfitting: When a model memorizes training data and performs poorly on new, unseen data.
- p-value: The probability of seeing results at least as extreme as the data if the null hypothesis is true.
- Pareto chart: A bar chart ordered largest-to-smallest with a cumulative line, highlighting the vital few (80/20).
- Percentile: A value below which a given percentage of observations fall (e.g., the 90th percentile).
- PII: Personally Identifiable Information, data that can identify an individual (name, SSN, email).
- Predictive analytics: Analytics that forecasts what is likely to happen using models.
- Prescriptive analytics: Analytics that recommends what action to take (optimization).
- Primary key: A column (or set of columns) whose value uniquely identifies each row in a table.
- Qualitative data: Descriptive, categorical data (e.g., colors, names), measured on a nominal or ordinal scale.
- Quantitative data: Numeric, measurable data that supports mathematical operations (interval or ratio scales).
- Relational database: A structured store organizing data into related tables linked by keys; queried with SQL.
- Scatter plot: A chart that plots two numeric variables as points to reveal their relationship or correlation.
- Semi-structured data: Data that carries tags or markers (JSON, XML) but does not fit a rigid table structure.
- Standard deviation: A measure of spread around the mean, in the same units as the data; the square root of variance.
- Standardization: Rescaling values to a mean of 0 and standard deviation of 1 (a z-score).
- Star schema: A data-warehouse design with a central fact table linked to surrounding dimension tables.
- Structured data: Data organized into a defined schema of rows and columns, such as a relational database table.
- Unstructured data: Data with no predefined model (text documents, images, audio, and video) that cannot be stored in simple rows and columns.
- Variance: A measure of how far values spread from the mean (the square of the standard deviation).
Data+ Study Guide FAQ
DA0-002 (V2) is the current version — it launched October 14, 2025, and DA0-001 (V1) retired in English on April 14, 2026. The exam has a maximum of 90 questions (multiple choice plus performance-based questions) and a 90-minute time limit.
You need a scaled score of 675 on a 100–900 scale. It is weighted scoring, not a simple percentage of questions correct, so don't try to convert it to a percent — focus on mastering every domain. You get your pass/fail result immediately.
Data Analysis (24%), Data Acquisition & Preparation (22%), Data Concepts & Environments (20%), Visualization & Reporting (20%), and Data Governance, Quality & Controls (14%). Data Analysis carries the most weight, so prioritize statistics and analytical techniques.
Study by weight: lead with Data Analysis (24%) and Data Acquisition & Preparation (22%) — master descriptive statistics, correlation vs. causation, and data cleansing first. Read each module, take the checkpoint quiz, then drill gaps with our free practice test and flashcards.
DA0-002 rebalanced the blueprint: it renamed the old 'Data Mining' domain to 'Data Acquisition & Preparation,' broadened 'Visualization' to 'Visualization & Reporting,' and added current content on cloud data environments, AI, and modern governance. Study to the DA0-002 objectives.
There are no required prerequisites. CompTIA recommends 18–24 months in a data analyst or similar role, with exposure to databases, analytical tools, basic statistics, and data visualization. Anyone can register, but the recommended background makes the exam much more manageable.
The certification is valid for three years. You renew through CompTIA's Continuing Education program, either by earning 30 continuing-education units (CEUs) over the three years or by passing a higher-level CompTIA certification.
Yes — this study guide, the module checkpoints, the glossary, the concept questions, the practice test, and the flashcards are 100% free with no account required.
Data+ is considered moderately challenging — the difficulty is breadth (data concepts, statistics, mining, visualization, and governance) plus performance-based questions that test applied skills. Broad, organized review and lots of practice questions are the key to passing.
References
1. CompTIA. “CompTIA Data+ (DA0-002) Certification: Exam Details & Objectives.” comptia.org.
2. CompTIA. “CompTIA Data+ (DA0-001): Retiring Version.” comptia.org.
3. CompTIA. “CompTIA Data+: Your Questions Answered (FAQ).” comptia.org.
4. CompTIA. “Continuing Education: Renewal Fees & CEU Requirements.” comptia.org.
5. National Institute of Standards and Technology. “NIST Big Data Program.” nist.gov.
6. National Institute of Standards and Technology. “NIST Privacy Framework.” nist.gov.

Career Employer
Career Employer is the ultimate resource to help you get started working the job of your dreams. We cover topics from general career information, career searching, exam preparation with free study materials, career interviewing, and becoming successful in your career of choice.
Career Employer’s Editorial Process
Here at Career Employer, we focus a lot on providing factually accurate information that is always up to date. We strive to provide correct information using strict editorial processes, article editing, and fact-checking for all of the information found on our website. We only utilize trustworthy and relevant resources. To find out more, make sure to read our full editorial process page here.
