Certified Analytics Professional (CAP) Sample Questions

Sample Test Questions


Domain I—Business Problem (Question) Framing

1. In the kickoff meeting with a client for a new project, which of the following is the MOST important information to obtain?

a) Business issue and project goal
b) Timeline and implementation plan
c) Analytical model to use
d) Available budget

2. A company is considering designing a new automobile. Its options are a design based on current gasoline engine technology or a government-proposed “Green” technology. You are a government official whose job is to encourage automakers to adopt the “Green” technology. You cannot provide funding for development costs, but you can provide a subsidy for every car sold. The development costs and the wholesale price of the cars, in thousands of dollars, are shown in the table below:



(numbers in $ thousands)    Gasoline    Green
Wholesale Price/vehicle     25          40
Variable Cost/vehicle       15          35
Fixed Cost                  100,000     200,000

How large a subsidy per vehicle sold will be required, assuming there will be enough demand to motivate the switch?

a) Greater than $5,000
b) Less than $5,000
c) Cannot be determined
d) Equal to $5,000
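
The arithmetic behind Question 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first column of the table is taken to be the gasoline design and the second the "Green" design (the order in which the question lists them), both designs are assumed to sell the same number of vehicles, and the sales volumes tried are hypothetical. The names `profit` and `break_even_subsidy` are made up for this sketch.

```python
# Figures from the table, in $ thousands.
# ASSUMPTION: first column = gasoline design, second column = "Green" design.
GAS = {"price": 25, "variable_cost": 15, "fixed_cost": 100_000}
GREEN = {"price": 40, "variable_cost": 35, "fixed_cost": 200_000}


def profit(design, units, subsidy=0.0):
    """Total profit in $ thousands for a sales volume and per-vehicle subsidy."""
    margin = design["price"] - design["variable_cost"] + subsidy
    return margin * units - design["fixed_cost"]


def break_even_subsidy(units):
    """Per-vehicle subsidy ($ thousands) at which Green profit equals gasoline profit.

    ASSUMPTION: both designs would sell the same number of units.
    """
    margin_gap = (GAS["price"] - GAS["variable_cost"]) - (
        GREEN["price"] - GREEN["variable_cost"]
    )
    fixed_gap = GREEN["fixed_cost"] - GAS["fixed_cost"]
    return margin_gap + fixed_gap / units


# Hypothetical sales volumes, chosen only to show how the result moves.
for units in (50_000, 100_000, 1_000_000):
    print(units, break_even_subsidy(units))
```

The per-vehicle margin gap is fixed by the table, while the extra fixed cost is spread over however many vehicles are sold, so the break-even subsidy depends on the assumed volume.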

3. You have simulated the NPV of a decision. It ranges between -$10 million and +$10 million. To best present the likelihood of possible outcomes, you should:

a) present a histogram to show likelihood of various NPV ranges.
b) present a single NPV estimate to avoid confusion.
c) trim all outliers to present the most balanced diagram.
d) relax constraints associated with extreme points in the simulation.
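
As a rough illustration of summarizing simulated NPV outcomes by range, the sketch below bins a set of draws and prints the share that falls in each bin. The simulated values are entirely hypothetical (a clipped normal distribution with made-up parameters), and `simulate_npv` and `histogram` are names invented for this example, not part of any simulation tool.

```python
import random
from collections import Counter


def simulate_npv(n, seed=0):
    """Draw n HYPOTHETICAL NPV outcomes in $ millions, clipped to [-10, 10]."""
    rng = random.Random(seed)
    return [max(-10.0, min(10.0, rng.gauss(1.0, 4.0))) for _ in range(n)]


def histogram(values, low=-10.0, high=10.0, bins=10):
    """Return the share of values falling in each equal-width bin."""
    width = (high - low) / bins
    # The top edge (value == high) is folded into the last bin.
    counts = Counter(min(int((v - low) / width), bins - 1) for v in values)
    return [counts.get(i, 0) / len(values) for i in range(bins)]


shares = histogram(simulate_npv(10_000))
for i, share in enumerate(shares):
    start = -10 + 2 * i  # bins are 2 wide because (10 - (-10)) / 10 == 2
    print(f"{start:+4d} to {start + 2:+4d}: {'#' * round(share * 100)} {share:.1%}")
```

Each printed row gives an NPV range and the fraction of simulated outcomes that landed in it.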

4. Two investors who have the same information about the stock market buy an equal number of shares of a stock. Which of the following statements must be true?

a) Both investors are subject to the same uncertainty.
b) Both investors are subject to the same risks.
c) The risks for the two investors are statistically independent.
d) If the investors are optimistic, they should have borrowed, rather than bought, the shares.

Domain II—Analytics Problem Framing

5. Conjoint analysis in market research applications can:

a) allow calculation of relative importance of varying features and attributes to customers.
b) only trade off relative importance to customers of features with similar scales.
c) give its best estimates of customer preference structure based on in-depth interviews with a small number of carefully chosen subjects.
d) only trade off among a limited number of attributes and levels.

6. The monthly profit made by a clothing manufacturer is proportional to the monthly demand, up to a maximum demand of 1000 units, which corresponds to the plant producing at full capacity. (Any excess demand over 1000 units will be satisfied by some other manufacturer, and hence yield no additional profit.) The monthly demand is uncertain, but the average demand is reliably estimated at 1000 units. At this level of demand the monthly profit is $3,000,000. Which of the following statements must be true of the expected monthly profit, P?

a) P is less than $3,000,000.
b) P is possibly greater than $3,000,000.
c) P is equal to $3,000,000.
d) P can have any positive value.
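
The relationship between average demand and expected profit in Question 6 can be explored with a small calculation. This sketch uses the per-unit profit implied by the question ($3,000,000 at 1000 units) and a hypothetical two-point demand distribution whose mean is 1000. The scenario list is an assumption made purely for illustration, and other distributions with the same mean would give other values.

```python
# From the question: profit is proportional to demand, capped at 1000 units,
# and equals $3,000,000 at 1000 units.
CAPACITY = 1000
PROFIT_PER_UNIT = 3_000_000 / CAPACITY  # $3,000 per unit


def expected_demand(scenarios):
    """scenarios: list of (probability, demand) pairs."""
    return sum(p * d for p, d in scenarios)


def expected_profit(scenarios):
    """Expected monthly profit when demand above capacity earns nothing extra."""
    return sum(p * PROFIT_PER_UNIT * min(d, CAPACITY) for p, d in scenarios)


# HYPOTHETICAL demand distribution with mean 1000: 500 or 1500, equally likely.
scenarios = [(0.5, 500), (0.5, 1500)]
print(expected_demand(scenarios))  # 1000.0
print(expected_profit(scenarios))  # 2250000.0
```

Months with demand above capacity cannot earn more than the capped profit, so they do not offset the months that fall short.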

7. Which of the following statements is true of modeling a multi-server checkout line?

a) Variability in arrival and service times will tend to play a critical role in congestion.
b) A queuing model can be used to estimate average arrivals.
c) A queuing model can be used to estimate service rates.
d) Poisson distributions are not relevant.

8. A segmentation of customers who shop at a retail store may be performed using which of the following methods?

a) Clustering and decision tree
b) Clustering, factor and control charts
c) Decision tree and recursive function analyses
d) Monte Carlo Markov Chain and ANOVA

Domain III—Data

9. When analyzing responses of a survey of why people like a certain restaurant, factor analysis could reduce the dimension in which of the following ways?

a) Collapse several survey questions regarding food taste, health value, ingredients and consistency into one general unobserved “food quality” variable.
b) Condense similar survey respondent answers into clusters of like-minded customers for market segment analysis.
c) Reduce the variability of individual subject ratings by centering each respondent’s ratings around his or her average rating.
d) Decrease variability by analyzing inter-rater reliability on the question items before offering the survey to a wide number of respondents.

10. A common way of organizing data in a data warehouse for reporting and analysis is:

a) multidimensional modeling.
b) transactional-based modeling.
c) relation-based modeling.
d) tuple-based modeling.

11. A multiple linear regression was built to try to predict customer expenditures based on 200 independent variables (behavioral and demographic). 10,000 rows of data were fed into a stepwise regression, each row representing one customer. 1,000 customers were male, and 9,000 customers were female. The final model had an Adjusted R-squared of 0.27 and seven independent variables. Increasing the number of rows of data to 100,000 and rerunning the stepwise regression will most likely:

a) have no impact upon the Adjusted R-squared.
b) increase the impact of the male customers.
c) change the heteroskedasticity of the residuals in a favorable manner.
d) decrease the number of independent variables in the final model.

12. Which of the following best describes the data and information flow within an organization?

a) Information architecture
b) Information strategy
c) Information mapping
d) Information assurance

13. A box and whisker plot for a dataset will MOST clearly show:

a) if the data is skewed and, if so, in which direction.
b) the 90% confidence interval around the mean.
c) where the [actual-predicted] error value is not zero.
d) the difference between the second quartile and the median.

Domain IV—Methodology (Approach) Selection

14. A furniture maker would like to determine the most profitable mix of items to produce. There are well-known budgetary constraints. Each piece of furniture is made of a predetermined amount of material with known costs, and demand is known. Which of the following analytical techniques is the most appropriate one to solve this problem?

a) Optimization
b) Multiple regression
c) Data mining
d) Forecasting

15. A company ships products from a single dock at their warehouse. The time to load shipments depends on the experience of the crew, products being shipped and weather. The company thinks there is significant unmet demand for their products and would like to build another dock in order to meet this demand. They ask you to build a model and determine if the revenue from the additional products sold will cover the cost of the second dock within two years of it becoming operational. Which of the following is the MOST appropriate modeling approach?

a) Discrete event simulation because there is a sequence of discrete random events through time.
b) Optimization because the company’s objective is to maximize profit and capacity at the dock is a limited resource.
c) Forecasting because you can determine the throughput at the dock, calculate the net revenue and compare this with the cost of the new dock.
d) Optimization because it is a transportation problem.

16. Which of the following is an effective optimization method?

a) Mixed integer programming
b) Generalized linear regression model
c) Box-Jenkins Method
d) Analysis of variance

17. A clothing company wants to use analytics to decide which customers to send a promotional catalogue in order to attain a targeted response rate. Which of the following techniques would be the most appropriate to use for making this decision?

a) Logistic regression
b) Integer programming
d) Linear regression

Domain V—Model Building

18. All times in the decision tree below are given in hours. What is the expected travel time (in hours) of the optimal (minimum travel time) decision?

CAP Question 18 diagram

a) 7.0
b) 6.9
c) 7.4
d) 7.8

19. A project seeks to build a predictive data-mining model of customer profitability based upon a series of independent variables including customer transaction history, demographics, and, if desired, externally purchased credit-scoring information. There are currently 100,000 unique customers available for use in building the predictive model. Which of the following strategies would reflect the best allocation of these 100,000 customer data points?

a) Use 70,000 randomly selected data points when building the model, and hold the other 30,000 out as a test dataset.
b) Use all 100,000 data points when building the model.
c) Build four separate models and randomly partition the data into 4 separate datasets with 25,000 data points per dataset.
d) Use 1,000 randomly selected data points when building the model.
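
For readers unfamiliar with holdout datasets, the following is a minimal sketch of a random train/test partition using only the Python standard library. The 70/30 proportions mirror one of the options above. The integer customer IDs, the seed, and the function name `holdout_split` are assumptions made for this illustration.

```python
import random


def holdout_split(ids, train_fraction=0.7, seed=42):
    """Randomly partition ids into a training set and a held-out test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# HYPOTHETICAL customer identifiers standing in for the 100,000 customers.
customer_ids = range(100_000)
train, test = holdout_split(customer_ids)
print(len(train), len(test))  # 70000 30000
```

The model would be fitted on `train` only, and `test` would be used afterwards to check how well it performs on customers it has not seen.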

20. One of the main advantages of tree-based models and neural networks is that they:

a) reveal interactions without having to explicitly build them into the model.
b) build models with higher R-squared than other regression techniques.
c) are easy to interpret, use, and explain.
d) can be modeled even when there is a significant amount of missing data.

21. In the diagram below, what is true of Strategy B compared to Strategy A?

CAP sample question 21 diagram

a) Strategy B exhibits stochastic (probabilistic) dominance over Strategy A.
b) Strategy B has the same downside risk as Strategy A since the curves have the same shape.
c) Strategy B must have the same uncertainties impacting it as Strategy A because the curves are so similar in shape.
d) Strategy A exhibits stochastic (probabilistic) dominance over strategy B.

Domain VI—Deployment

22. Each month you generate a list of marketing leads for direct mail campaigns. Which of the following should you do before the list is used?

a) Remove opt-outs.
b) Retain x% of the leads as control for performance measurement.
c) Exclude people who were on the list the previous month.
d) Exclude people who were never on the list.

23. After building a predictive model and testing it on new data, an underprediction by a forecasting system can be detected by its:

a) BIAS being positive.
b) BIAS being negative.
c) mean absolute deviation being negative.
d) mean squared error being zero.

Domain VII—Model Life Cycle Management

24. An analytics professional is responsible for maintaining a simulation model that is used to determine the staffing levels required for a specific operational business process. Assuming that the operational team always uses the number of staff determined by the model, which of the following is the most important maintenance activity?

a) Determine if there has been a change in model accuracy over time.
b) Ensure that all of the model input data items are available when needed.
c) Ensure that all users are reviewing the model results in a timely fashion.
d) Determine if the model’s reports are understood by the users.