How NCARB Develops the ARE: Understanding Exam Forms

Before becoming an architect, all candidates must take and pass the Architect Registration Examination® (ARE®)—a multi-part exam developed with the help of hundreds of volunteer architects, psychometricians, and other professionals.

Interested in learning more about how the exam is put together? NCARB is committed to being transparent about how the ARE is developed and administered, so candidates, licensing board members, and the public can trust the validity of ARE results. In parts one and two of this blog series, we explored the individuals involved in developing the exam and the process that exam questions go through before they become scored items. In part three, we’ll explain how NCARB assembles exam forms.

What is an exam form?

An exam form is the unique version of a division—say, Practice Management—that you see at the test center. NCARB has multiple versions of each exam division in use at all times. This allows candidates to retake a division without encountering the same questions that they saw on their previous attempt.

What’s included on each exam form?

Each exam form includes standalone questions and case study questions. Standalone questions are located at the beginning of the content portion, and case studies are located at the end.

How are exam forms compiled?

NCARB and its psychometricians use what we call “assembly rules” when compiling new examination forms to ensure that all forms of a division are comparable. The rules define the composition of each exam form, including testing and break time, content and item type distribution, number of operational (scored) and pretest (unscored) items, number of case study items, and number of calculation items. Assembly rules help ensure that aspects such as difficulty, variance of scores, reliability, and time are balanced and as similar as possible for candidates testing on different forms.
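To make the idea of assembly rules concrete, here is a minimal sketch of how such rules might be written down and checked against a draft form. Every name and number in it (Item, AssemblyRules, rule_violations, the item counts) is hypothetical and chosen only for illustration; this post does not describe NCARB's actual rules or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    scored: bool        # True = operational (scored), False = pretest (unscored)
    case_study: bool    # True if the item belongs to a case study
    calculation: bool   # True if the item requires a calculation


@dataclass(frozen=True)
class AssemblyRules:
    # Hypothetical rule set; real assembly rules cover more (time, content areas, ...)
    operational_items: int
    pretest_items: int
    case_study_items: int
    max_calculation_items: int


def rule_violations(form, rules):
    """Return a description of every rule the draft form breaks (empty list = OK)."""
    operational = sum(item.scored for item in form)
    pretest = sum(not item.scored for item in form)
    case_study = sum(item.case_study for item in form)
    calculation = sum(item.calculation for item in form)

    problems = []
    if operational != rules.operational_items:
        problems.append(f"operational items: {operational}, expected {rules.operational_items}")
    if pretest != rules.pretest_items:
        problems.append(f"pretest items: {pretest}, expected {rules.pretest_items}")
    if case_study != rules.case_study_items:
        problems.append(f"case study items: {case_study}, expected {rules.case_study_items}")
    if calculation > rules.max_calculation_items:
        problems.append(f"calculation items: {calculation}, at most {rules.max_calculation_items}")
    return problems


# Made-up numbers, far smaller than a real form
rules = AssemblyRules(operational_items=3, pretest_items=1,
                      case_study_items=2, max_calculation_items=1)
draft_form = [
    Item("Q1", scored=True, case_study=False, calculation=True),
    Item("Q2", scored=True, case_study=False, calculation=False),
    Item("Q3", scored=True, case_study=True, calculation=False),
    Item("Q4", scored=False, case_study=True, calculation=False),
]
print(rule_violations(draft_form, rules))
```

Applying the same check to every draft form is one simple way to keep the composition of different forms aligned.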

Why can a candidate only attempt the same division three times in a 12-month period?

In a typical year, NCARB releases four operational exam forms for each division of the ARE. Because of this, candidates can only test on the same division three times within a 12-month window. This ensures a candidate will see a different operational exam form when retesting on the same division. The fourth exam form provides a candidate with the opportunity to test on the same division more than three times in the unlikely event that they experience a technical issue during one of their three previous exam attempts.
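The retake limit can be expressed as a small eligibility check. The sketch below assumes a rolling 12-month window that ends on the date of the new attempt and treats an attempt made exactly one year earlier as outside the window; both are assumptions made for illustration, and the function names are hypothetical.

```python
from datetime import date


def one_year_before(d):
    """Same calendar day one year earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def can_attempt(prior_attempts, new_date, max_attempts=3):
    """True if testing on new_date keeps the candidate within the limit."""
    window_start = one_year_before(new_date)
    # Count earlier attempts that fall inside the assumed rolling window
    recent = [a for a in prior_attempts if window_start < a <= new_date]
    return len(recent) < max_attempts


prior = [date(2024, 1, 10), date(2024, 4, 10), date(2024, 7, 10)]
print(can_attempt(prior, date(2024, 10, 10)))  # three attempts already in the window
print(can_attempt(prior, date(2025, 1, 10)))   # the January 2024 attempt has aged out
```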

How does NCARB set the difficulty level of the exam?

With every exam administration, NCARB and its psychometricians collect information about the performance of each ARE item. This results in a wealth of item performance data that allows NCARB to measure how difficult the item is for candidates, how well the item discriminates between high- and low-performing candidates, the possibility for bias in an item favoring one demographic group over another, and other item performance characteristics.

We use a combination of Classical Test Theory and Item Response Theory (IRT) statistics to evaluate the performance of items and to assemble forms. Classical Test Theory statistics include p-values (difficulty, measured as the proportion of candidates who answer the item correctly), point-biserial correlations (item-score correlations, a measure of discrimination), item reliability (a statistical combination of difficulty and discrimination), and response time. IRT statistics help us evaluate item performance across different samples of people (i.e., across forms) without having to assume random equivalence. In addition to difficulty, we evaluate how well each item fits the IRT model. The IRT statistics are also what allow us to scale forms.
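For readers who want to see what the Classical Test Theory statistics look like in practice, here is a minimal sketch computed from a tiny, made-up matrix of right/wrong responses. It assumes the point-biserial is taken against each candidate's score on the remaining items, and it uses the common textbook definition of the item reliability index (point-biserial times the item's standard deviation). This post does not state the exact formulas NCARB uses, so treat both choices as assumptions.

```python
from math import sqrt


def p_value(item_scores):
    """Difficulty: proportion of candidates who answered the item correctly."""
    return sum(item_scores) / len(item_scores)


def pearson(x, y):
    """Plain Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def point_biserial(responses, item):
    """Discrimination: correlation of the 0/1 item score with the rest score."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]  # assumption: item excluded
    return pearson(item_scores, rest_scores)


def item_reliability_index(responses, item):
    """Textbook index (an assumption here): point-biserial * sqrt(p * (1 - p))."""
    p = p_value([row[item] for row in responses])
    return point_biserial(responses, item) * sqrt(p * (1 - p))


# Rows are candidates, columns are items; 1 = correct, 0 = incorrect (made-up data)
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(p_value([row[0] for row in responses]))
print(round(point_biserial(responses, 0), 3))
```

A real analysis would involve far more candidates and items; the point is only that each statistic is a simple summary of the response data.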

Based on the recommendations of our psychometric consultants, NCARB sets thresholds for acceptable item performance and continuously monitors performance data to make sure ARE items stay within these thresholds. Items with poor statistical performance are automatically retired from the ARE. Any item with questionable performance characteristics is flagged for review by a group of volunteer architectural subject matter experts. These experts then determine whether the item should remain eligible for use on the exam, be modified and re-pretested, or be removed and retired.
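The monitoring step described above amounts to comparing each item's statistics against a set of thresholds. The sketch below shows one way that could look. The threshold values and the function name classify_item are invented for illustration and are not NCARB's actual criteria.

```python
def classify_item(p_value, point_biserial, *,
                  min_p=0.20, max_p=0.95,      # hypothetical difficulty bounds
                  retire_below=0.0,            # hypothetical: negative discrimination
                  review_below=0.15):          # hypothetical: weak discrimination
    """Sort an item into 'retire', 'review', or 'eligible' from two statistics."""
    if point_biserial < retire_below:
        # Poor statistical performance: retired without further review
        return "retire"
    if not (min_p <= p_value <= max_p) or point_biserial < review_below:
        # Questionable performance: sent to subject matter experts
        return "review"
    return "eligible"


print(classify_item(0.60, 0.30))
print(classify_item(0.98, 0.30))
print(classify_item(0.60, -0.10))
```

In this sketch only the "review" outcome would involve people; the experts would then decide whether the item stays, is modified and re-pretested, or is retired.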

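When there are more items available than needed, we also use these statistics to select which items to use. They also give us the flexibility to assemble pre-equated forms, meaning forms whose scoring scale is established from previously collected item statistics before the form is administered.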
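To illustrate how IRT statistics make it possible to compare forms before anyone has taken them, the sketch below computes the expected raw score of a candidate at a fixed ability level on two small forms. This post does not say which IRT model NCARB uses; the one-parameter (Rasch) model, the item difficulties, and the ability value are all assumptions chosen for illustration.

```python
from math import exp


def rasch_probability(theta, b):
    """Rasch model: chance that a candidate of ability theta answers an item
    of difficulty b correctly."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def expected_raw_score(theta, difficulties):
    """Expected number of correct answers on a form (test characteristic curve)."""
    return sum(rasch_probability(theta, b) for b in difficulties)


# Made-up item difficulties on a common scale; form B is slightly harder
form_a = [-1.0, -0.5, 0.0, 0.5, 1.0]
form_b = [-0.5, 0.0, 0.5, 1.0, 1.5]
theta_cut = 0.0  # hypothetical ability level at the passing standard

print(round(expected_raw_score(theta_cut, form_a), 2))
print(round(expected_raw_score(theta_cut, form_b), 2))
```

Because both forms are described on the same scale, the same ability level maps to a lower expected raw score on the harder form. That is the kind of adjustment that lets candidates on different forms be held to the same standard.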

Why do exam questions include several answers that seem like they could be correct?

NCARB uses “distractors” to help differentiate between candidates who have the necessary knowledge and skills to practice a profession and those candidates who don’t. Distractors are incorrect response options to a question and are very common on licensure exams.

Well-written distractors are just as plausible as the correct answer (known as a “key”), but wrong in the specific scenario presented in the item. In fact, distractors may be correct in a different scenario—candidates need to demonstrate that they know the right decision to make depending on the situation at hand.

On the ARE, all multiple-choice items contain the key plus either two or three distractors. All check-all-that-apply items contain six response options in total: two, three, or four of the options are keys, and the rest are distractors.
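The structure of these two item types can be written as a short validation check. This is only a sketch of the rules stated above; the function name, the item type labels, and the list-of-booleans representation are hypothetical.

```python
def is_valid_item(item_type, options):
    """Check an item's response options against the structure described above.

    options is a list of booleans: True marks a key, False marks a distractor.
    """
    keys = sum(options)
    distractors = len(options) - keys
    if item_type == "multiple_choice":
        # One key plus two or three distractors
        return keys == 1 and distractors in (2, 3)
    if item_type == "check_all_that_apply":
        # Six options in total, of which two, three, or four are keys
        return len(options) == 6 and keys in (2, 3, 4)
    raise ValueError(f"unknown item type: {item_type}")


print(is_valid_item("multiple_choice", [False, True, False, False]))
print(is_valid_item("check_all_that_apply", [True, False, True, False, False, True]))
print(is_valid_item("check_all_that_apply", [True, False, False, False, False, False]))
```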

Since hotspot item types don’t require candidates to choose from a set of possible options, they don’t have “distractors” in the traditional sense. However, for these items, one or more correct scoring regions are defined in the background. One could consider all the areas outside of the correct scoring regions to be potential distractors.
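A scoring region can be pictured as an area of the image in which a click counts as correct. The sketch below assumes rectangular regions in pixel coordinates with the origin at the top left; the actual shapes and coordinate system used on the ARE are not described in this post, and the names Region and is_correct_click are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    # Assumed rectangle in pixel coordinates, origin at the top left
    left: float
    top: float
    right: float
    bottom: float


def is_correct_click(x, y, regions):
    """True if the click lands inside any correct scoring region."""
    return any(r.left <= x <= r.right and r.top <= y <= r.bottom for r in regions)


# Two made-up scoring regions on an exam image
scoring_regions = [Region(100, 50, 180, 120), Region(300, 200, 360, 260)]

print(is_correct_click(150, 100, scoring_regions))  # inside the first region
print(is_correct_click(250, 150, scoring_regions))  # outside every region
```

Every point for which this check returns False plays the role that a distractor plays in a multiple-choice item.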