Critic Feedback Behavior Analysis
This page sorts matches for the five most important critic patterns in the joint-opcd run into three buckets: cases that are only string-level matches, cases that are genuine answer leakage, and cases that never state the final answer directly but still reveal an overly concrete, executable solution path.
Core Takeaway
Across these five patterns, only some matches are equivalent to answer leakage. Some cases explicitly print the reference answer. Others compress the search space through next-step instructions, boundary cases, or candidate values, which is functionally close to solving for the student. Still others are merely step numbers, structural quantities, or negated phrasings, and should be treated as false positives.
String matches such as Step 10 or the period term 10100 still need semantic review.
The recommended labels are string_match_pattern and semantic_leakage_answer.
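The step-number case can be screened mechanically before any semantic review. The following is a minimal sketch, assuming the critic feedback is plain text and the reference answer is a short integer string. The function name and the "Step N" regex are hypothetical and not taken from the joint-opcd tooling.

```python
import re

# Hypothetical pattern for numbered step labels such as "Step 10".
STEP_LABEL = re.compile(r"\bstep\s+(\d+)\b", re.IGNORECASE)


def reference_hits_outside_step_labels(feedback: str, reference: str) -> int:
    """Count standalone occurrences of `reference` that are not step labels."""
    # Blank out "Step N" labels so their numbers cannot match the reference.
    stripped = STEP_LABEL.sub(" ", feedback)
    # Match the reference only as a whole number, never inside a longer one.
    pattern = re.compile(rf"(?<!\d){re.escape(reference)}(?!\d)")
    return len(pattern.findall(stripped))
```

A count of zero only rules out the step-label false positive. A structural quantity that happens to equal the reference, such as the period term 10100, still produces a hit and still needs contextual review.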
Pattern-by-Pattern Evidence
Each pattern below includes three concrete samples and the reasoning behind the label. The goal is to make it visually obvious which cases are clear leakage, which are strong hint leakage, and which are only false positives.
The most intuitive category. Matches are often risky, but not every instance actually reveals the correct answer.
Sample: [...] a_1, a_2, ... and count how many integers in {1,...,2016} are jet-lagged. Reference: 51. Note: [...] 51.
Sample: [...] a,b. Find the probability that the cubic equation has three distinct real roots. The original answer is in the form m/n; output m+n. 77.
Sample: [...] n>=1, let S_n be the set of integers k>n such that k divides 30n-1... Reference: 536. Note: [...] 536.
These samples often avoid the final answer but still push the student into a concrete solution route by prescribing the next move.
Sample: [...] 30.129
Sample: [...] k such that, for all nonnegative reals satisfying a+b+c=1... Reference: 4.
Sample: [...] ABC have AB=9 and AC=10 ... compute m+n. Reference: 415.
On the surface, these samples say “do not throw everything away.” In practice, many of them immediately pivot into a numbered repair strategy.
Sample: [...] (a,b,c) of positive integers for which ab-c, bc-a, ca-b are powers of 2. Reference: 16.
Sample: [...] f and g are quadratic, with g(x) = -f(100-x) ... Reference: 752.
Sample: [...] A and B be two sets and find the number of elements in A∩B. 26 sharply narrows the remaining search space.
This is the cleanest rule for high recall, but it cannot be treated as semantic leakage by itself, especially when the reference is a short number.
Sample: [...] N=123456789 and their places are swapped ... 96296388660
Sample: [...] n such that 1+floor(100n/101)=ceil(99n/100). Reference: 10100. Note: [...] 10100 is a structural period term, not a declaration of the final answer.
This pattern is effective at surfacing numeric leakage, but numbers like 10 or 2 must still be interpreted in context.
Sample: [...] $A$ convex quadrilateral with area 30 and side lengths 5, 6, 9, 7 ... Reference: 47. Note: [...] 47 does not appear in the prompt and is explicitly framed as the correct answer.
Sample: [...] 1000 that is a palindrome in both base ten and base eight ... Reference: 585. Note: [...] 585.
Sample: [...] Ω is tangent to both axes, its center lies on an ellipse Γ ... Reference: 10. Note: [...] 10 here is just a step number, not answer leakage.
Recommended Labeling Scheme
If these patterns are going to be used in future statistics, filtering, or cleaning, a single label is too blunt. A safer approach is to split the signal into two layers:
string_match_pattern
Only indicates that a rule was matched, such as the appearance of the phrase "the answer is" or of the reference number.
semantic_leakage_answer
Indicates that, after contextual review, the feedback really does reveal the final answer or enough information to effectively pin it down.
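The two layers above can be kept as separate fields so that a rule hit is never mistaken for a reviewed verdict. This is a minimal sketch under stated assumptions: the class name, the helper functions, and the exact matching rule are hypothetical, and only the two field names come from the scheme described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CriticFeedbackLabel:
    # Layer 1: set mechanically by a rule; says nothing about meaning.
    string_match_pattern: bool
    # Layer 2: stays None until a contextual review has been done.
    semantic_leakage_answer: Optional[bool] = None


def first_pass_label(feedback: str, reference: str) -> CriticFeedbackLabel:
    """Apply only the string-level rule (assumed rule, for illustration)."""
    phrase_hit = "the answer is" in feedback.lower()
    # Guard against an empty reference, which would match every string.
    number_hit = bool(reference) and reference in feedback
    return CriticFeedbackLabel(string_match_pattern=phrase_hit or number_hit)


def needs_review(label: CriticFeedbackLabel) -> bool:
    """A rule hit with no semantic verdict yet is queued for review."""
    return label.string_match_pattern and label.semantic_leakage_answer is None
```

Leaving the second field unset until review keeps statistics honest: counts of string_match_pattern measure rule recall, while counts of semantic_leakage_answer measure confirmed leakage.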