Localization is the perfect tool for this application. M-Local is not the right solution.
Differences Between Localization and M-Local
Localization:
Localization creates the transformation parameters between two different coordinate systems. In Jim's example, he has coordinates that are like State Plane but are not actually State Plane. He doesn't know exactly what the difference is between his SPC-like local coordinates and true SPCS. These local coordinates have values similar to SPCS, but they are estimated from orthophotos and will differ from true SPCS in translation and rotation (and possibly scale as well). Localization will automatically determine these parameters: four for a 2D transformation (translation in N and E, rotation about the Up axis, and scale), or seven for a 3D transformation (translation in N, E, and U, rotation about the N, E, and Up axes, and scale).
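To make the 2D case concrete, here is a minimal sketch of a four-parameter least-squares fit from local (design) coordinates to grid coordinates. It assumes both sets are plane (N, E) coordinates in the same linear unit and that the fit is an ordinary least-squares similarity (2D Helmert) transformation; the function names are hypothetical, and this is not the data collector's actual implementation. Rotation here is clockwise-positive like an azimuth and is applied about the local origin, so the reported translation depends on that choice; software may express these parameters differently. The seven-parameter 3D case is not shown.

```python
import math

def fit_similarity_2d(local_pts, grid_pts):
    """Least-squares 4-parameter (2D similarity) fit from local to grid.

    Points are (N, E) tuples paired by index. Returns (scale, rotation_deg,
    t_n, t_e); rotation is clockwise-positive, like an azimuth.
    """
    if len(local_pts) != len(grid_pts) or len(local_pts) < 2:
        raise ValueError("need at least two matched point pairs")
    m = len(local_pts)
    # Centroids of both point sets
    ln = sum(p[0] for p in local_pts) / m
    le = sum(p[1] for p in local_pts) / m
    gn = sum(p[0] for p in grid_pts) / m
    ge = sum(p[1] for p in grid_pts) / m
    ss = c_num = d_num = 0.0
    for (n, e), (n2, e2) in zip(local_pts, grid_pts):
        dn, de = n - ln, e - le          # local coords relative to their centroid
        dn2, de2 = n2 - gn, e2 - ge      # grid coords relative to their centroid
        ss += dn * dn + de * de
        c_num += de * de2 + dn * dn2
        d_num += dn * de2 - de * dn2
    if ss == 0.0:
        raise ValueError("local points are coincident")
    c, d = c_num / ss, d_num / ss        # c = s*cos(rot), d = s*sin(rot)
    # Translation follows from mapping the local centroid onto the grid centroid
    t_e = ge - c * le - d * ln
    t_n = gn - c * ln + d * le
    return math.hypot(c, d), math.degrees(math.atan2(d, c)), t_n, t_e

def apply_similarity_2d(params, pt):
    """Map a local (N, E) point to grid using fitted parameters."""
    scale, rot_deg, t_n, t_e = params
    r = math.radians(rot_deg)
    c, d = scale * math.cos(r), scale * math.sin(r)
    n, e = pt
    return (c * n - d * e + t_n, c * e + d * n + t_e)
```

With more than two pairs, the residual for each pair is the grid point minus apply_similarity_2d(params, local_point), and math.hypot of its N and E components is the horizontal residual distance.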
The geodetic coordinates of survey points are not actually affected by localization. The base position and the vectors from the base to the rover determine the geodetic coordinates of all surveyed points. The localization only affects the geodetic positions of design coordinates, through the transformation parameters it determines. As the localization is altered, the geodetic coordinates of design points in the local coordinate system will change. Also, when surveyed points are viewed in the local system, their local coordinates will change if the localization is altered. So if Jim starts his localization with two points (the minimum required for translation and rotation) and stores this localization, he will have geodetic coordinates for all design points in that local system, and he can view his surveyed points in the local system. He can then use this localization to reduce the size of his search area for the third point. At the beginning, his search area is based on his ability to estimate the location of his geometry on an aerial photo, but after the first two points are localized (provided there are no blunders in his survey and design data and the points are reasonably well spaced), he will have a much smaller target area for the third point. If he recovers the third point, he can add it to the localization, further refining the transformation. The result of this new localization, using three points instead of two, is that 1) the geodetic positions of design points will change but their local grid coordinates will not, and 2) the local grid coordinates of survey points will change but their geodetic positions will not. This process can be repeated ad infinitum, further refining the localization each time.
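The two-point-then-three-point workflow can be sketched as below. This is a hypothetical illustration: the point names and coordinates are invented, grid (N, E) coordinates stand in for geodetic positions, and the fit is the same four-parameter least-squares similarity written compactly with complex numbers (z = E + iN). It shows the two effects described above: the surveyed coordinates are fixed inputs that never change, while the design points' grid positions and the surveyed points' local coordinates move when the third pair is added.

```python
import math

def fit(local, grid):
    """4-parameter least-squares fit using complex numbers z = E + iN.

    Returns (alpha, beta) with grid_z = alpha * local_z + beta, where
    alpha = scale * exp(i * theta), theta counterclockwise from the E axis.
    """
    zl = [complex(e, n) for n, e in local]
    zg = [complex(e, n) for n, e in grid]
    ml = sum(zl) / len(zl)
    mg = sum(zg) / len(zg)
    num = sum((a - ml).conjugate() * (b - mg) for a, b in zip(zl, zg))
    den = sum(abs(a - ml) ** 2 for a in zl)
    alpha = num / den
    return alpha, mg - alpha * ml

def to_grid(params, pt):
    """Design (local) (N, E) -> grid (N, E)."""
    alpha, beta = params
    z = alpha * complex(pt[1], pt[0]) + beta
    return (z.imag, z.real)

def to_local(params, pt):
    """Surveyed grid (N, E) -> local (N, E)."""
    alpha, beta = params
    z = (complex(pt[1], pt[0]) - beta) / alpha
    return (z.imag, z.real)

# Hypothetical design (local) coordinates, (N, E)
design = {"P1": (5000.00, 5000.00), "P2": (5000.00, 5800.00), "P3": (5600.00, 5400.00)}
# Hypothetical surveyed grid coordinates, (N, E); these never change below
surveyed = {"P1": (205010.00, 605020.00), "P2": (204996.05, 605819.88)}

two_pt = fit([design["P1"], design["P2"]], [surveyed["P1"], surveyed["P2"]])
search_target = to_grid(two_pt, design["P3"])   # where to start looking for P3

# Suppose P3 is then recovered and surveyed here:
surveyed["P3"] = (205603.10, 605430.25)
search_miss = math.hypot(search_target[0] - surveyed["P3"][0],
                         search_target[1] - surveyed["P3"][1])

names = ["P1", "P2", "P3"]
three_pt = fit([design[k] for k in names], [surveyed[k] for k in names])

# Same surveyed point, two localizations: its local coordinates move
p3_local_before = to_local(two_pt, surveyed["P3"])
p3_local_after = to_local(three_pt, surveyed["P3"])
# Same design point, two localizations: its grid position moves
p1_grid_before = to_grid(two_pt, design["P1"])
p1_grid_after = to_grid(three_pt, design["P1"])
```

With only two pairs the fit is exact (zero residuals), so it cannot reveal errors; the third pair is what introduces redundancy.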
M-Local:
M-Local compares the position of a point surveyed from a base with that point's known coordinates and determines the difference in geodetic position. It determines a simple 3-parameter transformation (translation in N, E, U) and applies this translation to the base and all points surveyed from that base. This is an excellent tool for making minor shifts (only a few meters), such as moving a base that was started on autonomous coordinates to WGS84 or NAD83 coordinates, or for merging several base points with DPOS-derived coordinates into one single coordinate value (likely a shift of only a few centimeters). Unlike localization, M-Local does change the geodetic positions of surveyed points. M-Local affects neither the geodetic position nor the grid position of design points. In Jim's example, there is likely some rotation in the transformation because he is taking geometry with an unknown relationship to North and estimating the rotation to SPCS from aerial imagery. M-Local does not consider rotation (nor should it, given the intended purpose of the tool).
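The 3-parameter shift can be sketched in a few lines. This is a simplified, hypothetical illustration: it treats positions as local north/east/up values in metres rather than true geodetic coordinates, and the names and numbers are invented. It shows that one translation vector is applied to the base and every point surveyed from it, so the vectors from the base are preserved and no rotation or scale is involved.

```python
from typing import List, NamedTuple

class NEU(NamedTuple):
    """Position as north, east, up in metres (a planar simplification)."""
    n: float
    e: float
    u: float

def m_local_shift(measured: NEU, known: NEU) -> NEU:
    """3-parameter shift (dN, dE, dU) moving the measured point onto the known one."""
    return NEU(known.n - measured.n, known.e - measured.e, known.u - measured.u)

def apply_shift(points: List[NEU], shift: NEU) -> List[NEU]:
    """Translate the base and every point surveyed from it by the same vector."""
    return [NEU(p.n + shift.n, p.e + shift.e, p.u + shift.u) for p in points]

# Hypothetical session: base started on an autonomous position
base = NEU(1000.000, 2000.000, 50.000)
cp_measured = NEU(1250.400, 2100.300, 52.100)   # control point as surveyed from that base
cp_known = NEU(1251.612, 2098.871, 53.020)      # its known coordinates
other = NEU(980.000, 2300.000, 48.500)          # another point from the same base

shift = m_local_shift(cp_measured, cp_known)
new_base, new_cp, new_other = apply_shift([base, cp_measured, other], shift)
```

Design points never enter this calculation, which mirrors the statement that M-Local leaves them untouched; and because only a translation is solved, any rotation between the systems stays in the data.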
Localization vs. CoGo Rotate and Translate
In my opinion, localization is superior to the rotate and translate functions found in CoGo, because localization uses all points in the localization to determine the rotation and translation, while CoGo rotate and translate uses one single point for translation and one single baseline for rotation. In all likelihood, all points in both coordinate systems (local and geodetic) have some error; the notion of a single local (design) point and a single geodetic (surveyed) point having zero error is unrealistic. Having said that, a good localization requires that the design geometry be relatively accurate. Such geometry will likely come from modern surveys using theodolite/EDM or total station, or from quality work done with transit and chain. Compass and chain surveys are not good candidates for multi-point localization; they are better suited to single-point, single-baseline rotations, like CoGo translate and rotate. It is possible, and in my opinion more convenient, to do this in localization rather than CoGo: choose the single pair of points used for translation, set the rotation using a second pair of points, then lock the rotation in the parameters screen after it is solved and switch the second pair of points to "check". This forces the localization to use only the single design point/survey point pair for translation and the second design/survey pair only for rotation. This can be changed and saved again and again, using different points as necessary to keep the localization tied to the nearest found points.
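The end state of that locked-rotation setup can be sketched directly: pair A alone fixes the translation, the baseline from A to B alone fixes the rotation, and scale is held at 1. This is a hypothetical illustration with invented names and coordinates, not how any particular data collector computes it; it skips the solve-then-lock steps and computes the resulting transformation in one pass. Whatever distance mismatch remains on pair B shows up as its "check" residual.

```python
import math

def azimuth(p, q):
    """Grid azimuth from p to q in radians, clockwise from north. Points are (N, E)."""
    return math.atan2(q[1] - p[1], q[0] - p[0])

def two_pair_transform(design_a, survey_a, design_b, survey_b):
    """Translation from pair A only, rotation from baseline A->B only, scale held at 1.

    Returns (transform, rotation_deg); transform maps a design (N, E) point to grid.
    The rotation may come out offset by 360 degrees; cos/sin are unaffected.
    """
    rot = azimuth(survey_a, survey_b) - azimuth(design_a, design_b)
    cos_r, sin_r = math.cos(rot), math.sin(rot)

    def transform(pt):
        # Offset from the translation point, in design coordinates
        dn, de = pt[0] - design_a[0], pt[1] - design_a[1]
        # Rotate the offset clockwise by rot, then attach it to surveyed point A
        return (survey_a[0] + dn * cos_r - de * sin_r,
                survey_a[1] + de * cos_r + dn * sin_r)

    return transform, math.degrees(rot)

# Hypothetical pairs: A carries translation, B carries rotation only
design_a, survey_a = (1000.0, 1000.0), (200000.0, 600000.0)
design_b, survey_b = (1000.0, 1500.0), (199991.30, 600499.95)

transform, rot_deg = two_pair_transform(design_a, survey_a, design_b, survey_b)
b_grid = transform(design_b)
# With scale fixed at 1, B's distance mismatch remains as its "check" residual
check_residual = math.hypot(b_grid[0] - survey_b[0], b_grid[1] - survey_b[1])
```

Moving the localization to the nearest found points then amounts to calling two_pair_transform again with a different A and B.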
Summary
Jim's localization appears to have worked properly in his last attempt. If the inverse between a particular design point and its surveyed point showed the same distance as the localization residual for that pair, then the localization was successful. I cannot explain the stakeout issue. From what I saw of Jim's localization, my one question would concern scale. For the purpose of developing search coordinates it doesn't matter, but to be the most technically correct, in my opinion Jim should force the localization to use the most appropriate geodetic scale factor rather than allowing the software to determine the scale factor from the localization. The localization scale factor is determined by comparing design and surveyed points, and it should be very close to the actual geodetic scale factor. In this example, Jim likely knows, or can initially assume, that the design data come from an on-the-ground survey, in which case the scale factor should be the reciprocal of the combined factor (the ratio of ground distance to grid distance). He can use the scale factor determined by the localization to help verify that assumption, but ultimately, before calling the localization finished, he should decide how his design data linear units relate to his survey linear units and force the scale factor to that value.
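The scale check can be sketched as follows. The combined factor is the grid scale factor times the elevation factor, and the elevation factor is approximated here as R / (R + h), with h the ellipsoid height and R an assumed mean Earth radius of 6,371,000 m; the grid scale factor, height, and "solved" localization scale below are hypothetical. Whether a given program expects 1/CF or CF itself depends on the direction in which it defines the localization scale, so check the software's convention before entering a value.

```python
# Approximate mean Earth radius in metres (assumption; agencies and software
# may use a different radius or a latitude-dependent one).
MEAN_EARTH_RADIUS_M = 6_371_000.0

def elevation_factor(ellipsoid_height_m: float) -> float:
    """Ratio of ellipsoid distance to ground distance at the given height."""
    return MEAN_EARTH_RADIUS_M / (MEAN_EARTH_RADIUS_M + ellipsoid_height_m)

def combined_factor(grid_scale_factor: float, ellipsoid_height_m: float) -> float:
    """Grid distance / ground distance."""
    return grid_scale_factor * elevation_factor(ellipsoid_height_m)

def ppm_difference(value: float, reference: float) -> float:
    """How far value departs from reference, in parts per million."""
    return (value / reference - 1.0) * 1e6

# Hypothetical project values
cf = combined_factor(0.99996, 250.0)
ground_per_grid = 1.0 / cf      # reciprocal of the combined factor
solved_scale = 1.000081         # hypothetical scale from an unconstrained localization
print(f"combined factor {cf:.9f}, 1/CF {ground_per_grid:.9f}")
print(f"solved scale differs from 1/CF by {ppm_difference(solved_scale, ground_per_grid):.1f} ppm")
```

A difference of a few ppm supports the ground-survey assumption; a large one suggests the design units or basis of the design data need another look before the scale is forced.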