I think you are mistaking people arbitraging the system for some sort of systemic failure. The system was built to track a binary condition: "was a payment collected?" CAN employees figured out that so long as the condition was true, they would show fewer delinquencies. It doesn't sound like a payment schedule was created at origination, with payments reconciled against it on an ongoing basis. Now, there's an argument that the system should have solved for that, say by tracking expected payments against actual payments received, which is a fine bit of armchair quarterbacking. It might have been thousands of payments, but the number of impacted loans is what actually matters.
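To make the difference concrete, here is a rough sketch of the two checks. Everything in it is an assumption for illustration (the `Period` type, the field names, the tolerance, the sample amounts); nothing here describes CAN's actual system.

```python
from dataclasses import dataclass


@dataclass
class Period:
    expected: float  # amount due under the schedule (hypothetical field)
    received: float  # amount actually collected (hypothetical field)


def binary_delinquent(periods):
    """Binary check: a period is delinquent only if nothing was collected."""
    return [p.received <= 0 for p in periods]


def reconciled_delinquent(periods, tolerance=0.01):
    """Reconciled check: delinquent once cumulative received trails cumulative expected."""
    flags = []
    due = paid = 0.0
    for p in periods:
        due += p.expected
        paid += p.received
        flags.append(due - paid > tolerance)
    return flags


# Made-up loan: one full payment, then two token payments.
periods = [Period(100.0, 100.0), Period(100.0, 1.0), Period(100.0, 1.0)]
print(binary_delinquent(periods))
print(reconciled_delinquent(periods))
```

A token payment keeps the binary check happy in every period, while the reconciled check flags the loan as soon as it falls behind the schedule.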

Let's say CAN has made 500,000 loans. Now, let's assume that 10% were fudged. That leaves 450,000 totally valid and valuable observations over 19 years. Even if the number of impacted loans is 20% of their book, that still leaves 400,000 valid observations, which is probably more than anyone else in the industry has. Any negative impact on the data can be mitigated through effective ETL, for example by flagging or excluding the affected loans, before the data is used for analytics or modeling.
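The back-of-the-envelope math, plus the kind of ETL filter I have in mind, looks something like this. The 500,000 total and the 10% and 20% shares are the hypotheticals from above, not real figures, and the `fudged` flag is an assumed field that would have to come from some upstream check.

```python
def valid_observations(total_loans, impacted_pct):
    """Loans left after removing the impacted share (integer percent)."""
    impacted = total_loans * impacted_pct // 100
    return total_loans - impacted


def drop_fudged(loans):
    """ETL-style filter: keep only loans not flagged as fudged (hypothetical flag)."""
    return [loan for loan in loans if not loan["fudged"]]


TOTAL_LOANS = 500_000  # hypothetical book size from the paragraph above
for pct in (10, 20):
    print(f"{pct}% impacted leaves {valid_observations(TOTAL_LOANS, pct):,} loans")

# Tiny made-up sample to show the filter.
sample = [
    {"loan_id": 1, "fudged": False},
    {"loan_id": 2, "fudged": True},
    {"loan_id": 3, "fudged": False},
]
clean = drop_fudged(sample)
print([loan["loan_id"] for loan in clean])
```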