Comparing large data sources with tolerance?

Hello,

We are trying to write unit tests that compare large data source values (for example, the vertices of a mesh) against an expected reference value.

However, we’re running into issues when comparing values that differ slightly, mainly because of floating-point discrepancies across platforms/architectures. Since the values can differ slightly, we need to compare them within a certain tolerance, which means we can’t rely on VtValue’s operator==.
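To make the kind of comparison we need concrete, here is a minimal sketch in plain C++. The names IsClose and AllClose and the default tolerances are hypothetical, not part of USD; the sketch only illustrates a combined absolute/relative tolerance check on flat float arrays.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a USD API): true if a and b are equal within
// an absolute tolerance or a relative tolerance, whichever is looser.
inline bool IsClose(double a, double b,
                    double absTol = 1e-9, double relTol = 1e-6)
{
    if (a == b) {
        return true;  // exact match, including same-signed infinities
    }
    const double diff = std::fabs(a - b);
    if (!std::isfinite(diff)) {
        return false;  // NaN involved, or a finite/infinite mismatch
    }
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(absTol, relTol * scale);
}

// Hypothetical helper: element-wise tolerant comparison of two arrays.
inline bool AllClose(const std::vector<float>& actual,
                     const std::vector<float>& expected,
                     double absTol = 1e-9, double relTol = 1e-6)
{
    if (actual.size() != expected.size()) {
        return false;
    }
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (!IsClose(actual[i], expected[i], absTol, relTol)) {
            return false;
        }
    }
    return true;
}
```

The absolute term matters for values near zero (such as the 1e-16 residues mentioned below), where a purely relative check would report a large mismatch.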

We’ve tried using HdDebugPrintDataSource to print data sources to text files, and then comparing the text dumps obtained during test execution against a reference dump produced earlier. However, floats are printed at full precision, so the printed values can differ textually even when the values themselves are nearly the same (e.g. 2.7755576e-16 vs 2.220446e-16). Patching these up after the fact, e.g. with regexes, is much too slow on large data sources to be viable.

We’ve looked for a way to control how the DataSource/VtValue/floats are printed, for example by specifying how much precision to use when printing. However, it seems floats are currently converted to char* on their own, using Google/V8’s double-to-string conversion utility, which prevents us from using things such as std::setprecision. We also cannot configure how the converter is used, as it is both const and local to one cpp file. Currently the ToShortestSingle and ToShortest methods are used, and we cannot change which method is called (e.g. ToFixed would allow us to specify how many decimals we want).
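For illustration, this is the kind of fixed-precision output we would like to be able to request. It is a sketch using only standard iostreams; FormatFixed is a hypothetical helper, and nothing here goes through HdDebugPrintDataSource or the USD stream operators.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a value with a fixed number of decimals,
// so that tiny platform-dependent differences collapse to the same text.
inline std::string FormatFixed(double value, int decimals)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << value;
    return out.str();
}
```

With 6 decimals, both 2.7755576e-16 and 2.220446e-16 come out as "0.000000". Two caveats with this approach in general: values that straddle a rounding boundary can still print differently, and a tiny negative value prints as "-0.000000", so a text comparison at fixed precision is not a full substitute for a numeric tolerance check.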

Another of our projects using USD currently has some utilities that implement custom VtValue comparison methods accepting a tolerance threshold; the method used to compare two VtValues is looked up based on the type info of the values. This mostly solves the tolerance problem. However, it would also require us to author the reference values in code, which would be quite cumbersome for large data sources. Something that would help here is DataSource serialization: is this something that has been thought about? It would allow us to set up the expected result and immediately dump out a reference value, rather than manually extracting it into code. I would imagine it could be useful for other purposes as well.
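As a rough picture of that type-based lookup, here is a self-contained sketch. It is not the code from our other project: std::any stands in for VtValue so that the example compiles without USD (C++17), and ToleranceRegistry and MakeDefaultRegistry are hypothetical names.

```cpp
#include <any>
#include <cmath>
#include <cstddef>
#include <functional>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

// Hypothetical registry mapping a held type to a tolerant comparator.
class ToleranceRegistry {
public:
    template <typename T>
    void Register(std::function<bool(const T&, const T&, double)> cmp)
    {
        // Wrap the typed comparator so it can be stored type-erased.
        _table[std::type_index(typeid(T))] =
            [cmp](const std::any& a, const std::any& b, double tol) {
                return cmp(std::any_cast<const T&>(a),
                           std::any_cast<const T&>(b), tol);
            };
    }

    // Returns false on a type mismatch or when no comparator is registered.
    bool Compare(const std::any& a, const std::any& b, double tol) const
    {
        if (a.type() != b.type()) {
            return false;
        }
        const auto it = _table.find(std::type_index(a.type()));
        return it != _table.end() && it->second(a, b, tol);
    }

private:
    using Erased =
        std::function<bool(const std::any&, const std::any&, double)>;
    std::unordered_map<std::type_index, Erased> _table;
};

// Hypothetical setup: comparators for float and for arrays of float,
// using a plain absolute tolerance.
inline ToleranceRegistry MakeDefaultRegistry()
{
    ToleranceRegistry reg;
    reg.Register<float>([](const float& a, const float& b, double tol) {
        return std::fabs(double(a) - double(b)) <= tol;
    });
    reg.Register<std::vector<float>>(
        [](const std::vector<float>& a, const std::vector<float>& b,
           double tol) {
            if (a.size() != b.size()) {
                return false;
            }
            for (std::size_t i = 0; i < a.size(); ++i) {
                if (std::fabs(double(a[i]) - double(b[i])) > tol) {
                    return false;
                }
            }
            return true;
        });
    return reg;
}
```

The dispatch itself is straightforward; the part this does not address is where the expected values come from, which is why serialization would be the missing piece for us.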

If you have guidance on what approach is best, or if there is something else planned that could address this, please let us know.

Thanks!

Philippe