How to make a Unicode identifier valid?

Hi experts,
USD 24.03 supports Unicode identifiers, but the C++ API function TfMakeValidIdentifier still keeps the old ASCII-only behavior.
Is there a way to make a Unicode identifier valid?
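For example, my understanding is that the current implementation still works byte by byte on ASCII rules, so each byte of a multi-byte UTF-8 character gets replaced separately (the exact output below is an assumption on my part):

#include <pxr/base/tf/stringUtils.h>
#include <iostream>

int main()
{
    // 'é' encodes to two UTF-8 bytes, and each byte fails the ASCII
    // identifier test, so "café" presumably comes out as "caf__".
    std::cout << pxr::TfMakeValidIdentifier("café") << "\n";
}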

Thanks.


See Bi-Directional Transcoding of Invalid Identifiers by miguelh-nvidia · Pull Request #37 · PixarAnimationStudios/OpenUSD-proposals · GitHub for a proposal on this.


If you want a functional equivalent of TfMakeValidIdentifier, you should be able to use the Unicode utilities provided in tf to convert the string to code points, replace the non-identifier code points with _, and then convert the code points back into a string.

We didn't update TfMakeValidIdentifier / TfIsValidIdentifier since they have potential uses outside of SdfPath validation. We were also hesitant about making tf dependent on what sdf considers to be a valid identifier.
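For completeness, the check side can be sketched the same way. This is a hypothetical helper built from those tf utilities (the name is mine, not an official API), mirroring the first-code-point / continuation rule:

#include <pxr/base/tf/unicodeUtils.h>
#include <string>

// Unicode-aware analog of TfIsValidIdentifier: the first code point must
// be XID_Start or '_', every following one must be XID_Continue.
bool is_valid_unicode_identifier(const std::string &name)
{
    if (name.empty()) {
        return false;
    }
    bool first = true;
    for (const auto cp : pxr::TfUtf8CodePointView{name}) {
        const bool ok = first ?
            (cp == pxr::TfUtf8CodePointFromAscii('_') ||
             pxr::TfIsUtf8CodePointXidStart(cp)) :
            pxr::TfIsUtf8CodePointXidContinue(cp);
        if (!ok) {
            return false;
        }
        first = false;
    }
    return true;
}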


It seems I've landed on this question today as well. Is the following what we're required to implement in order to mimic TfMakeValidIdentifier?

std::string make_valid_name(const std::string &name) const
{
  // OLD
  // return pxr::TfMakeValidIdentifier(name);

  // NEW (requires pxr/base/tf/unicodeUtils.h and <sstream>)
  if (name.empty()) {
    return "_";
  }

  const pxr::TfUtf8CodePoint cp_underscore = pxr::TfUtf8CodePointFromAscii('_');

  bool first = true;
  std::stringstream str;
  for (auto cp : pxr::TfUtf8CodePointView{name}) {
    // The first code point must be XID_Start or '_'; the rest must be
    // XID_Continue.
    const bool cp_allowed = first ? (cp == cp_underscore || pxr::TfIsUtf8CodePointXidStart(cp)) :
                                    pxr::TfIsUtf8CodePointXidContinue(cp);
    if (!cp_allowed) {
      // Replace each disallowed code point with a single underscore.
      str << '_';
    }
    else {
      str << cp;
    }

    first = false;
  }

  return str.str();
}

I think that would work @deadpin.
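For anyone landing here later, a quick sanity check of what that should produce (my expectation, not verified output): make_valid_name("123-añjali") should yield "_23_añjali". The leading digit is replaced because only the first code point is tested against XID_Start, the hyphen is replaced because it isn't XID_Continue, and the non-ASCII letters pass through untouched.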
