MATLAB: What are the internal differences between MATLAB strings and character arrays?

MATLAB strings were introduced in R2016b, presumably to make working with text more like other languages than character arrays allow.
Although the documentation covers most of the details people will want to know about strings, I'm unclear how a string and a character array differ internally, other than having different methods.
Presumably strings are still encoded as UTF-16. I haven't tried it, but I wouldn't expect MEX files to support strings.
Also, in this post, https://blogs.mathworks.com/loren/2016/09/15/introducing-string-arrays/, Loren mentions that string arrays are "more efficient" for storing text data. Why? (This may be a question about string arrays rather than about strings vs. chars specifically.)

Best Answer

When storing multiple pieces of text, a cell array of character vectors requires 112 bytes of overhead per item, because that is the overhead of each non-empty cell array entry: a cell array does not know ahead of time that every entry will have the same type, so it has to store the type and full size information for each one.
String arrays, on the other hand, need an overall size and a single type that applies to the entire array; after that, they need only a length (not full array dimensions) and a data pointer per entry.
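A rough Python model of the two layouts may make the contrast concrete. All class and function names here (`CellEntry`, `StringArray`, `cell_header_overhead`, etc.) are hypothetical, and the fields reflect only what the answer says each container must record, not MATLAB's actual internal structures. The only number taken from the answer is the 112-byte per-entry overhead.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Conceptual model only: fields show what each container must record
# according to the answer, not MATLAB's real internal structures.

CELL_ENTRY_OVERHEAD = 112  # bytes per non-empty cell entry, as quoted in the answer


@dataclass
class CellEntry:
    # Each cell entry carries its own type and full dimensions, because a
    # cell array cannot assume that all entries share one type.
    class_name: str
    dims: Tuple[int, ...]
    data: str


@dataclass
class CellArrayOfChars:
    dims: Tuple[int, ...]
    entries: List[CellEntry] = field(default_factory=list)


@dataclass
class StringEntry:
    # Per entry: only a length plus the character data (standing in for
    # the data pointer); no per-entry type or dimensions.
    length: int
    data: str


@dataclass
class StringArray:
    # One overall size and one type for the whole array.
    dims: Tuple[int, ...]
    entries: List[StringEntry] = field(default_factory=list)
    class_name: str = "string"


def to_cell(texts: List[str]) -> CellArrayOfChars:
    entries = [CellEntry("char", (1, len(t)), t) for t in texts]
    return CellArrayOfChars((1, len(texts)), entries)


def to_string_array(texts: List[str]) -> StringArray:
    entries = [StringEntry(len(t), t) for t in texts]
    return StringArray((1, len(texts)), entries)


def cell_header_overhead(cells: CellArrayOfChars) -> int:
    # Every entry here holds a char vector, so each one counts as a
    # non-empty cell entry and pays the full per-entry header.
    return CELL_ENTRY_OVERHEAD * len(cells.entries)
```

Under this model, 1,000 short items stored in a cell array cost 112,000 bytes in per-entry headers alone, whereas the string array pays for one header plus a length and pointer per entry.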
The memory size of a single string also changes in steps that suggest some internal chunking:
  • strings of length 0 through 10 take 132 bytes
  • strings of length 11 through 15 take 142 bytes
  • longer strings take an additional 16 bytes for each further block of up to 8 characters
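The size steps listed above can be written as a small function. This is a sketch that only reproduces the figures quoted in the answer; the function name `reported_string_bytes` is hypothetical, and it assumes "an additional 16 bytes for each 8 characters or fewer" means 16 bytes per started 8-character block beyond 15 characters.

```python
import math


def reported_string_bytes(n_chars: int) -> int:
    """Model of the per-string sizes quoted in the answer.

    Assumption: beyond 15 characters, every started block of 8 characters
    adds 16 bytes (one reading of "16 bytes for each 8 characters or fewer").
    """
    if n_chars < 0:
        raise ValueError("length must be non-negative")
    if n_chars <= 10:
        return 132
    if n_chars <= 15:
        return 142
    extra_blocks = math.ceil((n_chars - 15) / 8)
    return 142 + 16 * extra_blocks


if __name__ == "__main__":
    for n in (0, 10, 11, 15, 16, 23, 24):
        print(n, reported_string_bytes(n))
```

Under this reading, strings of 16 to 23 characters would take 158 bytes and 24 to 31 would take 174. A step of 16 bytes per 8 characters would be consistent with 2 bytes per character.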
For unshared strings (ones whose data is not shared with another variable), this would allow a few characters to be appended without reallocating, which could help performance.
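To illustrate why chunked allocation would let short appends avoid reallocation, here is a hypothetical Python buffer (`ChunkedText` is an invented name, not anything in MATLAB) that rounds its capacity up to 8-character blocks, assuming the chunking works the way the sizes above suggest. It ignores sharing entirely: presumably a shared string would have to be copied before modification anyway, which is why the benefit applies only to unshared strings.

```python
class ChunkedText:
    """Hypothetical growable text buffer with capacity in 8-character blocks.

    Illustrates why chunked allocation lets short appends avoid
    reallocation; it is not MATLAB's implementation.
    """

    BLOCK = 8  # characters per allocation block (assumed, mirroring the 8-character step)

    def __init__(self, text: str = ""):
        self._chars = list(text)
        self.capacity = self._round_up(len(text))
        self.reallocations = 0

    @classmethod
    def _round_up(cls, n: int) -> int:
        # Smallest multiple of BLOCK that holds n characters (at least one block).
        return max(cls.BLOCK, -(-n // cls.BLOCK) * cls.BLOCK)

    def append(self, more: str) -> None:
        needed = len(self._chars) + len(more)
        if needed > self.capacity:
            # No spare room left: "reallocate" to a larger, block-rounded capacity.
            self.capacity = self._round_up(needed)
            self.reallocations += 1
        self._chars.extend(more)

    def __str__(self) -> str:
        return "".join(self._chars)
```

For example, `ChunkedText("hello")` starts with a capacity of 8, so appending `"abc"` fits without reallocating, while a further `"d"` forces one reallocation to a capacity of 16.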