Saving Space by Using JSON Inside Serialized PHP

Sometimes we have a chunk of arbitrary data—a string, a float or something more complex, such as an array of arrays—that we need to persist until some later point.

A nice real-world example of a mechanism that facilitates this, though certainly not the only one, is the WordPress Options API. Using this, I mostly don’t need to worry about the shape or size of my data; I just pass it across and, through some magic, will be able to retrieve it sometime in the future.

Does working this way (in the sense of storing ad hoc, loosely structured pieces of data) reflect good design? Sometimes it doesn't, but we'll park that discussion: it's also true that we may find ourselves building on top of systems we don't have complete control over, in which case making things work becomes the order of the day.

Under the hood, when using the Options API, any array or object I pass is serialized using native PHP serialization and stored in a LONGTEXT column. In MySQL that column type can hold up to 4GB, but in practice other limits, such as the server's max_allowed_packet setting, tend to bite long before that. That's fine, right up until it isn't. What if I have an array containing several gazillion rows and there just isn't enough room, or what if I'm working with something other than WP Options and the available space is a lot more restricted?

Well, we can do lots of things: we might chunk the data, or we might decide we really do have to use (and create, if it doesn’t already exist) a new storage solution, be that a bespoke table or something else. Sometimes, though, those things are luxuries that are not available to us in the immediate term, and in those cases we might be able to make some economies by getting smart about the way we store data.

Consider the following—basically, an array of arrays, with each inner array describing a different employee:

$data = [
    [
        'employee_id' => 1234,
        'name' => 'Archibald Spigly',
        'dob' => '1970-05-06',
        'scale' => 'L4',
    ],
    [
        'employee_id' => 2345,
        'name' => 'Nancy Bellweather',
        'dob' => '1987-11-20',
        'scale' => 'L5',
    ],
    [
        'employee_id' => 3456,
        'name' => 'Lallen Spade',
        'dob' => '1984-08-30',
        'scale' => 'L3',
    ],
];

Serializing this with serialize() gives us a string that is 363 characters long. If instead we pass it through json_encode(), though, we end up with a string that is just 235 characters long. That’s a saving of about 35%.
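The measurement is easy to reproduce. This is a minimal sketch; the variable names are just for illustration, and the figures assume the $data array exactly as shown above (strlen() counts bytes, which equals characters here because the data is plain ASCII).

```php
<?php
// The employee records from above, keys included.
$data = [
    ['employee_id' => 1234, 'name' => 'Archibald Spigly', 'dob' => '1970-05-06', 'scale' => 'L4'],
    ['employee_id' => 2345, 'name' => 'Nancy Bellweather', 'dob' => '1987-11-20', 'scale' => 'L5'],
    ['employee_id' => 3456, 'name' => 'Lallen Spade', 'dob' => '1984-08-30', 'scale' => 'L3'],
];

$serializedLength = strlen(serialize($data)); // 363
$jsonLength = strlen(json_encode($data));     // 235

// Prints: serialize(): 363, json_encode(): 235, saving: 35.3%
printf(
    "serialize(): %d, json_encode(): %d, saving: %.1f%%\n",
    $serializedLength,
    $jsonLength,
    100 * (1 - $jsonLength / $serializedLength)
);
```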

This happens because the serialized PHP format is far more verbose than JSON: it records a type marker for every value, the number of elements in each array and the length of each string. JSON does none of this, so it is more compact.
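To see where the extra characters come from, it helps to look at a single key/value pair in both formats. In the serialized form, `a:1:` announces an array of one element, and `s:4:` and `s:16:` give the type and length of each string; JSON simply quotes the strings.

```php
<?php
// A single key/value pair, to make the per-element overhead visible.
$record = ['name' => 'Archibald Spigly'];

$serialized = serialize($record);
$json = json_encode($record);

echo $serialized, "\n"; // a:1:{s:4:"name";s:16:"Archibald Spigly";}
echo $json, "\n";       // {"name":"Archibald Spigly"}
```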

In my example we can also see that each inner array shares the same structure: the same elements in the same order. Potentially, then, we could run each of them through array_values(), which would strip the keys (as long as we remember the key order, we can restore them later) and give us a structure like this:

$data = [
    [
        1234,
        'Archibald Spigly',
        '1970-05-06',
        'L4',
    ],
    [
        2345,
        'Nancy Bellweather',
        '1987-11-20',
        'L5',
    ],
    [
        3456,
        'Lallen Spade',
        '1984-08-30',
        'L3',
    ],
];

If we repeat the same experiment, native PHP serialization now gives us a string that is just 255 characters long, almost but not quite as compact as the JSON we generated last time round. Turning this newly simplified structure into JSON instead gives us a string of a mere 130 characters. Nice.
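Stripping the keys is a one-liner with array_map(). One detail worth calling out in this sketch: array_values() has to be applied to each inner array. Applied to $data itself, it would only re-index the outer array, which is already a list.

```php
<?php
$data = [
    ['employee_id' => 1234, 'name' => 'Archibald Spigly', 'dob' => '1970-05-06', 'scale' => 'L4'],
    ['employee_id' => 2345, 'name' => 'Nancy Bellweather', 'dob' => '1987-11-20', 'scale' => 'L5'],
    ['employee_id' => 3456, 'name' => 'Lallen Spade', 'dob' => '1984-08-30', 'scale' => 'L3'],
];

// Strip the keys from every row, keeping the values in their original order.
$rows = array_map('array_values', $data);

$serializedLength = strlen(serialize($rows)); // 255
$jsonLength = strlen(json_encode($rows));     // 130

echo json_encode($rows[0]), "\n"; // [1234,"Archibald Spigly","1970-05-06","L4"]
```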

What's interesting is that the ‘overhead’ of native PHP serialization grows with the complexity of the structure, because every array, key and value carries its own type marker and, for strings, a length. Something simple such as a single string (and that's what we have, after JSON-ifying things) only gets a short, fixed wrapper, so it doesn't inflate after serialization nearly as much as an array of arrays.
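The wrapper PHP puts around a string is easy to measure: `s:`, the length, `:"`, the string itself and `";`. The helper below (its name is just for illustration) shows that this overhead is 6 characters plus the number of digits in the length, regardless of how much structure the string encodes.

```php
<?php
// Hypothetical helper: how many characters serialize() adds to a string.
function serializedStringOverhead(string $s): int
{
    return strlen(serialize($s)) - strlen($s);
}

echo serialize('abc'), "\n"; // s:3:"abc";

$short = str_repeat('x', 130);
$long = str_repeat('x', 100000);

echo serializedStringOverhead($short), "\n"; // 9  (6 + three digits)
echo serializedStringOverhead($long), "\n";  // 12 (6 + six digits)
```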

Given that, if I were to take our nice, short 130-character JSON string and serialize it with PHP (as a storage layer that serializes everything it is given would do), I would end up with a new string that is still just 139 characters long: still nice and compact, and roughly a 60% saving compared with the very first serialized representation we looked at (when the array keys were still present). WordPress's Options API itself generally leaves plain strings unserialized, in which case only the 130 characters are stored.
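Putting the pieces together, here is a sketch of a full round trip. The packRows() and unpackRows() helpers are hypothetical, not part of WordPress or PHP; they assume every row has the same keys in the same order, and they need PHP 7.4 or later (arrow functions, JSON_THROW_ON_ERROR). The key order is kept alongside the data so the original structure can be rebuilt with array_combine().

```php
<?php
// Hypothetical helper: drop the keys and JSON-encode the remaining values.
function packRows(array $rows): string
{
    return json_encode(array_map('array_values', $rows), JSON_THROW_ON_ERROR);
}

// Hypothetical helper: decode the JSON and re-attach the remembered keys.
function unpackRows(string $json, array $keys): array
{
    $rows = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    return array_map(fn (array $row): array => array_combine($keys, $row), $rows);
}

$data = [
    ['employee_id' => 1234, 'name' => 'Archibald Spigly', 'dob' => '1970-05-06', 'scale' => 'L4'],
    ['employee_id' => 2345, 'name' => 'Nancy Bellweather', 'dob' => '1987-11-20', 'scale' => 'L5'],
    ['employee_id' => 3456, 'name' => 'Lallen Spade', 'dob' => '1984-08-30', 'scale' => 'L3'],
];

$keys = array_keys($data[0]);   // ['employee_id', 'name', 'dob', 'scale']
$packed = packRows($data);      // 130 characters
$stored = serialize($packed);   // 139 characters, as a serializing store would save it

$restored = unpackRows(unserialize($stored), $keys);
var_dump($restored === $data);  // bool(true)
```

The price of the saving is that the key order now lives in code (or in a separate, much smaller stored value) rather than in the data itself.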

Summing up:

  • JSON-encoding a structure (that is ultimately going to be serialized by PHP) can be more space-efficient than not JSON-encoding it.
  • Stripping the array keys can introduce further efficiencies, and we can potentially combine this with the above technique.
  • As with anything YMMV…it’s all about trade-offs and sometimes this just won’t be viable or the data you are storing may not be a good candidate for this strategy.
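One trade-off worth knowing about before switching: JSON cannot represent everything serialize() can. This short sketch covers two cases, a string that isn't valid UTF-8 and an object whose class matters (the Employee class is hypothetical, used only for illustration).

```php
<?php
// Hypothetical class standing in for any object whose type you care about.
final class Employee
{
    public $name = 'Archibald Spigly';
}

// serialize() stores raw bytes, so any string survives a round trip.
$binary = "\xB1\x31"; // not valid UTF-8
$roundTripped = unserialize(serialize($binary));
var_dump($roundTripped === $binary); // bool(true)

// json_encode() requires UTF-8 and, by default, signals failure by returning false.
var_dump(json_encode($binary)); // bool(false)
echo json_last_error_msg(), "\n";

// Objects keep their class through serialize(), but come back from JSON as plain arrays.
$viaSerialize = unserialize(serialize(new Employee()));
$viaJson = json_decode(json_encode(new Employee()), true);
var_dump($viaSerialize instanceof Employee); // bool(true)
var_dump(is_array($viaJson));                // bool(true)
```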


