The case against the ifsetor function
Recently igorw wrote a blog post on how to traverse nested array structures with potentially
non-existing keys without throwing notices. The current “idiomatic” way to do this is to use isset() together with a ternary operator:
$age = (isset($data['people'][0]['age'])) ? $data['people'][0]['age'] : null;
The suggested alternative is a get_in function, which is used as follows:
$age = get_in($data, ['people', 0, 'age'], $someDefault);
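For readers who have not seen the original post, a function with this signature could look roughly like the following. This is a minimal sketch written for this article, not igorw's actual implementation; the sample $data array is made up for illustration.

```php
<?php
// Sketch only: not igorw's actual implementation.
function get_in(array $array, array $keys, $default = null) {
    $current = $array;
    foreach ($keys as $key) {
        // Stop as soon as a level is missing or is not an array.
        if (!is_array($current) || !array_key_exists($key, $current)) {
            return $default;
        }
        $current = $current[$key];
    }
    return $current;
}

// Hypothetical sample data.
$data = ['people' => [['name' => 'Ann', 'age' => 30]]];

var_dump(get_in($data, ['people', 0, 'age'], 99));   // int(30)
var_dump(get_in($data, ['people', 1, 'age'], 99));   // int(99)
var_dump(get_in($data, ['people', 0, 'email'], '')); // string(0) ""
```

Because the array is passed by value and every level is checked with array_key_exists, nothing is written to $data during the lookup.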
Someone on /r/PHP pointed out that there is an alternative approach to this problem, namely the use of an ifsetor function:
function ifsetor(&$value, $default = null) {
    return isset($value) ? $value : $default;
}
The use of this function is very elegant:
$age = ifsetor($data['people'][0]['age'], $someDefault);
Note that this function will not throw a notice if $data['people'][0]['age'] (or any index in between) does not exist, because $value is passed by reference.
By-reference argument passing
This seems to come as a surprise to most people, because the code definitely looks like it accesses an undefined index and ought to throw a notice. Here the magic of references comes in: if you perform a by-reference argument pass (or assignment), PHP will use a different fetch type for retrieving the array offsets. In this particular case it would issue a number of “dim w” (dimension write) fetches.
A write-fetch obviously doesn’t throw a notice if the assigned index doesn’t exist yet (otherwise you wouldn’t be able to create indexes without throwing notices). What’s interesting is that the whole thing also works recursively, so none of the indices in the chain have to exist:
$array[0][1][2] = 'foobar';
The above example will not throw a notice if $array[0][1] doesn’t exist, it won’t throw a notice if $array[0] doesn’t exist, and it won’t even throw a notice if the $array variable itself doesn’t exist.
PHP implements write-fetches by creating the respective offset and initializing it to null (if it doesn’t yet exist).
This is compatible with nested index assignments because PHP allows silent casts from null (and other falsy values) to arrays. So the following will run without notices:
$null = null;
$null[42] = 'foobar';
var_dump($null); // [42 => 'foobar']
So what the $array[0][1][2] = 'foobar' assignment really does is something along these lines (obviously heavily simplified):
$array = null;
$array[0] = null; // implicitly converts $array to array
$array[0][1] = null; // implicitly converts $array[0] to array
$array[0][1][2] = 'foobar'; // implicitly converts $array[0][1] to array
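The same behavior can be checked directly. The snippet below is a small sketch (the variable names $matrix and $config are made up for illustration) showing that both a nested assignment and a plain reference assignment create the whole offset chain:

```php
<?php
// $matrix is deliberately undefined here; the nested assignment creates every level.
$matrix[0][1][2] = 'foobar';
var_dump($matrix === [0 => [1 => [2 => 'foobar']]]); // bool(true)

// Taking a reference uses a write-fetch too, but the innermost slot is left as null.
$ref =& $config['db']['host'];
var_dump($config === ['db' => ['host' => null]]); // bool(true)
```

The second case is the one that matters for ifsetor: nothing is ever assigned, yet the array has been modified.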
Issues with the ifsetor function
Creation of dummy values
Getting back to ifsetor, the exact same thing occurs there as well. Before the value is passed to ifsetor, the write-fetches will already have created the offset chain and initialized it to null. This means that the function call will leave behind a null value in the index that was passed to it (unless it already existed previously):
$age = ifsetor($data['people'][0]['age'], $someDefault);
var_dump($data['people'][0]['age']);
// NULL, without notice (assuming the index didn't exist beforehand)
Depending on the context where this occurs, the additional null value might not be of any consequence. On the other hand, it could just as well lead to data corruption, e.g. it could result in an age of 0 being persisted, or something like that.
This is a somewhat subtle issue and quite easy to overlook if you’re not familiar with the details of by-reference passing.
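To make the side effect visible, here is a small sketch that serializes the array after the call. The starting $data value and the default of 18 are made up for illustration:

```php
<?php
function ifsetor(&$value, $default = null) {
    return isset($value) ? $value : $default;
}

// Hypothetical input: a people list that is present but empty.
$data = ['people' => []];
$age = ifsetor($data['people'][0]['age'], 18);

var_dump($age);                // int(18)
echo json_encode($data), "\n"; // {"people":[{"age":null}]}
```

The lookup returned the default as expected, but $data now contains a person record that nobody ever added.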
No notices for nested indices
Another issue is that ifsetor doesn’t only suppress the notice on the outermost index, but also on all previous indices and even the array variable itself. As an example, consider the following, slightly changed code:
$age = ifsetor($date['people'][0]['age'], $someDefault);
This line contains a typo, namely it incorrectly uses $date instead of $data. Normally PHP would immediately tell you about such a typo, but here all notices are suppressed.
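A short sketch of what the typo actually does at runtime (the sample $data and the default of 18 are made up for illustration):

```php
<?php
function ifsetor(&$value, $default = null) {
    return isset($value) ? $value : $default;
}

$data = ['people' => [['age' => 30]]];

// Typo: $date instead of $data. No notice is raised.
$age = ifsetor($date['people'][0]['age'], 18);

var_dump($age);         // int(18), although the real age is 30
var_dump(isset($date)); // bool(true): the misspelled variable now exists
```

So the typo is not only silent, it also quietly creates a new variable holding the dummy offset chain.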
This behavior may or may not be what you want. Obviously you’d want a notice for a misspelled $data, but it depends on the context whether a non-existent $data['people'][0] should throw a notice or not. Igor’s get_in function lets you control which behavior you want:
$age = get_in($data, ['people', 0, 'age'], $someDefault); // notice on missing $data
$age = get_in($data['people'], [0, 'age'], $someDefault); // notice on missing $data['people'] as well
$age = get_in($data['people'][0], ['age'], $someDefault); // notice on missing $data['people'][0] as well
From my experience you usually only want to suppress the notice on the outermost index. The $data['people'][0]['age'] example I’ve been using throughout the post is somewhat artificial; real code would likely not involve an explicit [0] access and would use a loop instead:
$people = ifsetor($data['people'], []);
foreach ($people as $person) {
    $age = ifsetor($person['age']);
    // do something with $age
}
See how in both cases something of type $knownExisting['potentiallyNotExisting'] is fetched? I think that most uses of the isset($x) ? $x : $d pattern are of this form. We’ll get back to that at the end of the post.
Null values treated as non-existing
Recall the definition of the ifsetor function:
function ifsetor(&$value, $default = null) {
    return isset($value) ? $value : $default;
}
Another more explicit, but otherwise strictly identical, way of writing the function is this (as isset on an existing variable is just a null check):
function ifsetor(&$value, $default = null) {
    return null !== $value ? $value : $default;
}
Note that this function is not capable of distinguishing between a null value that was actually stored at the index and a null value that was implicitly created by the write-fetch. To ifsetor, a non-existing offset and an offset containing null are the same.
If $default is not null this might result in unexpected behavior:
$array = [null];
$result = ifsetor($array[0], 42);
// $result is now 42, not null, even though $array[0] existed
Once again, whether or not this is an issue heavily depends on context. In many cases it’s okay to treat null as non-existent. In other cases it isn’t.
The usual way to avoid this is to use array_key_exists instead of isset (e.g. the get_in function does this), but there is no way to implement ifsetor this way.
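The difference between the two checks, and the reason array_key_exists cannot rescue a by-reference function, can be sketched like this. The helper name touch_ref is made up for illustration:

```php
<?php
$array = [null];

var_dump(isset($array[0]));            // bool(false): null counts as "not set"
var_dump(array_key_exists(0, $array)); // bool(true): the key is there
var_dump(array_key_exists(1, $array)); // bool(false)

// Hypothetical helper that does nothing except accept its argument by reference.
function touch_ref(&$value) {}

$other = [];
touch_ref($other['missing']);

// The write-fetch has already created the key before the function body ran.
var_dump(array_key_exists('missing', $other)); // bool(true)
```

Inside the function there is only $value, not the array and key it came from, and by then the key exists either way.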
Default is always evaluated
This is an issue that was already mentioned in the comment on reddit: The default value passed to ifsetor is always evaluated, even if it is not needed:
$value = ifsetor($array['index'], new Foo);
Here the new Foo object is always created, even if $array['index'] exists. In this case it would just cost one additional object allocation, but it could also be a more expensive operation (with potential side-effects):
$value = ifsetor($array['index'], calculateValue());
This issue exists with both ifsetor and get_in and can’t really be solved in any convenient way. You need to either write the conditional explicitly or use a (rather verbose) lambda function.
Personally I think that this is actually the least of the issues, because typical default values are simple falsy values like null, false, '' or maybe []. In the cases where you need a complex default value it’s probably okay to write out an if statement.
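For completeness, the lambda variant mentioned above could look roughly like this. The helper array_get_lazy is a hypothetical name and not part of any of the functions discussed so far:

```php
<?php
// Hypothetical helper: $makeDefault is only called when the key is missing.
function array_get_lazy(array $array, $index, callable $makeDefault) {
    return array_key_exists($index, $array) ? $array[$index] : $makeDefault();
}

$calls = 0;
$expensive = function () use (&$calls) {
    $calls++; // count how often the default is actually computed
    return 'computed';
};

$a = array_get_lazy(['index' => 'stored'], 'index', $expensive);
$b = array_get_lazy([], 'index', $expensive);

var_dump($a);     // string(6) "stored"
var_dump($b);     // string(8) "computed"
var_dump($calls); // int(1): the closure ran only for the missing key
```

It works, but the call site is noticeably noisier than a plain if statement, which is why I wouldn’t bother in most cases.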
By-reference passing often forces a copy
Consider this snippet again:
$people = ifsetor($data['people'], []);
Because ifsetor accepts the $value by reference, there is a good chance that the above call will require a full copy of the $data['people'] array if it exists. The reason behind this is that PHP’s copy-on-write implementation forces a zval separation when converting a value to a reference, if the zval had a refcount>1 beforehand (i.e. if the $data['people'] zval was used in more than one place).
Once again I think that this is not a particularly large issue, because ifsetor would mostly be used on small scalar types like integers or strings, rather than arrays with a hundred thousand elements, so the copy doesn’t matter much.
Still, I think that it’s important to be aware of this issue, because in specific cases this minor issue can easily degenerate to pathological slowdowns. E.g. the Twig templating engine once had a similar issue, caused by lack of copy-on-write support for the ternary operator in ancient versions of PHP (“ancient” obviously means PHP 5.3 :P)
The ifsetor language construct
As you can see, the ifsetor function has a lot of issues. I think they can be roughly summarized as “there is way too much by-ref magic involved”.
One way to solve some of the problems is to turn ifsetor into a language construct. This feature was proposed some time ago and declined, the main reason being that you can use the userland implementation discussed above.
The ifsetor language construct would effectively work by a direct expansion to the isset($x) ? $x : $d pattern:
$value = ifsetor($array['index'], $default);
// expands to
$value = isset($array['index']) ? $array['index'] : $default;
Let’s take a look at how this would solve some of the issues:
- Creation of dummy values: The isset() construct does not use write-fetches. It has its own “is” fetch type, which will also suppress notices during the lookup, but won’t create any dummy null values.
- Default is always evaluated: The ternary operator will only evaluate the branch that is actually taken, so the default will only be computed if it is used.
- By-reference passing often forces a copy: Since no by-reference pass is involved anymore (and the ternary operator uses copy-on-write since PHP 5.4) a copy is no longer necessary.
This leaves only two open issues:
- Null values treated as non-existing: With a language construct this could actually be resolved as well (it’s hard to put the necessary transformation in terms of PHP code, but internally it’s possible), but I’m not sure it would make sense, because it makes ifsetor inconsistent with isset.
- No notices for nested indices: No way around this, at least with that kind of approach.
For me personally the last issue is pretty critical, because I don’t like to suppress more errors than strictly necessary (but well, I could probably live with that.)
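For what it’s worth, PHP 7.0 later added the null coalescing operator ??, which behaves like the isset-based expansion described above. The sketch below assumes PHP 7.0 or newer; the array contents are made up for illustration:

```php
<?php
// Requires PHP 7.0+ for the ?? operator.
$array = ['present' => 'stored', 'nullish' => null];
$default = 'fallback';

$a = $array['present'] ?? $default;           // 'stored'
$b = $array['missing']['deeper'] ?? $default; // 'fallback', no notice on any level
$c = $array['nullish'] ?? $default;           // 'fallback': null is treated as not set

// Unlike the by-reference function, the lookup leaves no dummy key behind.
var_dump(array_key_exists('missing', $array)); // bool(false)
```

As expected from the expansion, the two open issues remain: null is treated as non-existing, and notices are suppressed along the whole chain.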
What to do?
After explaining why the ifsetor function may not be such a good idea, let’s get to what you can use instead. In cases where you have deeply nested structures with potentially undefined indices all along the path, I’d use Igor’s get_in function.
But for the 95% use case, where you only need the notice suppression on the outermost index, why not just go with the simplest and least-magic solution?
function array_get(array $array, $index, $default = null) {
    return array_key_exists($index, $array) ? $array[$index] : $default;
}
// usage
$age = array_get($person, 'age', $someDefault);
I know, I’m stating the obvious here. This is really just get_in without the nesting support (thus simplifying its use for this particular case).
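A quick check of how array_get fares against the issues listed earlier. The $person array is made up for illustration:

```php
<?php
function array_get(array $array, $index, $default = null) {
    return array_key_exists($index, $array) ? $array[$index] : $default;
}

// Hypothetical record with an explicitly stored null.
$person = ['name' => 'Ann', 'age' => null];

var_dump(array_get($person, 'age', 42));      // NULL: a stored null is respected
var_dump(array_get($person, 'email', 'n/a')); // string(3) "n/a"

// The array is passed by value, so the lookup leaves no dummy key behind.
var_dump(array_key_exists('email', $person)); // bool(false)
```

A misspelled $person would still raise the usual notice for an undefined variable, because only the final index lookup is guarded.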
But this still looks rather clumsy. What I really want is the same using scalar objects:
$age = $person->get('age', $someDefault);
Yes, this is me calling the get “method” on an array. I think this is a very simple and readable solution. But anyway, that’s just a bit of day-dreaming; I have no idea when we’ll be introducing scalar objects in PHP (not 5.6 at least).
To finish up this post, let me show you another very elegant way of approaching the problem:
$age = ($_=& $person['age']) ?: $someDefault; // great for code obfuscation