Last month, Gene Callahan brought to my attention an error in Robert Axtell’s paper “Why Agents?”
Before I respond to his post, I encourage readers to familiarize themselves
with not only this paper, but Axtell’s work and other research representative
of the growing agent-based paradigm. While I don’t expect that equilibrium
models that employ systems of linear equations will disappear any time soon,
agent-based and complexity economics are here to stay. The paradigm
will only grow more influential over the next decade or two.
The error itself appears in what seems to be a tangential discussion in Axtell’s paper. He identifies “bugs” that arise in a program as “artifacts.” These artifacts typically produce results that are not robust to small parameter changes.
Axtell writes,
This architecture, in which very little source code effectively
controls a much larger amount of execution code, is the basis for the highly
scalable nature of agent-based models. The number of agents or commodities, for
instance, are conveniently specified as user-defined constants in the source
code or read in from the command line, and thus the scale of the model can be
specified at compile or run-time. Typically, no significant rewriting of the
code is needed in order to change the scale of the model.
It is also the case that the “small source, large execution code”
character of agent computing is partially responsible for the production of
artifacts, an important class of
systematic problems that can arise in agent models, as alluded to above. When a
small amount of code — say a single commodity exchange rule, for example —
controls so much other code, then it will sometimes be the case that an
idiosyncrasy in the rule code will produce output that one might erroneously
take to be a significant result of the model. A common route to
phenomena of this type occurs when the agent interaction methods impose some
spurious correlation structure on the overall population — for instance, agents
are interacting with their neighbors more than with the population overall in
what is supposed to be a “soup” or “mean field” interaction model —
then an ostensibly systematic result — large price variance, say — is clearly
artifactual. There is no real solution to this problem, aside from
careful programming. One can, however, look for the existence of such artifacts
by making many distinct realizations of an agent model, perturbing parameters
and rules. When small perturbations in the code produce large changes in the
model output, then artifacts may be present. Sometimes, large changes in output
are realistic and not signatures of artifacts. For example, imagine that a
small change to a threshold parameter makes an agent die earlier than it
otherwise would, and therefore induces at first a small change in agent
exchange histories (i.e., who trades with who), that over time is magnified
into a wholesale change in the networks of agent exchange. Perhaps this is not
unrealistic. But when such large scale changes have origins that are
unrealistic empirically, then one should be instantly on guard for undiscovered
flaws in the source code.
Gene Callahan identified the causal error in his post,
Rob Axtell, in his 2000 paper "Why agents?
On the Varied Motivations for Agent Computing in the Social Sciences,"
attributes the existence of what he calls "artifacts" (program
behavior that is not a part of the model being created, but a byproduct of a
coding decision which was intended only to implement the model, but actually did
something else as well) "partially" to the fact that, in agent
models, a small amount of source code controls a large amount of
"execution" code. As an example, he offers a system where millions of
agents may be created and which might occupy up to a gigabyte of memory, even
though the source code for the program is only hundreds of lines long.
But this explanation cannot be right, because the causal factor he is talking about does not exist. In any reasonable programming language, only the data for each object will be copied as you create multiple instances of a class. The functions in the agent-object are not copied around again and again: they sit in one place where each agent "knows" how to get to them. What causes the huge expansion in memory usage from the program as it sits on disk to the program running in RAM is the large amount of data involved with these millions of agents: each one has to maintain its own state: its goal, its resources, its age: whatever is relevant to the model being executed.
So what we really have is a small amount of code controlling a large amount of data. But that situation exists in all sorts of conventional data-processing applications: A program to email a special promotional offer to everyone in a customer database who has purchased over four items in the last year may control gigabytes of data while consisting of only a few lines of source code. So this fact cannot be the source of any additional frequency of artifacts in agent-based models.
The code itself is not copied. Rather, the same code is used to instantiate some number of agents. Each of these agents has objects of its own, and it is these objects, not the code itself, that occupy memory. So Gene has mostly identified the error, but it can be investigated further.
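A minimal Python sketch (a hypothetical Agent class, not code from Axtell’s paper) makes this concrete: instantiating many agents duplicates per-agent data, while the method code exists once, on the class.

```python
class Agent:
    """Hypothetical agent: per-instance state, shared behavior."""
    def __init__(self, wealth, age):
        # Per-instance data: duplicated for every agent created.
        self.wealth = wealth
        self.age = age

    def trade(self, other):
        # Behavior: the function object exists once, on the class,
        # and is shared by every instance.
        transfer = min(1, self.wealth)
        self.wealth -= transfer
        other.wealth += transfer

a, b = Agent(10, 30), Agent(5, 25)

print(a.trade.__func__ is b.trade.__func__)  # True: the code is not copied
print(a.__dict__)                            # {'wealth': 10, 'age': 30}: the data is
```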
What we really have is an increase in the number of interactions that comes from having numerous agents, as compared to a single program. It just so happens that the objects instantiated along with each agent take up space. But even if they took up no space, the problem of “artifacts,” i.e., bugs, would still exist. The more agents there are, and the more they interact, the more opportunities there are for bugs to arise. The problem is a combinatorial one. All else equal, increasing the number of agents and their objects grows the behavior space of an ABM program at an increasing rate.
To demonstrate, consider a simple combinatorial problem: finding the number of possible permutations (where ordering matters). How many ways can a line of 5 people be arranged? The answer: 5! possible states, 5 * 4 * 3 * 2 * 1 = 120. The problem is simple to think through. If agent 1 stands at the front of the line, the other four agents can be arranged in 4! (= 24) different states. Repeat this process with agents 2 through 5 at the front. We could continue to break the problem down: with agent 1 in front and agent 2 second in line, the others can be arranged in 3! (= 6) different states. Replace agent 2 with agents 3, 4, and 5, and we again have 4! different states with agent 1 in front. Replace agent 1 with agents 2, 3, 4, and 5 and, voilà, we arrive back at the original answer. Increase the number of agents to 6, and the number of possible permutations grows from 120 to 720. Increase it to 7, and there are 5,040 possible permutations.
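A few lines of Python confirm the arithmetic:

```python
from math import factorial
from itertools import permutations

# Orderings of a line of n agents grow factorially.
for n in (5, 6, 7):
    print(n, factorial(n))  # 5 120, 6 720, 7 5040

# Cross-check the n = 5 case by brute-force enumeration.
print(sum(1 for _ in permutations(range(5))))  # 120
```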
Now imagine growing the number of agents to something in the range of 100,000 or 1,000,000. Let’s say that each of these agents owns 10 different objects. If all of these objects can potentially interact with one another as part of pairwise agent interactions, then the model’s behavior space is vast. Within that behavior space there is likely a large number of “artifacts” that cannot be detected until runtime.
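A rough count (a sketch under the assumptions just stated: pairwise agent interactions, 10 objects per agent, every object pair potentially interacting) shows the scale:

```python
from math import comb

# Illustrative parameter from the text above; the object count per
# agent is an assumption made for the sake of the example.
OBJECTS_PER_AGENT = 10

for n in (100_000, 1_000_000):
    agent_pairs = comb(n, 2)                       # distinct pairs of agents
    object_pairs = agent_pairs * OBJECTS_PER_AGENT ** 2
    print(f"{n:>9,} agents: {agent_pairs:.2e} agent pairs, "
          f"{object_pairs:.2e} potential object-level interactions")
```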
Axtell is correct that “artifacts” might arise from emergent patterns the programmer did not account for. He mentions one example:
A common route to phenomena of this type occurs when the agent interaction methods impose some spurious correlation structure on the overall population — for instance, agents are interacting with their neighbors more than with the population overall in what is supposed to be a “soup” or “mean field” interaction model — then an ostensibly systematic result — large price variance, say — is clearly artifactual.
In recognizing that interaction methods can “impose some spurious correlation structure on the overall population,” Axtell intuitively, even if not explicitly, sees that the problem is combinatorial.
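To see the kind of bug Axtell describes, consider a deliberately flawed sketch (hypothetical code, not drawn from the paper): the partner-selection rule is meant to implement “soup” mixing, but quietly biases agents toward adjacent indices.

```python
import random

def pick_partner_buggy(i, n):
    # Intended: a uniform draw over the whole population ("soup").
    # Actual: draws only from the two adjacent indices, imposing a
    # spurious neighborhood structure on the population.
    return (i + random.choice((-1, 1))) % n

def pick_partner_correct(i, n):
    # Uniform draw over everyone except agent i: genuine mean-field mixing.
    j = random.randrange(n - 1)
    return j if j < i else j + 1

# Any "systematic" result measured under pick_partner_buggy, e.g. a large
# price variance, would be an artifact of the interaction method.
```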
Axtell is pointing to an effect of a bug in the code. This effect, in this case a higher than expected rate of local interactions, is what skews the data and can make a spurious result appear significant. However, this does not capture the full extent of the problem. The combinatorial growth of interacting agents, and of the objects associated with them, implies that the behavior space grows at an increasing rate. With every increase in memory, the growth rate of the behavior space increases! But in this analysis, memory is only a stand-in for the objects that occupy it. What Axtell appears to call execution code is really the set of objects instantiated by that code. The amount of memory these objects occupy, as opposed to their number, has no bearing on the combinatorial problem; it is the number of objects that is of interest. These objects, whose contents are transformed by the processes associated with them, drive massive growth of the behavior space as the population increases. It is the job of the scientist to identify and categorize different types of states and progressions. One of those types is the “artifactual.”
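Axtell’s suggested remedy, making many realizations of the model while perturbing parameters, can be sketched in a few lines; run_model here is a hypothetical stand-in for any ABM that maps a parameter to a summary statistic.

```python
import random
import statistics

def run_model(threshold, seed):
    # Hypothetical stand-in for an ABM run: parameter in, statistic out.
    rng = random.Random(seed)
    return sum(rng.random() < threshold for _ in range(1_000))

def perturbation_scan(base=0.5, eps=1e-3, trials=20):
    base_runs = [run_model(base, s) for s in range(trials)]
    pert_runs = [run_model(base + eps, s) for s in range(trials)]
    shift = abs(statistics.mean(base_runs) - statistics.mean(pert_runs))
    spread = statistics.stdev(base_runs)
    # A mean shift far outside the run-to-run spread, caused by a tiny
    # perturbation, is a warning sign that artifacts may be present.
    return shift, spread

print(perturbation_scan())
```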