In this post, I want to expand our understanding of the term ‘model’ – the Machine Learning type, not the Derek Zoolander type. The fundamental task in machine learning is to turn data into models and that’s what this post focuses on.

The dictionary definition of a model is:

“a simplified description, especially a mathematical one, of a system or process, to assist calculations and predictions”

Let’s pick it apart in a slightly different order:

- **a system or process** – our ultimate goal is to *emulate* a system or process.
- **simplified description** – we want this description to be simple. By ‘simple’, we usually mean *compact*, so there’s an aspect of compression. This is the most important facet of a model: a compact description implies a deeper understanding of the process that we’re trying to model. **In the context of machine learning, compressing data into a model means chipping away at redundancy. The only reason we expect machine learning to work at all is that we believe the input data contains redundancies that we can exploit to construct this compact description.**
- **especially mathematical** – typical models are mathematical, but not all are, e.g. the physical models used to understand and prepare for floods. In this post, I will focus exclusively on mathematical models, but it is important to recognize the amazing achievements of physical models.
- **assist calculations and predictions** – obviously, we would like to be able to predict, but note that we are happy with *assistance*, i.e. compact descriptions that are not predictive are also considered models. A final point to consider is the word *prediction*. Personally, I find it more useful to think of a model as a mapping from the space of inputs to the space of outputs, because the word prediction carries a vague implication of causality, while most predictive systems are not causal.
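The link between redundancy and compact descriptions can be illustrated with a loose analogy, using a general-purpose compressor (Python's `zlib`) as a stand-in for a model. That substitution is an assumption of this sketch, not a claim that machine learning models work like `zlib`; the data below is invented for illustration. Data full of repeated structure shrinks dramatically, while random bytes of the same length barely shrink at all, because there is no redundancy to exploit.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest compression level."""
    return len(zlib.compress(data, 9))


# Highly redundant data: one short pattern repeated many times.
redundant = b"abc" * 1000

# Data with (practically) no redundancy: seeded pseudo-random bytes of the same length.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(redundant)))

print("redundant:", len(redundant), "->", compressed_size(redundant), "bytes")
print("noise:    ", len(noise), "->", compressed_size(noise), "bytes")
```

The repeated pattern collapses to a few dozen bytes, while the random bytes stay roughly the same size: with nothing redundant to chip away at, no compact description exists.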

##### Types of Models

###### Algebraic

An equation that describes a system.

As an example, consider Ohm’s Law, which states that

$$V = IR$$

Here V is the voltage across a conductor, I is the current through it and R is its resistance. This simple, compact description allows us to compute any one of these properties given the other two, and it fits every point in the definition above. Note that this model is *statistical* in nature, i.e. we do not know or reason about each individual atom or electron in the system that we are measuring; we are happy measuring lower-resolution, aggregate properties of the system instead.
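Because the law is a single equation in three quantities, it can be solved for whichever one is unknown. A minimal sketch (the function name and keyword arguments are hypothetical, chosen for illustration; units are assumed to be volts, amperes and ohms):

```python
from typing import Optional


def solve_ohms_law(v: Optional[float] = None,
                   i: Optional[float] = None,
                   r: Optional[float] = None) -> float:
    """Return whichever of voltage v, current i or resistance r is missing.

    Exactly one argument must be None; the other two must be given.
    """
    missing = [name for name, val in (("v", v), ("i", i), ("r", r)) if val is None]
    if len(missing) != 1:
        raise ValueError("provide exactly two of v, i, r")
    if v is None:
        return i * r      # V = I * R
    if i is None:
        return v / r      # I = V / R
    return v / i          # R = V / I


print(solve_ohms_law(i=2.0, r=5.0))   # voltage
print(solve_ohms_law(v=12.0, r=4.0))  # current
print(solve_ohms_law(v=9.0, i=3.0))   # resistance
```

One equation covers three different questions, which is exactly the kind of compactness the definition asks for.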

###### Combinatorial

A network of connections that describe a system.

There are many systems that are best described as networks e.g.

- Flowcharts. Flowcharts are compact representations of algorithms and are predictive, in the sense that they map their inputs to a (usually finite) set of outputs.
- Process flow diagrams. A process flow diagram (PFD) is a diagram commonly used in chemical and process engineering to indicate the general flow of plant processes and equipment. The PFD displays the relationship between major equipment of a plant facility and does not show minor details such as piping specifics and designations.
- Biological pathways. KEGG pathway maps are a good example of biological models that are best described as networks.
- Topological models. Topology is the study of shape, and when it is used to analyze data it almost always relies on, or constructs, network representations of the data. Simplicial complexes used for persistent homology calculations and Extended Reeb Graphs used for meta-modeling are examples.

Combinatorial models such as these are typically not thought of as models, but it is important to recognize that this is a very flexible class of models that:

- depict systems,
- are compact,
- are mathematical (since graphs are easily represented as matrices),
- and assist in computations.
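To make the "graphs are matrices" point concrete, here is a minimal sketch that encodes a small hypothetical flowchart (the node names and edges are invented for illustration) as a 0/1 adjacency matrix and then uses boolean matrix products to compute which steps can be reached from which. Pure Python is used so the example has no dependencies; in practice a numerical library would be the usual choice.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def adjacency_matrix(nodes: Sequence[str], edges: Sequence[Tuple[str, str]]) -> Matrix:
    """0/1 adjacency matrix: A[i][j] = 1 if there is an edge from nodes[i] to nodes[j]."""
    index = {name: k for k, name in enumerate(nodes)}
    a = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        a[index[src]][index[dst]] = 1
    return a


def bool_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Boolean matrix product: C[i][j] = 1 if A[i][k] and B[k][j] for some k."""
    n = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def reachability(a: Matrix) -> Matrix:
    """R[i][j] = 1 if node j can be reached from node i in one or more steps."""
    n = len(a)
    reach = [row[:] for row in a]   # walks of length 1
    step = [row[:] for row in a]
    for _ in range(n - 1):          # add walks of length 2..n; longer ones reveal nothing new
        step = bool_matmul(step, a)
        reach = [[reach[i][j] | step[i][j] for j in range(n)] for i in range(n)]
    return reach


# Hypothetical approval flowchart, invented for illustration.
NODES = ["start", "check", "approve", "reject", "end"]
EDGES = [("start", "check"), ("check", "approve"), ("check", "reject"),
         ("approve", "end"), ("reject", "end")]

A = adjacency_matrix(NODES, EDGES)
R = reachability(A)
for name, row in zip(NODES, R):
    print(name, "->", [NODES[j] for j, r in enumerate(row) if r])
```

The whole network is a 5×5 table of zeros and ones, and questions about the system ("can a request that reaches `check` still end?") become matrix computations.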

###### Software

A piece of software that describes a system.

As an example, consider a rule processing system. Such systems are used to represent knowledge in specific domains, e.g. most of the transaction monitoring systems that banks use to detect money laundering.
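A minimal sketch of what such a rule-based model can look like, assuming rules are simple predicates over a transaction. The `Transaction` fields, rule names, thresholds and country codes are all hypothetical placeholders, not real monitoring rules.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Transaction:
    account: str
    amount: float
    country: str


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Transaction], bool]


# Hypothetical rules for illustration only; real monitoring rules are far more involved.
RULES: List[Rule] = [
    Rule("large_amount", lambda t: t.amount >= 10_000),
    Rule("high_risk_country", lambda t: t.country in {"XX", "YY"}),
]


def evaluate(tx: Transaction, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of all rules that fire for this transaction, in rule order."""
    return [rule.name for rule in rules if rule.condition(tx)]


if __name__ == "__main__":
    print(evaluate(Transaction("acct-1", 12_500.0, "XX")))  # ['large_amount', 'high_risk_country']
    print(evaluate(Transaction("acct-2", 40.0, "US")))      # []
```

The software is the model: a handful of rules compactly encodes domain knowledge and maps each input transaction to an output (the set of alerts).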

##### Considerations

###### Regularization

We discussed the need for models to be compact earlier. Regularization is a set of methods in machine learning that push models to be as compact as possible, typically by penalizing model complexity during training. Beyond leading to smaller models, regularization is also beneficial in reducing *generalization error* (i.e. the error a model makes on new, unseen data, which tends to be worse than its error on the training data).
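One common regularizer that directly targets compactness is the L1 (lasso) penalty, which can drive coefficients to exactly zero. A minimal sketch, using pure-Python cyclic coordinate descent on a tiny invented dataset; the data, the penalty weights `lam` and the function names are assumptions for illustration, not taken from this post.

```python
from typing import List


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float],
                             lam: float, n_sweeps: int = 100) -> List[float]:
    """Minimise (1/2n) * sum((y - Xw)^2) + lam * sum(|w|) by cyclic coordinate descent.

    X is a list of rows (n samples x p features); no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j's own term.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w


# Toy data (invented for illustration): y = 2*x1 + 0.5*x2, with orthogonal columns.
X = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
y = [2.0 * a + 0.5 * b for a, b in X]

for lam in (0.0, 1.0, 20.0):
    print(lam, [round(c, 3) for c in lasso_coordinate_descent(X, y, lam)])
```

With `lam = 0` the fit recovers both weights; with `lam = 1` the weight on the weaker feature is set to exactly zero, leaving a model with one parameter instead of two; with a large enough `lam` every weight is zero. Ridge (L2) regularization, by contrast, shrinks weights but typically does not set them exactly to zero.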

With four parameters I can fit an elephant, and with five I can make him wiggle his trunk. – John von Neumann

###### Implicit causality

The fundamental attribution error is at play here. In terms of modeling, people treat models as agents, i.e. we imbue a piece of software with intent. The fact that models can be used to predict sometimes lulls us into thinking that they contain causal information (i.e. that the independent variables cause the dependent variable). This is not the case. This is why I prefer to think of models as functions that map from the space of inputs to the space of outputs rather than as being ‘predictive’.

Amusingly, I see many economists make this fundamental error.

Okay, so what if you **did** want to argue about causality? There are ONLY two ways of ever being able to do so:

- The data used to construct the model has causality built in. Sometimes the data has a time dependency built in; as an example, think about the relationship between the number of tellers and the average wait time of a customer at a retail store. This is obviously causal, so any model that we derive from this data will allow us to argue about cause and effect.
- We find a model that does well and then run an experiment in real life. In many biological settings, e.g. biomarker discovery, this is how it is done: find correlations, then run an experiment in real life to figure out whether the correlations are causal.

Outside of these two fundamental methods, there is NO way to argue about causality based on a model.
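To make the distinction concrete, here is a minimal simulation of a hypothetical confounded system; the variables, noise levels, sample size and seed are invented for illustration. A hidden variable `z` drives both `x` and `y`, so a model fitted on observational data finds a clear relationship between `x` and `y`. Only an intervention, which is what an experiment in real life does, reveals that `x` has no effect on `y`.

```python
import random
from typing import List


def slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def simulate(n: int = 10_000, intervene: bool = False, seed: int = 0) -> float:
    """Hypothetical system: hidden z drives both x and y; x has no effect on y.

    If intervene is True, x is set externally (like a randomized experiment),
    which breaks its link to z while keeping the same variance as before.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = rng.gauss(0, 2 ** 0.5) if intervene else z + rng.gauss(0, 1)
        y = z + rng.gauss(0, 1)  # y depends only on z, never on x
        xs.append(x)
        ys.append(y)
    return slope(xs, ys)


print("observational slope:     ", round(simulate(), 2))
print("slope under intervention:", round(simulate(intervene=True), 2))
```

The observational slope comes out near 0.5 (cov(x, y) = 1 and var(x) = 2), which is perfectly usable for mapping inputs to outputs, yet it says nothing about what happens if you change `x`; under intervention the slope is close to zero.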

##### Summary

In the context of machine learning, compressing data into a model means chipping away at redundancy. The only reason we expect machine learning to work at all is that we believe the input data contains redundancies that we can exploit to construct this compact description.

All models are wrong but some are useful. – George Box