Vital few, trivial many: May 2021

Thursday, May 27, 2021

On the Role of Counterfactuals in Learning

The following is a hypothesis regarding the purpose of counterfactual reasoning (particularly in humans). It builds on Judea Pearl's three-rung Ladder of Causation (see below).

One important takeaway from this hypothesis is that counterfactuals really only make sense in the context of computationally bounded agents.

Summary

Counterfactuals provide initializations for use in MCMC sampling.

Preliminary Definitions

Association (model-free):

$\Pr(Y = y \mid X = x)$

Intervention/Hypothetical (model-based):

$\Pr(Y = y \mid do(X = x))$

Counterfactual (model-based):

$\Pr(Y = y \mid do(X = x), Y = y')$

In the counterfactual, we have already observed an outcome $y'$ but wish to reason about the probability of observing another outcome $y$ (possibly the same as $y'$) under $do(X = x)$.
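To make the three definitions concrete, here is a minimal sketch on a toy binary model. The model itself (a confounder Z, the structural equations, and the noise term U_Y) is a hypothetical chosen only for illustration and is not part of the hypothesis. The counterfactual is computed by keeping only the latent settings consistent with the observed outcome and then re-running the model under the intervention.

```python
from itertools import product

# Hypothetical toy causal network: Z -> X, Z -> Y, X -> Y, plus noise U_Y on Y.
P_Z = {0: 0.5, 1: 0.5}    # prior over the confounder Z
P_UY = {0: 0.5, 1: 0.5}   # prior over the exogenous noise U_Y

def run(z, u_y, do_x=None):
    """Evaluate the structural equations; do_x overrides X when given."""
    x = z if do_x is None else do_x
    y = int(z or (x and u_y))
    return x, y

def worlds():
    """Enumerate every latent setting (z, u_y) with its prior probability."""
    for z, u_y in product(P_Z, P_UY):
        yield z, u_y, P_Z[z] * P_UY[u_y]

def association(y, x):
    """Pr(Y = y | X = x): plain conditioning on observed X."""
    num = sum(p for z, u, p in worlds() if run(z, u) == (x, y))
    den = sum(p for z, u, p in worlds() if run(z, u)[0] == x)
    return num / den

def intervention(y, x):
    """Pr(Y = y | do(X = x)): force X and average over the latents."""
    return sum(p for z, u, p in worlds() if run(z, u, do_x=x)[1] == y)

def counterfactual(y, x, y_obs):
    """Pr(Y = y | do(X = x), Y = y_obs): condition latents on y_obs, then force X."""
    consistent = [(z, u, p) for z, u, p in worlds() if run(z, u)[1] == y_obs]
    den = sum(p for _, _, p in consistent)
    num = sum(p for z, u, p in consistent if run(z, u, do_x=x)[1] == y)
    return num / den

print(association(1, 1))        # 1.0
print(intervention(1, 1))       # 0.75
print(counterfactual(1, 1, 0))  # 0.5
```

In this toy model the three quantities differ: seeing X = 1 reveals Z = 1, forcing X = 1 does not, and having observed Y = 0 rules out Z = 1 before the intervention is applied.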

Note: Below, I use the terms "model" and "causal network" interchangeably. Also, an "experience" is an observation of a causal network in action.

Assumptions

1. Real-world systems are highly complex, often with many causal factors influencing system dynamics.
2. Human minds are computationally bounded (in time, memory, and precision).
3. Humans do not naturally think in terms of continuous probabilities; they think in terms of discrete outcomes and their relative likelihoods.

Relevant Literature:

Lieder, F., Griffiths, T. L., Huys, Q. J., & Goodman, N. D. (2018). The anchoring bias reflects rational use of cognitive resources. Psychonomic bulletin & review, 25(1), 322-349.

Sanborn, A. N., & Chater, N. (2016). Bayesian brains without probabilities. Trends in cognitive sciences, 20(12), 883-893.

Theory

Claim 1.

From a notational perspective, in going from a hypothetical to a counterfactual, the generalization lies solely in the ability to reason about a concrete scenario starting from an alternative scenario (the counterfactual). In theory, given infinite computational resources, the do-operator can, on its own, reason forward about anything by considering only hypotheticals. Thus, a counterfactual would be an inadmissible object under such circumstances. (Perfect knowledge of the system is not required if one can specify a prior. All that is required is sufficient computational resources.)

Corollary 1.1.

Counterfactuals are only useful when operating with limited computational resources, where "limited" is defined relative to the agent doing the reasoning and the constraints they face (e.g., limited time to make a decision, inability to hold enough items in memory, or any combination of such constraints).

Corollary 1.2.

If model-based hypothetical reasoning (i.e. "simulating") is a sufficient tool to resolve all human decisions, then all of our experiences/observations should go toward building a model that is as accessible and accurate as possible, given our computational limitations.

By Assumption 1, the vast majority of human decision-making theoretically consists in reasoning about a "large" number of causal interactions at once, where "large" here means an amount that is beyond the bounds of the human mind (Assumption 2). Thus, by Claim 1, we are in the regime where counterfactuals are useful. But in what way are they useful?

By Corollary 1.2, we wish to build a useful model based upon our experiences. A useful model is one that is as predictively accurate as possible while still being accessible (i.e. interpretable) by the human mind. Given that (1) a model is describable as data, (2) long-term memory is where our brains can store the most data, and (3) the maximal predictive accuracy of a model is a non-decreasing function of its description length, it follows that a maximally predictive model is one that is stored in our long-term memory. However, human working memory is limited in capacity relative to long-term memory.

Claim 2.

The above are competing factors: A more descriptive (and predictive) model (represented by more data) may fit in long-term memory, but due to a limited working memory, it may be inaccessible (at least in a way that leverages its full capabilities). Thus, attentional mechanisms are required to guide our retrieval of subcomponents of the full model to load into working memory.

Again, by Assumptions 1 and 2, our models are approximate: both inaccurate and incomplete. Thus, we wish to improve our models by integrating over all of our experiences. This equates to computing the following posterior distribution:

$\Pr(\text{causal network} \mid \text{experience}) = \frac{\Pr(\text{experience} \mid \text{causal network}) \times \Pr(\text{causal network})}{\Pr(\text{experience})}$

By Assumption 3, humans cannot compute updates to their priors according to the above formula.

Claim 3.

Humans do something akin to MCMC sampling to approximate the above posterior. Because MCMC methods (e.g., Gibbs sampling, Metropolis-Hastings) systematically explore the space of models in a local and incremental manner (e.g., by conditioning on all but one variable in Gibbs sampling, or by taking local steps in model space in Metropolis-Hastings) AND only require reasoning via likelihood ratios (Assumption 3), we can overcome the constraints imposed by our limited working memory and still manage to update models that fit in long-term memory but not entirely in working memory.
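As a rough illustration of the "local steps plus likelihood ratios" point, here is a minimal Metropolis-Hastings sketch over a handful of candidate models. The four scores in `unnorm` are hypothetical stand-ins for likelihood times prior, and arranging the candidates on a ring is an assumption made only so that "local step" has a simple meaning. The sampler never computes the normalising constant $\Pr(\text{experience})$; it only compares two candidates at a time.

```python
import random

def metropolis_hastings(unnorm, start, n_steps, rng):
    """Visit model indices on a ring, using only ratios of unnormalised scores."""
    n_states = len(unnorm)
    counts = [0] * n_states
    state = start
    for _ in range(n_steps):
        # Local, symmetric proposal: move to a neighbouring candidate model.
        proposal = (state + rng.choice((-1, 1))) % n_states
        ratio = unnorm[proposal] / unnorm[state]
        # Accept with probability min(1, ratio); otherwise stay put.
        if rng.random() < ratio:
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]

# Hypothetical unnormalised scores (likelihood x prior) for four candidate networks.
unnorm = [1.0, 2.0, 4.0, 1.0]
rng = random.Random(0)
freqs = metropolis_hastings(unnorm, start=0, n_steps=100_000, rng=rng)
exact = [w / sum(unnorm) for w in unnorm]

print([round(f, 3) for f in freqs])  # visit frequencies from the chain
print(exact)                         # normalised posterior, for comparison
```

Only the current state and one proposal need to be held at any moment, which is the property Claim 3 relies on: the full table of scores never has to be loaded and normalised at once.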

MCMC methods require initialization (i.e. a sample to start from).

Claim 4.

Counterfactuals provide this initialization. Given that our model is built up entirely of true samples of the world, our aim is to interpolate between these samples. (We don't really have a prior at birth on the ground-truth causal network on which the world operates.) Thus, we can only trust our model with 100% credibility at observed samples. Furthermore, by Assumption 2, we are pressured to minimize time to convergence of any MCMC method. Hence, the best we can do is to begin the MCMC sampling procedure starting from a point that we know belongs in the support of the distribution (and likely in a region of high density).

From the Wikipedia article on the Metropolis-Hastings algorithm:

Although the Markov chain eventually converges to the desired distribution, the initial samples may follow a very different distribution, especially if the starting point is in a region of low density. As a result, a burn-in period is typically necessary.

Counterfactuals allow us to avoid the need for any costly burn-in phase.
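The cost of a poor starting point can be sketched with a small experiment. The target below (a standard normal), the step size, and the definition of the "bulk" as $|x| < 2$ are all assumptions chosen for illustration; the starting point at 0 plays the role of an observed sample, and the one at 50 plays the role of an arbitrary guess.

```python
import math
import random

def log_target(x):
    """Unnormalised log-density of a standard normal (illustrative target)."""
    return -0.5 * x * x

def steps_to_reach_bulk(x0, rng, step=1.0, bulk=2.0, max_steps=100_000):
    """Count Metropolis-Hastings steps until the chain first has |x| < bulk."""
    x = x0
    for t in range(max_steps):
        if abs(x) < bulk:
            return t
        # Symmetric local proposal of width `step` around the current point.
        proposal = x + rng.uniform(-step, step)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
    return max_steps

rng = random.Random(0)
near = steps_to_reach_bulk(0.0, rng)   # start at a high-density point
far = steps_to_reach_bulk(50.0, rng)   # start deep in the tail
print(near)  # 0: already inside the bulk
print(far)   # at least 48, since each step moves by at most 1.0
```

Every step spent walking in from the tail is a step that produces no usable sample, which is the burn-in cost that a well-chosen starting point avoids.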

Wednesday, May 26, 2021

Wider aspects of science management

Decide on your core principles of management and disseminate these. If you value scientific quality very highly, then tell everyone. Do not be sucked into micro-management.
Whether you know it or not, you set the atmosphere or culture for those who work for you. This will influence all parts of their outlook and behavior at work.
Management starts with recruiting great people. Character is often more important than knowledge. You can teach knowledge but you cannot change someone’s character.
Project management has three pillars for you to control: quality, time and budget. Quality is clearly important but all three pillars need your attention.
Don’t get so mired in process that you lose sight of the purpose of the organization!
Empower people by giving them your trust.
When the opportunity arises, don’t hesitate to praise staff, but be genuine.
Be loyal to your staff in public.
As opportunities change, you may need to re-structure, but give your staff a good rationale. Review evidence from the past to find out what worked.
Communicate clearly; cut jargon and acronyms as far as possible, ideally entirely.
Reduce your emails and put effort into seeing people face to face.

How to be a better supervisor

Your attitude is critical to the success and development of your students.
Remember that those training for a PhD are not ‘hired hands’.
Make deliberate plans to meet those you supervise regularly (between once and four times a month); don’t leave it to chance.
Work on building a relationship of trust and honesty.
Recognize they will need you most when things don’t go to plan.
When progress is much less than expected, do not be confrontational; try to get to the bottom of problems and, if needed, summon additional resources to help.
You want your chicks to be able to fly the nest, so build their confidence and encourage their independence.
When complex issues arise, doing what is best for your student will also be best for you.

When things are not going well

Problems, mishaps, experimental failures and rejection are common to the research experience of all scientists.
If you are feeling anxiety and self-doubt about your research, you will not be alone. This is not a failure or weakness in you.
Don’t assume that working longer and longer hours will solve all your research problems.
If stresses are really getting to you, then open up to friends, welfare officers and your supervisor(s) about it. It is essential your supervisor knows because he or she will be part of the solution.
Maintain a healthy and fulfilling life outside science. Avoid 24/7 immersion!
Be prepared to take a complete break if you need to. Refresh your brain.
Seek professional help and advice about depression early; don’t fight it entirely on your own.

How to cope with rejection

Rejection in science is a normal part of life and in part reflects the ‘self-correcting’ nature of science. Every scientist suffers!
When it happens don’t lash out or over-react! Let the dust settle for a day or two.
Having ideas rejected or criticized by your supervisor is better than being knocked down publicly later on. The subsequent refinements could lead to greater success in your work.
Having your paper rejected happens to everyone, to good and bad scientists. But you can bounce back with submission to another journal.
Console yourself that many theories we now accept and use were initially rejected!
Despite the temptation, don’t assume your grant or job application was turned down purely because of some problem in your character.
You will survive!

Writing grant proposals

Do your research on funding organisations and carefully analyse what a call for proposals may be requesting.
Enlist as much help as you can!
Having collaborators with different skills can add to the quality of the science but you must meet them and ensure you understand each other.
Clarity will be vital in your writing.
State early on what hypotheses you are testing.
Find a way to bring in a ‘wow factor’ with your offer.
Get started early; time will run out fast!
Learn what makes a good proposal by becoming a reviewer yourself!

Writing a (good) scientific paper

First of all, review your data and identify what is unique and special about your paper. This will now be your focus. Stick to this main focus/finding!
Think through the structure of the paper before you start writing. Only use material that will really contribute to the story.
Structure your paper with lots of subheadings; don’t be shy!
Have you gone through every single sentence in the paper to make sure it is as short as possible (a sentence should be no more than 1–1.5 lines in A4 12 point)? Organise carefully to avoid repetition.
Have you taken every step to make sure the paper is as short as possible (to maintain the focus)? Fewer words are ALWAYS better than more!
Avoid complex phrases and keep to simple words.
Avoid using personal terms such as ‘our’ or ‘we’. Science is supposed to be impersonal and objective.
Use acronyms sparingly and ensure they are properly explained at first use.
Make figure legends full and complete and explain as much as possible (date samples taken, replication, etc.).
Do not make the figure the subject of the sentence!
Discuss your results and end the sentence with the figure/table in brackets. For example ‘the highest concentrations of Zn were in the Yellow river (Fig. 6)’, rather than ‘Figure 6 shows . . .’
Do not repeat in the text long lists of data which you have already presented in the figures and tables. Results and discussion should be a limited summary of the main findings. The data in the figures/tables do not need to be repeated in the text.
Do not use several significant figures without good reason! A value of 7.6 is better than 7.5894! This is particularly true when model estimations are given, because here there will be large uncertainty.
Ensure that the amount of discussion is proportional to the importance of the topic. Do not distract the reader with long discussions on aspects that are trivial compared to the main focus of the paper.
Maintain that narrative thread! You must keep the readers’ attention so that they can follow your story. Don’t run several different stories or mess things up with information that distracts from your storyline. If you try to put too much stuff into a bag, it will break!
Use emphasis carefully. Don’t say ‘this clearly shows’ as that implies certainty and hints at arrogance. Try instead to use ‘this strongly suggests’ or ‘this indicates’ which, whilst revealing your conviction, still leaves a space for uncertainty.
Work hard to squeeze out any ambiguity. Try to make each sentence stand alone and not require the preceding sentence to make sense. Put the paper down and re-read after a week. Does it still make sense?
The conclusions section should be as brief as possible: a paragraph of no more than 1/3 of a page. Don’t re-open the discussion. If permitted, bullet points are very useful.

Giving a presentation

Be sympathetic to your audience! Don’t overestimate the knowledge of your audience or how quickly they can assimilate your information.
Consider your talk as a story and draw your audience to an inescapable conclusion.
Manage the amount of information presented on each slide wisely so that it is easily digestible.
Ensure all material is legible, even from the back of a large hall.
Always strive to help your audience understand by carefully identifying the key findings.
Avoid acronyms as much as possible.
Show honesty and your human side.
Look out at your audience so that no one feels left out. Some movement will help you feel relaxed and keep the audience with you.
Practice and ensure you can keep to time.

Time management

Identify which tasks are critical and prioritize them.
Do make a plan and keep it near you, in plain sight.
When working with others, double check that the task is understood by all parties.
Protect yourself from the time vampires!
Keep nurturing scientific relationships.
Check if your character is playing a detrimental role.
Permit yourself breaks.