ruk·si

♻️ Reproducibility

Updated at 2024-12-03 07:44

Reproducibility is the basis of the scientific method.

Building machine learning solutions without reproducibility is like programming without version control. Beyond the code itself, you also need to keep track of the training data and experiment results.

Benefits of reproducible machine learning:

  1. You can build your own model again from scratch with controlled changes.
  2. You can build on top of your colleague's work and vice versa.
  3. You can share your model with the broader community and use models from others.
  4. If a key data scientist leaves, the company can still work on the model.

Requirements to reproduce a machine learning model:

  1. Recording each step it takes to build the model.
  2. Recording what raw data and which version was used.
  3. Recording what pre-processing was done for the raw data.
  4. Recording system-level dependencies such as OS and GPU drivers.
  5. Recording code-level dependencies such as NumPy and TensorFlow versions.
  6. Recording the actual code used for each step.
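
Requirements 4 and 5 can be partly automated. The following is a minimal sketch, assuming a Python project; `snapshot_environment` is a hypothetical helper name, and the package names passed to it are only examples. It does not capture GPU driver versions, which would need a vendor-specific tool.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Collect system-level and code-level dependency versions as a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Example package names; list whatever your project actually depends on.
    snapshot = snapshot_environment(["numpy", "tensorflow"])
    print(json.dumps(snapshot, indent=2))
```

Storing this output next to each trained model makes it possible to tell later which library versions produced it.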

Requirements for machine learning teamwork:

  1. Sharing what experiments have been tried, e.g. new features or hyperparameters.
  2. Sharing code changes made.
  3. Sharing what other team members are currently working on.

The main objection to reproducibility is how much extra process it introduces. If the workflow becomes much more complex, data scientists won't use the service or tool.

Developers' assumptions about the problem, scope and data should be documented. Then these assumptions can be applied to new information and data.

Control over pseudo-randomness is important. Machine learning uses pseudo-random numbers for tasks like sampling. You can control this by seeding your random number generators.
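
As a minimal sketch of seeding, assuming only the Python standard library: `SEED` and `sample_indices` are hypothetical names chosen for illustration. Libraries such as NumPy and TensorFlow keep their own generators and need to be seeded separately, and seeding alone may not make GPU computations bit-identical.

```python
import random

SEED = 42  # arbitrary value; what matters is that it is recorded with the run


def sample_indices(n_rows, n_samples, seed):
    """Draw a reproducible sample of row indices using a dedicated generator."""
    rng = random.Random(seed)  # local generator, independent of global state
    return rng.sample(range(n_rows), n_samples)


first = sample_indices(1000, 5, SEED)
second = sample_indices(1000, 5, SEED)
assert first == second  # same seed, same sample
```

Using a dedicated `random.Random` instance instead of the module-level functions keeps the sample stable even if other code draws random numbers in between.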

The basic things to record for reproducibility:

  • Training data; model behavior comes partly from data
  • Implementation i.e., the code; even slight variations can have an impact
  • Settings i.e., the configuration; like learning rate
  • Runtime environment; different hardware supports different batch sizes
  • Runtime dependencies; different library versions work differently
  • How to report results; progress should be quantitative
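
Several of these items can be written to a small record per training run. The sketch below is one possible approach, not a prescribed format: `file_sha256`, `write_run_record`, and the record layout are assumptions made for illustration. It fingerprints the training data with SHA-256 and stores it together with the settings and quantitative results as JSON.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_run_record(out_path, data_path, settings, metrics):
    """Write one JSON record describing the data, settings and results."""
    record = {
        "data": {"path": str(data_path), "sha256": file_sha256(data_path)},
        "settings": settings,  # e.g. {"learning_rate": 0.01}
        "metrics": metrics,    # quantitative results, e.g. {"accuracy": 0.9}
    }
    Path(out_path).write_text(json.dumps(record, indent=2, sort_keys=True))
    return record
```

If the data file changes, its hash changes, so two runs with the same hash and settings can be compared on equal footing.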

Sources