Machine Learning Mischief

Is it possible to "bend" machine learning experiments toward a preconceived goal?

This involves systematically exploiting evaluation metrics and/or scientific tests to achieve desired outcomes without actually meeting the underlying scientific objectives.

These behaviors are unethical and might be called cherry picking, data dredging, or gaming results.
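Here is a minimal sketch of one such trick: cherry-picking the random seed used to split the data. It uses plain Python with no libraries. The dataset, the nearest-centroid model, and every function name below are hypothetical stand-ins for illustration. The labels are pure noise, so an honest accuracy should hover around 50%. Scanning 100 seeds and reporting only the best one still produces a number that looks like signal.

```python
import random

def make_noise_dataset(n=200, n_features=5, seed=0):
    """Hypothetical dataset: random features and random binary labels (no real signal)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y

def train_test_split(X, y, test_ratio, seed):
    """Shuffle row indices with the given seed, then hold out the first test_ratio of them."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit a nearest-centroid classifier on the training rows and return test accuracy."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Predict the label whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    correct = sum(predict(x) == t for x, t in zip(X_test, y_test))
    return correct / len(y_test)

X, y = make_noise_dataset()

# The "mischief": evaluate 100 seeds, then report only the luckiest one.
scores = {seed: nearest_centroid_accuracy(*train_test_split(X, y, 0.2, seed))
          for seed in range(100)}
best_seed = max(scores, key=scores.get)
mean_score = sum(scores.values()) / len(scores)

print(f"mean accuracy over {len(scores)} seeds: {mean_score:.3f}")
print(f"'best' seed {best_seed}: {scores[best_seed]:.3f}")
```

Reporting `scores[best_seed]` on its own, with no mention of the other 99 seeds, is exactly what the first question in How To Spot is meant to catch.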

Reviewing examples of this type of "gaming" (data science dark arts) can remind beginners and stakeholders (really all of us!) why certain methods are best practices and how to avoid being deceived by results that are too good to be true.

Examples

Below are examples of this type of gaming, and simple demonstrations of each:

How To Spot

Results presented using these methods are easy to spot with probing questions:

  • "Why did you use such a specific random number seed?"
  • "Why did you choose this split ratio over other more common ratios?"
  • "Why did you remove this example from the test set and not that example?"
  • "Why didn't you report a performance distribution over repeated resampling of the data?"
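The defensible answer to the last question is to repeat the split many times and report the whole distribution, not a single number. Below is a minimal sketch using repeated random train/test splits. It assumes a hypothetical noise-only label set and a majority-class baseline as the "model"; both are illustrative, not taken from any particular project.

```python
import random
import statistics

def majority_class_accuracy(train_labels, test_labels):
    """Hypothetical baseline 'model': always predict the most common training label."""
    majority = max((0, 1), key=train_labels.count)
    return sum(t == majority for t in test_labels) / len(test_labels)

def repeated_holdout(labels, test_ratio=0.2, n_repeats=30):
    """Score the model on n_repeats different random train/test splits (seeds 0..n_repeats-1)."""
    scores = []
    n_test = int(len(labels) * test_ratio)
    for seed in range(n_repeats):
        idx = list(range(len(labels)))
        random.Random(seed).shuffle(idx)
        test = [labels[i] for i in idx[:n_test]]
        train = [labels[i] for i in idx[n_test:]]
        scores.append(majority_class_accuracy(train, test))
    return scores

# Hypothetical noise-only labels: there is nothing real to learn here.
rng = random.Random(1)
labels = [rng.randint(0, 1) for _ in range(200)]

scores = repeated_holdout(labels)
print(f"accuracy over {len(scores)} splits: "
      f"mean={statistics.mean(scores):.3f} sd={statistics.stdev(scores):.3f} "
      f"min={min(scores):.3f} max={max(scores):.3f}")
```

A wide gap between min and max is a warning that any single split (and any single seed) says little about how the model will really perform.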

All of this highlights that the choices in an experimental method must be defensible, especially those that deviate from widely adopted heuristics!

DO NOT DO THIS

This project is for educational purposes only!

If you use these methods on a project, you're unethical, a fraud, and your results are garbage.

Also, results/models will be fragile and will not generalize to new data in production or a surprise/hidden test set. You will be found out. A competent senior data scientist (or LLM?) will see what is up very quickly.

So why give examples?

I've never seen anything like this for machine learning and data science, yet most experienced practitioners know that these practices are real.

Knowing what-to-look-for can help stakeholders, managers, teachers, paper reviewers, etc.

Knowing what-not-to-do can help junior data scientists.

Also, thinking about and writing these examples feels naughty + fun :)

More

See the related ideas of magic numbers, researcher degrees of freedom, and the forking paths problem.
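A quick way to see why forking paths matter: suppose each analysis you try has a false-positive rate of alpha = 0.05 and the analyses are independent. Independence is a simplifying assumption; real forks are usually correlated, which makes the inflation smaller. Under that assumption, the chance that at least one of k tries looks "significant" is 1 - (1 - alpha)^k. The sketch below checks that formula against a simulation, using the fact that a p-value is uniformly distributed when the null hypothesis is true. The function names are hypothetical.

```python
import random

ALPHA = 0.05

def chance_of_false_positive(k, alpha=ALPHA):
    """P(at least one of k independent null analyses gives p < alpha)."""
    return 1.0 - (1.0 - alpha) ** k

def simulate(k, trials=20_000, alpha=ALPHA, seed=0):
    """Draw k uniform p-values per trial (null is true) and count trials where the smallest is < alpha."""
    rng = random.Random(seed)
    hits = sum(min(rng.random() for _ in range(k)) < alpha for _ in range(trials))
    return hits / trials

for k in (1, 5, 20):
    print(f"k={k:>2}: formula={chance_of_false_positive(k):.3f} simulated={simulate(k):.3f}")
```

With alpha = 0.05, the formula gives about 0.05, 0.23, and 0.64 for 1, 5, and 20 tries, which is why "I tried a few things and this one worked" is not evidence.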

If you like this project, you may be interested in Data Science Diagnostics.

If you have ideas for more examples, email me: [email protected] (you won't, that's okay)