Show HN: Git bayesect – Bayesian Git bisection for non-deterministic bugs

(github.com)

112 points | by hauntsaninja 4 days ago

6 comments

hauntsaninja 4 days ago
git bisect works great for tracking down regressions, but relies on the bug presenting deterministically. But what if the bug is non-deterministic? Or worse, your behaviour was always non-deterministic, but something has changed, e.g. your tests went from somewhat flaky to very flaky.
In addition to the repo linked in the title, I also wrote up a little bit of the math behind it here: https://hauntsaninja.github.io/git_bayesect.html
[-]
- ajb 14 minutes ago
  Nice! I implemented a similar thing a while back: https://github.com/Ealdwulf/BBChop
  I'm going to have to check out how you got linear time with Shannon entropy, because I used Renyi entropy to do that, to make the algebra easier.
  It's also possible to do it over the DAG, rather than a linear history - although that makes the code a lot more complicated. Unfortunately there doesn't seem to be a linear time cumulative sum algorithm over dags, so it's super linear in some cases.
- Myrmornis 2 hours ago
  This is really cool! Is there an alternative way of thinking about it involving a hidden markov model, looking for a change in value of an unknown latent P(fail)? Or does your approach end up being similar to whatever the appropriate Bayesian approach to the HMM would be?
supermdguy 3 days ago
Okay this is really fun and mathematically satisfying. Could even be useful for tough bugs that are technically deterministic, but you might not have precise reproduction steps.
Does it support running a test multiple times to get a probability for a single commit instead of just pass/fail? I guess you’d also need to take into account the number of trials to update the Beta properly.
[-]
- hauntsaninja 3 days ago
  Yay, I had fun with it too!
  IIUC the way you'd do that right now is just repeatedly recording the individual observations on a single commit, which effectively gives it a probability + the number of trials to do the Beta update. I don't yet have a CLI entrypoint to record a batch observation of (probability, num_trials), but it would be easy to add one
  But ofc part of the magic is that git_bayesect's commit selection tells you how to be maximally sample efficient, so you'd only want to do a batch record if your test has high constant overhead
  [-]
  - __s 49 minutes ago
    recompiling can be high constant overhead
    [-]
    - ajb 6 minutes ago
      In theory, the algorithm could deal with that by choosing the commit at each step, which gives the best expected information gain; divided by expected test time. In most cases it would be more efficient just to cache the compiled output though.
SugarReflex 33 minutes ago
I hope this comment is not out of place, but I am wondering what the application for all this is? How can this help us or what does it teach us or help us prove? I am asking out of genuine curiosity as I barely understand it but I believe it has something to do with probability.
[-]
- Retr0id 14 minutes ago
  If you're git bisecting a flakey test, normally your only option is to run the test many times until you're ~certain it's either flakey or not flakey. If your test suite is slow, this can take a long time.
  One way to think about the tool presented is that it minimizes the number of times you'd need to run your test suite, to locate the bad commit.
- augusto-moura 26 minutes ago
  The writeup [1] linked on the README has examples and a better explanation
  [1]: https://hauntsaninja.github.io/git_bayesect.html
- curuinor 26 minutes ago
  Bayesian inference is, to be overly simple, a way to write probabilistic if-statements and fit them from data. The "if" statement in this case is "if the bug is there...", and of course it's often the case that in actual software that if statement is probabilistic in nature. This thing does git bisect with a flaky bug with bayesian inference handling the flakiness to get you a decent estimate of where the bug is in the git history. It seems to be usable, or at least as usable as a Show HN thingy is expected to be.
Retr0id 2 hours ago
Super cool!
A related situation I was in recently was where I was trying to bisect a perf regression, but the benchmarks themselves were quite noisy, making it hard to tell whether I was looking at a "good" vs "bad" commit without repeated trials (in practice I just did repeats).
I could pick a threshold and use bayesect as described, but that involves throwing away information. How hard would it be to generalize this to let me plug in a raw benchmark score at each step?
davidkunz 1 hour ago
Useful for tests with LLM interactions.
skrun_dev 16 minutes ago
[dead]