• 5 Posts
  • 24 Comments
Joined 1 year ago
Cake day: November 13th, 2023
















  • > Multiple passes at lower learning rates isn’t supposed to produce different results.

    Oh, I was wrong on this, then, my bad.

    So would it be correct to say that the overfitting is still happening, just significantly more slowly, and that a higher LR would work? The problem is that at first the average loss tanked to near zero within about a single epoch, which meant it overfit, but this LR didn't have the same effect.
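One way to check whether a lower LR only delays the overfitting is to log training and validation loss per epoch and look for the point where validation loss stops improving while training loss keeps falling. Below is a minimal sketch of that check, not anyone's actual training code: the function name `first_overfit_epoch`, the `patience` parameter, and the loss values in the usage example are all made up for illustration.

```python
def first_overfit_epoch(train_losses, val_losses, patience=2):
    """Return the index of the best-validation epoch if validation loss then
    fails to improve for `patience` consecutive epochs while training loss
    keeps falling. Return None if that pattern never shows up.

    Hypothetical helper: the name and the patience rule are assumptions.
    """
    if len(train_losses) != len(val_losses):
        raise ValueError("need one training and one validation loss per epoch")

    best_epoch = 0   # epoch with the lowest validation loss so far
    bad_streak = 0   # consecutive epochs without a validation improvement

    for epoch in range(1, len(val_losses)):
        train_still_falling = train_losses[epoch] < train_losses[epoch - 1]
        if val_losses[epoch] < val_losses[best_epoch]:
            best_epoch = epoch
            bad_streak = 0
        elif train_still_falling:
            # Training loss improves but validation loss does not.
            bad_streak += 1
            if bad_streak >= patience:
                return best_epoch
        else:
            bad_streak = 0
    return None


# Made-up numbers: training loss collapses within one epoch,
# validation loss bottoms out at epoch 1 and then climbs.
fast_train = [2.0, 0.05, 0.01, 0.005, 0.001]
fast_val = [2.1, 0.9, 1.2, 1.5, 1.8]

# Made-up numbers: both losses still falling, no divergence yet.
slow_train = [2.0, 1.8, 1.6, 1.4, 1.2]
slow_val = [2.1, 1.9, 1.7, 1.5, 1.3]

fast_result = first_overfit_epoch(fast_train, fast_val)
slow_result = first_overfit_epoch(slow_train, slow_val)
```

If the slow run eventually shows the same divergence at a later epoch, that would suggest the lower LR postponed the overfitting rather than preventing it.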







  • I am of the opinion that security through obscurity (of model weights) does not work.

    These models would have to be consistently more capable than the current state of the art, and by orders of magnitude, to carry out the threats that have been proposed as pseudo-realistic risks.

    Open models are at a stage where their general performance is not directly comparable to GPT4's, so a bad actor running one on their own compute, rather than on scraped GPT API keys, poses a much smaller threat. I'd maybe start to sweat if GPT4 were getting better instead of worse with every rollout.

    This is also another alignment paper that cites theoretical examples of biochemical terrorism. We live in a post-internet era where that kind of information has already reached the people most capable of acting on it, but the same era has consequently also made those kinds of attacks much more difficult to carry out.

    As the number of possible attack vectors increases, the number of ways to counter those attacks also increases.