I'd like a bit more information from people who have installed Llama 2 locally and are using it to support web apps, or just for local use.

  • What is the ideal hardware for the 70B version?
  • How many tokens per second can this hardware process, for both input and output?
  • Regarding safety, since it would be used for business, what is the chance that the model ends up arguing with a customer 😊?