I am building a system to flag potentially fraudulent transactions. The flagged transactions will then be manually verified, which in turn helps me build a labelled dataset over time.

For now I have transaction data and customer behavior information.

Here is how I intend to approach it. The possible fraud cases include:

- Abuse of all cashbacks and discounts (coupons / vouchers / auto refund)
- Retailing
- Acquiring sensitive SKUs

Since I don’t really have labelled data, I am going with an unsupervised learning approach (isolation forest). I plan on having 3 modules: Users, SKUs, and Localities.

For the last two I am struggling to set meaningful thresholds. I standardized the slope and the intercept of the sales trend, then divided them to get a compound variable, which I am using for comparison.

Please share thoughts and/or resources.
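To make the compound variable concrete, here is a minimal sketch using only the Python standard library. Several things in it are assumptions rather than details from my actual pipeline: the `weekly_sales` data is made up, the trend is an ordinary least-squares line over the period index, the division is taken as standardized slope over standardized intercept, and the median/MAD cutoff with `k=3.5` is just one possible way to set a threshold.

```python
from statistics import mean, median, pstdev


def fit_line(sales):
    """Least-squares slope and intercept of sales against period index 0..n-1."""
    xs = range(len(sales))
    x_bar, y_bar = mean(xs), mean(sales)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def flag_outliers(scores, k=3.5):
    """Flag scores lying more than k scaled MADs away from the median.

    The cutoff k is an assumption to tune, not a recommended value.
    """
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    if mad == 0:
        # No spread to measure against, so nothing is flagged.
        return [False] * len(scores)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return [abs(s - med) > k * scale for s in scores]


# Hypothetical weekly unit sales per SKU (illustrative only).
weekly_sales = {
    "sku_a": [10, 11, 12, 13, 14, 15],
    "sku_b": [20, 21, 19, 22, 23, 22],
    "sku_c": [5, 5, 6, 6, 7, 7],
    "sku_d": [30, 29, 31, 30, 32, 31],
    "sku_e": [8, 9, 9, 10, 10, 11],
    "sku_f": [4, 9, 20, 38, 61, 90],
}

skus = list(weekly_sales)
slopes, intercepts = zip(*(fit_line(weekly_sales[s]) for s in skus))
z_slope, z_intercept = zscores(slopes), zscores(intercepts)

# Compound variable: standardized slope divided by standardized intercept
# (the direction of the division is an assumption).
ratios = [zs / zi for zs, zi in zip(z_slope, z_intercept)]

for sku, ratio, flagged in zip(skus, ratios, flag_outliers(ratios)):
    print(f"{sku}: ratio={ratio:.3f} flagged={flagged}")
```

One thing the sketch makes visible: a standardized intercept close to zero inflates the ratio regardless of the slope, so the largest ratios do not necessarily belong to the SKUs with the steepest trend. That may be part of why thresholds on the ratio are hard to set. Keeping the two standardized values as separate features could be worth comparing against the single ratio.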

  • flamboyantkoala@programming.dev · 10 months ago

    This seems like it’ll be a really long journey without labeled data. Do you have an idea of how much fraud is in your sample set?