The benchmark with 1B rows in this blog post seems irrelevant for comparing the performance of different programming languages.

It seems like a program's execution time would be dominated by loading the data from the file. And a lot of people posted solutions with CPU specs but not disk specs (HDD, SSD, RAID), even though that seems more relevant.

Why would they compare languages and solutions in this way?

  • aubeynarf@lemmynsfw.com · 9 months ago

    For most organizations, the cost of paying programmers far exceeds the cost of CPU time; benchmarks really should include how long the solution took to envision/implement and how many follow-up commits were required to tune it.

    • aluminium@lemmy.world · 9 months ago

      Also, big “enterprise” software usually becomes slow due to fundamental design or architectural issues.

      For example, I worked on maintaining an old Java EE project where people constantly made multiple sequential HTTP requests even though the requests didn't depend on one another.
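
      To make the difference concrete, here is a minimal sketch (the endpoints and class name are made up, not from that project) of how independent calls could be issued concurrently with java.net.http on a recent JDK instead of one after another:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ParallelRequests {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical endpoints standing in for the independent calls that
        // were being made one after another.
        List<URI> endpoints = List.of(
                URI.create("https://example.com/customers"),
                URI.create("https://example.com/orders"),
                URI.create("https://example.com/inventory"));

        // Fire all requests at once instead of waiting for each response in turn.
        List<CompletableFuture<HttpResponse<String>>> futures = endpoints.stream()
                .map(uri -> client.sendAsync(
                        HttpRequest.newBuilder(uri).GET().build(),
                        HttpResponse.BodyHandlers.ofString()))
                .toList();

        // Wait for all of them; total latency is roughly the slowest single call,
        // not the sum of all calls.
        CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();
        futures.forEach(f -> System.out.println(f.join().statusCode()));
    }
}
```

      Since the calls don't depend on each other, the wall-clock cost drops from the sum of the response times to roughly the slowest one, and no amount of language-level micro-optimization would have bought that back.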

  • killeronthecorner@lemmy.world · 9 months ago

    To answer your question about environmental and hardware factors - from the repo:

    Results are determined by running the program on a Hetzner Cloud CCX33 instance (8 dedicated vCPU, 32 GB RAM). The time program is used for measuring execution times, i.e. end-to-end times are measured.
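
    In other words, what gets timed is the whole process, not just the hot loop. As a rough illustration (not the actual harness, which simply wraps each run in time; the script name below is just a stand-in for a submission's run script), end-to-end measurement looks roughly like this:

```java
import java.io.IOException;

public class EndToEndTimer {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical launch command standing in for a submission's run script.
        ProcessBuilder pb = new ProcessBuilder("./calculate_average.sh").inheritIO();

        long start = System.nanoTime();
        int exitCode = pb.start().waitFor(); // blocks until the whole process exits,
                                             // so startup, file reading and parsing
                                             // are all included in the measurement
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.printf("exit=%d, end-to-end=%d ms%n", exitCode, elapsedMs);
    }
}
```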

  • orhtej2@eviltoast.org · 9 months ago

    I would assume they want to factor in startup time as well as I/O handling overhead; raw disk I/O should be the same given that the programs are run in the same environment.
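
    On the "I/O handling overhead" part: even with the disk and page cache held constant, programs can consume the same file very differently. A toy Java illustration (it only counts field separators rather than doing the real aggregation, and the input path comes from the command line) comparing a plain buffered reader with a memory-mapped scan:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IoStrategies {
    // Line-by-line buffered reading: simple, but pays per-line String allocation
    // and copying costs on top of the actual disk reads.
    static long countSeparatorsBuffered(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == ';') count++;
                }
            }
        }
        return count;
    }

    // Memory-mapped reading: the same bytes, scanned directly from the page cache.
    // A single MappedByteBuffer is limited to 2 GB, so a full-size input would
    // have to be mapped in chunks; this is only a sketch.
    static long countSeparatorsMapped(Path file) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            while (buffer.hasRemaining()) {
                if (buffer.get() == ';') count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]);
        System.out.println("buffered: " + countSeparatorsBuffered(file));
        System.out.println("mapped:   " + countSeparatorsMapped(file));
    }
}
```

    The point is only that the per-byte handling cost differs between the two approaches even when the raw disk reads are identical, which is exactly the overhead the benchmark ends up comparing.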