GBNF is super powerful, and anyone developing software with local LLMs should learn how to use it. As part of my larger open source project, Swiss Army Llama, I recently made a couple of very handy tools for working with GBNF grammars. You can supply either an example JSON or a Pydantic data model, and it will automatically generate the complete GBNF grammar reflecting the same fields. It even supports some degree of nesting. There is also a tool that takes a complete GBNF grammar specification and validates it. You can see how I implemented these tools here:
https://github.com/Dicklesworthstone/swiss_army_llama/blob/main/grammar_builder.py
Or if you just want to use the tools, you can install my project:
https://github.com/Dicklesworthstone/swiss_army_llama/tree/main
Then just find the relevant endpoints on the Swagger page, which makes it super easy to try them out.
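To give a rough idea of what "example JSON in, GBNF out" can look like, here is a minimal sketch. It is not the code from grammar_builder.py; the function name `example_to_gbnf`, the `root-<key>` rule naming, and the simplified primitive rules are all assumptions made for illustration. It also assumes keys must appear in the same order as in the example and that list items all look like the first item.

```python
import json
import re

# Simplified primitive rules (an assumption for this sketch, not a full JSON spec).
PRIMITIVES = {
    "string": r'"\"" ([^"\\] | "\\" ["\\/bfnrt])* "\""',
    "number": r'"-"? [0-9]+ ("." [0-9]+)?',
    "boolean": r'"true" | "false"',
    "null": r'"null"',
    "ws": r'[ \t\n]*',
}


def _rule_name(key: str) -> str:
    """Turn a JSON key into a rule-name fragment of lowercase letters, digits, dashes."""
    return re.sub(r"[^a-z0-9]+", "-", key.lower()).strip("-") or "field"


def _key_literal(key: str) -> str:
    """Build the GBNF string literal that matches the quoted JSON key."""
    quoted = json.dumps(key)  # e.g. "name" (with the JSON quotes)
    return '"' + quoted.replace("\\", "\\\\").replace('"', '\\"') + '"'


def example_to_gbnf(example: dict) -> str:
    """Hypothetical helper: derive a GBNF grammar from one example JSON object."""
    if not isinstance(example, dict):
        raise TypeError("this sketch expects a JSON object at the top level")
    rules = {}    # rule name -> rule body
    used = set()  # primitive rules actually referenced

    def rule_for(value, name):
        if isinstance(value, bool):  # must come before int: bool is an int subclass
            used.add("boolean")
            return "boolean"
        if isinstance(value, (int, float)):
            used.add("number")
            return "number"
        if isinstance(value, str):
            used.add("string")
            return "string"
        if value is None:
            used.add("null")
            return "null"
        if isinstance(value, list):
            if value:
                item = rule_for(value[0], name + "-item")
            else:  # empty example list: assume strings
                used.add("string")
                item = "string"
            rules[name] = f'"[" ws ({item} (ws "," ws {item})*)? ws "]"'
            return name
        if isinstance(value, dict):
            parts = []
            for key, child in value.items():
                child_rule = rule_for(child, f"{name}-{_rule_name(key)}")
                parts.append(f'{_key_literal(key)} ws ":" ws {child_rule}')
            body = ' ws "," ws '.join(parts)
            rules[name] = f'"{{" ws {body} ws "}}"' if parts else '"{" ws "}"'
            return name
        raise TypeError(f"unsupported value type: {type(value).__name__}")

    rule_for(example, "root")
    lines = [f"root ::= {rules.pop('root')}"]
    lines += [f"{name} ::= {body}" for name, body in rules.items()]
    lines += [f"{n} ::= {PRIMITIVES[n]}" for n in PRIMITIVES if n in used or n == "ws"]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"name": "Ada", "age": 36, "tags": ["x"], "address": {"city": "Zurich"}}
    print(example_to_gbnf(sample))
```

The resulting string can be passed wherever your runtime accepts a GBNF grammar. Forcing a fixed key order keeps the grammar small; allowing any order or optional fields would need more rules than this sketch generates.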
No, it’s using Whisper for the transcripts, and Whisper doesn’t currently support speaker diarisation. Even if I tried to add that using one of the projects that claim to have it, you would still need to manually label which speaker is which. Since my goal was to get something totally automated that could just grind through a huge playlist, it didn’t seem worth it.