Hi Emacs community,

I’m an elisp noob, and I recently wrote a function to get the references on a Wikipedia page. I plan on using it with org-mode/org-roam so I can do research faster (even though there’s probably already a package for that sort of thing). Unfortunately, it’s not as robust as I’d like: some of the DOIs/ISBNs come back missing on some of the Wikipedia pages I’ve tested. Here it is for reference:

(require 'url)
(require 'dom)

(defun get-wikipedia-references (subject)
  "Return an alist of (ID . TITLE) pairs for the references of Wikipedia article SUBJECT."
  (let ((wikipedia-prefix-url "https://en.wikipedia.org/wiki/"))
    (with-current-buffer
        (url-retrieve-synchronously (concat wikipedia-prefix-url subject))
      ;; The response buffer holds the HTTP headers, an empty line, then the body.
      (goto-char (point-min))
      (re-search-forward "^$")
      (let ((dom (libxml-parse-html-region (1+ (point)) (point-max)))
            (result))
        (dolist (cite-tag (dom-by-tag dom 'cite) result)
          ;; Default to "" so `string-search' never receives nil.
          (let ((cite-class (or (dom-attr cite-tag 'class) "")))
            (cond
             ;; Journal citation: the ID is the text of the doi.org link and
             ;; the title is the text between the first pair of double quotes.
             ((string-search "journal" cite-class)
              (let ((a-tag (dom-search cite-tag
                                       (lambda (tag)
                                         (string-prefix-p "https://doi.org" (dom-attr tag 'href))))))
                (setq result (cons (cons (concat "doi:" (dom-text a-tag))
                                         (let* ((cite-texts (dom-texts cite-tag))
                                                (title-beg (1+ (string-search "\"" cite-texts)))
                                                (title-end (string-search "\"" cite-texts title-beg)))
                                           (substring cite-texts title-beg title-end)))
                                   result))))
             ;; Book citation: the ID is the ISBN inside the BookSources link
             ;; and the title is the text of the <i> child.
             ((string-search "book" cite-class)
              (let ((a-tag (dom-search cite-tag
                                       (lambda (tag)
                                         (string-prefix-p "/wiki/Special:BookSources" (dom-attr tag 'href))))))
                (setq result (cons (cons (concat "isbn:" (dom-text (dom-child-by-tag a-tag 'bdi)))
                                         (dom-text (dom-child-by-tag cite-tag 'i)))
                                   result))))
             ;; Anything else: use the URL and text of the first direct <a> child.
             (t
              (let ((a-tag (dom-child-by-tag cite-tag 'a)))
                (setq result (cons (cons (dom-attr a-tag 'href) (dom-text a-tag)) result)))))))))))

(get-wikipedia-references "Graph_traversal")
(("doi:10.1109/SFCS.1979.34" . "Random walks, universal traversal sequences, and the complexity of maze problems")
 ("doi:10.1016/j.tcs.2015.11.017" . "Lower and upper competitive bounds for online directed graph exploration")
 ("doi:10.1016/j.tcs.2020.06.007" . "Online graph exploration on a restricted graph class: Optimal solutions for tadpole graphs")
 ("doi:10.1587/transinf.E92.D.1620" . "The Online Graph Exploration Problem on Restricted Graphs")
 ("doi:10.1016/j.tcs.2021.04.003" . "An improved lower bound for competitive graph exploration")
 ("doi:10.1137/0206041" . "An Analysis of Several Heuristics for the Traveling Salesman Problem"))

And yes, I know that I could probably use a library like s, dash, seq, or cl, but I try to keep my elisp functions free of those kind of things. I would appreciate any criticism from the Emacs community about my elisp!
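
One possible cause of the missing identifiers mentioned above (an assumption, not checked against any particular article) is a journal citation that has no doi.org link. In that case `dom-search` returns nil, `dom-text` of nil is the empty string, and the ID collapses to the bare string "doi:". The sketch below returns nil instead, so such entries are easy to spot. `my/cite-doi` is a hypothetical helper name, and the nodes are hand-built stand-ins for parsed `<cite>` elements.

```elisp
(require 'dom)

(defun my/cite-doi (cite-tag)
  "Return \"doi:ID\" for CITE-TAG, or nil when it has no doi.org link.
Hypothetical helper for illustration; not part of dom.el."
  (let ((a-tag (car (dom-search
                     cite-tag
                     (lambda (tag)
                       (let ((href (dom-attr tag 'href)))
                         (and href (string-prefix-p "https://doi.org" href))))))))
    (when a-tag
      (concat "doi:" (dom-text a-tag)))))

;; A citation that carries a DOI link (made-up author and title).
(my/cite-doi
 (dom-node 'cite '((class . "citation journal cs1"))
           "Doe, J. \"A paper\". "
           (dom-node 'a '((href . "https://doi.org/10.1137/0206041"))
                     "10.1137/0206041")))
;; => "doi:10.1137/0206041"

;; A citation whose only link points somewhere else.
(my/cite-doi
 (dom-node 'cite '((class . "citation journal cs1"))
           "Doe, J. \"Another paper\". "
           (dom-node 'a '((href . "https://example.org/paper"))
                     "Archived copy")))
;; => nil
```

The same reasoning would apply to the "isbn:" branch when a book citation has no Special:BookSources link, or when the `bdi` element is not a direct child of that link.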

  • github-alphapapa · 11 months ago

    My first suggestion would be to use plz for HTTP. Then I’d use cl-loop and pcase to simplify the rest of the code. Here’s a partial rewrite with a TODO for further exercise. :)

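    (require 'cl-lib)
    (require 'dom)
    (require 'plz)  ; third-party HTTP library; must be installed separately

    (defun wikipedia-article-references (subject)
      "Return an alist of (ID . TITLE) references for Wikipedia article SUBJECT."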
      (let* ((url (format "https://en.wikipedia.org/wiki/%s" (url-hexify-string subject)))
             (dom (plz 'get url :as #'libxml-parse-html-region)))
        (cl-loop for cite-tag in (dom-by-tag dom 'cite)
                 for cite-class = (dom-attr cite-tag 'class)
                 collect (pcase cite-class
                           ((rx "journal")
                            (let ((a-tag (dom-search cite-tag
                                                     (lambda (tag)
                                                       (string-prefix-p "https://doi.org" (dom-attr tag 'href))))))
                              (cons (concat "doi:" (dom-text a-tag))
                                    ;; TODO: Use `string-match' with `rx' and `match-string' here.
                                    (let* ((cite-texts (dom-texts cite-tag))
                                           (title-beg (1+ (string-search "\"" cite-texts)))
                                           (title-end (string-search "\"" cite-texts (1+ title-beg))))
                                      (substring cite-texts title-beg title-end)))))
                           ((rx "book")
                            (let ((a-tag (dom-search cite-tag
                                                     (lambda (tag)
                                                       (string-prefix-p "/wiki/Special:BookSources" (dom-attr tag 'href))))))
                              (cons (concat "isbn:" (dom-text (dom-child-by-tag a-tag 'bdi)))
                                    (dom-text (dom-child-by-tag cite-tag 'i)))))
                           (_ (let ((a-tag (assoc 'a cite-tag)))
                                (cons (dom-attr a-tag 'href) (dom-text a-tag))))))))
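
    One way the TODO above could be filled in is sketched here. It assumes the title is the first run of text wrapped in straight double quotes, which is what the `string-search` version also assumes. `my/cite-title` is a hypothetical helper name, and the sample string is made up.

    ```elisp
    (require 'rx)

    (defun my/cite-title (cite-texts)
      "Return the first double-quoted substring of CITE-TEXTS, or nil if none.
    Hypothetical helper for illustration; not part of any library."
      (when (string-match (rx "\"" (group (+ (not (any "\"")))) "\"")
                          cite-texts)
        (match-string 1 cite-texts)))

    (my/cite-title "Doe, J. (1979). \"Random walks\". Some Proceedings.")
    ;; => "Random walks"

    (my/cite-title "A citation with no quoted title")
    ;; => nil
    ```

    Unlike the `substring` arithmetic, this returns nil rather than signaling an error when the citation text contains no quoted title.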
    

    Regarding this:

    > And yes, I know that I could probably use a library like s, dash, seq, or cl, but I try to keep my elisp functions free of those kind of things

    First of all, cl-lib and seq are built into Emacs and are used in core Emacs code. There’s no reason not to use them. Second, dash and s are on ELPA and are widely used; it’s largely a matter of style, but they are solid libraries, so again, no reason not to use them. They don’t have cooties. ;)
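
    As a small illustration of the built-in libraries, here is a sketch that post-processes an alist shaped like the function's output using only seq and cl-lib. The helper names `my/complete-reference-p` and `my/reference-titles` are hypothetical, and apart from the one DOI entry taken from the output above, the sample data is made up.

    ```elisp
    (require 'cl-lib)
    (require 'seq)

    (defun my/complete-reference-p (ref)
      "Return non-nil if REF, an (ID . TITLE) cons, has a usable ID.
    An ID such as \"doi:\" with nothing after the colon counts as missing."
      (let ((id (car ref)))
        (and (stringp id)
             (not (string-suffix-p ":" id)))))

    (defun my/reference-titles (scheme refs)
      "Return the titles in REFS whose ID starts with SCHEME, e.g. \"doi:\"."
      (cl-loop for (id . title) in (seq-filter #'my/complete-reference-p refs)
               when (string-prefix-p scheme id)
               collect title))

    (my/reference-titles
     "doi:"
     '(("doi:10.1137/0206041"
        . "An Analysis of Several Heuristics for the Traveling Salesman Problem")
       ("doi:" . "A journal citation with no DOI link")
       ("isbn:0000000000" . "A placeholder book")
       (nil . "A citation with no link at all")))
    ;; => ("An Analysis of Several Heuristics for the Traveling Salesman Problem")
    ```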

    • ElfOfPi (OP) · 11 months ago

      I read a Reddit post saying that using cl-lib was kind of a bad thing, and I think I’ve always had a fear that using libraries in my config would just make it more bloated or slow Emacs down. But after all the comments here, I think I’ll change my stance on that.