Add usage examples #14

ustun · 2014-10-16T11:15:16Z

Once the HTML is parsed, how can I most efficiently query the parsed document? That is, I would want to be able to drill down as if it were a map:

(get-in x [:html :head :title])

It would be great if you added some recommendations on how to do that transformation (for example, https://github.com/cjohansen/hiccup-find looks promising).

collinalexbell · 2015-03-04T16:09:03Z

Ditto this. As a Clojure noob, this vector thing confuses the hell out of me.

nathell · 2015-03-04T16:44:58Z

Thanks for chiming in!

A quick-and-dirty solution could be something along the lines of (untested, might be buggy):

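(defn get-in-html
  "Follows a path of tag keywords down a [tag attrs & children] tree;
  returns the first matching element, or nil if the path does not exist."
  [tree [tag & tags]]
  (if tag
    (when tree
      ;; Only vector children are elements; skip the attribute map and text.
      (recur (first (filter #(and (vector? %) (= (first %) tag)) (rest tree))) tags))
    tree))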

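Note that you'd want to call it as (get-in-html x [:head :title]), bypassing the :html: x itself is already the :html element, and the function only searches the children of the tree it is given.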
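To make the call concrete, here is a minimal sketch. It assumes the parsed document is a nested vector of the form [tag attrs & children]; the sample tree `doc` is made up for illustration, and the function is repeated (with a `vector?` guard) only so the snippet runs on its own.

```clojure
;; Hypothetical sample tree in the [tag attrs & children] shape.
(def doc
  [:html {}
   [:head {}
    [:title {} "Example page"]]
   [:body {}
    [:p {:class "intro"} "Hello"]]])

(defn get-in-html [tree [tag & tags]]
  (if tag
    (when tree
      ;; Look only at vector children, i.e. child elements.
      (recur (first (filter #(and (vector? %) (= (first %) tag)) (rest tree)))
             tags))
    tree))

;; The path starts below the root :html element.
(def title-node (get-in-html doc [:head :title]))

;; A path that does not exist yields nil instead of throwing.
(def missing (get-in-html doc [:head :meta]))
```

Here `title-node` is the whole `[:title {} "Example page"]` element rather than just its text, so the text still has to be pulled out of the children, for example with `(last title-node)` in this simple case.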

This is very simplistic and only supports seqs of tags. If you want to extract arbitrary subtrees, you may want to take a look at Enlive. (When I have free time, I intend to explore the possibility of integrating clj-tagsoup and Enlive, as I feel that both projects might benefit from this.)
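Short of pulling in Enlive, a rough sketch of finding elements at any depth can be written with `tree-seq` from clojure.core. This is not Enlive's API; `find-tag` and the sample tree `page` are hypothetical names for illustration, and the [tag attrs & children] shape is again assumed.

```clojure
;; Hypothetical sample tree in the [tag attrs & children] shape.
(def page
  [:html {}
   [:body {}
    [:div {}
     [:p {} "first"]]
    [:p {} "second"]]])

(defn find-tag
  "Returns a lazy seq of every element in tree whose tag equals tag,
  in depth-first document order."
  [tree tag]
  (->> (tree-seq vector? #(drop 2 %) tree) ; children follow the tag and attrs
       (filter #(and (vector? %) (= (first %) tag)))))

(def paragraphs (find-tag page :p))
```

Unlike a path lookup, this ignores where in the document an element sits, so it is better suited to questions like "all the :p elements" than "the :title inside :head".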
