Yesterday I reviewed a library that applies CSS3 selectors to HTML nodes produced by Plump. It allowed us to make our Twitter crawler more concise.
lQuery takes this a step further: it lets you describe a whole HTML processing pipeline in a very declarative way:
POFTHEDAY> (lquery:$ (initialize (dex:get "https://twitter.com/shinmera"))
             ".tweet-text"
             (render-text)
             (map (alexandria:curry #'str:shorten 40))
             (lt 5))
#("Hi, I'm a #gamedev. My latest project..."
"Aw thank you, here's the whole story ..."
"雨林pic.twitter.com/BFwcd0AWSE"
"いらっしゃいませ~!pic.twitter.com/wwaWDD6B3Q"
"The logic of Splatoon.pic.twitter.com...")
Each “call” here is a special function that is applied either to the whole set of HTML nodes or to each node in the set. All lQuery functions are defined in the lquery-funcs package and are covered in the lQuery documentation.
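To see the difference between per-node and whole-set steps without hitting the network, here is a small sketch. It assumes Quicklisp is installed; the sample markup and the FIRST-TWO-ITEMS helper are made up for illustration and are not part of lQuery.

```lisp
;; Assumption: Quicklisp is available; loading lQuery also pulls in Plump.
(ql:quickload :lquery)

;; Hypothetical sample markup standing in for a downloaded page.
(defparameter *sample-html*
  "<ul><li>First</li><li>Second</li><li>Third</li></ul>")

(defun first-two-items (html)
  "Return the text of the first two <li> elements found in HTML."
  (lquery:$ (initialize html) ; parse the string into a document
    "li"                      ; selector: narrow the set to all <li> nodes
    (text)                    ; node function: called for each node in the set
    (lt 2)))                  ; list function: called once on the whole set

(first-two-items *sample-html*)
```

Here (text) runs once per node, while (lt 2) receives the whole vector and keeps the elements with an index below 2.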
You can add your own data-processing functions with define-lquery-function and define-lquery-list-function. This is useful when you repeat some operation often. For example, let’s write a function that shortens strings!
First, we need to define an lQuery function. It will process one element of the set at a time (here, a string produced by render-text):
POFTHEDAY> (lquery:define-lquery-function shorten (text max-length)
             (check-type text string)
             (check-type max-length (integer 0 65535))
             (str:shorten max-length text))
LQUERY-FUNCS::SHORTEN
Now we can use it to make our web crawler even more beautiful!
POFTHEDAY> (lquery:$ (initialize (dex:get "https://twitter.com/shinmera"))
             ".tweet-text"
             (render-text)
             (shorten 40)
             (lt 5))
#("Hi, I'm a #gamedev. My latest project..."
"Aw thank you, here's the whole story ..."
"雨林pic.twitter.com/BFwcd0AWSE"
"いらっしゃいませ~!pic.twitter.com/wwaWDD6B3Q"
"The logic of Splatoon.pic.twitter.com...")
There are other define-* macros in lQuery as well. Read its documentation to learn more about how to extend it. It would be nice if @shinmera added more examples of how to extend lQuery!
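As one more extension example, here is a rough sketch of define-lquery-list-function, which was mentioned above but not shown. Unlike define-lquery-function, its body receives the whole vector of working nodes at once. TAKE-LAST and LAST-TWO-ITEMS are hypothetical names I made up for this sketch, and it assumes Quicklisp is installed.

```lisp
;; Assumption: Quicklisp is available.
(ql:quickload :lquery)

;; TAKE-LAST is a hypothetical name, not something lQuery ships with.
;; The first parameter is bound to the whole vector of working nodes.
(lquery:define-lquery-list-function take-last (working-nodes n)
  (check-type n (integer 0))
  (let ((len (length working-nodes)))
    ;; Keep the trailing N elements, or everything if there are fewer.
    (subseq working-nodes (max 0 (- len n)))))

(defun last-two-items (html)
  "Return the text of the last two <li> elements found in HTML."
  (lquery:$ (initialize html)
    "li"
    (text)
    (take-last 2)))

(last-two-items "<ul><li>First</li><li>Second</li><li>Third</li></ul>")
```

The new function has to be defined before any lquery:$ form that uses it is compiled, so keep the definition above its first use.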