cl-reddit

This is an API wrapper for Reddit.

Today I found an interesting Reddit thread where SpaceX's software developers answer questions. I wondered whether there was any discussion of Lisp in space.

But the post has about 8000 comments, and I didn't find a way to search within a single post's comments on Reddit. So I decided to use cl-reddit to fetch all the post's comments and search them for the term "lisp".

Here is how you can log in to Reddit and list your subreddits:

POFTHEDAY> (defvar *user*
             (cl-reddit:api-login
              :username "svetlyak40wt"
              :password *password*))

;; This is how we can retrieve a list of my subreddits:

POFTHEDAY> (mapcar #'cl-reddit:subreddit-title
                   (cl-reddit::get-reddits-mine *user*))
("programming" "Lisp Advocates" "Lisp" "Scheme Programming Language articles"
 "M-x emacs-reddit" "Web Startups" "Common Lisp" "Filmmakers" "coding"
 "Github: social coding" "EarthPorn: Amazing images of light and landscape"
 "LISP ja" "Learn Lisp" "Startup Accelerators" "defunkydrummer" "" "" ""
 "Steel Bank Common Lisp"
 "(find-if (alexandria:conjoin #'funny-p #'about-lisp-p) *reddit*)")

But we want to search the post's comments, right?

To do that, we need to extract the post id from this link:

https://www.reddit.com/r/spacex/comments/gxb7j1/we_are_the_spacex_software_team_ask_us_anything/

and fetch the comment tree using cl-reddit:

POFTHEDAY> (cl-reddit::get-comments "gxb7j1" *user*)

(#<CL-REDDIT:COMMENT {1002009693}> #<CL-REDDIT:COMMENT {10020098F3}>
 #<CL-REDDIT:COMMENT {1002016BF3}> #<CL-REDDIT:COMMENT {1002016CB3}>
 ...
 #<CL-REDDIT:COMMENT {1002016D73}> #<CL-REDDIT:MORE {1002016E33}>)

POFTHEDAY> (length *)
151
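
The post id passed to get-comments above is the path segment that follows /comments/ in the link. As a minimal sketch, a small helper could pull it out of the URL. The name extract-post-id is hypothetical and not part of cl-reddit, and the helper assumes a plain permalink without a query string:

```lisp
;; Hypothetical helper, not part of cl-reddit: take the path segment
;; that follows "/comments/" in a Reddit permalink.
(defun extract-post-id (url)
  "Return the post id from URL, or NIL if URL has no /comments/ segment."
  (let* ((marker "/comments/")
         (start (search marker url)))
    (when start
      (let* ((id-start (+ start (length marker)))
             ;; The id ends at the next slash or at the end of the string.
             (id-end (or (position #\/ url :start id-start)
                         (length url))))
        (when (< id-start id-end)
          (subseq url id-start id-end))))))
```

Called with the SpaceX link, it should return the same "gxb7j1" string used in the REPL session.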

Well, we received 151 items, but the real comment count is about 7.7k. This is because the remaining comments are either nested replies or hidden behind CL-REDDIT:MORE objects, which contain only the ids of comments that were not loaded:

POFTHEDAY> (defparameter *comments*
              (cl-reddit::get-comments "gxb7j1" *user*))

POFTHEDAY> (rutils:last-elt *comments*)
#<CL-REDDIT:MORE {1002016E33}>

POFTHEDAY> (cl-reddit:more-children *)
("ft35r7m" "ft0rpxi" "ft0jn61" "ft34002" "ft4b6z4"
 ...
 "ft0pado" ...)

POFTHEDAY> (cl-reddit:comment-replies
               (first *comments*))
(#<CL-REDDIT:MORE {1003607093}>)

POFTHEDAY> (cl-reddit:comment-replies
               (second *comments*))
(#<CL-REDDIT:COMMENT {1007607293}>)

cl-reddit does not provide a comment walker, so we need to write one.

This walker expands all MORE items and collects the comments into a flat list:

POFTHEDAY> (let ((post-id "gxb7j1"))
             (uiop:while-collecting (collect)
               (labels ((visit (item)
                          (etypecase item
                            (cl-reddit:comment
                             (collect item)
                             (mapc #'visit
                                   (cl-reddit:comment-replies
                                    item)))
                            (cl-reddit:more
                             (expand-more item))))
                        (expand-more (more)
                          (loop for id in (cl-reddit:more-children more)
                                for comments = (cl-reddit::get-comments
                                                post-id
                                                *user*
                                                :comment id)
                                do (mapc #'visit
                                         comments))))
                 (mapc #'visit
                       (cl-reddit::get-comments post-id
                                                *user*)))))
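
To try the traversal logic without hitting the Reddit API, here is a self-contained sketch of the same walk over mock data, followed by the search step that was the goal all along. The structs mock-comment and mock-more and the *mock-store* table are stand-ins invented for this illustration. They are not cl-reddit types, and the real code fetches each id over HTTP instead of looking it up in a hash table:

```lisp
;; Mock stand-ins for CL-REDDIT:COMMENT and CL-REDDIT:MORE
;; (hypothetical, for illustration only).
(defstruct mock-comment body replies)
(defstruct mock-more children)

;; Simulates the server: maps a comment id to the comment it would return.
(defparameter *mock-store* (make-hash-table :test #'equal))

(defun walk-comments (items)
  "Return a flat list of every comment reachable from ITEMS, expanding
MORE placeholders by looking their ids up in *MOCK-STORE*."
  (let ((result '()))
    (labels ((visit (item)
               (etypecase item
                 (mock-comment
                  (push item result)
                  (mapc #'visit (mock-comment-replies item)))
                 (mock-more
                  ;; Ids missing from the store are silently skipped.
                  (dolist (id (mock-more-children item))
                    (let ((fetched (gethash id *mock-store*)))
                      (when fetched
                        (visit fetched))))))))
      (mapc #'visit items))
    (nreverse result)))

(defun search-comments (term comments)
  "Return the comments whose body contains TERM, ignoring case."
  (remove-if-not (lambda (comment)
                   (search term (mock-comment-body comment)
                           :test #'char-equal))
                 comments))
```

With a flat list in hand, finding the Lisp discussion is a single call such as (search-comments "lisp" flat-list). For real cl-reddit comments the body accessor would have to be swapped in; the post does not show its name, so it is left out here.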

When I ran this code for the first time, it broke my Lisp. A quick investigation revealed a resource leak: the library uses Drakma to fetch data and, instead of reading response bodies, requests a stream.

Drakma's documentation says that you might need to close the stream. If you don't, the connection to the server remains open.

It seems the library's author never used it to make thousands of requests.
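
The usual Common Lisp remedy for this kind of leak is to make sure the stream gets closed even when reading fails, for example with the standard with-open-stream macro. The sketch below is not the actual patch to cl-reddit. The function read-all-and-close and its opener argument are hypothetical names, and a string stream can stand in for the socket stream that Drakma would return:

```lisp
(defun read-all-and-close (opener)
  "Call OPENER to get a character input stream, read everything from it,
and close the stream even if reading signals an error.
Returns the text and the (now closed) stream as two values."
  (let ((stream (funcall opener)))
    ;; WITH-OPEN-STREAM closes the stream on normal and non-local exit alike.
    (with-open-stream (s stream)
      (values (with-output-to-string (out)
                (loop for char = (read-char s nil nil)
                      while char
                      do (write-char char out)))
              s))))
```

With Drakma, the opener could be something like (lambda () (drakma:http-request url :want-stream t)), whose first return value is the stream. That usage is an assumption about how the library could be wired up, not a description of what cl-reddit does.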

I fixed the leak, but fetching comments was still too slow: about 2 comments per second. This is because a MORE item contains only comment ids, and I have to fetch those comments one by one.

I tried to parallelize the fetching process using lparallel, which I reviewed two days ago:

POFTHEDAY> (defparameter *all-comments*
             (let ((post-id "gxb7j1")
                   (counter 0))
               (uiop:while-collecting (collect)
                 (labels ((visit (item)
                            (etypecase item
                              (cl-reddit:comment
                               (collect item)
                               (incf counter)
                               (when (zerop (mod counter 10))
                                 (log:info "~A comments collected"
                                           counter))
                               (mapc #'visit (cl-reddit:comment-replies item)))
                              (cl-reddit:more
                               (expand-more item))))
                          (expand-more (more)
                            (log:info "Expanding ~A" more)
                            (loop with more-ids = (cl-reddit:more-children more)
                                  with replies = (lparallel:pmapcar
                                                  (lambda (id)
                                                    (cl-reddit::get-comments
                                                     post-id
                                                     *user*
                                                     :comment id))
                                                  more-ids)
                                  for comments in replies
                                  do (lparallel:pmapc #'visit comments))))
                   (lparallel:pmapc #'visit
                                    (cl-reddit::get-comments post-id
                                                             *user*))))))

But I ran into these strange errors:

The condition Socket error in "connect": EINTR (Interrupted system call) occurred with errno: 0.

Condition USOCKET:TIMEOUT-ERROR was signalled.
   [Condition of type USOCKET:TIMEOUT-ERROR]

Restarts:
 0: [TRANSFER-ERROR] Transfer this error to a dependent thread, if one exists.
 1: [KILL-ERRORS] Kill errors in workers (remove debugger instances).
 2: [ABORT] abort thread (#<THREAD "lparallel" RUNNING {10071B3DE3}>)

Backtrace:
 0: (USOCKET::HANDLE-CONDITION #<SB-EXT:TIMEOUT {10059E7C53}> #<USOCKET:STREAM-USOCKET {100871AB43}> "www.reddit.com")
 1: (SB-KERNEL::%SIGNAL #<SB-EXT:TIMEOUT {10059E7C53}>)
 2: (ERROR SB-EXT:TIMEOUT)
 3: (USOCKET:SOCKET-CONNECT "www.reddit.com" 80 :PROTOCOL :STREAM :ELEMENT-TYPE FLEXI-STREAMS:OCTET :TIMEOUT 20 :DEADLINE NIL :NODELAY :IF-SUPPORTED :LOCAL-HOST NIL :LOCAL-PORT NIL)
 4: (DRAKMA:HTTP-REQUEST #<PURI:URI http://www.reddit.com/comments/gxb7j1.json?comment=ft3odzl> :METHOD :GET :USER-AGENT "cl-reddit/0.2 (common lisp api wrapper)" :COOKIE-JAR #<DRAKMA:COOKIE-JAR (with 5 c..
 5: (CL-REDDIT::GET-JSON "http://www.reddit.com/comments/gxb7j1.json?comment=ft3odzl" #<CL-REDDIT:USER {10079B9973}>)
 6: (CL-REDDIT:GET-COMMENTS "gxb7j1" #<CL-REDDIT:USER {10079B9973}> :ARTICLE NIL :COMMENT "ft3odzl" :CONTEXT NIL :DEPTH NIL :LIMIT NIL :SORT NIL :THREADED NIL :SHOWMORE NIL)
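
Connection errors like these may be transient. One generic mitigation, which I did not try here, is to retry a failed request a few times with a growing pause between attempts. The helper below is a minimal sketch of that idea. The name call-with-retries and its keyword arguments are hypothetical, and it retries on any condition of type ERROR rather than on specific usocket conditions:

```lisp
(defun call-with-retries (thunk &key (attempts 3) (delay 1))
  "Call THUNK and return its values. If it signals an ERROR, sleep and
try again, up to ATTEMPTS calls in total. The pause starts at DELAY
seconds and doubles after every failure. The last error propagates."
  (loop for attempt from 1
        for pause = delay then (* 2 pause)
        do (handler-case (return (funcall thunk))
             (error (condition)
               ;; Out of attempts: re-signal the last error to the caller.
               (when (>= attempt attempts)
                 (error condition))
               (sleep pause)))))
```

A fetch could then be wrapped as (call-with-retries (lambda () (cl-reddit::get-comments post-id *user* :comment id))). Whether this would have been enough to get through all 7.7k comments is untested.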

I tried switching from Drakma to Dexador, but found another bug: it does not work from multiple threads:

fukamachi/dexador#88

So, we’ll never know if SpaceX developers are using Lisp in their spaceships :)