-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] Regular expressions #498
base: master
Are you sure you want to change the base?
Conversation
RegexReader transforms #"foo"iL into (re2 "foo" {:ignore_case :literal})
…xp is refered directly
Interesting work! One thing that (I believe) is problematic with this approach is the "freeing" of resources. You are never calling It seems that one idiomatic way of handling this in pixie would be to wrap regex (and options) instances in a type and implement IDisposable (a bit like here: Lines 9 to 14 in 189395f
The user still has to release resources manually but at least it's a bit more explicit/obvious with this protocol, and I guess down the road pixie could/might handle this internally once no ref to an instance remains (this is probably the idea behind IDisposable). |
One could also imagine we'd have a generic regex protocol, that cre2 would implement with that type I mention. That would unify a bit the interface for if/when people make other implementations (pcre2 etc). |
@mpenet you're absolutely right about freeing resources, it's one large problem i was hoping somebody would point me to a decent solution (which you did, thanks!). i imagine it should also be possible to somehow hook into GC and tell which function to use to free certain objects? protocol is a good idea as well, i wasn't focusing much on how to do everything properly. one problem with different regexp engines is how to deal with "late binding" of pixie.regex/regexp function that implements literal conversion into native object? if somebody refers to re2 and passes literal into namespace that refers pcre - that obviously won't work. |
Right, I didn't think about the literal, when you expand #"" you can just translate into a fn call that does the right thing, which would use the default active re engine (that can be set in a global var). and have that Something like this (dont mind the awful naming): (def ^:dynamic *default-re-engine* :cre2)
;; an "open" engine registry
(defmulti re-engine (fn [k s] k))
;; add cre2 to registry
(defmethod re-engine :cre2 [_ s]
(new cre2-re-engine s))
;; dispatch on the right engine constructor via registry
(defn re-pattern
([s] (re-pattern s *default-re-engine*))
([s kw] (re-engine kw s)))
(defprotocol IRegex
...
(re-find [r s])
(re-seq [r s]))
(deftype cre2-re-engine [...]
IRegex
...
IDisposable
... )) Just thinking out loud, there might be better approaches, but that way you can change impl using |
You're right IDispose doesn't hook into the GC, but IFinalize does (with All that to say, I think IFinalize should work just fine for regexes. On Thu, Feb 18, 2016 at 5:43 AM, Max Penet [email protected] wrote:
“One of the main causes of the fall of the Roman Empire was that–lacking |
@halgari awesome, I didn't know about |
- some renames - defprotocol IRegex - deftype CRE2Regex implementing IFinalize and IRegex
@mpenet @halgari thanks for insights, i've added deftype and defprotocol and multimethod dispatch to current (only) engine. i've also changed literal expansion into i think expected functionality is for literals to be available in any part of codebase but not sure how to achieve that properly. |
927c66f
to
ecd68b9
Compare
bit stuck with alloc'ing pointer to / array of cre2_string_t structs that will contain capture groups example usage in cre2 library: https://github.com/marcomaggi/cre2/blob/master/src/cre2.cpp#L222 how do i do that in pixie? right now i'm just doing this: 4c61558#diff-a532eb022eaed3bf076b7d519289551aR63 |
I'm not sure if there is a good way for allocating a pointer to an array in pixie at the moment. I don't think there is a nice way of creating an array for a given struct/type either. Perhaps we could start a list of FFI things to work on as a github issue? |
@thomasmulvaney i'm afraid my ffi-fu is not strong enough to properly describe what are we missing using the right language but i would like to put more effort into this pr if it has any chance of being merged. will appreciate any help with ffi things even if it is just issue description and maybe some links to get me started. |
@keymone I pulled down this branch last night. Nice work! To extract the strings from the matches we need an ffi function which when given a I also noticed that ffi-infer thinks the I have had a look at implementing these 2 functions. |
I've made some progress with handling charps: master...thomasmulvaney:charp Currently this seems to only work when run interactively. When compiled, |
the reduce implementation was creating some lazy lists
first of all, this is not merge-ready. this PR introduces regexp literal and builds cre2 C library that wraps google's re2. to get all of it working you need:
make re2_cre2
- this will build re2 and cre2 wrapper in externals/ln -s 'pwd'/externals/cre2/build/.libs/libcre2* /usr/local/lib/
- i have no idea what i'm doingpixie-vm -c pixie/regex.pxi
- this might require something likeexport DYLD_LIBRARY_PATH='pwd'/externals/cre2/build/.libs
, see previous pointafter this is done, fire up your repl and run
(ns user (:require [pixie.regex :refer [regexp match]]))
this PR introduces regexp literal like in clojure:
#"123"
- is translated in RegexReader into(regexp "123" #{})
#"123"iL
- is translated into(regexp "123" #{:ignore_case :literal})
most options available in
re2
are supportedthis is how far i got and i will certainly need help from somebody more knowledgable in building and using C libraries.