Coverage report: /development/source/library/org/datagraph/spocq-shard/src/odbc/full-text-view.lisp
| Kind | Covered | All | % |
| expression | 11 | 935 | 1.2 |
| branch | 0 | 50 | 0.0 |
Key
Not instrumented
Conditionalized out
Executed
Not executed
Both branches taken
One branch taken
Neither branch taken
1
;;; -*- Mode: lisp; Syntax: ansi-common-lisp; Base: 10; Package: org.datagraph.spocq.implementation; -*-
3
(in-package :org.datagraph.spocq.implementation)
4
;; (load "/development/source/library/org/datagraph/spocq/src/store/pgsql/full-text-view.lisp")
7
(:documentation "Integrate simple and materialized full text index operators"
8
"The two text index forms rely on postgresql full text index support to extend materialized views.
9
The approach recognizes an active predicate (a magic property) as an operator in a BGP
10
and supports parameterized SPARQL view reuests.
11
The BGP variant accepts the pattern plus optional
12
constraints and yields matched results to the BGP evaluation process.
13
The SPARQL view variant accepts arguments for pattern parameters to constrain the solutions returned from the materialzed cache.
15
For a simple index, those results are the matched string and optionally the rank, normalized to
17
For a materialized index, the results combines the text results with the additional terms projected from the respective view.
19
The approach was chosen after considering the alternatives demonstrated by other implementations.
22
https://www.w3.org/2009/sparql/wiki/Feature:FullText
23
http://www.w3.org/TR/xpath-full-text-10/
25
The propose to ad a `contains`operator for filters
28
FILTER ( ?p IN (dct:title, foaf:name, rdfs:label)) .
29
FILTER ( ?o contains 'foo*' ) .
33
FILTER ( ?o contains ('foo*', constraint) ) .
35
they propose also to adopt xquery text matching syntax.
38
https://www.w3.org/TR/xpath-full-text-10/#id-grammar (@197 productions v/s 39 ?)
39
https://www.w3.org/TR/xpath-full-text-10/#id-xpath-grammar (@125 productions v/s 70 https://www.w3.org/TR/xpath20/#nt-bnf)
41
In order for this approach to improive on regular expression filter constraints, it would have to be compiled
42
to generate index results generation rather than scan a filter.
46
https://www.stardog.com/docs
47
https://docs.stardog.com/query-stardog/full-text-search
49
It supports both magic properties and a sub-query form.
50
For magix properties, they follow Jena
52
SELECT DISTINCT ?s ?score
55
(?l ?score) <tag:stardog:api:property:textMatch> ('mac' 100).
60
(term score) <http://jena.hpl.hp.com/ARQ/property#textMatch> (pattern limit)
63
(term score) <http://jena.hpl.hp.com/ARQ/property#textMatch> (pattern minScore limit)
65
The subquery/service clause form permits more arguments and results.
69
service fts:textMatch {
70
[] fts:query 'Mexico AND city' ;
81
https://jena.apache.org/documentation/query/text-query.html
83
says, 'the text index provides an inverted index that maps query string matches to subject URIs'
84
but, in fact, the relation is (subject x graph x string x language) plus an identifier argument (to distinguish predicates ?)
86
PREFIX ex: <http://www.example.org/resources#>
87
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
88
PREFIX text: <http://jena.apache.org/text#>
93
( ?s ?score ) text:query (rdfs:label 'printer') ;
96
This is not complete: the general intent is not to constrain just the text term,
97
but to index the entires statement with an graph context.
101
( ?s ?score ?lbl) text:query (rdfs:label 'printer') ;
104
The general syntax permits literals in the subject position as well
106
?s text:query 'word' # query
107
?s text:query ('word' 10) # with limit on results
108
?s text:query (rdfs:label 'word') # query specific property if multiple
109
?s text:query (rdfs:label 'protégé' 'lang:fr') # restrict search to French
110
(?s ?score) text:query 'word' # query capturing also the score
111
(?s ?score ?literal) text:query 'word' # ... and original literal value
112
(?s ?score ?literal ?g) text:query 'word' # ... and the graph
114
where the form is further qualified by its appearance in a graph clause.
115
(see https://jena.apache.org/documentation/query/text-query.html#queries-with-graphs)
118
dotnet : https://github.com/dotnetrdf/dotnetrdf/wiki/UserGuide-Full-Text-Querying-With-SPARQL
121
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
123
SELECT * WHERE { ?match pf:textMatch ( "text" 0.75) . }
125
marklogic : https://docs.marklogic.com/guide/semantics/semantic-searches#id_43103
129
FILTER cts:contains(?o, cts:or-query(("Monarch", "Sovereign")))
130
FILTER(?p IN (dc:description, rdfs:type))
133
oscar : https://github.com/opencitations/oscar
134
permits tailored interactive interfaces which generate, execute and display the results of sparql queries
137
neptune : https://docs.amazonaws.cn/en_us/neptune/latest/userguide/full-text-search-sparql-examples.html
138
follows the stardog pattern which expresses the freetext match as a service clause
140
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
141
PREFIX neptune-fts: <http://aws.amazon.com/neptune/vocab/v01/services/fts#>
143
SERVICE neptune-fts:search {
144
neptune-fts:config neptune-fts:endpoint 'http://your-es-endpoint.com' .
145
neptune-fts:config neptune-fts:queryType 'match' .
146
neptune-fts:config neptune-fts:field foaf:name .
147
neptune-fts:config neptune-fts:query 'michael' .
148
neptune-fts:config neptune-fts:return ?res .
152
virtuoso : http://docs.openlinksw.com/virtuoso/rdfsparqlrulefulltext/
153
relies on bif:contains to cause the respective object to serve as a match pattern
154
handles return properites with "OPTION" clause
156
PREFIX bif: <http://noidea.com/>
161
?o bif:contains 'NEW AND YORK'
162
OPTION (score ?sc) . # this form fails
168
graphdb : http://graphdb.ontotext.com/documentation/free/full-text-search.html
170
constructs lucene documents which comprise "molecules".
171
these include all contained "nodes" in a matched result.
175
- Provide the capabilitiy with as little configuration as possible
177
- Update it explicitly
178
- Delete it explicitly
179
- Any graph qualification is as a consequence of the statement pattern context only.
180
The text index is context-free.
181
- The service forms (fts:textMatch in StarDog and neptune-fts:search in Neptune) are
182
syntactic reformulation of subselects, for whichthe compiler constructs the equivalent
183
from a distinguoshed predicate.
184
- Avoid redundant elements inthe match expressions.
185
Either a term passes an argument or it bind a result value.
188
?s text:query (rdfs:label 'printer'); rdfs:label ?lbl .
190
The `rdfs:label` is a match constraint and a statement pattern constraint.
191
It constrains the index match, but only where the index comrpises statements.
192
- Formulations as filters customarily consume, which makes it improper to place variables
193
in the filter expression to bind the secondary results. The are more readliy included
195
- The object list can include constraints as positional arguments.
196
If their number increased and the precedence is not obvious, then a property list path
197
would be appropriate.
198
- The indicator should be a special predicate - any one of those known.
199
- on the right side the arguments should be
200
- the respective object term
201
- additional constraints; either a single argument for a limit or a property list for others
202
this would be enough to create a text index upon first reference.
203
to delete it, drop the graph named for that predicate, to clear it clear the graph
204
- on the left side is the return argument list
205
(jena includes also liternal and graph, but they are already present,
206
but the index must be partitioned by graph as well as predicate)
210
This yields the alternatives:
212
#### alternative 1.1 : magic property binding subject
214
?s a ex:Product ; rdfs:label ?lbl .
215
( ?s ?score ) text:query (rdfs:label 'printer' 'lang:x') .
220
?s a ex:Product ; rdfs:label ?lbl .
221
( ?s ?score ?lbl ) text:query (rdfs:label 'printer' 'lang:x') .
224
#### alternative 1.2 : magic property binding literal term
226
?s a ex:Product ; rdfs:label ?lbl .
227
(?lbl ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> (pattern limit)
229
#### alternative 2 : service clause (see above)
232
### alternative 3 : filter clause
233
?s a ex:Product ; rdfs:label ?lbl .
234
filter ( text:query(?lbl 'printer' 'lang:x')
235
# no where to put returned properties : ( ?s ?score )
238
- the service clause is unnecessarily complex,
239
- alternative 1.1 means the subject is redundantly indexed,
240
- alternative 3 cannot bind search result properties.
242
Which leaves alternative 1.2
244
The implmentation involves three steps
245
- During BGP macroexpansion, the indicator predicate causes the subject and object argument
246
list be be coallesced into a function invovation predicate term, to leave just that one stagement to
248
- Adopt <http://jena.hpl.hp.com/ARQ/property#textMatch> as the standard, but with the variant signature
249
rearranges the bgp to isolate the function calls.
252
(?lbl) <http://jena.hpl.hp.com/ARQ/property#textMatch> (pattern)
253
?lbl <http://jena.hpl.hp.com/ARQ/property#textMatch> pattern
254
(?lbl) <http://jena.hpl.hp.com/ARQ/property#textMatch> (pattern limit)
255
(?lbl ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> (pattern)
256
(?lbl ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> (pattern limit)
259
- The compiled BGP invokes the match function with any in-line bindings.
260
- The match implementation initiates the match and passwes any results through in the same manner
261
as a scanned statement pattern.
264
If there is a score filter, it is interposed before any result is passsed onwards in the evaluation process.
265
See https://www.postgresql.org/docs/9.6/textsearch-controls.html for ts_rank for the score.
267
The materialized text index follows the pattern of materialized views, in that
268
projected values are recorded as term numbers, but, rather than indexing term numbers for
269
request parameters the indexes cover the respective string values.
270
Each involves an addditional string column, language column, and stem vector column.
272
In both cases, the cache content is refereshed completely.
273
In the absence of a mechanism to limit it to inserted and deleted statements and/or projected records,
274
the limiting operation is that which retrieves the string values for the respective terms.
275
Even if the update used no-update-on-conflict or read-before-write, that retrieval time would
281
(defclass internal-text-repository (view-database-repository internal-materialized-repository)
284
(defclass internal-text-view-repository (internal-text-repository)
287
:initform 'internal-text-view-repository-revision))
288
(:documentation "extend internal materialized repository to combine it with a specific view
289
and free-text indices.
290
This adds the dimensions for the sql query - both the projection attributes and the request parameters.
291
It changes the execution not to compile the query but instead to execute a select directly for
293
This is similar to an intermal matirlaiazed view, but adds the free-text index columns"))
295
(defclass internal-text-view-repository-revision (internal-materialized-repository-revision internal-text-view-repository)
298
(defclass internal-text-index-repository (internal-text-repository)
301
:initform 'internal-text-index-repository-revision))
302
(:documentation "extend internal materialized repository to combine it with a specific view
303
and free-text indices.
304
This adds the dimensions for the sql query - both the projection attributes and the request parameters.
305
It changes the execution not to compile the query but instead to execute a select directly for
307
This is similar to an intermal matirlaiazed view, but adds the free-text index columns"))
309
(defclass internal-text-index-repository-revision (internal-materialized-repository-revision internal-text-index-repository)
312
(pushnew (list 'internal-text-index-repository (cl-ppcre:create-scanner "[^/]+/[^_]__text__view"))
313
*repository-id-type-map*
315
;; this and internal view repositor are ambiguous
316
(pushnew (list 'internal-text-view-repository (cl-ppcre:create-scanner "[^/]+/[^_]__[^_]+__view"))
317
*repository-id-type-map*
320
;;; sudo -u postgres psql "${RAILS_ENV}"
321
;;; create schema views;
323
(defparameter *text-language-map*
346
"Specify the relation between postgresql lamguage configurations and rdf language tags.
347
Taken from \dF (as of v12) and
348
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry")
350
(defparameter *text-index-database* nil)
352
(defmacro with-text-index-database ((store location &rest args) repository &body body)
353
(let ((op (gensym "materialized-op-")))
354
`(flet ((,op (,store ,location)
355
(declare (ignorable ,store ,location))
357
(declare (dynamic-extent #',op))
358
(call-with-text-index-database #',op ,repository ,@args))))
360
(defgeneric text-index-suported-p (repository)
361
(:method ((repository-designator string))
362
(text-index-suported-p (repository repository-designator)))
363
(:method ((repository repository))
364
(flet ((test-version (result)
365
(destructuring-bind (version) result
366
(return-from text-index-suported-p
367
;; requires postgres with generated columns
368
(>= (parse-integer version) 120000)))))
369
(declare (dynamic-extent #'test-version))
371
(with-text-index-database (text-db location) repository
372
(clsql-sys::map-query-for-effect #'test-version "SELECT current_setting('server_version_num')" text-db :auto))
375
(defgeneric call-with-text-index-database (function repository &key verbose)
376
(:method ((function t) (repository internal-text-index-repository) &rest args)
377
(apply #'call-with-text-index-database function
378
(spocq::make-postgresql-uri :ID NIL
379
:USER *postgres-user*
380
:PASSWORD *postgres-password*
381
:AUTHORITY *postgres-authority*
383
:DATABASE *MYSQL-DATABASE* ;; same
384
:PARAMETERS `((:TABLE . ,(repository-view-table-name repository))
385
(:SCHEMA . "views")))
388
(:method ((function t) (database-designator spocq:postgresql-uri) &key (verbose *sql-verbose*))
389
(let ((schema (spocq:postgresql-uri-schema database-designator)))
390
(clsql:with-database (*text-index-database*
391
(list (spocq:postgresql-uri-authority database-designator)
392
(spocq:postgresql-uri-database database-designator)
393
(spocq:postgresql-uri-user database-designator)
394
(spocq:postgresql-uri-password database-designator))
395
:if-exists :new ;; :old
396
:database-type :postgresql-socket
398
(flet ((sql-operation ()
400
(clsql-sys:database-execute-command (format nil "SET search_path TO ~a" schema) *text-index-database*))
401
(funcall function *text-index-database* database-designator)))
403
(unwind-protect (progn (clsql:start-sql-recording :database *text-index-database*)
405
(clsql:stop-sql-recording :database *text-index-database*))
406
(sql-operation)))))))
408
(defmethod compute-repository-view-table-name ((repository internal-text-index-repository) account-name repository-name (view-name null))
409
"A text index repository defaults the view name to 'text'."
410
(format nil "~a_~a_text" account-name repository-name))
412
(defmethod compute-materialized-repository-view ((repository internal-text-index-repository))
413
(let ((view-name (materialized-repository-view-name repository)))
415
(repository-view (repository-source-repository repository) view-name))
416
(call-next-method))))
419
(defmethod map-repository-strings (operator repository (text-index spocq:postgresql-uri) pattern &rest args)
420
(with-text-index-database (text-db location) repository
421
(apply #'map-repository-strings operator text-db pattern args)))
423
(let ((unsupported-map (make-registry :test 'equalp :weakness :key)))
424
(defun iana-language-to-sql (language)
425
(string (cond (language (cond ((first (rassoc language *text-language-map* :test #'string-equal)))
426
((gethash language unsupported-map))
428
(setf (gethash language unsupported-map) "simple")
429
(log-warn "iana-language-to-sql: unsupported language: ~s" language)
434
(defun ensure-btree-gin (db)
435
(flet ((indicate-presence (result)
436
(declare (ignore result))
437
(return-from ensure-btree-gin t)))
438
(declare (dynamic-extent #'indicate-presence))
439
(clsql-sys::map-query-for-effect #'indicate-presence
440
"select * from pg_extension where extname='btree_gin';"
443
(clsql-sys:database-execute-command "CREATE EXTENSION btree_gin;" db)))
446
(defmethod ensure-materialized-view ((repository internal-text-index-repository) (view null))
447
"Create the cache database text index for the view repository.
448
A text index is just one index"
449
(let* ((view-db-designator (repository-view-database repository))
450
(table-namestring (repository-view-table-name repository))
451
(sql-table-name (clsql-sys::sql-expression :table table-namestring))
452
(index-name (concatenate 'string table-namestring "_index")))
453
(with-materialized-repository-database (text-db location) view-db-designator
454
(unless (clsql:table-exists-p sql-table-name :database text-db :owner :all)
455
(clsql:with-transaction (:database text-db)
456
(clsql:create-table sql-table-name
457
(loop for (name lisp sql)
458
in '(("identifier" integer "integer")
459
("string" string "varchar" :unique)
460
("language" string "regconfig")
461
("pattern" string "tsvector GENERATED ALWAYS AS (to_tsvector(language, string)) STORED")
463
collect (list (clsql-sys::sql-expression :attribute name)
465
;; :constraints "UNIQUE (string)"
468
(format nil "CREATE UNIQUE INDEX ~a_string ON ~a (string);"
469
index-name table-namestring)))
470
(clsql-sys:database-execute-command index-command text-db))
472
(format nil "CREATE INDEX ~a_pattern ON ~a USING GIN (pattern);"
473
index-name table-namestring)))
474
(clsql-sys:database-execute-command index-command text-db)))))))
476
(defmethod ensure-materialized-view ((repository internal-text-repository) (view view))
477
"Create the cache database text index for the view repository.
478
A text view creates one index for each text parameter.
479
This is the case also for the text index repository with a named view for the single parameter."
480
(let* ((view-db-designator (repository-view-database repository))
481
(table-namestring (string-downcase (qualified-view-name view)))
482
(view-parameter-combinations (view-parameters view))
483
(view-parameters (sort (loop for combination in view-parameter-combinations when (null (rest combination)) collect (first combination))
485
(projection-dimensions (view-dimensions view))
486
(pattern-definitions ())
488
(append (loop for parameter in view-parameters
489
do (push `(,(format nil "~a_pattern" parameter)
491
,(format nil "tsvector GENERATED ALWAYS AS (to_tsvector(~a_language, ~a_string)) STORED"
492
parameter parameter))
494
append `(;; do not duplicate the term number (,(format nil "~a" parameter) integer "integer")
495
(,(format nil "~a_string" parameter) string "varchar")
496
(,(format nil "~a_language" parameter) string "regconfig")
498
;; do not duplicate those projected dimensions which are also string keys
499
(loop for dimension in projection-dimensions
500
collect `(,(string dimension) integer "integer"))
501
;; they have to go at then end in order to the insert to succeed
502
;; otherwise it tries to set them, which raises a syntax error
503
pattern-definitions)))
504
(flet ((compute-index-name (table-name parameter)
505
(concatenate 'string table-name "_" (string-downcase parameter) "_index")))
506
(with-materialized-repository-database (text-db location) view-db-designator
507
(destructuring-bind (sql-table-name attributes) (rest (compute-materialized-view-definition repository view text-db))
508
(clsql:with-transaction (:database text-db)
509
(unless (when (clsql:table-exists-p sql-table-name :database text-db :owner :all)
510
(cond ((and (every #'string-equal
511
(mapcar #'(lambda (attr-def)
512
(CLSQL-SYS::UNESCAPED-DATABASE-IDENTIFIER (clsql:sql-name (first attr-def))))
514
(clsql:list-attributes sql-table-name :database text-db :owner :all))
515
(let ((indexes (clsql:list-indexes :on sql-table-name :database text-db :owner :all))
516
(parameter-indexes (loop for parameter in view-parameters
517
collect (compute-index-name table-namestring parameter))))
518
(null (set-exclusive-or indexes parameter-indexes :test #'string-equal)))))
520
(clsql:drop-table sql-table-name :database text-db :if-does-not-exist :ignore :owner :all)
522
(ensure-btree-gin text-db)
523
(clsql:create-table sql-table-name
524
(loop for (name lisp sql)
525
in column-definitions
526
collect (list (clsql-sys::sql-expression :attribute name)
529
;; construct single column indexes only
530
(loop for parameter in view-parameters
531
for index-name = (compute-index-name table-namestring parameter)
532
for index-command = (format nil "CREATE INDEX ~a_pattern ON ~a USING GIN (~a_pattern);"
533
index-name table-namestring parameter)
534
do (clsql-sys:database-execute-command index-command text-db)))))))
537
(defmethod update-materialized-view ((repository internal-text-repository) (view t) &rest args)
538
(declare (ignore args))
539
(ensure-materialized-view repository view)
540
(update-repository-text-view repository view))
542
(defgeneric update-repository-text-view (repository view)
543
(:documentation "clear the text index and regenerate it from the base repository.
544
if a 'text' view is present, execute that and use all strings.
545
otherwise scan the repository and use all string objects."))
548
(defmethod update-repository-text-view ((repository internal-text-index-repository) (view null))
549
"When no view is provided, use the free-text table for all strings in the repository"
550
(let* (;;(view-db-designator (repository-view-database repository))
551
(source-repository (repository-source-repository repository))
552
(table-name (repository-view-table-name repository))
553
;(truncate-command (format nil "truncate ~a;" table-name))
558
(term-numbers (make-hash-table :test #'eql))
559
(view (repository-view source-repository "text")))
560
(with-text-index-database (text-db location) repository
561
;;(with-materialized-repository-database (text-db location) view-db-designator
562
(rlmdb::with-string-database (sdb)
563
(clsql:with-transaction (:database text-db)
564
;; the table is defined with a unique string key, but to ignore conflicts.
565
;; do not truncate it.
566
;; this is actually slower by about 25% compared to truncating and rebuilding completely
567
;; (clsql-sys:database-execute-command truncate-command text-db)
568
;; (print (list :truncated rlmdb.i::*string-db-transaction*))
569
(labels ((collect-string-term (term-number)
570
;; deduplicate here across the entire set of string objects
571
;; delta update will require probes
572
(unless (gethash term-number term-numbers)
573
(setf (gethash term-number term-numbers) term-number)
574
(multiple-value-bind (string language) (rlmdb:term-string term-number)
575
;; iff the object is a string, add it to the index
577
;; explicit reading is much slower
578
;; (unless (read-repository-text-view repository string)
581
(let ((psql-lang (iana-language-to-sql language)))
582
(push (list term-number string psql-lang) values)
584
(insert-string-terms))))))
586
(insert-string-terms ()
587
(clsql:insert-values :into table-name :values values
589
:conflict "(string) do nothing")
592
(scan-statements (%quad)
594
(collect-string-term (%quad-object %quad)))
595
(collect-solutions (page)
596
(incf total-count (array-dimension page 0))
597
(loop for solution-index from 0 below (array-dimension page 0)
598
with arity = (array-dimension page 1)
599
do (loop for binding-index from 0 below arity
600
do (collect-string-term (aref page solution-index binding-index))))))
601
(declare (dynamic-extent #'scan-statements #'collect-solutions))
604
:repository-id (repository-id source-repository)
605
:solution-handler #'collect-solutions
606
:agent (system-agent))
607
(rlmdb:map-repository-statements #'scan-statements source-repository #(0 0 0 0)))
609
(insert-string-terms))))
610
(values insert-count total-count)))))
612
(defmethod update-repository-text-view ((repository internal-text-repository) (view string))
613
(update-repository-text-view repository
614
(authorized-repository-view
615
(repository-source-repository repository)
618
(defmethod update-repository-text-view ((repository internal-text-repository) (view view))
619
"When a view is provided, use the free-text table to materialize that view with
620
the indicated key dimension as free-text index columns.
621
This applies to both the text index and text view variants.
622
In the former case, there should be just one project dimension for the index."
623
(let* ((view-db-designator (repository-view-database repository))
624
(source-repository (repository-source-repository repository))
625
(table-name (string-downcase (qualified-view-name view)))
626
(truncate-command (format nil "truncate ~a;" table-name))
631
(operation (view-operation view))
632
(view-parameter-combinations (view-parameters view))
633
(view-parameters (sort (loop for combination in view-parameter-combinations when (null (rest combination)) collect (first combination))
635
(projection-dimensions (view-dimensions view))
636
(text-index-list (reverse (loop for dimension in view-parameters
637
for index = (position dimension projection-dimensions :test #'string=)
638
if index collect index
639
else do (spocq.e:request-error "index dimension not projected: ~a" dimension)))))
640
(unless (and (case operation
641
((spocq.a:|select| spocq.a:|distinct|) t)
644
(spocq.e:request-error "invalid text index view: ~a" (view-name view)))
645
(with-materialized-repository-database (text-db location) view-db-designator
646
(clsql:with-transaction (:database text-db)
647
(clsql-sys:database-execute-command truncate-command text-db)
648
(labels ((collect-solutions (page)
649
(incf total-count (array-dimension page 0))
650
(dotimes (solution-index (array-dimension page 0))
652
;; construct the record. first the term numbers, then the strings for theindices
653
(let ((record (loop for value-index below (array-dimension page 1)
654
collect (aref page solution-index value-index))))
655
(loop for text-index in text-index-list
656
for term-number = (aref page solution-index text-index)
658
do (multiple-value-bind (string language) (rlmdb:term-string term-number)
660
(let ((psql-lang (iana-language-to-sql language)))
661
(push psql-lang record)
662
(push string record)))
664
(push "simple" record)
666
;; treat unbound as a blank string without language to avoid problems with NULL
667
;; see https://www.postgresql.org/docs/9.1/textsearch-controls.html
668
else do (progn (push "simple" record)
670
(push record records)
676
(write-char #\. *trace-output*)
677
(finish-output *trace-output*)
678
(clsql:insert-values :into table-name :values records :database text-db)
679
(incf total-count count)
682
(declare (dynamic-extent #'collect-solutions))
683
(write-char #\> *trace-output*)
684
(run-sparql (view-query view)
685
:repository-id (repository-id source-repository)
686
:solution-handler #'collect-solutions
687
:agent (system-agent))
690
(write-char #\< *trace-output*)
691
(values insert-count total-count))))))
693
(defgeneric read-repository-text-view (repository &rest keys)
694
(:method ((repository internal-text-index-repository) &rest keys)
695
(let* (;(view-db-designator (repository-view-database repository))
696
(table-name (repository-view-table-name repository))
697
;(view (repository-view source-repository "text"))
698
(string-key (CLSQL-SYS:SQL-ESCAPE-QUOTES (first keys)))
700
; (ensure-materialized-view repository view)
702
(let ((query (format nil "SELECT identifier, string, language, pattern from ~a where string = '~a';" table-name string-key))
704
(flet ((collect-result (result)
705
(push result results)))
706
(clsql-sys::map-query-for-effect #'collect-result query *text-index-database* :auto))
708
(if *text-index-database*
710
(with-text-index-database (text-db location) repository
711
(clsql:with-transaction (:database text-db)
712
(rlmdb::with-string-database (sdb)
716
(defmethod compile-query-for-repository ((query query) (repository internal-text-view-repository))
717
"Use the query expression dimensions and its request arguments to control the sql retrieval"
718
(let* ((view (materialized-repository-view repository))
719
(view-name (view-name view))
720
(table-namestring (string-downcase (qualified-view-name view)))
721
(table-name (repository-view-table-name repository))
722
(arguments (query-dynamic-bindings query))
723
(query-dimensions (expression-dimensions (query-sse-expression query)))
724
(view-dimensions (view-dimensions view))
725
(projection-dimensions (sort (if query-dimensions
726
(intersection view-dimensions query-dimensions)
727
(copy-list view-dimensions))
729
(rank-dimensions (first arguments))
730
(rank-projections (loop for dimension in rank-dimensions
731
for dca = (string-downcase dimension)
732
for value in (rest arguments)
733
collect (format nil "ts_rank(~a, to_tsquery('~a_string'))"
735
(text-constraints (loop for dimension in (first arguments)
736
for dca = (string-downcase dimension)
737
for value in (rest arguments)
738
collect (format nil "~a @@ to_tsquery('~a')"
739
dca (CLSQL-SYS:SQL-ESCAPE-QUOTES value))))
740
(offset (shiftf *response-offset* 0)) ;; these should apply just when this is the outermost query
741
(limit (shiftf *response-limit* nil))
742
(sql-text (format nil ;;"SELECT ~{~a, ~} ~{~a~^, ~} from ~a WHERE ~{~a~^AND ~}~@[LIMIT ~a ~]~@[OFFSET ~a ~];"
743
"SELECT ~{~a~^, ~} from ~a WHERE ~{~a~^AND ~}~@[ LIMIT ~a~]~@[ OFFSET ~a~];"
744
projection-dimensions
750
;; wrap the sql query text to be performed eventually by process-materialized-view
751
(view-clause `(spocq.a:|view| (materialized-repository-view (repository ,(repository-id repository)))
752
,sql-text ,(append projection-dimensions rank-dimensions)))
753
(pattern (view-construct-pattern view))
755
;; the initialization function generates just the sql select
756
;; as per the view attributes and the query bindings
758
(with-task-environment (:task query :normal-disposition :continue)
760
`(spocq.a:|construct| ,view-clause ,pattern)
764
(with-task-environment (:task query :normal-disposition :continue)
766
(log-debug "compile-query view ~a: ~s (~s)" (task-id query) view-name table-namestring)
767
(setf (query-patterns query) nil)
768
(setf (task-operator-count query) 1)
769
;; compile the query form and collect the agp instances as a side-effect
770
(let ((initialization-function
771
(spocq-compile lambda-expression)))
772
(setf (task-initialization-function query) initialization-function)
773
(generate-accounting-note :compile :task query)
774
(values initialization-function
775
lambda-expression))))))
777
(defmethod process-materialized-view ((solution-destination array-page-channel) (repository internal-text-view-repository) sql-query)
778
(let* ((view-db-designator (repository-view-database repository))
779
(projection-dimensions (view-dimensions (materialized-repository-view repository)))
780
(solution-dimensions (channel-dimensions solution-destination))
781
;; the rank values do not appear in the quer
782
(term-number-positions (loop for dimension in solution-dimensions
783
collect (position dimension projection-dimensions)))
784
(repository-id (repository-id repository))
785
(revision-id (repository-revision-id *task*))
786
(transaction *transaction*)
787
(channel-dimensions (channel-dimensions solution-destination))
789
(result-page-length (channel-page-length solution-destination))
790
(result-page-width (channel-page-width solution-destination))
791
(result-index result-page-length)
793
(trace-bgp view.match.start repository-id revision-id transaction channel-dimensions)
795
(labels ((collect-solution (solution)
796
(trace-bgp view.collect-solution solution)
797
(next-solution-location)
798
(locally (declare (type (simple-array fixnum (* *)) result-page)
799
(type fixnum result-index)
801
(optimize (SPEED 3) (SAFETY 0)))
802
(loop for value in solution
803
for column-index from 0
804
for is-term-number in term-number-positions
805
for term-number = (if is-term-number value (rlmdb:value-term-number value))
806
do (setf (aref result-page result-index column-index) term-number))))
807
(next-solution-location ()
808
;; return a page (possible newly created) and the next free location in that page
810
(when (>= (incf result-index) result-page-length)
811
(when result-page (put-result result-page))
812
(setf result-page (new-field-page solution-destination result-page-length result-page-width)
814
(values result-page result-index))
815
(complete-solutions ()
816
(trace-bgp view.match.complete-solutions result-count)
817
(incf *match-requests* 1)
818
(incf *match-responses* result-count)
819
(log-debug "view count: ~s: solutions: ~s" repository-id result-count)
821
(let ((page-result-count (1+ result-index)))
822
(when (< page-result-count result-page-length)
824
(adjust-page result-page (list page-result-count result-page-width)))))
825
(put-result result-page))
826
(complete-field solution-destination)
827
(incf-stat *solutions-constructed* result-count)
828
(return-from :view-scan result-count))
830
(trace-bgp view.put solution-destination channel-dimensions page)
831
(put-field-page solution-destination page)
832
(unless (task-active-p *query*)
833
(complete-field solution-destination)
834
(return-from :view-scan result-count))))
835
(declare (dynamic-extent #'collect-solution))
836
(with-materialized-repository-database (view-db location) view-db-designator
837
(clsql-sys::map-query-for-effect #'collect-solution
841
(complete-solutions)))))
844
;;; two options exist for retrieval from a text index
845
;;; the first option recognizes arc:textMatch as the predicate a statement pattern,
846
;;; where that is present the bindings are collected and unified with the results
847
;;; the second option treats the cache as a query target - federation or autonomous.
848
;;; if uses the text values to yield all matching results.
849
;;; the solution field is extended with additional dimensions for the quality results.
851
(defgeneric map-text-index-strings (operator view-repository text-db pattern &key
852
language offset limit)
853
(:method (operator (repository internal-text-index-repository) (text-db clsql:database)
854
;; in this form for symmetry, uses just
855
;; - the object for the text pattern
856
;; - graphs either as a constraint or returned
858
&key language offset limit)
859
"Map over the string values which match the given pattern
860
Yield the string snad the respective rank"
861
(when (and (or (null offset)
862
(and (integerp offset) (>= offset 0)))
864
(and (integerp limit) (>= limit 1))))
865
(let* ((sql-pattern pattern)
866
(table-name (repository-view-table-name repository))
867
(psql-lang (iana-language-to-sql language))
869
(sql-query (format nil "SELECT identifier, ts_rank(pattern, to_tsquery('~a', '~a')) from ~a WHERE pattern @@ to_tsquery('~a', '~a')~@[LIMIT ~a ~]~@[OFFSET ~a ~];"
870
psql-lang sql-pattern table-name psql-lang sql-pattern limit offset))
872
;; if language is present, then need a string dictionary
874
(ensure-materialized-view repository (materialized-repository-view repository))
875
(flet ((collect-solution (solution)
877
(destructuring-bind (term-number rank) solution
878
;;;!!! allow logic hier to apply a rank threshold constraint.
879
(trace-bgp map-repository-strings.collect-solution term-number rank)
880
(funcall operator term-number rank))))
881
(clsql-sys::map-query-for-effect #'collect-solution
887
(defgeneric repository-text-view-repository (repository)
888
(:method ((repository repository))
889
(let ((id (concatenate 'string (repository-id repository) "__text__view")))
890
(repository id :if-does-not-exist nil))))
892
(defgeneric |http://jena.hpl.hp.com/ARQ/property#|:|textMatch| (operator transaction pattern &optional language offset limit &rest extra-args)
893
(:method (operator (transaction transaction) pattern &optional language offset limit &rest extra-args)
894
(let ((repository (transaction-repository transaction)))
895
(apply #'|http://jena.hpl.hp.com/ARQ/property#|:|textMatch| operator repository pattern language offset limit extra-args)))
897
(:method (operator (repository lmdb-repository) pattern &optional language offset limit &rest extra-args)
898
(let ((view-repository (repository-text-view-repository repository)))
899
(when view-repository
900
(apply #'|http://jena.hpl.hp.com/ARQ/property#|:|textMatch| operator view-repository pattern language offset limit extra-args))))
902
(:method (operator (repository internal-text-index-repository) pattern &optional language offset limit &rest extra-args)
903
(declare (ignore extra-args))
904
;; the operation requires the transaction to intern the ranks
905
(flet ((collect-result (term-number rank-value)
906
(let* ((normalized-rank-value (round (* rank-value 100)))
907
(rank-term-number (rlmdb:value-term-number normalized-rank-value)))
908
;;(print (list :textmatch term-number rank-value rank-term-number))
909
(funcall operator term-number rank-term-number)))
910
(coerce-to-string (data)
913
(integer (term-number-object data))
915
(declare (dynamic-extent #'collect-result))
916
(with-text-index-database (text-db location) repository
917
(map-text-index-strings #'collect-result repository text-db (coerce-to-string pattern)
918
:language (iana-language-to-sql (coerce-to-string language))
921
(:generic-function-class function-verb-function))
924
(defmethod compile-query-for-repository ((query query) (repository internal-text-index-repository))
925
"Generate a sql query which applies the request argument to the basic index"
926
(let* ((table-name (repository-view-table-name repository))
927
(arguments (query-dynamic-bindings query))
928
(argument-values (rest arguments))
929
(projection-dimensions '(?::|string| ?::|rank|))
930
(offset (shiftf *response-offset* 0)) ;; these should apply just when this is the outermost query
931
(limit (shiftf *response-limit* nil))
932
(sql-text (format nil "SELECT identifier, ts_rank(pattern, to_tsquery('~a')) from ~a WHERE pattern @@ to_tsquery('~a')~@[LIMIT ~a ~]~@[OFFSET ~a ~];"
933
(first argument-values)
935
(first argument-values)
938
;; wrap the sql query text to be performed eventually by process-materialized-view
939
(view-clause `(spocq.a:|view| (materialized-repository-view (repository ,(repository-id repository)))
940
,sql-text ,projection-dimensions))
942
;; the initialization function generates just the sql select
943
;; as per the view attributes and the query bindings
945
(with-task-environment (:task query :normal-disposition :continue)
948
(with-task-environment (:task query :normal-disposition :continue)
950
(log-debug "compile-query text index ~a: (~s)" (task-id query) table-name)
951
(setf (query-patterns query) nil)
952
(setf (task-operator-count query) 1)
953
;; compile the query form and collect the agp instances as a side-effect
954
(let ((initialization-function
955
(spocq-compile lambda-expression)))
956
(setf (task-initialization-function query) initialization-function)
957
(generate-accounting-note :compile :task query)
958
(values initialization-function
959
lambda-expression))))))
964
(let ((repo (repository "james/test")))
965
(with-text-index-database (text-db location) repo
966
(ensure-repository-text-index repo text-db)))
968
(let ((repo (repository "james/testnew")))
969
(with-text-index-database (text-db location) repo
970
(ensure-repository-text-index repo text-db))))
971
(let ((repo (repository "james/test")))
972
(with-text-index-database (text-db location) repo
973
(update-repository-text-index repo text-db)))
975
(let ((q (parse-sparql "SELECT ?lbl WHERE { ( ?lbl ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer') . }" )))
976
(macroexpand-bgp-phase :functions (rest (second q)) nil))
979
'("SELECT ?lbl WHERE { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer') . }"
980
"SELECT ?lbl WHERE { ( ?lbl ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer') . }"
981
"SELECT ?lbl WHERE { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN') . }"
982
"SELECT ?lbl WHERE { ( ?lbl ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN') . }"
983
"SELECT ?lbl WHERE { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN' 0) . }"
984
"SELECT ?lbl WHERE { ( ?lbl ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN' 0) . }"
985
"SELECT ?lbl WHERE { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN' 0 10) . }"
986
"SELECT ?lbl WHERE { ( ?lbl ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN' 0 10) . }")
987
collect (macroexpand-bgp-phase :functions (rest (second (parse-sparql q))) nil))
991
'("SELECT ?lbl WHERE { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer') . }"
992
"SELECT ?lbl WHERE { ( ?lbl ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer') . }"
993
"SELECT ?lbl WHERE { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN') . }"
994
"SELECT ?lbl WHERE { ( ?lbl ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN') . }"
995
"SELECT ?lbl WHERE { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN' 0) . }"
996
"SELECT ?lbl WHERE { ( ?lbl ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN' 0) . }"
997
"SELECT ?lbl WHERE { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN' 0 10) . }"
998
"SELECT ?lbl WHERE { ( ?lbl ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('printer' 'EN' 0 10) . }")
999
collect (macroexpand-bgp-phase :functions (rest (second (parse-sparql q))) nil))
1000
do (print (bgp-projected-dimensions (rest bgp))))
1004
SELECT ?lbl WHERE { graph ?gg { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('bob') . } }"
1005
:repository-id "james/test")
1008
SELECT ?lbl WHERE { graph ?gg { ( ?lbl ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ('bo:*') . } }"
1009
:repository-id "james/test")
1014
( ;; from lucene index sketch
1015
(defmethod translate-index-expression ((index lucene-index) (expression list) (variable-term-map list))
1016
(apply #'rewrite-to-lucene (sublis variable-term-map expression)))
1019
(defgeneric rewrite-to-lucene (operator &rest args)
1020
(:method ((op (eql 'spocq.a:|strends|)) &rest args)
1021
(destructuring-bind (predicate value) args
1022
(format nil "~a:\"*~a\"" (lucene-field-name predicate) value)))
1024
(:method ((op (eql 'spocq.a:|strstarts|)) &rest args)
1025
(destructuring-bind (predicate value) args
1026
(format nil "~a:\"~a*\"" (lucene-field-name predicate) value)))
1028
(:method ((op (eql 'spocq.a:|\|\||)) &rest args)
1029
(destructuring-bind (expr1 expr2) args
1030
(format nil "~a OR ~a" (apply #'rewrite-to-lucene expr1) (apply #'rewrite-to-lucene expr2))))
1032
(:method ((op (eql 'spocq.a:|&&|)) &rest args)
1033
(destructuring-bind (expr1 expr2) args
1034
(flet ((filter-relation-components (expr)
1035
(let ((op (first expr)))
1036
(when (member op '(spocq.a:< spocq.a:> spocq.a:<= spocq.a:>=))
1037
(cond ((iri-p (second expr))
1038
(values op (second expr) (third expr)))
1039
((iri-p (third expr))
1040
(values (rest (assoc op '((spocq.a:< . spocq.a:>) (spocq.a:> . spocq.a:<)
1041
(spocq.a:<= . spocq.a:>=) (spocq.a:>= . spocq.a:<=))))
1042
(third expr) (second expr)))
1045
(multiple-value-bind (op1 term1 bounds1) (filter-relation-components expr1)
1046
(multiple-value-bind (op2 term2 bounds2) (filter-relation-components expr2)
1047
(cond ((and (equalp term1 term2)
1048
(= (length (set-difference '(spocq.a:< spocq.a:> spocq.a:<= spocq.a:>=) (list op1 op2))) 2))
1050
((spocq.a:|<| spocq.a:|<=|)
1052
((spocq.a:|>| spocq.a:|>=|)
1053
(format nil "~a:~:[(~;[~]~s TO ~s~:[)~;]~]"
1054
(lucene-field-name term1)
1055
(eq op2 'spocq.a:|>=|)
1057
(eq op1 'spocq.a:|<=|)))
1059
(lucene-invalid-expression '&& expr1 expr2))))
1060
((spocq.a:|>| spocq.a:|>=|)
1062
((spocq.a:|<| spocq.a:|<=|)
1063
(format nil "~a:~:[(~;[~]~s TO ~s~:[)~;]~]"
1064
(lucene-field-name term1)
1065
(eq op1 'spocq.a:|>=|)
1067
(eq op2 'spocq.a:|<=|)))
1069
(lucene-invalid-expression '&& expr1 expr2))))
1071
(lucene-invalid-expression '&& expr1 expr2))))
1073
(format nil "( ~a AND ~a )" (apply #'rewrite-to-lucene expr1) (apply #'rewrite-to-lucene expr2)))))))))
1075
(:method ((op (eql 'spocq.a:|!|)) &rest args)
1076
(format nil "NOT (~a)" (apply #'rewrite-to-lucene (first args))))
1078
(:method ((op (eql 'spocq.a:|=|)) &rest args)
1079
(destructuring-bind (expr1 expr2) args
1080
(cond ((iri-p expr1)
1082
(lucene-invalid-expression '= expr1 expr2))
1083
(format nil "~a:~s" (lucene-field-name expr1) expr2))
1086
(lucene-invalid-expression '= expr1 expr2))
1087
(format nil "~a:~s" (lucene-field-name expr2) expr1))
1089
(lucene-invalid-expression '= expr1 expr2)))))
1091
(:method ((op (eql 'spocq.a:|!=|)) &rest args)
1092
(format nil "NOT (~a)" (apply #'rewrite-to-lucene 'spocq.a:|=| args)))
1094
(:method ((op (eql 'spocq.a:|<|)) &rest args)
1095
(destructuring-bind (expr1 expr2) args
1096
(cond ((iri-p expr1)
1098
(lucene-invalid-expression '< expr1 expr2))
1099
(format nil "~a:[* TO ~s)" (lucene-field-name expr1) expr2))
1102
(lucene-invalid-expression '< expr1 expr2))
1103
(format nil "~a:(~s TO *]" (lucene-field-name expr2) expr1))
1105
(lucene-invalid-expression '< expr1 expr2)))))
1107
(:method ((op (eql 'spocq.a:|<=|)) &rest args)
1108
(destructuring-bind (expr1 expr2) args
1109
(cond ((iri-p expr1)
1111
(lucene-invalid-expression '< expr1 expr2))
1112
(format nil "~a:[* TO ~s]" (lucene-field-name expr1) expr2))
1115
(lucene-invalid-expression '< expr1 expr2))
1116
(format nil "~a:[~s TO *]" (lucene-field-name expr2) expr1))
1118
(lucene-invalid-expression '< expr1 expr2)))))
1120
(:method ((op (eql 'spocq.a:|>|)) &rest args)
1121
(destructuring-bind (expr1 expr2) args
1122
(rewrite-to-lucene 'spocq.a:|<| expr2 expr1)))
1124
(:method ((op (eql 'spocq.a:|>=|)) &rest args)
1125
(destructuring-bind (expr1 expr2) args
1126
(rewrite-to-lucene 'spocq.a:|<=| expr2 expr1))))