Quick Search, Advanced Search: "OR" and (explicit) "AND" not working

kinook · #1 04-08-2024, 10:30 PM

Regarding the original topic about AND and OR searches not working, I did find a bug. There is logic to support case-insensitive Unicode searching (see SearchNonAsciiTextNonCaseSensitive), which is enabled by default, and this logic lowercases the indexed text and search terms that are entered, and it was incorrectly lowercasing AND and OR in the search phrase, which prevented the FTS search engine from treating these as AND and OR search clauses. This has been fixed in the latest download (v6.3.0.14). A workaround is to disable that registry setting if you don't need case insensitive Unicode search capability.

Regarding the other topics discussed here, can you boil this down to plain English? I am not quite following what you wrote.

Note: Updated in v6.3.0.15 to also handle NOT and NEAR commands.
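
For readers following along, here is a minimal sketch of the kind of fix described above: lowercase the search terms but leave the FTS operators untouched. It is not UR's actual code; the function name and the regular expression are assumptions made for illustration, and quoted phrases are not handled.

```python
import re

# Operators that must stay uppercase so the FTS engine recognises them.
# NEAR may carry a distance suffix such as NEAR/4.
_OPERATOR = re.compile(r"^(AND|OR|NOT|NEAR(/\d+)?)$")

def lowercase_terms(query: str) -> str:
    """Lowercase every whitespace-separated token except FTS operators."""
    tokens = query.split()
    return " ".join(t if _OPERATOR.match(t) else t.lower() for t in tokens)

print(lowercase_terms("Äpfel OR Birnen"))
print(lowercase_terms("Foo NEAR/4 Bar NOT Baz"))
```

A lowercase "and" or "or" typed by the user stays an ordinary search term here, which matches the idea that only the uppercase forms act as search clauses.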

Spliff · #2 04-09-2024, 06:19 AM

A)

I hope my previous post is understandable, though, in detailing the problems that arise when navigating a level hierarchy (the equivalent of the file system's folder hierarchy, in which the user doesn't encounter those problems); in order to do RELIABLE navigation (which is even more important for moving / copying items), a flattened (or you might call it virtual) hierarchy is preferable.

Btw and inadvertently, I had left out the main argument for that flattened hierarchy (construct b instead of construct a):

Construct a)

'a (all these expanded currently, or even just to be expanded by navi-macro)
__.a
____be it some item here
__.b (the . marker ensures you get here, instead of the "b" item before)
__.d (they can be grouped by subject, no abc-sorting needed)
__etc
'b
__.a
__.c

Now, the user wants to "go" to 'a .d:
> (s)he will enter ad, the macro does the rest, and (s)he gets to the .d "in" 'a
> perfect

(And if you wanted to go to the "b..." item, you would enter aab instead, the macro navigating to 'a..., then to .a..., then to b...)

But the user is mistaken, wants to go to 'a... .d... in fact, but thinks that .d... starts with c ((s)he will have named the .d... by some synonym .c... but fails to remember):
> enters ac, expects to get to 'a .c (but that doesn't exist)
> boom: will arrive at 'b .c instead: Not too much harm for simple navigation, but really awful for moves! (Given the fact that all such navigation is done by character entry within the Data Explorer.)

Now the same situations in construct b):
.a
.aa
__be it some item here
.ab
.ad
.aetc
.b
.ba
.bc

You enter a, the macro goes to .a; you enter ab, the macro goes to .ab; you enter aab, the macro goes to .aa, then to the b-item (child of .aa).

But now, if the user is mistaken and thinks .ad is called .ac, (s)he enters ac or acsomething in order to get even deeper into .ac - and (s)he will NOT "land" somewhere in .b but in .a (since .ac doesn't exist): You "land" in (or move something as last item into) the parent range, where you (or the moved item(s)) will be much more "at home" than you, or they, would be somewhere within .b.

In the file system, that's no problem, since any path element's (=intermediate folder's) existence will be checked, and if it does not exist, the macro stops, and goes to (or puts into) the nearest parent folder (which does exist): These verifications cost just some ms in the file system, but would cost a multiple of that within a db (here UR), since then they would imply the (even repeated, in case!) use of the clipboard (or repeated sql accesses from the outside, and clipboard access to then process the results: a nightmare for the user!).

And that's why, in a db where the user can only access the internals from the outside (respective internal code would be much more elegant indeed), construct b is far preferable: quick navi / moves are much less prone to annoying mistakes.
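
To make the difference between the two constructs concrete, here is a small model of both. It is only a sketch under simplifying assumptions: the row lists, the function names, and the "jump to the next matching row below" type-ahead rule are illustrative, not a description of how the Data Explorer or any actual macro works.

```python
from typing import Optional

# Construct a: visible rows of the expanded tree as (parent, name) pairs.
NESTED_ROWS = [
    ("", "'a"), ("'a", ".a"), ("'a", ".b"), ("'a", ".d"),
    ("", "'b"), ("'b", ".a"), ("'b", ".c"),
]

# Construct b: one flat level, the full path is encoded in each name.
FLAT_NAMES = {".a", ".aa", ".ab", ".ad", ".b", ".ba", ".bc"}

def navigate_nested(typed: str) -> Optional[tuple]:
    """Type-ahead model: each character jumps to the next matching row below."""
    pos = -1
    prefixes = ["'" + typed[0]] + ["." + ch for ch in typed[1:]]
    for prefix in prefixes:
        matches = [i for i in range(pos + 1, len(NESTED_ROWS))
                   if NESTED_ROWS[i][1].startswith(prefix)]
        if not matches:
            return None
        pos = matches[0]
    return NESTED_ROWS[pos]

def navigate_flat(typed: str) -> Optional[str]:
    """Fall back to the longest prefix of the typed keys that exists."""
    for end in range(len(typed), 0, -1):
        name = "." + typed[:end]
        if name in FLAT_NAMES:
            return name
    return None

print(navigate_nested("ad"))  # correct entry: lands on .d under 'a
print(navigate_nested("ac"))  # mistaken entry: lands on .c under 'b
print(navigate_flat("ac"))    # mistaken entry: falls back to .a
```

In the nested model the mistaken entry ac silently crosses into 'b, while in the flat model the same mistake ends in the parent range .a.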

B)

As for the awful placement of the (very valuable otherwise!) AS Grid "Exclude" column, there is a sub-key in Ultra Recall/Settings/SearchGrid (i.e. not in the sub-key "Options"), "Order", with default value 0,1,2,3,4,5, so I have now tried to change that order to what I had intended, by changing the sub-key's value to 5,0,1,2,3,4, then rebooting, but that change does NOT work as I had hoped: the Exclude column remains the last one in the grid, and the value has automatically (!) been reset to 0,1,2,3,4,5 - expected behavior? (Or should that have worked as I had hoped?) Would it take some negligible effort to put the Exclude column first instead? (see above)

C)

Kyle, I'm very pleased you're looking into these problems, and it seems they are even more profound than you have already discovered, there will be some more glitches to be culled! ;-)

From what you have already found, it seems to me that my OR/AND problem started with 6.2 then (but without me making the connection, since at the time I hadn't yet been in such fierce need of OR (and, to a lesser degree, the rest: ?/[])).

I had quickly reverted, at the time, to SearchNonAsciiTextNonCaseSensitive (SNA)=0, i.e. to the previous default, in order for the c1content column in table ftsitem_content (=also needed for plain text export, cf. our conversation in this forum at the time) to hold regular text.

I have now re-installed UR, v14 (i.e. replacing v13), by just "overwriting" the previous installation (v13).

I have then checked the SNA sub-key, in order to change it to 0 again (from alleged value 1)...

and, it wasn't even there!

So I created it anew (DWord 32bit, with value 0: I know what I'm doing), then rebooted and checked the registry again: It was correctly there, and at the correct "abc" position.

I then opened UR (v14) and made the following tries - the BIG MOMENT indeed!

("contains keywords" is CK, "matches wildcard" is MW)

a OR b > works fine
a AND b > works fine

a b > (ok: implicit AND works as fine as it did before)
diesbezügliches > is found (quite rare word / word form, 4 correct finds), i.e. works fine

diesbez?gliches > (placeholder for the non-ascii-char) does NOT work (=no find)
> checked AS: switch from CK to MW is not done
> did the switch manually, then AS > NO find

diesbezüglic?es > (placeholder for the "h") NO find (instead of 4)
> checked AS, switch not made
> switch to MW made by me, then AS triggered > NO find (instead of 4)

es > 9720 matches (in order to check if the ? cut the search term into two, I then just searched for the sub-string after the ?, but that does not seem to be the case)

diesbezüglich?s > (placeholder for the "e") 90 WRONG "finds", but none of the 4 to-be-expected ones (and no highlighting (which is "on") within those "finds")
> checked AS, switch not made (i.e. stayed at CK)
> switch to MW made by me, then triggered AS > NO find (instead of 4)

diesbezüglich* > (should include diesbezüglich, so even more than 4 finds expected) NO find
> checked AS, switch IS (!) done here (but as said, no find)
> triggered AS again, with that correct MW > NO find either

diesbezüglic[h]es > (correct "h" within []) no find
> checked AS, switch not done
> made switch to MW manually, then triggered AS > no find

diesbezüglic[ht]es > (since [] is meant to contain more than just one char) no find (instead of at least 4)
> checked AS, switch not done
> made switch to MW manually, then triggered AS > no find

I then checked the registry again: The SNA key was GONE!

Thus, it seems that when the SNA key is set to 0 (or anyway?), running UR will DELETE the SNA key altogether!

In theory, that could also be done by something else in my system, but as said, after reboot, the key I had created was there, then had disappeared after running UR, which might indicate UR itself does the deletion?

Thus, the next step would be, it seems, to identify WHY the (sub-) key gets deleted, then re-do the search tries listed above, and check if any of them work better than before; as said, AND and OR work fine now, but only those.

My next post will contain the UR Options (sub-) key (in its current state: v14, without the SNA key again); you might obviously want to delete it after copying the list.

Spliff · #3 04-09-2024, 06:22 AM

Just the reg part here, see previous post.

kinook · #4 04-09-2024, 07:29 AM

The only time UR removes registry values is when uninstalling and confirming the option to remove them, and then it removes all of them.

Spliff · #5 04-10-2024, 01:00 AM

Registry = My mistake in part:
As said, I had re-installed UR, then searched for the SNA (sub-) key, first visually, then by RegEditor's search func > wasn't there;
I then had added the SNA key by hand, restarted the PC, checked: key was there.
I then had run UR, made the above search tries (with only OR and AND successful), and re-checked the registry, visually, I admit: key wasn't there anymore; then I posted the reg-dump and claimed the key (=which I had had to add before) wasn't there, but in fact, it's there, as the dump shows: at last position, with lots of non-abc reordering even in other parts, quite scrambled. Thus, the key definitely is there, for the time being.

So I continued to try QS:

diesbezügliches > 5 finds (checked: correctly highlighted)
(so 5 finds is the target number the other finds should meet)

diesbez?gliches > 1 find
> checked AS: "contains keywords"
> I changed to MW > no find (5 expected)

diesbez[ük]gliches > no find (ditto for diesbez[ü]gliches)
> checked AS: "contains keywords"
> I changed to MW > no find (5 expected)

diesbezüglic?es > 1 find (checked AS: CK (MW expected?!); changed to MW: no find) (5 expected)

diesbezügliche? (= the ? for the end-s here, should NOT also find diesbezügliche since ? is deemed to stand in for exactly 1 char, not for 1 or 0) > 33 "finds", nothing highlighted; local searches then show that those "found" items contain the word diesbezügliche > the ? at the end obviously is discarded from the search

diesbezügliche > 53 finds (correctly highlighted and incl. the 5 occ of diesbezügliches)

diesbezügliche* > no find (* for 1, n or 0 chars, so * should be redundant here anyway)
checked AS: switch to MW is made (but as said, no find anyway, whilst 53 finds expected)

diesbez*gliches > no find
> checked AS: switch to MW is made (but, as said, no find, with 5 finds expected)

without a non-ascii char:

dergleichen > 71 finds = ok

dergle?chen > no find
> checked AS: CK i.e. switch to MW is not made [edited for typo here]
> made the switch manually > no find (71 finds expected)

Thus, 2 problems:
- occurrence of (un-escaped) *,?,[] should trigger switch from CK to MW, but only * triggers that switch
- even with MW (manually, in AS), all of these placeholders are not correctly processed
(*-?-[]-problems obviously not linked to ascii-non-ascii)
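
The first expectation listed above (any un-escaped wildcard character selects MW, everything else stays CK) could be sketched like this. The backslash escape and the function name are assumptions for illustration only; nothing here is taken from UR's documented syntax.

```python
import re

# A wildcard character that is not preceded by a backslash.
# The backslash as escape character is a hypothetical convention.
_UNESCAPED_WILDCARD = re.compile(r"(?<!\\)[*?\[\]]")

def pick_search_type(term: str) -> str:
    """Return "MW" (matches wildcard) or "CK" (contains keywords)."""
    return "MW" if _UNESCAPED_WILDCARD.search(term) else "CK"

for term in ["dergleichen", "dergle?chen", "diesbezügliche*", r"jen\[e\]s"]:
    print(term, "->", pick_search_type(term))
```

The sketch deliberately ignores edge cases such as an escaped backslash directly before a wildcard.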

kinook · #6 04-10-2024, 09:25 PM

I made some additional fixes (v6.3.0.16) for wildcard characters and contains keywords vs. matches wildcard search type and including NEAR/number in the non-lowercased FTS search expressions. Note that the wildcard syntax using [ ] and ? are not applicable to FTS search mode and only apply to SQLite non-FTS search.
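
For comparison, the ?, [ ] and * wildcards behave as follows in SQLite's built-in GLOB operator. This is only a sketch: it assumes that the non-FTS "matches wildcard" mode resembles GLOB, which this thread does not confirm, and the helper name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def glob_match(pattern: str, text: str) -> bool:
    """Evaluate `text GLOB pattern` with SQLite's built-in operator."""
    row = conn.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()
    return bool(row[0])

# ? stands for exactly one character, including a non-ASCII one.
print(glob_match("diesbez?gliches", "diesbezügliches"))
# [ht] matches one character out of the listed set.
print(glob_match("diesbezüglic[ht]es", "diesbezügliches"))
# A trailing ? requires one more character, so the shorter word fails.
print(glob_match("diesbezügliche?", "diesbezügliche"))
# * matches zero or more characters.
print(glob_match("diesbezügliche*", "diesbezügliche"))
```

Note that GLOB compares against the whole value and is case-sensitive, so a search inside longer content would need a pattern wrapped in leading and trailing *.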

Spliff · #7 04-11-2024, 03:16 AM

Thank you!

I may have been aware a long time ago that ? and [] are not expected to work as wildcards when fts is "on", but have then forgotten this very important fact in-between; e.g. the "Matches Wildcard" help page doesn't mention fts (neither in the text, nor as link); obviously, almost any UR user would have fts "on", since that's one of the reasons for using such a program nowadays? ;-)

Thus, with fts "on", obviously:

? as expected from what you write:

diesbezügliches > correct CK finds

diesbez?gliches > correct, literal CK find for literal diesbez?gliches, and, as expected now (!), no find for diesbezügliches when I run it manually, as MW in AS

[] not very clear though:

jenes > 119 correct CK finds (QS)

jen[e]s > QS CK (checked the AS triggered from that: CK (as expected now)): now (=after reading you) 1 find expected, since I put literal jen[e]s in one item; 9 finds instead (but not 119, obviously), incl. the literal jen[e]s, thus 8 finds too much:

one of these items contains a literal jen[a-z]s (=copy of my UR post), another one contains a literal [some text], three contain a literal [1], and three items don't contain any literal [ or ] (checked by local searches).

This made me fear that even real, literal [some] searches might be faulty, and indeed: QS for [1] (which should be found dozens of times, since it's often used for end notes in my downloads from the web, which are plain-text-only, i.e. no formatting or third-party font problems involved, everything is Arial 12p).

Thus, for literal [1], some dozens of correct finds expected, perhaps even 200; got near 5,000 "finds" instead, i.e. the - expected - [1] finds were totally buried within thousands of "finds" of which some contain some [some text here], but most of them don't even contain a single [ or ] (but may contain a single 1, that is? I can't verify, since searching for 1 brings almost 12,000 finds, which may not all really contain the digit 1; I don't know, obviously can't check such masses). (No highlighted "finds" here btw.)

You might have introduced placeholders / wildcards ? / [abc] / [a-c] / [^abc] / [^a-c] (obviously derived from regex) before introducing fts (which would also explain why you always update 3 tables with new or changes of item titles), and "now", with fts, those [] chars (which then are NOT wildcards here) are really treated as expected i.e. as literal chars? (Obviously the "fts vs. no fts" processing is not entirely distinct in all lines yet, with the code treating the "escaping" of these chars or not (needed with "no fts" only, whilst with fts, anything is literal anyway) probably interfering here?)

Anyway, searching for literal [, ] and [...] does not seem to be reliable currently (always with fts "on"). EDIT: Afterthought: UR users would probably NOT want to do global searches for item's end-notes ([1] and so on), but that would rather be local searches by ^f, but proper (global) search processing of literal [brackets' content of any sort, including even, in case, "tags"], would obviously be welcome. ;-)

NEAR: Will be most helpful for many users I suppose!

a NEAR b (i.e. without number) works more or less, or even exactly, as AND, i.e. it finds items even when a and b are very far away from each other; there is no default value to be entered in Tools - Options - Search (=just for the record, I don't want to imply that would be needed); a NEAR/4 b finds items where a and b are separated by max 4 words (i.e. whatever fts considers "words"), and in whichever order, i.e. b NEAR/4 a (or then the same with a bigger number) also finds the same string combination. (Both strings are highlighted everywhere in the found item(s), even where they don't form such a combination, and that's the preferred way of doing it indeed I suppose.)
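
The NEAR/4 behavior described above (at most 4 words in between, in either order) can be modelled in a few lines. This is a plain-Python illustration, not the FTS engine; the function name and the simple word splitting are assumptions.

```python
import re

def near(text: str, a: str, b: str, max_between: int) -> bool:
    """True if a and b occur with at most max_between words between them."""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    # abs(i - j) - 1 is the number of words strictly between the two hits.
    return any(abs(i - j) - 1 <= max_between
               for i in pos_a for j in pos_b if i != j)

print(near("a w1 w2 w3 w4 b", "a", "b", 4))     # 4 words in between
print(near("a w1 w2 w3 w4 w5 b", "a", "b", 4))  # 5 words in between
print(near("b w1 a", "a", "b", 4))              # reversed order
```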

*: I had some finds...

never with some*other (within) and someother* (* at the end), but with some*other* (i.e. the asterisk within the word AND at the end), but my tries to confirm / replicate them then systematically brought no finds anymore, so I might have made mistakes, by wishful thinking: * to be considered non-available currently.

Thus:

Literal [] processing is faulty, currently, but that's obviously not a problem which would be needed to be resolved on-the-spot; on the other hand, since [number] is quite frequent for end-notes in third-party documents, rectifying the behavior will be of definite interest for many... and while it's not rectified, UR users should be aware that those "finds" are problematic.

It's not realistic to "go back" to non-fts, and that applies to everyone I suppose, so wildcards are simply not available if they can't be implemented. (?)

As said, for tagging, wildcards would be most important though:
1)
xab with x for "it's a tag", a for category, b for the value, then:
QS xa will find all "a" values, from a to z
but it's not possible to find just SOME "a" values, except by
QS xac OR xad OR xae OR xah etc etc, i.e.
QS xa[c-e] OR xah, or then QS xa[cdeh] here, would be most helpful
(I admit that, now that OR in QS is working (again), I can (and have to) write a macro which translates my simili-regex to multiple UR-fts-QS ORs, so this lack can be overcome; having to create / fill in multiple AS rows instead would have been a nightmare, so that's avoided indeed!)
2)
The above indicates that any tags in UR thus have to be distinct; "combined" tags may NOT be used:
We must use xasome for the values of/in category a, and xbsome for the values of/in category b, since the construct (intended by me and much more elegant / easy in most situations), e.g.
xxx = x for "tag", then category 1 (about 40 possible 1-char-values a-z, 0-9, äöü or éàèù, etc.) then category 2 (as category 1, about 40 possible 1-char-values)
is not practicable:
QS xch for cat1-value c and cat2-value h is possible, and even
QS xc for cat1-value c and ANY cat-2 value is possible, but
QS x?h for ANY cat-1-value but value h in cat-2 being not available,
so category-combining tags
(whenever about 40 possible values per category would be sufficient, and that should cover almost all practical situations indeed (and then, the very last category in such a "tag combination in one" could obviously bear any number of values if really needed))
are currently NOT possible in UR, since it would be impossible to find any ranges (incl. "show ALL") of values for any of those categories except the very last one:
QS xch > cat-1 must be of precise (i.e. not any) value if you need to search for a precise value in cat-2, and there is NO search whatsoever, even when you are willing to write a quite complicated AS, which would make you find
xfirst = any value and xsecond = some precise value,
except, of course, listing ALL possible combinations, like
QS xah OR xbh OR xch (etc, down to z for cat-1),
which would make 26 (or more) ORs, just for finding ONE cat-2 value...
and when you try to combine 3 categories, you either go nuts or get a blue screen, whichever arrives first.

Thus, even with UR's AS, avoid (easy, simple, elegant for most situations) combined-tags at all cost...
or then, would it be possible to combine ?-as-placeholder at least with fts? [range-or-list-here] would obviously have been ideal... ;-)
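
The macro idea mentioned under 1), translating the simili-regex into a chain of ORs for QS, might look roughly like this. It is a sketch with assumptions: the function names are invented, the alphabet for ? is limited to a-z here, and each pattern is a single tag term (no OR inside the input).

```python
import itertools
import re
import string

ALPHABET = string.ascii_lowercase  # assumed set of 1-char tag values for "?"

# One piece of a pattern: a [class], a ? placeholder, or a literal character.
_PIECE = re.compile(r"\[([^\]]+)\]|(\?)|([^\[\]?])")

def _expand_class(body: str) -> list:
    """Expand class bodies such as 'c-e' or 'cdeh' into single characters."""
    chars = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars

def expand_pattern(pattern: str) -> str:
    """Turn e.g. 'xa[c-e]' into 'xac OR xad OR xae'."""
    choices = []
    for cls, any_char, literal in _PIECE.findall(pattern):
        if cls:
            choices.append(_expand_class(cls))
        elif any_char:
            choices.append(list(ALPHABET))
        else:
            choices.append([literal])
    return " OR ".join("".join(combo) for combo in itertools.product(*choices))

print(expand_pattern("xa[c-e]"))
print(expand_pattern("xa[cdeh]"))
print(expand_pattern("x?h"))
```

With three combined categories and two placeholders the expansion already yields 26 x 26 terms, which illustrates why listing all combinations by hand is not practicable.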

EDIT: For the sake of completeness, my tries seem to indicate that (whilst parentheses are not allowed for grouping, obviously) even implicit AND has precedence over OR, so a OR b c OR d e OR f will list the items containing a or f or (b and c) or (d and e), so that when the user does it right (without the helping hand of ()), UR does it right then, or in other words, the user might use parentheses for their search construction, then only (and necessarily) eliminate them before feeding the search string to the QS.
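
The precedence observed in this EDIT (AND, explicit or implicit, binds tighter than OR) can be written down as a tiny evaluator. It is a model of the observed behavior only, with invented function names; stripping the parentheses is only valid when they merely wrap AND groups, as in the example.

```python
def parse_query(query: str) -> list:
    """Split a query into OR-groups; terms within a group are ANDed."""
    cleaned = query.replace("(", " ").replace(")", " ")
    groups = []
    current = []
    for token in cleaned.split():
        if token == "OR":
            if current:
                groups.append(current)
            current = []
        elif token != "AND":  # explicit AND is the same as a plain space
            current.append(token)
    if current:
        groups.append(current)
    return groups

def matches(query: str, item_words: set) -> bool:
    """True if any OR-group has all of its terms present in the item."""
    return any(all(term in item_words for term in group)
               for group in parse_query(query))

print(parse_query("a OR (b c) OR (d e) OR f"))
print(matches("a OR b c OR d e OR f", {"b", "c"}))
print(matches("a OR b c OR d e OR f", {"b", "d"}))
```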

EDIT 2:

Afterthought: It's known that UR offers "User-defined keywords", and also tagging in some additional "attribute"; what I call "tags" is just a synonym for the former, but in a "coordinated", "standardized" way, and as for the latter, those "attributes" are buried within an additional pane in which creating, editing, and retrieval are not available that easily. Thus, I prefer "text tags", preferably (but not necessarily) even in the title, since tag renames by sql in the (3) sql tables' title columns are very easy, whilst those within the items' content would imply lots of fuss = global replace, to be done one item's content pane after the other; then, since for most such tagging, about 40 possible values per category would be amply sufficient in virtually all real-life use cases, the introduction of wildcards, ?/[], even in combination with fts, if technically possible (?), would be immensely helpful, since, within a given db, or a given sub-tree in a bigger db, tags in the form xfur, listing 3 categories at the same time, would amply suffice, whilst the current, mandatory form xaf xbu xcr, for exactly the same information, obviously comes way less handy... especially if, for the reasons given above, the user wants to put those tags within the items' titles. ;-)

And, to say it all, I've found FOUR possible "tag codes", available in UR, being more or less readily available by keyboard, depending on the user's language / country settings, obviously - there might be others though -, AND correctly processed by UR's current fts index(ing): ° (but beware of °C, °F and the like), § (but beware of §number without (!) a space between them in case, so if you or your (third-party?) documents use § without a space before the number, §'s tag use may be already excluded in case, but even in such cases, without a space that would follow, your tags would start with an abc char, whilst "paragraphs" would start with a number in almost every case then?), ¦ (not also |), and finally ¬...

whilst in voidtools' Everything e.g., ANY char is included within the index, so that for tags, a simple ,abc or then a simple .abc is valid = will be found by the search... which currently is NOT the case with UR's fts - when it's perfectly understood by anyone that commas (,) and periods (.) immediately follow the previous text, then are separated from the text that follows, by a space: Thus, including . and , into UR fts index would do NO harm whatsoever, and would not even "blow up" that index, since the ONLY additional occurrences would then be those TAGS indeed, much easier to type than °, §, ¦ or ¬, AND, most of all, MUCH more pleasant to the eye than any of those, ° being the visually most acceptable between the four ones currently available... ;-)

Thus:

Would it be possible to add "," and/or "." to the fts index = to treat them as allowed word-starts? Or even other such non-abc chars while we are at it? (With the dot, admittedly, being the visually most pleasing tag code char in the end.)
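
To illustrate what "allowed word-starts" would mean, here is a toy tokenizer. It assumes an index that behaves like SQLite's FTS3/4 "simple" tokenizer (ASCII letters and digits plus every character with a code point of 128 or above count as word characters), which would also explain why °, §, ¦ and ¬ are indexed; whether UR actually uses that tokenizer is an assumption, and the start_chars parameter is purely hypothetical.

```python
def _is_base(ch: str) -> bool:
    """ASCII alphanumerics and all code points >= 128 are word characters."""
    return (ch.isascii() and ch.isalnum()) or ord(ch) >= 128

def tokenize(text: str, start_chars: str = "") -> list:
    """Split text into tokens; start_chars may only open a token."""
    tokens = []
    current = ""
    for ch in text + " ":  # the trailing space flushes the last token
        if _is_base(ch) or (ch in start_chars and not current):
            current += ch.lower() if ch.isascii() else ch
        else:
            # Keep the token only if it holds at least one word character.
            if any(_is_base(c) for c in current):
                tokens.append(current)
            current = ""
    return tokens

sample = "See °abc and .abc, ok."
print(tokenize(sample))
print(tokenize(sample, start_chars="."))
```

Because the extra characters are accepted only at the start of a token, a period that ends a sentence does not stick to the preceding word, so ordinary words would be indexed as before.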