Quick Search, Advanced Search: "OR" and (explicit) "AND" not working

kinook · #16 04-09-2024, 07:29 AM

The only time UR removes registry values is when uninstalling and confirming the option to remove them, and then it removes all of them.

Spliff · #17 04-10-2024, 01:00 AM

Registry: partly my mistake.
As said, I had re-installed UR, then searched for the SNA (sub-)key, first visually, then with RegEdit's search function; it wasn't there.
I then added the SNA key by hand, restarted the PC and checked: the key was there.
I then ran UR, made the search attempts above (only OR and AND were successful), and re-checked the registry (visually, I admit): the key wasn't there anymore. I then posted the registry dump and claimed that the key (which I had had to add before) wasn't there, but in fact it is there, as the dump shows: in the last position, with lots of non-alphabetical reordering even in other parts, quite scrambled. So the key definitely is there, for the time being.

So I continued testing QS:

diesbezügliches > 5 finds (checked: correctly highlighted)
(so 5 finds is the target number the other searches should meet)

diesbez?gliches > 1 find
> checked AS: "Contains Keywords" (CK)
> I changed it to "Matches Wildcard" (MW) > no find (5 expected)

diesbez[ük]gliches > no find (ditto for diesbez[ü]gliches)
> checked AS: "contains keywords"
> I changed to MW > no find (5 expected)

diesbezüglic?es > 1 find (checked AS: CK (MW expected?!); changed to MW: no find) (5 expected)

diesbezügliche? (here the ? stands for the final s; it should NOT also find diesbezügliche, since ? stands for exactly 1 char, not for 1 or 0) > 33 "finds", nothing highlighted; local searches then show that those "found" items contain the word diesbezügliche > the ? at the end is obviously dropped from the search

diesbezügliche > 53 finds (correctly highlighted and including the 5 occurrences of diesbezügliches)

diesbezügliche* > no find (* stands for 0, 1 or n chars, so the * should be redundant here anyway)
> checked AS: the switch to MW is made (but, as said, no find anyway, whereas 53 finds were expected)

diesbez*gliches > no find
> checked AS: the switch to MW is made (but, as said, no find, with 5 finds expected)

without a non-ascii char:

dergleichen > 71 finds = ok

dergle?chen > no find
> checked AS: CK, i.e. the switch to MW is not made [edited for typo here]
> made the switch manually > no find (71 finds expected)

Thus, 2 problems:
- an occurrence of (unescaped) *, ? or [] should trigger the switch from CK to MW, but only * triggers it
- even with MW (set manually in AS), none of these placeholders is processed correctly
(the *, ? and [] problems are obviously not linked to ASCII vs. non-ASCII)

kinook · #18 04-10-2024, 09:25 PM

I made some additional fixes (v6.3.0.16) for wildcard characters, for the Contains Keywords vs. Matches Wildcard search type, and for including NEAR/number in the non-lowercased FTS search expressions. Note that the wildcard syntax using [ ] and ? is not applicable to FTS search mode and only applies to SQLite non-FTS search.
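To make the non-FTS wildcard semantics concrete: the syntax discussed in this thread (?, *, [abc], [a-c], [^abc]) is the same as that of SQLite's built-in GLOB operator, so here is a minimal Python sketch using GLOB through the standard sqlite3 module. That UR's Matches Wildcard mode uses GLOB internally is an assumption, not something stated here. Note also that GLOB is case-sensitive and matches the whole string, so a search inside longer text would need * on both sides.

```python
import sqlite3

# GLOB is a core SQLite operator; an in-memory database is enough to try it.
con = sqlite3.connect(":memory:")


def glob_match(text: str, pattern: str) -> bool:
    """True if the whole of `text` matches the GLOB `pattern` (case-sensitive)."""
    (result,) = con.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()
    return bool(result)


# ? stands for exactly one character, including a non-ASCII one such as ü ...
print(glob_match("diesbezügliches", "diesbez?gliches"))  # True
# ... and never for zero characters.
print(glob_match("diesbezügliche", "diesbezügliche?"))   # False
# [...] is a character class, [^...] its negation, * any run of characters.
print(glob_match("dergleichen", "dergle[ik]chen"))       # True
print(glob_match("jenes", "jen[^e]s"))                   # False
print(glob_match("diesbezügliches", "diesbez*gliches"))  # True
```

These are exactly the results the tests in the previous post expected from MW.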

Spliff · #19 04-11-2024, 03:16 AM

Thank you!

I may have known long ago that ? and [] aren't expected to work as wildcards when fts is "on", but I then forgot this very important fact in the meantime; e.g. the "Matches Wildcard" help page doesn't mention fts (neither in the text nor as a link), and almost any UR user would have fts "on", since that's one of the reasons for using such a program nowadays, isn't it? ;-)

Thus, with fts "on", obviously:

?: as expected from what you wrote:

diesbezügliches > correct CK finds

diesbez?gliches > correct, literal CK find for literal diesbez?gliches, and, as expected now (!), no find for diesbezügliches when I run it manually, as MW in AS
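Why diesbez?gliches finds only the literal spelling once fts is on can be reproduced with a plain SQLite full-text table. This minimal sketch assumes (not confirmed in this thread) that UR's index is an SQLite FTS3/FTS4 table with the default "simple" tokenizer; the NEAR/number syntax discussed here is FTS3/FTS4 syntax, which makes that plausible. The table name items is hypothetical. The tokenizer treats ASCII punctuation such as ? as a word separator, both in the stored text and in the query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for UR's full-text index: FTS4 with the default 'simple'
# tokenizer, which splits on ASCII punctuation such as '?' and keeps bytes >= 0x80
# (e.g. the UTF-8 bytes of 'ü') inside words.
con.execute("CREATE VIRTUAL TABLE items USING fts4(body)")
con.executemany(
    "INSERT INTO items(body) VALUES (?)",
    [
        ("diesbezügliches Beispiel",),
        ("ein literal diesbez?gliches Beispiel",),
        ("die diesbezügliche Frage",),
    ],
)


def search(query: str) -> list[str]:
    rows = con.execute(
        "SELECT body FROM items WHERE items MATCH ? ORDER BY docid", (query,)
    )
    return [body for (body,) in rows]


# The query is tokenized like the text: "diesbez?gliches" becomes the phrase
# "diesbez gliches", so only the literal spelling matches, never the 'ü' one.
print(search('"diesbez?gliches"'))  # ['ein literal diesbez?gliches Beispiel']
# A trailing '?' is simply dropped: this is a plain search for "diesbezügliche".
print(search('"diesbezügliche?"'))  # ['die diesbezügliche Frage']
print(search('"diesbezügliches"'))  # ['diesbezügliches Beispiel']
```

The second query mirrors the 33 "finds" for diesbezügliche? reported earlier: with such a tokenizer the trailing ? never reaches the index at all.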

[] not very clear though:

jenes > 119 correct CK finds (QS)

jen[e]s > QS CK (I checked the AS triggered from it: CK, as now expected). Now (after reading your reply) I expected 1 find, since I had put a literal jen[e]s into one item; I got 9 finds instead (though not 119, obviously), including the literal jen[e]s, i.e. 8 finds too many:

one of these items contains a literal jen[a-z]s (=copy of my UR post), another one contains a literal [some text], three contain a literal [1], and three items don't contain any literal [ or ] (checked by local searches).

This made me fear that even genuine, literal [some] searches might be faulty, and indeed they are. Take QS for [1], which should be found dozens of times, since it's often used for endnotes in my downloads from the web (plain text only, i.e. no formatting or third-party font problems involved; everything is Arial 12 pt).

So for a literal [1], I expected a few dozen correct finds, perhaps even 200; I got nearly 5,000 "finds" instead, i.e. the expected [1] finds were completely buried among thousands of "finds". Some of these contain [some text here], but most of them don't contain a single [ or ] at all (though they may contain a standalone 1? I can't verify that, since searching for 1 brings almost 12,000 finds, which may not all really contain the digit 1; I obviously can't check such masses). (No highlighted "finds" here, by the way.)

Perhaps you introduced the placeholders/wildcards ?, [abc], [a-c], [^abc] and [^a-c] (obviously derived from regex) before introducing fts (which would also explain why you always update 3 tables with new or changed item titles), and "now", with fts, are those [] chars (which are NOT wildcards there) really treated as expected, i.e. as literal chars? Apparently the "fts vs. no fts" processing is not yet fully separated everywhere, and the code that does (or doesn't) "escape" these chars (needed only without fts, whereas with fts everything is literal anyway) probably interferes here?

Anyway, searching for literal [, ] and [...] does not currently seem to be reliable (always with fts "on"). EDIT: Afterthought: UR users probably would NOT want to do global searches for items' endnotes ([1] and so on); those would rather be local searches with Ctrl+F. But proper (global) search processing of literal [bracketed content of any sort, possibly even "tags"] would obviously be welcome. ;-)
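The nearly 5,000 "finds" for [1] are consistent with how SQLite's full-text tokenizers treat brackets. A minimal sketch, again assuming (not confirmed here) that UR's index is an FTS3/FTS4 table with the default "simple" tokenizer, with a hypothetical table items: [ and ] are word separators, so the query reaches the index as the bare word 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical FTS4 table with the default 'simple' tokenizer:
# '[' and ']' are separators, exactly like spaces.
con.execute("CREATE VIRTUAL TABLE items USING fts4(body)")
con.executemany(
    "INSERT INTO items(body) VALUES (?)",
    [
        ("footnote marker [1] in a web clipping",),
        ("chapter 1 without any brackets",),
        ("a bracketed remark without the digit",),
    ],
)


def search(query):
    rows = con.execute(
        "SELECT docid FROM items WHERE items MATCH ? ORDER BY docid", (query,)
    )
    return [docid for (docid,) in rows]


# "[1]" is indexed and queried as the token "1": every item containing
# the number 1 as a word is a hit, brackets or not.
print(search('"[1]"'))          # [1, 2]
# Likewise, a bracketed word also finds the plain, unbracketed word.
print(search('"[bracketed]"'))  # [3]
```

If UR's index behaves like this, a literal bracket search cannot be made exact at the fts level alone; it would need a second, non-fts check on the candidates.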

NEAR: Will be most helpful for many users I suppose!

a NEAR b (i.e. without a number) works more or less, or even exactly, like AND, i.e. it finds items even when a and b are very far apart; there is no default value to be entered in Tools - Options - Search (just for the record, I don't mean to imply one would be needed). a NEAR/4 b finds items where a and b are separated by at most 4 words (whatever fts considers "words"), in either order, i.e. b NEAR/4 a (or the same with a bigger number) also finds the same string combination. (Both strings are highlighted everywhere in the found item(s), even where they don't form such a combination, and that's indeed the preferable way of doing it, I suppose.)
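For reference, here is how NEAR/number behaves in stock SQLite FTS4, assuming (not confirmed here) that UR sits on an FTS3/FTS4 table; the table and its contents are made up. One difference worth noting: according to the SQLite documentation, a bare NEAR without a number means NEAR/10 there, so the AND-like behavior described above suggests UR handles the bare form differently; that is a guess.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical FTS4 table; 'a NEAR/4 b' is FTS3/FTS4 syntax
# (FTS5 would spell it NEAR(a b, 4) instead).
con.execute("CREATE VIRTUAL TABLE items USING fts4(body)")
con.executemany(
    "INSERT INTO items(body) VALUES (?)",
    [
        ("alpha one two three beta",),                            # 3 words in between
        ("beta one alpha",),                                      # reversed, 1 in between
        ("alpha one two three four five six seven eight beta",),  # 8 words in between
    ],
)


def search(query):
    rows = con.execute(
        "SELECT docid FROM items WHERE items MATCH ? ORDER BY docid", (query,)
    )
    return [docid for (docid,) in rows]


# At most 4 intervening words, in either order.
print(search("alpha NEAR/4 beta"))  # [1, 2]
# Implicit AND ignores the distance entirely.
print(search("alpha beta"))         # [1, 2, 3]
```

This matches the observation above that a NEAR/4 b and b NEAR/4 a find the same items.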

*: I had some finds...

never with some*other (* within the word) or someother* (* at the end), but with some*other* (i.e. an asterisk within the word AND at the end); however, my attempts to confirm / replicate them then consistently brought no finds, so I may have made mistakes out of wishful thinking: * should be considered unavailable for now.
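In stock SQLite FTS3/FTS4 the only wildcard is a trailing * (a prefix query); there is no * inside or in front of a word. A minimal sketch, assuming (not confirmed here) that UR's index is such an FTS4 table, with a hypothetical table items: if it works this way, someother* and diesbezügliche* ought to find something, so their "no find" would point at UR's query preprocessing rather than at fts itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical FTS4 table standing in for UR's index.
con.execute("CREATE VIRTUAL TABLE items USING fts4(body)")
con.executemany(
    "INSERT INTO items(body) VALUES (?)",
    [("someother thing",), ("some other thing",), ("diesbezügliches Beispiel",)],
)


def search(query):
    rows = con.execute(
        "SELECT docid FROM items WHERE items MATCH ? ORDER BY docid", (query,)
    )
    return [docid for (docid,) in rows]


# A trailing '*' is a prefix query: every word *starting* with "some".
print(search("some*"))            # [1, 2]
print(search("diesbezügliche*"))  # [3]
# There is no suffix or infix wildcard: "other" only matches the whole word.
print(search("other"))            # [2]
```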

Thus:

Literal [] processing is currently faulty, but that obviously isn't a problem that needs to be resolved on the spot; on the other hand, since [number] is quite common for endnotes in third-party documents, fixing the behavior will be of definite interest to many... and until it's fixed, UR users should be aware that those "finds" are unreliable.

It's not realistic to "go back" to non-fts, and that applies to everyone I suppose, so wildcards are simply not available if they can't be implemented. (?)

As said, for tagging, wildcards would be most important though:
1)
xab with x for "it's a tag", a for category, b for the value, then:
QS xa will find all "a" values, from a to z
but it's not possible to find just SOME "a" values, except by
QS xac OR xad OR xae OR xah, etc., i.e.
QS xa[c-e] OR xah, or QS xa[cdeh], would be most helpful here
(I admit that, now that OR in QS is working (again), I can (and have to) write a macro that translates my pseudo-regex into multiple UR fts QS ORs, so this gap can be overcome; having to create / fill in multiple AS rows instead would have been a nightmare, so that's indeed avoided!)
2)
The above implies that tags in UR have to be distinct; "combined" tags can NOT be used:
We must use xasome for the values of/in category a, and xbsome for the values of/in category b, since the construct (intended by me and much more elegant / easy in most situations), e.g.
xxx = x for "tag", then category 1 (about 40 possible 1-char-values a-z, 0-9, äöü or éàèù, etc.) then category 2 (as category 1, about 40 possible 1-char-values)
is not practicable:
QS xch for cat1-value c and cat2-value h is possible, and even
QS xc for cat1-value c and ANY cat-2 value is possible, but
QS x?h, for ANY cat-1 value with value h in cat-2, is not available,
so category-combining tags
(whenever about 40 possible values per category would be sufficient, and that should indeed cover almost all practical situations (and then, the very last category in such a "tag combination in one" could obviously bear any number of values if really needed))
are currently NOT possible in UR, since it would be impossible to find any ranges (incl. "show ALL") of values for any of those categories except the very last one:
QS xch > cat-1 must have a precise (i.e. not just any) value if you need to search for a precise value in cat-2, and there is NO search whatsoever, not even a quite complicated AS you'd be willing to write, that would let you find
xfirst = any value and xsecond = some precise value,
except, of course, listing ALL possible combinations, like
QS xah OR xbh OR xch (etc, down to z for cat-1),
which would make 26 (or more) ORs, just for finding ONE cat-2 value...
and when you try to combine 3 categories, you either go nuts or get a blue screen, whichever comes first.

Thus, even with UR's AS, avoid (easy, simple, and in most situations elegant) combined tags at all costs...
or else: would it be possible to support ? as a placeholder, at least, together with fts? [range-or-list-here] would obviously have been ideal... ;-)
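The macro mentioned in 1) could look roughly like this: a minimal Python sketch (all names hypothetical, and the 39-character alphabet an assumption standing in for the "about 40" values per category) that expands ?, [c-e], [cdeh] and [^...] in a tag pattern into the OR list that QS accepts. It also shows the cost of the x?h case from 2): one OR term per possible category-1 value.

```python
import re
import string

# Hypothetical alphabet of one-character tag values (a-z, 0-9, äöü = 39 values),
# standing in for the "about 40" values per category described above.
ALPHABET = string.ascii_lowercase + string.digits + "äöü"


def expand_class(body: str) -> list[str]:
    """Expand the inside of [...]: 'c-e' -> c, d, e and 'cdeh' -> c, d, e, h."""
    chars: list[str] = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as c-e covers every code point from c to e.
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def expand_tag(pattern: str, alphabet: str = ALPHABET) -> list[str]:
    """Expand ?, [...] and [^...] in a tag pattern into every concrete tag."""
    tags = [""]
    for token in re.findall(r"\[[^\]]*\]|.", pattern):
        if token == "?":
            options = list(alphabet)  # any single tag value
        elif token.startswith("[") and len(token) > 1:
            body = token[1:-1]
            if body.startswith("^"):
                excluded = set(expand_class(body[1:]))
                options = [c for c in alphabet if c not in excluded]
            else:
                options = expand_class(body)
        else:
            options = [token]  # a literal character
        tags = [t + o for t in tags for o in options]
    return tags


def to_or_query(patterns: list[str]) -> str:
    """Turn several patterns into one 'a OR b OR ...' QS string, without duplicates."""
    terms: list[str] = []
    for pattern in patterns:
        for tag in expand_tag(pattern):
            if tag not in terms:
                terms.append(tag)
    return " OR ".join(terms)


print(to_or_query(["xa[c-e]", "xah"]))  # xac OR xad OR xae OR xah
print(to_or_query(["xa[cdeh]"]))        # xac OR xad OR xae OR xah
print(len(expand_tag("x?h")))           # 39
```

Two unknown positions (x??) would already yield 39 × 39 = 1,521 terms, which is the blow-up described in 2).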

EDIT: For the sake of completeness: my tests seem to indicate that (while parentheses are obviously not allowed for grouping) even implicit AND takes precedence over OR, so a OR b c OR d e OR f will list the items containing a, or f, or (b and c), or (d and e). So if the user gets it right (without the helping hand of ()), UR gets it right, too; in other words, users might use parentheses while constructing their search, but must then eliminate them before feeding the search string to QS.
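If implicit AND does bind tighter than OR in QS, a parenthesized query can be rewritten mechanically into the flat form by distributing AND over OR. A minimal sketch of such a helper (hypothetical, handling only words, OR, whitespace-as-AND and parentheses; no NOT, NEAR or quoted phrases):

```python
import re
from itertools import product


def tokenize(query: str) -> list[str]:
    """Split into '(', ')' and whitespace-separated words (OR included)."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def parse_expr(tokens: list[str], pos: int) -> tuple[list[list[str]], int]:
    """expr := term ('OR' term)* ; returns the OR-list of AND-groups."""
    groups, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "OR":
        more, pos = parse_term(tokens, pos + 1)
        groups = groups + more  # OR: just collect the alternatives
    return groups, pos


def parse_term(tokens: list[str], pos: int) -> tuple[list[list[str]], int]:
    """term := factor+ (implicit AND); factor := word | '(' expr ')'."""
    groups: list[list[str]] = [[]]  # one empty AND-group, the neutral element
    while pos < len(tokens) and tokens[pos] not in ("OR", ")"):
        if tokens[pos] == "(":
            inner, pos = parse_expr(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
        else:
            inner, pos = [[tokens[pos]]], pos + 1
        # AND distributes over OR: combine every group with every alternative.
        groups = [a + b for a, b in product(groups, inner)]
    if groups == [[]]:
        raise ValueError(f"empty operand at token {pos}")
    return groups, pos


def flatten(query: str) -> str:
    """Rewrite a parenthesized query into the flat 'AND binds tighter' form."""
    tokens = tokenize(query)
    groups, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected ')'")
    return " OR ".join(" ".join(group) for group in groups)


print(flatten("(a OR b) (c OR d)"))  # a c OR a d OR b c OR b d
print(flatten("a OR (b c) OR f"))    # a OR b c OR f
```

The flat form can grow quickly: (a OR b) (c OR d) (e OR f) already becomes 8 AND-groups.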

EDIT 2:

Afterthought: I know that UR offers "User-defined keywords", and also tagging via an additional "attribute". What I call "tags" is just a synonym for the former, but used in a "coordinated", "standardized" way; as for the latter, those "attributes" are buried in an additional pane in which creating, editing and retrieval are not that easy-peasy. Thus, I prefer "text tags", preferably (but not necessarily) even in the title, since tag renames by SQL in the (3) SQL tables' title columns are very easy, whereas renaming tags within the items' content would involve lots of fuss (a global replace, done one item's content pane after the other). Then, since for most such tagging about 40 possible values per category would be amply sufficient in virtually all real-life use cases, wildcards (?/[]), even in combination with fts, if technically possible (?), would be immensely helpful: within a given db, or a given sub-tree in a bigger db, tags in the form xfur, listing 3 categories at the same time, would amply suffice, whereas the current, mandatory form xaf xbu xcr, for exactly the same information, obviously comes in way less handy... especially if, for the reasons given above, the user wants to put those tags in the items' titles. ;-)

And, to say it all, I've found FOUR possible "tag codes" in UR that are more or less readily available on the keyboard (depending on the user's language / country settings, obviously; there might be others, though) AND are correctly processed by UR's current fts indexing: ° (but beware of °C, °F and the like), § (but beware of § followed directly by a number without (!) a space: if you or your (third-party?) documents use § without a space before the number, § may already be ruled out as a tag code; but even then, your tags would start with a letter, whereas "paragraphs" would start with a number in almost every case?), ¦ (but not |), and finally ¬...

whereas in voidtools' Everything, e.g., ANY character is included in the index, so that for tags a simple ,abc or .abc is valid, i.e. will be found by the search... which is currently NOT the case with UR's fts, although everyone understands that commas (,) and periods (.) immediately follow the preceding text and are separated from the following text by a space. Thus, including . and , in UR's fts index would do NO harm whatsoever, and would not even "blow up" that index, since the ONLY additional occurrences would then be those TAGS, which are much easier to type than °, §, ¦ or ¬ AND, most of all, MUCH more pleasant to the eye than any of those, ° being the most visually acceptable of the four currently available... ;-)

Thus:

Would it be possible to add "," and/or "." to the fts index, i.e. to treat them as allowed word starts? Or even other such non-letter chars while we're at it? (The dot, admittedly, being the most visually pleasing tag code char in the end.)
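SQLite's unicode61 tokenizer has a tokenchars option that does roughly what is asked for here. A minimal sketch, with a hypothetical FTS4 table; whether UR could or would switch its tokenizer is not known, and that UR uses FTS3/FTS4 at all is an assumption. It marks . as a token character:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tag-friendly index: FTS4 with the unicode61 tokenizer and '.'
# declared as a token character, so ".abc" is indexed as a word of its own.
con.execute(
    'CREATE VIRTUAL TABLE items USING fts4(body, tokenize=unicode61 "tokenchars=.")'
)
con.executemany(
    "INSERT INTO items(body) VALUES (?)",
    [("meeting notes .abc",), ("meeting notes abc",), ("End of the meeting.",)],
)


def search(query):
    rows = con.execute(
        "SELECT docid FROM items WHERE items MATCH ? ORDER BY docid", (query,)
    )
    return [docid for (docid,) in rows]


# The dotted tag is distinct from the plain word "abc".
print(search('".abc"'))   # [1]
# Item 3 is missing: its last word was indexed as "meeting." with the period.
print(search("meeting"))  # [1, 2]
```

One side effect to weigh: tokenchars makes . part of a word wherever it occurs, not only at the start, so a sentence-final period stays glued to the preceding word and that item no longer turns up for a plain search on that word.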