Thread: Export problems
  #4  
07-07-2021, 03:38 AM
Spliff
Registered User
 
Join Date: 04-07-2021
Posts: 192
So I had a thorough look into the above and wrote the necessary script; not yet for the transfer into the target XML format, but for processing the UR export described above (XML OPML, all attributes, children recursive, HTML "None"; that label is not accurate, since this export still creates all the HTML folders and files, in addition to the text-only XML file).

As mentioned above, you run a loop which delimits and processes the items one by one. For each item you decide, e.g. by its flag number, whether it is to be included in the final output. If it is, you also analyze the Item Text data in order to discard parts of it, e.g. "comment lines" (starting with a special character), or anything below some "code" such as a separator line between "text to be exported" and data which is not. Within the Item Text you may also "code" some special data, again with leading special characters; these lines are deleted from the "text" part but written into special variables for further use.
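
To make the above concrete, here is a minimal Python sketch of such a loop. It is not my actual script, and everything specific in it is an assumption for illustration: the comment character `;`, the separator line `-----`, the `@name: value` syntax for the "coded" special data, and the dictionary keys `flag`, `title` and `text` for an already-delimited item.

```python
def process_item_text(text, comment_prefix=";", separator="-----", field_prefix="@"):
    """Split one item's text into exportable content and extracted fields.

    The three marker strings are hypothetical; use whatever you "code"
    into your own Item Text.
    """
    kept = []
    fields = {}
    for line in text.splitlines():
        if line.strip() == separator:
            break  # everything below the separator line is not exported
        if line.startswith(comment_prefix):
            continue  # comment line: discard
        if line.startswith(field_prefix):
            # "coded" special data: remove from the text, keep as a variable
            name, _, value = line[len(field_prefix):].partition(":")
            fields[name.strip()] = value.strip()
            continue
        kept.append(line)  # ordinary text, blank lines included
    return "\n".join(kept), fields


def select_items(items, wanted_flags):
    """Yield only the items whose flag is wanted, with their text cleaned up."""
    for item in items:
        if item["flag"] not in wanted_flags:
            continue  # discard the item altogether
        content, fields = process_item_text(item["text"])
        yield {"title": item["title"], "content": content, "fields": fields}
```

How the items get delimited in the first place depends on the export file and is left out here.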

Then (i.e. if you haven't discarded the item altogether), you append those variables to the target variable, e.g. like

|tThe item Title
|iThe item ID if needed, etc.
|oSome Other data retrieved from the Item Text
|cThe item text / Content

and finally you write this variable into a file, or process it the way you need, replacing the above codes (|o and the like) with the corresponding XML or other notation your target needs.
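
As a rough illustration of that last replacement step, the following Python sketch turns one such coded record into an XML fragment. The element names (`item`, `title`, `id`, `other`, `content`) are placeholders I made up, not any particular target format, and the sketch assumes that no ordinary content line starts with one of the codes.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the one-letter codes to target element names.
TAGS = {"t": "title", "i": "id", "o": "other", "c": "content"}


def record_to_xml(record):
    """Convert one coded record ("|t...", "|i...", ...) into an XML string."""
    item = ET.Element("item")
    current = None
    for line in record.split("\n"):
        if len(line) >= 2 and line[0] == "|" and line[1] in TAGS:
            # A new code starts a new element
            current = ET.SubElement(item, TAGS[line[1]])
            current.text = line[2:]
        elif current is not None:
            # Continuation line (e.g. multi-line content): keep the newline
            current.text += "\n" + line
    # ElementTree escapes &, < and > in the text for us
    return ET.tostring(item, encoding="unicode")
```

Calling `record_to_xml("|tThe item Title\n|cThe item text")` gives one `<item>` element with a `<title>` and a `<content>` child; blank lines inside the content survive as they are.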

I can confirm that this works as expected, i.e. this UR export is reliable, including correct rendering of newlines and blank lines, etc., so yes, you can use this as output from the above-mentioned markdown and similar. This is a very positive finding, since just for feeding the search index, UR's redundant text-only content storage would NOT have needed to preserve the correct newlines, etc.

On the other hand, the above-mentioned UR export produces quite an incredible overhead, since all the described selection work is done in my script only, AFTER the UR export. If I want to use only 10 percent of my items in the target application (discarding about 90 percent of them via their flag number), UR produces, within the XML file alone, tenfold what I need, since there is no way to filter by flag number in the UR export dialog itself. Worse, while I don't need the HTML folders and files even for my 10 percent of items, UR creates them anyway at every single export as described above, within its "raw" export, i.e. I then have to delete not just 100 but 1,000 newly created HTML folders and files.

Thus, I would suggest implementing another variant in that XML export dialog, which would be very easy, since it would be just a subset of the current "Html: none" variant, but this time really without the HTML. (As said, the additional HTML files are needed if you want to preserve text formatting, and are not needed if not.)
_______________

For my purposes, and abhorring the extent of the overhead I got, I then tried the CSV export, just for ItemTitle, Flag and ItemText. Here again I got some overhead, since I can only make my selection, via Flag, after the export (i.e. I get tenfold the number of CSV "records" I really need). Also, the flag NAME is exported here, not the flag number, but whatever: I'll rename my flags to 1-character "names", then select by those.

I don't like the fact that the CSV export is strict CSV, i.e. with commas as the only possible field separator, and thus double quotes around the fields, with "" for " within text; being able to choose tab or | or such as field separator would have been so much handier. BUT here again, I can confirm that the export is without fault, including preservation of newlines, blank lines and all. Thankfully, the Text/Content field is the last one (Title - Flag - Text), which facilitates visual checking in some (good) text editor (e.g. EmEditor), and there are also specialized CSV editors like Ron's Editor. I checked the UR CSV output in both, and it's without fault.
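
For what it's worth, strict CSV of this kind is not hard to read back by script. Here is a minimal sketch using Python's standard `csv` module, which handles the quoted fields, the doubled "" and the newlines inside the text field. The column order Title - Flag - Text is as described above; whether the file starts with a header row is an assumption you should check against your own export, hence the `has_header` switch.

```python
import csv
import io


def read_items(stream, wanted_flags, has_header=True):
    """Yield (title, text) for every CSV record whose flag name is wanted.

    Assumes exactly three columns per record: Title, Flag, Text.
    """
    reader = csv.reader(stream)  # default dialect: comma-separated, "" escapes "
    if has_header:
        next(reader, None)  # skip the header row, if there is one
    for title, flag, text in reader:
        if flag in wanted_flags:
            yield title, text


# Small in-memory sample; the flags are 1-character "names" as described above.
sample = 'Title,Flag,Text\n"A","x","line 1\n\nsaid ""hi"""\n"B","y","skip me"\n'
selected = list(read_items(io.StringIO(sample), {"x"}))
```

When reading a real file, open it with `newline=""` (as the `csv` documentation recommends), so that the newlines inside the text field are passed through unchanged.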

(Btw, the "indent level" here is just that, a single number; it's not in the form 1, 1.1, 1.2 ... 18.3.45 or such. For your individual use case it may be of help indeed, though, since it preserves what the leading tabs express in the above-described XML export.)

Thus, it's obvious that if you don't need to preserve text formatting, you will use the CSV export instead of the XML export, getting far less overhead. In both cases, you will have to recreate the necessary XML (or other) notation for your target application from scratch anyway.


EDIT: It just occurred to me that the "Indent Level" info is extremely helpful indeed. Instead of having to flag even the child items of "not to be exported" parent items, or else having to flag all the "exportable" items as "exportable", you just flag as "non-exportable" the parent items of the sub-trees concerned. Then, in your "export-from-export" script, for every "non-exportable" item, you get its indent level and check the following items for "indent level number greater than that given number"; while the answer is "yes", you discard those as well. This makes your general work a lot easier as well, so yes, the "indent level" attribute is to be considered a core element of the UR export.
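
The indent-level check can be sketched in a few lines of Python. This assumes the items come in outline order (parents before their children, as in the export) and are already reduced to `(indent_level, flag, title)` tuples; that tuple layout and the function name are mine, for illustration only.

```python
def drop_subtrees(items, excluded_flag):
    """Yield items in order, skipping every excluded item and all its descendants."""
    skip_below = None  # indent level of the excluded parent currently in effect
    for level, flag, title in items:
        if skip_below is not None:
            if level > skip_below:
                continue  # deeper than the excluded parent: still inside its sub-tree
            skip_below = None  # same or higher level: the sub-tree has ended
        if flag == excluded_flag:
            skip_below = level  # discard this item and start skipping its children
            continue
        yield level, flag, title
```

So only the parent of a sub-tree needs the "non-exportable" flag; the children are dropped by their indent level alone, whatever their own flags are.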

Last edited by Spliff; 07-07-2021 at 04:12 AM.