github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/dogsheep/evernote-to-sqlite/issues/6#issuecomment-706785086 | https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/6 | 706785086 | MDEyOklzc3VlQ29tbWVudDcwNjc4NTA4Ng== | 9599 | 2020-10-11T23:28:50Z | 2020-10-11T23:28:50Z | MEMBER | The XML for the OCR stuff is a bit weird. Currently I'm doing this to it: https://github.com/dogsheep/evernote-to-sqlite/blob/c33d7b043a45eb3e88676e5fa3ce31755199d9f8/evernote_to_sqlite/utils.py#L70-L78 This can produce some odd results, for example: > Sure 'Sure, 'Sure. Sure, Sure. sure sure. sure ? If you If Yau [you live jive In m 1n an area devoid of natural wonders, wanders, wonders ? wonders wonders. your mind will be blown, blown' blown. blown ? -e i ? ,1 IL it ? at ? KY ? fl ft bat at Which came from this image: ![image](https://user-images.githubusercontent.com/9599/95692952-5dd7c880-0bde-11eb-939a-d10b800a4105.png) The XML for that is: ```xml <recoIndex docType="unknown" objType="image" objID="05ffb72b307bf495f064243c7099d94f" engineVersion="6.5.17.7" recoType="service" lang="en" objWidth="1000" objHeight="1504"> <item x="68" y="75" w="104" h="37"> <t w="60">Sure</t> <t w="52">'Sure,</t> <t w="47">'Sure.</t> <t w="33">Sure,</t> <t w="26">Sure.</t> </item> <item x="182" y="83" w="92" h="26"> <t w="62">sure</t> <t w="58">sure.</t> <t w="46">sure ?</t> </item> <item x="69" y="132" w="107" h="45"> <t w="81">If you</t> <t w="64">If Yau</t> <t w="31">[you</t> </item> <item x="186" y="132" w="67" h="35"> <t w="85">live</t> <t w="51">jive</t> </item> <item x="263" y="140" w="36" h="27"> <t w="82">In</t> <t w="56">m</t> <t w="53">1n</t> </item> <item x="309" y="140" w="53" h="27"> <t w="82">an</t> </item> <item x="372" y="141" w="90" h="26"> <t w="94">area</t> </item> <item x="472" y="132" w="138" h="35"> <t w="85">devoid</t> </item> <item x="620" y="132" w="43" h="35"> <t w="82">of</t> </item> <item x="68" y="190" w="137" h="35"> <t w="87">natural</t> </item> <item x="215" y="190" w="187" h="39"> <t w="57">wonders,</t> <t w="55">wanders,</t> <t w="52">wonders ?</t> <t w="45">wonders</t> <t w="42">won… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 718949182 | |
https://github.com/dogsheep/evernote-to-sqlite/issues/6#issuecomment-706785201 | https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/6 | 706785201 | MDEyOklzc3VlQ29tbWVudDcwNjc4NTIwMQ== | 9599 | 2020-10-11T23:29:39Z | 2020-10-11T23:29:39Z | MEMBER | It looks to me like each of those `<item>` blocks has a number of guesses in order of confidence: ```xml <item x="215" y="190" w="187" h="39"> <t w="57">wonders,</t> <t w="55">wanders,</t> <t w="52">wonders ?</t> <t w="45">wonders</t> <t w="42">wonders.</t> </item> ``` So maybe the best approach here is to just take the first `t` element within each `item`. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 718949182 | |
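
The "take the first `t` element within each `item`" idea from the second comment could be sketched roughly as follows. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, not the code at the linked `utils.py` lines. The function name `first_guesses` and the trimmed `SAMPLE` document are hypothetical, made up for illustration. It also assumes that the first `<t>` in each `<item>` is always the most confident guess, which the comment suggests but does not confirm.

```python
import xml.etree.ElementTree as ET


def first_guesses(reco_xml):
    """Join the text of the first <t> child of every <item> element.

    Assumption: the <t> children are listed best guess first, so the
    first one is the guess to keep. first_guesses is a hypothetical name.
    """
    root = ET.fromstring(reco_xml.strip())
    words = []
    for item in root.iter("item"):
        # find() returns the first matching direct child, or None.
        first = item.find("t")
        if first is not None and first.text:
            words.append(first.text)
    return " ".join(words)


# Trimmed, illustrative sample based on the XML quoted in the comments.
SAMPLE = """
<recoIndex docType="unknown" objType="image" lang="en">
  <item x="68" y="75" w="104" h="37">
    <t w="60">Sure</t>
    <t w="52">'Sure,</t>
  </item>
  <item x="182" y="83" w="92" h="26">
    <t w="62">sure</t>
    <t w="58">sure.</t>
    <t w="46">sure ?</t>
  </item>
  <item x="215" y="190" w="187" h="39">
    <t w="57">wonders,</t>
    <t w="55">wanders,</t>
  </item>
</recoIndex>
"""

if __name__ == "__main__":
    print(first_guesses(SAMPLE))
```

Keeping only the first guess discards the alternative readings, so a search for a word that appears only as a lower-ranked guess would no longer match. Whether that trade-off is acceptable depends on how the OCR text is used.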