issue_comments
9,947 rows sorted by issue_url
This data as json, CSV (advanced)
id | html_url | issue_url ▼ | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
790312268 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790312268 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDMxMjI2OA== | simonw 9599 | 2021-03-04T05:48:16Z | 2021-03-04T05:48:16Z | MEMBER | Wow, my mbox is a 10.35 GB download! | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790369076 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790369076 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM2OTA3Ng== | simonw 9599 | 2021-03-04T06:54:46Z | 2021-03-04T06:54:46Z | MEMBER | The Rich-powered progress bar is pretty: ![rich](https://user-images.githubusercontent.com/9599/109923307-71f69200-7c73-11eb-9ee2-8f0a240f3994.gif) | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790370485 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790370485 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM3MDQ4NQ== | simonw 9599 | 2021-03-04T06:57:25Z | 2021-03-04T06:57:48Z | MEMBER | The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count: https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67 I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790372621 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790372621 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM3MjYyMQ== | simonw 9599 | 2021-03-04T07:01:18Z | 2021-03-04T07:01:18Z | MEMBER | I'm not sure if it would work, but there is an alternative pattern for showing a progress bar against a really large file that I've used in `healthkit-to-sqlite` - you set the progress bar size to the size of the file in bytes, then update a counter as you read the file. https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/cli.py#L24-L57 and https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/utils.py#L4-L19 (the `progress_callback()` bit) is where that happens. It can be a bit of a convoluted pattern, and I'm not at all sure it would work for `mbox` files since it looks like that library has other reasons it needs to do a file scan rather than streaming it through one chunk of bytes at a time. So I imagine this would not work here. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790373024 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790373024 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM3MzAyNA== | simonw 9599 | 2021-03-04T07:01:58Z | 2021-03-04T07:04:06Z | MEMBER | I got 9 warnings that look like this: ``` Errors: 1 Traceback (most recent call last): File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 103, in get_mbox message["date"] = get_message_date(email.get("Date"), email.get_from()) File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 167, in get_message_date datetime_tuple = email.utils.parsedate_tz(mail_date) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz res = _parsedate_tz(data) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz data = data.split() AttributeError: 'Header' object has no attribute 'split' ``` It would be useful if those warnings told me the message ID (or similar) of the affected message so I could grep for it in the `mbox` and see what was going on. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790378658 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790378658 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM3ODY1OA== | simonw 9599 | 2021-03-04T07:12:48Z | 2021-03-04T07:12:48Z | MEMBER | It looks like the `body` is being loaded into a BLOB column - so in Datasette default it looks like this: <img width="1650" alt="mbox__mbox_emails__753_446_rows" src="https://user-images.githubusercontent.com/9599/109924808-b4b96980-7c75-11eb-8c9e-307f2ae32d5a.png"> If I `datasette install datasette-render-binary` and then try again I get this: <img width="1487" alt="mbox__mbox_emails__753_446_rows" src="https://user-images.githubusercontent.com/9599/109924944-ea5e5280-7c75-11eb-9a32-404f3d68455f.png"> It would be great if we could store the `body` as unicode text instead. May have to do something clever to decode it based on some kind of charset header? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790379629 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790379629 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM3OTYyOQ== | simonw 9599 | 2021-03-04T07:14:41Z | 2021-03-04T07:14:41Z | MEMBER | Confirmed: removing the `len()` call does not speed things up, so it's reading through the entire file for some other purpose too. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790380839 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790380839 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM4MDgzOQ== | simonw 9599 | 2021-03-04T07:17:05Z | 2021-03-04T07:17:05Z | MEMBER | Looks like you're doing this: ```python elif message.get_content_type() == "text/plain": body = message.get_payload(decode=True) ``` So presumably that decodes to a unicode string? I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790389335 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790389335 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM4OTMzNQ== | UtahDave 306240 | 2021-03-04T07:32:04Z | 2021-03-04T07:32:04Z | NONE | > The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count: > > https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67 > > I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long. The wait is from python loading the mbox file. This happens regardless if you're getting the length of the mbox. The mbox module is on the slow side. It is possible to do one's own parsing of the mbox, but I kind of wanted to avoid doing that. | {"total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790391711 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790391711 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM5MTcxMQ== | UtahDave 306240 | 2021-03-04T07:36:24Z | 2021-03-04T07:36:24Z | NONE | > Looks like you're doing this: > > ```python > elif message.get_content_type() == "text/plain": > body = message.get_payload(decode=True) > ``` > > So presumably that decodes to a unicode string? > > I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string. Ah, that's good to know. I think explicitly creating the tables will be a great improvement. I'll add that. Also, I noticed after I opened this PR that the `message.get_payload()` is being deprecated in favor of `message.get_content()` or something like that. I'll see if that handles the decoding better, too. Thanks for the feedback. I should have time tomorrow to put together some improvements. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790668263 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790668263 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDY2ODI2Mw== | simonw 9599 | 2021-03-04T14:43:58Z | 2021-03-04T14:43:58Z | MEMBER | I added this code to output a message ID on errors: ```diff print("Errors: {}".format(num_errors)) print(traceback.format_exc()) + print("Message-Id: {}".format(email.get("Message-Id", "None"))) continue ``` Having found a message ID that had an error, I ran this command to see the context: rg --text --context 20 '44F289B0.000001.02100@SCHWARZE-DWFXMI' ~/gmail.mbox This was for the following error: ``` File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 102, in get_mbox message["date"] = get_message_date(email.get("Date"), email.get_from()) File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 178, in get_message_date datetime_tuple = email.utils.parsedate_tz(mail_date) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz res = _parsedate_tz(data) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz data = data.split() AttributeError: 'Header' object has no attribute 'split' ``` Here's what I spotted in the `ripgrep` output: ``` 177133570:Message-Id: <44F289B0.000001.02100@SCHWARZE-DWFXMI> 177133571-Date: Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit) 177133572-X-Mailer: IncrediMail (5002253) ``` So it could it be that `_parsedate_tz` is having trouble with that `Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit)` string. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790669767 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790669767 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDY2OTc2Nw== | simonw 9599 | 2021-03-04T14:46:06Z | 2021-03-04T14:46:06Z | MEMBER | Solution could be to pre-process that string by splitting on `(` and dropping everything afterwards, assuming that the `(...)` bit isn't necessary for correctly parsing the date. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790693674 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790693674 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDY5MzY3NA== | simonw 9599 | 2021-03-04T15:18:36Z | 2021-03-04T15:18:36Z | MEMBER | I imported my 10GB mbox with 750,000 emails in it, ran this tool (with a hacked fix for the blob column problem) - and now a search that returns 92 results takes 25.37ms! This is fantastic. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790695126 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790695126 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDY5NTEyNg== | simonw 9599 | 2021-03-04T15:20:42Z | 2021-03-04T15:20:42Z | MEMBER | I'm not sure why but my most recent import, when displayed in Datasette, looks like this: <img width="574" alt="mbox__mbox_emails__753_446_rows" src="https://user-images.githubusercontent.com/9599/109985836-0ab00080-7cba-11eb-97d5-0631a0835b61.png"> Sorting by `id` in the opposite order gives me the data I would expect - so it looks like a bunch of null/blank messages are being imported at some point and showing up first due to ID ordering. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
791089881 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-791089881 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MTA4OTg4MQ== | maxhawkins 28565 | 2021-03-05T02:03:19Z | 2021-03-05T02:03:19Z | NONE | I just tried to run this on a small VPS instance with 2GB of memory and it crashed out of memory while processing a 12GB mbox from Takeout. Is it possible to stream the emails to sqlite instead of loading it all into memory and upserting at once? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
791530093 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-791530093 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MTUzMDA5Mw== | UtahDave 306240 | 2021-03-05T16:28:07Z | 2021-03-05T16:28:07Z | NONE | > I just tried to run this on a small VPS instance with 2GB of memory and it crashed out of memory while processing a 12GB mbox from Takeout. > > Is it possible to stream the emails to sqlite instead of loading it all into memory and upserting at once? @maxhawkins a limitation of the python mbox module is it loads the entire mbox into memory. I did find another approach to this problem that didn't use the builtin python mbox module and created a generator so that it didn't have to load the whole mbox into memory. I was hoping to use standard library modules, but this might be a good reason to investigate that approach a bit more. My worry is making sure a custom processor handles all the ins and outs of the mbox format correctly. Hm. As I'm writing this, I thought of something. I think I can parse each message one at a time, and then use an mbox function to load each message using the python mbox module. That way the mbox module can still deal with the specifics of the mbox format, but I can use a generator. I'll give that a try. Thanks for the feedback @maxhawkins and @simonw. I'll give that a try. @simonw can we hold off on merging this until I can test this new approach? | {"total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
849708617 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-849708617 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDg0OTcwODYxNw== | maxhawkins 28565 | 2021-05-27T15:01:42Z | 2021-05-27T15:01:42Z | NONE | Any updates? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
884672647 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-884672647 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs40uwiH | maxhawkins 28565 | 2021-07-22T05:56:31Z | 2021-07-22T14:03:08Z | NONE | How does this commit look? https://github.com/maxhawkins/google-takeout-to-sqlite/commit/72802a83fee282eb5d02d388567731ba4301050d It seems that Takeout's mbox format is pretty simple, so we can get away with just splitting the file on lines begining with `From `. My commit just splits the file every time a line starts with `From ` and uses `email.message_from_bytes` to parse each chunk. I was able to load a 12GB takeout mbox without the program using more than a couple hundred MB of memory during the import process. It does make us lose the progress bar, but maybe I can add that back in a later commit. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
885022230 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885022230 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs40wF4W | maxhawkins 28565 | 2021-07-22T15:51:46Z | 2021-07-22T15:51:46Z | NONE | One thing I noticed is this importer doesn't save attachments along with the body of the emails. It would be nice if those got stored as blobs in a separate attachments table so attachments can be included while fetching search results. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
885094284 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885094284 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs40wXeM | maxhawkins 28565 | 2021-07-22T17:41:32Z | 2021-07-22T17:41:32Z | NONE | I added a follow-up commit that deals with emails that don't have a `Date` header: https://github.com/maxhawkins/google-takeout-to-sqlite/commit/4bc70103582c10802c85a523ef1e99a8a2154aa9 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
885098025 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885098025 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs40wYYp | UtahDave 306240 | 2021-07-22T17:47:50Z | 2021-07-22T17:47:50Z | NONE | Hi @maxhawkins , I'm sorry, I haven't had any time to work on this. I'll have some time tomorrow to test your commits. I think they look great. I'm great with your commits superseding my initial attempt here. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
888075098 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-888075098 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs407vNa | maxhawkins 28565 | 2021-07-28T07:18:56Z | 2021-07-28T07:18:56Z | NONE | > I'm not sure why but my most recent import, when displayed in Datasette, looks like this: > > <img alt="mbox__mbox_emails__753_446_rows" width="574" src="https://user-images.githubusercontent.com/9599/109985836-0ab00080-7cba-11eb-97d5-0631a0835b61.png"> I did some investigation into this issue and made a fix [here](https://github.com/dogsheep/google-takeout-to-sqlite/pull/8/commits/8ee555c2889a38ff42b95664ee074b4a01a82f06). The problem was that some messages (like gchat logs) don't have a `Message-Id` and we need to use `X-GM-THRID` as the pkey instead. @simonw While looking into this I found something unexpected about how sqlite_utils handles upserts if the pkey column is `None`. When the pkey is NULL I'd expect the function to either use rowid or throw an exception. Instead, it seems upsert_all creates a row where all columns are NULL instead of using the values provided as parameters. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | WIP: Add Gmail takeout mbox import 813880401 | |
790384087 | https://github.com/dogsheep/google-takeout-to-sqlite/issues/6#issuecomment-790384087 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/6 | MDEyOklzc3VlQ29tbWVudDc5MDM4NDA4Nw== | simonw 9599 | 2021-03-04T07:22:51Z | 2021-03-04T07:22:51Z | MEMBER | #3 also mentions the conflicting version with other tools. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Upgrade to latest sqlite-utils 821841046 | |
894581223 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-894581223 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | IC_kwDODFE5qs41Ujnn | maxhawkins 28565 | 2021-08-07T00:57:48Z | 2021-08-07T00:57:48Z | NONE | Just added two more fixes: * Added parsing for rfc 2047 encoded unicode headers * Body is now stored as TEXT rather than a BLOB regardless of what order the messages are parsed in. I was able to run this on my Takeout export and everything seems to work fine. @simonw let me know if this looks good to merge. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Add Gmail takeout mbox import (v2) 954546309 | |
896378525 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-896378525 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | IC_kwDODFE5qs41baad | maxhawkins 28565 | 2021-08-10T23:28:45Z | 2021-08-10T23:28:45Z | NONE | I added parsing of text/html emails using BeautifulSoup. Around half of the emails in my archive don't include a text/plain payload so adding html parsing makes a good chunk of them searchable. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Add Gmail takeout mbox import (v2) 954546309 | |
1002735370 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1002735370 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | IC_kwDODFE5qs47xIcK | Btibert3 203343 | 2021-12-29T18:58:23Z | 2021-12-29T18:58:23Z | NONE | @maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just attempted your the PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Add Gmail takeout mbox import (v2) 954546309 | |
1003437288 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1003437288 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | IC_kwDODFE5qs47zzzo | maxhawkins 28565 | 2021-12-31T19:06:20Z | 2021-12-31T19:06:20Z | NONE | > @maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just attempted your the PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text. Shouldn't be hard. The easiest way is probably to remove the `if body.content_type == "text/html"` clause from [utils.py:254](https://github.com/dogsheep/google-takeout-to-sqlite/pull/8/commits/8e6d487b697ce2e8ad885acf613a157bfba84c59#diff-25ad9dd1ced1b8bfc37fda8444819c803232c08891e4af3d4064aa205d8174eaR254) and just return content directly without parsing. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Add Gmail takeout mbox import (v2) 954546309 | |
602094386 | https://github.com/dogsheep/hacker-news-to-sqlite/issues/1#issuecomment-602094386 | https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/1 | MDEyOklzc3VlQ29tbWVudDYwMjA5NDM4Ng== | simonw 9599 | 2020-03-21T19:47:46Z | 2020-03-21T19:47:46Z | MEMBER | Released as 0.3. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Set up full text search 585526292 | |
886135562 | https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886135562 | https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2 | IC_kwDODtX3eM400VsK | simonw 9599 | 2021-07-25T02:01:11Z | 2021-07-25T02:01:11Z | MEMBER | That page doesn't have an API but does look easy to scrape. The other option here is the HN Search API powered by Algolia, documented at https://hn.algolia.com/api | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Command for fetching Hacker News threads from the search API 952179830 | |
886135922 | https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886135922 | https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2 | IC_kwDODtX3eM400Vxy | simonw 9599 | 2021-07-25T02:06:20Z | 2021-07-25T02:06:20Z | MEMBER | https://hn.algolia.com/api/v1/search_by_date?query=simonwillison.net&restrictSearchableAttributes=url looks like it does what I want. https://hn.algolia.com/api/v1/search_by_date?query=simonwillison.net&restrictSearchableAttributes=url&hitsPerPage=1000 - returns 1000 at once. Otherwise you have to paginate using `&page=2` etc - up to `nbPages` pages. https://www.algolia.com/doc/api-reference/api-parameters/hitsPerPage/ says 1000 is the maximum. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Command for fetching Hacker News threads from the search API 952179830 | |
886136224 | https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886136224 | https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2 | IC_kwDODtX3eM400V2g | simonw 9599 | 2021-07-25T02:08:29Z | 2021-07-25T02:08:29Z | MEMBER | Prototype: curl "https://hn.algolia.com/api/v1/search_by_date?query=simonwillison.net&restrictSearchableAttributes=url&hitsPerPage=1000" | \ jq .hits | sqlite-utils insert hn.db items - --pk objectID --alter | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Command for fetching Hacker News threads from the search API 952179830 | |
886140431 | https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886140431 | https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2 | IC_kwDODtX3eM400W4P | simonw 9599 | 2021-07-25T03:12:57Z | 2021-07-25T03:12:57Z | MEMBER | I'm going to build a general-purpose `hacker-new-to-sqlite search ...` command, where one of the options is to search within the URL. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Command for fetching Hacker News threads from the search API 952179830 | |
886142671 | https://github.com/dogsheep/hacker-news-to-sqlite/issues/3#issuecomment-886142671 | https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/3 | IC_kwDODtX3eM400XbP | simonw 9599 | 2021-07-25T03:51:05Z | 2021-07-25T03:51:05Z | MEMBER | Prototype: curl 'https://hn.algolia.com/api/v1/items/27941108' \ | jq '[recurse(.children[]) | del(.children)]' \ | sqlite-utils insert hn.db items - --pk id | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Use HN algolia endpoint to retrieve trees 952189173 | |
886237834 | https://github.com/dogsheep/hacker-news-to-sqlite/issues/3#issuecomment-886237834 | https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/3 | IC_kwDODtX3eM400uqK | simonw 9599 | 2021-07-25T18:05:32Z | 2021-07-25T18:05:32Z | MEMBER | If you hit the endpoint for a comment that's part of a thread you get that comment and its recursive children: https://hn.algolia.com/api/v1/items/27941552 You can tell that it's not the top-level because the `parent_id` isn't `null`. You can use `story_id` to figure out what the top-level item is. ```json { "id": 27941552, "created_at": "2021-07-24T15:08:39.000Z", "created_at_i": 1627139319, "type": "comment", "author": "nine_k", "title": null, "url": null, "text": "<p>I wish ...", "points": null, "parent_id": 27941108, "story_id": 27941108 } ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Use HN algolia endpoint to retrieve trees 952189173 | |
886241674 | https://github.com/dogsheep/hacker-news-to-sqlite/issues/3#issuecomment-886241674 | https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/3 | IC_kwDODtX3eM400vmK | simonw 9599 | 2021-07-25T18:41:17Z | 2021-07-25T18:41:17Z | MEMBER | Got a TIL out of this: https://til.simonwillison.net/jq/extracting-objects-recursively | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Use HN algolia endpoint to retrieve trees 952189173 | |
1489110168 | https://github.com/dogsheep/hacker-news-to-sqlite/pull/6#issuecomment-1489110168 | https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/6 | IC_kwDODtX3eM5YwgSY | xavdid 1231935 | 2023-03-29T18:36:16Z | 2023-03-29T18:36:16Z | NONE | @simonw can you take a look when you have a chance? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Add permalink virtual field to items table 1641117021 | |
513437463 | https://github.com/dogsheep/healthkit-to-sqlite/issues/1#issuecomment-513437463 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/1 | MDEyOklzc3VlQ29tbWVudDUxMzQzNzQ2Mw== | simonw 9599 | 2019-07-20T05:19:59Z | 2019-07-20T05:19:59Z | MEMBER | I ran xml_analyser against the XML HealthKit `export.xml` file and got the following results: ```python { 'ActivitySummary': {'attr_counts': {'activeEnergyBurned': 980, 'activeEnergyBurnedGoal': 980, 'activeEnergyBurnedUnit': 980, 'appleExerciseTime': 980, 'appleExerciseTimeGoal': 980, 'appleStandHours': 980, 'appleStandHoursGoal': 980, 'dateComponents': 980}, 'child_counts': {}, 'count': 980, 'parent_counts': {'HealthData': 980}}, 'Correlation': {'attr_counts': {'creationDate': 1, 'endDate': 1, 'sourceName': 1, 'sourceVersion': 1, 'startDate': 1, 'type': 1}, 'child_counts': {'MetadataEntry': 1, 'Record': 2}, 'count': 1, 'parent_counts': {'HealthData': 1}}, 'ExportDate': {'attr_counts': {'value': 1}, 'child_counts': {}, 'count': 1, 'parent_counts': {'HealthData': 1}}, 'HealthData': {'attr_counts': {'locale': 1}, 'child_counts': {'ActivitySummary': 980, 'Correlation': 1, 'ExportDate': 1, 'Me': 1, 'Record': 2672231, 'Workout': 663}, 'count': 1, 'parent_counts': {}}, 'HeartRateVariabilityMetadataList': {'attr_counts': {}, 'child_counts': {'InstantaneousBeatsPerMinute': 93653}, 'count': 2318, … | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Use XML Analyser to figure out the structure of the export XML 470637068 | |
550783316 | https://github.com/dogsheep/healthkit-to-sqlite/issues/10#issuecomment-550783316 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/10 | MDEyOklzc3VlQ29tbWVudDU1MDc4MzMxNg== | simonw 9599 | 2019-11-07T05:16:56Z | 2019-11-07T05:34:29Z | MEMBER | It looks like Apple changed the location of these in iOS 13 - they are now in separate `.gpx` files: ![2FF70E95-CDEE-4241-A5C5-EE95A862E519](https://user-images.githubusercontent.com/9599/68362042-be12e000-00da-11ea-8925-7397410332d8.png) | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Failed to import workout points 519038979 | |
550806302 | https://github.com/dogsheep/healthkit-to-sqlite/issues/10#issuecomment-550806302 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/10 | MDEyOklzc3VlQ29tbWVudDU1MDgwNjMwMg== | simonw 9599 | 2019-11-07T05:33:31Z | 2019-11-07T05:33:31Z | MEMBER | The XML now includes references to these new files: ![CBBA54FC-51FB-4BB3-927C-C2CA99237B04](https://user-images.githubusercontent.com/9599/68362716-121ec400-00dd-11ea-9846-387c7cd64c8b.jpeg) | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Failed to import workout points 519038979 | |
550824838 | https://github.com/dogsheep/healthkit-to-sqlite/issues/10#issuecomment-550824838 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/10 | MDEyOklzc3VlQ29tbWVudDU1MDgyNDgzOA== | simonw 9599 | 2019-11-07T05:47:07Z | 2019-11-07T05:47:07Z | MEMBER | Relevant code: https://github.com/dogsheep/healthkit-to-sqlite/blob/d16f45f06fbae6ec8a78cc9ca7b5b7db0413f139/healthkit_to_sqlite/utils.py#L58-L64 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Failed to import workout points 519038979 | |
550828084 | https://github.com/dogsheep/healthkit-to-sqlite/issues/10#issuecomment-550828084 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/10 | MDEyOklzc3VlQ29tbWVudDU1MDgyODA4NA== | simonw 9599 | 2019-11-07T05:49:24Z | 2019-11-07T05:49:24Z | MEMBER | So the fix there is going to be to detect the new `FileReference` element and load the corresponding points data from it. This will be a little tricky because that function will need access to the zip file. It probably won't work at all for the mode where the `export.xml` file is passed directly using the `--xml` option. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Failed to import workout points 519038979 | |
711074031 | https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711074031 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 | MDEyOklzc3VlQ29tbWVudDcxMTA3NDAzMQ== | simonw 9599 | 2020-10-17T20:14:01Z | 2020-10-17T20:14:01Z | MEMBER | I'd be happy to teach the tool to look for `export.xml` or `eksport.xml` - and then expand that list to other languages. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | export.xml file name varies with different language settings 723838331 | |
711074306 | https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711074306 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 | MDEyOklzc3VlQ29tbWVudDcxMTA3NDMwNg== | simonw 9599 | 2020-10-17T20:16:22Z | 2020-10-17T20:16:22Z | MEMBER | The "first XML file in the root" solution is probably easier though! | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | export.xml file name varies with different language settings 723838331 | |
711078917 | https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711078917 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 | MDEyOklzc3VlQ29tbWVudDcxMTA3ODkxNw== | simonw 9599 | 2020-10-17T20:51:55Z | 2020-10-17T20:52:03Z | MEMBER | I switched my phone to Spanish and ran an export - I got a file called `exportar.zip`. Unzipped I still got a `apple_ health_export` folder but the root contained: ``` electrocardiograms/ export_cda.xml exportar.xml workout-routes/ ``` It looks like `export_cda.xml` does not have a translated name, so maybe I can ignore it and look for the _other_ `.xml` file in that directory. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | export.xml file name varies with different language settings 723838331 | |
711079056 | https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711079056 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 | MDEyOklzc3VlQ29tbWVudDcxMTA3OTA1Ng== | simonw 9599 | 2020-10-17T20:53:00Z | 2020-10-17T20:53:00Z | MEMBER | I think the safest thing is to sniff the first few lines of the file. Those should be the same no matter the language that was used: ```xml <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE HealthData [ ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | export.xml file name varies with different language settings 723838331 | |
711079760 | https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711079760 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 | MDEyOklzc3VlQ29tbWVudDcxMTA3OTc2MA== | simonw 9599 | 2020-10-17T21:00:05Z | 2020-10-17T21:00:05Z | MEMBER | Checking for either `<!DOCTYPE HealthData` or `<HealthData` in the first 1000 bytes should do it. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | export.xml file name varies with different language settings 723838331 | |
711081703 | https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711081703 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 | MDEyOklzc3VlQ29tbWVudDcxMTA4MTcwMw== | simonw 9599 | 2020-10-17T21:18:35Z | 2020-10-17T21:18:35Z | MEMBER | OK, if you upgrade to the just-released 1.0 this should work (it worked against my Spanish export). | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | export.xml file name varies with different language settings 723838331 | |
711083698 | https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711083698 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 | MDEyOklzc3VlQ29tbWVudDcxMTA4MzY5OA== | jarib 572 | 2020-10-17T21:39:15Z | 2020-10-17T21:39:15Z | NONE | Nice! Works perfectly. Thanks for the quick response and great tooling in general. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | export.xml file name varies with different language settings 723838331 | |
877805513 | https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-877805513 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12 | MDEyOklzc3VlQ29tbWVudDg3NzgwNTUxMw== | Mjboothaus 956433 | 2021-07-11T14:03:01Z | 2021-07-11T14:03:01Z | NONE | Hi Simon -- just experimenting with your excellent software! Up to this point in time I have been using the (paid) [HealthFit App](https://apps.apple.com/au/app/healthfit/id1202650514) to export my workouts from my Apple Watch, one walk at the time into either .GPX or .FIT format and then using another library to suck it into Python and eventually here to my "Emmaus Walking" app: https://share.streamlit.io/mjboothaus/emmaus_walking/emmaus_walking/app.py I just used `healthkit-to-sqlite` to convert my export.zip file and it all "just worked". I did notice the issue with various numeric fields being stored in the SQLite db as TEXT for now and just thought I'd flag it - but you're already self-reported this issue. Keep up the great work! I was curious if you have any thoughts about periodically exporting "export.zip" and how to just update the SQLite file instead of re-creating it each time. Hopefully Apple will give some thought to managing this data in a more sensible fashion as it grows over time. Ideally one could pull it from iCloud (where it is allegedly being backed up). | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Some workout columns should be float, not text 727848625 | |
877874117 | https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-877874117 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12 | MDEyOklzc3VlQ29tbWVudDg3Nzg3NDExNw== | Mjboothaus 956433 | 2021-07-11T23:03:37Z | 2021-07-11T23:03:37Z | NONE | P.s. wondering if you have explored using the spatialite functionality with the location data in workouts? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Some workout columns should be float, not text 727848625 | |
879477586 | https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-879477586 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12 | MDEyOklzc3VlQ29tbWVudDg3OTQ3NzU4Ng== | simonw 9599 | 2021-07-13T23:50:06Z | 2021-07-13T23:50:06Z | MEMBER | Unfortunately I don't think updating the database is practical, because the export doesn't include unique identifiers which can be used to update existing records and create new ones. Recreating from scratch works around that limitation. I've not explored workouts with SpatiaLite but that's a really good idea. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Some workout columns should be float, not text 727848625 | |
1163917719 | https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-1163917719 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12 | IC_kwDOC8tyDs5FX_mX | Mjboothaus 956433 | 2022-06-23T04:35:02Z | 2022-06-23T04:35:02Z | NONE | In terms of unique identifiers - could you use values stored in `HKMetadataKeySyncIdentifier`? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Some workout columns should be float, not text 727848625 | |
904642396 | https://github.com/dogsheep/healthkit-to-sqlite/pull/13#issuecomment-904642396 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/13 | IC_kwDOC8tyDs41679c | FabianHertwig 32016596 | 2021-08-24T13:27:40Z | 2021-08-24T13:28:26Z | NONE | This would fix #21 and make #22 obsolete. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | SQLite does not have case sensitive columns 743071410 | |
798436026 | https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-798436026 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14 | MDEyOklzc3VlQ29tbWVudDc5ODQzNjAyNg== | n8henrie 1234956 | 2021-03-13T14:23:16Z | 2021-03-13T14:23:16Z | NONE | This PR allows my import to succeed. It looks like some events don't have an `id`, but do have `HKExternalUUID` (which gets turned into `metadata_HKExternalUUID`), so I use this as a fallback. If a record has neither of these, I changed it to just print the record (for debugging) and `return`. For some odd reason this ran fine at first, and now (after removing the generated db and trying again) I'm getting a different error (duplicate column name). Looks like it may have run when I had two successive runs without remembering to delete the db in between. Will try to refactor. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | UNIQUE constraint failed: workouts.id 771608692 | |
798468572 | https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-798468572 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14 | MDEyOklzc3VlQ29tbWVudDc5ODQ2ODU3Mg== | n8henrie 1234956 | 2021-03-13T14:47:31Z | 2021-03-13T14:47:31Z | NONE | Ok, new PR works. I'm not `git` enough so I just force-pushed over the old one. I still end up with a lot of activities that are missing an `id` and therefore skipped (since this is used as the primary key). For example: ``` {'workoutActivityType': 'HKWorkoutActivityTypeRunning', 'duration': '35.31666666666667', 'durationUnit': 'min', 'totalDistance': '4.010870267636999', 'totalDistanceUnit': 'mi', 'totalEnergyBurned': '660.3516235351562', 'totalEnergyBurnedUnit': 'Cal', 'sourceName': 'Strava', 'sourceVersion': '22810', 'creationDate': '2020-07-16 13:38:26 -0700', 'startDate': '2020-07-16 06:38:26 -0700', 'endDate': '2020-07-16 07:13:45 -0700'} ``` I also end up with some unhappy characters (in the skipped events), such as: `'sourceName': 'Nathan’s Apple\xa0Watch',`. But it's successfully making it through the file, and the resulting db opens in datasette, so I'd call that progress. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | UNIQUE constraint failed: workouts.id 771608692 | |
1073123231 | https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-1073123231 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14 | IC_kwDOC8tyDs4_9o-f | lchski 343884 | 2022-03-19T22:39:29Z | 2022-03-19T22:39:29Z | NONE | I have this issue, too, with a fresh export. None of my `Workout` entries in `export.xml` have an `id` key, though [the sample `export.xml` in the tests folder doesn’t either](https://github.com/dogsheep/healthkit-to-sqlite/blob/main/tests/zip_contents/apple_health_export/export.xml#L14-L21), so I don’t think this is the culprit. Indeed, it seems @simonw is using the [`hash_id` function from `sqlite_utils`](https://sqlite-utils.datasette.io/en/stable/python-api.html#setting-an-id-based-on-the-hash-of-the-row-contents), which creates a column (`id`, in this case) based on a hash of the row’s contents. When I run the script, a `workouts` table is created, with one entry: my first workout. No `workout_points` table is created, as [I’d expect from `utils.py`](https://github.com/dogsheep/healthkit-to-sqlite/blob/main/healthkit_to_sqlite/utils.py#L89-L90). I then get essentially the same error as noted in this thread: ```Importing from HealthKit [###################################-] 98% 00:00:01 Traceback (most recent call last): File "/Users/lchski/.pyenv/versions/3.10.3/bin/healthkit-to-sqlite", line 8, in <module> sys.exit(cli()) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 1128, in __call__ return self.main(*args, **kwargs) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 1053, in main rv = self.invoke(ctx) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 1395, in invoke return ctx.invoke(self.callback, **ctx.params) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 754, in invoke return __callback(*args, **kwargs) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/healthkit_to_sqlite/cli.py", line 57, in cli convert_xml_to_sqlite(fp, db, progress_callback=bar.update, zipfile=zf) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/health… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | UNIQUE constraint failed: workouts.id 771608692 | |
1073139067 | https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-1073139067 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14 | IC_kwDOC8tyDs4_9s17 | lchski 343884 | 2022-03-20T00:54:18Z | 2022-03-20T00:54:18Z | NONE | Update: this appears to be because of running the command twice without clearing the DB in between. Tries to insert a Workout that already exists, causing a collision on the (auto-generated) `id` column. Had a different error with a clean DB, likely due to the workout points format; will make a new issue for that. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | UNIQUE constraint failed: workouts.id 771608692 | |
1629123734 | https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-1629123734 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14 | IC_kwDOC8tyDs5hGnSW | philipp-heinrich 44622670 | 2023-07-10T14:46:52Z | 2023-07-10T14:46:52Z | NONE | @simonw any chance to get this fixed soon? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | UNIQUE constraint failed: workouts.id 771608692 | |
513439411 | https://github.com/dogsheep/healthkit-to-sqlite/issues/2#issuecomment-513439411 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/2 | MDEyOklzc3VlQ29tbWVudDUxMzQzOTQxMQ== | simonw 9599 | 2019-07-20T05:58:57Z | 2019-07-20T05:58:57Z | MEMBER | ```python 'Workout': {'attr_counts': {'creationDate': 663, 'device': 230, 'duration': 663, 'durationUnit': 663, 'endDate': 663, 'sourceName': 663, 'sourceVersion': 663, 'startDate': 663, 'totalDistance': 663, 'totalDistanceUnit': 663, 'totalEnergyBurned': 663, 'totalEnergyBurnedUnit': 663, 'workoutActivityType': 663}, 'child_counts': {'MetadataEntry': 1928, 'WorkoutEvent': 2094, 'WorkoutRoute': 340}, 'count': 663, 'parent_counts': {'HealthData': 663}}, 'WorkoutEvent': {'attr_counts': {'date': 2094, 'duration': 837, 'durationUnit': 837, 'type': 2094}, 'child_counts': {}, 'count': 2094, 'parent_counts': {'Workout': 2094}}, 'WorkoutRoute': {'attr_counts': {'creationDate': 340, 'endDate': 340, 'sourceName': 340, 'sourceVersion': 340, 'startDate': 340}, 'child_counts': {'Location': 398683, 'MetadataEntry': 546}, 'count': 340, 'parent_counts': {'Workout': 340}}} ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import workouts 470637152 | |
902355471 | https://github.com/dogsheep/healthkit-to-sqlite/issues/20#issuecomment-902355471 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/20 | IC_kwDOC8tyDs41yNoP | simonw 9599 | 2021-08-20T01:09:07Z | 2021-08-20T01:09:07Z | MEMBER | Workaround: sqlite-utils create-index healthkit.db workout_points -- -date See https://sqlite-utils.datasette.io/en/stable/cli.html#creating-indexes | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Add index on workout_points.date 975166271 | |
902356871 | https://github.com/dogsheep/healthkit-to-sqlite/issues/20#issuecomment-902356871 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/20 | IC_kwDOC8tyDs41yN-H | simonw 9599 | 2021-08-20T01:12:48Z | 2021-08-20T01:12:48Z | MEMBER | Also on `workout_points.workout_id` to speed up queries to show all points in a specific workout. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Add index on workout_points.date 975166271 | |
903950096 | https://github.com/dogsheep/healthkit-to-sqlite/issues/21#issuecomment-903950096 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/21 | IC_kwDOC8tyDs414S8Q | FabianHertwig 32016596 | 2021-08-23T17:00:59Z | 2021-08-23T17:00:59Z | NONE | I think the issue is that I have records like these: ```xml <Record type="HKQuantityTypeIdentifierDietaryCholesterol" sourceName="MyFitnessPal" sourceVersion="35120" unit="mg" creationDate="2021-07-04 20:55:27 +0200" startDate="2021-07-04 20:55:00 +0200" endDate="2021-07-04 20:55:00 +0200" value="124"> <MetadataEntry key="meal" value="Dinner"/> <MetadataEntry key="Meal" value="Dinner"/> </Record> ``` And if sqlite is case insensitive, then `metadata_meal` and `metadata_Meal` result in the same column. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Duplicate Column 977128935 | |
904641261 | https://github.com/dogsheep/healthkit-to-sqlite/pull/22#issuecomment-904641261 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/22 | IC_kwDOC8tyDs4167rt | FabianHertwig 32016596 | 2021-08-24T13:26:20Z | 2021-08-24T13:26:20Z | NONE | Did not see that #13 fixes the same issue in a similar way. You can decide which one to merge ;) | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Make sure that case-insensitive column names are unique 978086284 | |
1464786643 | https://github.com/dogsheep/healthkit-to-sqlite/issues/24#issuecomment-1464786643 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/24 | IC_kwDOC8tyDs5XTt7T | Mjboothaus 956433 | 2023-03-11T02:01:27Z | 2023-03-11T02:01:27Z | NONE | Thanks for reporting this and providing a solution -- I was puzzled by this error when I revisited my walking data and experienced this issues. I haven't tried the fix yet. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | DOC: xml.etree.ElementTree.ParseError due to healthkit version 12 1515883470 | |
1464796494 | https://github.com/dogsheep/healthkit-to-sqlite/issues/24#issuecomment-1464796494 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/24 | IC_kwDOC8tyDs5XTwVO | Mjboothaus 956433 | 2023-03-11T02:23:42Z | 2023-03-11T02:23:42Z | NONE | @simonw - maybe put in some error handling to trap for poorly formed XML (from Apple engineers) so that it suggests that there are problems with export.zip rather than odd looking Python errors :) | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | DOC: xml.etree.ElementTree.ParseError due to healthkit version 12 1515883470 | |
513440090 | https://github.com/dogsheep/healthkit-to-sqlite/issues/4#issuecomment-513440090 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/4 | MDEyOklzc3VlQ29tbWVudDUxMzQ0MDA5MA== | simonw 9599 | 2019-07-20T06:11:50Z | 2019-07-20T06:11:50Z | MEMBER | Some examples: https://github.com/dogsheep/healthkit-to-sqlite/blob/d016e70c31cf84ba0f5ec3102546db54a51aaffb/tests/export.xml#L4-L13 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import Records 470640505 | |
513514978 | https://github.com/dogsheep/healthkit-to-sqlite/issues/5#issuecomment-513514978 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDUxMzUxNDk3OA== | simonw 9599 | 2019-07-21T02:55:12Z | 2019-07-21T02:55:12Z | MEMBER | I'm going to show this by default. Users can pass `-s` or `--silent` to disable the progress bar. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Add progress bar 470691622 | |
513625406 | https://github.com/dogsheep/healthkit-to-sqlite/issues/5#issuecomment-513625406 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDUxMzYyNTQwNg== | simonw 9599 | 2019-07-22T03:20:16Z | 2019-07-22T03:20:16Z | MEMBER | It now renders like this: ``` Importing from HealthKit [#-----------------------------------] 5% 00:01:33 ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Add progress bar 470691622 | |
513626742 | https://github.com/dogsheep/healthkit-to-sqlite/issues/6#issuecomment-513626742 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/6 | MDEyOklzc3VlQ29tbWVudDUxMzYyNjc0Mg== | simonw 9599 | 2019-07-22T03:28:55Z | 2019-07-22T03:28:55Z | MEMBER | Here's what it looks like now as separate tables: <img width="1050" alt="hello9_and_Populate__endpoint__key_in_ASGI_scope_·_Issue__537_·_simonw_datasette" src="https://user-images.githubusercontent.com/9599/61604322-eba05000-abf5-11e9-8f8b-d0204e9a705a.png"> | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Break up records into different tables for each type 470856782 | |
514496725 | https://github.com/dogsheep/healthkit-to-sqlite/issues/7#issuecomment-514496725 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/7 | MDEyOklzc3VlQ29tbWVudDUxNDQ5NjcyNQ== | simonw 9599 | 2019-07-24T06:20:59Z | 2019-07-24T06:20:59Z | MEMBER | I'm using https://pypi.org/project/memory-profiler/ to explore this in more detail: ``` $ pip install memory-profiler matplotlib ``` Then: ``` $ mprof run healthkit-to-sqlite ~/Downloads/healthkit-export.zip healthkit.db $ mprof plot ``` <img width="1128" alt="Screen Shot 2019-07-24 at 8 17 06 AM" src="https://user-images.githubusercontent.com/9599/61769849-f3294a00-adeb-11e9-9026-02c24209530f.png"> | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Script uses a lot of RAM 472097220 | |
514498221 | https://github.com/dogsheep/healthkit-to-sqlite/issues/7#issuecomment-514498221 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/7 | MDEyOklzc3VlQ29tbWVudDUxNDQ5ODIyMQ== | simonw 9599 | 2019-07-24T06:26:49Z | 2019-07-24T06:26:49Z | MEMBER | Adding `el.clear()` got me a huge improvement: <img width="1128" alt="Screen Shot 2019-07-24 at 8 23 26 AM" src="https://user-images.githubusercontent.com/9599/61770130-c3c70d00-adec-11e9-886a-93a2cea5bf4b.png"> | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Script uses a lot of RAM 472097220 | |
514500253 | https://github.com/dogsheep/healthkit-to-sqlite/issues/7#issuecomment-514500253 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/7 | MDEyOklzc3VlQ29tbWVudDUxNDUwMDI1Mw== | simonw 9599 | 2019-07-24T06:34:28Z | 2019-07-24T06:34:28Z | MEMBER | Clearing the root element each time saved even more: <img width="1128" alt="Screen Shot 2019-07-24 at 8 30 38 AM" src="https://user-images.githubusercontent.com/9599/61770555-d3932100-aded-11e9-8ffe-bebd682f94ed.png"> | {"total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 2, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Script uses a lot of RAM 472097220 | |
514745798 | https://github.com/dogsheep/healthkit-to-sqlite/issues/9#issuecomment-514745798 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/9 | MDEyOklzc3VlQ29tbWVudDUxNDc0NTc5OA== | tholo 166463 | 2019-07-24T18:25:36Z | 2019-07-24T18:25:36Z | NONE | This is on macOS 10.14.6, with Python 3.7.4, packages in the virtual environment: ``` Package Version ------------------- ------- aiofiles 0.4.0 Click 7.0 click-default-group 1.2.1 datasette 0.29.2 h11 0.8.1 healthkit-to-sqlite 0.3.1 httptools 0.0.13 hupper 1.8.1 importlib-metadata 0.18 Jinja2 2.10.1 MarkupSafe 1.1.1 Pint 0.8.1 pip 19.2.1 pluggy 0.12.0 setuptools 41.0.1 sqlite-utils 1.7 tabulate 0.8.3 uvicorn 0.8.4 uvloop 0.12.2 websockets 7.0 zipp 0.5.2 ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Too many SQL variables 472429048 | |
515226724 | https://github.com/dogsheep/healthkit-to-sqlite/issues/9#issuecomment-515226724 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/9 | MDEyOklzc3VlQ29tbWVudDUxNTIyNjcyNA== | simonw 9599 | 2019-07-25T21:46:01Z | 2019-07-25T21:46:01Z | MEMBER | I can work around this here (prior to the fix in sqlite-utils) by setting the batch size to something a bit lower here. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Too many SQL variables 472429048 | |
515322294 | https://github.com/dogsheep/healthkit-to-sqlite/issues/9#issuecomment-515322294 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/9 | MDEyOklzc3VlQ29tbWVudDUxNTMyMjI5NA== | simonw 9599 | 2019-07-26T06:07:12Z | 2019-07-26T06:07:12Z | MEMBER | @tholo this should be fixed in just-released version 0.3.2 - could you run a `pip install -U healthkit-to-sqlite` and let me know if it works for you now? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Too many SQL variables 472429048 | |
515370687 | https://github.com/dogsheep/healthkit-to-sqlite/issues/9#issuecomment-515370687 | https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/9 | MDEyOklzc3VlQ29tbWVudDUxNTM3MDY4Nw== | tholo 166463 | 2019-07-26T09:01:19Z | 2019-07-26T09:01:19Z | NONE | Yes, that did fix the issue I was seeing — it will now import my complete HealthKit data. Thorsten > On Jul 25, 2019, at 23:07, Simon Willison <notifications@github.com> wrote: > > @tholo <https://github.com/tholo> this should be fixed in just-released version 0.3.2 - could you run a pip install -U healthkit-to-sqlite and let me know if it works for you now? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub <https://github.com/dogsheep/healthkit-to-sqlite/issues/9?email_source=notifications&email_token=AABIUPYTWYOYSLEAFS7TLMLQBKIBBA5CNFSM4IGSXNNKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD23TDNQ#issuecomment-515322294>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AABIUP7DCBQ37SESQL7D4WTQBKIBBANCNFSM4IGSXNNA>. > | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Too many SQL variables 472429048 | |
538847446 | https://github.com/dogsheep/pocket-to-sqlite/issues/1#issuecomment-538847446 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/1 | MDEyOklzc3VlQ29tbWVudDUzODg0NzQ0Ng== | simonw 9599 | 2019-10-07T05:41:17Z | 2019-10-07T05:41:17Z | MEMBER | Prototype code: ```python offset = 0 fetched = [] size = 500 while True: page = requests.get("https://getpocket.com/v3/get", { "consumer_key": consumer_key, "access_token": access_token, "sort": "oldest", "detailType": "complete", "count": size, "offset": offset, }).json() print(offset) fetched.append(page) offset += size if not len(page["list"]): break ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Use better pagination (and implement progress bar) 503233021 | |
605316146 | https://github.com/dogsheep/pocket-to-sqlite/issues/1#issuecomment-605316146 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/1 | MDEyOklzc3VlQ29tbWVudDYwNTMxNjE0Ng== | simonw 9599 | 2020-03-27T21:09:15Z | 2020-03-27T21:09:22Z | MEMBER | For a progress bar I need to know how many total items there are. I found an undocumented API for this! `/v3/stats` which returns: ```json { "count_list": 7394, "count_read": 1016, "count_unread": 6378, "status": 1 } ``` I guessed this based on the documented v2 API: https://getpocket.com/api/v2_docs/#stats | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Use better pagination (and implement progress bar) 503233021 | |
605325897 | https://github.com/dogsheep/pocket-to-sqlite/issues/1#issuecomment-605325897 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/1 | MDEyOklzc3VlQ29tbWVudDYwNTMyNTg5Nw== | simonw 9599 | 2020-03-27T21:37:26Z | 2020-03-27T21:38:37Z | MEMBER | I keep getting 503 errors even though I appear to be staying within the rate limit: ``` {'Date': 'Fri, 27 Mar 2020 21:35:57 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Server': 'Apache/2.4.25 (Debian)', 'Content-Location': 'get.php', 'Vary': 'negotiate', 'TCN': 'choice', 'Set-Cookie': '...; httponly', 'X-Frame-Options': 'SAMEORIGIN', 'Status': '200 OK', 'X-Limit-Key-Limit': '10000', 'X-Limit-Key-Remaining': '9960', 'X-Limit-Key-Reset': '282', 'X-Source': 'Pocket', 'P3P': 'policyref="/w3c/p3p.xml", CP="ALL CURa ADMa DEVa OUR IND UNI COM NAV INT STA PRE"'} [##----------------------------------] 6% 06:49:27 {'Date': 'Fri, 27 Mar 2020 21:36:06 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Content-Length': '23', 'Connection': 'keep-alive', 'Server': 'Apache/2.4.25 (Debian)', 'Content-Location': 'get.php', 'Vary': 'negotiate', 'TCN': 'choice', 'Set-Cookie': '...', 'X-Frame-Options': 'SAMEORIGIN', 'X-Error': 'Pocket is currently under heavy load. Please wait a moment and try again.', 'X-Error-Code': '199', 'Status': '503 Service Unavailable', 'X-Source': 'Pocket', 'P3P': 'policyref="/w3c/p3p.xml", CP="ALL CURa ADMa DEVa OUR IND UNI COM NAV INT STA PRE"'} ``` I'm going to try doing a few automatic retries any time I see a 503 error. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Use better pagination (and implement progress bar) 503233021 | |
605327655 | https://github.com/dogsheep/pocket-to-sqlite/issues/1#issuecomment-605327655 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/1 | MDEyOklzc3VlQ29tbWVudDYwNTMyNzY1NQ== | simonw 9599 | 2020-03-27T21:42:49Z | 2020-03-27T21:42:49Z | MEMBER | Or maybe it was because of the current Google Cloud outage? https://news.ycombinator.com/item?id=22706677 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Use better pagination (and implement progress bar) 503233021 | |
1221622873 | https://github.com/dogsheep/pocket-to-sqlite/issues/10#issuecomment-1221622873 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/10 | IC_kwDODLZ_YM5I0HxZ | simonw 9599 | 2022-08-21T21:19:25Z | 2022-08-21T21:19:25Z | MEMBER | Agreed, that would be a much better implementation. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | When running `auth` command, don't overwrite an existing auth.json file 1246826792 | |
1221623052 | https://github.com/dogsheep/pocket-to-sqlite/issues/10#issuecomment-1221623052 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/10 | IC_kwDODLZ_YM5I0H0M | simonw 9599 | 2022-08-21T21:20:33Z | 2022-08-21T21:20:33Z | MEMBER | That was clearly the intention from the description of this issue: - #4 | {"total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | When running `auth` command, don't overwrite an existing auth.json file 1246826792 | |
1239516561 | https://github.com/dogsheep/pocket-to-sqlite/issues/10#issuecomment-1239516561 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/10 | IC_kwDODLZ_YM5J4YWR | ashanan 11887 | 2022-09-07T15:07:38Z | 2022-09-07T15:07:38Z | NONE | Thanks! | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | When running `auth` command, don't overwrite an existing auth.json file 1246826792 | |
1221521377 | https://github.com/dogsheep/pocket-to-sqlite/issues/11#issuecomment-1221521377 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/11 | IC_kwDODLZ_YM5Izu_h | fernand0 2467 | 2022-08-21T10:51:37Z | 2022-08-21T10:51:37Z | NONE | I didn't see there is a PR about this: https://github.com/dogsheep/pocket-to-sqlite/pull/7 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | -a option is used for "--auth" and for "--all" 1345452427 | |
1221621466 | https://github.com/dogsheep/pocket-to-sqlite/issues/11#issuecomment-1221621466 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/11 | IC_kwDODLZ_YM5I0Hba | simonw 9599 | 2022-08-21T21:09:47Z | 2022-08-21T21:09:47Z | MEMBER | Great catch, thanks. I'm going to use it to mean `--auth` - since other tools in the Dogsheep family have the same convention. `--all` will be the only way to specify all. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | -a option is used for "--auth" and for "--all" 1345452427 | |
1221621529 | https://github.com/dogsheep/pocket-to-sqlite/issues/11#issuecomment-1221621529 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/11 | IC_kwDODLZ_YM5I0HcZ | simonw 9599 | 2022-08-21T21:10:15Z | 2022-08-21T21:11:26Z | MEMBER | Just saw that's what's implemented here already! - #7 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | -a option is used for "--auth" and for "--all" 1345452427 | |
1627563202 | https://github.com/dogsheep/pocket-to-sqlite/issues/12#issuecomment-1627563202 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/12 | IC_kwDODLZ_YM5hAqTC | simonw 9599 | 2023-07-09T01:14:27Z | 2023-07-09T01:14:27Z | MEMBER | I tested this locally with `python -m build` and then `pip install ...whl` in a fresh virtual environment. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Switch to pyproject.toml 1795187493 | |
1627564127 | https://github.com/dogsheep/pocket-to-sqlite/issues/12#issuecomment-1627564127 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/12 | IC_kwDODLZ_YM5hAqhf | simonw 9599 | 2023-07-09T01:19:42Z | 2023-07-09T01:19:42Z | MEMBER | https://github.com/dogsheep/pocket-to-sqlite/tree/0.2.3 and https://pypi.org/project/pocket-to-sqlite/0.2.3/ | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Switch to pyproject.toml 1795187493 | |
538847796 | https://github.com/dogsheep/pocket-to-sqlite/issues/2#issuecomment-538847796 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/2 | MDEyOklzc3VlQ29tbWVudDUzODg0Nzc5Ng== | simonw 9599 | 2019-10-07T05:43:30Z | 2019-10-07T05:43:30Z | MEMBER | We can persist the `since` value in its own single-row table. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Track and use the 'since' value 503234169 | |
605337941 | https://github.com/dogsheep/pocket-to-sqlite/issues/2#issuecomment-605337941 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/2 | MDEyOklzc3VlQ29tbWVudDYwNTMzNzk0MQ== | simonw 9599 | 2020-03-27T22:16:32Z | 2020-03-27T22:16:32Z | MEMBER | Need to test this. I have 7,394 items in my database right now. I'm going to save a new thing. Then I ran this: ``` pocket-to-sqlite fetch pocket-simon.db ``` And it worked! | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Track and use the 'since' value 503234169 | |
605338322 | https://github.com/dogsheep/pocket-to-sqlite/issues/2#issuecomment-605338322 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/2 | MDEyOklzc3VlQ29tbWVudDYwNTMzODMyMg== | simonw 9599 | 2020-03-27T22:18:02Z | 2020-03-27T22:18:02Z | MEMBER | Just needs documentation now. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Track and use the 'since' value 503234169 | |
684424396 | https://github.com/dogsheep/pocket-to-sqlite/issues/3#issuecomment-684424396 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/3 | MDEyOklzc3VlQ29tbWVudDY4NDQyNDM5Ng== | simonw 9599 | 2020-09-01T06:17:45Z | 2020-09-01T06:17:45Z | MEMBER | It looks like I could ignore the `image` column and synthesize a unique key from the data in the `images` column using `$item_id/$image_id`. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Extract images into separate tables 503243784 | |
684425714 | https://github.com/dogsheep/pocket-to-sqlite/issues/5#issuecomment-684425714 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDY4NDQyNTcxNA== | simonw 9599 | 2020-09-01T06:18:32Z | 2020-09-01T06:18:32Z | MEMBER | Good suggestion, I'll setup a demo somewhere. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Set up a demo 629473827 | |
1221621700 | https://github.com/dogsheep/pocket-to-sqlite/pull/7#issuecomment-1221621700 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/7 | IC_kwDODLZ_YM5I0HfE | simonw 9599 | 2022-08-21T21:11:12Z | 2022-08-21T21:11:12Z | MEMBER | I thought this might need a documentation update but `--all` is already covered: https://github.com/dogsheep/pocket-to-sqlite/blob/0.2.1/README.md | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Fixed conflicting CLI flags 750141615 | |
774726123 | https://github.com/dogsheep/pocket-to-sqlite/issues/9#issuecomment-774726123 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/9 | MDEyOklzc3VlQ29tbWVudDc3NDcyNjEyMw== | jfeiwell 12669260 | 2021-02-07T18:21:08Z | 2021-02-07T18:21:08Z | NONE | @simonw any ideas here? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | SSL Error 801780625 | |
774730656 | https://github.com/dogsheep/pocket-to-sqlite/issues/9#issuecomment-774730656 | https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/9 | MDEyOklzc3VlQ29tbWVudDc3NDczMDY1Ng== | merwok 635179 | 2021-02-07T18:45:04Z | 2021-02-07T18:45:04Z | NONE | That URL uses TLS 1.3, but maybe only if the client supports it. It could be your Python version or your SSL library that’s not recent enough. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | SSL Error 801780625 | |
707326192 | https://github.com/dogsheep/swarm-to-sqlite/pull/10#issuecomment-707326192 | https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/10 | MDEyOklzc3VlQ29tbWVudDcwNzMyNjE5Mg== | mattiaborsoi 29426418 | 2020-10-12T20:20:02Z | 2020-10-12T20:20:02Z | CONTRIBUTOR | This closes issue #8 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Update utils.py to fix sqlite3.OperationalError 719637258 | |
727692413 | https://github.com/dogsheep/swarm-to-sqlite/issues/11#issuecomment-727692413 | https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/11 | MDEyOklzc3VlQ29tbWVudDcyNzY5MjQxMw== | simonw 9599 | 2020-11-16T02:15:22Z | 2020-11-16T02:15:22Z | MEMBER | Thanks, I'll look into this. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Error thrown: sqlite3.OperationalError: table users has no column named lastName 743400216 | |
761967094 | https://github.com/dogsheep/swarm-to-sqlite/issues/11#issuecomment-761967094 | https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/11 | MDEyOklzc3VlQ29tbWVudDc2MTk2NzA5NA== | simonw 9599 | 2021-01-18T04:11:13Z | 2021-01-18T04:11:13Z | MEMBER | I just got a similar error: ``` File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/swarm_to_sqlite/utils.py", line 79, in save_checkin checkins_table.m2m("users", user, m2m_table="with", pk="id") File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 2048, in m2m id = other_table.insert(record, pk=pk, replace=True).last_pk File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 1781, in insert return self.insert_all( File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 1899, in insert_all self.insert_chunk( File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 1709, in insert_chunk result = self.db.execute(query, params) File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 226, in execute return self.conn.execute(sql, parameters) pysqlite3.dbapi2.OperationalError: table users has no column named countryCode ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Error thrown: sqlite3.OperationalError: table users has no column named lastName 743400216 | |
941274088 | https://github.com/dogsheep/swarm-to-sqlite/issues/12#issuecomment-941274088 | https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/12 | IC_kwDODD6af844GrPo | fs111 33631 | 2021-10-12T18:31:57Z | 2021-10-12T18:31:57Z | NONE | I am running into the same problem. Is there any workaround? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 403 when getting token 951817328 |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [issue] INTEGER REFERENCES [issues]([id]) , [performed_via_github_app] TEXT); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
author_association 4 ✖