github: issue_comments: 9,947 rows sorted by issue

9,947 rows sorted by issue_url

id	html_url	issue_url ▼	node_id	user	created_at	updated_at	author_association	body	reactions	issue
790312268	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790312268	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDMxMjI2OA==	simonw 9599	2021-03-04T05:48:16Z	2021-03-04T05:48:16Z	MEMBER	Wow, my mbox is a 10.35 GB download!	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790369076	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790369076	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDM2OTA3Ng==	simonw 9599	2021-03-04T06:54:46Z	2021-03-04T06:54:46Z	MEMBER	The Rich-powered progress bar is pretty: ![rich](https://user-images.githubusercontent.com/9599/109923307-71f69200-7c73-11eb-9ee2-8f0a240f3994.gif)	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790370485	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790370485	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDM3MDQ4NQ==	simonw 9599	2021-03-04T06:57:25Z	2021-03-04T06:57:48Z	MEMBER	The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count: https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67 I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790372621	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790372621	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDM3MjYyMQ==	simonw 9599	2021-03-04T07:01:18Z	2021-03-04T07:01:18Z	MEMBER	I'm not sure if it would work, but there is an alternative pattern for showing a progress bar against a really large file that I've used in `healthkit-to-sqlite` - you set the progress bar size to the size of the file in bytes, then update a counter as you read the file. https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/cli.py#L24-L57 and https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/utils.py#L4-L19 (the `progress_callback()` bit) is where that happens. It can be a bit of a convoluted pattern, and I'm not at all sure it would work for `mbox` files since it looks like that library has other reasons it needs to do a file scan rather than streaming it through one chunk of bytes at a time. So I imagine this would not work here.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790373024	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790373024	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDM3MzAyNA==	simonw 9599	2021-03-04T07:01:58Z	2021-03-04T07:04:06Z	MEMBER	I got 9 warnings that look like this: ``` Errors: 1 Traceback (most recent call last): File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 103, in get_mbox message["date"] = get_message_date(email.get("Date"), email.get_from()) File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 167, in get_message_date datetime_tuple = email.utils.parsedate_tz(mail_date) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz res = _parsedate_tz(data) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz data = data.split() AttributeError: 'Header' object has no attribute 'split' ``` It would be useful if those warnings told me the message ID (or similar) of the affected message so I could grep for it in the `mbox` and see what was going on.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790378658	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790378658	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDM3ODY1OA==	simonw 9599	2021-03-04T07:12:48Z	2021-03-04T07:12:48Z	MEMBER	It looks like the `body` is being loaded into a BLOB column - so in Datasette default it looks like this: <img width="1650" alt="mbox__mbox_emails__753_446_rows" src="https://user-images.githubusercontent.com/9599/109924808-b4b96980-7c75-11eb-8c9e-307f2ae32d5a.png"> If I `datasette install datasette-render-binary` and then try again I get this: <img width="1487" alt="mbox__mbox_emails__753_446_rows" src="https://user-images.githubusercontent.com/9599/109924944-ea5e5280-7c75-11eb-9a32-404f3d68455f.png"> It would be great if we could store the `body` as unicode text instead. May have to do something clever to decode it based on some kind of charset header?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790379629	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790379629	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDM3OTYyOQ==	simonw 9599	2021-03-04T07:14:41Z	2021-03-04T07:14:41Z	MEMBER	Confirmed: removing the `len()` call does not speed things up, so it's reading through the entire file for some other purpose too.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790380839	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790380839	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDM4MDgzOQ==	simonw 9599	2021-03-04T07:17:05Z	2021-03-04T07:17:05Z	MEMBER	Looks like you're doing this: ```python elif message.get_content_type() == "text/plain": body = message.get_payload(decode=True) ``` So presumably that decodes to a unicode string? I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790389335	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790389335	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDM4OTMzNQ==	UtahDave 306240	2021-03-04T07:32:04Z	2021-03-04T07:32:04Z	NONE	> The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count: > > https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67 > > I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long. The wait is from python loading the mbox file. This happens regardless if you're getting the length of the mbox. The mbox module is on the slow side. It is possible to do one's own parsing of the mbox, but I kind of wanted to avoid doing that.	{"total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790391711	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790391711	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDM5MTcxMQ==	UtahDave 306240	2021-03-04T07:36:24Z	2021-03-04T07:36:24Z	NONE	> Looks like you're doing this: > > ```python > elif message.get_content_type() == "text/plain": > body = message.get_payload(decode=True) > ``` > > So presumably that decodes to a unicode string? > > I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string. Ah, that's good to know. I think explicitly creating the tables will be a great improvement. I'll add that. Also, I noticed after I opened this PR that the `message.get_payload()` is being deprecated in favor of `message.get_content()` or something like that. I'll see if that handles the decoding better, too. Thanks for the feedback. I should have time tomorrow to put together some improvements.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790668263	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790668263	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDY2ODI2Mw==	simonw 9599	2021-03-04T14:43:58Z	2021-03-04T14:43:58Z	MEMBER	I added this code to output a message ID on errors: ```diff print("Errors: {}".format(num_errors)) print(traceback.format_exc()) + print("Message-Id: {}".format(email.get("Message-Id", "None"))) continue ``` Having found a message ID that had an error, I ran this command to see the context: rg --text --context 20 '44F289B0.000001.02100@SCHWARZE-DWFXMI' ~/gmail.mbox This was for the following error: ``` File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 102, in get_mbox message["date"] = get_message_date(email.get("Date"), email.get_from()) File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 178, in get_message_date datetime_tuple = email.utils.parsedate_tz(mail_date) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz res = _parsedate_tz(data) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz data = data.split() AttributeError: 'Header' object has no attribute 'split' ``` Here's what I spotted in the `ripgrep` output: ``` 177133570:Message-Id: <44F289B0.000001.02100@SCHWARZE-DWFXMI> 177133571-Date: Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit) 177133572-X-Mailer: IncrediMail (5002253) ``` So it could be that `_parsedate_tz` is having trouble with that `Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit)` string.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790669767	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790669767	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDY2OTc2Nw==	simonw 9599	2021-03-04T14:46:06Z	2021-03-04T14:46:06Z	MEMBER	Solution could be to pre-process that string by splitting on `(` and dropping everything afterwards, assuming that the `(...)` bit isn't necessary for correctly parsing the date.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790693674	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790693674	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDY5MzY3NA==	simonw 9599	2021-03-04T15:18:36Z	2021-03-04T15:18:36Z	MEMBER	I imported my 10GB mbox with 750,000 emails in it, ran this tool (with a hacked fix for the blob column problem) - and now a search that returns 92 results takes 25.37ms! This is fantastic.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790695126	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790695126	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MDY5NTEyNg==	simonw 9599	2021-03-04T15:20:42Z	2021-03-04T15:20:42Z	MEMBER	I'm not sure why but my most recent import, when displayed in Datasette, looks like this: <img width="574" alt="mbox__mbox_emails__753_446_rows" src="https://user-images.githubusercontent.com/9599/109985836-0ab00080-7cba-11eb-97d5-0631a0835b61.png"> Sorting by `id` in the opposite order gives me the data I would expect - so it looks like a bunch of null/blank messages are being imported at some point and showing up first due to ID ordering.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
791089881	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-791089881	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MTA4OTg4MQ==	maxhawkins 28565	2021-03-05T02:03:19Z	2021-03-05T02:03:19Z	NONE	I just tried to run this on a small VPS instance with 2GB of memory and it crashed out of memory while processing a 12GB mbox from Takeout. Is it possible to stream the emails to sqlite instead of loading it all into memory and upserting at once?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
791530093	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-791530093	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDc5MTUzMDA5Mw==	UtahDave 306240	2021-03-05T16:28:07Z	2021-03-05T16:28:07Z	NONE	> I just tried to run this on a small VPS instance with 2GB of memory and it crashed out of memory while processing a 12GB mbox from Takeout. > > Is it possible to stream the emails to sqlite instead of loading it all into memory and upserting at once? @maxhawkins a limitation of the python mbox module is it loads the entire mbox into memory. I did find another approach to this problem that didn't use the builtin python mbox module and created a generator so that it didn't have to load the whole mbox into memory. I was hoping to use standard library modules, but this might be a good reason to investigate that approach a bit more. My worry is making sure a custom processor handles all the ins and outs of the mbox format correctly. Hm. As I'm writing this, I thought of something. I think I can parse each message one at a time, and then use an mbox function to load each message using the python mbox module. That way the mbox module can still deal with the specifics of the mbox format, but I can use a generator. I'll give that a try. Thanks for the feedback @maxhawkins and @simonw. I'll give that a try. @simonw can we hold off on merging this until I can test this new approach?	{"total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
849708617	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-849708617	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDg0OTcwODYxNw==	maxhawkins 28565	2021-05-27T15:01:42Z	2021-05-27T15:01:42Z	NONE	Any updates?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
884672647	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-884672647	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	IC_kwDODFE5qs40uwiH	maxhawkins 28565	2021-07-22T05:56:31Z	2021-07-22T14:03:08Z	NONE	How does this commit look? https://github.com/maxhawkins/google-takeout-to-sqlite/commit/72802a83fee282eb5d02d388567731ba4301050d It seems that Takeout's mbox format is pretty simple, so we can get away with just splitting the file on lines beginning with `From `. My commit just splits the file every time a line starts with `From ` and uses `email.message_from_bytes` to parse each chunk. I was able to load a 12GB takeout mbox without the program using more than a couple hundred MB of memory during the import process. It does make us lose the progress bar, but maybe I can add that back in a later commit.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
885022230	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885022230	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	IC_kwDODFE5qs40wF4W	maxhawkins 28565	2021-07-22T15:51:46Z	2021-07-22T15:51:46Z	NONE	One thing I noticed is this importer doesn't save attachments along with the body of the emails. It would be nice if those got stored as blobs in a separate attachments table so attachments can be included while fetching search results.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
885094284	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885094284	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	IC_kwDODFE5qs40wXeM	maxhawkins 28565	2021-07-22T17:41:32Z	2021-07-22T17:41:32Z	NONE	I added a follow-up commit that deals with emails that don't have a `Date` header: https://github.com/maxhawkins/google-takeout-to-sqlite/commit/4bc70103582c10802c85a523ef1e99a8a2154aa9	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
885098025	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885098025	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	IC_kwDODFE5qs40wYYp	UtahDave 306240	2021-07-22T17:47:50Z	2021-07-22T17:47:50Z	NONE	Hi @maxhawkins , I'm sorry, I haven't had any time to work on this. I'll have some time tomorrow to test your commits. I think they look great. I'm great with your commits superseding my initial attempt here.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
888075098	https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-888075098	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5	IC_kwDODFE5qs407vNa	maxhawkins 28565	2021-07-28T07:18:56Z	2021-07-28T07:18:56Z	NONE	> I'm not sure why but my most recent import, when displayed in Datasette, looks like this: > > <img alt="mbox__mbox_emails__753_446_rows" width="574" src="https://user-images.githubusercontent.com/9599/109985836-0ab00080-7cba-11eb-97d5-0631a0835b61.png"> I did some investigation into this issue and made a fix [here](https://github.com/dogsheep/google-takeout-to-sqlite/pull/8/commits/8ee555c2889a38ff42b95664ee074b4a01a82f06). The problem was that some messages (like gchat logs) don't have a `Message-Id` and we need to use `X-GM-THRID` as the pkey instead. @simonw While looking into this I found something unexpected about how sqlite_utils handles upserts if the pkey column is `None`. When the pkey is NULL I'd expect the function to either use rowid or throw an exception. Instead, it seems upsert_all creates a row where all columns are NULL instead of using the values provided as parameters.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	WIP: Add Gmail takeout mbox import 813880401
790384087	https://github.com/dogsheep/google-takeout-to-sqlite/issues/6#issuecomment-790384087	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/6	MDEyOklzc3VlQ29tbWVudDc5MDM4NDA4Nw==	simonw 9599	2021-03-04T07:22:51Z	2021-03-04T07:22:51Z	MEMBER	#3 also mentions the conflicting version with other tools.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Upgrade to latest sqlite-utils 821841046
894581223	https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-894581223	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8	IC_kwDODFE5qs41Ujnn	maxhawkins 28565	2021-08-07T00:57:48Z	2021-08-07T00:57:48Z	NONE	Just added two more fixes: * Added parsing for rfc 2047 encoded unicode headers * Body is now stored as TEXT rather than a BLOB regardless of what order the messages are parsed in. I was able to run this on my Takeout export and everything seems to work fine. @simonw let me know if this looks good to merge.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Add Gmail takeout mbox import (v2) 954546309
896378525	https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-896378525	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8	IC_kwDODFE5qs41baad	maxhawkins 28565	2021-08-10T23:28:45Z	2021-08-10T23:28:45Z	NONE	I added parsing of text/html emails using BeautifulSoup. Around half of the emails in my archive don't include a text/plain payload so adding html parsing makes a good chunk of them searchable.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Add Gmail takeout mbox import (v2) 954546309
1002735370	https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1002735370	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8	IC_kwDODFE5qs47xIcK	Btibert3 203343	2021-12-29T18:58:23Z	2021-12-29T18:58:23Z	NONE	@maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just tried your PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Add Gmail takeout mbox import (v2) 954546309
1003437288	https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1003437288	https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8	IC_kwDODFE5qs47zzzo	maxhawkins 28565	2021-12-31T19:06:20Z	2021-12-31T19:06:20Z	NONE	> @maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just tried your PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text. Shouldn't be hard. The easiest way is probably to remove the `if body.content_type == "text/html"` clause from [utils.py:254](https://github.com/dogsheep/google-takeout-to-sqlite/pull/8/commits/8e6d487b697ce2e8ad885acf613a157bfba84c59#diff-25ad9dd1ced1b8bfc37fda8444819c803232c08891e4af3d4064aa205d8174eaR254) and just return content directly without parsing.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Add Gmail takeout mbox import (v2) 954546309
602094386	https://github.com/dogsheep/hacker-news-to-sqlite/issues/1#issuecomment-602094386	https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/1	MDEyOklzc3VlQ29tbWVudDYwMjA5NDM4Ng==	simonw 9599	2020-03-21T19:47:46Z	2020-03-21T19:47:46Z	MEMBER	Released as 0.3.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Set up full text search 585526292
886135562	https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886135562	https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2	IC_kwDODtX3eM400VsK	simonw 9599	2021-07-25T02:01:11Z	2021-07-25T02:01:11Z	MEMBER	That page doesn't have an API but does look easy to scrape. The other option here is the HN Search API powered by Algolia, documented at https://hn.algolia.com/api	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Command for fetching Hacker News threads from the search API 952179830
886135922	https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886135922	https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2	IC_kwDODtX3eM400Vxy	simonw 9599	2021-07-25T02:06:20Z	2021-07-25T02:06:20Z	MEMBER	https://hn.algolia.com/api/v1/search_by_date?query=simonwillison.net&restrictSearchableAttributes=url looks like it does what I want. https://hn.algolia.com/api/v1/search_by_date?query=simonwillison.net&restrictSearchableAttributes=url&hitsPerPage=1000 - returns 1000 at once. Otherwise you have to paginate using `&page=2` etc - up to `nbPages` pages. https://www.algolia.com/doc/api-reference/api-parameters/hitsPerPage/ says 1000 is the maximum.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Command for fetching Hacker News threads from the search API 952179830
886136224	https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886136224	https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2	IC_kwDODtX3eM400V2g	simonw 9599	2021-07-25T02:08:29Z	2021-07-25T02:08:29Z	MEMBER	Prototype: curl "https://hn.algolia.com/api/v1/search_by_date?query=simonwillison.net&restrictSearchableAttributes=url&hitsPerPage=1000" \| \ jq .hits \| sqlite-utils insert hn.db items - --pk objectID --alter	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Command for fetching Hacker News threads from the search API 952179830
886140431	https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886140431	https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2	IC_kwDODtX3eM400W4P	simonw 9599	2021-07-25T03:12:57Z	2021-07-25T03:12:57Z	MEMBER	I'm going to build a general-purpose `hacker-news-to-sqlite search ...` command, where one of the options is to search within the URL.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Command for fetching Hacker News threads from the search API 952179830
886142671	https://github.com/dogsheep/hacker-news-to-sqlite/issues/3#issuecomment-886142671	https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/3	IC_kwDODtX3eM400XbP	simonw 9599	2021-07-25T03:51:05Z	2021-07-25T03:51:05Z	MEMBER	Prototype: curl 'https://hn.algolia.com/api/v1/items/27941108' \ \| jq '[recurse(.children[]) \| del(.children)]' \ \| sqlite-utils insert hn.db items - --pk id	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Use HN algolia endpoint to retrieve trees 952189173
886237834	https://github.com/dogsheep/hacker-news-to-sqlite/issues/3#issuecomment-886237834	https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/3	IC_kwDODtX3eM400uqK	simonw 9599	2021-07-25T18:05:32Z	2021-07-25T18:05:32Z	MEMBER	If you hit the endpoint for a comment that's part of a thread you get that comment and its recursive children: https://hn.algolia.com/api/v1/items/27941552 You can tell that it's not the top-level because the `parent_id` isn't `null`. You can use `story_id` to figure out what the top-level item is. ```json { "id": 27941552, "created_at": "2021-07-24T15:08:39.000Z", "created_at_i": 1627139319, "type": "comment", "author": "nine_k", "title": null, "url": null, "text": "<p>I wish ...", "points": null, "parent_id": 27941108, "story_id": 27941108 } ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Use HN algolia endpoint to retrieve trees 952189173
886241674	https://github.com/dogsheep/hacker-news-to-sqlite/issues/3#issuecomment-886241674	https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/3	IC_kwDODtX3eM400vmK	simonw 9599	2021-07-25T18:41:17Z	2021-07-25T18:41:17Z	MEMBER	Got a TIL out of this: https://til.simonwillison.net/jq/extracting-objects-recursively	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Use HN algolia endpoint to retrieve trees 952189173
1489110168	https://github.com/dogsheep/hacker-news-to-sqlite/pull/6#issuecomment-1489110168	https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/6	IC_kwDODtX3eM5YwgSY	xavdid 1231935	2023-03-29T18:36:16Z	2023-03-29T18:36:16Z	NONE	@simonw can you take a look when you have a chance?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Add permalink virtual field to items table 1641117021
513437463	https://github.com/dogsheep/healthkit-to-sqlite/issues/1#issuecomment-513437463	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/1	MDEyOklzc3VlQ29tbWVudDUxMzQzNzQ2Mw==	simonw 9599	2019-07-20T05:19:59Z	2019-07-20T05:19:59Z	MEMBER	I ran xml_analyser against the XML HealthKit `export.xml` file and got the following results: ```python { 'ActivitySummary': {'attr_counts': {'activeEnergyBurned': 980, 'activeEnergyBurnedGoal': 980, 'activeEnergyBurnedUnit': 980, 'appleExerciseTime': 980, 'appleExerciseTimeGoal': 980, 'appleStandHours': 980, 'appleStandHoursGoal': 980, 'dateComponents': 980}, 'child_counts': {}, 'count': 980, 'parent_counts': {'HealthData': 980}}, 'Correlation': {'attr_counts': {'creationDate': 1, 'endDate': 1, 'sourceName': 1, 'sourceVersion': 1, 'startDate': 1, 'type': 1}, 'child_counts': {'MetadataEntry': 1, 'Record': 2}, 'count': 1, 'parent_counts': {'HealthData': 1}}, 'ExportDate': {'attr_counts': {'value': 1}, 'child_counts': {}, 'count': 1, 'parent_counts': {'HealthData': 1}}, 'HealthData': {'attr_counts': {'locale': 1}, 'child_counts': {'ActivitySummary': 980, 'Correlation': 1, 'ExportDate': 1, 'Me': 1, 'Record': 2672231, 'Workout': 663}, 'count': 1, 'parent_counts': {}}, 'HeartRateVariabilityMetadataList': {'attr_counts': {}, 'child_counts': {'InstantaneousBeatsPerMinute': 93653}, 'count': 2318, …	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Use XML Analyser to figure out the structure of the export XML 470637068
550783316	https://github.com/dogsheep/healthkit-to-sqlite/issues/10#issuecomment-550783316	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/10	MDEyOklzc3VlQ29tbWVudDU1MDc4MzMxNg==	simonw 9599	2019-11-07T05:16:56Z	2019-11-07T05:34:29Z	MEMBER	It looks like Apple changed the location of these in iOS 13 - they are now in separate `.gpx` files: ![2FF70E95-CDEE-4241-A5C5-EE95A862E519](https://user-images.githubusercontent.com/9599/68362042-be12e000-00da-11ea-8925-7397410332d8.png)	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Failed to import workout points 519038979
550806302	https://github.com/dogsheep/healthkit-to-sqlite/issues/10#issuecomment-550806302	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/10	MDEyOklzc3VlQ29tbWVudDU1MDgwNjMwMg==	simonw 9599	2019-11-07T05:33:31Z	2019-11-07T05:33:31Z	MEMBER	The XML now includes references to these new files: ![CBBA54FC-51FB-4BB3-927C-C2CA99237B04](https://user-images.githubusercontent.com/9599/68362716-121ec400-00dd-11ea-9846-387c7cd64c8b.jpeg)	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Failed to import workout points 519038979
550824838	https://github.com/dogsheep/healthkit-to-sqlite/issues/10#issuecomment-550824838	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/10	MDEyOklzc3VlQ29tbWVudDU1MDgyNDgzOA==	simonw 9599	2019-11-07T05:47:07Z	2019-11-07T05:47:07Z	MEMBER	Relevant code: https://github.com/dogsheep/healthkit-to-sqlite/blob/d16f45f06fbae6ec8a78cc9ca7b5b7db0413f139/healthkit_to_sqlite/utils.py#L58-L64	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Failed to import workout points 519038979
550828084	https://github.com/dogsheep/healthkit-to-sqlite/issues/10#issuecomment-550828084	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/10	MDEyOklzc3VlQ29tbWVudDU1MDgyODA4NA==	simonw 9599	2019-11-07T05:49:24Z	2019-11-07T05:49:24Z	MEMBER	So the fix there is going to be to detect the new `FileReference` element and load the corresponding points data from it. This will be a little tricky because that function will need access to the zip file. It probably won't work at all for the mode where the `export.xml` file is passed directly using the `--xml` option.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Failed to import workout points 519038979
711074031	https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711074031	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11	MDEyOklzc3VlQ29tbWVudDcxMTA3NDAzMQ==	simonw 9599	2020-10-17T20:14:01Z	2020-10-17T20:14:01Z	MEMBER	I'd be happy to teach the tool to look for `export.xml` or `eksport.xml` - and then expand that list to other languages.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	export.xml file name varies with different language settings 723838331
711074306	https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711074306	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11	MDEyOklzc3VlQ29tbWVudDcxMTA3NDMwNg==	simonw 9599	2020-10-17T20:16:22Z	2020-10-17T20:16:22Z	MEMBER	The "first XML file in the root" solution is probably easier though!	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	export.xml file name varies with different language settings 723838331
711078917	https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711078917	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11	MDEyOklzc3VlQ29tbWVudDcxMTA3ODkxNw==	simonw 9599	2020-10-17T20:51:55Z	2020-10-17T20:52:03Z	MEMBER	I switched my phone to Spanish and ran an export - I got a file called `exportar.zip`. Unzipped I still got an `apple_health_export` folder but the root contained: ``` electrocardiograms/ export_cda.xml exportar.xml workout-routes/ ``` It looks like `export_cda.xml` does not have a translated name, so maybe I can ignore it and look for the _other_ `.xml` file in that directory.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	export.xml file name varies with different language settings 723838331
711079056	https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711079056	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11	MDEyOklzc3VlQ29tbWVudDcxMTA3OTA1Ng==	simonw 9599	2020-10-17T20:53:00Z	2020-10-17T20:53:00Z	MEMBER	I think the safest thing is to sniff the first few lines of the file. Those should be the same no matter the language that was used: ```xml <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE HealthData [ ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	export.xml file name varies with different language settings 723838331
711079760	https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711079760	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11	MDEyOklzc3VlQ29tbWVudDcxMTA3OTc2MA==	simonw 9599	2020-10-17T21:00:05Z	2020-10-17T21:00:05Z	MEMBER	Checking for either `<!DOCTYPE HealthData` or `<HealthData` in the first 1000 bytes should do it.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	export.xml file name varies with different language settings 723838331
711081703	https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711081703	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11	MDEyOklzc3VlQ29tbWVudDcxMTA4MTcwMw==	simonw 9599	2020-10-17T21:18:35Z	2020-10-17T21:18:35Z	MEMBER	OK, if you upgrade to the just-released 1.0 this should work (it worked against my Spanish export).	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	export.xml file name varies with different language settings 723838331
711083698	https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711083698	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11	MDEyOklzc3VlQ29tbWVudDcxMTA4MzY5OA==	jarib 572	2020-10-17T21:39:15Z	2020-10-17T21:39:15Z	NONE	Nice! Works perfectly. Thanks for the quick response and great tooling in general.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	export.xml file name varies with different language settings 723838331
877805513	https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-877805513	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12	MDEyOklzc3VlQ29tbWVudDg3NzgwNTUxMw==	Mjboothaus 956433	2021-07-11T14:03:01Z	2021-07-11T14:03:01Z	NONE	Hi Simon -- just experimenting with your excellent software! Up to this point in time I have been using the (paid) [HealthFit App](https://apps.apple.com/au/app/healthfit/id1202650514) to export my workouts from my Apple Watch, one walk at a time, into either .GPX or .FIT format and then using another library to suck it into Python and eventually here to my "Emmaus Walking" app: https://share.streamlit.io/mjboothaus/emmaus_walking/emmaus_walking/app.py I just used `healthkit-to-sqlite` to convert my export.zip file and it all "just worked". I did notice the issue with various numeric fields being stored in the SQLite db as TEXT for now and just thought I'd flag it - but you've already self-reported this issue. Keep up the great work! I was curious if you have any thoughts about periodically exporting "export.zip" and how to just update the SQLite file instead of re-creating it each time. Hopefully Apple will give some thought to managing this data in a more sensible fashion as it grows over time. Ideally one could pull it from iCloud (where it is allegedly being backed up).	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Some workout columns should be float, not text 727848625
877874117	https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-877874117	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12	MDEyOklzc3VlQ29tbWVudDg3Nzg3NDExNw==	Mjboothaus 956433	2021-07-11T23:03:37Z	2021-07-11T23:03:37Z	NONE	P.S. Wondering if you have explored using the SpatiaLite functionality with the location data in workouts?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Some workout columns should be float, not text 727848625
879477586	https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-879477586	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12	MDEyOklzc3VlQ29tbWVudDg3OTQ3NzU4Ng==	simonw 9599	2021-07-13T23:50:06Z	2021-07-13T23:50:06Z	MEMBER	Unfortunately I don't think updating the database is practical, because the export doesn't include unique identifiers which can be used to update existing records and create new ones. Recreating from scratch works around that limitation. I've not explored workouts with SpatiaLite but that's a really good idea.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Some workout columns should be float, not text 727848625
1163917719	https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-1163917719	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12	IC_kwDOC8tyDs5FX_mX	Mjboothaus 956433	2022-06-23T04:35:02Z	2022-06-23T04:35:02Z	NONE	In terms of unique identifiers - could you use values stored in `HKMetadataKeySyncIdentifier`?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Some workout columns should be float, not text 727848625
904642396	https://github.com/dogsheep/healthkit-to-sqlite/pull/13#issuecomment-904642396	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/13	IC_kwDOC8tyDs41679c	FabianHertwig 32016596	2021-08-24T13:27:40Z	2021-08-24T13:28:26Z	NONE	This would fix #21 and make #22 obsolete.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	SQLite does not have case sensitive columns 743071410
798436026	https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-798436026	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14	MDEyOklzc3VlQ29tbWVudDc5ODQzNjAyNg==	n8henrie 1234956	2021-03-13T14:23:16Z	2021-03-13T14:23:16Z	NONE	This PR allows my import to succeed. It looks like some events don't have an `id`, but do have `HKExternalUUID` (which gets turned into `metadata_HKExternalUUID`), so I use this as a fallback. If a record has neither of these, I changed it to just print the record (for debugging) and `return`. For some odd reason this ran fine at first, and now (after removing the generated db and trying again) I'm getting a different error (duplicate column name). Looks like it may have run when I had two successive runs without remembering to delete the db in between. Will try to refactor.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	UNIQUE constraint failed: workouts.id 771608692
798468572	https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-798468572	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14	MDEyOklzc3VlQ29tbWVudDc5ODQ2ODU3Mg==	n8henrie 1234956	2021-03-13T14:47:31Z	2021-03-13T14:47:31Z	NONE	Ok, new PR works. I'm not `git` enough so I just force-pushed over the old one. I still end up with a lot of activities that are missing an `id` and therefore skipped (since this is used as the primary key). For example: ``` {'workoutActivityType': 'HKWorkoutActivityTypeRunning', 'duration': '35.31666666666667', 'durationUnit': 'min', 'totalDistance': '4.010870267636999', 'totalDistanceUnit': 'mi', 'totalEnergyBurned': '660.3516235351562', 'totalEnergyBurnedUnit': 'Cal', 'sourceName': 'Strava', 'sourceVersion': '22810', 'creationDate': '2020-07-16 13:38:26 -0700', 'startDate': '2020-07-16 06:38:26 -0700', 'endDate': '2020-07-16 07:13:45 -0700'} ``` I also end up with some unhappy characters (in the skipped events), such as: `'sourceName': 'Nathan’s Apple\xa0Watch',`. But it's successfully making it through the file, and the resulting db opens in datasette, so I'd call that progress.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	UNIQUE constraint failed: workouts.id 771608692
1073123231	https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-1073123231	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14	IC_kwDOC8tyDs4_9o-f	lchski 343884	2022-03-19T22:39:29Z	2022-03-19T22:39:29Z	NONE	I have this issue, too, with a fresh export. None of my `Workout` entries in `export.xml` have an `id` key, though [the sample `export.xml` in the tests folder doesn’t either](https://github.com/dogsheep/healthkit-to-sqlite/blob/main/tests/zip_contents/apple_health_export/export.xml#L14-L21), so I don’t think this is the culprit. Indeed, it seems @simonw is using the [`hash_id` function from `sqlite_utils`](https://sqlite-utils.datasette.io/en/stable/python-api.html#setting-an-id-based-on-the-hash-of-the-row-contents), which creates a column (`id`, in this case) based on a hash of the row’s contents. When I run the script, a `workouts` table is created, with one entry: my first workout. No `workout_points` table is created, as [I’d expect from `utils.py`](https://github.com/dogsheep/healthkit-to-sqlite/blob/main/healthkit_to_sqlite/utils.py#L89-L90). 
I then get essentially the same error as noted in this thread: ```Importing from HealthKit [###################################-] 98% 00:00:01 Traceback (most recent call last): File "/Users/lchski/.pyenv/versions/3.10.3/bin/healthkit-to-sqlite", line 8, in <module> sys.exit(cli()) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 1128, in __call__ return self.main(args, kwargs) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 1053, in main rv = self.invoke(ctx) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 1395, in invoke return ctx.invoke(self.callback, ctx.params) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 754, in invoke return __callback(args, **kwargs) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/healthkit_to_sqlite/cli.py", line 57, in cli convert_xml_to_sqlite(fp, db, progress_callback=bar.update, zipfile=zf) File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/health…	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	UNIQUE constraint failed: workouts.id 771608692
1073139067	https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-1073139067	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14	IC_kwDOC8tyDs4_9s17	lchski 343884	2022-03-20T00:54:18Z	2022-03-20T00:54:18Z	NONE	Update: this appears to be because of running the command twice without clearing the DB in between. The second run tries to insert a Workout that already exists, causing a collision on the (auto-generated) `id` column. I had a different error with a clean DB, likely due to the workout points format; will make a new issue for that.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	UNIQUE constraint failed: workouts.id 771608692
1629123734	https://github.com/dogsheep/healthkit-to-sqlite/issues/14#issuecomment-1629123734	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/14	IC_kwDOC8tyDs5hGnSW	philipp-heinrich 44622670	2023-07-10T14:46:52Z	2023-07-10T14:46:52Z	NONE	@simonw any chance to get this fixed soon?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	UNIQUE constraint failed: workouts.id 771608692
513439411	https://github.com/dogsheep/healthkit-to-sqlite/issues/2#issuecomment-513439411	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/2	MDEyOklzc3VlQ29tbWVudDUxMzQzOTQxMQ==	simonw 9599	2019-07-20T05:58:57Z	2019-07-20T05:58:57Z	MEMBER	```python 'Workout': {'attr_counts': {'creationDate': 663, 'device': 230, 'duration': 663, 'durationUnit': 663, 'endDate': 663, 'sourceName': 663, 'sourceVersion': 663, 'startDate': 663, 'totalDistance': 663, 'totalDistanceUnit': 663, 'totalEnergyBurned': 663, 'totalEnergyBurnedUnit': 663, 'workoutActivityType': 663}, 'child_counts': {'MetadataEntry': 1928, 'WorkoutEvent': 2094, 'WorkoutRoute': 340}, 'count': 663, 'parent_counts': {'HealthData': 663}}, 'WorkoutEvent': {'attr_counts': {'date': 2094, 'duration': 837, 'durationUnit': 837, 'type': 2094}, 'child_counts': {}, 'count': 2094, 'parent_counts': {'Workout': 2094}}, 'WorkoutRoute': {'attr_counts': {'creationDate': 340, 'endDate': 340, 'sourceName': 340, 'sourceVersion': 340, 'startDate': 340}, 'child_counts': {'Location': 398683, 'MetadataEntry': 546}, 'count': 340, 'parent_counts': {'Workout': 340}}} ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import workouts 470637152
902355471	https://github.com/dogsheep/healthkit-to-sqlite/issues/20#issuecomment-902355471	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/20	IC_kwDOC8tyDs41yNoP	simonw 9599	2021-08-20T01:09:07Z	2021-08-20T01:09:07Z	MEMBER	Workaround: `sqlite-utils create-index healthkit.db workout_points -- -date` See https://sqlite-utils.datasette.io/en/stable/cli.html#creating-indexes	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Add index on workout_points.date 975166271
902356871	https://github.com/dogsheep/healthkit-to-sqlite/issues/20#issuecomment-902356871	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/20	IC_kwDOC8tyDs41yN-H	simonw 9599	2021-08-20T01:12:48Z	2021-08-20T01:12:48Z	MEMBER	Also on `workout_points.workout_id` to speed up queries to show all points in a specific workout.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Add index on workout_points.date 975166271
903950096	https://github.com/dogsheep/healthkit-to-sqlite/issues/21#issuecomment-903950096	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/21	IC_kwDOC8tyDs414S8Q	FabianHertwig 32016596	2021-08-23T17:00:59Z	2021-08-23T17:00:59Z	NONE	I think the issue is that I have records like these: ```xml <Record type="HKQuantityTypeIdentifierDietaryCholesterol" sourceName="MyFitnessPal" sourceVersion="35120" unit="mg" creationDate="2021-07-04 20:55:27 +0200" startDate="2021-07-04 20:55:00 +0200" endDate="2021-07-04 20:55:00 +0200" value="124"> <MetadataEntry key="meal" value="Dinner"/> <MetadataEntry key="Meal" value="Dinner"/> </Record> ``` And if sqlite is case insensitive, then `metadata_meal` and `metadata_Meal` result in the same column.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Duplicate Column 977128935
904641261	https://github.com/dogsheep/healthkit-to-sqlite/pull/22#issuecomment-904641261	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/22	IC_kwDOC8tyDs4167rt	FabianHertwig 32016596	2021-08-24T13:26:20Z	2021-08-24T13:26:20Z	NONE	Did not see that #13 fixes the same issue in a similar way. You can decide which one to merge ;)	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Make sure that case-insensitive column names are unique 978086284
1464786643	https://github.com/dogsheep/healthkit-to-sqlite/issues/24#issuecomment-1464786643	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/24	IC_kwDOC8tyDs5XTt7T	Mjboothaus 956433	2023-03-11T02:01:27Z	2023-03-11T02:01:27Z	NONE	Thanks for reporting this and providing a solution -- I was puzzled by this error when I revisited my walking data and experienced this issue. I haven't tried the fix yet.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	DOC: xml.etree.ElementTree.ParseError due to healthkit version 12 1515883470
1464796494	https://github.com/dogsheep/healthkit-to-sqlite/issues/24#issuecomment-1464796494	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/24	IC_kwDOC8tyDs5XTwVO	Mjboothaus 956433	2023-03-11T02:23:42Z	2023-03-11T02:23:42Z	NONE	@simonw - maybe put in some error handling to trap for poorly formed XML (from Apple engineers) so that it suggests that there are problems with export.zip rather than odd looking Python errors :)	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	DOC: xml.etree.ElementTree.ParseError due to healthkit version 12 1515883470
513440090	https://github.com/dogsheep/healthkit-to-sqlite/issues/4#issuecomment-513440090	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/4	MDEyOklzc3VlQ29tbWVudDUxMzQ0MDA5MA==	simonw 9599	2019-07-20T06:11:50Z	2019-07-20T06:11:50Z	MEMBER	Some examples: https://github.com/dogsheep/healthkit-to-sqlite/blob/d016e70c31cf84ba0f5ec3102546db54a51aaffb/tests/export.xml#L4-L13	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import Records 470640505
513514978	https://github.com/dogsheep/healthkit-to-sqlite/issues/5#issuecomment-513514978	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDUxMzUxNDk3OA==	simonw 9599	2019-07-21T02:55:12Z	2019-07-21T02:55:12Z	MEMBER	I'm going to show this by default. Users can pass `-s` or `--silent` to disable the progress bar.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Add progress bar 470691622
513625406	https://github.com/dogsheep/healthkit-to-sqlite/issues/5#issuecomment-513625406	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDUxMzYyNTQwNg==	simonw 9599	2019-07-22T03:20:16Z	2019-07-22T03:20:16Z	MEMBER	It now renders like this: ``` Importing from HealthKit [#-----------------------------------] 5% 00:01:33 ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Add progress bar 470691622
513626742	https://github.com/dogsheep/healthkit-to-sqlite/issues/6#issuecomment-513626742	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/6	MDEyOklzc3VlQ29tbWVudDUxMzYyNjc0Mg==	simonw 9599	2019-07-22T03:28:55Z	2019-07-22T03:28:55Z	MEMBER	Here's what it looks like now as separate tables: <img width="1050" alt="hello9_and_Populate__endpoint__key_in_ASGI_scope_·_Issue__537_·_simonw_datasette" src="https://user-images.githubusercontent.com/9599/61604322-eba05000-abf5-11e9-8f8b-d0204e9a705a.png">	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Break up records into different tables for each type 470856782
514496725	https://github.com/dogsheep/healthkit-to-sqlite/issues/7#issuecomment-514496725	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/7	MDEyOklzc3VlQ29tbWVudDUxNDQ5NjcyNQ==	simonw 9599	2019-07-24T06:20:59Z	2019-07-24T06:20:59Z	MEMBER	I'm using https://pypi.org/project/memory-profiler/ to explore this in more detail: ``` $ pip install memory-profiler matplotlib ``` Then: ``` $ mprof run healthkit-to-sqlite ~/Downloads/healthkit-export.zip healthkit.db $ mprof plot ``` <img width="1128" alt="Screen Shot 2019-07-24 at 8 17 06 AM" src="https://user-images.githubusercontent.com/9599/61769849-f3294a00-adeb-11e9-9026-02c24209530f.png">	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Script uses a lot of RAM 472097220
514498221	https://github.com/dogsheep/healthkit-to-sqlite/issues/7#issuecomment-514498221	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/7	MDEyOklzc3VlQ29tbWVudDUxNDQ5ODIyMQ==	simonw 9599	2019-07-24T06:26:49Z	2019-07-24T06:26:49Z	MEMBER	Adding `el.clear()` got me a huge improvement: <img width="1128" alt="Screen Shot 2019-07-24 at 8 23 26 AM" src="https://user-images.githubusercontent.com/9599/61770130-c3c70d00-adec-11e9-886a-93a2cea5bf4b.png">	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Script uses a lot of RAM 472097220
514500253	https://github.com/dogsheep/healthkit-to-sqlite/issues/7#issuecomment-514500253	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/7	MDEyOklzc3VlQ29tbWVudDUxNDUwMDI1Mw==	simonw 9599	2019-07-24T06:34:28Z	2019-07-24T06:34:28Z	MEMBER	Clearing the root element each time saved even more: <img width="1128" alt="Screen Shot 2019-07-24 at 8 30 38 AM" src="https://user-images.githubusercontent.com/9599/61770555-d3932100-aded-11e9-8ffe-bebd682f94ed.png">	{"total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 2, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Script uses a lot of RAM 472097220
514745798	https://github.com/dogsheep/healthkit-to-sqlite/issues/9#issuecomment-514745798	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/9	MDEyOklzc3VlQ29tbWVudDUxNDc0NTc5OA==	tholo 166463	2019-07-24T18:25:36Z	2019-07-24T18:25:36Z	NONE	This is on macOS 10.14.6, with Python 3.7.4, packages in the virtual environment: ``` Package Version ------------------- ------- aiofiles 0.4.0 Click 7.0 click-default-group 1.2.1 datasette 0.29.2 h11 0.8.1 healthkit-to-sqlite 0.3.1 httptools 0.0.13 hupper 1.8.1 importlib-metadata 0.18 Jinja2 2.10.1 MarkupSafe 1.1.1 Pint 0.8.1 pip 19.2.1 pluggy 0.12.0 setuptools 41.0.1 sqlite-utils 1.7 tabulate 0.8.3 uvicorn 0.8.4 uvloop 0.12.2 websockets 7.0 zipp 0.5.2 ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Too many SQL variables 472429048
515226724	https://github.com/dogsheep/healthkit-to-sqlite/issues/9#issuecomment-515226724	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/9	MDEyOklzc3VlQ29tbWVudDUxNTIyNjcyNA==	simonw 9599	2019-07-25T21:46:01Z	2019-07-25T21:46:01Z	MEMBER	I can work around this here (prior to the fix in sqlite-utils) by setting the batch size to something a bit lower.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Too many SQL variables 472429048
515322294	https://github.com/dogsheep/healthkit-to-sqlite/issues/9#issuecomment-515322294	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/9	MDEyOklzc3VlQ29tbWVudDUxNTMyMjI5NA==	simonw 9599	2019-07-26T06:07:12Z	2019-07-26T06:07:12Z	MEMBER	@tholo this should be fixed in just-released version 0.3.2 - could you run a `pip install -U healthkit-to-sqlite` and let me know if it works for you now?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Too many SQL variables 472429048
515370687	https://github.com/dogsheep/healthkit-to-sqlite/issues/9#issuecomment-515370687	https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/9	MDEyOklzc3VlQ29tbWVudDUxNTM3MDY4Nw==	tholo 166463	2019-07-26T09:01:19Z	2019-07-26T09:01:19Z	NONE	Yes, that did fix the issue I was seeing - it will now import my complete HealthKit data. Thorsten	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Too many SQL variables 472429048
538847446	https://github.com/dogsheep/pocket-to-sqlite/issues/1#issuecomment-538847446	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/1	MDEyOklzc3VlQ29tbWVudDUzODg0NzQ0Ng==	simonw 9599	2019-10-07T05:41:17Z	2019-10-07T05:41:17Z	MEMBER	Prototype code: ```python offset = 0 fetched = [] size = 500 while True: page = requests.get("https://getpocket.com/v3/get", { "consumer_key": consumer_key, "access_token": access_token, "sort": "oldest", "detailType": "complete", "count": size, "offset": offset, }).json() print(offset) fetched.append(page) offset += size if not len(page["list"]): break ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Use better pagination (and implement progress bar) 503233021
605316146	https://github.com/dogsheep/pocket-to-sqlite/issues/1#issuecomment-605316146	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/1	MDEyOklzc3VlQ29tbWVudDYwNTMxNjE0Ng==	simonw 9599	2020-03-27T21:09:15Z	2020-03-27T21:09:22Z	MEMBER	For a progress bar I need to know how many total items there are. I found an undocumented API for this! `/v3/stats` which returns: ```json { "count_list": 7394, "count_read": 1016, "count_unread": 6378, "status": 1 } ``` I guessed this based on the documented v2 API: https://getpocket.com/api/v2_docs/#stats	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Use better pagination (and implement progress bar) 503233021
605325897	https://github.com/dogsheep/pocket-to-sqlite/issues/1#issuecomment-605325897	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/1	MDEyOklzc3VlQ29tbWVudDYwNTMyNTg5Nw==	simonw 9599	2020-03-27T21:37:26Z	2020-03-27T21:38:37Z	MEMBER	I keep getting 503 errors even though I appear to be staying within the rate limit: ``` {'Date': 'Fri, 27 Mar 2020 21:35:57 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Server': 'Apache/2.4.25 (Debian)', 'Content-Location': 'get.php', 'Vary': 'negotiate', 'TCN': 'choice', 'Set-Cookie': '...; httponly', 'X-Frame-Options': 'SAMEORIGIN', 'Status': '200 OK', 'X-Limit-Key-Limit': '10000', 'X-Limit-Key-Remaining': '9960', 'X-Limit-Key-Reset': '282', 'X-Source': 'Pocket', 'P3P': 'policyref="/w3c/p3p.xml", CP="ALL CURa ADMa DEVa OUR IND UNI COM NAV INT STA PRE"'} [##----------------------------------] 6% 06:49:27 {'Date': 'Fri, 27 Mar 2020 21:36:06 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Content-Length': '23', 'Connection': 'keep-alive', 'Server': 'Apache/2.4.25 (Debian)', 'Content-Location': 'get.php', 'Vary': 'negotiate', 'TCN': 'choice', 'Set-Cookie': '...', 'X-Frame-Options': 'SAMEORIGIN', 'X-Error': 'Pocket is currently under heavy load. Please wait a moment and try again.', 'X-Error-Code': '199', 'Status': '503 Service Unavailable', 'X-Source': 'Pocket', 'P3P': 'policyref="/w3c/p3p.xml", CP="ALL CURa ADMa DEVa OUR IND UNI COM NAV INT STA PRE"'} ``` I'm going to try doing a few automatic retries any time I see a 503 error.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Use better pagination (and implement progress bar) 503233021
605327655	https://github.com/dogsheep/pocket-to-sqlite/issues/1#issuecomment-605327655	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/1	MDEyOklzc3VlQ29tbWVudDYwNTMyNzY1NQ==	simonw 9599	2020-03-27T21:42:49Z	2020-03-27T21:42:49Z	MEMBER	Or maybe it was because of the current Google Cloud outage? https://news.ycombinator.com/item?id=22706677	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Use better pagination (and implement progress bar) 503233021
1221622873	https://github.com/dogsheep/pocket-to-sqlite/issues/10#issuecomment-1221622873	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/10	IC_kwDODLZ_YM5I0HxZ	simonw 9599	2022-08-21T21:19:25Z	2022-08-21T21:19:25Z	MEMBER	Agreed, that would be a much better implementation.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	When running `auth` command, don't overwrite an existing auth.json file 1246826792
1221623052	https://github.com/dogsheep/pocket-to-sqlite/issues/10#issuecomment-1221623052	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/10	IC_kwDODLZ_YM5I0H0M	simonw 9599	2022-08-21T21:20:33Z	2022-08-21T21:20:33Z	MEMBER	That was clearly the intention from the description of this issue: - #4	{"total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	When running `auth` command, don't overwrite an existing auth.json file 1246826792
1239516561	https://github.com/dogsheep/pocket-to-sqlite/issues/10#issuecomment-1239516561	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/10	IC_kwDODLZ_YM5J4YWR	ashanan 11887	2022-09-07T15:07:38Z	2022-09-07T15:07:38Z	NONE	Thanks!	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	When running `auth` command, don't overwrite an existing auth.json file 1246826792
1221521377	https://github.com/dogsheep/pocket-to-sqlite/issues/11#issuecomment-1221521377	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/11	IC_kwDODLZ_YM5Izu_h	fernand0 2467	2022-08-21T10:51:37Z	2022-08-21T10:51:37Z	NONE	I didn't see there is a PR about this: https://github.com/dogsheep/pocket-to-sqlite/pull/7	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	-a option is used for "--auth" and for "--all" 1345452427
1221621466	https://github.com/dogsheep/pocket-to-sqlite/issues/11#issuecomment-1221621466	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/11	IC_kwDODLZ_YM5I0Hba	simonw 9599	2022-08-21T21:09:47Z	2022-08-21T21:09:47Z	MEMBER	Great catch, thanks. I'm going to use it to mean `--auth` - since other tools in the Dogsheep family have the same convention. `--all` will be the only way to specify all.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	-a option is used for "--auth" and for "--all" 1345452427
1221621529	https://github.com/dogsheep/pocket-to-sqlite/issues/11#issuecomment-1221621529	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/11	IC_kwDODLZ_YM5I0HcZ	simonw 9599	2022-08-21T21:10:15Z	2022-08-21T21:11:26Z	MEMBER	Just saw that's what's implemented here already! - #7	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	-a option is used for "--auth" and for "--all" 1345452427
1627563202	https://github.com/dogsheep/pocket-to-sqlite/issues/12#issuecomment-1627563202	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/12	IC_kwDODLZ_YM5hAqTC	simonw 9599	2023-07-09T01:14:27Z	2023-07-09T01:14:27Z	MEMBER	I tested this locally with `python -m build` and then `pip install ...whl` in a fresh virtual environment.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Switch to pyproject.toml 1795187493
1627564127	https://github.com/dogsheep/pocket-to-sqlite/issues/12#issuecomment-1627564127	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/12	IC_kwDODLZ_YM5hAqhf	simonw 9599	2023-07-09T01:19:42Z	2023-07-09T01:19:42Z	MEMBER	https://github.com/dogsheep/pocket-to-sqlite/tree/0.2.3 and https://pypi.org/project/pocket-to-sqlite/0.2.3/	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Switch to pyproject.toml 1795187493
538847796	https://github.com/dogsheep/pocket-to-sqlite/issues/2#issuecomment-538847796	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/2	MDEyOklzc3VlQ29tbWVudDUzODg0Nzc5Ng==	simonw 9599	2019-10-07T05:43:30Z	2019-10-07T05:43:30Z	MEMBER	We can persist the `since` value in its own single-row table.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Track and use the 'since' value 503234169
605337941	https://github.com/dogsheep/pocket-to-sqlite/issues/2#issuecomment-605337941	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/2	MDEyOklzc3VlQ29tbWVudDYwNTMzNzk0MQ==	simonw 9599	2020-03-27T22:16:32Z	2020-03-27T22:16:32Z	MEMBER	Need to test this. I have 7,394 items in my database right now. I'm going to save a new thing. Then I ran this: ``` pocket-to-sqlite fetch pocket-simon.db ``` And it worked!	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Track and use the 'since' value 503234169
605338322	https://github.com/dogsheep/pocket-to-sqlite/issues/2#issuecomment-605338322	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/2	MDEyOklzc3VlQ29tbWVudDYwNTMzODMyMg==	simonw 9599	2020-03-27T22:18:02Z	2020-03-27T22:18:02Z	MEMBER	Just needs documentation now.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Track and use the 'since' value 503234169
684424396	https://github.com/dogsheep/pocket-to-sqlite/issues/3#issuecomment-684424396	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/3	MDEyOklzc3VlQ29tbWVudDY4NDQyNDM5Ng==	simonw 9599	2020-09-01T06:17:45Z	2020-09-01T06:17:45Z	MEMBER	It looks like I could ignore the `image` column and synthesize a unique key from the data in the `images` column using `$item_id/$image_id`.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Extract images into separate tables 503243784
684425714	https://github.com/dogsheep/pocket-to-sqlite/issues/5#issuecomment-684425714	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/5	MDEyOklzc3VlQ29tbWVudDY4NDQyNTcxNA==	simonw 9599	2020-09-01T06:18:32Z	2020-09-01T06:18:32Z	MEMBER	Good suggestion, I'll setup a demo somewhere.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Set up a demo 629473827
1221621700	https://github.com/dogsheep/pocket-to-sqlite/pull/7#issuecomment-1221621700	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/7	IC_kwDODLZ_YM5I0HfE	simonw 9599	2022-08-21T21:11:12Z	2022-08-21T21:11:12Z	MEMBER	I thought this might need a documentation update but `--all` is already covered: https://github.com/dogsheep/pocket-to-sqlite/blob/0.2.1/README.md	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Fixed conflicting CLI flags 750141615
774726123	https://github.com/dogsheep/pocket-to-sqlite/issues/9#issuecomment-774726123	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/9	MDEyOklzc3VlQ29tbWVudDc3NDcyNjEyMw==	jfeiwell 12669260	2021-02-07T18:21:08Z	2021-02-07T18:21:08Z	NONE	@simonw any ideas here?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	SSL Error 801780625
774730656	https://github.com/dogsheep/pocket-to-sqlite/issues/9#issuecomment-774730656	https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/9	MDEyOklzc3VlQ29tbWVudDc3NDczMDY1Ng==	merwok 635179	2021-02-07T18:45:04Z	2021-02-07T18:45:04Z	NONE	That URL uses TLS 1.3, but maybe only if the client supports it. It could be your Python version or your SSL library that’s not recent enough.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	SSL Error 801780625
707326192	https://github.com/dogsheep/swarm-to-sqlite/pull/10#issuecomment-707326192	https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/10	MDEyOklzc3VlQ29tbWVudDcwNzMyNjE5Mg==	mattiaborsoi 29426418	2020-10-12T20:20:02Z	2020-10-12T20:20:02Z	CONTRIBUTOR	This closes issue #8	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Update utils.py to fix sqlite3.OperationalError 719637258
727692413	https://github.com/dogsheep/swarm-to-sqlite/issues/11#issuecomment-727692413	https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/11	MDEyOklzc3VlQ29tbWVudDcyNzY5MjQxMw==	simonw 9599	2020-11-16T02:15:22Z	2020-11-16T02:15:22Z	MEMBER	Thanks, I'll look into this.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Error thrown: sqlite3.OperationalError: table users has no column named lastName 743400216
761967094	https://github.com/dogsheep/swarm-to-sqlite/issues/11#issuecomment-761967094	https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/11	MDEyOklzc3VlQ29tbWVudDc2MTk2NzA5NA==	simonw 9599	2021-01-18T04:11:13Z	2021-01-18T04:11:13Z	MEMBER	I just got a similar error: ``` File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/swarm_to_sqlite/utils.py", line 79, in save_checkin checkins_table.m2m("users", user, m2m_table="with", pk="id") File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 2048, in m2m id = other_table.insert(record, pk=pk, replace=True).last_pk File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 1781, in insert return self.insert_all( File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 1899, in insert_all self.insert_chunk( File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 1709, in insert_chunk result = self.db.execute(query, params) File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 226, in execute return self.conn.execute(sql, parameters) pysqlite3.dbapi2.OperationalError: table users has no column named countryCode ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Error thrown: sqlite3.OperationalError: table users has no column named lastName 743400216
941274088	https://github.com/dogsheep/swarm-to-sqlite/issues/12#issuecomment-941274088	https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/12	IC_kwDODD6af844GrPo	fs111 33631	2021-10-12T18:31:57Z	2021-10-12T18:31:57Z	NONE	I am running into the same problem. Is there any workaround?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	403 when getting token 951817328


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
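
As a rough sketch of how this table could be queried outside Datasette, the following uses Python's standard `sqlite3` module against an in-memory database. It makes several assumptions: the `REFERENCES` clauses are dropped so the example runs without the `users` and `issues` tables, the two sample rows are abbreviated from the listing above, and the helper name `comments_for_issue` is hypothetical rather than part of any Dogsheep tool.

```python
import sqlite3

# Schema adapted from the CREATE TABLE statement above. Assumption: the
# REFERENCES clauses are omitted so the sketch runs standalone.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER,
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""


def comments_for_issue(conn, issue_id):
    """Hypothetical helper: (id, created_at, body) for one issue, oldest first."""
    return conn.execute(
        "SELECT id, created_at, body FROM issue_comments "
        "WHERE issue = ? ORDER BY created_at",
        (issue_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Two abbreviated rows taken from the listing above (issue 503234169).
conn.executemany(
    "INSERT INTO issue_comments (id, user, created_at, body, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (605338322, 9599, "2020-03-27T22:18:02Z",
         "Just needs documentation now.", 503234169),
        (538847796, 9599, "2019-10-07T05:43:30Z",
         "We can persist the `since` value in its own single-row table.",
         503234169),
    ],
)

rows = comments_for_issue(conn, 503234169)
for comment_id, created_at, body in rows:
    print(comment_id, created_at, body)
```

Because `created_at` is stored as ISO 8601 text, ordering by that column as a plain string also orders the comments chronologically. The `WHERE issue = ?` filter is the kind of lookup that `idx_issue_comments_issue` is presumably there to support.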