issue_comments
5 rows where issue = 688668680
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
688479163 | https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688479163 | https://api.github.com/repos/simonw/sqlite-utils/issues/146 | MDEyOklzc3VlQ29tbWVudDY4ODQ3OTE2Mw== | simonwiles 96218 | 2020-09-07T19:10:33Z | 2020-09-07T19:11:57Z | CONTRIBUTOR | @simonw -- I've gone ahead and updated the documentation to reflect the changes introduced in this PR. IMO it's ready to merge now. In writing the documentation changes, I began to wonder about the value and role of `batch_size` at all, tbh. May I assume it was originally intended to prevent using the entire row set to determine columns and column types, and that this was a performance consideration? If so, this PR entirely undermines its purpose. I've been passing in excess of 500,000 rows at a time to `insert_all()` with these changes, and although I'm sure the performance difference is measurable, it's not really noticeable; given #145, I don't know that any performance advantages outweigh the problems doing it this way removes. What do you think about just dropping the argument and defaulting to the maximum `batch_size` permissible given `SQLITE_MAX_VARS`? (A rough calculation of that maximum is sketched after this table.) Are there other reasons one might want to restrict `batch_size` that I've overlooked? I could open a new issue to discuss/implement this. Of course the documentation will need to change again too if/when something is done about #147. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Handle case where subsequent records (after first batch) include extra columns 688668680 | |
688481317 | https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688481317 | https://api.github.com/repos/simonw/sqlite-utils/issues/146 | MDEyOklzc3VlQ29tbWVudDY4ODQ4MTMxNw== | simonwiles 96218 | 2020-09-07T19:18:55Z | 2020-09-07T19:18:55Z | CONTRIBUTOR | Just force-pushed to update d042f9c with more formatting changes to satisfy `black==20.8b1` and pass the GitHub Actions "Test" workflow. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Handle case where subsequent records (after first batch) include extra columns 688668680 | |
688508510 | https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688508510 | https://api.github.com/repos/simonw/sqlite-utils/issues/146 | MDEyOklzc3VlQ29tbWVudDY4ODUwODUxMA== | simonw 9599 | 2020-09-07T20:56:03Z | 2020-09-07T20:56:24Z | OWNER | The problem with this approach is that it requires us to consume the entire iterator before we can start inserting rows into the table - here on line 1052: https://github.com/simonw/sqlite-utils/blob/bb131793feac16bc7181ab997568f941b0220ef2/sqlite_utils/db.py#L1047-L1054 I designed `.insert_all()` to avoid doing this, because I want to be able to pass it an iterator (or more likely a generator) that could produce potentially millions of records. Doing things one batch of 100 records at a time means that the Python process doesn't need to pull millions of records into memory at once. `db-to-sqlite` is one example of a tool that uses that characteristic, in https://github.com/simonw/db-to-sqlite/blob/63e4ee972f292de13bb11767c0fb64b35339d954/db_to_sqlite/cli.py#L94-L106 So we need to solve this issue without consuming the entire iterator with a `records = list(records)` call. I think one way to do this is to execute the chunks one at a time and watch out for an exception that indicates that we sent too many parameters - then adjust the chunk size down and try again. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Handle case where subsequent records (after first batch) include extra columns 688668680 | |
688573964 | https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688573964 | https://api.github.com/repos/simonw/sqlite-utils/issues/146 | MDEyOklzc3VlQ29tbWVudDY4ODU3Mzk2NA== | simonwiles 96218 | 2020-09-08T01:55:07Z | 2020-09-08T01:55:07Z | CONTRIBUTOR | Okay, I've rewritten this PR to preserve the batching behaviour but still fix #145, and rebased the branch to account for the `db.execute()` API change. It's not terribly sophisticated -- if it attempts to insert a batch that has too many variables, the exception is caught, the batch is split in two, each half is inserted separately, and then it carries on as before with the same `batch_size` (a rough sketch of this split-and-retry behaviour follows the table below). In the edge case where this gets triggered, subsequent batches will all be inserted in two groups too if they continue to have the same number of columns (which is presumably reasonably likely). Do you reckon this is acceptable when set against the awkwardness of recalculating the `batch_size` on the fly? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Handle case where subsequent records (after first batch) include extra columns 688668680 | |
689185393 | https://github.com/simonw/sqlite-utils/pull/146#issuecomment-689185393 | https://api.github.com/repos/simonw/sqlite-utils/issues/146 | MDEyOklzc3VlQ29tbWVudDY4OTE4NTM5Mw== | simonw 9599 | 2020-09-08T23:17:42Z | 2020-09-08T23:17:42Z | OWNER | That seems like a reasonable approach to me, especially since this is going to be a pretty rare edge-case. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Handle case where subsequent records (after first batch) include extra columns 688668680 |
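The maximum `batch_size` asked about in the first comment above follows directly from SQLite's per-statement bound-parameter limit. Below is a minimal sketch of that calculation, assuming the historical default limit of 999 variables per statement (newer SQLite builds allow 32766); `SQLITE_MAX_VARS` and `max_batch_size` are illustrative names for this sketch, not part of the sqlite-utils public API.

```python
# Illustrative only: mirrors SQLite's historical default of 999 bound
# parameters per statement (compile-time option SQLITE_MAX_VARIABLE_NUMBER).
SQLITE_MAX_VARS = 999


def max_batch_size(num_columns):
    """Largest number of rows whose "?" placeholders fit in one INSERT."""
    return max(1, SQLITE_MAX_VARS // num_columns)


# For example, a 12-column table like issue_comments could be inserted in
# batches of up to 83 rows per statement:
print(max_batch_size(12))  # 83
```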
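The split-and-retry behaviour described in the 2020-09-08 comment could look roughly like the following. This is a sketch under stated assumptions, not the actual code from PR #146: `insert_batch` is a hypothetical helper, rows are assumed to be dicts sharing a known column list, and the over-limit condition is detected by matching the message of the `sqlite3.OperationalError` SQLite raises ("too many SQL variables").

```python
import sqlite3


def insert_batch(conn, table, columns, batch):
    """Insert one batch of dict rows into `table`; if SQLite rejects the
    statement for having too many bound variables, split the batch in half
    and insert each half separately, recursing as needed."""
    sql = "INSERT INTO [{table}] ({cols}) VALUES {rows}".format(
        table=table,
        cols=", ".join("[{}]".format(c) for c in columns),
        rows=", ".join(
            "({})".format(", ".join("?" for _ in columns)) for _ in batch
        ),
    )
    params = [row.get(col) for row in batch for col in columns]
    try:
        conn.execute(sql, params)
    except sqlite3.OperationalError as e:
        if "too many" in str(e) and len(batch) > 1:
            half = len(batch) // 2
            insert_batch(conn, table, columns, batch[:half])
            insert_batch(conn, table, columns, batch[half:])
        else:
            raise


# Example usage against an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [demo] ([a] TEXT, [b] TEXT)")
insert_batch(conn, "demo", ["a", "b"], [{"a": "1", "b": "2"}, {"a": "3", "b": "4"}])
```

Because the batch is split rather than the `batch_size` recalculated, a later batch with the same column count simply takes the same two-insert path again, which is the trade-off discussed in the comments above.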
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);