{"html_url": "https://github.com/simonw/sqlite-utils/issues/148#issuecomment-688434226", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/148", "id": 688434226, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQzNDIyNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T16:50:33Z", "updated_at": "2020-09-07T16:50:33Z", "author_association": "OWNER", "body": "This may be as easy as applying `textwrap.dedent()` to this: https://github.com/simonw/sqlite-utils/blob/0e62744da9a429093e3409575c1f881376b0361f/sqlite_utils/db.py#L778-L787\r\n\r\nI could apply that to a few other queries in that code as well.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695276328, "label": "More attractive indentation of created FTS table schema"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688460729", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688460729, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ2MDcyOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T18:06:44Z", "updated_at": "2020-09-07T18:06:44Z", "author_association": "OWNER", "body": "First posted on SQLite forum here but I'm pretty sure this is a bug in how `sqlite-utils` created those tables: https://sqlite.org/forum/forumpost/51aada1b45", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688460865", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688460865, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ2MDg2NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T18:07:14Z", "updated_at": "2020-09-07T18:07:14Z", "author_association": "OWNER", "body": "Another likely culprit: `licenses` has a text primary key, so it's not using `rowid`:\r\n```sql\r\nCREATE TABLE [licenses] (\r\n [key] TEXT PRIMARY KEY,\r\n [name] TEXT,\r\n [spdx_id] TEXT,\r\n [url] TEXT,\r\n [node_id] TEXT\r\n);\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688464181", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688464181, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ2NDE4MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T18:19:54Z", "updated_at": "2020-09-07T18:19:54Z", "author_association": "OWNER", "body": "Even though that table doesn't declare an integer primary key it does have a `rowid` column: https://github-to-sqlite.dogsheep.net/github?sql=select+rowid%2C+%5Bkey%5D%2C+name%2C+spdx_id%2C+url%2C+node_id+from+licenses+order+by+%5Bkey%5D+limit+101\r\n\r\n| rowid | key | name | spdx_id | url | node_id |\r\n| --- | --- | --- | --- | --- | --- |\r\n| 9150 | apache-2.0 | Apache License 2.0 | Apache-2.0 | | MDc6TGljZW5zZTI= |\r\n| 112 | bsd-3-clause | BSD 3-Clause \"New\" or 
\"Revised\" License | BSD-3-Clause | | MDc6TGljZW5zZTU= |\r\n\r\nhttps://www.sqlite.org/rowidtable.html explains has this clue:\r\n\r\n> If the rowid is not aliased by INTEGER PRIMARY KEY then it is not persistent and might change. In particular the VACUUM command will change rowids for tables that do not declare an INTEGER PRIMARY KEY. Therefore, applications should not normally access the rowid directly, but instead use an INTEGER PRIMARY KEY. ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688480665", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688480665, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MDY2NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T19:16:20Z", "updated_at": "2020-09-07T19:16:20Z", "author_association": "OWNER", "body": "Aha! I have managed to replicate the bug:\r\n```\r\n(github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses\r\n {\"table\": \"licenses\", \"count\": 7},\r\n {\"table\": \"licenses_fts_data\", \"count\": 35},\r\n {\"table\": \"licenses_fts_idx\", \"count\": 16},\r\n {\"table\": \"licenses_fts_docsize\", \"count\": 9151},\r\n {\"table\": \"licenses_fts_config\", \"count\": 1},\r\n {\"table\": \"licenses_fts\", \"count\": 7},\r\n(github-to-sqlite) /tmp % github-to-sqlite repos github.db dogsheep \r\n(github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses\r\n {\"table\": \"licenses\", \"count\": 7},\r\n {\"table\": \"licenses_fts_data\", \"count\": 45},\r\n {\"table\": \"licenses_fts_idx\", \"count\": 26},\r\n {\"table\": \"licenses_fts_docsize\", \"count\": 9161},\r\n {\"table\": \"licenses_fts_config\", \"count\": 1},\r\n {\"table\": \"licenses_fts\", \"count\": 7},\r\n```\r\nNote that the number of records in `licenses_fts_docsize` went from 9151 to 9161.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688481374", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688481374, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MTM3NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T19:19:08Z", "updated_at": "2020-09-07T19:19:08Z", "author_association": "OWNER", "body": "reading through the code for `github-to-sqlite repos` - one of the things it does is calls `save_license` for each repo:\r\n\r\nhttps://github.com/dogsheep/github-to-sqlite/blob/39b2234253096bd579feed4e25104698b8ccd2ba/github_to_sqlite/utils.py#L259-L262\r\n\r\n```python\r\ndef save_license(db, license):\r\n if license is None:\r\n return None\r\n return db[\"licenses\"].insert(license, pk=\"key\", replace=True).last_pk\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} 
{"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688482055", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688482055, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MjA1NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T19:21:42Z", "updated_at": "2020-09-07T19:21:42Z", "author_association": "OWNER", "body": "Using `replace=True` there executes `INSERT OR REPLACE` - and Dan Kennedy (SQLite maintainer) on the SQLite forums said this:\r\n> Are you using \"REPLACE INTO\", or \"UPDATE OR REPLACE\" on the \"licenses\" table without having first executed \"PRAGMA recursive_triggers = 1\"? The docs note that delete triggers will not be fired in this case, which would explain things. Second paragraph under \"REPLACE\" here:\r\n>\r\n> https://www.sqlite.org/lang_conflict.html", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688482355", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688482355, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MjM1NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T19:22:51Z", "updated_at": "2020-09-07T19:22:51Z", "author_association": "OWNER", "body": "And the SQLite documentation says:\r\n> When the REPLACE conflict resolution strategy deletes rows in order to satisfy a constraint, [delete triggers](https://www.sqlite.org/lang_createtrigger.html) fire if and only if [recursive triggers](https://www.sqlite.org/pragma.html#pragma_recursive_triggers) are enabled.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688499650", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688499650, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ5OTY1MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:24:35Z", "updated_at": "2020-09-07T20:24:35Z", "author_association": "OWNER", "body": "This replicates the problem:\r\n```\r\n(github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses\r\n {\"table\": \"licenses\", \"count\": 7},\r\n {\"table\": \"licenses_fts_data\", \"count\": 35},\r\n {\"table\": \"licenses_fts_idx\", \"count\": 16},\r\n {\"table\": \"licenses_fts_docsize\", \"count\": 9151},\r\n {\"table\": \"licenses_fts_config\", \"count\": 1},\r\n {\"table\": \"licenses_fts\", \"count\": 7},\r\n(github-to-sqlite) /tmp % github-to-sqlite repos github.db dogsheep \r\n(github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses\r\n {\"table\": \"licenses\", \"count\": 7},\r\n {\"table\": \"licenses_fts_data\", \"count\": 45},\r\n {\"table\": \"licenses_fts_idx\", \"count\": 26},\r\n {\"table\": \"licenses_fts_docsize\", \"count\": 9161},\r\n {\"table\": \"licenses_fts_config\", \"count\": 1},\r\n {\"table\": \"licenses_fts\", \"count\": 7},\r\n```\r\nNote how the number of rows in `licenses_fts_docsize` goes from 9151 to 9161.\r\n\r\nThe number 
went up by ten. I used tracing from #151 to show that the following SQL executed ten times:\r\n```\r\nINSERT OR REPLACE INTO [licenses] ([key], [name], [node_id], [spdx_id], [url]) VALUES \r\n (?, ?, ?, ?, ?);\r\n```\r\nThen I tried executing `PRAGMA recursive_triggers=on;` at the start of the script. This fixed the problem - running the script did not increase the number of rows in `licenses_fts_docsize`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688499924", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688499924, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ5OTkyNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:25:40Z", "updated_at": "2020-09-07T20:25:50Z", "author_association": "OWNER", "body": "https://www.sqlite.org/pragma.html#pragma_recursive_triggers says:\r\n\r\n> Prior to SQLite [version 3.6.18](https://www.sqlite.org/releaselog/3_6_18.html) (2009-09-11), recursive triggers were not supported. The behavior of SQLite was always as if this pragma was set to OFF. Support for recursive triggers was added in version 3.6.18 but was initially turned OFF by default, for compatibility. Recursive triggers may be turned on by default in future versions of SQLite.\r\n\r\nSo I think the fix is to turn on `recursive_triggers` globally by default for `sqlite-utils`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688501064", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688501064, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODUwMTA2NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:30:15Z", "updated_at": "2020-09-07T20:30:38Z", "author_association": "OWNER", "body": "The second challenge here is cleaning up all of those junk rows in existing `*_fts_docsize` tables. Doing that just to the demo database from https://github-to-sqlite.dogsheep.net/github.db dropped its size from 22MB to 16MB! 
Here's the SQL:\r\n```sql\r\nDELETE FROM [licenses_fts_docsize] WHERE id NOT IN (\r\n SELECT rowid FROM [licenses_fts]);\r\n```\r\nI can do that as part of the existing `table.optimize()` method, which optimizes FTS tables.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/152#issuecomment-688500294", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/152", "id": 688500294, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODUwMDI5NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:27:07Z", "updated_at": "2020-09-07T20:27:07Z", "author_association": "OWNER", "body": "I'm going to make this an argument to the `Database()` class constructor which defaults to `True`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695376054, "label": "Turn on recursive_triggers by default"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/152#issuecomment-688500704", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/152", "id": 688500704, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODUwMDcwNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:28:45Z", "updated_at": "2020-09-07T21:17:48Z", "author_association": "OWNER", "body": "The principle reason to turn these on - at least so far - is that without it weird things happen where FTS tables (in particular `*_fts_docsize`) grow without limit over time, because calls to `INSERT OR REPLACE` against the parent table cause additional rows to be inserted into `*_fts_docsize` even if the row was replaced rather than being inserted.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695376054, "label": "Turn on recursive_triggers by default"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/153#issuecomment-688506015", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/153", "id": 688506015, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODUwNjAxNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:46:58Z", "updated_at": "2020-09-07T20:46:58Z", "author_association": "OWNER", "body": "Writing a test for this will be a tiny bit tricky. 
I think I'll use a test that replicates the bug in #149.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695377804, "label": "table.optimize() should delete junk rows from *_fts_docsize"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/153#issuecomment-688511161", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/153", "id": 688511161, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODUxMTE2MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T21:07:20Z", "updated_at": "2020-09-07T21:07:29Z", "author_association": "OWNER", "body": "FTS4 uses a different column name here: https://datasette-sqlite-fts4.datasette.io/24ways-fts4/articles_fts_docsize\r\n\r\n```\r\nCREATE TABLE 'articles_fts_docsize'(docid INTEGER PRIMARY KEY, size BLOB);\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695377804, "label": "table.optimize() should delete junk rows from *_fts_docsize"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/154#issuecomment-688543128", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/154", "id": 688543128, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODU0MzEyOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T23:43:10Z", "updated_at": "2020-09-07T23:43:10Z", "author_association": "OWNER", "body": "Running this against the same file works:\r\n```\r\n$ sqlite3 beta.db \r\nSQLite version 3.31.1 2020-01-27 19:55:54\r\nEnter \".help\" for usage hints.\r\nsqlite> PRAGMA journal_mode=wal;\r\nwal\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695441530, "label": "OperationalError: cannot change into wal mode from within a transaction"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/154#issuecomment-688544156", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/154", "id": 688544156, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODU0NDE1Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T23:47:10Z", "updated_at": "2020-09-07T23:47:10Z", "author_association": "OWNER", "body": "This is already covered in the tests though: https://github.com/simonw/sqlite-utils/blob/deb2eb013ff85bbc828ebc244a9654f0d9c3139e/tests/test_cli.py#L1300-L1328", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695441530, "label": "OperationalError: cannot change into wal mode from within a transaction"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688479163", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/146", "id": 688479163, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ3OTE2Mw==", "user": {"value": 96218, "label": "simonwiles"}, "created_at": "2020-09-07T19:10:33Z", "updated_at": "2020-09-07T19:11:57Z", "author_association": "CONTRIBUTOR", "body": "@simonw -- I've gone ahead updated the documentation to reflect the changes introduced in this PR. 
IMO it's ready to merge now.\r\n\r\nIn writing the documentation changes, I begin to wonder about the value and role of `batch_size` at all, tbh. May I assume it was originally intended to prevent using the entire row set to determine columns and column types, and that this was a performance consideration? If so, this PR entirely undermines its purpose. I've been passing in excess of 500,000 rows at a time to `insert_all()` with these changes and although I'm sure the performance difference is measurable it's not really noticeable; given #145, I don't know that any performance advantages outweigh the problems doing it this way removes. What do you think about just dropping the argument and defaulting to the maximum `batch_size` permissible given `SQLITE_MAX_VARS`? Are there other reasons one might want to restrict `batch_size` that I've overlooked? I could open a new issue to discuss/implement this.\r\n\r\nOf course the documentation will need to change again too if/when something is done about #147.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688668680, "label": "Handle case where subsequent records (after first batch) include extra columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688481317", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/146", "id": 688481317, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MTMxNw==", "user": {"value": 96218, "label": "simonwiles"}, "created_at": "2020-09-07T19:18:55Z", "updated_at": "2020-09-07T19:18:55Z", "author_association": "CONTRIBUTOR", "body": "Just force-pushed to update d042f9c with more formatting changes to satisfy `black==20.8b1` and pass the GitHub Actions \"Test\" workflow.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688668680, "label": "Handle case where subsequent records (after first batch) include extra columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688508510", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/146", "id": 688508510, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODUwODUxMA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:56:03Z", "updated_at": "2020-09-07T20:56:24Z", "author_association": "OWNER", "body": "The problem with this approach is that it requires us to consume the entire iterator before we can start inserting rows into the table - here on line 1052:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/bb131793feac16bc7181ab997568f941b0220ef2/sqlite_utils/db.py#L1047-L1054\r\n\r\nI designed the `.insert_all()` to avoid doing this, because I want to be able to pass it an iterator (or more likely a generator) that could produce potentially millions of records. 
Doing things one batch of 100 records at a time means that the Python process doesn't need to pull millions of records into memory at once.\r\n\r\n`db-to-sqlite` is one example of a tool that uses that characteristic, in https://github.com/simonw/db-to-sqlite/blob/63e4ee972f292de13bb11767c0fb64b35339d954/db_to_sqlite/cli.py#L94-L106\r\n\r\nSo we need to solve this issue without consuming the entire iterator with a `records = list(records)` call.\r\n\r\nI think one way to do this is to execute each chunk one at a time and watch out for an exception that indicates that we sent too many parameters - then adjust the chunk size down and try again.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688668680, "label": "Handle case where subsequent records (after first batch) include extra columns"}, "performed_via_github_app": null}
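One way to realize that last idea is sketched below. This is a guess at the shape of a solution, not sqlite-utils' actual implementation: `insert_all_sketch`, `_insert_rows` and `_insert_chunk` are invented names. It pulls one chunk at a time from the iterator with `itertools.islice`, attempts a single multi-row INSERT, and on SQLite's "too many SQL variables" error splits the failing chunk and retries with smaller pieces.

```python
# Hypothetical sketch of the adjust-and-retry chunking strategy described
# above; function names are made up and do not match sqlite-utils' API.
import itertools
import sqlite3

def _insert_rows(db, table, rows):
    # One multi-row INSERT: len(rows) * len(cols) bound parameters is what
    # can exceed SQLITE_MAX_VARS on some builds.
    cols = sorted({key for row in rows for key in row})
    sql = "INSERT INTO [{}] ({}) VALUES {}".format(
        table,
        ", ".join("[{}]".format(c) for c in cols),
        ", ".join("(" + ", ".join("?" for _ in cols) + ")" for _ in rows),
    )
    db.execute(sql, [row.get(c) for row in rows for c in cols])

def _insert_chunk(db, table, chunk):
    try:
        _insert_rows(db, table, chunk)
    except sqlite3.OperationalError as ex:
        # "too many SQL variables": halve the chunk and retry each half.
        if "too many" not in str(ex) or len(chunk) == 1:
            raise
        half = len(chunk) // 2
        _insert_chunk(db, table, chunk[:half])
        _insert_chunk(db, table, chunk[half:])

def insert_all_sketch(db, table, records, batch_size=100):
    records = iter(records)  # works for generators; never calls list(records)
    while True:
        chunk = list(itertools.islice(records, batch_size))
        if not chunk:
            break
        _insert_chunk(db, table, chunk)
```

Because each chunk is materialized lazily, a generator yielding millions of records is never held in memory all at once, preserving the property the comment above describes.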