html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688460729,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688460729,MDEyOklzc3VlQ29tbWVudDY4ODQ2MDcyOQ==,9599,2020-09-07T18:06:44Z,2020-09-07T18:06:44Z,OWNER,First posted on SQLite forum here but I'm pretty sure this is a bug in how `sqlite-utils` created those tables: https://sqlite.org/forum/forumpost/51aada1b45,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258, https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688460865,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688460865,MDEyOklzc3VlQ29tbWVudDY4ODQ2MDg2NQ==,9599,2020-09-07T18:07:14Z,2020-09-07T18:07:14Z,OWNER,"Another likely culprit: `licenses` has a text primary key, so it's not using `rowid`: ```sql CREATE TABLE [licenses] ( [key] TEXT PRIMARY KEY, [name] TEXT, [spdx_id] TEXT, [url] TEXT, [node_id] TEXT ); ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258, https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688464181,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688464181,MDEyOklzc3VlQ29tbWVudDY4ODQ2NDE4MQ==,9599,2020-09-07T18:19:54Z,2020-09-07T18:19:54Z,OWNER,"Even though that table doesn't declare an integer primary key it does have a `rowid` column: https://github-to-sqlite.dogsheep.net/github?sql=select+rowid%2C+%5Bkey%5D%2C+name%2C+spdx_id%2C+url%2C+node_id+from+licenses+order+by+%5Bkey%5D+limit+101 | rowid | key | name | spdx_id | url | node_id | | --- | --- | --- | --- | --- | --- | | 9150 | apache-2.0 | Apache License 2.0 | Apache-2.0 | | MDc6TGljZW5zZTI= | | 112 | bsd-3-clause | BSD 3-Clause ""New"" or ""Revised"" License | BSD-3-Clause | | MDc6TGljZW5zZTU= | https://www.sqlite.org/rowidtable.html explains has this clue: > If the rowid is not aliased by INTEGER PRIMARY KEY then it is not persistent and might change. In particular the VACUUM command will change rowids for tables that do not declare an INTEGER PRIMARY KEY. Therefore, applications should not normally access the rowid directly, but instead use an INTEGER PRIMARY KEY. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258, https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688480665,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688480665,MDEyOklzc3VlQ29tbWVudDY4ODQ4MDY2NQ==,9599,2020-09-07T19:16:20Z,2020-09-07T19:16:20Z,OWNER,"Aha! I have managed to replicate the bug: ``` (github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses {""table"": ""licenses"", ""count"": 7}, {""table"": ""licenses_fts_data"", ""count"": 35}, {""table"": ""licenses_fts_idx"", ""count"": 16}, {""table"": ""licenses_fts_docsize"", ""count"": 9151}, {""table"": ""licenses_fts_config"", ""count"": 1}, {""table"": ""licenses_fts"", ""count"": 7}, (github-to-sqlite) /tmp % github-to-sqlite repos github.db dogsheep (github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses {""table"": ""licenses"", ""count"": 7}, {""table"": ""licenses_fts_data"", ""count"": 45}, {""table"": ""licenses_fts_idx"", ""count"": 26}, {""table"": ""licenses_fts_docsize"", ""count"": 9161}, {""table"": ""licenses_fts_config"", ""count"": 1}, {""table"": ""licenses_fts"", ""count"": 7}, ``` Note that the number of records in `licenses_fts_docsize` went from 9151 to 9161.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258, https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688481374,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688481374,MDEyOklzc3VlQ29tbWVudDY4ODQ4MTM3NA==,9599,2020-09-07T19:19:08Z,2020-09-07T19:19:08Z,OWNER,"reading through the code for `github-to-sqlite repos` - one of the things it does is calls `save_license` for each repo: https://github.com/dogsheep/github-to-sqlite/blob/39b2234253096bd579feed4e25104698b8ccd2ba/github_to_sqlite/utils.py#L259-L262 ```python def save_license(db, license): if license is None: return None return db[""licenses""].insert(license, pk=""key"", replace=True).last_pk ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258, https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688482055,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688482055,MDEyOklzc3VlQ29tbWVudDY4ODQ4MjA1NQ==,9599,2020-09-07T19:21:42Z,2020-09-07T19:21:42Z,OWNER,"Using `replace=True` there executes `INSERT OR REPLACE` - and Dan Kennedy (SQLite maintainer) on the SQLite forums said this: > Are you using ""REPLACE INTO"", or ""UPDATE OR REPLACE"" on the ""licenses"" table without having first executed ""PRAGMA recursive_triggers = 1""? The docs note that delete triggers will not be fired in this case, which would explain things. Second paragraph under ""REPLACE"" here: > > https://www.sqlite.org/lang_conflict.html","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258, https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688482355,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688482355,MDEyOklzc3VlQ29tbWVudDY4ODQ4MjM1NQ==,9599,2020-09-07T19:22:51Z,2020-09-07T19:22:51Z,OWNER,"And the SQLite documentation says: > When the REPLACE conflict resolution strategy deletes rows in order to satisfy a constraint, [delete triggers](https://www.sqlite.org/lang_createtrigger.html) fire if and only if [recursive triggers](https://www.sqlite.org/pragma.html#pragma_recursive_triggers) are enabled.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258, https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688499650,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688499650,MDEyOklzc3VlQ29tbWVudDY4ODQ5OTY1MA==,9599,2020-09-07T20:24:35Z,2020-09-07T20:24:35Z,OWNER,"This replicates the problem: ``` (github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses {""table"": ""licenses"", ""count"": 7}, {""table"": ""licenses_fts_data"", ""count"": 35}, {""table"": ""licenses_fts_idx"", ""count"": 16}, {""table"": ""licenses_fts_docsize"", ""count"": 9151}, {""table"": ""licenses_fts_config"", ""count"": 1}, {""table"": ""licenses_fts"", ""count"": 7}, (github-to-sqlite) /tmp % github-to-sqlite repos github.db dogsheep (github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses {""table"": ""licenses"", ""count"": 7}, {""table"": ""licenses_fts_data"", ""count"": 45}, {""table"": ""licenses_fts_idx"", ""count"": 26}, {""table"": ""licenses_fts_docsize"", ""count"": 9161}, {""table"": ""licenses_fts_config"", ""count"": 1}, {""table"": ""licenses_fts"", ""count"": 7}, ``` Note how the number of rows in `licenses_fts_docsize` goes from 9151 to 9161. The number went up by ten. I used tracing from #151 to show that the following SQL executed ten times: ``` INSERT OR REPLACE INTO [licenses] ([key], [name], [node_id], [spdx_id], [url]) VALUES (?, ?, ?, ?, ?); ``` Then I tried executing `PRAGMA recursive_triggers=on;` at the start of the script. This fixed the problem - running the script did not increase the number of rows in `licenses_fts_docsize`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258, https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688499924,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688499924,MDEyOklzc3VlQ29tbWVudDY4ODQ5OTkyNA==,9599,2020-09-07T20:25:40Z,2020-09-07T20:25:50Z,OWNER,"https://www.sqlite.org/pragma.html#pragma_recursive_triggers says: > Prior to SQLite [version 3.6.18](https://www.sqlite.org/releaselog/3_6_18.html) (2009-09-11), recursive triggers were not supported. The behavior of SQLite was always as if this pragma was set to OFF. Support for recursive triggers was added in version 3.6.18 but was initially turned OFF by default, for compatibility. Recursive triggers may be turned on by default in future versions of SQLite. So I think the fix is to turn on `recursive_triggers` globally by default for `sqlite-utils`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258, https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688501064,https://api.github.com/repos/simonw/sqlite-utils/issues/149,688501064,MDEyOklzc3VlQ29tbWVudDY4ODUwMTA2NA==,9599,2020-09-07T20:30:15Z,2020-09-07T20:30:38Z,OWNER,"The second challenge here is cleaning up all of those junk rows in existing `*_fts_docsize` tables. Doing that just to the demo database from https://github-to-sqlite.dogsheep.net/github.db dropped its size from 22MB to 16MB! Here's the SQL: ```sql DELETE FROM [licenses_fts_docsize] WHERE id NOT IN ( SELECT rowid FROM [licenses_fts]); ``` I can do that as part of the existing `table.optimize()` method, which optimizes FTS tables.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",695319258,