html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app https://github.com/dogsheep/github-to-sqlite/issues/33#issuecomment-622279374,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/33,622279374,MDEyOklzc3VlQ29tbWVudDYyMjI3OTM3NA==,2029,2020-05-01T07:12:47Z,2020-05-01T07:12:47Z,NONE,"I also go it working with: ```yaml run: echo ${{ secrets.github_token }} | github-to-sqlite auth ```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",609950090, https://github.com/dogsheep/github-to-sqlite/issues/35#issuecomment-622213950,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/35,622213950,MDEyOklzc3VlQ29tbWVudDYyMjIxMzk1MA==,9599,2020-05-01T02:09:04Z,2020-05-01T02:09:04Z,MEMBER,"It sped up this query a lot - 2.5s down to 300ms: ```sql select repos.full_name, json_object( 'href', 'https://github.com/' || repos.full_name || '/issues/' || issues.number, 'label', '#' || issues.number ) as issue, issues.title, users.login, users.id, issues.state, issues.locked, issues.assignee, issues.milestone, issues.comments, issues.created_at, issues.updated_at, issues.closed_at, issues.author_association, issues.pull_request, issues.repo, issues.type from issues join repos on repos.id = issues.repo join users on issues.user = users.id where issues.state = 'open' and issues.user not in (9599, 27856297) and not exists ( select id from issue_comments where issue_comments.user = 9599 and issues.id = issue_comments.issue ) order by issues.updated_at desc; ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610511450, https://github.com/dogsheep/github-to-sqlite/issues/35#issuecomment-622214262,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/35,622214262,MDEyOklzc3VlQ29tbWVudDYyMjIxNDI2Mg==,9599,2020-05-01T02:10:32Z,2020-05-01T02:11:19Z,MEMBER,"This sped that query up even more - down to 4ms. ```sql create index issue_comments_issue on issue_comments(issue); ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610511450, https://github.com/simonw/datasette/issues/749#issuecomment-622450636,https://api.github.com/repos/simonw/datasette/issues/749,622450636,MDEyOklzc3VlQ29tbWVudDYyMjQ1MDYzNg==,9599,2020-05-01T16:08:46Z,2020-05-01T16:08:46Z,OWNER,"Proposed solution: on Cloud Run don't show the ""download database"" link if the database file is larger than 32MB. I can do this with a new config setting, `max_db_mb`, which is automatically set by the `publish cloudrun` command. This is consistent with the existing `max_csv_mb` setting: https://datasette.readthedocs.io/en/stable/config.html#max-csv-mb I should set `max_csv_mb` to 32MB on Cloud Run deploys as well.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610829227, https://github.com/dogsheep/github-to-sqlite/issues/36#issuecomment-622461025,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/36,622461025,MDEyOklzc3VlQ29tbWVudDYyMjQ2MTAyNQ==,9599,2020-05-01T16:34:24Z,2020-05-01T16:34:24Z,MEMBER,Blocked on #37,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610842926, https://github.com/dogsheep/github-to-sqlite/issues/10#issuecomment-622461122,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/10,622461122,MDEyOklzc3VlQ29tbWVudDYyMjQ2MTEyMg==,9599,2020-05-01T16:34:39Z,2020-05-01T16:34:39Z,MEMBER,Blocked on #37,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",516967682, https://github.com/dogsheep/github-to-sqlite/issues/12#issuecomment-622461223,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/12,622461223,MDEyOklzc3VlQ29tbWVudDYyMjQ2MTIyMw==,9599,2020-05-01T16:34:52Z,2020-05-01T16:34:52Z,MEMBER,Blocked on #37,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",520756546, https://github.com/dogsheep/github-to-sqlite/issues/37#issuecomment-622461537,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/37,622461537,MDEyOklzc3VlQ29tbWVudDYyMjQ2MTUzNw==,9599,2020-05-01T16:35:40Z,2020-05-01T16:35:40Z,MEMBER,"This will check if the view exists and has the exact same matching definition as the one we want. If it doesn't, we will drop it (if it exists) and recreate it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610843136, https://github.com/dogsheep/github-to-sqlite/issues/37#issuecomment-622461948,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/37,622461948,MDEyOklzc3VlQ29tbWVudDYyMjQ2MTk0OA==,9599,2020-05-01T16:36:42Z,2020-05-01T16:36:42Z,MEMBER,It should only create views if the underlying tables exist.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610843136, https://github.com/simonw/sqlite-utils/issues/105#issuecomment-622558889,https://api.github.com/repos/simonw/sqlite-utils/issues/105,622558889,MDEyOklzc3VlQ29tbWVudDYyMjU1ODg4OQ==,9599,2020-05-01T20:40:06Z,2020-05-01T20:40:06Z,OWNER,Documentation: https://sqlite-utils.readthedocs.io/en/latest/cli.html#listing-views,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610853576, https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622561585,https://api.github.com/repos/simonw/sqlite-utils/issues/103,622561585,MDEyOklzc3VlQ29tbWVudDYyMjU2MTU4NQ==,9599,2020-05-01T20:46:50Z,2020-05-01T20:46:50Z,OWNER,The varying number of columns thing is interesting - I don't think the tests cover that case much if at all.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610517472, https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622561944,https://api.github.com/repos/simonw/sqlite-utils/issues/103,622561944,MDEyOklzc3VlQ29tbWVudDYyMjU2MTk0NA==,9599,2020-05-01T20:47:51Z,2020-05-01T20:47:51Z,OWNER,"Yup we only take the number of columns in the first record into account at the moment: https://github.com/simonw/sqlite-utils/blob/d56029549acae0b0ea94c5a0f783e3b3895d9218/sqlite_utils/db.py#L1007-L1016","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610517472, https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622563059,https://api.github.com/repos/simonw/sqlite-utils/issues/103,622563059,MDEyOklzc3VlQ29tbWVudDYyMjU2MzA1OQ==,9599,2020-05-01T20:51:01Z,2020-05-01T20:51:01Z,OWNER,"I'm not sure what to do about this. I was thinking the solution would be to look at ALL of the rows in a batch before deciding on the maximum number of columns, but that doesn't work because we calculate batch size based on the number of columns! I think my recommendation here is to manually pass a `batch_size=` argument to `.insert_all()` if you run into this error. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610517472, https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622563188,https://api.github.com/repos/simonw/sqlite-utils/issues/103,622563188,MDEyOklzc3VlQ29tbWVudDYyMjU2MzE4OA==,9599,2020-05-01T20:51:24Z,2020-05-01T20:51:29Z,OWNER,Hopefully anyone who runs into this problem in the future will search for and find this issue thread!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610517472, https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622565276,https://api.github.com/repos/simonw/sqlite-utils/issues/103,622565276,MDEyOklzc3VlQ29tbWVudDYyMjU2NTI3Ng==,9599,2020-05-01T20:57:16Z,2020-05-01T20:57:16Z,OWNER,I'm reconsidering this: I think this is going to happen ANY time someone has at least one row that is wider than the first row. So at the very least I should show a more understandable error message.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610517472, https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622584433,https://api.github.com/repos/simonw/sqlite-utils/issues/103,622584433,MDEyOklzc3VlQ29tbWVudDYyMjU4NDQzMw==,9599,2020-05-01T21:57:52Z,2020-05-01T21:57:52Z,OWNER,@b0b5h4rp13 I'm having trouble creating a test that triggers this bug. Could you share a chunk of code that replicates what you're seeing here?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610517472, https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622587177,https://api.github.com/repos/simonw/sqlite-utils/issues/103,622587177,MDEyOklzc3VlQ29tbWVudDYyMjU4NzE3Nw==,9599,2020-05-01T22:07:51Z,2020-05-01T22:07:51Z,OWNER,"This is my failed attempt to recreate the bug (plus some extra debugging output): ```diff % git diff diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py index dd49d5c..ea42aea 100644 --- a/sqlite_utils/db.py +++ b/sqlite_utils/db.py @@ -1013,7 +1013,11 @@ class Table(Queryable): assert ( num_columns <= SQLITE_MAX_VARS ), ""Rows can have a maximum of {} columns"".format(SQLITE_MAX_VARS) + print(""default batch_size = "", batch_size) batch_size = max(1, min(batch_size, SQLITE_MAX_VARS // num_columns)) + print(""new batch_size = {},num_columns = {}, MAX_VARS // num_columns = {}"".format( + batch_size, num_columns, SQLITE_MAX_VARS // num_columns + )) self.last_rowid = None self.last_pk = None for chunk in chunks(itertools.chain([first_record], records), batch_size): @@ -1124,6 +1128,9 @@ class Table(Queryable): ) flat_values = list(itertools.chain(*values)) queries_and_params = [(sql, flat_values)] + print(sql.count(""?""), len(flat_values)) + + # print(json.dumps(queries_and_params, indent=4)) with self.db.conn: for query, params in queries_and_params: diff --git a/tests/test_create.py b/tests/test_create.py index 5290cd8..52940df 100644 --- a/tests/test_create.py +++ b/tests/test_create.py @@ -853,3 +853,33 @@ def test_create_with_nested_bytes(fresh_db): record = {""id"": 1, ""data"": {""foo"": b""bytes""}} fresh_db[""t""].insert(record) assert [{""id"": 1, ""data"": '{""foo"": ""b\'bytes\'""}'}] == list(fresh_db[""t""].rows) + + +def test_create_throws_useful_error_with_increasing_number_of_columns(fresh_db): + # https://github.com/simonw/sqlite-utils/issues/103 + def rows(): + yield {""name"": 0} + for i in range(1, 1001): + yield { + ""name"": i, + ""age"": i, + ""size"": i, + ""name2"": i, + ""age2"": i, + ""size2"": i, + ""name3"": i, + ""age3"": i, + ""size3"": i, + ""name4"": i, + ""age4"": i, + ""size4"": i, + ""name5"": i, + ""age5"": i, + ""size5"": i, + ""name6"": i, + ""age6"": i, + ""size6"": i, + } + + fresh_db[""t""].insert_all(rows()) + assert 1001 == fresh_db[""t""].count ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",610517472,