issues
6 rows where user = 96218
This data as json, CSV (advanced)
Suggested facets: title, state, comments, closed_at, type, created_at (date), updated_at (date), closed_at (date)
id ▼ | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | pull_request | body | repo | type | active_lock_reason | performed_via_github_app | reactions | draft | state_reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
686978131 | MDU6SXNzdWU2ODY5NzgxMzE= | 139 | insert_all(..., alter=True) should work for new columns introduced after the first 100 records | simonwiles 96218 | closed | 0 | 7 | 2020-08-27T06:25:25Z | 2020-08-28T22:48:51Z | 2020-08-28T22:30:14Z | CONTRIBUTOR | Is there a way to make `.insert_all()` work properly when new columns are introduced outside the first 100 records (with or without the `alter=True` argument)? I'm using `.insert_all()` to bulk insert ~3-4k records at a time and it is common for records to need to introduce new columns. However, if new columns are introduced after the first 100 records, `sqlite_utils` doesn't even raise the `OperationalError: table ... has no column named ...` exception; it just silently drops the extra data and moves on. It took me a while to find this little snippet in the [documentation for `.insert_all()`](https://sqlite-utils.readthedocs.io/en/stable/python-api.html#bulk-inserts) (it's not mentioned under [Adding columns automatically on insert/update](https://sqlite-utils.readthedocs.io/en/stable/python-api.html#bulk-inserts)): > The column types used in the CREATE TABLE statement are automatically derived from the types of data in that first batch of rows. **_Any additional or missing columns in subsequent batches will be ignored._** I tried changing the `batch_size` argument to the total number of records, but it seems only to effect the number of rows that are committed at a time, and has no influence on this problem. Is there a way around this that you would suggest? It seems like it should raise an exception at least. | sqlite-utils 140912432 | issue | {"url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | completed | ||||||
688386219 | MDExOlB1bGxSZXF1ZXN0NDc1NjY1OTg0 | 142 | insert_all(..., alter=True) should work for new columns introduced after the first 100 records | simonwiles 96218 | closed | 0 | 3 | 2020-08-28T22:22:57Z | 2020-08-30T07:28:23Z | 2020-08-28T22:30:14Z | CONTRIBUTOR | simonw/sqlite-utils/pulls/142 | Closes #139. | sqlite-utils 140912432 | pull | {"url": "https://api.github.com/repos/simonw/sqlite-utils/issues/142/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 0 | |||||
688659182 | MDU6SXNzdWU2ODg2NTkxODI= | 145 | Bug when first record contains fewer columns than subsequent records | simonwiles 96218 | closed | 0 | 2 | 2020-08-30T05:44:44Z | 2020-09-08T23:21:23Z | 2020-09-08T23:21:23Z | CONTRIBUTOR | `insert_all()` selects the maximum batch size based on the number of fields in the first record. If the first record has fewer fields than subsequent records (and `alter=True` is passed), this can result in SQL statements with more than the maximum permitted number of host parameters. This situation is perhaps unlikely to occur, but could happen if the first record had, say, 10 columns, such that `batch_size` (based on `SQLITE_MAX_VARIABLE_NUMBER = 999`) would be 99. If the next 98 rows had 11 columns, the resulting SQL statement for the first batch would have `10 * 1 + 11 * 98 = 1088` host parameters (and subsequent batches, if the data were consistent from thereon out, would have `99 * 11 = 1089`). I suspect that this bug is masked somewhat by the fact that while: > [`SQLITE_MAX_VARIABLE_NUMBER`](https://www.sqlite.org/limits.html#max_variable_number) ... defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0. it is common that it is increased at compile time. Debian-based systems, for example, seem to ship with a version of sqlite compiled with `SQLITE_MAX_VARIABLE_NUMBER` set to 250,000, and I believe this is the case for homebrew installations too. A test for this issue might look like this: ```python def test_columns_not_in_first_record_should_not_cause_batch_to_be_too_large(fresh_db): # sqlite on homebrew and Debian/Ubuntu etc. is typically compiled with # SQLITE_MAX_VARIABLE_NUMBER set to 250,000, so we need to exceed this value to # trigger the error on these systems. THRESHOLD = 250000 extra_columns = 1 + (THRESHOLD - 1) // 99 records = [ {"c0": "first record"}, # one column in first record -> batch_size = 100 # fill out the batch with 99 records with enough columns to exceed THRESHOLD *[ dict([("c{}".format(i), j) for i in range(extra_columns)]) for j in range(99) ] ] try: fresh_db["too_many_columns"].insert_all(records, a… | sqlite-utils 140912432 | issue | {"url": "https://api.github.com/repos/simonw/sqlite-utils/issues/145/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | completed | ||||||
688668680 | MDExOlB1bGxSZXF1ZXN0NDc1ODc0NDkz | 146 | Handle case where subsequent records (after first batch) include extra columns | simonwiles 96218 | closed | 0 | 5 | 2020-08-30T07:13:58Z | 2020-09-08T23:20:37Z | 2020-09-08T23:20:37Z | CONTRIBUTOR | simonw/sqlite-utils/pulls/146 | Addresses #145. I think this should do the job. If it meets with your approval I'll update this PR to include an update to the documentation -- I came across this bug while preparing a PR to update the documentation around `batch_size` in any event. | sqlite-utils 140912432 | pull | {"url": "https://api.github.com/repos/simonw/sqlite-utils/issues/146/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 0 | |||||
688670158 | MDU6SXNzdWU2ODg2NzAxNTg= | 147 | SQLITE_MAX_VARS maybe hard-coded too low | simonwiles 96218 | open | 0 | 7 | 2020-08-30T07:26:45Z | 2021-02-15T21:27:55Z | CONTRIBUTOR | I came across this while about to open an issue and PR against the documentation for `batch_size`, which is a bit incomplete. As mentioned in #145, while: > [`SQLITE_MAX_VARIABLE_NUMBER`](https://www.sqlite.org/limits.html#max_variable_number) ... defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0. it is common that it is increased at compile time. Debian-based systems, for example, seem to ship with a version of sqlite compiled with SQLITE_MAX_VARIABLE_NUMBER set to 250,000, and I believe this is the case for homebrew installations too. In working to understand what `batch_size` was actually doing and why, I realized that by setting `SQLITE_MAX_VARS` in `db.py` to match the value my sqlite was compiled with (I'm on Debian), I was able to decrease the time to `insert_all()` my test data set (~128k records across 7 tables) from ~26.5s to ~3.5s. Given that this about .05% of my total dataset, this is time I am keen to save... Unfortunately, it seems that `sqlite3` in the python standard library doesn't expose the `get_limit()` C API (even though `pysqlite` used to), so it's hard to know what value sqlite has been compiled with (note that this could mean, I suppose, that it's less than 999, and even hardcoding `SQLITE_MAX_VARS` to the conservative default might not be adequate. It can also be lowered -- but not raised -- at runtime). The best I could come up with is `echo "" | sqlite3 -cmd ".limits variable_number"` (only available in `sqlite >= 2015-05-07 (3.8.10)`). Obviously this couldn't be relied upon in `sqlite_utils`, but I wonder what your opinion would be about exposing `SQLITE_MAX_VARS` as a user-configurable parameter (with suitable "here be dragons" warnings)? I'm going to go ahead and monkey-patch it for my purposes in any event, but it seems like it might be worth considering. | sqlite-utils 140912432 | issue | {"url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | ||||||||
697030843 | MDExOlB1bGxSZXF1ZXN0NDgzMDI3NTg3 | 156 | Typos in tests | simonwiles 96218 | closed | 0 | 1 | 2020-09-09T18:00:58Z | 2020-09-09T18:24:50Z | 2020-09-09T18:21:23Z | CONTRIBUTOR | simonw/sqlite-utils/pulls/156 | One of these is my fault, and the other is one I just happened to come across. They're harmless, but might as well be fixed. | sqlite-utils 140912432 | pull | {"url": "https://api.github.com/repos/simonw/sqlite-utils/issues/156/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 0 |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issues] ( [id] INTEGER PRIMARY KEY, [node_id] TEXT, [number] INTEGER, [title] TEXT, [user] INTEGER REFERENCES [users]([id]), [state] TEXT, [locked] INTEGER, [assignee] INTEGER REFERENCES [users]([id]), [milestone] INTEGER REFERENCES [milestones]([id]), [comments] INTEGER, [created_at] TEXT, [updated_at] TEXT, [closed_at] TEXT, [author_association] TEXT, [pull_request] TEXT, [body] TEXT, [repo] INTEGER REFERENCES [repos]([id]), [type] TEXT , [active_lock_reason] TEXT, [performed_via_github_app] TEXT, [reactions] TEXT, [draft] INTEGER, [state_reason] TEXT); CREATE INDEX [idx_issues_repo] ON [issues] ([repo]); CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]); CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]); CREATE INDEX [idx_issues_user] ON [issues] ([user]);