{"html_url": "https://github.com/simonw/datasette/issues/1851#issuecomment-1294281451", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1851", "id": 1294281451, "node_id": "IC_kwDOBm6k_c5NJSrr", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-28T00:59:25Z", "updated_at": "2022-10-28T00:59:25Z", "author_association": "OWNER", "body": "I'm going to use this endpoint for bulk inserts too, so I'm closing this issue and continuing the work here:\r\n- #1866", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1421544654, "label": "API to insert a single record into an existing table"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1866#issuecomment-1294282263", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1866", "id": 1294282263, "node_id": "IC_kwDOBm6k_c5NJS4X", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-28T01:00:42Z", "updated_at": "2022-10-28T01:00:42Z", "author_association": "OWNER", "body": "I'm going to set the limit at 1,000 rows inserted at a time. 
I'll make this configurable using a new `max_insert_rows` setting (for consistency with `max_returned_rows`).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1426001541, "label": "API for bulk inserting records into a table"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1866#issuecomment-1294296767", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1866", "id": 1294296767, "node_id": "IC_kwDOBm6k_c5NJWa_", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-28T01:22:25Z", "updated_at": "2022-10-28T01:23:09Z", "author_association": "OWNER", "body": "Nasty catch on this one: I wanted to return the IDs of the freshly inserted rows. But... the `insert_all()` method I was planning to use from `sqlite-utils` doesn't appear to have a way of doing that:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/529110e7d8c4a6b1bbf5fb61f2e29d72aa95a611/sqlite_utils/db.py#L2813-L2835\r\n\r\nSQLite itself added a `RETURNING` clause which might help, but that is only available from version 3.35 released in March 2021: https://www.sqlite.org/lang_returning.html - which isn't commonly available yet. https://latest.datasette.io/-/versions right now shows 3.34, and https://lite.datasette.io/#/-/versions shows 3.27.2 (from Feb 2019).\r\n\r\nThree options then:\r\n\r\n1. Even for bulk inserts do one insert at a time so I can use `cursor.lastrowid` to get the ID of the inserted record. This isn't terrible since SQLite is very fast, but it may still be a big performance hit for large inserts.\r\n2. Don't return the list of inserted rows for bulk inserts\r\n3. 
Default to not returning the list of inserted rows for bulk inserts, but allow the user to request that - in which case we use the slower path\r\n\r\nThat third option might be the way to go here.\r\n\r\nI should benchmark first to figure out how much of a difference this actually makes.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1426001541, "label": "API for bulk inserting records into a table"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1866#issuecomment-1294306071", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1866", "id": 1294306071, "node_id": "IC_kwDOBm6k_c5NJYsX", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-28T01:37:14Z", "updated_at": "2022-10-28T01:37:59Z", "author_association": "OWNER", "body": "Quick crude benchmark:\r\n```python\r\nimport sqlite3\r\n\r\ndb = sqlite3.connect(\":memory:\")\r\n\r\ndef create_table(db, name):\r\n db.execute(f\"create table {name} (id integer primary key, title text)\")\r\n\r\ncreate_table(db, \"single\")\r\ncreate_table(db, \"multi\")\r\ncreate_table(db, \"bulk\")\r\n\r\ndef insert_singles(titles):\r\n inserted = []\r\n for title in titles:\r\n cursor = db.execute(\"insert into single (title) values (?)\", [title])\r\n inserted.append((cursor.lastrowid, title))\r\n return inserted\r\n\r\n\r\ndef insert_many(titles):\r\n db.executemany(\"insert into multi (title) values (?)\", ((t,) for t in titles))\r\n\r\n\r\ndef insert_bulk(titles):\r\n db.execute(\"insert into bulk (title) values {}\".format(\r\n \", \".join(\"(?)\" for _ in titles)\r\n ), titles)\r\n\r\ntitles = [\"title {}\".format(i) for i in range(1, 10001)]\r\n```\r\nThen in iPython I ran these:\r\n```\r\nIn [14]: %timeit insert_singles(titles)\r\n23.8 ms \u00b1 535 \u00b5s per loop (mean \u00b1 std. dev. 
of 7 runs, 10 loops each)\r\n\r\nIn [13]: %timeit insert_many(titles)\r\n12 ms \u00b1 520 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 100 loops each)\r\n\r\nIn [12]: %timeit insert_bulk(titles)\r\n2.59 ms \u00b1 25 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 100 loops each)\r\n```\r\nSo the bulk insert really is a lot faster - 2.59ms compared to 23.8ms for single inserts, so ~9x faster.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1426001541, "label": "API for bulk inserting records into a table"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1866#issuecomment-1294316640", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1866", "id": 1294316640, "node_id": "IC_kwDOBm6k_c5NJbRg", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-28T01:51:40Z", "updated_at": "2022-10-28T01:51:40Z", "author_association": "OWNER", "body": "This needs to support the following:\r\n- Rows do not include a primary key - one is assigned by the database\r\n- Rows provide their own primary key, any clashes are errors\r\n- Rows provide their own primary key, clashes are silently ignored\r\n- Rows provide their own primary key, replacing any existing records", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1426001541, "label": "API for bulk inserting records into a table"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1866#issuecomment-1295200988", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1866", "id": 1295200988, "node_id": "IC_kwDOBm6k_c5NMzLc", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-28T16:29:55Z", "updated_at": "2022-10-28T16:29:55Z", "author_association": "OWNER", 
"body": "I wonder if there's something clever I could do here within a transaction?\r\n\r\nStart a transaction. Write out a temporary in-memory table with all of the existing primary keys in the table. Run the bulk insert. Then run `select pk from table where pk not in (select pk from old_pks)` to see what has changed.\r\n\r\nI don't think that's going to work well for large tables.\r\n\r\nI'm going to go with not returning inserted rows by default, unless you pass a special option requesting that.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1426001541, "label": "API for bulk inserting records into a table"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1870#issuecomment-1294285471", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1870", "id": 1294285471, "node_id": "IC_kwDOBm6k_c5NJTqf", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-28T01:06:03Z", "updated_at": "2022-10-28T01:06:03Z", "author_association": "CONTRIBUTOR", "body": "as far as i can tell, [this is where the \"immutable\" argument is used](https://github.com/sqlite/sqlite/blob/c97bb14fab566f6fa8d967c8fd1e90f3702d5b73/src/pager.c#L4926-L4931) in sqlite:\r\n\r\n```c\r\n pPager->noLock = sqlite3_uri_boolean(pPager->zFilename, \"nolock\", 0);\r\n if( (iDc & SQLITE_IOCAP_IMMUTABLE)!=0\r\n || sqlite3_uri_boolean(pPager->zFilename, \"immutable\", 0) ){\r\n vfsFlags |= SQLITE_OPEN_READONLY;\r\n goto act_like_temp_file;\r\n }\r\n```\r\n\r\nso it does set the read only flag, but then has a goto.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1426379903, "label": "don't use immutable=1, only mode=ro"}, "performed_via_github_app": null} {"html_url": 
"https://github.com/simonw/sqlite-utils/issues/496#issuecomment-1294408928", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/496", "id": 1294408928, "node_id": "IC_kwDOCGYnMM5NJxzg", "user": {"value": 39538958, "label": "justmars"}, "created_at": "2022-10-28T03:36:56Z", "updated_at": "2022-10-28T03:37:50Z", "author_association": "NONE", "body": "With respect to the typing of Table class itself, my interim solution:\r\n\r\n```python\r\nfrom sqlite_utils.db import Table\r\ndef tbl(self, table_name: str) -> Table:\r\n tbl = self.db[table_name]\r\n if isinstance(tbl, Table):\r\n return tbl\r\n raise Exception(f\"Missing {table_name=}\")\r\n```\r\n\r\nWith respect to @chapmanjacobd concern on the `DEFAULT` being an empty class, have also been using `# type: ignore`, e.g.\r\n\r\n```python\r\n@classmethod\r\ndef insert_list(cls, areas: list[str]):\r\n return meta.tbl(meta.Areas).insert_all(\r\n ({\"area\": a} for a in areas), ignore=True # type: ignore\r\n )\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1393202060, "label": "devrel/python api: Pylance type hinting"}, "performed_via_github_app": null}