issue_comments
12 rows where issue = 1426001541
All 12 comments below are on issue 1426001541, "API for bulk inserting records into a table" (issue_url https://api.github.com/repos/simonw/datasette/issues/1866); every comment has all-zero reaction counts and no performed_via_github_app value.
Comment 1293887808 (IC_kwDOBm6k_c5NHylA) by simonw 9599 (OWNER), created 2022-10-27T18:07:02Z, updated 2022-10-27T18:07:02Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1293887808

Error handling is really important here. What should happen if you submit 100 records and one of them has some kind of validation error? How should that error be reported back to you?

I'm inclined to say that it defaults to all-or-nothing in a transaction - but there should be a `"continue_on_error": true` option (or similar) which causes it to insert the ones that are valid while reporting back the ones that are invalid.
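A minimal sketch of the two behaviours being weighed, using plain `sqlite3` against a hypothetical `mytable` with a single `title` column; the `continue_on_error` flag mirrors the option name floated above, not a finalised API:

```python
import sqlite3

def insert_rows(conn: sqlite3.Connection, rows, continue_on_error=False):
    """Sketch: all-or-nothing by default, or skip and report invalid rows."""
    errors = []
    with conn:  # commits on success, rolls back if an exception escapes
        for i, row in enumerate(rows):
            try:
                conn.execute(
                    "insert into mytable (title) values (?)", (row["title"],)
                )
            except (sqlite3.Error, KeyError) as ex:
                if continue_on_error:
                    errors.append({"row": i, "error": str(ex)})  # keep going, report later
                else:
                    raise  # abort: nothing from this request gets committed
    return errors  # the invalid rows reported back to the caller
```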
Comment 1293890684 (IC_kwDOBm6k_c5NHzR8) by simonw 9599 (OWNER), created 2022-10-27T18:09:52Z, updated 2022-10-27T18:09:52Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1293890684

Should this API accept CSV/TSV etc. in addition to JSON?

I'm torn on this one. My initial instinct is that it should not - and there should instead be a Datasette client library / CLI tool you can use that knows how to turn CSV into batches of JSON calls for when you want to upload a CSV file.

I don't think the usability of `curl https://datasette/db/table -F 'data=@path/to/file.csv' -H 'Authorization: Bearer xxx'` is particularly great compared to something like `datasette client insert https://datasette/ db table file.csv --csv` (where the command version could store API tokens for you too).
Comment 1293891191 (IC_kwDOBm6k_c5NHzZ3) by simonw 9599 (OWNER), created 2022-10-27T18:10:22Z, updated 2022-10-27T18:10:22Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1293891191

So for the moment I'm just going to concentrate on the JSON API. I can consider CSV variants later on, or as plugins, or both.
Comment 1293891876 (IC_kwDOBm6k_c5NHzkk) by simonw 9599 (OWNER), created 2022-10-27T18:11:05Z, updated 2022-10-27T18:11:05Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1293891876

Likewise for newline-delimited JSON. While it's tempting to want to accept that as an ingest format (because it's nice to generate and stream) I think it's better to have a client application that can turn a stream of newline-delimited JSON into batched JSON inserts.
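As a sketch of the client-side work being described, this is one way a tool could fold a newline-delimited JSON stream into batched payloads; the 100-row batch size and the `{"rows": [...]}` payload shape are illustrative assumptions, not the finished API:

```python
import itertools
import json
import sys

def batched_payloads(lines, batch_size=100):
    """Sketch: turn newline-delimited JSON into batched insert payloads."""
    rows = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(itertools.islice(rows, batch_size))
        if not batch:
            break
        yield {"rows": batch}  # payload shape is an assumption for illustration

if __name__ == "__main__":
    for payload in batched_payloads(sys.stdin):
        print(json.dumps(payload))  # a real client would POST each payload instead
```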
Comment 1293892818 (IC_kwDOBm6k_c5NHzzS) by simonw 9599 (OWNER), created 2022-10-27T18:12:02Z, updated 2022-10-27T18:12:02Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1293892818

There's one catch with batched inserts: if your CLI tool fails halfway through you could end up with a partially populated table - since a bunch of batches will have succeeded first. I think that's OK.

In the future I may want to come up with a way to run multiple batches of inserts inside a single transaction, but I can ignore that for the first release of this feature.
Comment 1293893789 (IC_kwDOBm6k_c5NH0Cd) by simonw 9599 (OWNER), created 2022-10-27T18:13:00Z, updated 2022-10-27T18:13:00Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1293893789

If people care about that kind of thing they could always push all of their inserts to a table called `_tablename` and then atomically rename that once they've uploaded all of the data (assuming I provide an atomic-rename-this-table mechanism).
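A minimal sketch of that staging-table idea using plain `sqlite3`; the table name `mytable`, the leading-underscore convention and doing the swap client-side are all assumptions here, since the comment leaves the atomic-rename mechanism as a hypothetical Datasette feature:

```python
import sqlite3

conn = sqlite3.connect("data.db")
# ... every batched insert targets the staging table _mytable first ...
with conn:  # one transaction, so readers see either the old table or the new one
    conn.execute("drop table if exists mytable")
    conn.execute("alter table _mytable rename to mytable")
```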
Comment 1294282263 (IC_kwDOBm6k_c5NJS4X) by simonw 9599 (OWNER), created 2022-10-28T01:00:42Z, updated 2022-10-28T01:00:42Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1294282263

I'm going to set the limit at 1,000 rows inserted at a time. I'll make this configurable using a new `max_insert_rows` setting (for consistency with `max_returned_rows`).
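As a sketch of how that cap might be applied before any SQL runs (the `max_insert_rows` name comes from the comment above; the helper function and the error shape are illustrative assumptions only):

```python
MAX_INSERT_ROWS = 1000  # hypothetical default for a max_insert_rows setting

def check_insert_size(rows, max_insert_rows=MAX_INSERT_ROWS):
    """Reject over-sized insert payloads up front with a structured error."""
    if len(rows) > max_insert_rows:
        return {
            "ok": False,
            "errors": [f"Too many rows, maximum allowed is {max_insert_rows}"],
        }
    return {"ok": True}
```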
Comment 1294296767 (IC_kwDOBm6k_c5NJWa_) by simonw 9599 (OWNER), created 2022-10-28T01:22:25Z, updated 2022-10-28T01:23:09Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1294296767

Nasty catch on this one: I wanted to return the IDs of the freshly inserted rows. But... the `insert_all()` method I was planning to use from `sqlite-utils` doesn't appear to have a way of doing that: https://github.com/simonw/sqlite-utils/blob/529110e7d8c4a6b1bbf5fb61f2e29d72aa95a611/sqlite_utils/db.py#L2813-L2835

SQLite itself added a `RETURNING` clause which might help, but that is only available from version 3.35, released in March 2021: https://www.sqlite.org/lang_returning.html - which isn't commonly available yet. https://latest.datasette.io/-/versions right now shows 3.34, and https://lite.datasette.io/#/-/versions shows 3.27.2 (from Feb 2019).

Three options then:

1. Even for bulk inserts, do one insert at a time so I can use `cursor.lastrowid` to get the ID of the inserted record. This isn't terrible since SQLite is very fast, but it may still be a big performance hit for large inserts.
2. Don't return the list of inserted rows for bulk inserts.
3. Default to not returning the list of inserted rows for bulk inserts, but allow the user to request that - in which case we use the slower path.

That third option might be the way to go here. I should benchmark first to figure out how much of a difference this actually makes.
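For comparison, the two underlying mechanisms look roughly like this sketch (plain `sqlite3`, hypothetical table `t`); whether `RETURNING` is usable depends on the SQLite version, as noted above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, title text)")

# Option 1: one insert at a time, collecting cursor.lastrowid (works everywhere)
ids = []
for title in ("a", "b", "c"):
    cursor = conn.execute("insert into t (title) values (?)", (title,))
    ids.append(cursor.lastrowid)

# With SQLite 3.35+ a single statement can hand the IDs back via RETURNING
if sqlite3.sqlite_version_info >= (3, 35, 0):
    returned = conn.execute(
        "insert into t (title) values (?), (?) returning id", ("d", "e")
    ).fetchall()
    ids.extend(row[0] for row in returned)
```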
Comment 1294306071 (IC_kwDOBm6k_c5NJYsX) by simonw 9599 (OWNER), created 2022-10-28T01:37:14Z, updated 2022-10-28T01:37:59Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1294306071

Quick crude benchmark:

```python
import sqlite3

db = sqlite3.connect(":memory:")

def create_table(db, name):
    db.execute(f"create table {name} (id integer primary key, title text)")

create_table(db, "single")
create_table(db, "multi")
create_table(db, "bulk")

def insert_singles(titles):
    inserted = []
    for title in titles:
        cursor = db.execute(f"insert into single (title) values (?)", [title])
        inserted.append((cursor.lastrowid, title))
    return inserted

def insert_many(titles):
    db.executemany(f"insert into multi (title) values (?)", ((t,) for t in titles))

def insert_bulk(titles):
    db.execute("insert into bulk (title) values {}".format(
        ", ".join("(?)" for _ in titles)
    ), titles)

titles = ["title {}".format(i) for i in range(1, 10001)]
```

Then in IPython I ran these:

```
In [14]: %timeit insert_singles(titles)
23.8 ms ± 535 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [13]: %timeit insert_many(titles)
12 ms ± 520 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [12]: %timeit insert_bulk(titles)
2.59 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

So the bulk insert really is a lot faster - 3ms compared to 24ms for single inserts, so ~8x faster.
Comment 1294316640 (IC_kwDOBm6k_c5NJbRg) by simonw 9599 (OWNER), created 2022-10-28T01:51:40Z, updated 2022-10-28T01:51:40Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1294316640

This needs to support the following:

- Rows do not include a primary key - one is assigned by the database
- Rows provide their own primary key, any clashes are errors
- Rows provide their own primary key, clashes are silently ignored
- Rows provide their own primary key, replacing any existing records
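In raw SQLite those four behaviours map roughly onto the following statements (a sketch against a hypothetical `docs` table, not the API design itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table docs (id integer primary key, title text)")

# 1. No primary key supplied - SQLite assigns one (this row gets id 1)
conn.execute("insert into docs (title) values (?)", ("assigned id",))

# 2. Caller supplies the primary key; a clash would raise sqlite3.IntegrityError
conn.execute("insert into docs (id, title) values (?, ?)", (2, "explicit id"))

# 3. Clashes silently ignored - this leaves "explicit id" in place
conn.execute("insert or ignore into docs (id, title) values (?, ?)", (2, "ignored"))

# 4. Clashes replace the existing record - id 2 now has title "replaced"
conn.execute("insert or replace into docs (id, title) values (?, ?)", (2, "replaced"))
```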
Comment 1295200988 (IC_kwDOBm6k_c5NMzLc) by simonw 9599 (OWNER), created 2022-10-28T16:29:55Z, updated 2022-10-28T16:29:55Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1295200988

I wonder if there's something clever I could do here within a transaction?

Start a transaction. Write out a temporary in-memory table with all of the existing primary keys in the table. Run the bulk insert. Then run `select pk from table where pk not in (select pk from old_pks)` to see what has changed.

I don't think that's going to work well for large tables.

I'm going to go with not returning inserted rows by default, unless you pass a special option requesting that.
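The "clever" transaction approach being dismissed would look something like this sketch (hypothetical `mytable` with an integer `id` primary key); it works, but it copies every existing primary key on each request, which is the scaling concern:

```python
import sqlite3

def insert_and_report_new_pks(conn: sqlite3.Connection, rows):
    """Sketch: snapshot existing pks, bulk insert, then diff to find the new ones."""
    with conn:
        conn.execute("create temporary table old_pks as select id from mytable")
        conn.executemany(
            "insert into mytable (title) values (?)", ((r["title"],) for r in rows)
        )
        new_pks = [
            pk for (pk,) in conn.execute(
                "select id from mytable where id not in (select id from old_pks)"
            )
        ]
        conn.execute("drop table old_pks")
    return new_pks
```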
Comment 1313128913 (IC_kwDOBm6k_c5ORMHR) by simonw 9599 (OWNER), created 2022-11-14T05:48:22Z, updated 2022-11-14T05:48:22Z
https://github.com/simonw/datasette/issues/1866#issuecomment-1313128913

I changed my mind about the `"return_rows": true` option - I'm going to rename it to `"return": true`.
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);