issue_comments: 752257666

html_url: https://github.com/simonw/datasette/issues/1160#issuecomment-752257666
issue_url: https://api.github.com/repos/simonw/datasette/issues/1160
id: 752257666
node_id: MDEyOklzc3VlQ29tbWVudDc1MjI1NzY2Ng==
user: 9599
created_at: 2020-12-29T22:09:18Z
updated_at: 2020-12-29T22:09:18Z
author_association: OWNER
reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: 775666296
performed_via_github_app:

body:

### Figuring out the API design

I want to be able to support different formats, and to be able to parse them into tables either streaming or in one go, depending on whether the format supports that.

Ideally I want to be able to pull the first 1,024 bytes for the purpose of detecting the format, then replay those bytes again later. I'm considering this a stretch goal though.

CSV is easy to parse as a stream - here's [how sqlite-utils does it](https://github.com/simonw/sqlite-utils/blob/f1277f638f3a54a821db6e03cb980adad2f2fa35/sqlite_utils/cli.py#L630):

```python
dialect = "excel-tab" if tsv else "excel"
with file_progress(json_file, silent=silent) as json_file:
    reader = csv_std.reader(json_file, dialect=dialect)
    headers = next(reader)
    docs = (dict(zip(headers, row)) for row in reader)
```

Problem: using `db.insert_all()` could block for a long time on a big set of rows. Probably easiest to batch the records before calling `insert_all()`, then run one batch at a time via `db.execute_write_fn()` calls.
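Returning to the first goal above, a pluggable format interface is one way to let parsers declare whether they stream or need the whole file. This is only a sketch; `FormatParser`, `can_parse` and `parse` are hypothetical names, not part of Datasette or sqlite-utils:

```python
from typing import BinaryIO, Iterable, Protocol


class FormatParser(Protocol):
    """Hypothetical interface for one supported upload format."""

    def can_parse(self, head: bytes) -> bool:
        """Decide from the first bytes whether this parser applies."""
        ...

    def parse(self, stream: BinaryIO) -> Iterable[dict]:
        """Yield one dict per row. Streaming formats (CSV/TSV) can
        yield as they read; formats that need the whole input (e.g.
        a JSON array) may consume everything before yielding."""
        ...
```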
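The stretch goal of reading the first 1,024 bytes and replaying them later can be handled with a small wrapper stream built from the standard library. A minimal sketch, assuming a plain binary input; `ReplayableStream` and `sniff` are made-up names:

```python
import io


class ReplayableStream(io.RawIOBase):
    """Serve bytes already read for sniffing, then fall through to
    the rest of the original stream."""

    def __init__(self, head: bytes, rest):
        self._head = io.BytesIO(head)
        self._rest = rest

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # Replay the sniffed head first, then read from the original
        # stream once the head is exhausted.
        n = self._head.readinto(buf)
        if n:
            return n
        data = self._rest.read(len(buf))
        buf[: len(data)] = data
        return len(data)


def sniff(stream, n=1024):
    """Return the first n bytes plus a stream that replays them."""
    head = stream.read(n)
    return head, io.BufferedReader(ReplayableStream(head, stream))
```

`head` could then be handed to each parser's `can_parse()` check, while the replayed stream goes to the winning parser's `parse()`.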
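Finally, a rough sketch of the batching idea from the last paragraph, assuming `db` is a `datasette.database.Database`, `docs` is the generator of row dicts from the CSV reader above, and a batch size of 100 chosen arbitrarily:

```python
import itertools

import sqlite_utils


async def insert_in_batches(db, table, docs, batch_size=100):
    docs = iter(docs)
    while True:
        batch = list(itertools.islice(docs, batch_size))
        if not batch:
            break

        def write_batch(conn, batch=batch):
            # execute_write_fn() calls this on Datasette's dedicated
            # write thread with the raw sqlite3 connection; wrapping
            # it in sqlite-utils lets insert_all() handle one batch.
            sqlite_utils.Database(conn)[table].insert_all(batch)

        await db.execute_write_fn(write_batch, block=True)
```

Each `execute_write_fn()` call holds the write connection for only one batch, so other writes can interleave instead of waiting behind one giant `insert_all()`.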