issue_comments
13 rows where issue = 749283032
732543700 | simonw (OWNER) | 2020-11-24T02:20:30Z | https://github.com/simonw/datasette/issues/1101#issuecomment-732543700

Current design: https://docs.datasette.io/en/stable/plugin_hooks.html#register-output-renderer-datasette

```python
@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "test",
        "render": render_demo,
        "can_render": can_render_demo,  # Optional
    }
```

Where `render_demo` looks something like this:

```python
async def render_demo(datasette, columns, rows):
    db = datasette.get_database()
    result = await db.execute("select sqlite_version()")
    first_row = " | ".join(columns)
    lines = [first_row]
    lines.append("=" * len(first_row))
    for row in rows:
        lines.append(" | ".join(row))
    return Response(
        "\n".join(lines),
        content_type="text/plain; charset=utf-8",
        headers={"x-sqlite-version": result.first()[0]}
    )
```

Meanwhile here's where the CSV streaming mode is implemented: https://github.com/simonw/datasette/blob/4bac9f18f9d04e5ed10f072502bcc508e365438e/datasette/views/base.py#L297-L380
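For context on what "streaming mode" means at the ASGI level, here is a minimal, generic sketch, not Datasette's actual implementation: the function name `stream_rows_as_text` is invented, and the point is only that the body is sent in chunks rather than built up in memory.

```python
# Generic ASGI chunked streaming: the body goes out as multiple
# "http.response.body" messages with more_body=True.
async def stream_rows_as_text(rows, send):
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    for row in rows:
        line = (" | ".join(str(value) for value in row) + "\n").encode("utf-8")
        await send({"type": "http.response.body", "body": line, "more_body": True})
    # An empty final message with more_body=False closes the response.
    await send({"type": "http.response.body", "body": b"", "more_body": False})
```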
732544590 | simonw (OWNER) | 2020-11-24T02:22:55Z | https://github.com/simonw/datasette/issues/1101#issuecomment-732544590

The trick I'm using here is to follow the `next_url` in order to paginate through all of the matching results. The loop calls the `data()` method multiple times, once for each page of results: https://github.com/simonw/datasette/blob/4bac9f18f9d04e5ed10f072502bcc508e365438e/datasette/views/base.py#L304-L307
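A loose sketch of that pagination pattern, assuming a hypothetical `fetch_page(next_token)` coroutine that returns a page of rows plus a `next` token; the real loop lives in `views/base.py` linked above.

```python
async def iterate_all_rows(fetch_page):
    # Repeatedly fetch pages, following the "next" token until it runs out,
    # yielding rows as they arrive so the caller can stream them.
    next_token = None
    while True:
        page = await fetch_page(next_token)  # hypothetical helper
        for row in page["rows"]:
            yield row
        next_token = page.get("next")
        if not next_token:
            break
```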
755128038 | simonw (OWNER) | 2021-01-06T07:10:22Z | https://github.com/simonw/datasette/issues/1101#issuecomment-755128038

Yet another use-case for this: I want to be able to stream newline-delimited JSON in order to better import into Pandas:

```python
pandas.read_json(
    "https://latest.datasette.io/fixtures/compound_three_primary_keys.json?_shape=array&_nl=on",
    lines=True
)
```
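Producing that newline-delimited format from a stream of rows is straightforward; a minimal, generic sketch (not tied to any particular Datasette hook, `ndjson_chunks` is an invented name):

```python
import json

def ndjson_chunks(rows, columns):
    # One JSON object per line; this is the shape pandas.read_json(..., lines=True) expects.
    for row in rows:
        yield json.dumps(dict(zip(columns, row))) + "\n"
```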
755133937 | simonw (OWNER) | 2021-01-06T07:25:48Z (updated 2021-01-06T07:26:43Z) | https://github.com/simonw/datasette/issues/1101#issuecomment-755133937 | 👍 2

Idea: instead of returning a dictionary, `register_output_renderer` could return an object. The object could have the following properties:

- `.extension` - the extension to use
- `.can_render(...)` - says if it can render this
- `.can_stream(...)` - says if streaming is supported
- `async .stream_rows(rows_iterator, send)` - method that loops through all rows and uses `send` to send them to the response in the correct format

I can then deprecate the existing `dict` return type for 1.0.
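A minimal sketch of what such an object could look like. This was only a proposal, so the class name, the TSV format, and the assumption that `send` accepts raw bytes are all invented for illustration.

```python
class TSVRenderer:
    extension = "tsv"

    def can_render(self, columns):
        return True

    def can_stream(self):
        return True

    async def stream_rows(self, rows_iterator, send):
        # rows_iterator is assumed to be an async iterator; send is assumed
        # to accept raw bytes. Push each row to the response as a TSV line.
        async for row in rows_iterator:
            line = "\t".join(str(value) for value in row) + "\n"
            await send(line.encode("utf-8"))
```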
755134771 | simonw (OWNER) | 2021-01-06T07:28:01Z | https://github.com/simonw/datasette/issues/1101#issuecomment-755134771

With this structure it will become possible to stream non-newline-delimited JSON array-of-objects too - the `stream_rows()` method could output `[` first, then each row followed by a comma, then `]` after the very last row.
869191854 | eyeseast (CONTRIBUTOR) | 2021-06-27T16:42:14Z | https://github.com/simonw/datasette/issues/1101#issuecomment-869191854

This would really help with this issue: https://github.com/eyeseast/datasette-geojson/issues/7
869812567 | simonw (OWNER) | 2021-06-28T16:06:57Z (updated 2021-06-28T16:07:24Z) | https://github.com/simonw/datasette/issues/1101#issuecomment-869812567

Relevant blog post: https://simonwillison.net/2021/Jun/25/streaming-large-api-responses/ - including notes on efficiently streaming formats with some kind of separator in between the records (regular JSON).

> Some export formats are friendlier for streaming than others. CSV and TSV are pretty easy to stream, as is newline-delimited JSON.
>
> Regular JSON requires a bit more thought: you can output a `[` character, then output each row in a stream with a comma suffix, then skip the comma for the last row and output a `]`. Doing that requires peeking ahead (looping two at a time) to verify that you haven't yet reached the end.
>
> Or... Martin De Wulf [pointed out](https://twitter.com/madewulf/status/1405559088994467844) that you can output the first row, then output every other row with a preceding comma - which avoids the whole "iterate two at a time" problem entirely.
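A minimal sketch of that "first row, then comma-prefixed rows" trick for streaming a plain JSON array, written as a generic generator (the function name is made up for illustration):

```python
import json

def json_array_chunks(rows, columns):
    # Stream a regular JSON array without building it in memory:
    # emit "[", then the first row, then ",<row>" for every later row, then "]".
    yield "["
    first = True
    for row in rows:
        chunk = json.dumps(dict(zip(columns, row)))
        yield chunk if first else "," + chunk
        first = False
    yield "]"
```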
1105571003 | simonw (OWNER) | 2022-04-21T18:10:38Z (updated 2022-04-21T18:10:46Z) | https://github.com/simonw/datasette/issues/1101#issuecomment-1105571003

Maybe the simplest design for this is to add an optional `can_stream` to the contract:

```python
@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "tsv",
        "render": render_tsv,
        "can_render": lambda: True,
        "can_stream": lambda: True
    }
```

When streaming, a new parameter could be passed to the render function - maybe `chunks` - which is an iterator/generator over a sequence of chunks of rows. Or it could use the existing `rows` parameter but treat that as an iterator?
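To make the `chunks` idea concrete, one hypothetical shape a streaming-aware `render_tsv` could take is an async generator; none of this is implemented, the parameter name simply follows the proposal above.

```python
async def render_tsv(datasette, columns, chunks):
    # Hypothetical streaming variant: "chunks" is assumed to be an async iterator
    # over lists of rows, so the full result set never has to sit in memory.
    yield ("\t".join(columns) + "\n").encode("utf-8")
    async for chunk in chunks:
        for row in chunk:
            yield ("\t".join(str(value) for value in row) + "\n").encode("utf-8")
```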
1105588651 | eyeseast (CONTRIBUTOR) | 2022-04-21T18:15:39Z | https://github.com/simonw/datasette/issues/1101#issuecomment-1105588651

What if you split rendering and streaming into two things:

- `render` is a function that returns a response
- `stream` is a function that sends chunks, or yields chunks passed to an ASGI `send` callback

That way current plugins still work, and streaming is purely additive. A `stream` function could get a cursor or iterator of rows, instead of a list, so it could more efficiently handle large queries.
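Under that split, a plugin might register something like the following. This is purely hypothetical; the `"stream"` key and the `stream_tsv` function are invented to illustrate the shape of the proposal.

```python
@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "tsv",
        "render": render_tsv,   # existing contract: build and return a full Response
        "stream": stream_tsv,   # hypothetical addition: push chunks incrementally
    }


async def stream_tsv(datasette, columns, rows, send):
    # "rows" is assumed to be a cursor or iterator rather than a materialized list,
    # and "send" is assumed to accept encoded chunks of the response body.
    await send(("\t".join(columns) + "\n").encode("utf-8"))
    for row in rows:
        await send(("\t".join(str(value) for value in row) + "\n").encode("utf-8"))
```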
1105608964 | simonw (OWNER) | 2022-04-21T18:26:29Z | https://github.com/simonw/datasette/issues/1101#issuecomment-1105608964

I'm questioning if the mechanisms should be separate at all now - a single response rendering is really just a case of a streaming response that only pulls the first N records from the iterator.

It probably needs to be an `async for` iterator, which I've not worked with much before. Good opportunity to learn.

This actually gets a fair bit more complicated due to the work I'm doing right now to improve the default JSON API:

- #1709

I want to do things like make faceting results optionally available to custom renderers - which is a separate concern from streaming rows.

I'm going to poke around with a bunch of prototypes and see what sticks.
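The "a plain response is just a stream cut off after N rows" framing can be expressed with a tiny `async for` helper; a generic sketch, not Datasette code.

```python
async def take(async_iterator, n):
    # Pull at most n items from an async iterator; a non-streaming response
    # could reuse the streaming machinery by wrapping its row source in this.
    count = 0
    async for item in async_iterator:
        if count >= n:
            break
        yield item
        count += 1
```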
1105615625 | simonw (OWNER) | 2022-04-21T18:31:41Z (updated 2022-04-21T18:32:22Z) | https://github.com/simonw/datasette/issues/1101#issuecomment-1105615625

The `datasette-geojson` plugin is actually an interesting case here, because of the way it converts SpatiaLite geometries into GeoJSON: https://github.com/eyeseast/datasette-geojson/blob/602c4477dc7ddadb1c0a156cbcd2ef6688a5921d/datasette_geojson/__init__.py#L61-L66

```python
if isinstance(geometry, bytes):
    results = await db.execute(
        "SELECT AsGeoJSON(:geometry)", {"geometry": geometry}
    )
    return geojson.loads(results.single_value())
```

That actually seems to work really well as-is, but it does worry me a bit that it ends up having to execute an extra `SELECT` query for every single returned row - especially in streaming mode where it might be asked to return 1m rows at once.

My PostgreSQL/MySQL engineering brain says that this would be better handled by doing a chunk of these (maybe 100) at once, to avoid the per-query overhead - but with SQLite that might not be necessary.

At any rate, this is one of the reasons I'm interested in "iterate over this sequence of chunks of 100 rows at a time" as a potential option here.

Of course, a better solution would be for `datasette-geojson` to have a way to influence the SQL query before it is executed, adding an `AsGeoJSON(geometry)` clause to it - so that's something I'm open to as well.
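A hedged sketch of the "convert a chunk of geometries per query" idea, batching blobs into a single SELECT with one `AsGeoJSON()` call per column. This is not how datasette-geojson works today; it assumes `db` is a Datasette database object and that batch sizes stay under SQLite's bound-parameter limit.

```python
import geojson

async def geometries_to_geojson(db, geometries, batch_size=100):
    # Convert binary SpatiaLite geometries in batches, one query per batch,
    # instead of one query per row.
    converted = []
    for start in range(0, len(geometries), batch_size):
        batch = geometries[start:start + batch_size]
        sql = "SELECT " + ", ".join(f"AsGeoJSON(:g{i})" for i in range(len(batch)))
        params = {f"g{i}": geometry for i, geometry in enumerate(batch)}
        row = (await db.execute(sql, params)).first()
        converted.extend(geojson.loads(value) for value in row)
    return converted
```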
1105642187 | eyeseast (CONTRIBUTOR) | 2022-04-21T18:59:08Z | https://github.com/simonw/datasette/issues/1101#issuecomment-1105642187

Ha! That was your idea (and a good one). But it's probably worth measuring to see what overhead it adds. It did require both passing in the database and making the whole thing `async`.

Just timing the queries themselves:

1. [Using `AsGeoJSON(geometry) as geometry`](https://alltheplaces-datasette.fly.dev/alltheplaces?sql=select%0D%0A++id%2C%0D%0A++properties%2C%0D%0A++AsGeoJSON%28geometry%29+as+geometry%2C%0D%0A++spider%0D%0Afrom%0D%0A++places%0D%0Aorder+by%0D%0A++id%0D%0Alimit%0D%0A++1000) takes 10.235 ms
2. [Leaving as binary](https://alltheplaces-datasette.fly.dev/alltheplaces?sql=select%0D%0A++id%2C%0D%0A++properties%2C%0D%0A++geometry%2C%0D%0A++spider%0D%0Afrom%0D%0A++places%0D%0Aorder+by%0D%0A++id%0D%0Alimit%0D%0A++1000) takes 8.63 ms

Looking at the network panel:

1. Takes about 200 ms for the `fetch` request
2. Takes about 300 ms

I'm not sure how best to time the GeoJSON generation, but it would be interesting to check. Maybe I'll write a plugin to add query times to response headers.

The other thing to consider with async streaming is that it might be well-suited for a slower response. When I have to get the whole result and send a response in a fixed amount of time, I need the most efficient query possible. If I can hang onto a connection and get things one chunk at a time, maybe it's ok if there's some overhead.
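For the "plugin to add query times to response headers" idea, Datasette's `asgi_wrapper` hook can at least add whole-request timing. A rough sketch: the header name is made up, and this measures total request time rather than just the SQL or GeoJSON-generation time.

```python
import time
from datasette import hookimpl


@hookimpl
def asgi_wrapper(datasette):
    def wrap(app):
        async def timed_app(scope, receive, send):
            start = time.perf_counter()

            async def send_with_timing(message):
                # Attach the elapsed time just before response headers go out.
                if message["type"] == "http.response.start":
                    elapsed_ms = (time.perf_counter() - start) * 1000
                    message["headers"] = list(message.get("headers", [])) + [
                        (b"x-request-time-ms", f"{elapsed_ms:.2f}".encode("utf-8"))
                    ]
                await send(message)

            await app(scope, receive, send_with_timing)

        return timed_app

    return wrap
```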
1399341761 | simonw (OWNER) | 2023-01-21T22:07:19Z | https://github.com/simonw/datasette/issues/1101#issuecomment-1399341761 | 👍 1

Idea for supporting streaming with the `register_output_renderer` hook:

```python
@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "test",
        "render": render_demo,
        "can_render": can_render_demo,
        "render_stream": render_demo_stream,  # This is new
    }
```

So there's a new `"render_stream"` key which can be returned, which if present means that the output renderer supports streaming.

I'll play around with the design of that function signature in:

- #1999
- #1062
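The comment deliberately leaves the `render_demo_stream` signature open. Purely as an illustration of one possible shape, an async generator yielding encoded chunks, it might look something like this:

```python
async def render_demo_stream(datasette, columns, rows):
    # Hypothetical: yield the response body a chunk at a time instead of
    # returning a single Response. "rows" is assumed to be an async iterator.
    yield (" | ".join(columns) + "\n").encode("utf-8")
    async for row in rows:
        yield (" | ".join(str(value) for value in row) + "\n").encode("utf-8")
```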
Table schema:

```sql
CREATE TABLE [issue_comments] (
    [html_url] TEXT,
    [issue_url] TEXT,
    [id] INTEGER PRIMARY KEY,
    [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]),
    [created_at] TEXT,
    [updated_at] TEXT,
    [author_association] TEXT,
    [body] TEXT,
    [reactions] TEXT,
    [issue] INTEGER REFERENCES [issues]([id]),
    [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```