html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/datasette/issues/1384#issuecomment-1062124485,https://api.github.com/repos/simonw/datasette/issues/1384,1062124485,IC_kwDOBm6k_c4_TrvF,167160,2022-03-08T19:26:32Z,2022-03-08T19:26:32Z,NONE,"Looks like I'm late to the party here, but wanted to join the convo if there's still time before this interface is solidified in v1.0. My plugin use case is for education / social science data, which is meta-data heavy in the documentation of measurement scales, instruments, collection procedures, etc. that I want to connect to columns, tables, and dbs (and render in static pages, but looks like I can do that with the jinja plugin hook). I'm still digging in and I think @brandonrobertz 's approach will work for me at least for now, but I want to bump this thread in the meantime -- are there still plans for an async metadata hook at some point in the future? (or are you considering other directions?)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",930807135,
https://github.com/simonw/datasette/issues/1384#issuecomment-1065929510,https://api.github.com/repos/simonw/datasette/issues/1384,1065929510,IC_kwDOBm6k_c4_iMsm,167160,2022-03-12T17:49:59Z,2022-03-12T17:49:59Z,NONE,"Ok, I'm taking a slightly different approach, which I think is sort of close to the in-memory _metadata table idea.
I'm using a startup hook to load metadata / other info from the database, which I store in the datasette object for later:
```
@hookimpl
def startup(datasette):
async def inner():
datasette._mypluginmetadata = # await db query
return inner
```
Then, I can use this in other plugins:
```
@hookimpl
def render_cell(value, column, table, database, datasette):
# use datasette._mypluginmetadata
```
For my app I don't need anything to update dynamically so it's fine to pre-populate everything on startup. It's also good to have things precached especially for a hook like render_cell, which would otherwise require a ton of redundant db queries.
Makes me wonder if we could take a sort of similar caching approach with the internal _metadata table. Like have a little watchdog that could query all of the attached dbs for their _metadata tables every 5min or so, which then could be merged into the in memory _metadata table which then could be accessed sync by the plugins, or something like that.
For most the use cases I can think of, live updates don't need to take into effect immediately; refreshing a cache every 5min or on some other trigger (adjustable w a config setting) would be just fine. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",930807135,
https://github.com/simonw/datasette/issues/1384#issuecomment-1065951744,https://api.github.com/repos/simonw/datasette/issues/1384,1065951744,IC_kwDOBm6k_c4_iSIA,167160,2022-03-12T19:47:17Z,2022-03-12T19:47:17Z,NONE,"Awesome, thanks @brandonrobertz !
The plugin is close, but looks like it only grabs remote metadata, is that right? Instead what I'm wanting is to grab metadata embedded in the attached databases. Rather than extending that plugin, at this point I've realized I need a lot more flexibility in metadata for my data model (esp around formatting cell values and custom file exports) so rather than extending that I'll continue working on a plugin specific to my app.
If I'm understanding your plugin code correctly, you query the db using the sync handle every time `get_metdata` is called, right? Won't this become a pretty big bottleneck if a hook into `render_cell` is trying to read metadata / plugin config?
> Making the get_metadata async won't improve the situation by itself as only some of the code paths accessing metadata use that hook. The other paths use the internal metadata dict.
I agree -- because things like `render_cell` will potentially want to read metadata/config, `get_metadata` should really remain sync and lightweight, which we can do with something like the remote-metadata plugin that could also poll metadata tables in attached databases.
That leaves your app, where it sounds like you want changes made by the user in the browser in to be immediately reflected, rather than have to wait for the next metadata refresh. In this case I wonder if you could have your app make a sync write to the datasette object so the change would have the immediate effect, but then have a separate async polling mechanism to eventually write that change out to the database for long-term persistence. Then you'd have the best of both worlds, I think? But probably not worth the trouble if your use cases are small (and/or you're not reading metadata/config from tight loops like render_cell).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",930807135,
https://github.com/simonw/datasette/issues/1384#issuecomment-1066143991,https://api.github.com/repos/simonw/datasette/issues/1384,1066143991,IC_kwDOBm6k_c4_jBD3,167160,2022-03-13T17:13:09Z,2022-03-13T17:13:09Z,NONE,"Thanks for taking the time to reply @brandonrobertz , this is really helpful info.
> See ""Many small queries are efficient in sqlite"" for more information on the rationale here. Also note that in the datasette-live-config reference plugin, the DB connection is cached, so that eliminated most of the performance worries we had.
Ah, that's nifty! Yeah, then caching on the python side is likely a waste :) I'm new to working with sqlite so this is super good to know the many-small-queries is a common pattern
> I tested on very large Datasette deployments (hundreds of DBs, millions of rows).
For my reference, did you include a `render_cell` plugin calling `get_metadata` in those tests? I'm less concerned now that I know a little more about sqlite's caching, but that special situation will jump you to a few orders of magnitude above what the sqlite article describes (e.g. 200 vs 20,000 queries+metadata merges for a page displaying 100 rows of a 200 column table). It wouldn't scale with db size as much as # of visible cells being rendered on the page, although they would be identical queries I suppose so will cache well.
(If you didn't test this specific situation, no worries -- I'm just trying to calibrate my intuition on this and can do my own benchmarks at some point.)
> Simon talked about eventually making something like this a standard feature of Datasette
Yeah, getting metadata (and static pages as well for that matter) from internal tables definitely has my vote for including as a standard feature! Its really nice to be able to distribute a single *.db with all the metadata and static pages bundled. My metadata are sufficiently complex/domain specific that it makes sense to continue on my own plugin for now, but I'll be thinking about more general parts I can spin off as possible contributions to liveconfig (if you're open to them) or other plugins in this ecosystem.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",930807135,
https://github.com/simonw/datasette/issues/1384#issuecomment-1066194130,https://api.github.com/repos/simonw/datasette/issues/1384,1066194130,IC_kwDOBm6k_c4_jNTS,167160,2022-03-13T22:23:04Z,2022-03-13T22:23:04Z,NONE,"Ah, sorry, I didn't get what you were saying you the first time. Using _metadata_local in that way makes total sense -- I agree, refreshing metadata each cell was seeming quite excessive. Now I'm on the same page! :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",930807135,