html_url,issue_url,id,node_id,user,user_label,created_at,updated_at,author_association,body,reactions,issue,issue_label,performed_via_github_app https://github.com/dogsheep/google-takeout-to-sqlite/issues/2#issuecomment-747126777,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/2,747126777,MDEyOklzc3VlQ29tbWVudDc0NzEyNjc3Nw==,9599,simonw,2020-12-17T00:36:52Z,2020-12-17T00:36:52Z,MEMBER,The memory profiler tricks I used in https://github.com/dogsheep/healthkit-to-sqlite/issues/7 could help figure out what's going on here.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",769376447,killed by oomkiller on large location-history, https://github.com/dogsheep/google-takeout-to-sqlite/issues/2#issuecomment-747130908,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/2,747130908,MDEyOklzc3VlQ29tbWVudDc0NzEzMDkwOA==,231498,khimaros,2020-12-17T00:47:04Z,2020-12-17T00:47:43Z,NONE,"it looks like almost all of the memory consumption is coming from `json.load()`. another direction here may be to use the new ""Semantic Location History"" data which is already broken down by year and month. it also provides much more interesting data, such as estimated address, form of travel, etc.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",769376447,killed by oomkiller on large location-history, https://github.com/simonw/datasette/issues/1149#issuecomment-747207487,https://api.github.com/repos/simonw/datasette/issues/1149,747207487,MDEyOklzc3VlQ29tbWVudDc0NzIwNzQ4Nw==,9599,simonw,2020-12-17T05:05:08Z,2020-12-17T05:05:08Z,OWNER,"I think what I want is for it to be easy to reuse portions of Datasette's CSS - the bit that styles the cog menu for example - without pulling in the whole thing. 
I tried linking in the `` stylesheet and the page broke, wildly: That's because Datasette's [built-in CSS](https://github.com/simonw/datasette/blob/0.53/datasette/static/app.css) applies styles directly to a whole bunch of different tags - `body`, `header`, `footer` etc - which means that if you import that stylesheet it can play havoc with the site you have already built.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",769520939,Make it easier to theme Datasette with CSS, https://github.com/simonw/datasette/issues/1149#issuecomment-747207787,https://api.github.com/repos/simonw/datasette/issues/1149,747207787,MDEyOklzc3VlQ29tbWVudDc0NzIwNzc4Nw==,9599,simonw,2020-12-17T05:06:16Z,2020-12-17T05:06:16Z,OWNER,"So, an idea: what if Datasette's default CSS applied only to elements with classes - or maybe to children of a `body class=""datasette""` element? In such a way that you could write your own custom HTML that reused elements of Datasette's CSS - the cog menu styling for example - but only on an opt-in basis?","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",769520939,Make it easier to theme Datasette with CSS, https://github.com/simonw/datasette/issues/741#issuecomment-747208543,https://api.github.com/repos/simonw/datasette/issues/741,747208543,MDEyOklzc3VlQ29tbWVudDc0NzIwODU0Mw==,9599,simonw,2020-12-17T05:09:03Z,2020-12-17T05:09:03Z,OWNER,I really like this in `datasette-publish-vercel` - I'm definitely going to bring this to the other publish implementations as well.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",607223136,"Replace ""datasette publish --extra-options"" with ""--setting""", 
https://github.com/simonw/datasette/issues/1005#issuecomment-747209115,https://api.github.com/repos/simonw/datasette/issues/1005,747209115,MDEyOklzc3VlQ29tbWVudDc0NzIwOTExNQ==,9599,simonw,2020-12-17T05:11:04Z,2020-12-17T05:11:04Z,OWNER,Tracking ticket for the next HTTPX release is https://github.com/encode/httpx/pull/1403,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",718259202,Remove xfail tests when new httpx is released, https://github.com/simonw/datasette/issues/461#issuecomment-747734273,https://api.github.com/repos/simonw/datasette/issues/461,747734273,MDEyOklzc3VlQ29tbWVudDc0NzczNDI3Mw==,9599,simonw,2020-12-17T22:14:46Z,2020-12-17T22:14:46Z,OWNER,"I've been thinking about this a bunch. For Datasette to be useful as a private repository of data (Datasette Library, #417) it's crucial that it can handle a much, much larger number of databases. This makes me worry about how many connections (and open file handles) it makes sense to have open at one time. I realize now that this is much less of a problem for private instances. Public instances on the internet could get traffic to any database at any time, so connections could easily get out of control. A private instance with only a few users could instead get away with only opening connections to databases in ""active use"". 
This does however make it even more important for Datasette to maintain a cached set of metadata about the tables - which is also needed to power this feature.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",443021509,Paginate + search for databases/tables on the homepage, https://github.com/simonw/datasette/issues/1150#issuecomment-747754082,https://api.github.com/repos/simonw/datasette/issues/1150,747754082,MDEyOklzc3VlQ29tbWVudDc0Nzc1NDA4Mg==,9599,simonw,2020-12-17T23:04:13Z,2020-12-17T23:04:13Z,OWNER,"Pages that need a list of all databases - the index page and /-/databases for example - could trigger a ""check for new directories in the configured directories"" scan. That scan would run at most once every 5 (n) seconds - if the check is triggered and the scan has run more recently than that, it doesn’t run again. Hopefully this means it could be done as a blocking operation, rather than trying to run it in a thread. When it runs it scans for *.db or *.sqlite files (maybe one or two other extensions) that it hasn’t seen before. It also checks that the files in the existing list of known databases still exist. If it finds any new ones it connects to them once to run `.schema`. It also runs `PRAGMA schema_version` on each known database so that it can compare the schema version number to the last one it saw. 
That's how it detects if there are new tables or if the cached schema needs to be updated.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770436876,Maintain an in-memory SQLite table of connected databases and their tables, https://github.com/simonw/datasette/issues/1150#issuecomment-747754229,https://api.github.com/repos/simonw/datasette/issues/1150,747754229,MDEyOklzc3VlQ29tbWVudDc0Nzc1NDIyOQ==,9599,simonw,2020-12-17T23:04:38Z,2020-12-17T23:04:38Z,OWNER,"Open question: will this work for hundreds of database files, or is the overhead of connecting to each of 100 databases in turn to run `PRAGMA schema_version` too high?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770436876,Maintain an in-memory SQLite table of connected databases and their tables, https://github.com/simonw/datasette/issues/1150#issuecomment-747764712,https://api.github.com/repos/simonw/datasette/issues/1150,747764712,MDEyOklzc3VlQ29tbWVudDc0Nzc2NDcxMg==,9599,simonw,2020-12-17T23:16:31Z,2020-12-17T23:16:31Z,OWNER,"Quick micro-benchmark, run against a folder with 46 database files adding up to 1.4GB total: ```python import pathlib, sqlite3, time paths = list(pathlib.Path(""."").glob('*.db')) def schema_version(path): db = sqlite3.connect(path) version = db.execute(""PRAGMA schema_version"").fetchall()[0] db.close() return version def all(): versions = {} for path in paths: versions[path.name] = schema_version(path) return versions start = time.time(); all(); print(time.time() - start) # 0.012346982955932617 ``` So that's 12ms. 
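A minimal sketch of how the scan discussed in this thread might build on that measurement: compare each file's `PRAGMA schema_version` with the last value seen, and skip the scan entirely if it ran too recently. `SchemaScanner`, `scan()` and `min_interval` are hypothetical names used for illustration only, not part of Datasette, and the sketch assumes all databases are `*.db` files in a single directory.

```python
import pathlib
import sqlite3
import time


def schema_version(path):
    # Open the file, read its schema version, then close the connection again
    db = sqlite3.connect(str(path))
    try:
        return db.execute('PRAGMA schema_version').fetchone()[0]
    finally:
        db.close()


class SchemaScanner:
    # Hypothetical helper for illustration, not a Datasette class

    def __init__(self, directory, min_interval=5.0):
        self.directory = pathlib.Path(directory)
        self.min_interval = min_interval  # seconds between real scans
        self.versions = {}  # file name -> last schema_version seen
        self.last_scan = None

    def scan(self):
        # Returns (new, changed, removed) lists of file names,
        # or None if the previous scan ran less than min_interval seconds ago
        now = time.monotonic()
        if self.last_scan is not None and now - self.last_scan < self.min_interval:
            return None
        self.last_scan = now
        # Non-recursive glob: only files directly inside the directory
        current = {p.name: schema_version(p) for p in self.directory.glob('*.db')}
        new = sorted(n for n in current if n not in self.versions)
        changed = sorted(
            n for n in current
            if n in self.versions and current[n] != self.versions[n]
        )
        removed = sorted(n for n in self.versions if n not in current)
        self.versions = current
        return new, changed, removed
```

A caller would keep one scanner for the lifetime of the server and refresh its cached table metadata only for the file names returned in the first two lists.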
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770436876,Maintain an in-memory SQLite table of connected databases and their tables, https://github.com/simonw/datasette/issues/1150#issuecomment-747766310,https://api.github.com/repos/simonw/datasette/issues/1150,747766310,MDEyOklzc3VlQ29tbWVudDc0Nzc2NjMxMA==,9599,simonw,2020-12-17T23:20:49Z,2020-12-17T23:20:49Z,OWNER,"I tried against my entire `~/Development/Dropbox` folder - deeply nested with 381 SQLite database files in sub-folders - and it took 25s! But it turned out 23.9s of that was the call to `pathlib.Path(""/Users/simon/Dropbox/Development"").glob('**/*.db')`. So it looks like connecting to a SQLite database file and getting the schema version is extremely fast. Scanning directories is slower.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770436876,Maintain an in-memory SQLite table of connected databases and their tables, https://github.com/simonw/datasette/issues/1150#issuecomment-747767055,https://api.github.com/repos/simonw/datasette/issues/1150,747767055,MDEyOklzc3VlQ29tbWVudDc0Nzc2NzA1NQ==,9599,simonw,2020-12-17T23:22:41Z,2020-12-17T23:22:41Z,OWNER,"It's just recursion that's expensive. I created 380 empty SQLite databases in a folder and timed `list(pathlib.Path(""/tmp"").glob(""*.db""));` and it took 0.002s. 
So maybe I tell users that all SQLite databases have to be in the root folder.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770436876,Maintain an in-memory SQLite table of connected databases and their tables, https://github.com/simonw/datasette/issues/1150#issuecomment-747767499,https://api.github.com/repos/simonw/datasette/issues/1150,747767499,MDEyOklzc3VlQ29tbWVudDc0Nzc2NzQ5OQ==,9599,simonw,2020-12-17T23:23:44Z,2020-12-17T23:23:44Z,OWNER,Grabbing the schema version of 380 files in the root directory takes 70ms.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770436876,Maintain an in-memory SQLite table of connected databases and their tables, https://github.com/simonw/datasette/issues/1150#issuecomment-747767598,https://api.github.com/repos/simonw/datasette/issues/1150,747767598,MDEyOklzc3VlQ29tbWVudDc0Nzc2NzU5OA==,9599,simonw,2020-12-17T23:24:03Z,2020-12-17T23:24:03Z,OWNER,"I'm going to assume that even the heaviest user will have trouble going beyond a few hundred database files, so this is fine.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770436876,Maintain an in-memory SQLite table of connected databases and their tables, https://github.com/simonw/datasette/issues/1150#issuecomment-747768112,https://api.github.com/repos/simonw/datasette/issues/1150,747768112,MDEyOklzc3VlQ29tbWVudDc0Nzc2ODExMg==,9599,simonw,2020-12-17T23:25:21Z,2020-12-17T23:25:21Z,OWNER,"Next challenge: figure out how to use the `Database` class from https://github.com/simonw/datasette/blob/0.53/datasette/database.py for an in-memory database which persists data for the duration of the lifetime of the server, and allows access to that in-memory database from multiple threads in a way that lets them see each other's 
changes.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770436876,Maintain an in-memory SQLite table of connected databases and their tables, https://github.com/simonw/datasette/issues/1151#issuecomment-747769830,https://api.github.com/repos/simonw/datasette/issues/1151,747769830,MDEyOklzc3VlQ29tbWVudDc0Nzc2OTgzMA==,9599,simonw,2020-12-17T23:29:08Z,2020-12-17T23:29:08Z,OWNER,"https://sqlite.org/inmemorydb.html > The database ceases to exist as soon as the database connection is closed. Every :memory: database is distinct from every other. So, opening two database connections each with the filename "":memory:"" will create two independent in-memory databases. > > [...] > > The special `"":memory:""` filename also works when using URI filenames. For example: > > rc = sqlite3_open(""file::memory:"", &db); > > [...] > > However, the same in-memory database can be opened by two or more database connections as follows: > > rc = sqlite3_open(""file::memory:?cache=shared"", &db); > > [...] 
> If two or more distinct but shareable in-memory databases are needed in a single process, then the mode=memory query parameter can be used with a URI filename to create a named in-memory database: > > rc = sqlite3_open(""file:memdb1?mode=memory&cache=shared"", &db); ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770448622,Database class mechanism for cross-connection in-memory databases, https://github.com/simonw/datasette/issues/1151#issuecomment-747770082,https://api.github.com/repos/simonw/datasette/issues/1151,747770082,MDEyOklzc3VlQ29tbWVudDc0Nzc3MDA4Mg==,9599,simonw,2020-12-17T23:29:53Z,2020-12-17T23:29:53Z,OWNER,I'm going to try with `file:datasette?mode=memory&cache=shared`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770448622,Database class mechanism for cross-connection in-memory databases, https://github.com/simonw/datasette/issues/1151#issuecomment-747770581,https://api.github.com/repos/simonw/datasette/issues/1151,747770581,MDEyOklzc3VlQ29tbWVudDc0Nzc3MDU4MQ==,9599,simonw,2020-12-17T23:31:18Z,2020-12-17T23:32:07Z,OWNER,"This works in `ipython`: ``` In [1]: import sqlite3 In [2]: c1 = sqlite3.connect(""file:datasette?mode=memory&cache=shared"", uri=True) In [3]: c2 = sqlite3.connect(""file:datasette?mode=memory&cache=shared"", uri=True) In [4]: c1.executescript(""CREATE TABLE hello (world TEXT)"") Out[4]: In [5]: c1.execute(""select * from sqlite_master"").fetchall() Out[5]: [('table', 'hello', 'hello', 2, 'CREATE TABLE hello (world TEXT)')] In [6]: c2.execute(""select * from sqlite_master"").fetchall() Out[6]: [('table', 'hello', 'hello', 2, 'CREATE TABLE hello (world TEXT)')] In [7]: c3 = sqlite3.connect(""file:datasette?mode=memory&cache=shared"", uri=True) In [9]: c3.execute(""select * from sqlite_master"").fetchall() Out[9]: [('table', 'hello', 'hello', 2, 'CREATE TABLE 
hello (world TEXT)')] In [10]: c4 = sqlite3.connect(""file:datasette?mode=memory"", uri=True) In [11]: c4.execute(""select * from sqlite_master"").fetchall() Out[11]: [] ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770448622,Database class mechanism for cross-connection in-memory databases, https://github.com/simonw/datasette/issues/1151#issuecomment-747774855,https://api.github.com/repos/simonw/datasette/issues/1151,747774855,MDEyOklzc3VlQ29tbWVudDc0Nzc3NDg1NQ==,9599,simonw,2020-12-17T23:42:34Z,2020-12-17T23:42:34Z,OWNER,"This worked as a prototype: ```diff diff --git a/datasette/database.py b/datasette/database.py index 412e0c5..a90e617 100644 --- a/datasette/database.py +++ b/datasette/database.py @@ -24,11 +24,12 @@ connections = threading.local() class Database: - def __init__(self, ds, path=None, is_mutable=False, is_memory=False): + def __init__(self, ds, path=None, is_mutable=False, is_memory=False, uri=None): self.ds = ds self.path = path self.is_mutable = is_mutable self.is_memory = is_memory + self.uri = uri self.hash = None self.cached_size = None self.cached_table_counts = None @@ -46,6 +47,8 @@ class Database: } def connect(self, write=False): + if self.uri: + return sqlite3.connect(self.uri, uri=True, check_same_thread=False) if self.is_memory: return sqlite3.connect("":memory:"") # mode=ro or immutable=1? 
``` Then in `ipython`: ``` from datasette.app import Datasette from datasette.database import Database ds = Datasette([]) db = Database(ds, uri=""file:datasette?mode=memory&cache=shared"", is_memory=True) await db.execute_write(""create table foo (bar text)"") await db.table_names() # Outputs [""foo""] db2 = Database(ds, uri=""file:datasette?mode=memory&cache=shared"", is_memory=True) await db2.table_names() # Also outputs [""foo""] ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770448622,Database class mechanism for cross-connection in-memory databases, https://github.com/simonw/datasette/issues/1151#issuecomment-747775245,https://api.github.com/repos/simonw/datasette/issues/1151,747775245,MDEyOklzc3VlQ29tbWVudDc0Nzc3NTI0NQ==,9599,simonw,2020-12-17T23:43:41Z,2020-12-17T23:56:27Z,OWNER,"I'm going to add an argument to the `Database()` constructor which means ""connect to named in-memory database called X"". ```python db = Database(ds, memory_name=""datasette"") ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770448622,Database class mechanism for cross-connection in-memory databases, https://github.com/simonw/datasette/issues/1151#issuecomment-747775792,https://api.github.com/repos/simonw/datasette/issues/1151,747775792,MDEyOklzc3VlQ29tbWVudDc0Nzc3NTc5Mg==,9599,simonw,2020-12-17T23:45:20Z,2020-12-17T23:45:20Z,OWNER,"Do I use the current `is_memory=` boolean anywhere at the moment? https://ripgrep.datasette.io/-/ripgrep?pattern=is_memory - doesn't look like it. 
I may remove that feature, since it's not actually useful, and replace it with a mechanism for creating shared named memory databases instead.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770448622,Database class mechanism for cross-connection in-memory databases, https://github.com/simonw/datasette/issues/1151#issuecomment-747779056,https://api.github.com/repos/simonw/datasette/issues/1151,747779056,MDEyOklzc3VlQ29tbWVudDc0Nzc3OTA1Ng==,9599,simonw,2020-12-17T23:55:57Z,2020-12-17T23:55:57Z,OWNER,Wait I do use it - if you run `datasette --memory` - which is useful for trying things out in SQL that doesn't need to run against a table.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770448622,Database class mechanism for cross-connection in-memory databases,