html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/datasette/pull/672#issuecomment-586067794,https://api.github.com/repos/simonw/datasette/issues/672,586067794,MDEyOklzc3VlQ29tbWVudDU4NjA2Nzc5NA==,9599,2020-02-14T02:29:16Z,2020-02-14T02:29:16Z,OWNER,"One design issue: how to pick neat unique names for database files in a file hierarchy?
Here's what I have so far:
https://github.com/simonw/datasette/blob/fe6f9e6a7397cab2e4bc57745a8da9d824dad218/datasette/app.py#L231-L237
For these files:
```
../travel-old.db
../sf-tree-history/trees.db
../library-of-congress/records-from-df.db
```
It made these names:
```
travel-old
sf-tree-history_trees
library-of-congress_records-from-df
```
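A minimal sketch of one way to produce names like these, assuming each path is taken relative to the scanned root directory and its parts are joined with underscores. The function name `name_for_path` and the `root` argument are hypothetical, and this is not the code linked above:

```python
from pathlib import PurePosixPath


def name_for_path(path, root):
    # Path of the file relative to the directory being scanned
    relative = PurePosixPath(path).relative_to(PurePosixPath(root))
    # Drop the final extension (.db), then join the remaining parts
    return '_'.join(relative.with_suffix('').parts)


print(name_for_path('../sf-tree-history/trees.db', '..'))
```

Under this scheme `a_b.db` and `a/b.db` would both map to `a_b`, so that collision is one case a test should cover.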
Maybe this is good enough? Needs some tests.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586068095,https://api.github.com/repos/simonw/datasette/issues/672,586068095,MDEyOklzc3VlQ29tbWVudDU4NjA2ODA5NQ==,9599,2020-02-14T02:30:37Z,2020-02-14T02:30:46Z,OWNER,"This can take a LONG time to run, and at the moment it's blocking and prevents Datasette from starting up.
It would be much better if this ran in a thread, or an asyncio task. Probably have to be a thread because there's no easy `async` version of `pathlib.Path.glob()` that I've seen.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586069529,https://api.github.com/repos/simonw/datasette/issues/672,586069529,MDEyOklzc3VlQ29tbWVudDU4NjA2OTUyOQ==,9599,2020-02-14T02:37:17Z,2020-02-14T02:37:17Z,OWNER,"Another problem: if any of the found databases use SpatiaLite then Datasette will fail to start at all.
It should skip them instead.
The `select * from sqlite_master` check apparently isn't quite enough to catch this case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586107989,https://api.github.com/repos/simonw/datasette/issues/672,586107989,MDEyOklzc3VlQ29tbWVudDU4NjEwNzk4OQ==,9599,2020-02-14T05:45:12Z,2020-02-14T05:45:12Z,OWNER,"I tried running the `scan_dirs()` method in a thread and got an interesting error while trying to load the homepage: `RuntimeError: OrderedDict mutated during iteration`
Makes sense - I had a thread that added an item to that dictionary right while the homepage was attempting to run this code:
https://github.com/simonw/datasette/blob/efa54b439fd0394440c302602b919255047b59c5/datasette/views/index.py#L24-L27
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586109032,https://api.github.com/repos/simonw/datasette/issues/672,586109032,MDEyOklzc3VlQ29tbWVudDU4NjEwOTAzMg==,9599,2020-02-14T05:50:15Z,2020-02-14T05:50:15Z,OWNER,"So I need to ensure the `ds.databases` data structure is manipulated in a thread-safe manner.
Mainly I need to ensure that it is locked during iterations over it, then unlocked at the end.
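A rough sketch of that locking idea, using a module-level dictionary and lock as stand-ins for `ds.databases`. The names `databases_lock`, `add_database`, `database_names` and `scan` are all hypothetical:

```python
import threading
from collections import OrderedDict

databases = OrderedDict()
databases_lock = threading.Lock()


def add_database(name, db):
    # Writers take the lock before mutating the dictionary
    with databases_lock:
        databases[name] = db


def database_names():
    # Readers hold the same lock for the whole iteration
    with databases_lock:
        return [name for name, db in databases.items()]


def scan(count):
    # Stand-in for a background scanner that keeps adding databases
    for i in range(count):
        add_database('db{}'.format(i), object())


scanner = threading.Thread(target=scan, args=(1000,), daemon=True)
scanner.start()
while scanner.is_alive():
    database_names()  # iteration and mutation never overlap
scanner.join()
print(len(database_names()))
```

Every reader and writer has to go through the lock for this to hold, which is the main cost of the approach.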
Trickiest part is probably ensuring there is a test that proves this is working - I feel like I got lucky encountering that `RuntimeError` as early as I did.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586109238,https://api.github.com/repos/simonw/datasette/issues/672,586109238,MDEyOklzc3VlQ29tbWVudDU4NjEwOTIzOA==,9599,2020-02-14T05:51:12Z,2020-02-14T05:51:12Z,OWNER,"... or maybe I can cheat and wrap the access to `self.ds.databases.items()` in `list()`, so I'm iterating over an atomically-created list of those things instead? I'll try that first.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586109784,https://api.github.com/repos/simonw/datasette/issues/672,586109784,MDEyOklzc3VlQ29tbWVudDU4NjEwOTc4NA==,9599,2020-02-14T05:53:50Z,2020-02-14T05:54:21Z,OWNER,"... cheating like this seems to work:
```
for name, db in list(self.ds.databases.items()):
```
Python built-in operations are supposedly threadsafe, so in this case I can grab a copy of the list atomically (I think) and then safely iterate over it.
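A small single-threaded illustration of the failure mode and the workaround. It does not prove thread safety (that still rests on the assumption that `list()` over the items runs atomically in CPython), it only shows that a snapshot is unaffected by later additions:

```python
from collections import OrderedDict

databases = OrderedDict([('one', 1), ('two', 2)])

# Iterating over the live view fails as soon as a key is added
error = None
try:
    for name, db in databases.items():
        databases['copy_of_' + name] = db
except RuntimeError as ex:
    error = str(ex)

# Iterating over a list() snapshot is unaffected by later additions
seen = []
for name, db in list(databases.items()):
    databases['again_' + name] = db
    seen.append(name)

print(error)
print(seen)
```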
Seems to work in my testing. Wish I could prove it with a unit test though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586111102,https://api.github.com/repos/simonw/datasette/issues/672,586111102,MDEyOklzc3VlQ29tbWVudDU4NjExMTEwMg==,9599,2020-02-14T05:59:24Z,2020-02-14T06:00:36Z,OWNER,"Interesting new problem: hitting Ctrl+C no longer terminates the program while the `scan_dirs()` thread is still running.
https://stackoverflow.com/questions/49992329/the-workers-in-threadpoolexecutor-is-not-really-daemon has clues. The workers are only meant to exit when their worker queues are empty.
But... I want to run the worker every 10 seconds. How do I do that without having it loop forever and hence never quit?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586111619,https://api.github.com/repos/simonw/datasette/issues/672,586111619,MDEyOklzc3VlQ29tbWVudDU4NjExMTYxOQ==,9599,2020-02-14T06:01:24Z,2020-02-14T06:01:24Z,OWNER,https://gist.github.com/clchiou/f2608cbe54403edb0b13 might work.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586112662,https://api.github.com/repos/simonw/datasette/issues/672,586112662,MDEyOklzc3VlQ29tbWVudDU4NjExMjY2Mg==,9599,2020-02-14T06:05:27Z,2020-02-14T06:05:27Z,OWNER,I think the fix is to use an old-fashioned `threading` module daemon thread directly. That should exit cleanly when the program exits.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-586441484,https://api.github.com/repos/simonw/datasette/issues/672,586441484,MDEyOklzc3VlQ29tbWVudDU4NjQ0MTQ4NA==,9599,2020-02-14T19:34:25Z,2020-02-14T19:34:25Z,OWNER,"I've figured out how to tell if a database is safe to open or not:
```sql
select sql from sqlite_master where sql like 'CREATE VIRTUAL TABLE%';
```
This returns the SQL definitions for virtual tables. The bit after `using` tells you what they need.
Run this against a SpatiaLite database and you get the following:
```sql
CREATE VIRTUAL TABLE SpatialIndex USING VirtualSpatialIndex()
CREATE VIRTUAL TABLE ElementaryGeometries USING VirtualElementary()
```
Run it against an Apple Photos `photos.db` file (found with `find ~/Library | grep photos.db`) and you get this (partial list):
```sql
CREATE VIRTUAL TABLE RidList_VirtualReader using RidList_VirtualReaderModule
CREATE VIRTUAL TABLE Array_VirtualReader using Array_VirtualReaderModule
CREATE VIRTUAL TABLE LiGlobals_VirtualBufferReader using VirtualBufferReaderModule
CREATE VIRTUAL TABLE RKPlace_RTree using rtree (modelId,minLongitude,maxLongitude,minLatitude,maxLatitude)
```
For a database with FTS4 you get:
```sql
CREATE VIRTUAL TABLE ""docs_fts"" USING FTS4 (
[title], [content], content=""docs""
)
```
FTS5:
```sql
CREATE VIRTUAL TABLE [FARA_All_Registrants_fts] USING FTS5 (
[Name], [Address_1], [Address_2],
content=[FARA_All_Registrants]
)
```
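The module name after `using` in definitions like these can be pulled out with a regular expression. A minimal sketch, where the `KNOWN_MODULES` allow-list and both function names are hypothetical and the choice of which modules count as safe is an assumption:

```python
import re

# Hypothetical allow-list; which modules are really safe is an assumption
KNOWN_MODULES = {'fts3', 'fts4', 'fts5', 'rtree'}

USING_RE = re.compile(
    r'CREATE\s+VIRTUAL\s+TABLE\s+.+?\s+USING\s+(\w+)',
    re.IGNORECASE | re.DOTALL,
)


def virtual_table_module(sql):
    # Return the lower-cased module name after USING, or None
    match = USING_RE.match(sql)
    return match.group(1).lower() if match else None


def unsupported_modules(definitions):
    # Modules used by the database that are not on the allow-list
    modules = {virtual_table_module(sql) for sql in definitions}
    return sorted(m for m in modules if m and m not in KNOWN_MODULES)


print(unsupported_modules([
    'CREATE VIRTUAL TABLE SpatialIndex USING VirtualSpatialIndex()',
    'CREATE VIRTUAL TABLE RKPlace_RTree using rtree (modelId)',
]))
```

A database would then be skipped whenever `unsupported_modules()` returns a non-empty list for the rows from the query above.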
So I can use this to figure out all of the `using` pieces and then compare them to a list of known supported ones.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-604561639,https://api.github.com/repos/simonw/datasette/issues/672,604561639,MDEyOklzc3VlQ29tbWVudDYwNDU2MTYzOQ==,9599,2020-03-26T17:22:07Z,2020-03-26T17:22:07Z,OWNER,"Here's the new utility function I should be using to verify database files that I find:
https://github.com/simonw/datasette/blob/6aa516d82dea9885cb4db8d56ec2ccfd4cd9b840/datasette/utils/__init__.py#L773-L787","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-604569063,https://api.github.com/repos/simonw/datasette/issues/672,604569063,MDEyOklzc3VlQ29tbWVudDYwNDU2OTA2Mw==,9599,2020-03-26T17:32:06Z,2020-03-26T17:32:06Z,OWNER,"While running it against a nested directory with a TON of databases I kept seeing errors like this:
```
Traceback (most recent call last):
  File ""/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py"", line 121, in route_path
    return await view(new_scope, receive, send)
  File ""/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py"", line 193, in view
    request, **scope[""url_route""][""kwargs""]
  File ""/Users/simonw/Dropbox/Development/datasette/datasette/views/index.py"", line 58, in get
    tables[table][""num_relationships_for_sorting""] = count
KeyError: 'primary-candidates-2018/rep_candidates'
```
And
```
Traceback (most recent call last):
  File ""/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py"", line 121, in route_path
    return await view(new_scope, receive, send)
  File ""/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py"", line 193, in view
    request, **scope[""url_route""][""kwargs""]
  File ""/Users/simonw/Dropbox/Development/datasette/datasette/views/index.py"", line 58, in get
    tables[table][""num_relationships_for_sorting""] = count
KeyError: 'space_used'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-604665229,https://api.github.com/repos/simonw/datasette/issues/672,604665229,MDEyOklzc3VlQ29tbWVudDYwNDY2NTIyOQ==,9599,2020-03-26T20:22:48Z,2020-03-26T20:22:48Z,OWNER,"I also eventually get this error:
```
Traceback (most recent call last):
  File ""/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py"", line 121, in route_path
    return await view(new_scope, receive, send)
  File ""/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py"", line 336, in inner_static
    await asgi_send_file(send, full_path, chunk_size=chunk_size)
  File ""/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py"", line 303, in asgi_send_file
    async with aiofiles.open(str(filepath), mode=""rb"") as fp:
  File ""/Users/simonw/.local/share/virtualenvs/datasette-oJRYYJuA/lib/python3.7/site-packages/aiofiles/base.py"", line 78, in __aenter__
  File ""/Users/simonw/.local/share/virtualenvs/datasette-oJRYYJuA/lib/python3.7/site-packages/aiofiles/threadpool/__init__.py"", line 35, in _open
  File ""/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py"", line 57, in run
OSError: [Errno 24] Too many open files: '/Users/simonw/Dropbox/Development/datasette/datasette/static/app.css'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,
https://github.com/simonw/datasette/pull/672#issuecomment-604667029,https://api.github.com/repos/simonw/datasette/issues/672,604667029,MDEyOklzc3VlQ29tbWVudDYwNDY2NzAyOQ==,9599,2020-03-26T20:26:46Z,2020-03-26T20:26:46Z,OWNER,"I think I can tell what the current file limit is like so:
```
In [1]: import resource
In [2]: resource.getrlimit(resource.RLIMIT_NOFILE)
Out[2]: (256, 9223372036854775807)
```
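The first value in that tuple is the soft limit and the second is the hard limit. A rough sketch of turning the soft limit into a cap on database files while keeping a few descriptors in reserve; both function names and the default reserve are illustrative assumptions, and the `resource` module is only available on Unix-like systems:

```python
import resource


def cap_from_limit(soft_limit, reserve=5):
    # Leave `reserve` descriptors free for static files, sockets and so on
    return max(soft_limit - reserve, 0)


def max_open_databases(reserve=5):
    soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return None  # no limit to respect
    return cap_from_limit(soft_limit, reserve)


print(cap_from_limit(256))
```

This assumes one file descriptor per database. SQLite can open extra files alongside a database (a journal or WAL file, for example), so the real headroom may need to be larger.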
So maybe I should have Datasette refuse to open more database files than that number minus 5 (to give me some spare room for opening CSS files etc).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",565064079,