issue_comments
13 rows where issue = 612287234
This data as json, CSV (advanced)
Suggested facets: user, author_association, reactions, created_at (date), updated_at (date)
id ▼ | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
623805823 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623805823 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzgwNTgyMw== | simonw 9599 | 2020-05-05T02:45:56Z | 2020-05-05T02:45:56Z | MEMBER | I filed an issue with `osxphotos` about this here: https://github.com/RhetTbull/osxphotos/issues/121 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623806085 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623806085 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzgwNjA4NQ== | simonw 9599 | 2020-05-05T02:47:18Z | 2020-05-05T02:47:18Z | MEMBER | In https://github.com/RhetTbull/osxphotos/issues/121#issuecomment-623249263 Rhet Turnbull spotted a table called `ZSCENEIDENTIFIER` which looked like it might have the right data, but the columns in it aren't particularly helpful: ``` Z_PK,Z_ENT,Z_OPT,ZSCENEIDENTIFIER,ZASSETATTRIBUTES,ZCONFIDENCE 8,49,1,731,5,0.11834716796875 9,49,1,684,6,0.0233648251742125 10,49,1,1702,1,0.026153564453125 ``` I love the look of those confidence scores, but what do the numbers mean? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623806533 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623806533 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzgwNjUzMw== | simonw 9599 | 2020-05-05T02:50:16Z | 2020-05-05T02:50:16Z | MEMBER | I figured there must be a separate database that Photos uses to store the text of the identified labels. I used "Open Files and Ports" in Activity Monitor against the Photos app to try and spot candidates... and found `/Users/simon/Pictures/Photos Library.photoslibrary/database/search/psi.sqlite` - a 53MB SQLite database file. <img width="1365" alt="Item-0_and_Item-0_and_Item-0_and_Item-0" src="https://user-images.githubusercontent.com/9599/81031213-61ea0800-8e40-11ea-8237-cfce4a5128e0.png"> Here's the schema of that file: ``` $ sqlite3 psi.sqlite .schema CREATE TABLE word_embedding(word TEXT, extended_word TEXT, score DOUBLE); CREATE INDEX word_embedding_index ON word_embedding(word); CREATE VIRTUAL TABLE word_embedding_prefix USING fts5(extended_word) /* word_embedding_prefix(extended_word) */; CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_data'(id INTEGER PRIMARY KEY, block BLOB); CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID; CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_content'(id INTEGER PRIMARY KEY, c0); CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_docsize'(id INTEGER PRIMARY KEY, sz BLOB); CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_config'(k PRIMARY KEY, v) WITHOUT ROWID; CREATE TABLE groups(category INT2, owning_groupid INT, content_string TEXT, normalized_string TEXT, lookup_identifier TEXT, token_ranges_0 INT8, token_ranges_1 INT8, UNIQUE(category, owning_groupid, content_string, lookup_identifier, token_ranges_0, token_ranges_1)); CREATE TABLE assets(uuid_0 INT, uuid_1 INT, creationDate INT, UNIQUE(uuid_0, uuid_1)); CREATE TABLE ga(groupid INT, assetid INT, PRIMARY KEY(groupid, assetid)); CREATE TABLE collections(uuid_0 INT, uuid_1 INT, startDate INT, endDate INT, title TEXT, subtitle TEXT, keyAssetUUID_0 INT, keyAssetUUID_1 INT, typeAndNumberOfAssets INT32, sortDate DOUBLE, UNIQUE(uuid_0, uuid_1)); CREATE TABLE gc(groupid INT, collectionid INT, PRIMARY KEY(groupid, collectionid)); CREATE… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623806687 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623806687 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzgwNjY4Nw== | simonw 9599 | 2020-05-05T02:51:16Z | 2020-05-05T02:51:16Z | MEMBER | Running datasette against it directly doesn't work: ``` simon@Simons-MacBook-Pro search % datasette psi.sqlite Serve! files=('psi.sqlite',) (immutables=()) on port 8001 Usage: datasette serve [OPTIONS] [FILES]... Error: Connection to psi.sqlite failed check: no such tokenizer: PSITokenizer ``` Instead, I created a new SQLite database with a copy of some of the key tables, like this: ``` sqlite-utils rows psi.sqlite groups | sqlite-utils insert /tmp/search.db groups - sqlite-utils rows psi.sqlite assets | sqlite-utils insert /tmp/search.db assets - sqlite-utils rows psi.sqlite ga | sqlite-utils insert /tmp/search.db ga - sqlite-utils rows psi.sqlite collections | sqlite-utils insert /tmp/search.db collections - sqlite-utils rows psi.sqlite gc | sqlite-utils insert /tmp/search.db gc - sqlite-utils rows psi.sqlite lookup | sqlite-utils insert /tmp/search.db lookup - ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623807568 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623807568 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzgwNzU2OA== | simonw 9599 | 2020-05-05T02:56:06Z | 2020-05-05T02:56:06Z | MEMBER | I'm pretty sure this is what I'm after. The `groups` table has what looks like identified labels in the rows with category = 2025: <img width="1122" alt="words__groups__2_528_rows_where_where_category___2025" src="https://user-images.githubusercontent.com/9599/81031361-e0df4080-8e40-11ea-9060-6d850aa52140.png"> Then there's a `ga` table that maps groups to assets: <img width="304" alt="words__ga__633_653_rows" src="https://user-images.githubusercontent.com/9599/81031387-f48aa700-8e40-11ea-9a3d-da23903be928.png"> And an `assets` table which looks like it has one row for every one of my photos: <img width="645" alt="words__assets__40_419_rows" src="https://user-images.githubusercontent.com/9599/81031402-04a28680-8e41-11ea-8047-e9199d068563.png"> One major challenge: these UUIDs are split into two integer numbers, `uuid_0` and `uuid_1` - but the main photos database uses regular UUIDs like this: ![image](https://user-images.githubusercontent.com/9599/81031481-39164280-8e41-11ea-983b-005ced641a18.png) I need to figure out how to match up these two different UUID representations. I asked on Twitter if anyone has any ideas: https://twitter.com/simonw/status/1257500689019703296 | {"total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623811131 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623811131 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzgxMTEzMQ== | simonw 9599 | 2020-05-05T03:16:18Z | 2020-05-05T03:16:18Z | MEMBER | Here's how to convert two integers unto a UUID using Java. Not sure if it's the solution I need though (or how to do the same thing in Python): https://repl.it/repls/EuphoricSomberClasslibrary <img width="1494" alt="Repl_it_-_EuphoricSomberClasslibrary" src="https://user-images.githubusercontent.com/9599/81032267-0d488c00-8e44-11ea-9be7-680eaccd1611.png"> ```java import java.util.UUID; class Main { public static void main(String[] args) { java.util.UUID uuid = new java.util.UUID( 2544182952487526660L, -3640314103732024685L ); System.out.println( uuid ); } } ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623845014 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623845014 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzg0NTAxNA== | RhetTbull 41546558 | 2020-05-05T03:55:14Z | 2020-05-05T03:56:24Z | CONTRIBUTOR | I'm traveling w/o access to my Mac so can't help with any code right now. I suspected ZSCENEIDENTIFIER was a foreign key into one of these psi.sqlite tables. But looks like you're on to something connecting groups to assets. As for the UUID, I think there's two ints because each is 64-bits but UUIDs are 128-bits. Thus they need to be combined to get the 128 bit UUID. You might be able to use Apple's [NSUUID](https://developer.apple.com/documentation/foundation/nsuuid?language=objc), for example, by wrapping with pyObjC. Here's one [example](https://github.com/ronaldoussoren/pyobjc/blob/881c82a7ba90f193934b52b44143360c80dce5e5/pyobjc-framework-Cocoa/PyObjCTest/test_nsuuid.py) of using this in PyObjC's test suite. Interesting it's stored this way instead of a UUIDString as in Photos.sqlite. Perhaps it for faster indexing. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623846880 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623846880 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzg0Njg4MA== | simonw 9599 | 2020-05-05T04:06:08Z | 2020-05-05T04:06:08Z | MEMBER | This function seems to convert them into UUIDs that match my photos: ```python def to_uuid(uuid_0, uuid_1): b = uuid_0.to_bytes(8, 'little', signed=True) + uuid_1.to_bytes(8, 'little', signed=True) return str(uuid.UUID(bytes=b)).upper() ``` | {"total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623855841 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623855841 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzg1NTg0MQ== | simonw 9599 | 2020-05-05T04:54:28Z | 2020-05-05T04:54:28Z | MEMBER | Things were not matching up for me correctly: <img width="1143" alt="search__select_json_object__img_src____https___photos_simonwillison_net_i______photos_sha256___________photos_ext______w_400___as_photo__groups_content_string__assets_uuid_0__assets_uuid_1__to_uuid_assets_uuid_0__assets_uuid_1__as_uuid__pho" src="https://user-images.githubusercontent.com/9599/81035923-ca8db080-8e51-11ea-95a7-6ee60bae7502.png"> I think that's because my import script didn't correctly import the existing `rowid` values. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623855885 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623855885 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzg1NTg4NQ== | simonw 9599 | 2020-05-05T04:54:39Z | 2020-05-05T04:54:53Z | MEMBER | Trying this import mechanism instead: `sqlite3 /Users/simon/Pictures/Photos\ Library.photoslibrary/database/search/psi.sqlite .dump | grep -v 'CREATE INDEX' | grep -v 'CREATE TRIGGER' | grep -v 'CREATE VIRTUAL TABLE' | sqlite3 search.db` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623857417 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623857417 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzg1NzQxNw== | simonw 9599 | 2020-05-05T05:01:47Z | 2020-05-05T05:01:47Z | MEMBER | Even that didn't work - it didn't copy across the rowid values. I'm pretty sure that's what's wrong here: ``` sqlite3 /Users/simon/Pictures/Photos\ Library.photoslibrary/database/search/psi.sqlite 'select rowid, uuid_0, uuid_1 from assets limit 10' 1619605|-9205353363298198838|4814875488794983828 1641378|-9205348195631362269|390804289838822030 1634974|-9205331524553603243|-3834026796261633148 1619083|-9205326176986145401|7563404215614709654 22131|-9205315724827218763|8370531509591906734 1645633|-9205247376092758131|-1311540150497601346 ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623863902 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623863902 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzg2MzkwMg== | simonw 9599 | 2020-05-05T05:31:53Z | 2020-05-05T05:31:53Z | MEMBER | Yes! Turning those `rowid` values into `id` with this script did the job: ```python import sqlite3 import sqlite_utils conn = sqlite3.connect( "/Users/simon/Pictures/Photos Library.photoslibrary/database/search/psi.sqlite" ) def all_rows(table): result = conn.execute("select rowid as id, * from {}".format(table)) cols = [c[0] for c in result.description] for row in result.fetchall(): yield dict(zip(cols, row)) if __name__ == "__main__": db = sqlite_utils.Database("psi_copy.db") for table in ("assets", "collections", "ga", "gc", "groups"): db[table].upsert_all(all_rows(table), pk="id", alter=True) ``` Then I ran this query: ```sql select json_object('img_src', 'https://photos.simonwillison.net/i/' || photos.sha256 || '.' || photos.ext || '?w=400') as photo, group_concat(strip_null_chars(groups.content_string), ' ') as words, assets.uuid_0, assets.uuid_1, to_uuid(assets.uuid_0, assets.uuid_1) as uuid from assets join ga on assets.id = ga.assetid join groups on ga.groupid = groups.id join photos on photos.uuid = to_uuid(assets.uuid_0, assets.uuid_1) where groups.category = 2024 group by assets.id order by random() limit 10 ``` And got these results! <img width="1054" alt="psi_copy__select_json_object__img_src____https___photos_simonwillison_net_i______photos_sha256___________photos_ext______w_400___as_photo__group_concat_strip_null_chars_groups_content_string________as_words__assets_uuid_0__assets_uuid_1__to" src="https://user-images.githubusercontent.com/9599/81037264-f1021a80-8e56-11ea-9924-6f9f55a0fb4b.png"> | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 | |
623865250 | https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623865250 | https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16 | MDEyOklzc3VlQ29tbWVudDYyMzg2NTI1MA== | simonw 9599 | 2020-05-05T05:38:16Z | 2020-05-05T05:38:16Z | MEMBER | It looks like `groups.content_string` often has a null byte in it. I should clean this up as part of the import. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234 |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [issue] INTEGER REFERENCES [issues]([id]) , [performed_via_github_app] TEXT); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);