Zwabel’s Weblog

March 29, 2009

Portable Meta-Information

Filed under: KDE — zwabel @ 9:55 am

KDE4 is all about new technologies and standardization. We now have a central mechanism for storing metadata, called Nepomuk. However, it basically still follows the somewhat problematic approach of storing all metadata in one central place.

I think there is nothing more valuable than the user’s data, and meta-information such as song ratings, tags, or comments attached to a file is user-generated data that needs to be treated as carefully as the files themselves.

I have used many different applications in my lifetime: different email applications, different music players, image-management software, etc. All of them kept the user-generated meta-information locked inside the application, which means that when the lifetime of the application is over, the information is lost or, with luck, can be exported with some effort into some reusable format.

Due to those experiences, application-specific meta-information has only a low value to me. I think, for the future, we need to find a way to keep the user’s data together, so it is as persistent and approachable as the files themselves:
– When the user copies his photo archive or backs it up to a CD, no matter what application he uses, meta-information like ratings, comments, or tags has to move together with the photos
– When the user has a fresh install, and copies his photo archive from a CD to the disk, the meta-information for the photos should be just there
– User-generated meta-data should _never_ be lost just because a file/directory was renamed, a mount-point changed, or whatever
– User-generated meta-data should not be lost when a file completely unrelated to the item (such as a database) is damaged or deleted
– In 20 years, when KDE4 is history for a long time, and I find an old photo backup CD, the meta-data should still be readable

When these conditions are met, metadata would finally be worthwhile. But how can we get there?

I think with Nepomuk and Strigi we have most of the needed infrastructure available; there are just a few missing pieces:
1. Store user-generated file-related meta-data directly where the file is stored, in a standard format, example:
File:
/media/archiv/pictures/picture1.jpg
User-generated meta-information:
[/media/archiv/pictures/.picture1.jpg.meta] or in shared directories: [/media/archiv/pictures/.picture1.jpg.meta.nolden]
Could contain something like:
RATING=2/5
TAGS=funny,family

2. Change file-managers to move/copy meta-information together with the files when handling them individually (I think this already is the case in Dolphin), and delete the meta-information when the file is deleted
3. When finding orphaned meta-information, ask the user what to do with it (don’t forget: it’s valuable information)
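
The three steps above could be sketched roughly like this in Python. This is only an illustration under assumptions: the helper names (sidecar_for, read_meta, write_meta, move_with_meta, find_orphans) are made up for this example, the KEY=VALUE layout is taken from the RATING/TAGS example above, and nothing here is an existing Nepomuk, Strigi or Dolphin API.

```python
import shutil
from pathlib import Path

META_SUFFIX = ".meta"  # assumed naming scheme: hidden ".<name>.meta" next to the file


def sidecar_for(path: Path) -> Path:
    """Return the hidden sidecar path, e.g. picture1.jpg -> .picture1.jpg.meta."""
    return path.with_name("." + path.name + META_SUFFIX)


def read_meta(path: Path) -> dict:
    """Parse KEY=VALUE lines from the sidecar; a missing sidecar means no metadata."""
    meta = {}
    sidecar = sidecar_for(path)
    if sidecar.exists():
        for line in sidecar.read_text(encoding="utf-8").splitlines():
            key, sep, value = line.partition("=")
            if sep:  # silently skip lines that are not KEY=VALUE
                meta[key.strip()] = value.strip()
    return meta


def write_meta(path: Path, meta: dict) -> None:
    """Step 1: store the metadata right next to the file."""
    lines = [f"{key}={value}" for key, value in meta.items()]
    sidecar_for(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def move_with_meta(src: Path, dst: Path) -> None:
    """Step 2: move a file and, if present, its sidecar along with it."""
    sidecar = sidecar_for(src)
    shutil.move(str(src), str(dst))
    if sidecar.exists():
        shutil.move(str(sidecar), str(sidecar_for(dst)))


def find_orphans(directory: Path) -> list:
    """Step 3: list sidecars whose file is gone, so the user can be asked about them."""
    orphans = []
    for entry in directory.iterdir():
        name = entry.name
        if (name.startswith(".") and name.endswith(META_SUFFIX)
                and len(name) > len(META_SUFFIX) + 1):
            original = entry.with_name(name[1:-len(META_SUFFIX)])
            if not original.exists():
                orphans.append(entry)
    return sorted(orphans)
```

Note that find_orphans only reports the orphaned files; deciding what happens to them is deliberately left to the user.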

Strigi could collect the information from those meta-data files, and Nepomuk would manipulate them. Nepomuk’s database would be a kind of cache for the metadata.
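
To illustrate the "database as cache" idea: if the .meta files are the authoritative copy, a lost or damaged database can always be rebuilt by scanning them. The following Python sketch assumes the hidden ".<name>.meta" naming and KEY=VALUE layout from the example above and uses SQLite as a stand-in for the real store; rebuild_cache and the table layout are hypothetical and not how Nepomuk or Strigi actually work.

```python
import os
import sqlite3
from pathlib import Path

META_SUFFIX = ".meta"  # assumed sidecar naming: ".<name>.meta"


def rebuild_cache(root, db: sqlite3.Connection) -> int:
    """Throw away the cache table and re-create it from the sidecar files under root.

    Returns the number of sidecar files that were indexed.
    """
    db.execute("DROP TABLE IF EXISTS meta")
    db.execute("CREATE TABLE meta (file TEXT, key TEXT, value TEXT)")
    indexed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            is_sidecar = (name.startswith(".") and name.endswith(META_SUFFIX)
                          and len(name) > len(META_SUFFIX) + 1)
            if not is_sidecar:
                continue
            # ".picture1.jpg.meta" describes "picture1.jpg" in the same directory
            target = Path(dirpath) / name[1:-len(META_SUFFIX)]
            text = (Path(dirpath) / name).read_text(encoding="utf-8")
            for line in text.splitlines():
                key, sep, value = line.partition("=")
                if sep:
                    db.execute("INSERT INTO meta VALUES (?, ?, ?)",
                               (str(target), key.strip(), value.strip()))
            indexed += 1
    db.commit()
    return indexed
```

The point of the sketch is the direction of the data flow: the database can be deleted at any time without losing anything the user entered.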

The whole behavior should be standardized among desktop-environments at some point, so the meta-information would not only be persistent, but also accessible from within every application.

With this in place, I could finally start rating and tagging images and music without the gut feeling that I am wasting my time.

What do you think?

Update:
Actually probably the best way would be this:
picture.jpg
picture.jpg.meta
With the meta-information not hidden at all, so you will be aware of it when using the command-line. Aware file-managers like Dolphin should hide the meta-information automatically, and all other file-managers that are not aware would show it. I think as long as this would only be used for user-generated meta-information like ratings, it would be worth it.
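
How an "aware" file-manager might decide what to show can be sketched in a few lines of Python. This is a hypothetical helper, not Dolphin code: it hides a visible "name.meta" file only when the file it describes sits next to it, so orphaned meta-information stays visible.

```python
from pathlib import Path

META_SUFFIX = ".meta"  # assumed naming from the update: picture.jpg -> picture.jpg.meta


def visible_entries(directory: Path) -> list:
    """Return the entry names an aware file-manager would display, sorted."""
    names = {entry.name for entry in directory.iterdir()}
    shown = []
    for name in sorted(names):
        # Hide "x.meta" only if "x" exists in the same directory
        is_sidecar = name.endswith(META_SUFFIX) and name[:-len(META_SUFFIX)] in names
        if not is_sidecar:
            shown.append(name)
    return shown
```

For a directory containing picture.jpg, picture.jpg.meta and a leftover lonely.png.meta, this would list lonely.png.meta and picture.jpg.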

58 Comments »

  1. While we’re at it, why not take things to their logical conclusion: store metadata about a file in the file itself.

    Comment by P — March 29, 2009 @ 10:22 am

  2. Totally agree. I have reinstalled linux systems enough and lost all customisations that I now only commit the minimum that I have to… in other words I don’t use tagging and commenting of files and resources because I know whatever effort involved is going to be lost with the next system install. OTOH I have a co-worker who uses windows and has not reinstalled his OS for 5 or 6 years and he has a huge and excellent email database with 1000s of entries that he uses like a central store of personal info. It’s just not possible for me to emulate this stability under linux as every few months I want to try a new distro and I just can’t imagine still using a linux install from 5 years ago.

    Storing tagging and rating info would be very nice but I think the same principle is even more important for mundane stuff like user details, email etc, bookmarks, rss feeds… all manner of personal info needs to be managed somehow. Yes I know there are ways to do some things like export bookmarks, upload to personal web site, reinstall distro, download and import bookmarks… add, rinse, repeat for 100s of other personal details and… it’s a mess. I’m not sure about one answer being to dump everything in “the cloud” and forget about it.

    Anyway, I think there are at least two aspects to persistent user info. When using meta info locally it needs to be stored in some, perhaps any, kind of database. When copying or moving that data it needs to be associated with the files and resources it belongs to and the only way I can see that being done is via some kind of formalised “transit format”… lots of possibilities but one would be putting all meta info into a separate file with exactly the same name as the original file with an extension of “*.meta” in some kind of microformat. So this meta data has *at least* two forms, one in a database and another as an associated file for transit and remote storage. All very doable but the procedure just needs some standardisation.

    Comment by markc — March 29, 2009 @ 10:34 am

  3. @P: That is the best solution, if the file format allows for it, and if the file is not shared. But that’s probably not achievable for arbitrary meta information.

    Actually my first idea for this was using extended file-system attributes to store the meta-information, but that might not be portable enough (not all file-systems support it, especially those commonly used for CDs/DVDs).

    Comment by zwabel — March 29, 2009 @ 10:35 am

  4. The dot files are a bit dangerous to rely on. You’ll never be able to force all file managers to copy them. Although many users do use Dolphin, there are quite a few, myself included, who rely mostly on cp or midnight commander.

    For that to be multi-environment (Gnome, KDE, E…) it would be needed to be standardized. There would need to be many different standards – for different file types, so that Amarok would share ratings with Songbird or some other player.

    Most binary file-types allow garbage after the last byte or their specifications allow comments (for example, do an “echo ‘this is garbage’ >> file.jpg” – it will still be a valid jpg file). Those can be used for meta-data too, but in that case there are some different issues.

    The only valid solution would be to make file-systems support file meta-data. This would work like the permissions do now – if the underlying file-systems support them, they will be copied, moved…

    But, still, in all those cases, there will be a need for a centralized database to make the searching faster.

    Comment by Ivan Čukić — March 29, 2009 @ 10:39 am

  5. @markc: At least for emails, there are standard portable formats like mbox that are supported by kmail, that store mails as single files, and that can be backed up easily, so I’m quite happy with that. It could use some more user-friendliness to make the user aware of this; that should be a goal of akonadi.

    Comment by zwabel — March 29, 2009 @ 10:39 am

  6. @markc
    Then you’re doing something wrong. Why are you deleting user home directory on reinstall?

    @zwabel
    It would probably be the best to store the data in the file-system, but to allow the database to be in a file also – so when you record a CD, a file is placed in the root dir with meta-data about the files.

    Comment by Ivan Čukić — March 29, 2009 @ 10:46 am

  7. @Ivan Cukic: Yes that would be the ideal case, I think most filesystems already support this, but that would probably also need a lot of additional support from applications. There are probably many applications that always create a completely new file when saving, which means that the meta-information would be lost with every save. In addition to that, there would need to be a conversion when copying to a file-system that does not support meta-data, for example when burning a DVD.

    You could not use “cp” to copy individual files that have a tag, but I anyway think that’s not a too common use-case. At least you would get informed about the orphaned meta-data and could also copy it to the right place, or at least remember not to do that “cp” thing in future with such files.

    The thing with the dot files is in no way perfect, but it is at least useful, and not too hard to implement on top of the current stack, and could be a step towards integrating the metadata into file attributes.

    Comment by zwabel — March 29, 2009 @ 10:50 am

  8. @Ivan Cukic: I think if the metadata is stored in the filesystem, and if we want it to be cross-platform and user-understandable, then it just has to be dead-easy. And “1 file = 1 metadata file” is dead easy.🙂 Grouping together the metadata within per-directory or per-volume metadata files would be complicated and even less compatible with your “cp” usecase.

    Actually probably the best way would be this:
    picture.jpg
    picture.jpg.meta
    With the meta-information not hidden at all, so you will be aware of it when using the command-line. Aware file-managers like dolphin could hide the meta-information automatically, and all other file-managers that are not aware would show it. I think as long as this would only be used for user-generated meta-information like ratings, it would be worth it.

    Comment by zwabel — March 29, 2009 @ 10:57 am

  9. However it is done, it needs to be with a performant implementation at least in mind. Once such a system exists the first thing you’ll want to do is index and search it efficiently.

    Storing metadata actually in the same file is a mistake, in my opinion. Metadata in some file formats is stored in the header: this creates a big problem when the whole data block has to be moved as the metadata expands. If things are to remain at the conventional filesystem level, it would be preferable if what we now call files were actually directories. IIRC Macs do something like this in some cases.

    Comment by Tom — March 29, 2009 @ 10:59 am

  10. As long as metadata isn’t stored in application-specific format, in some application-specific location, then it’s fine. The problem lies in transporting that information between machines, uploading it to fileservers, etc. For this, there are two solutions I can see:

    a) Bundle the metadata in archives when zipping/tarring/ etc. BeOS had file metadata which was zipped/tarred properly, and the archiver “Star” does it on Linux, afaik. So this problem is essentially solved, except that no one is aware of/uses it.

    b) Public metadata repositories, with upload/publishing/periodic-publishing and download/mirroring/syncing. AFAIK, nepomuk’s design does allow for multiple sources of data, and live queries (so asking for the artist or lyrics of an MP3 file could look up a web-based lyrics database as needed, for example, rather than using a huge local database). So download should be covered (or on its way). But uploading/publishing of metadata is one thing that remains to be done, as far as I know.

    Comment by Lee — March 29, 2009 @ 11:02 am

  11. The problems you describe need to be solved. But your suggestion only works for files – metadata exists also for:
    – Contacts
    – IM users
    – IM History
    – mails on a remote imap server
    – bookmarks (konqueror, kate etc)
    – akregator feed entries

    Niko

    Comment by niko sams — March 29, 2009 @ 11:02 am

  12. Just a question: Is it possible to extend the file and add the meta info (e.g. at the end) in a manner, that the normal file systems would interpret it as the original file and new file systems could read the additional information?
    Like Microsoft kept the long names to the 8+3 names backwards compatible.

    Comment by Peter — March 29, 2009 @ 11:04 am

  13. @Niko: Yes sure, I was only talking about files here. For the other stuff you mentioned, there probably isn’t another way than to store the metadata centrally in some bundle (apart from some possible special cases), but it should be less of a problem with those items, since they can be much more clearly identified than files.

    Comment by zwabel — March 29, 2009 @ 11:08 am

  14. Storing metadata in-file would break file integrity, which would render the files incompatible with revision managers, bittorrent and lots of other stuff which uses checksumming. Imagine all your torrents failing just because you rated the files!

    Metadata like filename, creation date, owner, etc. is already stored in the filesystem and, imho, this is where metadata belongs. There is already a concept for this, xattr (see http://en.wikipedia.org/wiki/Xattr) which is implemented in all major filesystems including FAT and NTFS. According to this page Beagle already uses it and there is a freedesktop standard which includes the dublin core.

    I also think that the kernel support for xattr and widespread user space usage has the advantage of ‘coercing’ cp, mv, tar and the likes into supporting them. I don’t consider the chances of cp or tar ever supporting ‘.meta’ files by default very high. If Qt internally supports xattrs (I haven’t looked it up, but it seems like something it would support) then xattr support in kde would be relatively easy.

    Comment by Remco Bloemen — March 29, 2009 @ 11:37 am

  15. @Remco Bloemen: Using xattrs would be the ideal case. The question is just: How to prevent the data attributes from being lost during storing to CD/DVD, sending through internet, zipping/tarring, unaware applications, etc.? Or what if you want the meta-data to be version-controlled as well in a VCS system? This might be the future, but it’s probably a very long way to build something reliable from that.

    .meta files could work right now, and would be a relatively safe storage place.

    Comment by zwabel — March 29, 2009 @ 11:49 am

  16. > I don’t consider the chances of cp or tar ever supporting ‘.meta’ files by default very high.

    But *if* something like associated *.meta files became a standard then it would actually be very simple for file system tools to take notice of a same-named *.meta file and automatically copy or move that extra file in a single operation. In the mean time there is nothing to prevent the user from manually moving *.meta files around and they would be included when copying folders.

    Anyway, I think these additional *.meta files should only be available when *transporting* files and service details, in other words they don’t need to exist unless the user is actively transporting his/her settings from one system to another. For regular day to day usage the data needs to be in a simple and readily available database such as SQLite to allow for searching and indexing. Relying on indexing and working with 1000s of individual meta files will never work. So this meta info needs to be in 2 states, locally in a database and as individual *.meta files when in transit, so one of the main features of this approach would be the import and export from SQL to *.meta format and half the job is done.

    Comment by markc — March 29, 2009 @ 12:04 pm

  17. @zwabel: I understand your concern, until about a few minutes ago I also thought that .meta files are the best solution (it certainly beats a centralized database). I see the difficulty in having vcs’s, http, ftp, mime, zip, etc. all supporting xattr.

    The .meta solution reminds me of the rock ridge extension for iso9660 (cdroms). One of its aims is to add symbolic link support, which I think is a good example of an xattr-like extension. I use quite a few symbolic links in organizing my files and then you face the same problems with support on foreign filesystems, archives and such. In my experience this is not much of a problem.

    I currently think (I’m no expert) that the best transition path would be to push xattrs and freedesktop’s dublin core as ‘the standard’ and use rock ridge and .meta-like extensions only when there is no filesystem support. Ideally, these extensions would of course be implemented in a completely transparent fuse driver, so that applications see xattrs everywhere and no further support is required. This would confine and limit the amount of ‘dirty hacks’ required and would keep apis cleaner. But perhaps I’m too optimistic and too much of a dreamer :p

    Comment by Remco Bloemen — March 29, 2009 @ 12:09 pm

  18. @markc: If you would need an explicit step to create the .meta files from the database, they would be completely useless for the average user, since he won’t do that. Also it would bring no data safety at all. The database and the .meta files should exist in parallel for performance reasons, but the .meta files should always be there.

    When strigi has to index 1000 regular files, then it’s not that much slower to also index 1000 tiny .meta files.

    Comment by zwabel — March 29, 2009 @ 12:11 pm

  19. I don’t think storing the metadata in the files themselves and messing around with the file format is a good idea. My pictures eg. are precious (and are chmod’ed -w) and I don’t want to have different tools battling for write access to my data in the background, just waiting for the next power loss (think ext4 or xfs).

    Instead I like the file.meta approach a lot. xattr sound like a good idea first, but have disadvantages:
    * Not supported by all tools, copies between different filesystems tend to lose the xattrs without warning.
    * Many distros don’t enable user_xattr per default.
    * HAL/udev/Solid/whoever takes care of mounting USB devices don’t do so as well.
    * Most important: The size of the xattr block for each file is limited. The size is dependent on the file system in use, I think ext3 allows 128 chars or so.

    I guess the .meta file should actually be a .rdf file. Of course the files would clutter the directories, so maybe it would be fine to stick with a central metadata store (eg. the Nepomuk database) and just offer an export feature for backup.

    Comment by Malte — March 29, 2009 @ 12:13 pm

  20. @markc: I agree with the first paragraph, but I think pushing .meta files as a standard would create an unnecessary competing standard for xattrs. I also agree that a central cache will be required for performance reasons, though I would hesitate in making it the only storage of meta data. Can I deduce from your second paragraph that you consider .meta files a solution for transporting meta data over channels that do not (yet) have native meta data support? If that is the case then we completely agree!

    Comment by Remco Bloemen — March 29, 2009 @ 12:19 pm

  21. @Remco Bloemen: Let’s say it like this: Right now, _NO_ channel fully supports xattrs. A common linux system will dump xattrs at many many points, thus it’s not yet a suitable solution for storing the actual data. So the first step IMO would be supporting .meta files. From then on, they can eventually be slowly pushed into xattrs, which would take a long time though.

    Comment by zwabel — March 29, 2009 @ 12:24 pm

  22. @zwabel: I respectfully disagree. I don’t think the .meta files need to exist at all for day to day use, only when explicitly transferred to another system for archiving, backup or re-importing back into a database. The end user would not have to explicitly export to meta-file format as that would be part of whatever programme managed the whole show. The end-user would do something to initiate some kind of backup procedure and whatever software involved would export the meta files as just a part of the overall procedure.

    I think the idea of using xattrs could be feasible but I would suggest that “persistently storing user meta info” as simply as possible, but sooner rather than later, could be “hacked together” fairly simply, sanely and reliably, with the idea that, if whatever procedure was used a) worked at all and b) lots of folks wanted to use it, then the 2nd and 3rd generation rewrites, with lashings of hindsight, would be inevitable. I’m thinking of my next (re)install on yet another computer and dreading the tiresome job of (re)configuring all my services for the umpteenth++ time so some emerging solution that worked asap, with readily available tools and concepts, would suit me just fine. Extremely elegant and efficient methods would be welcome too but I’m also sure they would evolve in time if the basic concept was both scratchworthy and useful.

    Comment by markc — March 29, 2009 @ 12:31 pm

  23. @markc: If standard KDE file management tools hide the *.meta files, what would be the disadvantage of always having them there? I don’t see any, but it would have the big advantage that the meta information is preserved when copying/zipping/tarring/vcs’ing/etc. the directories, no matter what tool is used, or even when the central repository is damaged/deleted (disk crashes happen very often), which is a big one.

    Comment by zwabel — March 29, 2009 @ 12:36 pm

  24. @zwabel: well let’s say that the end-user could make the decision as to whether the .meta files were also used locally, or not. The average user could easily have 100,000+ .meta files after a year or so, and a lot more if they focus on archiving video/photos, so I would opt for no meta files locally and just backup the database… which if it was SQLite is dead simple. A point is that when a new item is added as a separate file or as an entry in a database that the freshly added info must be somehow readily available to the apps that can take advantage of this extra info. If we add an entry to a large 500Mb SQLite database, along with the other 100,000+ entries, then the info is immediately indexed and searchable. OTOH if a meta-info file is added to the FS then, sure it’s there for a file manager to take advantage of immediately, but until some daemon scans for all meta-files and updates some database then the new meta-file entry is only half added to the system.

    Comment by markc — March 29, 2009 @ 12:59 pm

  25. @markc:
    If the meta-data is managed by nepomuk, then it’s entered into the .meta file as well as the database at the same time.
    If you have 100000 .meta files, that means that you have more than 100000 tagged files, that are each probably at least 1 MB of size. 100000 additional files with a size of less than 1 kB each are less than trivial in that context, aren’t they?

    The main point is though: Meta-data needs to be persistent, and if it is for files, then it has to stay with the files themselves. Either as .meta files, or as attributes. See the original post for explanation of why I think so.😉

    Comment by zwabel — March 29, 2009 @ 1:06 pm

  26. @markc: If the daemon uses inotify the lag will be unnoticeable.

    Comment by Remco Bloemen — March 29, 2009 @ 1:10 pm

  27. There is another complicating factor. User generated metadata can be specific for the user, and not something the user would actually want to transfer (if I am emailing/copying a file to another user, I might only want a subset of my metadata transferred). Blindly copying all available metadata on transfer might actually not be what the user wants.

    Comment by inful — March 29, 2009 @ 4:31 pm

  28. How about using the .directory files? If you change the icon for a directory, this is saved in a .directory file. Now, you could add meta info about files in the directory to this file.

    Gnome, or at least Nautilus also does something similar – I remember getting irritated by the dot-files when I was a newbie.🙂

    So, you could use these .directory files as local repositories of meta data and have file managers copy the relevant bits along with the file. Since at least one other major DE does something similar, standardizing this will be easier.

    Comment by KarPer — March 29, 2009 @ 5:46 pm

  29. The meta-data should be stored to file itself if just possible.
    This is the case with the current file-formats for photographs (JPEG, PNG, TIF, all RAW formats etc) and office files like ODF, DOC and PDF. Even the music formats like MP3 etc allows you to store meta-data to file itself. That is the _wisest_ and most secure way to do it.

    Problem is that not all file-formats have possibility to store meta-data. For these cases I suggest to use the .ext.meta possibility.

    So Nepomuk _needs_ to know how to get meta-data to/from the files themselves.

    No one can use Nepomuk to tag photos or images (two different things boys!) or rate them if the meta-data is not stored in the files themselves. You cannot share the file if the meta-data is not included in it. You cannot back it up well, etc.
    You could not read the photograph’s meta-data with other applications, etc.

    For example, XMP allows user-generated tags and other meta-data to be stored for a photograph.

    Comment by Fri13 — March 29, 2009 @ 6:21 pm

  30. The first comment on the page, by P, is exactly what I thought.
    Hey, steganography is already out there and works. It can be used for malicious things. Why not give it some really good use for the masses? It would be just a matter of making the applications aware of the format!

    Comment by Cassiano Bertol Leal — March 29, 2009 @ 7:06 pm

  31. @Fri13: Yes, some file formats can already store metadata on them. But this metadata is usually not extensible. For example, Nepomuk has the support for tags to be added to a file and to a directory (directory metadata might be trickier to handle). Is it possible to add arbitrary tags to all formats that already handle metadata?

    Comment by Cassiano Bertol Leal — March 29, 2009 @ 7:09 pm

  32. I second Fri13’s idea that the metadata should be stored in the file itself if the file’s format supports it. E.g. ID3 for MP3 and EXIF for JPG. These are standards already and will be the most likely to be readable by future programs. Every effort should be made to use the existing standards.

    You raise a good conversation about the metadata not covered by existing standards that one wishes to associate with a file.

    Comment by chad — March 29, 2009 @ 7:52 pm

  33. Seems that most people could agree that *where a suitable meta tag space already exists* in a file that it would be best to have it there. Trouble is, as has been pointed out, that there are many cases where that *doesn’t* exist, so we definitely need an *additional* solution, however it would be absurd to argue that if you rate an mp3 out of 5 stars… and the mp3 file itself has a space for remembering a 1-5 star rating that *regardless* of how meta data is handled overall (db / .meta file / whatever) that the data that *can* easily be included in the standard file format be included.

    We run the danger of forgetting that in many cases we can have the best of *both* worlds rather than just arguing which method is *always* the best in *every* use case (which looks to be none of them).

    Comment by Bugsbane — March 29, 2009 @ 9:54 pm

  34. Does the meta-data have to travel with the files all the time? I know that would be ideal, but it’s not particularly simple with our current filesystems and tools. What if there was a distinction between just copying files, and “archiving them”, where the latter generates .meta files from the Nepomuk database for portability. I know that “no one will use it”, but I would, and you would. You only need .meta files for transferring and saving the metadata on other computers and filesystems, otherwise they’d just be clutter on your KDE desktop.

    Metadata shouldn’t be stored in the files directly, because we can’t assume we have the right to modify user files. Checksums break, file sizes change, hell breaks loose. Besides MP3 or EXIF metadata is static — what it tells you about the file isn’t going to change. The artist of a song, or shutter-speed of a photo isn’t dictated by the users (normally).

    I imagine most metadata that we’d be dealing with is more personalized, and should thus be handled separately, tied primarily to user accounts (through your Nepomuk database in your ~/.kde folder) rather than individual files. By default, when you copy files, the metadata should stay behind (because there is no precedent for it being included, and you can’t assume the metadata is as public as the file). When you “archive” your data is when it should be included, because now it’s clear this data is going to remain in your possession, so it’s relevant to you.

    This can all be handled in KDE pretty well. For instance K3B can ask whether you want to save metadata for files you burn, and then request a .meta file from Nepomuk if the user wants to. Perhaps there could be a visual distinction in Dolphin for files with custom metadata, like an icon badge or highlighted colour. Then when files are copied, it won’t be unexpected to ask whether metadata should be transferred too, making it clear to the user what it’s talking about. Nepomuk should be the king of metadata on the desktop, offer simple network metadata sharing and backup, and centralize everything, instead of dumping ugly files all over the filesystem.

    Comment by clinton — March 29, 2009 @ 10:48 pm

  35. @clinton: Yeah I would probably use it, but we are not developing software for you and me, we are developing it for _everyone_, so it should be easy to use for _everyone_.😉 The clutter argument imo doesn’t count if you don’t get to see those files.

    Also I believe that for 90% of all file copying, the possession stays with the same user: Backups, moving something to another place on the hard-disk, to the mp3-player, etc., so to me it would make more sense to copy the meta-data as well, and only leave it behind on explicit demand by the user.

    Generally I think this should be optional: For parts of your filesystem you might want .meta files to be used, for other parts you might want the metadata to reside within the database, and for others you might want it to be embedded right into the files if possible.

    Examples:
    – In the personal documents folder, where I’m the author of all files, I would preferably have the metadata embedded right into the files if possible, since I’d interpret it as part of them
    – In an archive of files that I’m not the author of, like for example mp3’s, or ebooks, I would want the metadata to be in separate .meta files, so my own tagging does not change the identity of the files, and so when I copy around the collection or synchronize it with my other computer or a backup, the metadata moves as well.
    – For a collection on a samba share where I don’t want to do any writing/cluttering, I would want the metadata to stay within the database.

    Comment by zwabel — March 29, 2009 @ 11:09 pm

  36. /sigh this is yet another replacement for xattr’s. The argument that xattrs aren’t supported everywhere ignores the fact that these .meta files aren’t supported ANYWHERE. If you implement this .meta file idea you will have exactly the same problem as we have with xattrs, people will say I won’t use that because it isn’t supported everywhere. Just use xattrs and then people will complain about apps that don’t support them and they will get fixed.

    Comment by Mark — March 29, 2009 @ 11:16 pm

  37. @Mark:
    The big difference: The missing support does not lead to a data loss for the user in this case.

    If xattrs were used right now, they would be so unreliable that nobody would use them, and nobody would complain either. If we want to increase awareness of meta-data, we have to find a solution that works _now_, so users can start using it.

    Comment by zwabel — March 29, 2009 @ 11:27 pm

  38. My personal preference is for meta-data to be stored inside a file wherever possible, where facilities exist for doing this….
    Outside this, I think the method that would be most compatible with existing tools is the folder-as-file method…. Files SHOULD be updated in versioning systems if the metadata changes. If changing the file is going to break something like a torrent then those files should be set to read only.

    A file gets created in, or moved to, a folder of the same name as the file, and a .desktop file gets created inside the folder to tell the file manager to treat the folder as if it were the file of the same name inside it, along with any sort of meta info you want to store inside the folder… if you operate cp, mv on the folders, everything should just work.

    If you expand the idea further… you could store multiple revisions of the same file in the one folder (seen as a single entity – with shared meta-data)… and the same file in different versions say a low res jpg for sending, a high quality tiff, the original Raw File and say an XCF of the image as it is that has the editing data for gimp…

    Comment by Danni Coy — March 30, 2009 @ 1:45 am

  39. @zwabel: I totally understand the argument that it should be easy to use for everyone. But you have to take into account that this is unlike anything ever implemented in a popular desktop environment. This is a huge leap for the semantic desktop, and if you make the system too transparent, then users won’t understand that quirks like invisible files called .pic002.jpg.meta are a feature, not a bug. The first time they open a CD they made in K3B on a Windows machine and see a dozen files called .pic0xx.jpg.meta, they’ll automatically think “Linux broke something”.

    It’s like when you open a zip file from an Apple user and it has a __MACOSX folder. Or when Windows litters Thumbs.db all over your filesystem. It’s a feature if you own a Mac, or frequently use Windows with “Hide protected and system files” turned on. Otherwise it’s clutter. And unless they never use another desktop again after KDE (haha, I wish!), it’ll just be more clutter if the user doesn’t know what the files represent.

    The only way to overcome this perception will be for the user to know what metadata is, how it’s handled differently in KDE than any other desktop, and how that makes KDE better. Otherwise, once they start to notice all the quirks, forums across the internet will flood with “How do I stop KDE from filling my CDs with garbage” posts by people who simply don’t know about, or simply don’t care about saving metadata. You need to make users WANT the metadata, make them want the metadata to work on every computer they own (hooray for KDE for Windows/Mac OSX!), and miss it when they don’t have it. And in order to do that, it needs to be a prominent, heavily advertised, highly integrated part of the desktop whose usefulness is readily apparent.

    Haha, so, in a roundabout way, I guess what I’m saying is that IMO the only way for the .meta files method to be a success (as in, accepted by users and not disabled/complained about) is to make it an explicit option, separate from plain “Copy”, that users have to request specially, implying at least vaguely that what they are copying is more than the sum of the individual files they have selected. That way, they’ll recognize the .meta files when they DO see them eventually (on Windows or something), and recognize their worth. And I don’t think it need be more complicated than adding a third option to the standard “Copy/Move” duo, something like “Copy with metadata” or “Archive with metadata”, combined with a heavy marketing blitz of this EXACT feature, so that users know what they’re getting, pros and cons, from the beginning.

    Comment by clinton — March 30, 2009 @ 1:53 am

  40. Sorry, I managed to get my sentences mixed up…. The comment should read as follows:

    My personal preference is for meta-data to be stored inside a file wherever possible, where facilities exist for doing this…. Files SHOULD be updated in versioning systems if the metadata changes. If changing the file is going to break something like a torrent then those files should be set to read only.

    Outside this, I think the method that would be most compatible with existing tools is the folder-as-file method…. A file gets created in, or moved to, a folder of the same name as the file, and a .desktop file gets created inside the folder to tell the file manager to treat the folder as if it were the file of the same name inside it, along with any sort of meta info you want to store inside the folder… if you operate cp, mv on the folders, everything should just work.
    If you expand the idea further… you could store multiple revisions of the same file in the one folder (seen as a single entity – with shared meta-data)… and the same file in different versions say a low res jpg for sending, a high quality tiff, the original Raw File and say an XCF of the image as it is that has the editing data for gimp…

    (Further thinking) This would solve the problem of moving files from one place to another and keeping the meta data. It still presents problems with older programs working within the same file system.

    Perhaps .meta files would be best, but perhaps they should be directories instead. They should be visible by default but hidden by programs that can actually deal with them…

    Comment by Danni Coy — March 30, 2009 @ 2:31 am

  41. 1. “Mark” is right, obviously, store meta information using the extended attribute capability of decent file systems. End of story. Push that until it breaks, see what happens, only invent a new layer as a last resort.

    2. On file systems that don’t support extended attributes, your proposal might be reasonable emulation behavior. But the emulation belongs in the file system code, or in the extended attribute library. NOT in lots of user programs.

    3. KDE is multiplatform. Programs ought to use the POSIX extended attribute calls to do this work, but I don’t know if they work on Windows and Mac OS X. I’m not sure these calls are part of POSIX, they’re always in the context of POSIX ACLs.

    4. It seems like the uncertainty of 2 and 3 means the file attribute abstraction should be at the level of QFile. Has someone filed an enhancement request for Qt?

    5. When your emulation workaround is required, I would put the files in a .metadata subdirectory; I don’t want dozens of foo.meta files alongside foo. But it’s messy stuff either way. Using conventional files for extra file information *always* runs into glitches with long file names, long file paths, file name conflicts, attributes on attributes, Hans Reiser killing his wife in the midst of meta directory implementation, etc. That’s why OS/2 and I think NTFS stick all extended attribute info in a single specially-named “\ea data. sf” file at the root on file systems that don’t support extended attributes.
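
    Points 2 and 5 together suggest one library call that tries real extended attributes first and only then falls back to a file in a .metadata subdirectory. A rough sketch under stated assumptions: `os.setxattr` and `os.getxattr` are real Python calls but exist on Linux only, while `set_attr`, `get_attr` and the JSON sidecar layout are hypothetical names chosen for this illustration:

```python
import json
import os


def _sidecar_path(path):
    # Hypothetical layout: foo.jpg -> .metadata/foo.jpg.json in the same directory.
    directory, name = os.path.split(os.path.abspath(path))
    return os.path.join(directory, ".metadata", name + ".json")


def set_attr(path, key, value):
    """Store key=value as an extended attribute, or in a sidecar file if that fails."""
    try:
        os.setxattr(path, "user." + key, value.encode("utf-8"))
        return "xattr"
    except (AttributeError, OSError):
        # os.setxattr is missing on non-Linux platforms (AttributeError), and the
        # call raises OSError on file systems without user xattr support.
        sidecar = _sidecar_path(path)
        os.makedirs(os.path.dirname(sidecar), exist_ok=True)
        data = {}
        if os.path.exists(sidecar):
            with open(sidecar, encoding="utf-8") as f:
                data = json.load(f)
        data[key] = value
        with open(sidecar, "w", encoding="utf-8") as f:
            json.dump(data, f)
        return "sidecar"


def get_attr(path, key):
    """Read a value written by set_attr, checking xattrs first, then the sidecar."""
    try:
        return os.getxattr(path, "user." + key).decode("utf-8")
    except (AttributeError, OSError):
        pass  # attribute absent or xattrs unsupported: try the sidecar instead
    try:
        with open(_sidecar_path(path), encoding="utf-8") as f:
            return json.load(f).get(key)
    except FileNotFoundError:
        return None
```

    Callers never learn which backend was used unless they check the return value of `set_attr`, which is the point of putting the emulation below the application. The sidecar still has the problem discussed throughout this thread: a plain `cp foo.jpg` leaves it behind.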

    Comment by skierpage — March 30, 2009 @ 8:28 am

  42. clinton:
    > What if there was a distinction between just copying files, and “archiving them”, where the latter generates .meta files from the Nepomuk database for portability.

    I think this would be a nice idea. If you send a file via Kopete or KMail or burn it with K3B or copy it to an external HDD etc you will be asked if you want to also send/save the metadata. If so, a .meta file or something like this will be generated out of the database and added to the E-Mail, the File-Transfer, the CD, the folder etc.
    And if you receive files that have metadata-files with them the tags get integrated into your database and the .meta-file will be deleted.
    So you have no chaos with thousands of .meta-files when not using KDE-tools.
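
    The export-on-send and import-on-receive round trip could look roughly like this. It is only a sketch: a plain dict stands in for the Nepomuk database, JSON is an arbitrary choice for the .meta contents, and `export_meta` and `import_meta` are hypothetical names:

```python
import json
from pathlib import Path


def export_meta(db, src, dest_dir):
    """Copy src into dest_dir and write its database entry next to it as .meta."""
    src, dest_dir = Path(src), Path(dest_dir)
    target = dest_dir / src.name
    target.write_bytes(src.read_bytes())
    entry = db.get(str(src))
    if entry:
        meta = dest_dir / (src.name + ".meta")
        meta.write_text(json.dumps(entry), encoding="utf-8")
    return target


def import_meta(db, directory):
    """Merge every .meta file in directory into db, then delete the .meta file."""
    imported = []
    for meta in sorted(Path(directory).glob("*.meta")):
        item = meta.with_name(meta.name[: -len(".meta")])
        if item.exists():
            db[str(item)] = json.loads(meta.read_text(encoding="utf-8"))
            meta.unlink()
            imported.append(item)
    return imported
```

    A .meta file whose item is missing is left alone rather than deleted, so nothing is thrown away when only half of a pair arrives.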

    Comment by Daniel — March 30, 2009 @ 12:13 pm

  43. There is one major problem with storing metadata in the file itself or a file alongside it: you have no way of properly linking the files with each-other and with other resources. Plus: files are only one of the possible resources on the desktop. There is much more which also needs metadata and is linked with files and from files. So you need a database anyway. Thus, in the end, the only solution I see at the moment is a kind of copy wrapper that makes sure metadata is copied with the file. Then one could also send information like a person or a project to a friend and the system would pick up all interesting metadata.
    At least this is the idea I have for Nepomuk. Sadly it is a lot of work and I am only one person.

    Comment by Sebastian Trüg — March 30, 2009 @ 12:41 pm

  44. I think best solution is to have central db with metadata, and supply metadata with files only when requested, for export/import purposes… Something like “Copy with metadata” option in file manager.

    Comment by anonymous — March 30, 2009 @ 2:34 pm

  45. It would be great to have something that works out of the box with cp and does not double the output of ls. I think the only solution for both is to use the filesystem, and to use a separate tool to export meta-information into separate files if the filesystem doesn’t support it.
    But I see a bit of confusion: is meta-information related only to a file, or to the pair (user, file)? If a shared file can be read and written by different people, should its metadata be as well?
    I think that the best solution is to identify files in a unique way and use that as a key in a central database. A .meta file should be created only when the user explicitly requests it. That could be done with a separate tool, for instance “cp-meta”, that queries the database and creates the files.
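
    One way to “identify files in a unique way” is to key the database on a hash of the file contents instead of the path. A minimal sketch of that assumption, with an in-memory dict standing in for the central database; `content_id` and `MetaStore` are hypothetical names:

```python
import hashlib


def content_id(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class MetaStore:
    """In-memory stand-in for a central database keyed by content id."""

    def __init__(self):
        self._by_id = {}

    def tag(self, path, **fields):
        self._by_id.setdefault(content_id(path), {}).update(fields)

    def lookup(self, path):
        return self._by_id.get(content_id(path), {})
```

    With this key, renaming or moving a file does not lose its metadata, but editing the file does change the key, and two identical copies always share one entry. Whether that trade-off is acceptable depends on the kind of file.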

    Comment by charon — March 30, 2009 @ 2:50 pm

  46. Thanks for the post. When thinking about this, I try to start with this: “what technology from today will still be around in 30 years?”.

    ASCII text files will be around. Hierarchical filesystems in some form will be around. Lossless media formats like .wav will still be readable. UTF-8 will probably be convertible into whatever is the current internationalization fashion.

    Sqlite databases? Not so much. Who wants to reverse engineer them (and whatever convoluted database schemas!) even if the sqlite source is free?

    Xattrs from some forgotten filesystem on some outdated OS? Yeah right.

    Given this, the .meta idea makes the most sense. Have key-values pairs inside, ASCII if possible, UTF-8 if necessary. If you find an archive of media files with associated .meta files 30 years from now, you will be able to figure out what the meta data is. Keep a sqlite cache for performance reasons only, and not as the primary store.
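
    As a rough illustration of this layout, the sketch below treats a plain “key: value” text file as the primary store and rebuilds a sqlite table from it on demand. The line format, the function names and the table schema are assumptions made for the example; values containing newlines are not handled:

```python
import sqlite3
from pathlib import Path


def write_meta(path, fields):
    """Write fields as 'key: value' lines to <path>.meta (UTF-8)."""
    lines = [f"{key}: {value}\n" for key, value in sorted(fields.items())]
    Path(str(path) + ".meta").write_text("".join(lines), encoding="utf-8")


def read_meta(path):
    """Parse <path>.meta back into a dict; a missing file yields {}."""
    meta = Path(str(path) + ".meta")
    if not meta.exists():
        return {}
    fields = {}
    for line in meta.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def rebuild_cache(root, conn):
    """Drop and refill the cache table from the .meta files under root."""
    conn.execute("DROP TABLE IF EXISTS meta")
    conn.execute("CREATE TABLE meta (path TEXT, key TEXT, value TEXT)")
    for meta in Path(root).rglob("*.meta"):
        item = str(meta)[: -len(".meta")]
        for key, value in read_meta(item).items():
            conn.execute("INSERT INTO meta VALUES (?, ?, ?)", (item, key, value))
    conn.commit()
```

    Because `rebuild_cache` starts from an empty table every time, deleting the sqlite file loses nothing; the text files remain the only copy that matters.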

    Comment by Hans — March 30, 2009 @ 3:22 pm

  47. @Hans I totally agree, with every single point

    Comment by zwabel — March 30, 2009 @ 5:07 pm

  48. Hi, I found this interesting discussion on PlanetKDE and want to add my 2c.
    As long as the metadata is “personal” it needs to remain so: store myson.jpg.meta in /home/trueneo/.metadatas/myson.jpg.meta, and every user will have their own metadata.
    In my opinion, copying the metadata alongside the file is useful only when making a backup copy; it is useless when I just want to give a bunch of photos to a friend on a pendrive or other media. When I back up the photo directories, I should be asked whether to back up my own metadata or that of other users as well. Storing photos on a DVD should create a hidden directory ./metas/ containing a directory for every user included in the backup.
    Linking a file with its metadata is the real problem: we could have different files in different locations with the same name. Instead of creating a .meta for every file we could use a single file holding all the metadata and the paths to reach the files, but then we need an ID, or some other method, that makes the directory or file unique. Should that be a job for Strigi?

    Comment by Trueneo — March 30, 2009 @ 9:30 pm

  49. […] Portable Meta-Information KDE4 is all about new technologies, and standardizing. Now we have a central mechanism to store metadata, called […] […]

    Pingback by Top Posts « WordPress.com — March 31, 2009 @ 12:19 am

  50. I just wanted to add one point.

    People gave xattr as an option.
    I think, xattr, or any similar solution for that matter, would be the wrong thing to do.
    During these discussions it is important to keep in mind that KDE is now cross-platform.

    Comment by Ritesh Raj Sarraf — March 31, 2009 @ 1:58 pm

  51. […] Portable meta-information has been discussed twice recently: http://www.kdedevelopers.org/node/3923 https://zwabel.wordpress.com/2009/03/29/portable-meta-information/ I also have something to say that doesn’t fit into one comment, so i post it :) My thought is […]

    Pingback by Wang Hoi (wkai): Portable Meta-Information again… at Open Source Software Pack — March 31, 2009 @ 10:46 pm

  52. All solutions have strings attached. IMHO storing it in the filesystem’s metadata channel is the only clear solution. Most modern filesystems have metadata channels for files (= hidden byte streams that are attached to the file node and can be accessed using magic OS commands).
    The trick here is: we can hook into the filesystem driver and let it index the metadata into Nepomuk when it changes.
    Storing it in extra “.meta” files always gives the user the possibility to screw up, but has the advantage of USB-stick compatibility (copy it to a USB stick and the metadata will follow).
    Storing it only in the Nepomuk db has the big advantage that you need the DB anyway: you only enter all this information for SEARCHING, and SEARCHING ALWAYS NEEDS A CENTRAL INDEX! (There is no way around that index; even at file-system level there is a node-index hashtable or something.)

    Btw, I have found no perfect solution for this in the last 6 years (part of my PhD is about it), and I helped design the way it’s done in Nepomuk.

    Comment by leobard — April 1, 2009 @ 8:54 am

  53. I re-read the thread again today and add:

    First: read the post by Sebastian Trüg again and again, he is the main developer of nepomuk and knows his stuff:

    “you need a database anyway. Thus, in the end, the only solution I see at the moment is a kind of copy wrapper that makes sure metadata is copied with the file. Then one could also send information like a person or a project to a friend and the system would pick up all interesting metadata.”

    @Remco Bloemen: xattr is not a standard in any sense; it’s a nice word for a plethora of incompatible ideas. xattr does not solve the problem of a user copy-pasting files from a to b. There is no common standard (afaik) for FAT32 implemented by anyone – this is a showstopper for “copy to usb-stick” support, so the whole xattr idea is half an attempt which would need a lot of very expensive standardization work to be viable. BUT of course, in 10 years we want to be there – but now is the time to show what we want to have stored in the xattr files: RDF data!

    @Remco Bloemen, @zwabel: Pushing Dublin Core is the right thing to do, but do it the W3C way, standardized. Which I translate to: use the RDF encoding of Dublin Core and, for example, Turtle/N3 as serialization format; these are rock-solid W3C industry standards which will be readable in 20 years. This is only slightly longer than the hand-rolled (unimplemented) stuff suggested by freedesktop, and it is extensible because it uses namespaces. All other solutions will not be extensible and documented; I can trust that namespaces are archived on archive.org forever, documenting how to read the data format and interpret it. Example of Dublin Core and W3C standardized data in standardized Turtle encoding:

    @prefix dc: .
    @prefix rdf: .

    dc:creator “Dave Beckett”;
    dc:date “2002-07-31”;
    dc:publisher “ILRT, University of Bristol”;
    dc:title “Dave Beckett’s Home Page” .

    Niko:
    The problems you describe need to be solved. But your suggestion only works for files – metadata exists also for:
    – Contacts
    – IM users
    – IM History
    – mails on a remote imap server
    – bookmarks (konqueror, kate etc)
    – akregator feed entries

    @Niko: the proposed solution does work for things besides files, at least on the single desktop. This is solved: the metadata is stored in the Nepomuk service and uses the NIE ontologies, look at http://www.semanticdesktop.org/ontologies/nie/

    Nevertheless, because the world is not perfect and needs many possible ways to evolve, we must store the metadata redundantly now, in as many places as possible – but in one format. For freedesktop and Nepomuk, RDF is the best choice: it is serializable, it can be stored in a database, it can be hosted on the web. No other standard has this. It is already embedded in PDF in the XMP format, and commonly used on the web.

    I propose “.turtle” files to indicate that it’s RDF/Turtle serialization, but if you insist, “.rdf” is also fine with me (but implying RDF/XML storage, which is a bit sluggish), and “.meta” is also fine with me if you store RDF/Turtle inside. Making up a new micro format would be stupid.
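
    What writing such a “.turtle” sidecar might involve, as a minimal sketch using only the Python standard library. The Dublin Core element set namespace is assumed, the file name itself is used as a relative IRI for the subject, and `to_turtle` and `write_sidecar` are hypothetical helpers rather than anything Nepomuk provides:

```python
from pathlib import Path
from urllib.parse import quote

# Dublin Core element set namespace (an assumption; adjust to the vocabulary in use).
DC = "http://purl.org/dc/elements/1.1/"


def turtle_literal(text):
    """Quote text as a Turtle string literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def to_turtle(subject_iri, fields):
    """Serialize {dc term: value} for one subject as a Turtle document."""
    if not fields:
        raise ValueError("at least one field is required")
    lines = [f"@prefix dc: <{DC}> .", "", f"<{subject_iri}>"]
    items = sorted(fields.items())
    for i, (term, value) in enumerate(items):
        end = " ." if i == len(items) - 1 else " ;"
        lines.append(f"    dc:{term} {turtle_literal(value)}{end}")
    return "\n".join(lines) + "\n"


def write_sidecar(path, fields):
    """Write metadata for path to <path>.turtle next to the file."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".turtle")
    # The percent-encoded file name is a relative IRI, resolved against the sidecar.
    sidecar.write_text(to_turtle(quote(path.name), fields), encoding="utf-8")
    return sidecar
```

    Using a relative IRI as the subject means the statements still point at the right file after the directory is copied to a USB stick, as long as the two files stay side by side.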

    My Summary:
    – storing it in the filesystem is nice, but not a killer argument. It works ™ by just storing it in the central Nepomuk repository for 90% of all use cases, so start hacking applications that help the users save time and improve their user experience with what is there today.
    – do not store it in .meta, but in .turtle, which is the rock-solid industry standard by W3C and human-readable and a simple microformat-like text format (smoother than XML)
    – do also store it wherever possible in the files themselves, so as not to block out others. Use EXIF fields, use XMP fields in PDF, use ID3v2 fields, use that metadata!
    – do also index it in the central search engine, be it Nepomuk or beagle++ (beagle++ is the RDF-enabled beagle, check it out if you are not aware of it)
    – Storing it in metadata file attributes (xattr/channels/…) is the goal, but I propose to extend these standards with RDF to achieve cross-system compatibility. What worked for the web may also work here.

    Comment by leobard — April 22, 2009 @ 2:52 pm

  54. Annotating files – but where to store the metadata?…

    An interesting thread about file metadata for KDE got my attention: Portable Meta-Information. I waited a month until it cooled down and re-read it to draw my own conclusions.

    The author, zwabel, correctly identified the problem that the Semantic Desk…

    Trackback by leobard.twoday.net — April 22, 2009 @ 3:18 pm

  55. […] […]

    Pingback by . - — May 1, 2009 @ 1:32 pm

  56. […] topic is old enough to be discussed in the FLOSS […]

    Pingback by On Hierarchical File Systems and Storage Location « Thorwil’s — August 23, 2009 @ 9:46 am

  57. […] Thoughts on portable metadata (why a central tag repository/db is flawed), also here Portable Meta-Information continued and here […]

    Pingback by TagFS, tracking progress in the field of semantic file systems | Zen of Linux — October 4, 2009 @ 12:38 pm

  58. […] Thoughts on portable metadata (why a central tag repository/db is flawed), also here Portable Meta-Information continued and here […]

    Pingback by goes Zen » TagFS, tracking progress in the field of semantic file systems — March 19, 2012 @ 4:06 pm

