• GAVO at the Fall 2023 Interop in Tucson

    The Virtual Observatory, in practical terms, is the set of standards created and maintained by the IVOA. The IVOA, in turn, is a community almost defined by the two conferences it holds every year, the Interops (previously on this blog). The most recent Interop has just ended: The 2023 Tucson Fall Interop. Here are a few notes on what went on there from my (and to some extent GAVO's) perspective.

    An almost-orange orange hanging in a tree.

    This fall's IVOA Interop was hosted by Steward Observatory, where they had ripening oranges in the backyard. They were edible!

    For at least a decade and a half, the autumn Interops have been back-to-back with the ADASS conferences. ADASS, short for Astronomical Data Analysis Software and Systems, is a venerable conference series, created far back in the last century (this year: ADASS XXXIII) to have a forum for people who work in the magic triangle of astronomy, instrumentation, and data processing. Clearly, such a forum is very well suited to spread the word about the miracles we are working in the VO.

    To that end, I was involved in the creation of three posters: One on the use of MOCs in TAP – a somewhat extended version of something you saw on this blog first –, then one on data discovery in pyVO by Renaud Savalle (Paris) et al – a topic again familiar to readers of this blog – and finally one on improving the description of ADQL to enable more reliable machine validation of its grammar by Grégory Mantelet (Strasbourg) et al.

    As far as the conference at large goes, I was really delighted to see how basically everyone talking about data publication at all was stressing they are “doing VO”, which was a very welcome change from, perhaps, 10 years ago, when this kind of talk was typically extolling the virtues of one particular web or javascript framework. One of the great things about standards in general and the VO in particular is that they tend to be a lot more durable than all those frameworks.


    The following Interop was a “short” one, lasting from Friday morning until Sunday noon, which meant that I was far too busy to do anything like a live blog while it went on. Let me hence just briefly point out the main talks related to GAVO's current activities and DaCHS.

    In Data Curation and Preservation on Saturday morning, Baptiste Cecconi (Paris) gave a nice overview of – among other things – what our bridge between the Registry and b2find (in particular, using the VOResource to DataCite mapper) enables in the context of the EOSC, and he briefly touched on the question of how to properly make landing pages for VO resources (for which I am currently using another piece of XSLT).

    In the Radio session later that morning, Ixaka Labadie (Granada) gave a talk on how he is using DaCHS to deliver 3D visualisations for fairly impressive (prototype) SKA data. I particularly liked his illustrations of how DaCHS does Datalink and SODA. See his slide 12:

    Boxes and arrows illustrating how SIAP and Datalink are described in DaCHS resource descriptors

    In the afternoon, there was the Registry session, which featured me talking about the harvest trigger service I have been running for a while to help people across the anticlimactic moment when you have published your new resource but it won't show up in TOPCAT or pyVO for a day or so.

    The bulk of this session, however, was used for a discussion about various shortcomings of the Registry or its interfaces that I found pleasantly productive – incidentally, just like the discussion on word lists in EPN-TAP on Friday afternoon's Solar System Session that I had the pleasure to chair.

    In the DAL session on that afternoon, I had two talks: One was on the proposed new interoperable user-defined functions already implemented in DaCHS' ADQL and now coming up in several other services, too. Note to self: Some of these would probably be rather suitable blog post material.

    The second talk was a sort of brief show-and-tell pitch, in which I pointed out that hierarchical TAP examples using the elegant examples:continued property now actually work in both pyVO and TOPCAT:

    A three-level popup menu Service Provided -> Local UDFs -> using ivo_histogram

    Finally, in Sunday morning's Apps session, I talked about global image discovery in pyVO. This was about an early promise of the VO: just say where in space, time, and spectrum you need an image (or spectrum, or time series, or whatever), and some apparatus will find and query all the services that could have pertinent data. It would then present the metadata of the datasets it found in some useful form that would let you make informed decisions about which ones to fetch.

    This was not too difficult in the olden days, but by now the VO is so big and complicated that a pyVO module with fairly involved logic is required. If you don't want to read the notes here, don't worry: I can safely predict that you'll read more about that topic on this blog.

    This is nowhere near done yet; so, it is one more piece of homework that I am taking home with me.

  • GAVO at the AG-Tagung in Berlin

    A booth with a large screen, quite a few papers, a roll-up, all behind a glass wall with a sign UNI_VERSUM TUB Exhibition Space.

    It's time again for the annual meeting of the German astronomical society, the Astronomische Gesellschaft. Since we have been reaching out to the community at these meetings since 2007, there is even a tag for our contributions on this blog: AG-Tagung.

    Due to fire codes, our traditional booth would almost have ended up in a remote location on the third floor of TU Berlin's main building, and I had already printed desperate pleas to come and try to find us. But in a last-minute stunt, the local organisers housed us in an almost perfect place (thanks!): we're sitting right near the entrance, where we can rope in passers-by and then convince them they're missing out if they're not “doing VO”.

    One opportunity for them to realise how they're missing out is our puzzler, this year about a lonely O star:

    An overexposed star in a PanSTARRS field with an arrow plotted over it.

    Since this star must have formed very (by astronomical standards) recently, it should still be in its nursery, something like a nebula – but it clearly is not. It's a runaway. But from what?

    Contrary to last year, we will not accept remote entries, sorry – but you're welcome to still try your hand even if you are not in Berlin. Also, if you like the format, there's quite a few puzzlers from previous years to play with.

    I have just (11:30) revealed the first hint towards our sample solution:

    We recommend solving this puzzler using Aladin. There, you can look for services serving, e.g., the Gaia DR3 data in the little “select” box in the lower left corner. Shameless plug: Try dr3lite.

    If you are on-site: drop by our booth. If not: we will post updates – in particular on the puzzler – here.

    Followup (2023-09-13)

    At yesterday's afternoon coffee break, we gave the following additional hint:

    To plot proper motions for catalogue objects in Aladin, try the Create a filter… entry in the Catalog menu.

    And this morning, we added:

    If you found Gaia DR3, you can also find editions of the NGC catalog (shameless plug: openngc). These are small enough for a plain SELECT * FROM….

    Followup (2023-09-14)

    The last puzzler hint is:

    Aladin's dist tool comes in handy when you want to do quick measurements on the sky. If you are in Berlin, you still have until 16:00 today to hand in your solution.

    However, the puzzler should not prevent you from attending our splinter meeting on e-science and the Virtual Observatory, where I will give an overview of the state of arrays in ADQL. Regular readers of this blog will remember my previous treatment of the topic, but this time the queries will be about time series.

    Followup (2023-09-14)

    Well, the prize is drawn. This time, it went to a team from Marburg:

    Two persons holding a large towel with an astronomical image printed on it, in the background a big screen with the Aladin VO client on it.

    As promised, here's our solution using Aladin. But one of the nice things about the VO is that you get to choose your tools. One participant was kind enough to let us publish their solution using pyVO, too: puzzler2023-solution.py. Thanks to everyone who participated!

  • Making Custom Indexes for astrometry.net

    When you have an image or a scan of a photographic plate, you usually only have a vague idea of what position the telescope actually was pointed at. Furnishing the image with (more or less) precise information about what pixel corresponds to what sky position is called astrometric calibration. For a while now, arguably the simplest option to do astrometric calibration has been a package called astrometry.net. The eponymous web page has been experiencing… um… operational problems lately, but thanks to the Debian astronomy team, there is a nice package for it in Debian.

    However, just running apt install astrometry.net will not give you a working setup. Astrometry.net in addition needs an “index”, files that map star patterns (“quads“, in astrometry.net jargon) to positions. Debian comes with two pre-made sets of indexes at the moment (see apt search astrometry-data): those based on the Tycho 2 catalogue, and those based on 2MASS.

    For the index based on Tycho 2, you will find packages astrometry-data-tycho2-10-19, astrometry-data-tycho2-09, astrometry-data-tycho2-08, astrometry-data-tycho2-07[1]. The numbers in there (“scale numbers”) define the size of images the index is good for: 19 means “a major part of the sky”, 10 is “about a degree”, 8 “about half a degree”. Indexes for large images only have a few bright stars and hence are rather compact, which is why 10 through 19 fit into one package, whereas astrometry-data-tycho2-07-littleendian weighs in at 141 MB, and indexes at scale number 0 (suitable for images of a few arcminutes) take dozens of Gigabytes if they are for the whole sky.

    So, when you do astrometric calibration, consider the size of your images first and then decide which scale number is sensible for you. It is usually a good idea to try the neighbouring scale numbers, too.
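
    If you want a starting point for that decision, here is a tiny helper of my own (not part of astrometry.net); it merely encodes the rules of thumb quoted in this post – preset 0 is good for roughly two arcminutes, and two preset steps are about a factor of two in scale:

    import math

    def suggest_scale_presets(image_size_deg):
        """Return a few astrometry.net scale presets worth trying for
        images about image_size_deg degrees across.

        This only inverts the rule of thumb scale ~ 2 arcmin * 2**(preset/2).
        """
        best = round(2*math.log2(image_size_deg*60/2))
        # trying the neighbouring presets, too, is usually a good idea
        return [p for p in (best - 1, best, best + 1) if 0 <= p <= 19]

    print(suggest_scale_presets(1))     # about a degree: [9, 10, 11]
    print(suggest_scale_presets(0.5))   # about half a degree: [7, 8, 9]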

    You can then feed these to your calibration routine. If you are running DaCHS, you will probably want to use the AnetHeaderProcessor, where you give the names of the indexes in the sp_indices class attribute; you also have to say where to find the indexes, as in:

    from gavo import api
    
    class MyObsCalibrator(api.AnetHeaderProcessor):
      indexPath = "/usr/share/astrometry"
      sp_indices = ["index-tycho2-09*.fits",
        "index-tycho2-10*.fits",
        "index-tycho2-11*.fits",]
    

    This would be suitable for images that cover about a degree on the sky.

    Custom Indexes for Targeted Observations

    The Tycho catalogue starts becoming severely incomplete below mV ≈ 11, and since astrometry.net needs a few stars on an image to be able to calibrate it, you cannot use it to calibrate images smaller than a few tens of arcminutes (depending on where you look, of course). If you have smaller images, there are the 2MASS-based indexes; but the bluer your images are, the worse 2MASS as an infrared survey will do, and in addition, having the giant indexes is a big waste of storage and compute resources when you know your images are on a rather small part of the sky.

    In such a situation, you will save a lot of CPU and possibly even improve your astrometry if you create a custom index for your specific data. For instance, assume you have images sized about 10 arcminutes, and the observation programme covers a reasonably small set of objects (as long as it's of order a few hundred, a custom index certainly will be a good deal). You could then make your index based on Gaia positions and photometry like this:

    """
    Create an index for astrometry.net and a few small fields based on Gaia.
    
    Be sure to adapt this for your use case; for instance, if what you are
    calibrating will be from only a part of the sky, pick specific healpixes
    (perhaps on a different level; below, we're using level 5).  Also consider
    changing the target epoch, the photometry, or the magnitude limit.
    
    This script takes the sample positions from a text file; have
    space-separated pairs of ra and dec in targets.txt.
    """
    
    import os
    import subprocess
    
    from astropy.table import Table
    import pyvo
    
    # 0 is for images of about two arcminutes, 10 for about a degree, 12 for two
    # degrees, etc.
    SIZE_PRESET = 1
    
    # The typical radius of your images in degrees (this is the size of our cone
    # searches, so cut some slack); this needs to be changed in unison with
    # SIZE_PRESET
    IMAGE_RADIUS = 1/10.
    
    
    def get_target_table():
        """must return an astropy table with columns ra and dec in degrees.
    
        (of course, if you have your data in a proper format with actual metadata,
        you don't need any of the ugly magic).
        """
        targets = Table.read("targets.txt", format="ascii")
        targets["col1"].name, targets["col2"].name = "ra", "dec"
        targets["ra"].meta = {"ucd": "pos.eq.ra;meta.main"}
        targets["dec"].meta = {"ucd": "pos.eq.dec;meta.main"}
        return targets
    
    
    def main():
        tap_service = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
        res = tap_service.run_async(f"""
            SELECT g.ra as RA, g.dec as DEC, phot_g_mean_mag as MAG
            FROM gaia.dr3lite AS g
            JOIN TAP_UPLOAD.t1 as mine
                ON DISTANCE(mine.ra, mine.dec, g.ra, g.dec)<{IMAGE_RADIUS}""",
          uploads={"t1": get_target_table()})
    
        cat_file = "basic-cat.fits"
        res.to_table().write(cat_file, format="fits", overwrite=True)
    
        try:
            subprocess.run(["build-astrometry-index", "-i", cat_file,
                "-o", f"./index-custom-{SIZE_PRESET:02d}.fits",
                "-P", str(SIZE_PRESET), "-S", "MAG"])
        finally:
            os.unlink(cat_file)
    
    
    if __name__=="__main__":
        main()
    

    This writes a single file, index-custom-01.fits (in this case).

    If you read your positions from something other than the simple ASCII file I'm assuming here: Be sure to annotate the columns containing RA and Dec with the proper UCDs as shown above. That makes DaCHS (and perhaps other TAP services, too) create the right hints for the database, speeding up things tremendously.

    You can of course change the ADQL query; it might, for instance, help to replace the G magnitudes with RP or BP ones, or you could use a different catalogue than Gaia. Just make sure the FITS table that is written to basic-cat.fits has exactly the columns RA, DEC, and MAG.
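
    As a quick sanity check – my own addition, not something the scripts above need – you can verify that constraint before feeding the file to build-astrometry-index:

    from astropy.table import Table

    # Check case-insensitively; MAG is what we point build-astrometry-index
    # at via -S above, RA and DEC are the column names it expects.
    cols = {name.upper() for name in Table.read("basic-cat.fits").colnames}
    missing = {"RA", "DEC", "MAG"} - cols
    if missing:
        raise SystemExit(f"basic-cat.fits lacks column(s): {missing}")
    print("basic-cat.fits has the columns the index build needs.")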

    In DaCHS, I tend to keep scripts like the one above in a subdirectory of the resdir called custom-index, and then in the calibration script I write:

    from gavo import api
    
    RD = api.getRD("myres/q")
    
    class MyObsCalibrator(api.AnetHeaderProcessor):
      indexPath = RD.resdir
      sp_indices = ["custom-index/index-custom-01.fits"]
    

    Custom Indexes for Ancient Observations

    On the other hand, if you have oldish images not going terribly deep, you may want to tailor an index for about the epoch the images were taken at. Many bright stars have a proper motion large enough to matter over a century, and so doing epoch propagation (in this case with the ivo_epoch_prop user defined function, which is not available everywhere) is probably a good idea. The following script computes three full-sky indexes with quads around the desired size; note how you can set the limiting magnitude and the size preset:

    """
    Create a full-sky index for bright stars and astrometry.net based on Gaia.
    
    This only works for rather bright stars because the Gaia service will refuse
    to serve more than ~1e7 objects.
    
    Make sure to adapt SIZE_PRESET to your use case (19 means 30 deg,
    10 about a degree, two preset steps are about a factor two in scale).
    """
    
    import os
    import subprocess
    
    import pyvo
    
    # see the module docstring
    SIZE_PRESET = 12
    
    # ignore stars fainter than this; you can't go below 14 all-sky with Gaia
    # and the GAVO DC server
    MAX_MAG = 12
    
    # Epoch to transform the stars to
    TARGET_EPOCH = 1910
    
    
    def main():
        tap_service = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
        res = tap_service.run_async(f"""
            SELECT pos[1] as RA, pos[2] as DEC, mag as MAG
            FROM (
                SELECT phot_bp_mean_mag AS mag,
                    ivo_epoch_prop(ra, dec, parallax,
                        pmra, pmdec, radial_velocity, 2016, {TARGET_EPOCH}) as pos
                FROM gaia.dr3lite
              WHERE phot_bp_mean_mag<{MAX_MAG}) AS q""")
    
        cat_file = "current.fits"
        res.to_table().write(cat_file, format="fits", overwrite=True)
    
        try:
            for size_preset in range(SIZE_PRESET-1, SIZE_PRESET+2):
                subprocess.run(["build-astrometry-index", "-i", cat_file,
                    "-o", f"./index-custom-{size_preset:02d}.fits",
                    "-P", str(size_preset), "-S", "MAG"])
        finally:
            os.unlink(cat_file)
    
    
    if __name__=="__main__":
        main()
    

    With this and my custom-index directory, your DaCHS header processor could say:

    from gavo import api
    
    RD = api.getRD("myres/q")
    
    class MyObsCalibrator(api.AnetHeaderProcessor):
      indexPath = RD.resdir
      sp_indices = ["custom-index/index-custom-*.fits"]
    

    Custom Indexes: Full-sky and Deep

    I have covered the cases “deep and spotty” and “shallow and full-sky“. The case “deep and full-sky“ is a bit more involved because it still lies in the realm of big data, which always requires extra tricks. In this case, that would be retrieving the basic catalogue in parts – for instance, by HEALPix – and at the same time splitting the index up between HEALPixes, too. This does not require great magic, but it does require a bit of non-trivial bookkeeping, and hence I will only write about it if someone actually needs it – if that's you, please write in.
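
    Not as a recipe but as an illustration of the “in parts” idea, a rough sketch of the retrieval side could look like the following; it leaves out all the bookkeeping, and it assumes the GAVO DC's ivo_healpix_index UDF and the gaia.dr3lite column names used above:

    import pyvo

    MAX_MAG = 17     # "deep" compared to the scripts above
    HPX_LEVEL = 1    # 48 parts; go deeper if single parts get too large

    tap_service = pyvo.dal.TAPService("http://dc.g-vo.org/tap")

    for hpx in range(12*4**HPX_LEVEL):
        res = tap_service.run_async(f"""
            SELECT ra AS RA, dec AS DEC, phot_g_mean_mag AS MAG
            FROM gaia.dr3lite
            WHERE ivo_healpix_index({HPX_LEVEL}, ra, dec) = {hpx}
                AND phot_g_mean_mag < {MAX_MAG}""")
        res.to_table().write(f"part-{hpx:03d}.fits", format="fits",
            overwrite=True)
        # ...and then build one index per part with build-astrometry-index,
        # as in the scripts above.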

    [1]You will also find that each of these exists in littleendian and bigendian flavours; ignore these: your machine will pick what it needs when you install the packages without tags.
  • DaCHS 2.8 is out

    Today, I have released DaCHS 2.8 and uploaded it to our APT repository; it should also appear in Debian unstable within the next two weeks. This is the traditional post on what is new in this release.

    If I had to name the highlights of what was added since version 2.7, released last November, I would probably say it's HiPS support and the general move towards SIAPv2, although I would have to admit that neither involved large amounts of code, in particular when compared to the various changes related to COOSYS and TIMESYS.

    So, what about HiPS support? As you probably know, HiPSes are zoomable images (or catalogues, too); if you have a survey-like image collection published through SIAP, you owe it to yourself to have a look at this.

    Given HiPSes are so interactive in Aladin and the like, it may be surprising that they do not really require an active server component: technically, they are just a directory tree created and organised in a very clever way. So, why would DaCHS have a HiPS renderer and boast about it? Well, there are a few amenities (such as auto-generated hips.params files and properties once you have your RD), and DaCHS will care about the Registry side of a HiPS publication. For details, see the HiPS section in the tutorial.

    The SIAP2 story is that (against my rather substantial skepticism) people insisted on creating a new image search protocol in the early 2010s. Since it doesn't have tangible benefits over the venerable SIA1 and even less over Obscore, DaCHS so far has limited its support for SIAP2 to a single global SIAP2 service based on the Obscore table. But then SIAP1 with its stinky UCDs does show its age, and since support for SIAP2 in various clients has been falling into place over the last few years, DaCHS now nudges you to publish your images through SIAP2, for instance by producing a template for a SIAP2 service in dachs start.

    SIAP2 is also what the image section of the tutorial now reflects. If you already have SIAP1 services, the migration should not be hard (except where you used the siapCutoutCore), but given occasional shakiness in the SIAP2 support of the various tools, I'd still wait for a year or two; I have certainly no plans to remove SIAP1 from DaCHS within the next ten years or so. If you still want to migrate, feel free to ask for a section on doing so in DaCHS' How Do I? document.

    From the department of “this update may break your service”: If you have SODA cutouts of cubes, this update will rather likely break the cutout on the non-spatial axis. To fix things, if that axis is spectral, pass its index in a spectralAxis parameter to //soda#fits_standardDLFuncs (or to //soda#fits_makeWCSParams, if that's what you use)[1]. On the other hand, you can now define a velocityAxis, too (and for other cases, there is still axisMetaOverrides).

    Among the more generally interesting new features may be the UnionGrammar. This is for when you have multiple sorts of inputs that require different parsers, for instance, when the data provider changes the formats in which they deliver the data in the midst of a project. I would hope the example from the unionGrammar documentation illustrates what this could be useful for:

    <unionGrammar>
      <handles pattern=".*\.txt$">
        <reGrammar...>
      </handles>
      <handles pattern=".*\.csv$">
        <csvGrammar...>
      </handles>
    </unionGrammar>
    

    Also note that you can create some uniformity between what the grammars yield (and thus avoid a lot of if-else-ing in the rowmaker) by using rowfilters.

    I would have needed the union grammar several times before but had always quickly hacked around that need with some custom grammar. Another itch that has in this way come up multiple times before and for which 2.8 has what I think is a reasonable solution: I occasionally want to share some logic between multiple RDs, but that logic is not general enough to go into DaCHS itself. For such situations, you can now drop a file local.py into your configuration directory (usually, /var/gavo/etc).

    In code saying from gavo import api (which is what you should in general do when programming against DaCHS; in procs, say <setup imports="gavo.api"/>), you can then access the names defined in there as api.local.<name>. For instance (and that's not contrived), say your observers have several particularly babylonian ways of writing times, and you have to parse these in several data collections (i.e., RDs). You could then add a function like this to your local.py:

    import re


    def parse_babylonian_time(raw_time:str) -> float:
      """Tries to interpret raw_time as a time in one of the many forms
      our observers like so much.
    
      Here are the syntaxes supported by the function:
    
      >>> parse_babylonian_time("1h")
      3600.0
      >>> parse_babylonian_time("4h30m")
      16200.0
      >>> parse_babylonian_time("1h30m20s")
      5420.0
      >>> parse_babylonian_time("20m")
      1200.0
      >>> parse_babylonian_time("10.5m")
      630.0
      >>> parse_babylonian_time("1m10s")
      70.0
      >>> parse_babylonian_time("15s")
      15.0
      >>> parse_babylonian_time("s23m")
      Traceback (most recent call last):
      ValueError: Cannot understand time 's23m'
      """
      mat = re.match(
        r"^(?P<hours>\d+(?:\.\d+)?h)?"
        r"(?P<minutes>\d+(?:\.\d+)?m)?"
        r"(?P<seconds>\d+(?:\.\d+)?s)?$", raw_time)
      if mat is None:
        raise ValueError(f"Cannot understand time '{raw_time}'")
      parts = mat.groupdict()
    
      return (float((parts["hours"] or "0h")[:-1])*3600
        + float((parts["minutes"] or "0m")[:-1])*60
        + float((parts["seconds"] or "0s")[:-1]))
    

    (or something similarly abominable). That way, the function is available to all RDs, there is just one implementation to maintain, and it can be centrally tested (dachs test could certainly do with a facility to execute local.py doctests, too).
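
    Just to make the access pattern concrete, here is a (hypothetical) quick check from any piece of code running against that DaCHS installation:

    from gavo import api

    # local.py from above has to sit in the configuration directory
    # (usually /var/gavo/etc) for this to resolve
    print(api.local.parse_babylonian_time("4h30m"))   # 16200.0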

    DaCHS 2.8 also comes with yet another way to declare space-time metadata. That's a longer story, and while all this should have happened 10 years ago, there's no particular hurry now. I will therefore write about improvements in TIMESYS and COOSYS in a later post dedicated to votable:Coords and its products. Meanwhile, just two things: In the unlikely case you already have “stc2“ annotations in your RDs, you will have to rename the value attribute in space clauses to location. And: SSAP and SIAP now produce proper TIMESYS-es. If you happen to know the timescales and reference positions of your observation dates, starting in 2.8 you can define them in the respective mixins (the refposition and timescale mixin parameters).

    There are two notable additions in DaCHS' Datalink support (which is newly declared to support version 1.1): For one, you can now pass contentQualifier to descriptor.makeLink[FromFile], which will normally be a product type taken from http://www.ivoa.net/rdf/product-type (e.g., “image” or “dynamic-spectrum”). Because they can help clients pick appropriate applications to send a datalink to, it is certainly a good thing to add them to your datalinks where applicable.

    Also, datalink meta makers can now return ProcLinkDef instances. This lets you have multiple distinct processing services within a single Datalink document. To make that a bit prettier, there is also a secret handshake (as in: an INFO element with a name of title) between DaCHS' datalink service and the XSLT that formats datalink documents in browsers (also available for third-party datalink documents). See multiple processing services in the reference for details.

    Let me briefly mention a few more changes you may be interested in:

    • condDescs can now be declared as inputOptional, which is useful when you want to have syntax-adaptive defaults.
    • you can now configure the size of DaCHS connection pools in [db]poolSize (in particular, set it to 0 to disable connection pooling).
    • in ADQL, you can now do things like CONTAINS(CIRCLE(23, 42, 1), some_moc) (i.e., compute boolean predicates between the classical geometries and MOCs); there is a small sketch of that right after this list.
    • DaCHS no longer fails with numpy-s later than 1.23, and is no longer dependent on the cgi module that is scheduled for removal from python. In consequence, there is a new dependency, python3-multipart.
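
    To make the MOC point concrete, here is a sketch of such a predicate against the Registry coverage MOCs as the GAVO DC exposes them (I am assuming rr.stc_spatial with its MOC-valued coverage column here; other services may not have anything comparable):

    import pyvo

    tap_service = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
    res = tap_service.run_sync("""
        SELECT ivoid
        FROM rr.stc_spatial
        WHERE 1=CONTAINS(CIRCLE(23, 42, 1), coverage)""")
    print(len(res.to_table()), "registered resources cover that circle")
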
    [1]That is, unless you already defined spectralAxis because DaCHS' heuristics were wrong before version 2.8. But then your service won't break, either.
  • At the Bologna Interop

    As I usually do at Interops, I plan to give a few impressions from the Virtual Observatory's semiannual get-together on this blog, updating as we go. This time, it's about the May 2023 Bologna Interop.

    After six „virtual“ Interops (the last one in October 2022), this is the first one with actual people and, most importantly, an actual coffee break table. Attempts to replace that with gathertown, I have to say, never really panned out, so I'm looking forward to pushing ahead many of the small things that make a project like the VO tick, and to doing that with less effort than it takes to get people into telecons.

    Also, it's my last Interop as chair of the Semantics Working Group – to prevent informal hierarchies as well as possible, there's a limit of four years in a single IVOA position, and my four years as the herder of meanings are now over. So, the Bologna Semantics Session will be the last one I will chair. Will you do me a favour and attend? Since the conference is hybrid, you can even do that if you are not in town.

    2023-05-09, 10:00

    I approached this morning's Science Platform Plenary with a fair amount of apprehension because I'm always worried that these platforms actually appear so attractive to management because they are the old silos management knows. For instance, people would go back to writing software specifically for their data, and no one could be blamed for “wasting“ money on software useful to others.

    Sure, custom and tailored software is faster to write, and the resulting lock-in perhaps even helps getting shiny metrics for a while, but the results also break much faster, not to mention that interoperability goes down the drain, it's a big exercise in exclusion, and of course everyone re-implementing about the same thing every time is a gigantic waste of money and, worse, human effort.

    talk slide proposing things like various pre-defined cut-outs from cubes, or resolution changes or source extraction for images

    Slide 13 from Jesus' talk. Rights his.

    Fortunately, most of the talks did not aggravate these concerns. On the contrary, most of what I saw was fairly generic compute platforms that very credibly strive to be open, both on getting things in and getting things out.

    But I'll not deny that what I particularly liked was Jesus Salgado's distinctly un-platformy proposals for extending SODA (slide 13) – most of the operations envisaged sound very useful, sensible, and doable, and I will certainly put them into DaCHS if someone (cough else) works them out.

    The only really alarming thing I heard in the platforms session was the term “multi-factor authentication“.

    Come on, none of what we're doing here is the sort of thing where anything major would break if someone pilfered credentials. Please, please let's be reasonable. There's a lot less harm done if someone runs a few CPU hours on someone else's account than if humans were forced to copy many digits from one device to another device all the time[1].

    Don't get me wrong: There are places where 2FA may be a good idea, in particular when other peoples' personal data is concerned. I'm just saying that most of the time, 2FA causes more annoyance than the occasional pilfered credential would (and that you shouldn't process other peoples' personal data without a really strong reason in the first place).

    2023-05-09, 17:00

    A personal highlight of every Interop for me as a Registry geek is of course the session of the Registry WG, which today featured two talks by yours truly. However, it opened with a slightly humbling piece by Hendrik Heinl on how unsatisfying it is to discover time series in the current VO. It would have been badly humbling if it hadn't highlighted why several of the things I've been after for many years matter, most of all the move to data discovery I have talked about here before.

    Of my two talks, one was an abridged and perhaps a bit more entertaining version of my recent blog post on the various sorts of lint I find in the VO Registry. The other was very dry fare on standards development; only look at it if you're into evolving VOResource and its extensions, and I'm afraid I have to say about as much on Renaud's contribution on some incremental changes to StandardsRegExt, which in itself works pretty much exclusively behind the scenes. Suffice it to say that even in the VO there are those little thankless jobs.

    2023-05-10 16:00

    Phewy. Another two talks down, one to go. In the session informally called DOI I (where DOI here is a Digital Object Identifier, in our case almost always managed through DataCite), I reminded everyone that if they have an IVOID (in plain English: are in the VO Registry), they can improve their citeability dramatically by getting themselves a DOI using voidoi (which of course only is interesting if you cannot or do not want to mint your resource's DOI in some other way).

    Let me mount a soapbox here for a moment: I'm caring about DOIs because I want paper authors to be able to cite data in a way that lets people find the resources used. That in the case of a DOI the reference is machine-readable to me is a liability rather than an advantage, since it makes it even easier to come up with metrics. And metrics, I claim, are almost always a bad thing, either masking agendas that should be made explicit or, worse and more typical, making matters worse accidentally – which is almost inevitable as soon as people start gaming the metrics, which in turn is almost inevitable when you threaten their livelihoods using metrics.

    Given that, it was not easy keeping quiet and not starting to argue points to that effect (which I'll gladly do here if anyone gives me an excuse to do so) during much of the second DOI session. Let me at least make one point to any funders possibly venturing here: Persistent identifiers to data don't make persistent institutions keeping the data obsolete.

    Such persistent institutions also have a critical role in curating the metadata going into the PIDs, a point driven home in Gus' talk; look at slide 15 for impressions of the sort of disasters happening when you create citations from DataCite records encountered in the wild. In my assumed role as a Registry janitor (as per this recent post) I had complete empathy with Gus.

    My second talk this morning I again gave in the wonderful large auditorium (a real treat for a limelight hog like me): I talked about the hairy problems raised by major version steps in protocols. There was not too much discussion on this – less than I had hoped for, really, in particular later during the lunch break –, but having presented the problem in front of this kind of audience, I'm now rather sure the right way to proceed is what's Option I in my talk: deprecate servicetype='image'. The sort of global discovery that was envisaged to be enabled by servicetype constraints probably needs to be handled in a proper function hiding the gory details from the users.

    2023-05-11, 12:30

    This morning I had the last session in my term as the chair of the Semantics working group, featuring talks reporting on the progress of various semantic artefacts by different people; whether or not it's justified, I feel some satisfaction seeing this sort of activity, which I'd take as the sign of a maturely operating working group.

    I, on the other hand, talked quite a bit about an entirely maverick topic: Linked Data in VOTable. As I point out in the talk, in the one place where we are using RDFa in the VO (which I identify with the buzzword “linked data“ for the purposes of this talk), it is a big success: TAP examples, which use RDFa over XHTML. Perhaps we should have more of that?

    The obvious place to add RDFa to VO stuff would be our central container format VOTable, which conveniently is based on XML, and hence existing RDFa tooling is immediately applicable when we add a few RDFa attributes to a few VOTable elements. I proved that with some examples and three lines of pyrdfa code and was sort-of happy with getting nice, Turtle-formatted RDF triples out of very lightly annotated VOTables.

    However, if you have followed the pyrdfa link, you may have seen the main argument against the whole effort:

    This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

    It would seem that RDFa within XML-derived formats is not a terribly active topic these days. If that's true, then effort from the VO side to be interoperable with this part of the outside world would be largely wasted – that outside world might very well be smaller than the VO itself now. On the other hand, if I look at Linked Open Vocabularies, it would seem that there are communities using RDF as such very actively, and some of these vocabularies we could very well reuse.

    And then there is a problem I couldn't figure out that may be a good test case for using ChatGPT on technical questions (feel free to try): “How do I make an RDF resource out of element content in RDFa?“ In case that's too dense a question: What I'd like to do is some RDFa markup such that:

    <INFO property="doap:homepage"
      magic-attribute="magic-value"
      >http://foo.bar</INFO>
    

    works out to:

    <> doap:homepage <http://foo.bar>
    

    in Turtle (note the angle brackets rather than quotes, indicating we are talking about an RDF resource rather than a literal that happens to look like a URI). Can't be hard, can it?

    Screenshot of an ADQL cheat sheet with an optional WITH clause in a red ellipse.

    New in TOPCAT: If it senses that a service understands common table expressions, it will inform you accordingly on its ADQL cheat sheet.

    Oh, and then I'd like to add an impression from the Apps/Ops session late on Wednesday, where I simply have to hand out the tasteful-application-of-standards award to Mark Taylor. In his news-from-TOPCAT report, he described how, depending on whether or not the capabilities of a TAP service say its ADQL supports CTEs (“WITH”), he changes the cheat sheet to show or hide the optional WITH clause, as shown in the figure above.

    Sure: That's a real small detail. But sometimes it's small details like this that make the difference between folks puzzling how to do a seemingly simple thing (as I am still on the resourcification of element content in RDFa) and them realising there is an elegant solution to what they're trying to do.

    2023-05-13 11:00

    The Interop ended yesterday morning, and now I'm returning home with about a metric ton of homework. Which is probably a good thing.

    One piece of homework I got from Robert Nikutta (NOIRLab) who blasted a piece of text I wrote when I was chairing the Registry WG: Getting into the Registry (this may already have improved by the time you read this). Here's Robert's slide on it:

    A slide criticising some text as incomprehensible.

    Now, I think I have to put up the defense that this was basically the abstract and there are more explanations further down the page, for instance on the “purx” that confused Robert so much[2]. More importantly, though: If you don't understand some VO documentation, it is rather likely that you are not the only one. You will not only help yourself but all these other people if you complain, ideally with suggestions on how to improve or perhaps concrete questions.

    If it is not otherwise clear just who to complain to, use the mailing list of a working or interest group that sounds as if it might be responsible. I can't promise you we will improve matters, but knowing about a problem makes it a lot more likely someone will address it.

    In Robert's concrete issue of a simple and straightforward OAI-PMH component, on the other hand, documentation is not enough. At least as long as I cannot convince the rest of the world that collaborating on DaCHS[3] is a much smarter move than everyone developing their own server software, there really should be such a thing, and I think I've charmed some of the self-implementors into collaborating in such an effort.

    Traditionally, the last talk of an Interop is reserved for the chair of the Exec (the bosses of the national VO projects). They then reveal who the Exec has chosen as the future chairs and vice-chairs of the working and interest groups. I will not pretend that I was surprised: I will be vice chair of the solar system interest group in the next few years. And I already have a first project that came up during one of the many, many, many coffee break discussions of this Interop: finally start collecting planetary reference frames for the vocabulary of reference frames. What a nice bridge from semantics to solar system!

    [1]No, having to carry around and plug in and out some additional hardware is only marginally less annoying than the digit-copying 2FA schemes.
    [2]I will give you that my predilection for cute names is not always helpful, though.
    [3]DaCHS of course has an OAI-PMH interface built in, but that is so highly integrated with its metadata management and XML generation that pulling it out just is not worth it.
