Saturday, October 12, 2013

My Concerns with XBRL

Over the past two or so years, I have been toying around, on and off, with a technology called XBRL. The following is the definition of XBRL from its Wikipedia page:

“XBRL (eXtensible Business Reporting Language) is a freely available and global standard for exchanging business information. XBRL allows the expression of semantic meaning commonly required in business reporting. The language is XML-based and uses the XML syntax and related XML technologies such as XML Schema, XLink, XPath, and Namespaces. One use of XBRL is to define and exchange financial information, such as a financial statement. The XBRL Specification is developed and published by XBRL International, Inc. (XII).”

XBRL is slowly gaining adoption among more and more public companies for mandated financial reporting. As a value investor, I am interested in analyzing public companies by their fundamentals and financial statements. As a professional software developer, I loved the idea of automating this analysis, and XBRL seemed like the perfect tool for gathering information that was already available. After several significant efforts on my part to understand and use XBRL via XML libraries and my own custom programming, my excitement quickly turned to disappointment. While I admittedly don’t have a wealth of experience in financial reporting or the use of XBRL, I can say with certainty as a software developer that XBRL is not easy to jump into and start programming with.

I slowly became convinced that XBRL was impossible to use on a large scale in its current form without significant resources and development effort on my part. My disappointment only deepened when I found that a small start-up had already done what I set out to do: gather XBRL into a database and expose a query interface for that data via the web. The website to which I am referring is www.calcbench.com. They seem to have built a very powerful system for ingesting, processing and presenting XBRL data, for which I give them credit, given my understanding of how complicated XBRL can be. Calcbench itself even admits to the complexity of XBRL: “Priceless data being created, BUT extremely complicated to decipher.”


Not only is XBRL complicated, it also seems to be fraught with incorrect data. Calcbench claims to: “Use advanced computing techniques to identify and correct errors (close to half a million corrections made so far)”. It is somewhat worrying to me that such a large number of errors exist in the official financial data of public companies. There is no clue as to what the total sample size was for that number, but in any case, the accuracy of XBRL data is likely nowhere near 100%. This is not to discredit Calcbench; I am simply illustrating one of the issues with XBRL itself and its surrounding infrastructure.

Now that you have a summary of some of the deficiencies of XBRL, I will go into more detail regarding my experience in trying to use the standard. First of all, the Securities and Exchange Commission in the United States has mandated the use of XBRL in public company financial reporting. You can view their main XBRL web page at http://xbrl.sec.gov. To allow scripts, programs or human beings access to the full XBRL data store, the SEC has set up an FTP server that allows anonymous login. See http://www.sec.gov/edgar/searchedgar/ftpusers.htm for details. As far as I know, this is the only official source of XBRL data in the U.S. Due to that fact, the server is often slow, and downloading data (especially in bulk) can be very time consuming or sometimes not possible at all. Without reliable data availability, it is difficult for any software to ingest an initial data set in bulk.

The XBRL files that are stored on the SEC FTP server are not quite what a software developer would expect. Rather than an XML file containing only XBRL, files from the SEC seem to contain XBRL files within them, but are surrounded by additional SEC-specific data. For example, the typical format of a file may look something like the following:

<SEC-DOCUMENT>
    <SEC-HEADER>
        ...
    </SEC-HEADER>
    <DOCUMENT>
        ...
    </DOCUMENT>
    <DOCUMENT>
        <XBRL>
            (XBRL file)
        </XBRL>
    </DOCUMENT>
    ...
</SEC-DOCUMENT>


This format isn’t especially complicated, and wouldn’t be a concern other than the fact that these documents don’t even seem to constitute valid XML at times, let alone valid XBRL. Error correction is therefore necessary before any XBRL processing has even been attempted. When XBRL files are extracted from this format, the type of each XBRL document must be detected, and each file likely needs to be written out separately for ease of use.
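As a rough illustration of that extraction step, here is a minimal Python sketch. It assumes the wrapper looks exactly like the simplified outline above (the real SEC files carry more metadata than I have shown), and the function name extract_xbrl_payloads and the sample text are made up for this example. It deliberately uses plain text matching rather than an XML parser, since the wrapper as a whole may not be valid XML.

```python
import re

# Assumption: each embedded document sits between <DOCUMENT> and </DOCUMENT>,
# and each XBRL payload sits between <XBRL> and </XBRL>, as in the outline above.
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
XBRL_RE = re.compile(r"<XBRL>(.*?)</XBRL>", re.DOTALL)


def extract_xbrl_payloads(filing_text):
    """Return the text of every <XBRL> block found inside a <DOCUMENT> block."""
    payloads = []
    for document in DOCUMENT_RE.findall(filing_text):
        match = XBRL_RE.search(document)
        if match:
            payloads.append(match.group(1).strip())
    return payloads


# Made-up sample; note the unescaped ampersand that would trip an XML parser.
sample = """<SEC-DOCUMENT>
<SEC-HEADER>
header text that is not XML
</SEC-HEADER>
<DOCUMENT>
plain text body & an unescaped ampersand
</DOCUMENT>
<DOCUMENT>
<XBRL>
<?xml version="1.0"?>
<xbrl>example instance</xbrl>
</XBRL>
</DOCUMENT>
</SEC-DOCUMENT>"""

payloads = extract_xbrl_payloads(sample)
```

Each extracted payload can then be handed to a real XML parser and written to its own file. Detecting the type of each payload (instance, schema, linkbase) is a separate step that this sketch does not attempt.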

My concerns so far have been minor. One of my biggest concerns is that XBRL is too extensible, and that definitions of reporting standards are not coherent. One could say that a sort of meta-standard is needed in order to regulate how standards are defined. For example, the US-GAAP and IFRS reporting standards are different in many respects, as you might expect. But if both are to use XBRL to define their standards, they should have commonalities when it comes to definitions and processing. In my view, this does not seem to be the case. Each uses XBRL in a different manner to mandate how reporting data should be formatted. In addition to the variety of reporting standards built on an overly extensible data format, each XBRL reporting software suite interprets these reporting standards differently, multiplying the number of XBRL interpretations and dialects. This places a large burden on software that is meant to ingest XBRL data, as that software must support a multitude of XBRL dialects and somehow convert each dialect to a common format.

For example, look at the following definitions of a “context” that defines a reporting period, as defined by four different XBRL software suites:

<!-- Generated by iC(tm) - CompSci Interactive Converter - http://www.compsciresources.com -->
<context id="S000002337Member_C000006127Member">
    <entity>
        <identifier scheme="http://www.sec.gov/CIK">0001002427</identifier>
        <segment>
            <xbrldi:explicitMember dimension="dei:LegalEntityAxis">ck0001002427:S000002337Member</xbrldi:explicitMember>
            <xbrldi:explicitMember dimension="rr:ProspectusShareClassAxis">ck0001002427:C000006127Member</xbrldi:explicitMember>
        </segment>
    </entity>
    <period>
        <startDate>2013-09-17</startDate>
        <endDate>2013-09-17</endDate>
    </period>
</context>

<!-- Generated by DataTracks version 1.0.7 on 10-Oct-2013 [11:18:47] {AM} EST - www.datatracks.com -->
<xbrli:context id="P04_01_2011To06_30_2011">
    <xbrli:entity>
        <xbrli:identifier scheme="http://www.sec.gov/CIK">0001301991</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
        <xbrli:startDate>2011-04-01</xbrli:startDate>
        <xbrli:endDate>2011-06-30</xbrli:endDate>
    </xbrli:period>
</xbrli:context>

<!-- Produced by EDGARsuite software, Advanced Computer Innovations, Inc., Copyright (C) 2008-2013 [PPXCRNPC214X]. www.edgarsuite.com -->
<context id='D130101_130630'>
    <entity>
        <identifier scheme='http://www.sec.gov/CIK'>0001487997</identifier>
    </entity>
    <period>
        <startDate>2013-01-01</startDate>
        <endDate>2013-06-30</endDate>
    </period>
</context>

<!-- RR Donnelley Xcelerate Instance Document, based on XBRL 2.1  http://www.rrdonnelley.com/ -->
<context id="eol_PE1343----1310-Q0004_STD_92_20130831_0_926437x923140_929865x932116">
    <entity>
        <identifier scheme="http://www.sec.gov/CIK">0000090896</identifier>
        <segment>
            <xbrldi:explicitMember dimension="us-gaap:PropertyPlantAndEquipmentByTypeAxis">us-gaap:BuildingAndBuildingImprovementsMember</xbrldi:explicitMember>
            <xbrldi:explicitMember dimension="us-gaap:RangeAxis">us-gaap:MaximumMember</xbrldi:explicitMember>
        </segment>
    </entity>
    <period>
        <startDate>2013-06-01</startDate>
        <endDate>2013-08-31</endDate>
    </period>
</context>

In each example, namespace prefixes may or may not be used, XBRL Dimensions may or may not be used, and the context id attributes follow wildly different formats between XBRL tools, some of which are indecipherable. These illustrated differences just scratch the surface of how different each XBRL tool output can be.
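To give a sense of what ingesting software has to do, here is a minimal Python sketch that reduces contexts like the ones above to a common form using the standard library's ElementTree. It assumes that the unprefixed examples declare the XBRL instance namespace as the default namespace on their root element (the snippets above do not show the root, so this is my assumption), and the function name normalize_contexts is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for XBRL 2.1 instances and XBRL Dimensions.
NS = {
    "xbrli": "http://www.xbrl.org/2003/instance",
    "xbrldi": "http://xbrl.org/2006/xbrldi",
}


def normalize_contexts(instance_xml):
    """Map each context id to a plain dict, regardless of prefix style."""
    root = ET.fromstring(instance_xml)
    contexts = {}
    # Matching on the namespace URI ignores whichever prefix the tool chose.
    for ctx in root.iter("{%s}context" % NS["xbrli"]):
        members = ctx.findall(".//xbrldi:explicitMember", NS)
        contexts[ctx.get("id")] = {
            "cik": ctx.findtext("xbrli:entity/xbrli:identifier", namespaces=NS),
            # Both dates are None for contexts that use <instant> instead.
            "start": ctx.findtext("xbrli:period/xbrli:startDate", namespaces=NS),
            "end": ctx.findtext("xbrli:period/xbrli:endDate", namespaces=NS),
            "dimensions": {m.get("dimension"): m.text for m in members},
        }
    return contexts


# Prefixed style, as in the DataTracks example (root element assumed).
prefixed = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance">
  <xbrli:context id="P04_01_2011To06_30_2011">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0001301991</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2011-04-01</xbrli:startDate>
      <xbrli:endDate>2011-06-30</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
</xbrli:xbrl>"""

# Default-namespace style, as in the EDGARsuite example (root element assumed).
unprefixed = """<xbrl xmlns='http://www.xbrl.org/2003/instance'>
  <context id='D130101_130630'>
    <entity>
      <identifier scheme='http://www.sec.gov/CIK'>0001487997</identifier>
    </entity>
    <period>
      <startDate>2013-01-01</startDate>
      <endDate>2013-06-30</endDate>
    </period>
  </context>
</xbrl>"""

a = normalize_contexts(prefixed)
b = normalize_contexts(unprefixed)
```

This handles only the prefix and dimension differences. The context ids themselves remain opaque strings, so the period and dimension data have to be read from the element contents rather than inferred from the id.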

Upon realizing the difference between tools, I went in search of information, to see if anyone else shared my opinions. I was surprised to find that the “Father of XBRL” himself had written a whitepaper in 2008 on the growing problem of the emergence of different XBRL dialects.

From the whitepaper: “XBRLS: how a simpler XBRL can make a better XBRL”, by Raynier A. van Egmond and Charles Hoffman:
 “Additionally, there is an ever-increasing number of “dialects” of XBRL being created. For example, the IFRS-GP, COREP, FINREP, US GAAP, and the FDIC taxonomy each has a different architecture.” 
 “Because of the missing solution components, each implementation of XBRL creates the missing components of the overall solution themselves, and usually in a different way. This causes the differences among dialects of XBRL.” 
 “XBRL runs the risk of fragmenting into a number of different dialects. While they will likely remain interoperable – because they are all structured information, but just structured differently – it will come at a high cost of consulting and development fees to achieve this interoperability in order to convert one dialect or architecture to another.”


The above opinions nicely summarize my own concerns with XBRL.  From the “Business Reporting Logical Model” category on Charles Hoffman’s blog:

“I have been pushing the idea of using a logical model above XBRL's syntax and application profiles to make using XBRL easier for business users.”


This blog post was written in 2010 (3 years ago, as of this writing), which is 2 years after the initial whitepaper. I hope that progress has been made toward the goal of building a logical model on top of XBRL, but I find it difficult to ascertain the status of the XBRL industry, and the evidence I’ve shown from 2013 seems to indicate that this is not the case.

After reading a summary of each XBRL tool mentioned above, I’ve found that each tool claims to be easy to use, without any knowledge of XBRL. This is most likely true for those who need to use XBRL for reporting purposes. As long as each tool uses some dialect of XBRL and reports XBRL filings to the SEC, no other requirements are enforced. As a result, no consideration is given to those who need to write systems that ingest all these different formats of XBRL in order to actually use the reported data.

Regarding the reporting standard definitions themselves, I feel like XML concepts are not being applied properly. For example, here are some XML tags defined in the US-GAAP XSD:

<ScheduleOfDeferredCompensationArrangementWithIndividualExcludingShareBasedPaymentsAndPostretirementBenefitsByTitleOfIndividualAndByTypeOfDeferredCompensationTable>

<QualitativeAndQuantitativeInformationAssetsOrLiabilitiesForTransferorsContinuingInvolvementInSecuritizationOrAssetbackedFinancingArrangementNotPreviouslyRequiredFinancialSupportProvided>

<CertainLoansAcquiredInTransferAccountedForAsAvailableForSaleDebtSecuritiesAcquiredDuringPeriodNotAccountedForUsingIncomeRecognitionModelAtAcquisitionAtCarryingValue>

In some cases, would it not make more sense to apply the typical hierarchical model commonly found in nearly every markup language? For example, instead of these US-GAAP tags:

<ClosedBlockOperationsChangeInPolicyholderBenefitsAndInterestCreditedToPolicyholderAccountBalances>

<ClosedBlockInvestmentsAvailableForSaleChangeInUnrealizedAppreciation>

<ClosedBlockLiabilitiesFuturePolicyBenefitsAndPolicyholderAccountBalances>

Could one not instead use a much more readable and practical hierarchy such as:

<ClosedBlock>
    <Operations>
        <AccountBalance owner="PolicyHolder" type="Benefits" period="Delta">
            <InterestCredited>1000000</InterestCredited>
        </AccountBalance>
    </Operations>
    <Investments availability="ForSale">
        <UnrealizedAppreciation period="Delta">500000</UnrealizedAppreciation>
    </Investments>
    <Liabilities>
        <AccountBalance owner="PolicyHolder" type="Benefits" period="Future">
            <Balance>2000000</Balance>
        </AccountBalance>
    </Liabilities>
</ClosedBlock>

The above is by no means my suggestion of a real standard (since I’m not an accountant). Rather, it illustrates that a proper XML hierarchy could be applied in nearly any fashion and still be an improvement over the flat, overly descriptive tag names shown above.
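One practical benefit of a hierarchy is that generic XML tooling can walk it without knowing any of the names in advance. The following Python sketch uses a fragment of my hypothetical markup above (again, not a real standard), and the helper name leaf_values is made up for this example.

```python
import xml.etree.ElementTree as ET

# A fragment of the hypothetical hierarchy shown above (not a real standard).
HYPOTHETICAL = """<ClosedBlock>
    <Investments availability="ForSale">
        <UnrealizedAppreciation period="Delta">500000</UnrealizedAppreciation>
    </Investments>
</ClosedBlock>"""


def leaf_values(element, path=()):
    """Yield (slash-separated path, text) for every element with no children."""
    path = path + (element.tag,)
    children = list(element)
    if not children:
        yield "/".join(path), element.text
    for child in children:
        yield from leaf_values(child, path)


leaves = dict(leaf_values(ET.fromstring(HYPOTHETICAL)))
```

The path of each value is recovered from the document structure itself, whereas a flat tag name has to be split apart by convention to get at the same information.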

As you can imagine, the complexity of XBRL does not allow best practices to be intuitive or obvious to users of XBRL. There exists an XBRL Best Practices Committee, which publishes monthly updates to the most recent best practices guidelines for using XBRL. The latest best practices can be found here: http://xbrl.us/research/Documents/Resolutions-BPC.pdf. If you open the document, you will find (as of this writing) that the best practices document is 118 pages long. To me, this is a sign that a problem exists with the ease of use of XBRL, because of the need for such a long document on best practices.

In reading various XBRL websites over the past few years, I’ve noticed that some software development conferences or competitions have been held, in order to generate interest in the concept and prototype some workable software to encourage XBRL adoption. Calcbench was the winner of one such challenge: http://www.prnewswire.com/news-releases/xbrl-us-announces-grand-prize-winner-of-xbrl-challenge-developer-contest-140919833.html.

While it is good that there seems to be some interest in developing processing-side XBRL software (as opposed to reporting-side software used by reporting companies), I expected that after over 10 years of development, XBRL would be more widespread and well known. I can’t speak for other software developers who have attempted or succeeded in developing processing-side XBRL software, but I can say that for me personally, my motivation to use the standard could not overcome the standard’s complexity, incoherence, and difficulty of use. This could be one of the reasons that XBRL is less well known than I would expect, even though it is now mandatory in the United States.

When I gave some thought to what I think XBRL should be, I came up with a few ideas. In my opinion, first and foremost, XBRL should be more standardized than it currently is. Reporting standards should be standardized, XBRL extensions should be approved by a standards oversight body of some kind, and more commonly used strategies and practices should be readily available out-of-the-box, as Charles Hoffman alludes to in his whitepaper. Each tool that claims to conform to XBRL standards should have to pass some sort of certification, to ensure that truly standardized output is produced by the tool. Reported XBRL data should be verifiable, and could possibly incorporate some sort of public key cryptography for authenticity purposes. Lastly, it would be highly useful to standardize on some sort of object-relational mapping for converting XBRL data in XML form into a standardized relational (SQL) database schema. For more information on XML and ORM, see http://docs.jboss.org/hibernate/core/3.3/reference/en/html/xml.html and/or https://code.google.com/p/joxm/.
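To make the last idea a little more concrete, here is a minimal Python sketch of what a relational target for XBRL data might look like, using the standard library's sqlite3 module. The two-table schema, the column names, and the inserted fact value are all my own assumptions for illustration and are not part of any XBRL specification. The context row reuses the EDGARsuite example from earlier in this post.

```python
import sqlite3

# Hypothetical schema: one row per context, one row per reported fact.
SCHEMA = """
CREATE TABLE context (
    context_id TEXT PRIMARY KEY,
    cik        TEXT NOT NULL,
    start_date TEXT,
    end_date   TEXT
);
CREATE TABLE fact (
    concept    TEXT NOT NULL,
    context_id TEXT NOT NULL REFERENCES context(context_id),
    unit       TEXT,
    value      TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Context taken from the EDGARsuite example above.
conn.execute(
    "INSERT INTO context VALUES (?, ?, ?, ?)",
    ("D130101_130630", "0001487997", "2013-01-01", "2013-06-30"),
)
# The fact value is made up purely for illustration.
conn.execute(
    "INSERT INTO fact VALUES (?, ?, ?, ?)",
    ("us-gaap:Revenues", "D130101_130630", "USD", "1000000"),
)

# Once loaded, a question about a reporting period becomes a plain SQL join.
rows = conn.execute(
    "SELECT c.cik, f.concept, f.value "
    "FROM fact f JOIN context c USING (context_id) "
    "WHERE c.end_date = ?",
    ("2013-06-30",),
).fetchall()
```

A real mapping would also need tables for dimensions, units, and filing metadata. The point of the sketch is only that once every tool's output lands in one agreed schema, the dialect differences stop mattering to whoever queries the data.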

In conclusion, I feel that I’ve nicely summarized my concerns with XBRL. It is my opinion that the XBRL standards in their current form are not in a “production-ready” state suitable for mandatory filings. I feel that XBRL needs further work and consolidation, and using it in production systems before that happens will only cause further pain down the road. There are data modelling practices that I feel could still be incorporated into XBRL to address the deficiencies that I’ve mentioned in this blog post. I may write more in-depth blog posts in the future on the topic of XBRL, and I hope that the future is bright for digital financial reporting. XBRL is a powerful concept, and I hope that it will continually improve, to help reporters, processors, and developers of XBRL and its associated software.

Monday, October 7, 2013

Rules

If you haven't read this blog's "About" page, I encourage you to do so as a good starting point. The "About" page explains how I arrived at where I am today, in terms of blogging and lessons learned. Now that I've described (and you've hopefully read) how I got here, I'll explain a little about who I am and how I intend to write blog posts on this blog in the future.

My name is Mike, I hold a Bachelor of Computer Science degree with High Honours, and I am currently employed full-time as a software developer. As of October 2013, I have close to 4 years of experience in the software development industry. While this isn't much compared to some people, I feel that it will help to keep my career path in mind while reading the posts on this blog. It provides context in the sense that I'm relatively young and new to the industry, but have enough experience to not make a complete fool of myself with naive notions. My focus so far in the industry has been on C and the Microsoft Windows operating system. I also have some experience with C++, enterprise-level Java, and Python development. I've touched various other languages and technologies that aren't worth enumerating here, but may come up in future blog posts.

As I mentioned on the "About" page, I also have experience in value investing and the stock market. While this interest is more of a hobby and has never been supported by formal education, I look forward to including investing and finance topics in my blog posts on technology.

With my background out of the way, I want to state some rules for posts on this blog. These rules are more for myself, to be used as a self-check to make sure I'm sticking to my goals. I'll list them in point form below:
  • Blog posts must be coherent with a defined beginning, middle and end (rather than short posts with no insight or value).
  • Blog posts must be researched and contain citations to sources, where appropriate.
  • Blog posts must have a position or statement of opinion, if applicable.

I may add to this list in the future. I realize that there are a number of caveats that allow for wiggle room ("where appropriate" and "if applicable"), but the rules should hold in most cases.

Now that I'm ready to start writing blog posts, I wish myself luck and I hope that you find something worth spending your valuable time reading.