Essential Complexity: 2010

Monday, December 27, 2010

The Decline of Trust: Follow-up to Andrew Maguire post

Earlier this year I blogged about an alleged case of silver manipulation as raised by Andrew Maguire to the CFTC. If you haven't already, I urge you to read that post before continuing on with this post.

I ended my post with a challenge to any serious journalist to properly investigate Andrew Maguire's case. So far I have yet to see anything resembling a real investigation undertaken by the media establishment. While I cannot say for sure why this is given how much of a juicy story the Andrew Maguire case appears to be. I am beginning to see an interesting pattern emerge that may partly explain why this is. I must say that I'm a bit surprised by my own findings, but they are enough for me to draw conclusions.

For a period of time the only newspaper that reported the Andrew Maguire story was the New York Post. You can read the first report here which was first published in March (shortly after Maguire blew the whistle). A follow-up report can be found here. However, a third story was published shortly after I posted my last blog entry which in hindsight is most interesting. You can read it here. What's interesting about the last post is that one week following its original publication, a correction was issue which basically nullified the entire story. Namely, JPMorgan responded by saying there was no investigation to begin with. The correction is posted at the bottom of the story. The NYPost never countered.

Fast forward to October 8th of this year. Reuters reported that CFTC commissioner Bart Chilton had publicly announced that the CFTC was investigating silver manipulation. Oddly, the statement wasn't from the CFTC itself. Rather Chilton felt the public deserved some answers, speaking on his own accord. You can read that story here.

By the end of October, Reuters published a follow-up story reporting that the CFTC was in particular investigating JPMorgan and HSBC for silver manipulation. You can read that story here. This would appear to directly contradict the statements made to the NYPost back in May, that they were NOT being investigated for silver manipulation. JPMorgan declined to comment on the matter.

A day later, Reuters reported the first of at least five class action suits lawsuit filed by investors targeting JPMorgan and HSBC. Here is the article, and here is a scanned in copy of the court filing as filed in the Southern District of New York. About a week later, a second class action lawsuit was filed. You can read the court filing here. A week after that, a third law firm posted a press release announcing they were investigating silver manipulation by JPMorgan and HSBC on behalf of their clients. As a follow-up, Reuters posted another article providing historical context to these cases. The article begins with the obligatory mention of the Hunt brothers, some mention of "gold bugs" and "conspiracies", but no mention of Andrew Maguire himself.

To be fair to the media establishment, on November 3rd Andrew Maguire was mentioned by name in this Wall Street Journal article describing a fourth lawsuit against JMorgan and HSBC. You can read that article here. Apart from that brief mention in the WSJ, most newspapers either leave out mentioning Andrew Maguire by name, or they describe a "London based Metals Trader".

For those that are curious about how the manipulation works, a fifth class action lawsuit was just launched on December 28th, which describes in detail HOW the manipulation works and how JPMorgan and HSBC profited from it. It also describes WHY they were in a unique position to do so.

Let's pause for a moment and ask some questions. Why is it that only the NYPost originally reported on the story? Why didn't NYPost continue to follow the story they claim to have broken after it developed? Why doesn't Reuters mention the name of Andrew Maguire? Why doesn't the WSJ and Reuters print or mention the original e-mail transcript of the correspondences by Andrew Maguire to the CFTC? Is this just a one-off oversight?

Perhaps.

But around the same time Bart Chilton's announcement was made, another interesting and related story came to light. Namely, The Washington Post reported that one of the two CFTC judges (Judge Painter) had announced his retirement and that he had issued an Order requesting that all seven of his open cases NOT be transferred to the other CFTC judge, Judge Levine. I have quoted the key paragraph of the one page order (which you can read here) below so you can see for yourself:

There are two administrative law judges at the Commodity Futures Trading Commission: myself and the Honorable Bruce Levine. On Judge Levine's first week on the job, nearly twenty years ago, he came into my office and stated that he had promised Wendy Gramm, then Chairwoman of the Commission, that we would never rule in a complainant's favor. A review of his rulings will confirm that he has fulfilled his vow. Judge Levine, in the cynical guise of enforcing the rules, forces pro se complainants to run a hostile procedural gauntlet until they lose hope, and either withdraw their complainant or settle for a pittance, regardless of the merits of the case. See Michael Schroeder, If You've Got a Beef With a Futures Broker, This Judge Isn't for You - In Eights [sic] Years at the CFTC, Levine Has Never Ruled in Favor of an Investor, Wall St. J., Dec. 13, 2000, at A1 (copy attached).

In the last sentence of the previous paragraph, Painter cites an article that appeared on the front page of the Wall Street Journal back in 2000. You can read that article following Painter's Order here. It is a well researched investigation into the conduct of Judge Levine and sure enough does raise some important and relevant questions. It's not entirely one sided, but it does put Levine on the defensive. You would not be unreasonable to think the Wall Street Journal would publish a follow-up article vindicating their original investigation. Right?

Wrong. The Wall Street Journal has taken an entirely different and rather underhanded tack. Namely, Sarah Lynch penned an article which is a blatant smear on Judge Painter, describing him as a mentally ill alcoholic who would sleep at work and who was a failure in his private life. Sandwiched in the middle of the article we find a very brief mention of Painter's Order against Levine, which is immediately followed by "Judge Levine declined to comment, but a former colleague defended Judge Levine's record and said he is fair." The article continues to smear Painter to the very end quoting a doctor that diagnosed him with "cognitive impairment, alcoholism and depression". The tone and content is more in line with something you might read in The National Enquirer or US Weekly. The article seems highly uncharacteristic and rather beneath a newspaper like the Wall Street Journal with such a distinguished history.

Reading Lynch's WSJ article after reading Schroeder's original investigative article and the recent Washington Post article left me feeling stunned. The only thing I will say in the article's defense is that it does help to explain why Painter kept this to himself for so long. I suspect his enemies were holding a grenade over his head, and clearly they've pulled the pin.

I am not the only person who noticed this miscarriage of journalism. Barry Ritholtz was equally disgusted and contacted Sarah Lynch for an explanation. Lynch response was summarized as:

Lynch wrote back to note she did a story on the judge last Friday, but it ran on newswires but was not picked up by WSJ. (Reporters have no control over those editorial decision). The current article is a follow up to that prior piece.

This would all suggest that Lynch's article was taken out of context and something immoral is going on within the higher ranks of the WSJ with respect to reporting around the CFTC and the silver manipulation story.

******

I recently went to see the movie "Inside Job". The film documents the reasons that led up to the 2008 economic meltdown, the impact it has had so far, and why nothing has substantially changed in terms of policy and regulation. What surprised me was how interconnected the corruption is. There is no grand conspiracy of the sort you might think is required to fake the moon landing, or plan 9/11 as an inside job. Rather, the conspiracy is of incentives and motives which readers of Freakonomics will quickly recognize.

The movie describes the interconnectedness between the big investment banks (e.g. Goldman Sachs, JP Morgan, Morgan Stanley, etc.), the Federal Reserve, the government executive branch (both Bush and Obama), the government regulators (i.e. SEC and CFTC), the rating agencies (i.e. Moodys, S&P, and Fitch), and even the prestigious business schools (e.g. Harvard, Columbia, Wharton, etc.). On that last one, I have to admit I never realized that so many business professors were on the payrolls of these banks. Actually, they're usually not on a direct payroll, but rather work as consultants through intermediaries like Analysis Group. Inside Job argues that influence into academia is so pervasive that most of these school's curriculum has morphed to reflect the will of the banks, and tends to be very negative in view of regulation.

In light of this I would argue then that the media establishment, which has become highly consolidated is also susceptible to the same conspiracy of motives that has adversely influenced some of the world's most respected economic professors. In this blog post I have attempted to present evidence supporting my theory.

Newspapers like the Wall Street Journal are generally regarded as pillars of the fourth estate. I no longer believe the WSJ should be trusted and I'm sure others feel the same way. Since 2008, trust in governments, banks, rating agencies, business schools, and the media has been in steady decline. This is not good for any society.

I don't believe there is any silver bullet to resolve this. However, by continuing to shine as much light on media biases I believe it is possible for us through Citizen Journalism and other forms of grass roots reporting to get out of these dark days, hold the media to account, and in turn hold the government, banks, and corporations to account. Looks like an uphill battle, but I'm an optimist.

Monday, August 23, 2010

Twelve Reasons to use an ETL tool / Experience Report on ETL tools: Pentaho DI, SSIS, and PowerShell

Over the past couple months I've been jumping between three different ETL tools (well PowerShell is not exactly an ETL tool but has some overlapping functionality). The experience has given me new perspective on the strengths and weaknesses of each tool. I hope to share with you my experiences, opinions, and recommendations.

This report is not going to be as structured, or cover as many tools as something you might find from a Gartner report. My approach here is to get into specific experiences with the tools as well as discuss why I think ETL tools are important to begin with. My perspective is not intended to be a definitive decision making tool, but rather a useful component to the decision making process when choosing an ETL tool.

Before getting into my experiences I'll give you a brief overview of what ETL is, and how these tools fit into the ETL landscape.

ETL stands for "Extract Transform Load". ETL tools first started showing up in the mid nineties as a response to the growing demand of data warehouses. The first major ETL tool was Informatica, and continues to be one of the best ETL tools available on the market today. The main reason ETL tools were invented was that it has traditionally been time consuming and error prone to extract data from multiple source systems, merge those data together, and load into a data warehouse for reporting and analytics.

To this day, the majority of techies in IT are unaware of what an ETL tool is. However those same people are often tasked with solving the very problems ETL tools are designed to solve. More experienced developers will achieve their goals through a combination of shell scripts and SQL scripts. Less experienced developers will simply stay in their "comfort zone" and use whatever programming language they happen to be the most familiar with, with little regard for software maintenance and support.

The task itself of writing a computer program to extract data, merge it (or perform some other transformation), and loading those data into a target data warehouse is conceptually very simple for the average developer. However, it is not until you start running into the following problems do you realize that the "follow your nose" approach doesn't work so well. Here are some of the issues you will likely run into when maintaining ETL jobs for a data warehouse (I've run into them all). In other words, here is what an ETL tool is designed for:

Code readability: This is probably the biggest difference between an ETL tool and pretty much every other programming language out there. It is driven off of a visual metaphor. When I first saw an ETL job, it reminded me of one of those crazy Rube-Goldberg machines. However, once you acquaint yourself with the iconography, and what all the connecting lines mean, it becomes very easy to look at an ETL job and understand what it is doing, and how it works. Unfortunately, many developers scoff at this visual approach, after all "real coders write real code". I also wonder if many developers are afraid of developing ETL jobs as it lays their code bare for all to see. Managers on the other hand love this visual approach, as it allows them to scrutinize and partially understand how code is working. Furthermore, an ETL tool will visually show your data coursing through it from one step to the next, much like observing water flow through tributaries into rivers into lakes. When a full load takes several hours to complete, it is very reassuring to see what is actually going on with the data.
Data Element mapping: Much of what you're doing in data warehousing involves mapping a source data element to a target schema. This is tedious work and it's potentially error prone - especially if you're lining up an INSERT statement with a SELECT statement. ETL tools make it very easy and safe to line up and map data elements in a fraction of the time you would normally take.
Impact Analysis: Agility is often seen as a function of management. However, if it takes you a month to figure out what has being impacted as a result of a minor schema change (e.g. adding a column to a table), then you're going to get stuck in your tracks before you have the chance to yell "Scrum!". ETL tools make it easy to identify where data is being sourced from, and where it is being targeted. Some ETL tools will even produce impact reports for you. Keep in mind that this only one aspect of impact analysis, but every little bit helps.
Incremental loading: By default most developers will develop code to perform a full load on a data warehouse. This code is not only easy to develop, but is fairly reliable. Unfortunately, full loads can take a lot of time to complete, and as time goes on will take longer and longer. Switching to an incremental load is tricky since it involves figuring out what data is new, changed, or has been removed. Depending on your source system, your options may be limited. Nevertheless, ETL tools provide functionality to assist with different scenarios
Parallelism: Writing concurrent software is not for the "faint of heart". There is an inherent overhead required to manage semaphores and other forms of inter-process communication. Furthermore, the threat of deadlocks or livelocks in complex systems can often only be discovered through trial-and-error. There are modern frameworks which greatly simplify parallelism in languages like Java or Python. However it is hard to argue against the simplicity of the ETL approach when developing parallel data processing applications. Furthermore, some ETL tools (e.g. IBM DataStage Enterprise Edition) can even parallelize JOINs. This makes it possible to outperform an in-place JOIN within an RDBMS like Oracle (most SQL queries still run single threaded).
Checkpointing and job recovery: Imagine you have just developed a script to perform a large data warehouse load. Let's say you run the script on a weekly basis, and the load takes 8 hours to run (you run it over night). One morning you arrive to find the script has failed 80% into the job. Some of the tables have been updated, but there are a few that haven't been updated. You can run the job from the beginning, but this is problematic since you need to back out all the data you have already inserted (otherwise you'll get duplicate data). Alternatively, you can pinpoint the point of failure and run from there. However, this requires a code change, which means you will need to test your modified code before getting it into production (and let's not forget the psychological pressure when people are waiting on fresh data). ETL tools make it relatively easy to checkpoint your code, and resume from a failed step. Some ETL tools have built-in checkpointing which means you don't have to even instrument your code for checkpointing. Furthermore, if code changes are required [due to a failed job], they are more often a change in configuration, rather than a real change in code, so it is generally safe to make a configuration change and resume the job with less risk than traditional scripting. That said, not all ETL tools support checkpointing. However, even ETL tools which don't explicitly support checkpointing can be more easily instrumented to checkpoint than traditional scripting or programming languages.
Logging and monitoring: I've already touched upon this point with respect to monitoring. To reiterate, ETL tools allow you to see (seeing is believing) data flowing from one step to the next. It's powerful data visualization that is not unlike a floor plan or map visualization. Namely, it can communicate far more information than a bar chart or line chart ever will. As for logging, ETL tools by default centralize logs and implicitly logs what you need to log (with configurable views of granularity), without having to instrument your code.
Centralized Error Handling: Scripting languages don't provide centralized error handling (this has more to do with the OS and legacy applications), so you're forced to always explicitly check for errors and call the appropriate error handling routine. ETL tools often include generic error handling routines which are invoked regardless of where the error originated from or how it is raised.
Slowly Changing Dimensions: Although I've listed this as #9, perhaps this should be #1 from a data quality perspective. If you don't implement a Type II (or higher) SCD policy, you're precious data warehousing investment will most certainly degrade over time. Slowly Changing Dimensions are crucially important, as they allow you to show a consistent view of history. I could go on at greater length as to what an SCD is, but if you're not sure just read the Wikipedia article. At any rate, ETL tools have built-in SCD steps which allow you to define the technical key, the natural key, the effective begin and end date fields, and in some cases a version field and "current version" flag field. Without an ETL tool SCDs are a major pain in the butt to manage, and developers will often not implement a Type II SCD policy (often because they aren't aware of what an SCD policy is), but also due to the added effort. Don't forget the classic cop-out: "Hey, if the business analyst doesn't ask for it, I'm not going to build it". SCDs are one of those classic business requirements that few people realize they need until it's too late.
Pivoting. It is increasingly popular to see data being stored and logged in what are known as Entity Attribute Value data models. I generally prefer to see explicit data models over generalized data models (which effectively treat the database as "bit bucket"). But they're here to stay, and in some cases actually make sense. However, it is virtually impossible to analyze EAV data without pivoting it back into columns. This is another task which is well served through an ETL tool, and which can be very cumbersome if you intend to accomplish through traditional scripting. In short, an ETL tool allows you specify the grouping ID (essentially the primary key for the row), the pivot key column, and the various pivot values you wish to map to columns. Furthermore, if you need to extract data out of some kind of BLOB or something like an XML or JSON document, this too is easily achieved through standard ETL steps. If you were to do this through scripting, you would probably have to rely on temporary tables and plenty of code. In other words, just like with Slowly Changing Dimensions, there are fewer Function Points to worry about with an ETL tool
In-memory batch manipulation: Often when manipulating data sets in bulk, many developers will rely on temporary tables, and from there perform batch operations against those temporary tables. While this works, and is sometimes even necessary, ETL tools allow you to more easily process large batches of data as part of a continuous in-memory pipeline of records. It is this notion of a "data pipeline" which also distinguishes ETL from other programming paradigms, such as scripting approaches (even those scripting languages that support piping operations.
Distributed transaction control: While it is possible to create distributed transactions using a transaction coordinator, often developers who eschew ETL tools will also avoid using other high level tools, such as a distributed transaction co-ordinator. Some ETL tools include built-in integration with distributed transaction co-ordinators. However, in my own experience I have never had the need for two-phase-commit in an ETL tool. I suppose if I needed to ensure that my data warehouse was always 100% consistent, even during load times, then this makes sense. However, I would urge caution when using transaction managers, since large transactions can easily lead to locking, not to mention they perform horribly under full load scenarios, and a single COMMIT statement can take hours (depending on how you've configured indexes).

Before I get into my current experiences with PowerShell, SSIS, and Pentaho DI (PDI), I'd like to point out that there is also something called ELT (Extract Load Transform). ELT differs from ETL in that it tends to rely on the target DBMS to perform the transformations. ELT tools achieve this through code generation. Sometimes they are referred to as "code generating ETL" versus most other ETL which distinguishes itself with the moniker "engine-based ETL". ELT works like this:

Extract data from source databases(s) using standard connectivity
Load data in bulk into target database
Generate and submit SQL DML statements within target database to achieve desired data transformations

Popular ELT tools include the open source TalenD and Oracle Data Integration (formerly Synoptic). The strength of this approach is that the target DBMS can be leveraged to achieve superior performance. Furthermore, it is possible to override the code generation and hand-tune the generated code.

If you target a robust DBMS (e.g. Oracle), the results can be impressive. However, since you are relying on the target DBMS, your options are more limited. Yet this is not the main reason I am wary of the ELT approach. My worry with ELT tools is that they can easily break the visual metaphor by relying too much on hand generated code. Furthermore, during the transformation stages it is impossible to see what is going on (I prefer maximum transparency during long running transformations). Over time, your ELT jobs can devolve into something resembling scripting.

I should point out that when I first started developing transformations and jobs using Ascential DataStage (now an IBM product), I found that most data was being read from and written to the same Oracle database, where I was working. At my workplace I was quickly able to run circles around most ETL developers using bread-and-butter SQL queries. Indeed, Oracle has one of the best (possibly THE best) query optimizer out there, and few of my peers understood how to take advantage of DataStage's parallel extender, and were probably fairly average ETL developers. But since that time I have come full circle and am a firm believer in the visual metaphor. Yes, I may sacrifice some performance, but at the time I was doing this I didn't have the perspective that I'm giving you in the aforementioned 12 reasons to use an ETL tool.

PowerShell

Getting back to my recent experiences, I'll start with PowerShell. To be clear, PowerShell is NOT an ETL tool. It's a shell scripting tool. When I first discovered it (as part of MS SQL Server 2008) I was very impressed with what I saw. There was a time many years ago when I was a proficient UNIX shell script programmer. As any UNIX buff knows, mastery of the Bourne/Korn or C Shell (and all the standard command line tools) is the fastest route to becoming a UNIX power user. Occassionally I'll still fire up Cygwin (I used to use MKS toolkit) and will pipe stuff through sed or awk. Vi is still my favourite text editor.

I was therefore very eager to embrace PowerShell as a potential ETL disruptor (one should always be on the lookout for disruptive technologies). There's a lot to love about PowerShell. As much as I will diss Microsoft for its monopolistic practices, they are one of the few true innovators of the back-office. PowerShell is no exception. Let me quickly point out some of the more powerful features of PowerShell:

Objects can be piped: Piping in UNIX is a way of life, but it can also be tedious as you're forced to serialize and deserialize streams of text, in order to achieve the desired result. PowerShell allows you to pipe .NET objects from one shell app to the next. For example, it's possible to pipe a result set, or directory listing as an object. This comes in handy quite a bit. For example, it is very easy to retrieve a particular set of columns from a query's result set, without having to sift through and parse a whole bunch of text
Snap-ins can be used to move between shell contexts: Currently the only snap-ins I've used are for the OS shell, and the MS SQL Server shell. The OS shell is the default shell, and listing "entities" will basically just list files and directories as you would normally expect. But while in SQLServer mode, the shell lists databases and tables as though they were directories and files. While most DBAs are accustomed to running database commands using a query shell window, PowerShell goes far beyond what is easily possible using a basic SQL (or T-SQL) query interface.
Parsing files is easy: Going back to the object pipes, it is also very easy to pipe the contents of a file, and parse those contents as a file object. PowerShell has also reduced the number of steps required to open and read a file, making such operations second nature. Let's face it, processing text files is as common activity in shell processing as ever (e.g. you need to parse a log file for an error message), so it's very much a pleasure to use PowerShell for these tasks
Great documentation: It's not often I compliment software documentation, but I feel it should be acknowledged that Microsoft even rethought those dry UNIX "man" pages. Perhaps it's just one technical writer at MS making this difference. But whoever you are, hats off for having a good sense of humour and making the dry and technical a pleasure to read.

With all that said, I've decided that PowerShell in its current form (I used version 2.0) has some major drawbacks from an ETL perspective. First off, by definition it's a "scripting language". So by definition it will never follow the visual metaphor. Depending on your perspective this is may be a good thing. But I for one am a huge believer in code visualization. While it is possible to run a script and examine log output in real time, it's not easy to tell what is happening where when tasks are running in parallel, and what the relationship between those tasks are.

The second major issue I have with PowerShell is that it simply cannot perform the function of what an ETL tool can do. Originally I had coded my ETL tasks as T-SQL Stored Procedures, which were basically wrappers for INSERT/SELECT statements. Unfortunately under high volumes, these queries caused database locking to occur. Apart from using TEMP tables, my only other option was to perform my JOINs within an ETL tool. This all but makes PowerShell a complete showstopper from a data warehousing perspective when compared to an ETL tool. Nevertheless, I did for a while attempt to call my individual ETL transformations within PowerShell, and instead treat PowerShell as a job controller.

The third issue with PowerShell is that I couldn't find a simple way to log everything. Depending on the type of error being raised I had to capture it in any one of three ways. Furthermore, I basically had only one easy option for where to put the log: into a text file.

Overall I am very impressed with PowerShell from the point of view that Microsoft has elevated scripting to the next level. But as an ETL job controller, it simply cannot do as good a job as an ETL tool like Pentaho DI or SSIS.

SSIS

At first I wasn't overly impressed with SSIS. It struck me as overly architected, and too reliant on scripting steps, thus undermining the all important visual metaphor. I have come to believe that it's architecture is more elegant than meets the eye, but I still believe it could use a heck of a lot more built-in steps. For example, the lack of an INSERT/UPDATE step is a glaring omission. I also couldn't find any built-in step for hashing (which comes in very handy when comparing multiple columns for changes) or an explicit DELETE or UPDATE step (you must accomplish DELETEs and UPDATEs using an OLE DB transformation step). SSIS also lacks a Slowly Changing Dimension step. Instead, SSIS provides you with a wizard which will generate multiple steps for you out of existing components. I don't agree with Microsoft's approach here since it makes SCDs less configurable, and more reliant on moving parts. Contrast this with Pentaho DI which includes all of these steps, as well as supporting a single configurable Slowly Changing Dimension step. That said, it is possible to add in third party steps, or simply code the step using the built-in scripting components. However, the problem with third-party add-ins is that they complicate deployments. And the problem with over reliance on scripting steps is that they can't be explicitly configured, rather logic must be coded. As I keep mentioning, scripting steps break with the visual metaphor, so you have to either read the step's label or open it up to know what it means (ETL iconography makes it easily possible to see what is going on in a single glance).

Continuing my gripes with SSIS, there is some confusion as to what steps to use when it comes to database connectivity. Namely, SSIS 2008 offers both OLE DB and ADO.NET connectors. I'm currently using OLE DB, but should I be using ADO.NET now? I'm not too concerned about that dilemma. What drives me bananas is that they also include a SQLServer destination step. It turns out that this step only works properly if you're running your SSIS server on the same server as your SQL Server. For myself this is maddening, and I'm not sure why Microsoft even bothered including this step - I couldn't discern any difference in performance from the OLE DB destination step.

Okay, that's the bad news. The good news is that SSIS has some pretty nifty features making it worthwhile, especially if you're executing on a Microsoft strategy. I'll list out the strong points of SSIS, and what sets it apart from Pentaho:

Checkpointing and transaction management: This is probably my favourite feature of SSIS, when compared to Pentaho DI. Checkpointing is more or less what you would expect. If a job fails part way through, SSIS records (in a local XML file) the point of failure, as well as all of the job variable values. This allows an operator to investigate the cause of failure (typically by examining logs), take corrective action, and then restart the job from point of failure. In the world of ETL, this scenario is not uncommon. On top of checkpointing, SSIS also makes it possible to contain multiple steps as part of a single transaction, so that if any step within the transaction fails, previous steps are rolled back. This is accomplished through tight integration with Microsoft's Distributed Transaction Controller - a component of the Windows operating system. I have experimented with this feature, and have got it to work successfully, although at first it was less than straightforward to configure. That said, I've never had a need for this feature. Also, this transaction management (from what I can tell) only works at the job level, and not at the transformation level, which is where it's more needed. One last thing I should point out about checkpointing: Don't confuse job checkpointing with incremental loading. Incremental loading (as opposed full refreshes) requires the ETL developer to identify when data has been INSERTed, UPDATEd, or DELETEd since the last time the job was run, which requires upfront analysis and cannot be addressed solely by a tool.
Consolidated logging, and ability to log to Windows Event Log. All ETL tools produce a single log. Most ETL tools allow for varying levels of verbosity, all from a single setting. SSIS goes one step further and nicely integrates with the Windows Event Viewer which if you're a Microsoft shop is HUGE benefit. As I mentioned earlier, logging was a sore spot for PowerShell. So the difference between SSIS and PowerShell is day-and-night.
Consolidated error handling. Typically when a job fails (for any reason), you want your ETL job to fire off an e-mail to technical support. SSIS makes this very easy to do, as there is a single consolidated Error Handling job which you can develop for. Both PowerShell and Pentaho DI appear to lack this feature.
C# and VB.NET scripting steps: Most ETL tools include a scripting step. This makes it possible to perform arbitrary transformations, as well as doing other tasks like adding or removing rows from the pipeline, or even adding and removing columns. Pentaho DI relies on either Java or JavaScript for its main scripting step. Personally, I'm fine with just the JavaScript step in Pentaho. SSIS allows you to script in either VB.NET or C# (C# is only supported in SSIS 2008 or later) which if you're already a .NET developer is a major benefit. Furthermore, it's significantly easier for that same code to me made into a standalone data flow step component (which gets you back to the visual metaphor). My only warning with SSIS's scripting step, is that I can easily see developers overly relying on it to perform the majority of their transformations. As I keep saying: the visual metaphor is the most powerful concept behind ETL, and dramatically lowers post-implementation costs, such as support costs, and the cost of impact analysis.
Persistent look-up cache: This is new with SSIS 2008. This feature allows you to physically store data in a local binary file optimized for keyed look-ups. I've only gone so far as to test that this thing works, and can be used in a look-up step. I'm not exactly sure what the primary business driver is behind this feature. I would be nice if Microsoft would elaborate on what the recommend use cases are, and just as importantly, where NOT to use this feature. My concern is that developers may use this feature in bulk look-up situations, where a sort/merge/join would perform just as well if not better. Compare this to, Pentaho DI offers "stream lookup" step. Can't say which approach is better though.
Excellent SQLServer integration: This goes without saying. My only gripe here is the dedicated SQLServer target step which appears only to work if the job is running on the same server as the SQLServer DB. I found out the hard way that it won't work otherwise. While I could go on about some of the subtle benefits of the SQLServer integration (e.g. you can easily configure batch INSERTs into cluster indexed tables to perform very well), the biggest advantage is the same advantage you get whenever you homogenize technology under a single vendor. Namely, by using SSIS and SQLServer together, if something goes wrong Microsoft can't blame some other vendor. Nor can they say "well, it's really not designed to work for _that_ database)." These games that vendors will play with you is the main reason why IT departments prefer homogeneous architectures. On the downside, there is also very tight MS Excel integration. As of this writing though, this integration doesn't work when running SSIS in 64-bit mode. Maybe with Office 10 this will be fixed now, since Office 10 is the first version of Office to ship a 64-bit version.
Dynamic step configurations using expressions: SSIS includes a declarative expression language which allows you to dynamically configure most settings. The most obvious use of expressions is to configure the database connection string(s), based on a variable value (which is set at run-time). At a higher level, expressions make it much easier to develop generic packages/jobs which share the same code base, but have different source data stores and target data marts. I feel Pentaho DI is lacking in this department.
Built-in text analytics steps. This is one of the first things I noticed about SSIS. Basically there are two Data Flow (i.e. transformation) steps which support basic text analytics. The first is a keyword extractor. Namely, the step can be configured to extract relevant keywords from documents. These keywords can then be used to classify documents, making them easier to search against and report on. The second text analytic step allows you to search a given set of text against a list of keywords retrieved from a configurable database query. This step is very useful for doing things like sentiment analysis (i.e. determining if a block of text includes words like "cool" or "fun" versus words like "sucks" or "fail"), but it could also be used to flag sensitive information, such as people's names. While, Pentaho DI does not include any built-in steps like these, it does have a regular expression step (which SSIS does not explicitly include), but if I had to choose, I prefer SSIS's text analytics functionality.

So as you can see SSIS has some very powerful features. If you're already a Microsoft shop and already use MS SQLServer, and have a pool of developers and application support personnel already trained on Microsoft's technology, I doubt you'll find a better SSIS tool to meet your requirements. Even if you're not using SQLServer but are doing most of your development on a Microsoft stack, it's probably a very good fit. However, if you're on non-Microsoft platform, there are many other options you should to consider.

Pentaho Data Integration (PDI)

Which brings me to Pentaho DI. One of the most platform neutral ETL tools out there. I am an unabashed fan of Pentaho, partly I'll admit, because I have quite a bit of experience working with it. Before I get into what it is I like and dislike about Pentaho DI I'll explain how it's different from other ETL tools at a very high level.

Pentaho DI is an "Open Source" ETL tool, based on the Kettle project. However, unlike Linux which has many different flavours and supporting vendors, there is only one vendor, Pentaho, that supports the tool. This is not unlike MySQL which was only being supported through SUN and now Oracle. I'm fine with that.

Just to digress for a moment, years ago I read "The Cathedral and the Bazaar" which is basically a manifesto for open source development. It argues that people will self select projects to work on, and this will drive the future of software development. I don't agree with the overarching thesis, but there are merits to this argument. The problem with volunteer based projects is that people tend to volunteer for selfish reasons. In the case of Linux, it is seen by many as a way of combating Microsoft's perceived tyranny. Shared adversaries are after all the strongest bonds between humans (I heard that from an anthropologist, and believe it to be true). When it comes to other forms of crowdsourcing, people often derive enjoyment from contributing. For example, I've contributed to Open Street Map adding in streets and landmarks in my neighbourhood. If I had more time on my hands I'd probably be doing more of it. It's kind of fun. I can also see how millions of other people might find Open Street Map fun. And let's not forget how popular Wikipedia is. But the problem is that most of the work people get paid to do is not necessarily fun - even in the software world. Some things need to get done, even if nobody wants to do them. The other problem with open source projects, is that until they reach a critical mass, competing projects can pop up, which not only dilutes the overall pool of talent but also reduces the likelihood of any one project succeeding. For example, if there were many different open source operating systems at the time Linux began to emerge, all competing with each other, would we have Linux today? I'm not so sure. My point in all this is that I support companies like Pentaho backing open source projects, even if that means more corporate control over the project. The important thing is that the entire source code is code is available, and anyone can contribute to that source code. This therefore puts Pentaho in the position of being a _service_ company, which is really the future of software any way you look at it.

Okay, moving on to Pentaho DI itself. First, the bad news. As with any piece of software, there are bugs. Now, if you have a support contract with Pentaho, this is never a problem as they will provide you with a patch (and you will receive intermediate patch releases). It's also possible to install those patches yourself (or fix the bug yourself), but that involves building the application from source files. That said, all of the bugs I've encountered can be worked around. And to be fair, I haven't found any bugs yet in the latest version (version 4.0).

Now on to the good news. Here is what I love about Pentaho DI:

Plenty of built-in steps: Although I've had to resort to the JavaScript scripting step on more than one occasion (version 4.0 now includes a Java scripting step), most transformations can be entirely built with their built-in steps.
Web ETL steps built-in: PDI comes with several web extraction steps, include GET and POST lookups, WSDL lookups, and a step to check if a web service is even available. There is also a step for decoding XML documents. For JSON documents, I just use the JavaScript step (it's not too hard to find JavaScript code to decode JSON documents). I mention all of these steps because I believe web ETL is a big part of the future of ETL. This is especially true now with so many data services out there. Contrast this to SSIS, which is rather weak in this department.
Single Type II Slowly Changing Dimension step: Unlike SSIS (which doesn't have a single SCD step, but rather uses a wizard to generate multiple steps), PDI includes a single Type II SCD step. I've used it quite a bit, and it works very well and is simple and straightforward to configure.
Normalized database repository: The other ETL tools I have used (SSIS and DataStage) will allow you to save jobs in the form of a file, or within a database. However, I find those files to be relatively opaque and difficult to query (I suppose with the right tools it would be possible to query them). PDI on the other hand stores all of its Jobs and Transformations in a normalized database. This makes it possible to query your ETL code using standard SQL queries. I have taken advantage of this by writing queries to produce simple impact analysis reports, showing which tables are impacted by which jobs. Since impact analysis is a huge aspect of data Enterprise Architecture, this is a real benefit. Furthermore, because PDI jobs are stored within a centralized RDBMS database, I find that it is much easier to deploy and troubleshoot jobs in production than it is with SSIS. To be fair, SSIS does allow packages to be deployed to SQLServer, but their really just deployed as an opaque "BLOB" which can't be easily queried.
Java based (runs on any platform): There was a time where Java applications were not considered appropriate for applications requiring high performance, such as an ETL tool. Well, to my surprise, Java has caught up pretty closely to binary native compiled code. As such, the benefits of "write once, run anywhere" are starting to come into focus. Indeed, I have run PDI on both Windows and Linux machines without incident. I also can't tell the difference between the performance of SSIS versus PDI
Clustering/Cloud deployment: I honestly don't have any hands-on experience deploying PDI as part of a cluster, or within a cloud. Although I have heard some impressive benchmarks, with reports of over 200,000 records processed per second within a single transformation. Because it is possible to cluster PDI servers, this means it is possible to scale up individual ETL transformations across multiple server nodes. Furthermore, Pentaho is now beginning to offer cloud based solutions. This is also something I have no hands-on experience with, but I like what I'm hearing. Contrast this with SSIS, which is not "cluster aware" and must be installed as "standalone" in Windows Cluster. To be fair, I believe part of the reason for SSIS's shortcoming is that it is more transaction aware than PDI, and with that comes certain challenges. I also think that it is possible to run SSIS as part of a cluster, but Microsoft does not recommend this.

Well, that's my experience report. Hope you learned something. Feel free to e-mail me or comment any corrections. It's quite possible I've made an incorrect assumption, so if any PowerShell, SSIS or PDI developers are reading this, don't hesitate to chime in. I'll happily update this blog if I agree there is a problem with something I've written.

Monday, July 19, 2010

The Tooth Fairy, Easter Bunny, and the Long Form Census Privacy Breech

"What do you think of the long form census" I ask inquisitively, adding "You know, the census form, which one out of every five persons needs to fill out. In order to provide information about stuff like ethnicity, religion, and income".

"I don't want to give out information to the government if I don't have to." I'm sternly told.

"Why not?" I ask politely. "It will never get released." I elaborate: "I've worked quite a bit with the Canadian census and with StatCan. I know from dealing with them that it is impossible for them to release any private information." I qualify my argument further "StatCan has strict privacy regulations preventing them from releasing identifiable information of either an individual or a business. It's the law." I go on "they have a 'rule of 3' whereby information must be aggregated to at least 3 parties. Furthermore, they also have a 'round to 5 rule', whereby all respondent counts are rounded to the nearest 5". I natter on some more "there has never been a privacy breech, even for a terrorism related investigation. The chief statistician reports directly to the prime minister of Canada. The OECD ranks StatCan as the best national statistics body in the world, and their workers take great pride in all of this, and see privacy protection as their 'Prime Directive' - I know these guys."

No change. "I don't care. You never know what they'll do with this information. That's just what they tell you."

Hmmm, I had to dig deeper. "But they 'the big bad government' [not StatCan] already know how many bedrooms you have, where you live, what you buy, who you're friends are. If you believe they are corrupt, they can access bank records, your Internet history, your telephone calls, your property records. Even I can bring up Google Street Maps and take a look at your house. I can browse your Facebook profile [which Facebook surreptitiously has opened up unbeknownst to many]. They can check your vehicle registration, check your credit rating, your insurance premiums, your credit and debit card purchases. All they need to do is send a bunch of faxes and make a few phone calls. They have the authority already. You're too late." Feeling a bit frustrated, I become more forceful in my argument "The only way you can get around them is by living like The Unibomber Ted Kaczynski. You would have to abandon your home and all your possessions and start again. You'd have to drive up to northern Ontario in the cover of night, build a log cabin, and live there for the rest of your life, living off the land - even making your own soap. And there's no guarantee that will work."

It's no use, his mind is made up. No counterargument, just "I don't want to give out my personal information to the government."

I give up on this conversation and change the subject. I can see it's going nowhere. Thank goodness for blogs...

The problem is, without a census, it's really difficult to make educated policy decisions. The census was specifically designed to be analyzed in aggregate. It was not and is not intended to be used as a repository of personal information. Although detail records are required in order to generate accurate pictures at a higher level, and these analyses almost always are based around geography.

As you may well be aware, our Prime Minister, Stephen Harper has single-handedly dismantled one of the most essential and proud pieces of Canadian infrastructure: the Canadian Census Long Form. As such, he is hobbling all levels of government, including provincial and municipal, not to mention the thousands of businesses that depend on this information to make informed decisions like where to open their storefront. To make matters worse, he is greatly hindering foreign investors from investing in Canada. The biggest obstacle to making any kind of investment is basic information. Canada has traditionally been a very safe place for companies to invest, in large part due to the educated work force and middle class consumers. However, if you don't know where that workforce lives, or where those consumers are, it's a lot tougher to justify those multimillion dollar investments.

But the reason why Harper has decided to drop the Long Form census is that he knows there are a lot of people who have unfounded and irrational fears of the Census Long Form. Many of those people are immigrants who come from tyrannical governments who do abuse their power (although ironically these governments also don't need the census to locate and persecute).

Stephen Harper understand this hysteria all too well, and is using it bolster support for himself and his government, facts be damned. And in selling this hatchet job, he is using every dirty trick in the book. Last week the line was "the government doesn't need to know how many bedrooms you have in your house" - a clear attempt conflate the census with "the government should stay out of your bedroom", which was a common tag-line that comes up when sexual liberties are being discussed. Today it was "why does the government need to infringe on people's privacy?", a line that Glenn Beck or Bill O'Reilly may attempt to deliver with a straight face.

There is something also very "meta" about all this. Harper is effectively abandoning Evidence Based Decision making in order to... have everyone else abandon Evidence Based Decision making. It is shameless fear mongering, and it is bad for everyone, including those who don't like filling out forms.

On that note, I would like to conclude with a plea to reach out and help people understand why the census is so important, and why privacy fears are unfounded. A good start is to keep this conversation going. This should be our prime ministers job. Instead he would rather us be fearful of the bogeyman and hide under our beds.

Pathetic.

Friday, May 07, 2010

The Curious Case of Andrew Maguire / My challenge to serious journalists

In August 2008 economic world history was made, as a major world record was smashed. You probably never heard this news - even if you read the business section of a major newspaper, or work in the finance industry. Instead, what you may have heard was the consequence of this news. Namely: gold and silver prices hitting a major slump.

But behind those price slumps was the biggest and most concentrated naked short position ever held for any commodity. To give you some numbers, world silver production is 680 millon oz annually. The short position of 3 (or fewer) US banks increased 11 fold from July to August, with a naked short position of 169 million oz. That's 25% of all annual silver production! Gold followed a similar pattern and the same 3 banks increased their short position in August to account for 11% of all annual gold prodution. For those that aren't familiar with "short selling", it's the practice of selling something NOW (that you have borrowed), and paying for it in the future. So, if the price goes down you make a profit. If the price goes up you get "squeezed" and have to take a loss.

If you'd like to see the numbers for yourself, they're all available in the official Bank Participation Report for July and August. While it is possible that the banks themselves were not ordering the short-selling (but rather possibly clients), it is very unlikely that all short-sellers just happened to go to the same bank. It should be noted that this short-selling of silver caused the price of silver to halve from $20 down to $10.

So why would these banks be betting so heavily against silver and gold? Only the banks themselves know for sure, and they aren't saying much. Nevertheless, things worked out well for them and as a result of dumping so much bullion on the market at once, in effect they triggered a larger sell-off (in part due to stop loss orders) which created a prophecy that fulfilled itself. These banks more than likely generated huge profits.

Not surprisingly many people (especially "Gold Bugs") cried foul and claimed there was market manipulation. Independent Toronto based investor Harvey Organ submitted a written statement to the CFTC, which he accompanied with oral testimony at their March 25th hearing. You can see the video here (skip to 4:40:00 into the webcast). Silver analyst Ted Butler also wrote a piece on this event referring to it as The Smoking Gun.

Of course this doesn't actually prove there was any manipulation going on. A common retort to the conspiracy theorists was "If there was a conspiracy, surely somebody would spill the beans". Well, as if a Sasquatch were to walk out of the forest and into broad daylight, so arrived Andrew Maguire, a London based metals trader who mainly trades in the silver market. Maguire was privy to a very small network of metals traders (ostensibly with JP Morgan Chase at the helm) who would co-ordinate "takedowns" in the sliver market through the aforementioned "naked short selling" of huge concentrated quantities of silver contracts. Whether this constitutes long term or short term manipulation is certainly up for debate. But what Mr. Maguire's revelations appear to show (if they are true) is that manipulation is happening on a fairly regular basis, and at the great expense of honest investors. It should be pointed out that Maguire himself profited from these manipulations and is said to be a wealthy man.

For several months, Andrew Maguire provided evidence to the Commodities Future Trading Commission (this is the regulatory body [similar to the SEC] that regulates commodities trading, including gold and silver) that demonstrated how the manipulations were orchestrated, and how they played out. I have pasted the entire e-mail trail (which was later leaked to the Gold Anti-trust Action Committee) at the bottom of this blog. However, as you can hear in this audio interview, Maguire himself was prevented from attending the CFTC hearing on bank position limits on March 25th. Nevertheless, Adrian Douglas and Bill Murphy (of GATA) were able to attend, and were able to provide testimony on Maguire's behalf. You can see the footage on YouTube here.

But for me, what's most disturbing is that there has been no media coverage from any major newspaper or television network. And what's even weirder is that if you listen to the interview it's stated that interviews were lined up with Bloomberg and Reuters, but were both cancelled within 24 hours with no explanation. It is this fact alone that has prompted me to write this blog entry.

I have spent the last four weeks research this, and so far all signs point to market manipulation. In fact after a while it becomes hard to understand how the market could not be manipulated. To make matters worse, if you read the prospectuses on Gold and Silver ETFs (e.g. here is the GLD prospectus and SLV prospectus) your eyes will pop out at how complex and risky they appear to be. In fact just this week an excellent in depth investigation into these prospectuses was released which showed that there are major undisclosed conflicts of interest for JPMorgan (the custodian for SLV) and HSBC (the custodian for GLD). The report goes on to suggest that the ETF has been architected to effectively avoid any obligations around reporting these conflicts of interest. In some regards you could say if you wanted to design an investment product that avoids all forms of regulation, but which can be sold by trusted investment banks, you couldn't do a better job than SLV and GLD. What's also weird is that in Canada I can't even find the prospectus for IGT (the Canadian equivalent of GLD) on the Canadian iShares web site. And a search conducted just now on the Canadian ishares site for "IGT" returns nothing (but it was there in their web site 2 days ago, and a mention can still be found in the PDF). For a complete analysis of the SLV and GLD ETFs, I highly recommend you read the well researched report by Catherine Austin Fitts and Carolyn Betts.

One of the interesting items brought up in this report is the conflict of interest that the US Government itself has with respect to this manipulation. I quote the following paragraph:

The absence of these disclosures is particularly disturbing given the potential conflicts between the banks' responsibilities serving as custodians and trustee and their responsibilities and liabilities as members and shareholders of the Federal Reserve Bank of New York. The NY Fed serves as the depository for the US government and as agent for the Exchange Stabilization Fund on behalf of the U.S. Secretary of Treasury. The ESF allows the Secretary of the Treasury to deal in gold, foreign exchange, and other instruments of credit and securities. NY Fed member banks typically serve as agents of the NY Fed in providing services. In addition, JPMorgan Chase and HSBC maintain responsibilities as Primary Dealers of U.S. government securities. When called upon to defend the U.S. government’s interests in the bond market, or the U.S. dollar’s interest in the currency market, or to help prevent another financial meltdown, whose interests will be primary? Will it be the central bank and government with pressing national security interests or retail investors?

I was curious to see what if anything the US government has to say about the obvious decline in sliver prices due to that massive short sell in August of 2008. As such I referred to the US Geological Survey Annual Summary on silver for 2008 and 2009 (released in 2009 and 2010 reports, respectively). Given that they comment on changes on the average price of silver, I was curious to see what they would say about the precipitous drop in average price for 2009. Well, it's either complete ignorance or Orwellian (depending on your point of view), but the report states (contradicting its own reported numbers) that the price actually went up in 2009. And by up, they clearly mean down. Cue Seth Myers and Amy Poeler: "Really USGS! Really!!!"

To be fair, I have found the following rebuttal from Jeffrey Christian of The CPM Group (a consultancy which in part serves the bullion banks). While Mr. Christian's points are valid, they don't do much to put my mind at ease. Firstly, he mainly rebuts "red herring" arguments that aren't important, and doesn't bother to comment on the heart of Mr. Maguire's allegations or the activity in August 2008. Secondly, none of his explanations use much real data, instead relying on the abstract providing scant details of any names, numbers, or dates. I am skeptical of his sincerity in getting to the bottom of this problem. I think a truly dedicated consultant would say "Hey, I don't know for sure, but let's figure out how to get these facts". He doesn't go there.

So what's my point? Why am I blogging this?
The reason I am blogging this is that I simply cannot figure out why no serious journalist has investigated Andrew Maguire's story. I mean let's face it: Either this is all one big hoax (which is a story unto itself). Or this is part of a massive fraud involving rigged markets and major international players. This is the kind of story journalists dream of. What's going on here people!

So, before I post a copy of Andrew Maguire's e-mails to the CFTC, I put forth the following challenge to any and all serious journalists out there:

Obtain a copy of or confirm or deny the existence of conversations between Andrew Maguire and the CFTC
Obtain a comment from the CFTC on their interactions with Andrew Maguire
Obtain a comment from the five (5) Bullion banks on the massive short of silver for August 2008 clarifying whether they were behind the short or not, and why
Record an interview with Andrew Maguire to learn more about his relationship with the bullion banks.

And so I end this blog post with a copy of the allegedly leaked e-mails between Maguire and the CFTC. Enjoy!

----
From: Andrew Maguire
Sent: Tuesday, January 26, 2010 12:51 PM
To: Ramirez, Eliud [CFTC]
Cc: Chilton, Bart [CFTC]
Subject: Silver today

Dear Mr. Ramirez:

I thought you might be interested in looking into the silver trading today. It was a good example of how a single seller, when they hold such a concentrated position in the very small silver market, can instigate a selloff at will.

These events trade to a regular pattern and we see orchestrated selling occur 100% of the time at options expiry, contract rollover, non-farm payrolls (no matter if the news is bullish or bearish), and in a lesser way at the daily silver fix. I have attached a small presentation to illustrate some of these events. I have included gold, as the same traders to a lesser extent hold a controlling position there too.

Please ignore the last few slides as they were part of a training session I was holding for new traders.

I brought to your attention during our meeting how we traders look for the "signals" they (JPMorgan) send just prior to a big move. I saw the first signals early in Asia in thin volume. As traders we profited from this information but that is not the point as I do not like to operate in a rigged market and what is in reality a crime in progress.

As an example, if you look at the trades just before the pit open today you will see around 1,500 contracts sell all at once where the bids were tiny by comparison in the fives and tens. This has the immediate effect of gaining $2,500 per contract on the short positions against the long holders, who lost that in moments and likely were stopped out. Perhaps look for yourselves into who was behind the trades at that time and note that within that 10-minute period 2,800 contracts hit all the bids to overcome them. This is hardly how a normal trader gets the best price when selling a commodity. Note silver instigated a rapid move lower in both precious metals.

This kind of trading can occur only when a market is being controlled by a single trading entity.

I have a lot of captured data illustrating just about every price takedown since JPMorgan took over the Bear Stearns short silver position.

I am sure you are in a better position to look into the exact details.

It is my wish just to bring more information to your attention to assist you in putting a stop to this criminal activity.

Kind regards,
Andrew Maguire

* * *

From: Ramirez, Eliud [CFTC]
To: Andrew Maguire
Sent: Wednesday, January 27, 2010 4:04 PM
Subject: RE: Silver today

Mr. Maguire,

Thank you for this communication, and for taking the time to furnish the slides.

* * *

From: Andrew Maguire
To: Ramirez, Eliud [CFTC]
Cc: BChilton [CFTC]
Sent: Wednesday, February 03, 2010 3:18 PM
Subject: Re: Silver today

Dear Mr. Ramirez,

Thanks for your response.

Thought it may be helpful to your investigation if I gave you the heads up for a manipulative event signaled for Friday, 5th Feb. The non-farm payrolls number will be announced at 8.30 ET. There will be one of two scenarios occurring, and both will result in silver (and gold) being taken down with a wave of short selling designed to take out obvious support levels and trip stops below. While I will no doubt be able to profit from this upcoming trade, it is an example of just how easy it is to manipulate a market if a concentrated position is allowed by a very small group of traders.

I sent you a slide of a couple of past examples of just how this will play out.

Scenario 1. The news is bad (employment is worse). This will have a bullish effect on gold and silver as the U.S. dollar weakens and the precious metals draw bids, spiking them higher. This will be sold into within a very short time (1-5 mins) with thousands of new short contracts being added, overcoming any new bids and spiking the precious metals down hard, targeting key technical support levels.

Scenario 2. The news is good (employment is better than expected). This will result in a massive short position being instigated almost immediately with no move up. This will not initially be liquidation of long positions but will result in stops being triggered, again targeting key support levels.

Both scenarios will spell an attempt by the two main short holders to illegally drive the market down and reap very large profits. Locals such as myself will be "invited" on board, which will further add downward pressure.

The question I would expect you might ask is: Who is behind the sudden selling and is it the entity/entities holding a concentrated position? How is it possible for me to know what will occur days before it will happen?

Only if a market is manipulated could this possibly occur.

I would ask you watch the "market depth" live as this event occurs and tag who instigates the move. This would surly help you to pose questions to the parties involved.

This kind of "not-for-profit selling" will end badly and risks the integrity of the COMEX and OTC markets.

I am aware that physical buyers in large size are awaiting this event to scoop up as much "discounted" gold and silver as possible. These are sophisticated entities, mainly foreign, who know how to play the short sellers and turn this paper gold into real delivered physical.

Given that the OTC market (where a lot of the selling occurs) runs on a fractional reserve basis and is not backed up by 1-1 physical gold, this leveraged short selling, where ownership of each ounce of gold has multi claims, poses a very large risk.

I leave this with you, but if you need anything from me that might help you in your investigation I would be pleased to help.

Kind regards,
Andrew T. Maguire

* * *

From: Andrew Maguire
To: Ramirez, Eliud [CFTC]
Sent: Friday, February 05, 2010 2:11 PM
Subject: Fw: Silver today

If you get this in a timely manner, with silver at 15.330 post data, I would suggest you look at who is adding short contracts in the silver contract while gold still rises after NFP data. It is undoubtedly the concentrated short who has "walked silver down" since Wednesday, putting large blocks in the way of bids. This is clear manipulation as the long holders who have been liquidated are matched by new short selling as open interest is rising during the decline.

There should be no reason for this to be occurring other than controlling silver's rise. There is an intent to drive silver through the 15 level stops before buying them back after flushing out the long holders.

Regards,
Andrew

* * *

From: Andrew Maguire
To: Ramirez, Eliud [CFTC]
Cc: BChilton [CFTC]; GGensler [CFTC]
Sent: Friday, February 05, 2010 3:37 PM
Subject: Fw: Silver today

A final e-mail to confirm that the silver manipulation was a great success and played out EXACTLY to plan as predicted yesterday. How would this be possible if the silver market was not in the full control of the parties we discussed in our phone interview? I have honored my commitment not to publicize our discussions.

I hope you took note of how and who added the short sales (I certainly have a copy) and I am certain you will find it is the same concentrated shorts who have been in full control since JPM took over the Bear Stearns position.

It is common knowledge here in London among the metals traders that it is JPM's intent to flush out and cover as many shorts as possible prior to any discussion in March about position limits. I feel sorry for all those not in this loop. A serious amount of money was made and lost today and in my opinion as a result of the CFTC's allowing by your own definition an illegal concentrated and manipulative position to continue.

Bart, you made reference to it at the energy meeting. Even if the level is in dispute, what is not disputed is that it exists. Surely some discussions should have taken place between the parties by now. Obviously they feel they can act with impunity.

If I can compile the data, then the CFTC should be able to too.

I would think this is an embarrassment to you as regulators.

Hoping to get your acknowledgement.

Kind regards,
Andrew T. Maguire

* * *

From: Andrew Maguire
To: Ramirez, Eliud [CFTC]
Sent: Friday, February 05, 2010 7:47 PM
Subject: Fw: Silver today

Just logging off here in London. Final note.

Now that gold is undergoing short covering, please look at market depth right now in silver and evidence the large selling blocks in a thin market being put in the way of silver regaining the technical 15 level, which would cause a short covering rally and new longs being instigated. This is resulting in the gold-silver ratio being stretched to ridiculous levels.

I hope this day has given you an example of how silver is "managed" and gives you something more to work with.

If this was long manipulation in, say, the energy market, the shoe would be on the other foot, I suspect.

Have a good weekend.

Andrew

* * *

From: Andrew Maguire
Sent: Tuesday, February 09, 2010 8:24 AM
To: Ramirez, Eliud [CFTC]
Cc: Gensler, Gary; Chilton, Bart [CFTC]
Subject: Fw: Silver today

Dear Mr. Ramirez,

I hadn't received any acknowledgement from you regarding the series of e-mails sent by me last week warning you of the planned market manipulation that would occur in silver and gold a full two days prior to the non-farm payrolls data release.

My objective was to give you something in advance to watch, log, and follow up in your market manipulation investigation.

You will note that the huge footprints left by the two concentrated large shorts were obvious and easily identifiable. You have the data.

The signals I identified ahead of the intended short selling event were clear.

The "live" action I sent you 41 minutes after the trigger event predicting the next imminent move also played out within minutes and exactly as I outlined.

Surely you must at least be somewhat mystified that a market move could be forecast with such accuracy if it was free trading.

All you have to do is identify the large seller and if it is the concentrated short shown in the bank participation report, bring them to task for market manipulation.

I have honored my commitment to assist you and keep any information we discuss private,however if you are going to ignore my information I will deem that commitment to have expired.

All I ask is that you acknowledge receipt of my information. The rest I leave in your good hands.

Respectfully yours,

Andrew T. Maguire

* * *

From: Ramirez, Eliud
To: Andrew Maguire
Sent: Tuesday, February 09, 2010 1:29 PM
Subject: RE: Silver today

Good afternoon, Mr. Maguire,

I have received and reviewed your email communications. Thank you so very much for your observations.

Essential Complexity

Monday, December 27, 2010

The Decline of Trust: Follow-up to Andrew Maguire post

Monday, August 23, 2010

Twelve Reasons to use an ETL tool / Experience Report on ETL tools: Pentaho DI, SSIS, and PowerShell

Monday, July 19, 2010

The Tooth Fairy, Easter Bunny, and the Long Form Census Privacy Breech

Friday, May 07, 2010

The Curious Case of Andrew Maguire / My challenge to serious journalists

My Web Site

Click the icon below to add Essential Complexity to your RSS reader (like My Yahoo!)

Blog Archive

About Me