
In re Telexfree, LLC

United States Bankruptcy Court, District of Massachusetts
Jun 22, 2021
No. 14-40987-MSH (Bankr. D. Mass. Jun. 22, 2021)

Opinion

Case Nos. 14-40987-MSH, 14-40988-MSH, 14-40989-MSH; Adversary Proceeding Nos. 16-04006-MSH, 16-04007-MSH


In re: TELEXFREE, LLC, et al., Reorganized Debtors. STEPHEN B. DARR, Plaintiff, v. BENJAMIN ARGUETA, et al., Defendants. STEPHEN B. DARR, Plaintiff, v. PAOLA ZOLLO ALECCI, et al., Defendants.

Charles R. Bennett Jr., Esq.

Andrew G. Lizotte, Esq.

Murphy & King, Professional Corporation

for the plaintiff Stephen Darr, as Chapter 11 Trustee of the Estates of TelexFree, LLC, TelexFree, Inc., and TelexFree Financial, Inc.

Ilyas J. Rona, Esq.

Michael J. Duran, Esq.

Milligan Rona Duran & King, LLC

for the class defendants

MEMORANDUM OF DECISION ON CLASS DEFENDANTS' MOTION TO EXCLUDE EXPERT WITNESS TESTIMONY OF TIMOTHY MARTIN

MELVIN S. HOFFMAN, U.S. BANKRUPTCY JUDGE.

Stephen B. Darr, the chapter 11 trustee of the estates of TelexFree, LLC, TelexFree, Inc., and TelexFree Financial, Inc., commenced these adversary proceedings to recover funds from the class of TelexFree participants who profited or were "net winners" in TelexFree's fraudulent schemes. As detailed below, this memorandum focuses on issues related to the trustee's ability to identify net winners and quantify their winnings. To assist him in accomplishing that task, the trustee retained a firm of professionals to develop a methodology for making net winner determinations. The professionals' work forms the basis for the claims asserted by the trustee in these adversary proceedings, and the trustee wishes to introduce that work as an expert opinion through the testimony of one of his professionals. The defendants retained their own expert, and based upon his evaluation, they seek to exclude the opinion of the trustee's expert as unreliable.

Unless otherwise noted, all references to "TelexFree" in this memorandum may refer to one or all of the debtors depending on the context. Distinguishing among specific debtors is unnecessary for my ruling.

I. Background

On April 13, 2014, TelexFree and its affiliates filed voluntary chapter 11 petitions in the United States Bankruptcy Court for the District of Nevada. Around the same time, the U.S. Securities and Exchange Commission initiated an action against TelexFree and others, asserting that TelexFree and its affiliates were engaged in an illegal Ponzi/pyramid scheme and in the fraudulent and unregistered offering of securities, and the U.S. Department of Homeland Security executed a search warrant, seizing TelexFree's computers and servers. Shortly after filing, the chapter 11 cases were transferred to this Court. Mr. Darr was appointed chapter 11 trustee to administer all three bankruptcy estates on June 6, 2014. TelexFree was later found to have been a hybrid pyramid and Ponzi scheme. The scheme began in 2012, and, operating from the United States, it ensnared participants domestically and worldwide. Millions are believed to have participated, and well over $1 billion was lost in the fraud.

References to "chapter 11" are to that chapter within the Bankruptcy Code, which is codified at 11 U.S.C. §§ 101-1532.

TelexFree purported to be a multi-level marketing company selling Voice over Internet Protocol (VoIP) subscriptions, which could be used to make relatively inexpensive international telephone calls via the internet. In reality, TelexFree's business was to recruit participants as members. Almost all TelexFree's revenue came from membership subscriptions that enabled participants to earn "credits" by selling VoIP subscriptions, by posting make-work internet advertisements, and especially by recruiting new participants who bought memberships. Credits were redeemable for cash, used to offset membership fees, and often transferred between participants.

Additional details of the TelexFree scheme are recounted in prior decisions of this and other courts. E.g., Darr v. Dos Santos (In re TelexFree, LLC), 941 F.3d 576, 579-80 (1st Cir. 2019); Darr v. Internal Revenue Serv. (In re TelexFree, LLC), 615 B.R. 362, 365-67 (Bankr. D. Mass. 2020).

Each time a participant purchased a VoIP plan or a membership, the participant created a user account. As discussed in more detail below, it was quite common for individual participants in TelexFree to have many user accounts. Unfortunately, TelexFree's record database did not directly link user accounts that belonged to the same participant. Determining the extent to which a given individual participant had paid in or received funds from the TelexFree scheme would require aggregating that participant's user accounts. Then, the transaction data associated with those user accounts could be combined to determine whether the participant had gained or lost in the end, that is, whether the participant was a "net winner" or a "net loser."
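By way of illustration only, the following sketch shows the basic arithmetic of a net winner/net loser determination once a participant's user accounts have been aggregated. The account identifiers, amounts, and layout are invented for the example and do not come from the TelexFree records.

```python
# Purely hypothetical illustration: once a participant's user accounts have been
# aggregated, sum what the participant paid in and what the participant received
# across those accounts to classify the participant as a net winner or net loser.
# Account identifiers and dollar amounts are invented.

transactions = [
    # (user_account_id, amount_paid_in, amount_received)
    ("acct-001", 1375.00, 0.00),
    ("acct-002", 1375.00, 2200.00),
    ("acct-003", 0.00, 1100.00),
]

# Assume a prior aggregation step assigned these accounts to one participant.
accounts_of_participant = {"acct-001", "acct-002", "acct-003"}

paid_in = sum(paid for acct, paid, _ in transactions if acct in accounts_of_participant)
received = sum(rcvd for acct, _, rcvd in transactions if acct in accounts_of_participant)
net = received - paid_in

status = "net winner" if net > 0 else "net loser"
print(f"paid in {paid_in:.2f}, received {received:.2f}, net {net:+.2f} -> {status}")
```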

The trustee retained a team of professionals from Huron Consulting Group, LLC, led by Timothy Martin, to develop a methodology for determining net winners and net losers. More broadly, Huron's task was to develop a methodology for identifying who participated in the TelexFree scheme and the extent to which they gained or lost money. To do this, Mr. Martin and his team created an aggregation methodology for linking multiple user accounts to individual participants. The trustee has already used that methodology to assist him in the claims resolution process in the main chapter 11 cases by identifying net losers, who qualified as creditors of TelexFree.

The trustee originally retained Mesirow Financial Consulting, LLC, where Mr. Martin led the team working on the engagement. Mr. Martin and other team members later moved to Huron, and the trustee retained Huron to succeed Mesirow.

The trustee commenced these adversary proceedings as "reverse class actions" seeking to recover funds from the class of TelexFree net winners. He divided the proceedings into one against the class of alleged net winners who reside in the United States (Darr v. Argueta) and a second against the class of alleged net winners who reside abroad (Darr v. Alecci). For purposes of this memorandum, the adversary proceedings are considered together, as the issues in contention are identical. To recover from alleged net winners in these adversary proceedings, the trustee uses the same methodology that he used in the claims resolution process in the main cases. In doing so, the trustee relies upon the expert opinion of Mr. Martin as to how the methodology was developed and applied. The defendants have retained their own expert, Joshua Dennis of StoneTurn Group, LLP, to evaluate Mr. Martin's work.

Recognizing that the admissibility of Mr. Martin's expert opinion would play a decisive role in determining the course of these adversary proceedings, the parties jointly requested a limited discovery process followed by an evidentiary hearing on this issue. During a two-day evidentiary hearing, which was conducted remotely by video due to the Covid-19 pandemic, each party's expert testified and all expert reports were admitted into evidence. Subsequently, the defendants filed the motion that is now before me to exclude Mr. Martin's expert opinion. The trustee responded in opposition. This memorandum sets forth the reasoning for my ruling on the defendants' motion.

The parties' joint request noted potential disputes about "the admissibility of [Mr.] Martin's expert opinion and . . . whether [Mr.] Martin's expert opinion, in conjunction with the Ponzi Presumptions, establishes the Trustee's prima facie case shifting the burden of production to the individual Class Action Defendants." Joint Mot. for Sched. Order 6. The parties then asked that the Court determine admissibility first. See id. The parties appear to have agreed or they assume that if Mr. Martin's expert opinion were to be admitted, it would satisfy the trustee's prima facie case (in combination with the Ponzi scheme presumption, see DeGiacomo v. Palladino (In re Palladino), 556 B.R. 10, 13-14 (Bankr. D. Mass. 2016), rev'd on other grounds, 942 F.3d 55 (1st Cir. 2019)). For the purposes of this memorandum, I assume the same.

II. Legal Standard

The defendants' arguments focus on the reliability of Mr. Martin's expert opinion, raising questions about the validity and use of the data upon which Mr. Martin relied and about various assumptions and choices that Mr. Martin made throughout the process of developing his opinion. The parties agree that the admissibility of Mr. Martin's expert opinion is to be determined under the standards set forth in Federal Rule of Evidence 702.

A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if: (a) the expert's scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue; (b) the testimony is based on sufficient facts or data; (c) the testimony is the product of reliable principles and methods; and (d) the expert has reliably applied the principles and methods to the facts of the case.
Fed. R. Evid. 702. The rule incorporates key Supreme Court decisions-including Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993) and Kumho Tire Co., Ltd. v. Carmichael, 526 U.S. 137 (1999)-and their progeny in which the trial judge's gatekeeping role as to expert testimony has been emphasized and clarified. Fed.R.Evid. 702 advisory committee's note to 2000 amendment. The rule applies to all expert testimony, requiring the same level of scrutiny regardless of the area of expertise being offered. See Kumho, 526 U.S. at 141, 146-52.

The party seeking to have the expert witness's opinion testimony admitted bears the burden of establishing by a preponderance of the evidence that the expert is sufficiently qualified, that the expert is offering a reliable opinion, and that the opinion is relevant to understanding or determining a fact in question. See Fed. R. Evid. 702 advisory committee's note to 2000 amendment; Daubert, 509 U.S. at 592 n.10; Ruiz-Troche v. Pepsi Cola of P.R. Bottling Co., 161 F.3d 77, 81 (1st Cir. 1998). In considering reliability and relevance, the expert's methods, not the expert's conclusions, are "the central focus." Ruiz-Troche, 161 F.3d at 81. Because the two, however, "'are not entirely distinct,'" the trial judge may consider the adequacy of the information offered to support the expert's ultimate conclusion. Ruiz-Troche, 161 F.3d at 81 (quoting Gen. Elec. Co. v. Joiner, 522 U.S. 136, 146 (1997)). The point is that the trial judge should not and cannot simply take the expert's word for it, as there may be "too great an analytical gap between the data and the opinion offered." Joiner, 522 U.S. at 146; see also Kumho, 526 U.S. at 157 (noting that expert's assurances of accuracy in selection and application of methodology are insufficient alone); Ruiz-Troche, 161 F.3d at 81 (indicating need to guard against admitting expert opinion based on "guesswork" rather than having been properly grounded).

Because each case's circumstances will vary, even among cases necessitating seemingly similar expertise, the procedures applied and the factors considered in determining whether an expert's opinion meets Rule 702's criteria can also vary. See Kumho, 526 U.S. at 152-53; Fed.R.Evid. 702 advisory committee's note to 2000 amendment (collecting cases). The Daubert Court identified several factors that might be helpful in reviewing a scientific expert's opinion, including "the verifiability of the expert's theory or technique, the error rate inherent therein, whether the theory or technique has been published and/or subjected to peer review, and its level of acceptance within the scientific community." Ruiz-Troche, 161 F.3d at 80-81 (citing Daubert, 509 U.S. at 593-95). The Kumho Court noted that such factors could be relevant in non-scientific situations, as could untold other factors. Kumho, 526 U.S. at 141, 149-52. The objective of this intrinsically versatile inquiry "is to make certain that an expert, whether basing testimony upon professional studies or personal experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field." Id. at 152.

III. Findings and Analysis

Before the evidentiary hearing, the trustee filed a memorandum in support of the admissibility of Mr. Martin's expert opinion. During the evidentiary hearing, counsel presented brief opening arguments, and as noted, each party's expert testified. Mr. Martin submitted an original report, Mr. Dennis submitted a rebuttal report, and Mr. Martin submitted a reply report. The parties also referenced and displayed additional materials, including excerpts from and information considered in the experts' reports, documents providing other explanatory and background information, and filings from related proceedings. Afterward, defendants filed their motion to exclude Mr. Martin's expert opinion, which was followed by the trustee's opposition.

Identical documents have been filed and entered in each adversary proceeding. Because the docket numbers differ between the identical filings in each case, they are cited in this memorandum without reference to the adversary proceeding in which they were filed. The transcript for the two-day evidentiary hearing was filed as two volumes with continuous pagination, and that pagination has been used in citations to it, with the first volume being cited as "Martin Tr." and the second volume being cited as "Dennis Tr." Finally, near the end of the hearing each day, the parties held discussions on the record about the court's consideration of exhibits that had been referenced but as to which neither party had sought admission. Some items were noted to be court filings here and in the district court, and it was agreed that I could take judicial notice of such filings. Other items were a mixture of demonstrative exhibits and additional background information. I have reviewed those items but have not relied upon them to the extent that my conclusion in this matter would be different if those items had not been available.

a. The Expert Witnesses

The trustee's expert, Mr. Martin, has a background in financial advisory services and financial investigations as a forensic accountant. Martin Rpt. App. A. He has an undergraduate degree in accounting. Martin Rpt. App. A (providing curriculum vitae); Martin Tr. 20:15-17. He has not completed any post-graduate work, but during his more than 20-year career, he has obtained professional certifications, including as a fraud examiner, a turnaround professional, and an insolvency and restructuring advisor. Martin Tr. 20:20-23:18; Martin Rpt. App. A. Mr. Martin described having significant experience serving as a professional in insolvency-related litigation, including multiple notable Ponzi scheme cases. Martin Tr. 25:13-28:7; see also Martin Rpt. App. A (providing examples). Mr. Martin has held positions at KPMG, LLP, and Arthur Andersen, LLP. Martin Tr. 24:1-20. He is currently a managing director at Huron and previously served in the same position at Mesirow Financial Consulting, LLC. Martin Rpt. ¶ 3; Martin Tr. 24:18-25:8. Mr. Martin has served as a financial advisor and accountant to the trustee throughout the administration of the TelexFree chapter 11 cases. Martin Rpt. ¶¶ 1, 4. As indicated, his tasks in that role included "develop[ing] a methodology to identify Net Winners and Net Losers," leading to his expert opinion that is now before me. See Martin Rpt. ¶ 2.

Mr. Martin attached three documents to his original report. In the table of contents, he designated them as "Addendum A," "Appendix A," and "Appendix B," and he refers to them as such throughout the report. The actual appendices, however, are labeled as "Exhibit A" and "Exhibit B." In this memorandum, I will refer to the attachments to Mr. Martin's report as they are described in its table of contents-Addendum A, Appendix A, and Appendix B.

In developing his expert opinion, Mr. Martin "was assisted by professional staff members who worked under [his] direction and supervision." Martin Rpt. ¶ 7. Mr. Martin did not further describe this team in his report, but he provided a brief overview during his testimony. The team initially included a manager, and as Mr. Martin "obtained more information about the engagement, [he] brought on more team members," including "a computer specialist" and "a database e-discovery specialist." Martin Tr. 29:18-30:7. The specific credentials of these team members were not discussed or otherwise provided.

Although Mr. Martin worked with a team, and has used "we," "Huron," and the passive voice in his reports and testimony, for ease of reference herein, I attribute all actions to Mr. Martin. I do likewise with Mr. Dennis.

The defendants' rebuttal expert, Mr. Dennis, has a background in financial and economic consulting services. Dennis Rpt. ¶ 1. He has an undergraduate degree in management and business, with a minor in computer science. Dennis Tr. 235:8-12; Dennis Rpt. Ex. 1 (providing curriculum vitae). He has also completed continuing education courses in computer science, including in Structured Query Language (SQL - often pronounced "sequel"), "which is a programming language . . . use[d] to query and manipulate large datasets." See Dennis Tr. 235:17-24. He is certified as a valuation analyst and has also received a certificate in intellectual property law. Dennis Tr. 236:3-6. Mr. Dennis described having significant experience in quantitative and complex data analysis, including in the areas of economic damages, valuation, and forensic accounting, among others, and his work has been used to assist in a broad range of litigation. Dennis Tr. 237:22-240:10; Dennis Rpt. ¶ 1, Ex. 1 (providing examples). Such work has "also frequently include[d] evaluations related to data sufficiency and integrity testing," as well as "integration of disparate datasets." Dennis Rpt. ¶ 2; see also Dennis Tr. 258:24-260:8, 262:8-263:22 (providing two recent examples involving complex data analysis requiring different data linkage methods). Mr. Dennis has completed this work during a more than 15-year career with StoneTurn, where he is currently a partner and a member of the data analytics group. Dennis Rpt. ¶ 1, Ex. 1; see Dennis Tr. 237:13-21, 238:3-15, 247:20-21. The defendants initially retained Mr. Dennis to complete a preliminary analysis of Mr. Martin's work, which then led to Mr. Dennis's preparation of his rebuttal report. Dennis Rpt. ¶¶ 19-21; Dennis Tr. 240:18-24, 241:11-14. Mr. Dennis was not tasked with redoing the work of Mr. Martin but rather with assessing that work. Dennis Tr. 367:13-18.

In the transcript of Mr. Dennis's testimony, his references to SQL have been transcribed as "sequel" because he used that pronunciation of the acronym.

Like Mr. Martin, Mr. Dennis was assisted by other "professionals acting under [his] direct supervision." See Dennis Rpt. ¶ 19 n.21. Mr. Dennis described two of those professionals. One was "a fellow member of the data analytics group" who has a "master's [degree] in data science" and whose assistance included helping with loading data tables into StoneTurn's SQL database system and reviewing Mr. Dennis's calculations and data queries. See Dennis Tr. 247:20-25. The other team member assisted Mr. Dennis with curating and formatting exhibits for his report. See Dennis Tr. 248:2-4.

Neither party claims that the other's expert is unqualified. I note, however, that although Mr. Martin's background includes experience in accounting, including forensic accounting, and in Ponzi scheme cases, it is not readily apparent that he has expert-level "knowledge, skill, experience, training, or education," see Fed. R. Evid. 702, in the field of complex data analysis, as would seem to be necessary for developing and implementing an aggregation methodology upon which the trustee could rely to identify net winners for the purposes of these adversary proceedings. In fact, when asked during his direct testimony about whether his prior work on Ponzi scheme cases had involved "multiple user accounts and the need to link those accounts," Mr. Martin conceded that he had not encountered such issues in any prior cases. Martin Tr. 117:2-11. Absent from the record is any indication of Mr. Martin's familiarity with, for example, a relevant database programming language, as needed to query a large dataset, such as the one described below, for the purpose of analysis. Nevertheless, given the lack of objection to Mr. Martin's qualifications, I find that Mr. Martin meets the minimum requirements to offer an opinion as an expert witness in this matter.

b. The Data

The parties do not dispute that Mr. Martin obtained and used an accurate electronic copy of TelexFree's electronic database containing participant activity records and that he shared accurate excerpts from that data with Mr. Dennis for his analysis. Neither party asserts that the copied data was damaged or altered or that it was improperly reconstructed relative to its original configuration on the equipment that the federal government confiscated from TelexFree.

According to Mr. Martin, "[t]he TelexFree database contains more than 100 tables with over a billion records," including records from Ympactus Comercial Ltda., a Brazilian affiliate that had an allegedly similar scheme. Martin Rpt. ¶¶ 13, 27, 35-37. Mr. Martin determined that four tables were useful for his purposes: the account, invoice, transfer, and bonus tables. See Martin Rpt. ¶¶ 22-27; Martin Tr. 44:12-45:16. Mr. Martin used the account table in his process of aggregating user accounts. He reported that the account table "[c]ontains one record for each User Account registered with TelexFree/Ympactus." Martin Rpt. ¶ 27. Each user account record, of which there are approximately 17 million, contains a number of information fields. See Martin Rpt. ¶ 35; Dennis Rpt. ¶¶ 33-34. One of the fields, the "rep_id" field, lists "a unique, system-generated serial number" for each user account. See Martin Rpt. ¶¶ 27, 34.

See note 30 below for how Mr. Martin treated the Ympactus records in his aggregation methodology.

The table names are originally in Portuguese, as are most field names within those tables. See Martin Rpt. ¶ 21; Martin Tr. 42:20-43:16. Mr. Martin provided English translations or interpretations of table and field names in his reports. For example, Mr. Martin has translated or interpreted the table named "Representante" as "Account."

The precise number of fields is not stated in Mr. Martin's report. Based upon fields discussed in Mr. Martin's and Mr. Dennis's reports, there were at least 17 fields. Mr. Martin testified that there were 40 fields but also expressed uncertainty as to that total. See Martin Tr. 69:5-18, 81:15. A March 9, 2017 analysis attributed to Huron, Mr. Martin's employer, describes the account table as containing 44 fields and lists those fields. TelexFree Analysis of Damages at 23-24, United States v. Merrill, No. 14-cr-40028-TSH-1 (D. Mass. Mar. 16, 2017), ECF No. 332-1.

Certain fields of information in the account table contain user-entered data. See Martin Rpt. ¶¶ 28-29, 52; Martin Tr. 71:17-19. Users entered this data through a website-based registration form when creating new user accounts. Martin Tr. 45:11-12. In certain situations, users entered no data, and the field is blank. Despite a purported prohibition against creating more than one user account, it was not uncommon for individual participants in the scheme to create multiple user accounts. Martin Rpt. ¶¶ 30, 31, 40, 50. As noted, because the individual participants who created multiple user accounts could have incurred gains and losses through activity associated with any of those accounts, each such participant's accounts would need to be aggregated for the purpose of determining that participant's overall gains and losses and thus that participant's status as a net winner or net loser.

Mr. Martin has provided very little detail about the website-based registration form, which is the apparent mechanism by which data flowed into the account table (i.e., users did not directly enter their information into fields in the table). Mr. Martin has not confirmed, for example, whether the webform was in English, whether users had the ability to choose other languages, and what the actual prompts stated in directing users to enter information. Mr. Martin has also not indicated the extent to which the labels on webform fields corresponded to the field names reflected in the account table, which were primarily in Portuguese. He briefly indicated that he believed that the webform asked for "Name" but was uncertain on that point. See Martin Tr. 162:1-163:4. Mr. Martin also has not specified which fields in the webform required users to provide input before proceeding, although he indicated in his testimony that some did so. See Martin Tr. 71:13-19. Further, Mr. Martin appears to have assumed that the data for each user account in the account table did not change over time, but he has not indicated if he confirmed whether any user-entered data could have been later updated by the participant and, if so, whether such updates could be reflected in the account table.

Complicating the ability to aggregate user accounts was the fact that individual participants appear to have created user accounts under multiple names. James Smith might have accounts in the names of "J. Smith," "Jim Smith," "Smith, J.," or even a nickname, "Smitty," or a pseudonym, "Superman." The possibilities were practically endless. Other user-entered data appears to have been similarly plagued with inconsistencies and errors. In short, Mr. Martin determined that no single item of data would allow him to sort the user accounts so as to identify every user account belonging to a TelexFree participant. See Martin Tr. 64:19-25, 65:7-68:1. Mr. Martin thus devised a complex system to try to accomplish the task. In doing so, he selected seven different fields of user-entered data from the account table to use as his universe of data: the name, email address, home phone, cell phone, physical address, login, and password fields. See Martin Rpt. ¶¶ 51, 55, Addendum A. Using various combinations of the data or portions of the data from the seven fields, he ultimately developed a 13-step data-matching process that searched for consistencies between user accounts in order to aggregate and assign them to specific participants. See Martin Rpt. ¶¶ 51-52, 54-55, Addendum A; Martin Tr. 56:9-13; 192:15-193:11. It is the development and use of this process that the defendants question.

The field names are as translated or interpreted by Mr. Martin. See Martin Rpt. ¶ 51, Addendum A 1. The field described as the password field contains an encrypted version of the user account's password, and although the encrypted version itself is system-generated, it would have been generated based upon user-entered data, at least when the account was created. See Martin Rpt. Addendum A 1, 4. In addition to the seven fields listed, Mr. Martin also included the system-generated rep_id field for the obvious purpose of keeping track, during his aggregation process, of each user account.
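The record does not reproduce Mr. Martin's actual steps or code. Solely to illustrate the general technique he describes, the sketch below shows how combinations of normalized, user-entered fields can serve as exact-match keys in a multistep matching process; the field names, cleaning rule, steps, and records are all hypothetical and are not Mr. Martin's 13-step process.

```python
# Illustrative sketch only: combinations of normalized, user-entered fields serve
# as exact-match keys in a multistep (iterative deterministic) aggregation of
# user accounts. Field names, steps, and records are invented.
import re

def normalize(value: str) -> str:
    """Lowercase and strip non-alphanumeric characters (one possible cleaning rule)."""
    return re.sub(r"[^a-z0-9]", "", (value or "").lower())

# Hypothetical user account records (rep_id plus a few user-entered fields)
accounts = [
    {"rep_id": 1, "name": "James Smith",  "email": "JSMITH@example.com", "phone": "555-0100"},
    {"rep_id": 2, "name": "James  Smith", "email": "jsmith@example.com", "phone": ""},
    {"rep_id": 3, "name": "J. Smith",     "email": "",                   "phone": "555-0100"},
]

# Hypothetical steps: each step matches on the name field plus one other field
steps = [("name", "email"), ("name", "phone")]

for fields in steps:
    groups = {}
    for acct in accounts:
        key = tuple(normalize(acct[f]) for f in fields)
        if all(key):                      # skip records with blank values in these fields
            groups.setdefault(key, []).append(acct["rep_id"])
    matched = {k: v for k, v in groups.items() if len(v) > 1}
    print(fields, "->", matched)   # "J. Smith" never matches "James Smith" exactly
```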

c. Starting Point Selection

As his "starting point," Mr. Martin chose the name field, and he used data from that field in every step of his aggregation process. See Martin Rpt. ¶ 55 (asserting that "[s]election of a single data field to serve as a starting point for the linkage process was necessary"), Addendum A 2. Mr. Martin did not, however, use the name field data alone in the first (or in any) step, and thus his designation of that field as his starting point is not entirely descriptive. It would be more apt to liken the data in the name field to a constant. Because Mr. Martin used the name field data in every step of the aggregation process, his grouping of user accounts relies more heavily upon the data in the name field than on any of the other six fields used in his process, which were not included in every step. See Martin Tr. 222:15-24.

In his report, Mr. Martin indicates, without explanation, that the name field (labeled as "rep_nome") "usually" contained a "full name." See Martin Rpt. ¶ 51, Addendum A 1.

The trustee's counsel compared Mr. Martin's aggregation process to that of rolling a snowball to make a snow boulder. In the process of aggregating a participant's user accounts, the original snowball was formed by matching records using data from both the name and email address fields. Martin Tr. 64:6-17. Each subsequent step included matching records using data from the name field along with data from the other selected fields in various combinations, packing more snow around the snowball depending upon additional data matches. In effect, though, because it was included in every step, the name field data performed a limiting function in Mr. Martin's aggregation process. If, for instance, an individual participant created user accounts under names too dissimilar to match under Mr. Martin's aggregation process (James Smith and Superman), those accounts would not be combined in determining whether the individual was a net winner, and thus a class defendant in one of these adversary proceedings, or a net loser, eligible to file a claim and receive a distribution from the bankruptcy estates. See Martin Rpt. Addendum A 2; Martin Tr. 192:23-193:11.
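The snowball analogy can be made concrete with a small, purely hypothetical sketch (again, not Mr. Martin's code): accounts that match at any step are merged into a single participant group, but because a name-based key participates in every step, accounts registered under sufficiently dissimilar names never merge, no matter how much other information they share.

```python
# Hypothetical sketch of the "snowball" idea: accounts that match at any step are
# merged into one participant group (union-find), but a name-based key is part of
# every step, so accounts with dissimilar names are never merged. Records and
# rules are invented for illustration.

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

accounts = {
    10: {"name": "jamessmith", "email": "js@example.com", "phone": "5550100"},
    11: {"name": "jamessmith", "email": "js@example.com", "phone": ""},
    12: {"name": "jamessmith", "email": "",               "phone": "5550100"},
    13: {"name": "superman",   "email": "js@example.com", "phone": "5550100"},  # same person, different name
}

for other in ("email", "phone"):              # the name key is used in every step
    seen = {}
    for rep_id, rec in accounts.items():
        key = (rec["name"], rec[other])
        if all(key):
            if key in seen:
                union(rep_id, seen[key])
            else:
                seen[key] = rep_id

groups = {}
for rep_id in accounts:
    groups.setdefault(find(rep_id), []).append(rep_id)
print(groups)   # accounts 10, 11, 12 merge; 13 stays separate despite shared email and phone
```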

In his original report, Mr. Martin explained that the selection of the name field as his starting point or constant was based, in part, on an assumption about participant behavior: "Because Participants could receive Direct Receipts from TelexFree, it is reasonable to assume that a Participant would include an accurate name when registering a User Account." Martin Rpt. ¶ 55. As Mr. Dennis noted in his rebuttal report, however, Mr. Martin did not explain how this assumption was reasonable-that is, what about the process of receiving funds directly from TelexFree had any bearing on whether a user entered an accurate name. Dennis Rpt. ¶ 54. Mr. Dennis also asserted that the TelexFree transaction data did not support Mr. Martin's assumption. Dennis Rpt. ¶ 55. Mr. Martin did not address these criticisms in his reply report. In his testimony, Mr. Martin conceded that he did not know whether a participant would have been impeded in receiving funds directly from TelexFree if the participant had failed to provide the participant's actual name. Martin Tr. 149:5-12.

Mr. Martin stated that, as to user accounts that were connected to an eWallet account for which a user "had to provide [a] driver's license, passport, that type of information," "there's usually a strong correlation between [the name on the eWallet account] and the name on their . . . user account." Martin Tr. 149:5-12. He did not, however, provide specific details about such a correlation, including how often it occurred.

Adding to the assumption stated in his original report, Mr. Martin asserted in his reply report that he also considered the following when deciding to select the name field as his constant:

Over the approximately two-year operations of TelexFree, a Participant's name is more likely to remain constant when compared against other potential identifiers, such as phone numbers, email addresses, and physical addresses. Additionally, a Participant's name will remain distinct even when other similar information is shared, as would be expected among family members or roommates.

Martin Reply Rpt. ¶ 16. All this sounds logical, but it is unclear whether Mr. Martin's statements were based upon having tested the actual data and, if so, how and to what extent the results supported his statements.

In a peer-reviewed health care research-related study cited by Mr. Martin in Appendix B to his original report, the study's authors similarly claimed that they had "selected surname, first name, sex, and date of birth as [their] common identifiers [to be matched between datasets] because they were less likely to be changed over time, compared to other identifiers like address." Bing Li et al., Assessing Record Linkage Between Health Care and Vital Statistics Databases Using Deterministic Methods, 6 BMC Health Servs. Rsch., no. 48, April 2006, at 1, 3, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1534029/pdf/1472-6963-6-48.pdf. That study relied upon data from three Canadian administrative health databases that "are widely used in population and health services research to determine death status, cause of death and medical history," with one of the databases, which contains universal health care recipients' demographic information, being described as "nearly complete and consistent, and [as being] used as a proxy for the population of [the relevant Canadian province]." Id. at 2. Using a method that relies upon exact matches (a deterministic method, discussed infra), the study sought to confirm which combination of identifiers matched the data most accurately when no unique identifier was available, settling on "the combination of surname, sex and date of birth [as] the optimal choice." Id. at 2, 3-10. All of the databases contained the government-assigned "unique lifetime Personal Health Number" for each record included in the study, which the study's authors used to confirm the extent to which records matched correctly. See id. at 2-3. Mr. Martin does not appear to have had access to similarly reliable data such that he could have accurately assessed which identifiers would have been less likely to change over time. Nor does he state that he attempted to make such an assessment.

As further justification in both his original and reply reports, but without elaboration as to the analytical steps taken and results observed, Mr. Martin stated that the name field was selected as the starting point "[a]fter a thorough review of the data." Martin Rpt. ¶ 55; Martin Reply Rpt. ¶ 14. His testimony lacked detail as to this underlying data review process. See, e.g., Martin Tr. 64:12-68:1, 71:13-72:6, 76:11-13, 77:8-78:7, 78:12-16, 118:14-19, 222:15-20.

Mr. Martin has also largely failed to detail the extent to which acknowledged data quality issues were considered, analyzed, and addressed, including issues with name field data. The name field and the other six fields that Mr. Martin selected for his aggregation process all contained user-entered data (or, depending upon the whim of the individual and whether the field was optional, no data at all). Mr. Martin has described the user-entered data as "often incomplete, inaccurate, or clearly incorrect." Martin Rpt. ¶ 52; see also Martin Rpt. ¶ 33 (stating that "Participants often provided inaccurate, inconsistent, or incomplete information when opening accounts"); TelexFree Analysis of Damages at 24, United States v. Merrill, No. 14-cr-40028-TSH-1 (D. Mass. Mar. 16, 2017), ECF No. 332-1 (stating in Huron-produced report that "many Participants included incomplete and/or intentionally incorrect information when registering their User Accounts"). In his original report, Mr. Martin provided a few general examples of apparent anomalies in the data-"phone numbers comprised entirely of letters, email addresses missing the '@' symbol and instances where only a period or an ellipsis was entered instead of other requested information," see Martin Rpt. ¶ 52, Addendum A-but he has never quantified how often the data failed to meet expectations or explained in detail how he made such determinations. Apart from Mr. Martin's statement that it "complicated the task of aggregating multiple accounts held by a single Participant," see Martin Rpt. ¶ 33, the precise scope of this issue and its impact are essentially undiscussed.

As to the name field specifically, Mr. Dennis identified several potential data quality issues. Mr. Dennis noted, and Mr. Martin did not dispute, that there do not appear to have been any restrictions on which characters a participant could enter in the name field or any process by which the information entered would have been validated as the actual name of the user account holder. See also TelexFree Analysis of Damages at 24, United States v. Merrill, No. 14-cr-40028-TSH-1 (D. Mass. Mar. 16, 2017), ECF No. 332-1 (noting that TelexFree "did not validate data"). Providing a noncomprehensive list of examples of this issue, Mr. Dennis identified tens of thousands of name field entries that contained blatantly false names such as "a a" and ". ." Dennis Rpt. ¶ 56; see also Dennis Tr. 274:12-276:8. Mr. Dennis also discussed finding an account under the name of "Mickey Mouse," which led to his discovery of multiple accounts under other apparently false names, including "Walt Disney" and "Team Legendary." Dennis Tr. 277:2-282:2. When asked, Mr. Martin conceded that false names could have been entered and that he had not attempted to determine the extent to which this had occurred. See Martin Tr. 161:20-25.

In what might have been Mr. Martin's attempt to limit the impact of such an issue, he excluded from his aggregation process all user account records that had three characters or fewer in the name field, after all spaces and "extraneous characters" had been removed. See Martin Rpt. ¶ 56, Addendum A 1. That is, Mr. Martin made no attempt to group such user accounts with any other user accounts. Such user accounts thus stand alone as net winners or net losers, regardless of whether they might belong to a participant who held other user accounts and who may be an alleged defendant net winner in these adversary proceedings. Mr. Martin did not explain the purpose or offer any analytical or statistical support for his decision to treat the name field data in this manner.

Although Mr. Dennis raised concerns about the lack of clarity, see Dennis Rpt. ¶ 57 (discussing Chinese characters), no definition has been provided for "extraneous characters." Thus, it is unclear precisely what characters were excluded. In his testimony, Mr. Martin mentioned only that he had "removed any non-alphanumeric characters" and then provided a period as an example. See Martin Tr. 63:13. The trustee has asserted that numbers were also removed from the name field, but that assertion is unsupported. See Pl.'s Resp. to Defs.' Mot. to Exclude 12.

Mr. Dennis noted in his rebuttal report that more than 200,000 user accounts had name field data with three characters or fewer and thus would have been excluded from Mr. Martin's aggregation process. Dennis Rpt. ¶ 56. In his reply report, Mr. Martin responded that "these 200,000 User Accounts represent less than 2% of the 11 million User Accounts associated with at least one invoice and they are all the result of the Participant choosing to use a name other than their own." Martin Reply Rpt. ¶ 17. Mr. Martin appears to be suggesting that, relative to the whole, a statistically insignificant number of user accounts were excluded. Mr. Martin's response does not, however, explain or support that suggestion or his decision to exclude the records from the aggregation process and does not address the potential impact of that decision. Mr. Martin has not explained, for example, how a three-character entry was determined to be more likely to contain a false name than a four-character entry. Mr. Martin has also not explained the extent to which a defendant's alleged net winnings could be impacted if the excluded user accounts had instead been included.
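To illustrate the exclusion rule and its potential reach, the following sketch applies an assumed cleaning rule (removal of non-alphanumeric characters, consistent with Mr. Martin's testimony) and the three-character cutoff to a handful of invented names, showing that short but genuine names can be set aside alongside obviously false entries.

```python
# Hypothetical sketch of the exclusion rule described above: strip spaces and
# non-alphanumeric characters from the name field, then set aside any account
# whose cleaned name is three characters or fewer. Names are invented; this is
# not Mr. Martin's actual code.
import re

names = ["James Smith", "a a", ". .", "Liu Wei", "Li N", "T. M.", "Ana"]

def cleaned(name: str) -> str:
    return re.sub(r"[^A-Za-z0-9]", "", name)

excluded = [n for n in names if len(cleaned(n)) <= 3]
retained = [n for n in names if len(cleaned(n)) > 3]

print("excluded:", excluded)   # 'a a', '. .', 'Li N', 'T. M.', 'Ana'
print("retained:", retained)   # 'James Smith', 'Liu Wei'
print(f"excluded share of this sample: {len(excluded) / len(names):.0%}")
```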

Mr. Dennis also reported that he observed records in which the name field appeared to contain Chinese characters "generally two or three characters in length." See Dennis Rpt. ¶ 57. Regardless of whether these characters might have represented a participant's valid and complete name, the user account would have been excluded from Mr. Martin's aggregation process. See Dennis Rpt. ¶ 57. Mr. Martin did not address this potential issue or its impact in his reply report. Rather, as noted, Mr. Martin asserted that in all circumstances in which three characters or fewer were entered into the name field, it was "the result of the Participant choosing to use a name other than their own." Mr. Martin did not explain how he knew this.

In addition to acknowledging inaccuracies such as false names, Mr. Martin acknowledged the potential for variations to exist in real names. Martin Tr. 78:15-16, 161:5-22. Using his own name as an example, Mr. Martin noted that one might "enter Timothy Martin on one user account, but another user account they might use T. Martin or Tim Martin." Martin Tr. 78:17-19. Even after removing spaces and punctuation, such user accounts would not be aggregated if matching were based exactly upon the name field data. In an attempt to address this issue, Mr. Martin "ran various analyses of picking up certain letters within a name, certain combinations." Martin Tr. 78:21-22. After "hundreds of hours" spent on such analyses, Mr. Martin created two variants of the name field data, using only certain portions of the data when attempting to match user account records in various steps of his aggregation process. See Martin Tr. 75:5-8, 78:22-79:15; Martin Rpt. Addendum A 1. Mr. Martin did not provide any details about the analyses or results that prompted or justified the creation of these particular variants. Mr. Martin testified that one of the variants was found to be "useful" but did not elaborate on the effectiveness of using either variant. See Martin Tr. 79:13-15.

In his testimony, Mr. Martin stated that one of the variants used only the first three characters from the name field, and he indicated that thus "Timothy" and "Tim" from his example could then be brought together as a potential match. Martin Tr. 79:2-7. Mr. Martin's report, however, states that the variant used the first four characters from the name field, making it unlikely that "Timothy" and "Tim" would be aggregated. See Martin Rpt. Addendum A 1. The second variant created and used by Mr. Martin joined the first character and the last four characters from the name field. See Martin Rpt. Addendum A 1; Martin Tr. 79:8-12.
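A brief sketch, using invented example names and an assumed cleaning step, illustrates how the two variants described in the report would behave on Mr. Martin's own example: the first-four-characters variant does not bring "Timothy" and "Tim" together, while the first-plus-last-four-characters variant does.

```python
# Hypothetical illustration of the two name-field variants described in the
# report: one takes the first four characters of the cleaned name, the other
# joins the first character with the last four. The cleaning step is assumed.
import re

def cleaned(name: str) -> str:
    return re.sub(r"[^A-Za-z0-9]", "", name).lower()

def variant_first_four(name: str) -> str:
    return cleaned(name)[:4]

def variant_first_plus_last_four(name: str) -> str:
    c = cleaned(name)
    return c[:1] + c[-4:]

for name in ("Timothy Martin", "Tim Martin", "T. Martin"):
    print(name, "->", variant_first_four(name), "/", variant_first_plus_last_four(name))

# Expected output:
#   Timothy Martin -> timo / trtin
#   Tim Martin -> timm / trtin
#   T. Martin -> tmar / trtin
```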

As for Mr. Martin's process of selecting the name field as his starting point and constant, Mr. Martin asserts that he analyzed available data across multiple fields before settling on the name field. He has, however, provided no details or documentation of these efforts. Mr. Martin also generally discussed the overall selection of fields for his multistep aggregation process. For example, Mr. Martin stated that he "wanted to use fields that were required [to be completed during the account creation process] as opposed to optional" because optional fields would "be blank in . . . many of the records." Martin Tr. 71:16-19. He also stated that the distinctness or uniqueness of a field's data was factored into the field selection process-e.g., name field data was more distinct than country field data, which was not selected. Martin Tr. 71:19-72:6; see also Martin Tr. 54:20-56:3 (discussing total and unique counts for certain fields but not detailing analysis of uniqueness in determining which fields to choose). Mr. Martin also generally discussed (i.e., without providing statistical or other concrete support for his assertions) why certain fields, including the name field, were determined to be inadequate to use alone for aggregating the user account records. Martin Tr. 65:7-67:22. Having concluded that fields would need to be used in some combination, Mr. Martin asserted that he "considered name . . . to be the most important variable." Martin Tr. 67:23-68:1, 222:19-20. Yet, he has offered only unsupported assumptions and untestable bases for why he selected the name field, despite its acknowledged flaws, above all other fields to be the primary focus in his aggregation process. See also Martin Tr. 222:21-24 (agreeing that "if name didn't mean as much to . . . a particular user, that could skew the results of an aggregation").

Mr. Martin stated that the name field could not be used alone due to a "significant amount of overlap in names," meaning that participants who seemed to be different people shared the same name. See Martin Tr. 67:4-10. It is unclear how Mr. Martin determined that to be the case or the extent of the issue. Mr. Martin stated that a field believed to be intended to contain users' social security numbers or some equivalent was not used at all because it had "incomplete data, bad data." See Martin Tr. 65:13-16; see also Martin Tr. 80:3-15 (asserting "[a] lot of incomplete information in the field," "many of them were not filled in and a lot of it was more garbage information, like a high percentage of dash, dash, dash, or dot, dot, dot . . . if my recollection is correct"), 81:1-10 (asserting that field related to "tax ID number" was similarly rejected). No further supporting information was provided. Mr. Martin also stated that an email address field was insufficient to use alone because he "found" that "there were a lot of instances where a [person who recruited new members] may open accounts for other individuals and . . . would continue to use [the recruiter's own] email address" but would enter someone else's name, presumably a recruit's. See Martin Tr. 66:7-13. First, this explanation potentially conflicts with Mr. Martin's earlier explanation for deciding to include that field in the first step of his aggregation process. In his original report, he justified that decision by stating that "Participants were presumed to more likely provide accurate email addresses than most other data," offering no support for that presumption. See Martin Rpt. ¶ 57. Mr. Martin also separately asserted in his testimony that email addresses could not be used alone because "there w[ere] also a lot of bad email addresses, email addresses that didn't have extensions, no, no 'at' symbols, and things like that." Martin Tr. 66:17-19. Mr. Martin also extended his invalid-email reasoning to justify his decision to disregard user accounts with allegedly invalid email addresses during that first data-matching step (although such accounts might have become aggregated based upon data-match attempts in later steps not involving the email address field). Mr. Martin never explained, however, why the validity of an email address could matter for data matching purposes. That is, it seems that one could have typed anything into the webform and might have done so consistently across multiple user accounts, and thus, it is unclear why Mr. Martin considered whether the email field contained valid data but did not clearly extend such a consideration to other fields, including the name field upon which he focused.

As noted, for an expert witness's opinion testimony to be admissible, it must be "based on sufficient facts or data." Fed.R.Evid. 702(b). Given actual and potential data quality and reliability issues discussed above, the defendants assert that, having selected the name field as his starting point and constant, Mr. Martin relied upon insufficient facts or data to support his expert opinion that user accounts should be aggregated primarily by using data from that field to determine the identity of individual net winners and the extent of their net winnings. See Defs.' Mot. 13-15. As also noted, an expert witness's opinion testimony must be based upon reliable principles and methods and the reliable application of such principles and methods. Fed.R.Evid. 702(c)-(d). Defendants suggest that Mr. Martin's process of selecting the name field as his starting point and constant does not meet either of these requirements for admissibility of his expert opinion. See Defs.' Mot. 15-16.

Given the issues identified above, I conclude that Mr. Martin has not shown that the name field data is sufficient to support his opinion that user accounts should be aggregated primarily by using that data for the purposes of determining who the net winners are and the extent of their liability. He has not shown that his assumptions about the name field data's accuracy are reasonable and not speculative. He effectively conceded that his initial assumption about participant behavior lacked support. He failed to support his additional assumptions with any reference to the actual data. His own statements and concessions about the name field data's quality issues, along with his failures to show that his efforts to address those issues were well-reasoned, all undermine his position that such data is sufficient to support his opinion. Further, Mr. Martin has offered almost no concrete information about the principles and methods that he used and how he used them to select the name field as his starting point and constant. With no documentation or detailed explanation of his efforts, there is no practical ability to objectively review the reliability of Mr. Martin's analytical techniques. For admissibility, it is not enough for an expert simply to provide his conclusions; he must also show his work so that its reliability, including its adherence to accepted practices in the relevant field, can be meaningfully reviewed. While I need not for purposes of the current dispute conclude that Mr. Martin's selection and use of the name field as a starting point and constant were correct or incorrect, I must and do conclude that Mr. Martin has not shown the requisite reliability of his selection and use of the name field, which formed the basis of his opinion.

d. Methodology Selection

The data quality issues raised with respect to the name field also existed in the six other user-entered data fields that Mr. Martin relied upon in his multistep aggregation process. Building on these data quality issues, the defendants attack Mr. Martin's selection of an iterative (multistep) deterministic methodology for his aggregation process as incorrect and unsupported. Defs.' Mot. 16-20. While the correctness of Mr. Martin's selection of a methodology is not a determinative factor in ruling on the defendants' motion to exclude Mr. Martin's expert opinion, I must consider whether Mr. Martin has supported his selection such that his opinion based upon that methodology is reliable. See Fed. R. Evid. 702(c).

Although the defendants assert in their motion that Mr. Martin's methodology selection was incorrect, Mr. Dennis does not appear to have reached that conclusion. Rather, he concluded only that Mr. Martin's selection was inadequately supported.

In his original report, Mr. Martin quotes and cites two health care research-related publications to convey basic information and considerations that could apply to any data matching method, of which there are several. See Martin Rpt. ¶¶ 43, 47, 48. His sources note that, because any method has pros and cons, the choice of method depends upon several factors, including the data's quality, how the matched records will be used, and the level of error that will be acceptable for that use. See Martin Rpt. ¶¶ 43, 47. Mr. Martin briefly describes two methods-deterministic and probabilistic-and indicates that he chose an iterative deterministic method. See Martin Rpt. ¶¶ 44-46, 54. Mr. Martin does not explain that choice in his original report. Rather, he again quotes from two health care research-related publications, each discussing conditions in which using a deterministic method might be appropriate. See Martin Rpt. ¶¶ 47, 49. He excludes any discussion of when a probabilistic method might be appropriate.

For a match to be made under a deterministic method, the data being compared must match exactly, character for character. For example, if the data from an email address field is to be matched between records, the email addresses must be identical for a match to be made. In an iterative deterministic process (rolling the snowball), matches are attempted in a series of steps, beginning with the most conservative criteria and generally applying more lenient criteria in later steps. Under a probabilistic method, an exact match is not required. Matches are made based upon a degree of similarity between the data being compared. This method requires the use of mathematics and statistics to determine the likelihood or probability of a match based upon the similarities. Such a method could be appropriate if, for example, the data contains typographical errors or other discrepancies that could prevent exact matching of records that should be brought together.
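The contrast can be illustrated with a minimal sketch. The example below compares a deterministic (exact) test with a simplified stand-in for a probabilistic approach, a string-similarity score measured against a threshold rather than a full probabilistic linkage model; the values and the threshold are hypothetical.

```python
# Illustrative contrast (not from the record) between deterministic (exact) and
# similarity-based matching of a single field value. A true probabilistic linkage
# model would weigh agreement across multiple fields; this is a simplified proxy.
from difflib import SequenceMatcher

a = "jsmith@example.com"
b = "jsmith@exampel.com"   # hypothetical typo

deterministic_match = (a == b)                       # exact, character-for-character
similarity = SequenceMatcher(None, a, b).ratio()     # 0.0 to 1.0
probabilistic_match = similarity >= 0.9              # the threshold is a modeling choice

print(deterministic_match)                           # False: one transposition defeats the match
print(round(similarity, 3), probabilistic_match)     # high similarity; matched under the threshold
```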

According to Mr. Martin's first source:

In information-rich scenarios where direct identifiers are available and of good quality, deterministic methods have been recommended. . . . In scenarios that are information poor (where direct identifiers are unavailable) and/or the data are of poor quality, probabilistic methods consistently outperform deterministic methods and thus merit the extra time and resources required to implement them.

Stacie B. Dusetzina et al., Univ. of N.C., Linking Data for Health Services Research: A Framework and Instructional Guide 36 (2014), https://www.ncbi.nlm.nih.gov/books/NBK253313/pdf/Bookshelf_NBK253313.pdf; see also Martin Rpt. ¶ 47 (quoting same but excluding probabilistic method portion). As used here, "direct identifiers" is a term of art from federal health care data privacy regulations that includes certain individually identifiable information such as name, address, phone number, and social security number that might be included in a patient's protected health information. See Dusetzina et al., supra, at 10 (citing 45 C.F.R. § 164.514(e)(2)). That source continues:

Mr. Martin cites this source using various citation formats and, in one instance, cites to a specific chapter rather than the entire publication. See Martin Rpt. 13 nn.8-9, 14 nn.12 & 14, App. B.

The first step in selecting the appropriate linkage strategy is to determine whether direct unique identifiers are available (e.g., SSNs). In scenarios in which direct unique identifiers are available, deemed to be of high quality, and nonmissing in approximately 95 percent of cases in each dataset, a deterministic approach is recommended. A one-time deterministic approach is the easiest to design, implement, and interpret. It involves a binary, "all or nothing" decisionmaking process in which record pairs are compared character for character across all identifiers. Record pairs that agree exactly on the given identifiers are classified as matches, while record pairs that disagree on even a single character are classified as nonmatches. . . .
An iterative deterministic approach, such as the well-documented [two-step process devised by the National Cancer Institute to match cancer patient registry data and Medicare claims data], provides a more flexible alternative to a one-time deterministic approach. It involves an initial match on the most conservative matching criteria, followed by subsequent matches where record pairs that failed to meet the initial criteria are passed to a second, more lenient set of matching criteria. Record pairs that meet the matching criteria at any step are classified as matches, while record pairs that meet no matching criteria are classified as nonmatches.
In many cases, identifiers are available but incomplete, fraught with typographical errors, or imperfectly measured. In these scenarios, probabilistic techniques are recommended, as they have consistently outperformed deterministic techniques in earlier research. . . .
An optimal approach that covers all scenarios, datasets, research questions, and/or situations does not exist. . . . The decision of which approach to use depends ultimately on the research question and the available resources.
Id. at 67-68; see also id. at 32 ("The key is to develop algorithms to extract and make use of enough meaningful information to make sound decisions."). The source further notes that deterministic and probabilistic methods can be combined, using the deterministic approach for initial matching and then using the probabilistic approach, which can require exceedingly complex computations based upon varying degrees of similarity between records, to attempt to match additional records that were unmatched initially. See id. at 33-35, 38.

As to the second of Mr. Martin's quoted sources discussing conditions in which a deterministic method might be appropriate, Mr. Martin asserts that the source demonstrates that "[d]eterministic record [l]inkage is supported by peer-reviewed analysis." See Martin Rpt. ¶ 49. He does not otherwise comment on his inclusion of this source. Peer-reviewed or not, the source does not support using a deterministic method generally. It analyzes a specific scenario in which a deterministic method might be used, and that scenario and method bear no clear resemblance to those faced and used by Mr. Martin.

Roughly stated, the source discusses a study assessing the validity of certain non-iterative deterministic approaches using various combinations of specific indirect identifiers (e.g., "date of birth, sex, admission date, and provider information such as hospital ID") to match records in one dataset (or registry) of patients who have a specific disease or medical device with those same patients' records in a separate dataset of inpatient insurance claims data, as might be needed for epidemiological studies and health care effectiveness research involving data that, for privacy reasons, lacks direct identifiers. Soko Setoguchi et al., Validity of Deterministic Record Linkage Using Multiple Indirect Personal Identifiers, 7 Circulation: Cardiovascular Quality & Outcomes 475, 475-76 (2014), https://www.ahajournals.org/doi/pdf/10.1161/CIRCOUTCOMES.113.000294. The validity of the approaches being tested was assessed by comparing the accuracy of the matches made using indirect identifiers to those made when matching the same patient registry and claims data using indirect identifiers plus a unique direct identifier (beneficiary ID), essentially comparing the results to something approaching what the results would be with perfect information to determine the extent to which the tested approaches could achieve valid matches despite using indirect identifiers only. Id. at 476-77. As evidenced by the language partially quoted by Mr. Martin, the study found that certain deterministic approaches using only indirect identifiers could be highly valid by comparison, as long as the provider information was among the indirect identifiers:

In conclusion, deterministic linkage using multiple indirect identifiers including provider IDs can produce reliable and valid linkage compared with that using a combination of direct and indirect identifiers to link hospitalization records from a registry to inpatient claims data. In the absence of direct personal identifiers, provider information was the key to identifying unique records and conducting successful linkage. Further studies are needed to understand the validity of similar methods to link outpatient records and the performance of deterministic versus probabilistic linkage methods in real-world record linkages for comparative effectiveness research.
Id. at 479; see also Martin Rpt. ¶ 49 (quoting same but excluding final sentence). Mr. Martin misattributed the above quote. To the extent that he intended to have quoted from the source that he actually cited, see supra note 17, that source likewise evaluates a narrow and specific use of a deterministic approach with no clear relevance to Mr. Martin's decision to use that method here.

As to the more general considerations for selecting among data matching methods, discussed in Mr. Martin's sources, Mr. Dennis contends that Mr. Martin has not shown that he adequately incorporated them into his decision-making process, including considerations as to data quality and how the matched data would be used. See Dennis Rpt. ¶¶ 49-53. Mr. Dennis notes that, by Mr. Martin's own admission, the TelexFree data being used was of poor quality: "often incomplete, inaccurate, or clearly incorrect," having been user-entered without restrictions or verification. See Dennis Rpt. ¶¶ 51 (quoting Martin Rpt. ¶¶ 33, 52), 56; see also TelexFree Analysis of Damages at 24, United States v. Merrill, No. 14-cr-40028-TSH-1 (D. Mass. Mar. 16, 2017), ECF No. 332-1. Thus, based on Mr. Martin's own sources, a probabilistic method would seemingly have been indicated and recommended. See Dennis Rpt. ¶ 52. Yet, as noted, Mr. Martin did not explain why he rejected a probabilistic method in favor of an iterative deterministic method or why he selected a deterministic method despite knowing that the data was of poor quality. Dennis Rpt. ¶ 53. Mr. Dennis also notes that Mr. Martin did not explain how publications about specific health care data research techniques were similar enough to the TelexFree scenario such that they would have been relevant to Mr. Martin's decision to use an iterative deterministic method. See Dennis Rpt. ¶ 49.

In his reply report, Mr. Martin attempts to address some of Mr. Dennis's critiques. See Martin Reply Rpt. ¶¶ 10-13. He states that, to improve data quality before developing his aggregation method, "extensive data scrubbing was performed and data that did not meet the minimum quality thresholds derived through iterative testing and comparisons of results was excluded." Martin Reply Rpt. ¶¶ 12-13. He mentions only one example of data scrubbing: "removing spaces between names." See Martin Reply Rpt. ¶ 12. Other similar modifications to clean and standardize the data were briefly noted in his original report, including removing "extraneous characters" from the name field and removing non-numeric values from phone number fields. See Martin Rpt. Addendum A 1. In his testimony, Mr. Martin stated that "[t]he fact that the, some of the information was inaccurate didn't play a tremendous amount in, into that decision [to use a deterministic method] . . . ." Martin Tr. 62:16-21. He then broadly described some of his efforts to clean and standardize the data and provided a few basic examples, seeming to suggest then, as well as in his reply report, that such efforts would have alleviated data quality concerns. See Martin Tr. 62:18-63:22; Martin Reply Rpt. ¶¶ 12-13.
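
The cleaning and standardization steps mentioned (removing spaces between names, stripping extraneous characters from the name field, and removing non-numeric values from phone number fields) are of the following general form. This is a sketch only; the precise rules Mr. Martin applied are not documented in the record.

import re

def clean_name(raw: str) -> str:
    # Lowercase, then drop spaces and other extraneous (non-letter) characters.
    # (Simplified: accented characters would also be dropped by this rule.)
    return re.sub(r"[^a-z]", "", raw.lower())

def clean_phone(raw: str) -> str:
    # Keep digits only, removing non-numeric values and formatting characters.
    return re.sub(r"\D", "", raw)

print(clean_name("Mary- Ann  O'Brien"))    # "maryannobrien"
print(clean_phone("(508) 555-0199"))       # "5085550199"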

Although in his original report he states that he used a deterministic method, Mr. Martin states in his reply report that he "did not rely on a particular methodology" but that his selected methodology overall "most closely followed a deterministic methodology," specifically an iterative one that "was replicable, and, in [his] opinion, yielded the most accurate result." See Martin Reply Rpt. ¶¶ 11-12. He states that he selected this approach because "each aggregation step's results could be evaluated and reproduced." Martin Reply Rpt. ¶ 12. Without any reference to authority, he insists that "[t]his could not be accomplished with a probabilistic approach, in which initial weights and subsequent manual training would not lend itself to documenting and explaining the results of each step in a simple manner." See Martin Reply Rpt. ¶ 12. Mr. Martin did not explain why simplicity would have been a factor.

In his testimony about selecting a method for aggregating user accounts, Mr. Martin noted that, in the circumstances presented, no aggregation process could be expected to be entirely accurate but that he strived to select "a reasonably reliable method." Martin Tr. 56:9-20. Mr. Martin stated that he assumed that his aggregation results would not be "final" and would instead serve to "start[] the conversation" with participants who could then "provide either evidence or explanation as to why some of the user accounts that were aggregated shouldn't have been or why other ones should have been included." Martin Tr. 56:9-17, 144:4-8, 144:22-145:1; see also Martin Tr. 140:20-23 (stating he was "looking for an aggregation process that would be reasonably accurate based on the information, but fully with the anticipation that the . . . participant would have the ability to challenge [it]"), 223:5-7 ("What we were trying to do was develop a reasonably accurate aggregation methodology that somebody could review and then challenge."). Because he assumed that the aggregation results would be starting a conversation, Mr. Martin "wanted to make sure that anybody, not just a net loser, but a net winner, anybody could look at the user accounts and just by looking at it, get a sense of why it is that they were [brought together]." Martin Tr. 145:7-10. Offering very little detail, Mr. Martin asserted that a probabilistic approach, which would require calculations to help establish likely but not necessarily exact matches (potentially allowing for more matches despite variations in the data), would be "less concrete" and "not as transparent in trying to explain to a participant why it is that this . . . certain user account is included with their aggregation." See Martin Tr. 61:1-10; see also Martin Tr. 134:4-20. Thus, he clarified for the first time that his primary reason for choosing an iterative deterministic approach was that he would be able to explain most easily, based upon the easy-to-see exact matches between certain types of data in the aggregation results, why multiple user accounts were thought to belong to one person, such as two user accounts being matched in the first iteration (or step) because both records contained the same data in the name and email address fields, with the expectation that the person could then "determine if it looks right to them" and, if not, dispute it. See Martin Tr. 60:17-61:1, 105:20-25, 110:5-12, 134:8-20, 140:19-23, 144:4-145:10, 223:5-7.
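
By way of contrast, probabilistic record linkage, as generally described in the literature (and formalized in the classic Fellegi-Sunter framework), scores candidate pairs by weighted agreement and disagreement across fields and treats pairs above a chosen threshold as likely matches. The sketch below is illustrative only; the fields, weights, and threshold are hypothetical and are not taken from any party's analysis.

# Illustrative probabilistic-style scoring: weighted agreement across fields
# with a decision threshold. Fields, weights, and threshold are hypothetical.
FIELD_WEIGHTS = {"name": 4.0, "email": 6.0, "phone": 5.0, "address": 2.0}
THRESHOLD = 8.0

def match_score(rec_a: dict, rec_b: dict) -> float:
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a and b and a == b:
            score += weight        # agreement on the field adds its weight
        elif a and b:
            score -= weight / 2    # disagreement subtracts a penalty
    return score                   # missing values contribute nothing

def is_likely_match(rec_a: dict, rec_b: dict) -> bool:
    return match_score(rec_a, rec_b) >= THRESHOLD

a = {"name": "anamaria", "email": "am@x.com", "phone": "5085550199", "address": "10 main st"}
b = {"name": "anamaria", "email": "", "phone": "5085550199", "address": "10 main"}
print(match_score(a, b), is_likely_match(a, b))   # 8.0 True

Full probabilistic implementations typically estimate the weights from the data and may score partial rather than exact agreement, which is the added computational complexity to which the quoted sources refer.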

Mr. Martin also newly asserted in his testimony that a probabilistic approach was or might have been impossible and that, for this reason, he did not choose it. Martin Tr. 61:19-62:7, 130:15-25, 132:10-13, 145:1-3. He noted that the data was contained within one table. Martin Tr. 58:13-15. He stated that data matching usually involves comparing the data in two tables and making matches between records from each table. See Martin Tr. 58:9-12. With only one table involved here, he was comparing data and matching records within the table, in effect comparing the table to itself. See Martin Tr. 58:13-15. He indicated that this one-table scenario lacked an "answer key" that would otherwise be available in a two-table scenario and that, as a result, calculations under a probabilistic method "would have been much more subjective." Martin Tr. 61:19-62:7; see also Pl.'s Resp. to Defs.' Mot. to Exclude 21 n.9 (asserting without reference to authority that "probabilistic method . . . generally requires the existence of one data set containing information known to be correct, i.e. an 'answer key'"). He later stated that because he was working with only one table he was "not sure" whether probabilistic calculations could have been made "in a proper way in this case." Martin Tr. 130:20-25; see also Martin Tr. 132:10-13 (asserting that he had either been unable to make a specific probabilistic calculation due to having only one dataset or could not remember what the calculation had been), 145:1-145:4 (indicating that having only one data set would have prevented probability-related calculations), 211:9-212:2 (expressing uncertainty as to whether a probabilistic method could be applied to a single dataset). Mr. Dennis disagreed that it would have been impossible to use a probabilistic method. See Dennis Tr. 264:6-7. In any event, Mr. Martin's new claims and equivocal testimony certainly raise questions about the extent to which he actually considered using a probabilistic method before making his selection.
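
The one-table comparison Mr. Martin described, in which a table is matched against itself rather than against a second table, can be pictured as the following deduplication-style sketch. The fields and sample data are hypothetical, and the sketch is offered only to illustrate the mechanics of within-table matching.

from itertools import combinations

# Comparing a single table to itself: every pair of distinct records is a
# candidate, and exact agreement on the chosen fields links the pair.
def self_match(records, identifiers):
    links = []
    for a, b in combinations(records, 2):
        if all(a.get(f) and a.get(f) == b.get(f) for f in identifiers):
            links.append((a["id"], b["id"]))
    return links

accounts = [
    {"id": 1, "name": "jsilva", "email": "j@x.com"},
    {"id": 2, "name": "jsilva", "email": "j@x.com"},
    {"id": 3, "name": "msantos", "email": "m@y.com"},
]
print(self_match(accounts, ("name", "email")))   # [(1, 2)]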

As discussed, there is no optimal approach, but not all approaches are equally suitable to all circumstances. To support the reliability of having chosen one approach or combination of approaches over others, the specific circumstances must have been considered and reasonably incorporated into the decision. Again, I am not deciding whether Mr. Martin made the correct choice but whether he has shown that his choice was well-reasoned such that it could be objectively reviewed wherever possible and on par with what would generally be expected from an expert in this field.

Mr. Martin's sources emphasize data quality considerations, and Mr. Martin certainly was aware of multiple issues indicating that the TelexFree data was of poor quality. To the extent, however, that Mr. Martin has suggested that he attempted to improve data quality and those attempts were successful to a degree that poor quality would no longer be a factor in choosing an aggregation method, he has not cited any source or otherwise supported such a proposition. Moreover, Mr. Martin has not provided any detail about the extent to which his data cleaning and standardization improved the data quality. He also has not explained what "minimum quality thresholds" were applied, see Martin Reply Rpt. ¶ 13, what data was excluded, and the extent to which such exclusions might have affected any analysis or results. Further, Mr. Martin's discussion of his efforts does not fully address the quality concerns identified above, particularly the substantive issues with user-entered data. Overall, whether and to what extent Mr. Martin properly considered data quality before choosing to use an iterative deterministic method cannot be meaningfully reviewed.

Mr. Martin's most frequently cited source indicates that, before choosing a methodology, data cleaning and standardization are helpful for reducing idiosyncrasies such as formatting inconsistencies and typographical errors in the data that might prevent otherwise valid matches. Dusetzina et al., supra, at 29-31 (noting that "[c]leaning has been highly recommended if the data quality is poor and/or only a few identifiers are available"), 67, 70. The source does not suggest, however, that cleaning and standardization would be adequate to eliminate data quality concerns or that they could upgrade low-quality data into high-quality data.

In his reply report, Mr. Martin also states that he "developed and matched" variants of the data in certain fields; for example, he used and combined portions of certain fields' data as discussed above in the context of name field data. Martin Reply Rpt. ¶ 12. He later stated in his testimony that he "borrow[ed] from probabilistic for that," Martin Tr. 60:2-5, but he has not provided any details about this probabilistic approach, such as how it was executed or how it supported choosing the variants that he did. Mr. Martin clearly intended that these variants improve matching despite certain limited discrepancies in the data, but he has simply failed to show how he reached his conclusions and how those conclusions would relate to establishing the reasonableness of his choice of an iterative deterministic aggregation methodology.

Mr. Martin's sources also emphasize consideration of how the matched data will be used and the extent to which errors in matching will be acceptable in that context. Mr. Martin was aware of two ways in which the aggregated data would be used. He was initially hired, among other things, to "develop a methodology to identify Net Winners and Net Losers" in order "to assist in the claims process" and "to enable the Trustee to seek recoveries on behalf of the Net Losers." See Martin Rpt. 3-4. Yet, from his stated assumption about how his aggregation results would be used (to start a conversation that could lead to negotiation with a participant), Mr. Martin seems to have failed to appreciate a fundamental difference between the claims administration process as implemented in the main cases and the trustee's recovery-seeking process in these adversary proceedings. In the claims process, anyone who wished to file a claim and who entered certain minimum information during the electronic claims submission process was presented with aggregation results during that process. The claimant could then make adjustments and provide additional data in response to the aggregation results, before submitting a claim, in effect engaging in the conversation that was started by the aggregation results, as Mr. Martin alluded to in his testimony. In these adversary proceedings, however, an alleged TelexFree participant's status as a defendant would be determined based upon Mr. Martin's aggregation results showing that the person was a net winner, and, because those aggregation results would be used to present the trustee's prima facie case, alleged net winners would have no way to dispute their status and avoid liability short of putting on a legal defense. Defendants who failed to do so would risk a default judgment. Thus, in the context of these adversary proceedings, Mr. Martin's aggregation process would not merely be starting a conversation but also could be ending it.

As Mr. Dennis indicates, data quality and how matched data will be used can involve overlapping considerations. Mr. Dennis quotes from a source that Mr. Martin listed as one upon which he had relied (but did not discuss or reference directly in his report).

Poor quality (eg if variables are missing, indecipherable, inaccurate, incomplete, inconstant, inconsistent) could lead to records not being linked - missed links - or being linked to wrong records - false links. The impact of these two types of errors may not be equal (eg a missed link may be more harmful than a false link), so this needs to be taken into account when designing a data linkage strategy, especially if the linking has legal or healthcare implications.
Harold Kroeze et al., Centre of Excellence on Data Warehousing (European Comm'n), Statistical Data Warehouse Design Manual 109 (2017), https://ec.europa.eu/eurostat/cros/system/files/s-dwh_design_manual_v1.pdf; see also Dennis Rpt. ¶ 72; Martin Rpt. App. B.

The need to aggregate user accounts to determine who the named and class defendants should be for these adversary proceedings is generally clear, although the feasibility of that task seems less so. Given how the electronic claims process was designed and functioned, see infra note 28, the need to aggregate user accounts in advance of that process is less clear. In his testimony, Mr. Martin asserted that the need to aggregate existed "right from the beginning" because "the trustee necessarily needed to create a, a claims process in this case." Martin Tr. 110:18-22. He also stated that the aggregation results helped to inform some of the trustee's objections to claims. Martin Tr. 115:17-24.

Mr. Martin testified that, when developing his methodology, he conducted initial interviews with a small nonstatistical sample of individuals whom he believed to be net losers. Some reported that they would be unable to remember the login information or usernames associated with each of their accounts. Mr. Martin thus determined that he would need to develop a system by which participants who wished to assert claims in the bankruptcy cases could identify and retrieve their user account information based upon other information in the account table that they might remember such as an email address or phone number entered when they created their user accounts. He ultimately created an electronic claim submission process that enabled claimants to enter certain information associated with their user accounts to claim those user accounts and, in turn, to claim the transactions associated with them, automatically netting gains and losses into a claim amount. Before submitting a claim electronically, a claimant could manually adjust the claim if, for example, he disagreed that a particular user account or transaction should be included or wished to add a user account or transaction that had been omitted. Thus, a claimant had input into the results of the aggregation methodology before any adversarial process was initiated.

The trustee asserts:

Upon the admission of the methodology, the Trustee can apply the methodology to determine each Class Action Defendant's Net Winnings and that calculation, in conjunction with the Ponzi presumption, will constitute the Trustee's prima facie case. The burden of production will then shift to the Class Action Defendant to offer evidence to rebut the amount of the Net Winnings attributable to him/her. However, a Class Action Defendant cannot challenge the admissibility of the methodology.
Pl.'s Resp. to Defs.' Mot. to Exclude 18; see also id. at 29-30.

Having not appreciated the significant distinction between the two ways in which the matched data would be used, Mr. Martin could not have factored that distinction into his consideration of what would be an acceptable error rate in his aggregation process. He chose a method that he believed would be "reasonably accurate" in order to get the conversation started. His description is consistent with having considered the claims process only. Thus, he has not shown that in choosing his iterative deterministic method he properly considered, for the purposes of these adversary proceedings, how the results would be used and what level of error would be acceptable.

Under these circumstances, I cannot conclude that Mr. Martin adequately considered relevant factors in choosing his methodology and therefore that his opinion based upon that methodology is reliable.

e. Methodology Application

Assuming that Mr. Martin's selection of an iterative deterministic method was appropriate, the defendants dispute that he reliably applied that method in developing his opinion. They assert that Mr. Martin's application of his aggregation methodology was subjective, inadequately tested, subject to bias, and created through trial and error without documentation, which would be necessary to enable a meaningful review. See Defs.' Mot. 20-22. Thus, I must consider whether Mr. Martin has established that he reliably applied his chosen methodology to the facts of these proceedings. See Fed. R. Evid. 702(d).

As noted, Mr. Martin ultimately developed and used a 13-step procedure. Each step (or iteration) used data from the name field combined with data from one or more of the six other selected fields, with such data having first been cleaned and standardized to some degree. Some steps used only portions or certain combinations of data from the specified fields. Each step identified exact data matches among all user account records (a deterministic method - specifically, a multistep or iterative deterministic method). After the first step of initial matches, each subsequent step's resulting matches were compared and matched to prior steps' results, forming an ever-increasing aggregation of user accounts assigned to individual participants. If the data fields used in any step were changed or any steps reordered, the aggregation results could be different. See Martin Tr. 83:6-13, 84:13-85:12; 222:12-24. In the end, approximately 14.99 million user account records from the account table had either been aggregated or stood alone. The results were then used in determining whether individual participants were net winners or net losers based upon the transaction data associated with their user accounts.
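
In general terms, a multi-step exact-match aggregation of the kind just described can be pictured as repeated passes, each linking records that agree exactly on a chosen combination of fields, with the resulting links merged into ever-growing clusters. The sketch below uses hypothetical field combinations and sample data together with a standard union-find structure; it is not Mr. Martin's actual 13-step procedure, and, unlike the process described in his testimony, this simplified version produces the same final clusters regardless of the order of the passes.

from collections import defaultdict

def find(parent, x):
    # Follow parent pointers to the cluster representative (with path halving).
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def aggregate(records, steps):
    parent = {r["id"]: r["id"] for r in records}
    for fields in steps:
        groups = defaultdict(list)
        for r in records:
            key = tuple(r.get(f) for f in fields)
            if all(key):                        # skip records missing any required field
                groups[key].append(r["id"])
        for ids in groups.values():             # exact agreement on this pass's fields
            for other in ids[1:]:
                union(parent, ids[0], other)    # merge the accounts into one cluster
    clusters = defaultdict(list)
    for r in records:
        clusters[find(parent, r["id"])].append(r["id"])
    return list(clusters.values())

# Hypothetical passes: (1) name + email, then (2) name + phone.
accounts = [
    {"id": 1, "name": "jsilva", "email": "j@x.com", "phone": "5085550199"},
    {"id": 2, "name": "jsilva", "email": "j@x.com", "phone": ""},
    {"id": 3, "name": "jsilva", "email": "", "phone": "5085550199"},
]
print(aggregate(accounts, [("name", "email"), ("name", "phone")]))   # [[1, 2, 3]]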

As noted, there were approximately 17 million user account records. Of those, only about 14.99 million user account records, including Ympactus user accounts, were subjected to the aggregation process. After the aggregation process was complete, Mr. Martin excluded from the results the user accounts believed to be from Ympactus. Martin Rpt. ¶¶ 35-37, Addendum A 2. But see Pl.'s Resp. to Defs.' Mot. to Exclude 9 (stating incorrectly that Ympactus user accounts were removed before aggregation). The reason for including the Ympactus user accounts in the aggregation process was not provided. Presumably, they were included because they might improve the accuracy of the matching process by offering more connections between user accounts, including those of TelexFree, as it appears that some people participated in both schemes. The Ympactus user accounts are not factored into whether a TelexFree participant is an alleged net winner or net loser. That is, those accounts were included in the aggregation process but excluded in the final calculation. Assuming that their initial inclusion was to improve matching, it seems that Mr. Martin separately made an inconsistent choice as to another group of user account records. He excluded entirely from the aggregation process more than 2 million user account records that were not connected to a paid invoice because such records "would have no impact on Net Equity," that is, whether a participant was a net winner or net loser. Martin Rpt. ¶¶ 35 & n.4, 37. Yet, it would seem that those records, had they been treated like the Ympactus records, could have had an impact on matching quality.

In his original report, Mr. Martin describes at length how his multi-step process functioned, including a hypothetical example to demonstrate it. See Martin Rpt. ¶¶ 54, 57-64, Addendum A. He likewise described the process in his testimony. Martin Tr. 86:1-105:18. He repeatedly noted that "thousands of hours" had been devoted to developing the process over more than a year, with some portions alone requiring "hundreds of hours . . . of analysis." Martin Tr. 74:5-11, 74:18-76:24. He stated that the process had been developed through trial and error and that no specific number of steps had been mandated by the chosen method. See Martin Tr. 72:16-22, 136:21-137:8, 141:5-142:2, 144:3-8. Rather, seeking to "aggregate sufficiently," Mr. Martin testified that he continued creating additional steps until in his judgment the law of diminishing returns indicated that he should stop. See Martin Tr. 72:22, 142:3-10.

As was the case with respect to the name field, Mr. Martin has provided limited information about how he selected the other six fields of user-entered data for inclusion in the various steps and why he rejected others. See Martin Rpt. ¶¶ 51 (stating that "Common Identifiers were analyzed to determine which data fields should be used"), 55 (stating that after selecting the name field "[a]dditional fields were selected based on the likelihood that their inclusion combined with other fields, would accurately aggregate User Accounts by Participants"); Martin Tr. 54:20-56:3, 71:13-72:6, 77:8-81:22, 130:2-14, 159:14-160:15. Mr. Martin's reports and testimony are essentially devoid of any explanation as to how and why the specific combinations of fields' data were selected for each step. See, e.g., Martin Rpt. ¶¶ 67-68; Martin Tr. 65:7-67:22. Mr. Martin confirmed that the order in which the steps were performed mattered and that there was "a reason for the order." Martin Tr. 83:10-13. He agreed that changing the order would alter the outcome, but he never explained why his specific ordering of the steps he chose was most appropriate and never supported such a conclusion other than to suggest that analysis had indicated it to be so. See Martin Tr. 76:8-16, 77:1-79:25, 84:13-85:12.

Mr. Martin stressed that "analysis" and "testing" were done throughout the development phase of his aggregation process. E.g., Martin Tr. 66:3 ("significant amount of analysis"), 72:22-74:4 ("determined through iteration and analysis"), 75:23-76:2 ("time intensive and analysis driven"), 76:10-16 ("continuously analyze the data"), 77:8-11 ("tremendous amount of analysis"), 77:23-78:7 ("a lot of analysis"), 78:12-16 ("significant amount of analysis"), 78:21 ("ran various analyses"), 95:6-8 ("significant amount of analysis"), 109:3-20 ("run analytical procedures"), 130:8-14 ("significant amount of testing"). Yet, he often provided only theoretical or notional examples and did not discuss the specific details such as the type of analysis or test, the procedure employed, or the results achieved. See, e.g., Martin Tr. 62:24-63:22, 73:4-74:4, 76:10-79:25, 95:6-20, 108:1-109:20, 139:19-140:12, 142:6-7. He conceded that he does not have records documenting these efforts and their results. See Martin Tr. 137:9-14. In developing his method, Mr. Martin also relied upon information gained through conversations with a nonstatistical sample of net losers about their experiences in creating user accounts and participating in the TelexFree scheme, but he had no formal notes from those conversations and no comprehensive list of those with whom he spoke. Martin Tr. 78:4-7, 95:11-16, 134:21-136:7, 142:11-143:13. To the extent that Mr. Martin manually reviewed or visually inspected aggregation results while developing his method, it is unclear whether he reviewed only results related to those net losers with whom he spoke. See Martin Rpt. ¶¶ 65, 67-68; Martin Tr. 95:11-20, 137:15-25. Mr. Martin has not reported specific observations from his manual reviews.

Reacting to Mr. Martin's reports and testimony, Mr. Dennis noted:

[W]hat struck me . . . was the, the amount of, of trial and error, iterative, manual processes that were guided so directly by, by the net loser population and, really importantly, for which we don't have any documentation or support, right? We don't, we don't know sitting here today how many people he talked to, who he talked to, whether they were large net losers or small net losers. We don't know how that caused a change in, in the aggregation process, whether that verified something or, or negated a step. All we know is he, he talked to people and as a result, it somehow either confirmed or didn't confirm his analysis. And what we heard is that . . . he would run a[n] iteration by looking at the data, then he would look at it again and, and say, at some sample population, we don't know what or how many, and say, "Yeah, this either looks right or it doesn't look right[.]" . . . . "We need to go back to the drawing board," or, "This is probably okay."
And he went through that process 13 times and then . . . he said, "I'm going to stop here not because it's perfect, but because it's law of diminishing returns." Said, "I'm stopping here [be]cause it's diminishing returns," not to say that a 14th step couldn't have gotten, gotten it better. And I think for those reasons I find it subjective . . . . To state that, sitting here today, if you had to start from scratch, he would actually get to those same 13 steps in the same way exactly again, I think would be a challenge.
Dennis Tr. 267:15-268:18.

Mr. Martin has asserted that his 13-step aggregation method is valid because "more than 95%" of claimants who entered certain minimum information in the claims administration process in the main chapter 11 cases "accepted the User Accounts identified by the Aggregation Algorithm." Martin Rpt. ¶ 69; see also Martin Reply Rpt. ¶ 18. Like his use of this reasoning to support his choice of methodology, Mr. Martin's reliance on the claims administration process to support his 13-step process here is flawed. See Dennis Rpt. ¶¶ 62-71. The acceptance rate in the claims process might provide some indication as to whether Mr. Martin's aggregation method seemed to be acceptable to the population of individuals who submitted claims, a population consisting overwhelmingly of net losers and, even then, only a subset of them. Mr. Martin has provided nothing, however, to support extrapolating any acceptability of the method in the claims process to a conclusion about the method's validity in aggregating the user accounts of net winners for the purpose of suing them. There is no indication that Mr. Martin considered whether there might be statistically significant differences between the net loser and net winner populations when he was developing his aggregation process, during which he consulted a nonstatistical sample of net losers. See Martin Tr. 136:4-7, 142:11-144:8; see also Dennis Rpt. ¶¶ 67, 69; Dennis Tr. 251:18-252:5, 268:22-274:9 (discussing potential for "selection bias" and its impact).

Without the details from the thousands of hours spent through the trial-and-error development of his 13-step process, there is no ability to reasonably assess whether Mr. Martin's decisions (cleaning and standardizing data, choosing which field to start with and focus on, choosing other fields to include, choosing how to combine data from those fields, choosing to use those combinations in 13 steps, and choosing a specific order for those steps) were made in a manner that was inconsistent with standard practice in data analysis and might have impacted net winners. If Mr. Martin had presented such details, then whether he reliably applied relevant principles and methods in reaching his opinion could be assessed. As it stands, I have only Mr. Martin's general descriptions and personal assurances, which are not enough.

I am not suggesting here that Mr. Martin needed to have shown his work at such a granular level as to provide, for example, every calculation made. Mr. Martin was capable of explaining his actions and decisions in much more detail, as he did in explaining how he identified and dealt with the Ympactus user accounts.

IV. Conclusion

The trustee has not shown by a preponderance of the evidence the reliability of his expert's opinion as to the selection and application of his method for aggregating user accounts to determine in these adversary proceedings the identities and gains of the net winners in the TelexFree scheme. Thus, the expert's opinion is inadmissible. A separate order shall enter accordingly in each adversary proceeding.

The defendants also raise arguments beyond simply addressing the reliability of Mr. Martin's selection and application of an aggregation methodology, including arguments that relate to Mr. Martin's assumptions and decisions after the aggregation process was complete, when he set out to calculate the gains and losses (net equity) of each alleged participant. Having determined that the reliability of Mr. Martin's aggregation methodology has not been established and thus his expert opinion cannot be admitted, it is unnecessary to address the defendants' additional arguments, which they may choose to raise in the future, if appropriate.

