Datafaker, an alternative to Production Data

A few days ago, we released the first 1.x version of Datafaker. Datafaker, a modern and more up-to-date port of Javafaker, is a library that generates fake random data that looks like real data.

What is Datafaker

Datafaker is a Java/Kotlin library. It was created because the original project, Javafaker, was stuck on an older version of Java and was hardly accepting PRs. (At the time of writing, the Javafaker project has more than 100 open PRs and several long-standing bugs that haven’t been addressed.) Since I didn’t want the effort put into those PRs to go to waste, I created Datafaker, which has (almost) all of those open PRs merged in, plus several extra issues fixed.

Datafaker vs Javafaker

Schematically, this is how Datafaker compares to Javafaker at the moment:

Description   | Datafaker                  | Javafaker
Language      | Kotlin/Java 11+            | Java 6
Providers     | 100+                       | ~80
Dependencies  | Very minimal (since 1.1.0) | Guava, Apache Commons, etc.
Last release  | 3 Jan 2022                 | 11 Feb 2020

When to use Datafaker

Datafaker is a great library to use when you need realistic-looking test data, for example in unit tests, when populating database tables, or when generating CSV files. With Datafaker, it’s trivial to generate realistic-looking data such as credit card numbers and phone numbers, but also medical data such as ICD-10-CM and ICD-10-PCS data.
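
As an illustration of the CSV use case, here is a minimal sketch that builds a few CSV lines from generated values. The class name, the helper methods, and the choice of columns are my own assumptions for this example and are not part of Datafaker; the sketch also assumes the name, phoneNumber and finance providers are available in the version you use.

```java
import net.datafaker.Faker;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of Datafaker.
public class FakeCsvExample {

    // Wraps a value in double quotes and doubles any embedded quotes,
    // so commas or quotes in generated values stay inside one column.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Returns a header line followed by rowCount lines of fake data.
    static List<String> generateCsvLines(Faker faker, int rowCount) {
        List<String> lines = new ArrayList<>();
        lines.add("name,phone,credit_card");
        for (int i = 0; i < rowCount; i++) {
            lines.add(String.join(",",
                    quote(faker.name().fullName()),
                    quote(faker.phoneNumber().phoneNumber()),
                    quote(faker.finance().creditCard())));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints the header plus five rows of random, fake data.
        generateCsvLines(new Faker(), 5).forEach(System.out::println);
    }
}
```

The values differ on every run, so the output is not shown here. Writing the lines to a file instead of standard output only requires swapping the last statement.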

Generating data, as an alternative to anonymizing production data, has a few benefits. One of the biggest is that generated data will never result in PII (Personally Identifiable Information) or PHI (Protected Health Information) issues, since the data is fake and doesn’t contain real patients, medical record numbers, or other identifying information.

Another great benefit is that it’s easy to generate an almost unlimited set of data. While it’s sometimes hard to get a large set of anonymized production data, generating a million or more fake PII/PHI records can be easily accomplished with Datafaker.
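
One possible way to produce a large number of records without keeping them all in memory is a lazy stream, sketched below. The class and method names are hypothetical, and the "name <email>" record format is only an assumption made for this example.

```java
import net.datafaker.Faker;

import java.util.stream.Stream;

// Hypothetical example class, not part of Datafaker.
public class FakeRecordStream {

    // Lazily produces `count` fake records of the form "Full Name <email>".
    // Each record is only created when the stream is consumed.
    static Stream<String> records(Faker faker, long count) {
        return Stream.generate(
                        () -> faker.name().fullName() + " <" + faker.internet().emailAddress() + ">")
                .limit(count);
    }

    public static void main(String[] args) {
        // Kept small for a quick demo; the count could just as well be 1_000_000.
        records(new Faker(), 10).forEach(System.out::println);
    }
}
```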

A last notable feature is that generated data can be localized. So when you need, for example, both American and French phone numbers, Datafaker can easily generate them: pass in the locale of the desired language or country, and the data will be formatted accordingly.
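
A minimal sketch of what passing in a locale could look like is shown below. It assumes the Faker constructor that takes a java.util.Locale and the phoneNumber provider; the class and helper method names are made up for this example.

```java
import net.datafaker.Faker;

import java.util.Locale;

// Hypothetical example class, not part of Datafaker.
public class LocalizedPhoneNumbers {

    // Creates a Faker for the given locale and returns one fake phone number.
    static String phoneNumberFor(Locale locale) {
        return new Faker(locale).phoneNumber().phoneNumber();
    }

    public static void main(String[] args) {
        // The numbers are random; only their format depends on the locale.
        System.out.println("US: " + phoneNumberFor(Locale.US));
        System.out.println("FR: " + phoneNumberFor(Locale.FRANCE));
    }
}
```

In real code you would usually create one Faker per locale and reuse it, rather than creating a new instance for every value as this short sketch does.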

How to use Datafaker

Using Datafaker is quite straightforward. It can be used from Maven or Gradle by declaring the dependency in the pom.xml or build.gradle. An example can be seen below:

<dependency>
    <groupId>net.datafaker</groupId>
    <artifactId>datafaker</artifactId>
    <version>1.0.0</version>
</dependency>

After importing the library, you can use Datafaker to generate fake data in the following way:

import net.datafaker.Faker;

Faker faker = new Faker();

String name = faker.name().fullName(); // Miss Samanta Schmidt
String firstName = faker.name().firstName(); // Emory
String lastName = faker.name().lastName(); // Barton

String streetAddress = faker.address().streetAddress(); // 60018 Sawayn Brooks Suite 449

This will generate values similar to those shown in the code comments. Many more options are described in the documentation on how to use Datafaker, and there’s an overview of all the fake data providers, which will be updated as more data providers are released.

I hope you find this library useful. If you find an issue, please let me know in the Datafaker GitHub issue tracker or leave a comment below.
